A neural network is a function: we feed in an input x and get a transformed value y as output. Both the input and the output can be vectors in n dimensions.
Let’s understand this with an example. Say we want to build a housing price predictor, and we have a dataset that relates house size (ft^2) to price. We plot the data, learn the relationship from it, and then predict the price of a house that is not in the dataset. Suppose the plotted points fall along a straight line (a linear function). As discussed earlier, a neural network is just a function.
So the function that predicts a house price is y = mx + b, where x is the size and y is the predicted price. A single linear function like this is not much of a neural network yet, but it is the simplest form of one: a single neuron.
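As a minimal sketch of this single-neuron view, the line y = mx + b can be written as a small Python function. The slope and intercept used here are made-up numbers for illustration, not values fitted from any real dataset.

```python
def predict_price(size_sqft: float, m: float, b: float) -> float:
    """A single 'neuron': the linear function y = m * x + b."""
    return m * size_sqft + b


# Hypothetical parameters: 150 per square foot plus a 50,000 base price.
M = 150.0
B = 50_000.0

price = predict_price(1_000.0, M, B)
print(price)  # 1000 * 150 + 50000 = 200000.0
```

Learning, in this picture, just means finding good values for m and b from the data.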
Now let’s add more neurons, which means more dependencies. Rather than depending only on size, the price also depends on the number of bedrooms, the zip code, and the average neighborhood income. These inputs feed another set of neurons that compute intermediate features such as family size, commute, and school quality, each with its own formula, and those features are combined to output a price.
It looks like this:
flowchart LR
    %% Intermediate features with formulas
    family["family size<br/><span style='color:red'>y = mx + b</span>"]
    commute["commute<br/><span style='color:red'>y = mx + b</span>"]
    school["school quality<br/><span style='color:red'>y = mx + b</span>"]
    %% Connections
    size --> family
    bedrooms --> family
    zip --> commute
    zip --> school
    income --> school
    family --> price
    commute --> price
    school --> price
The network above is a sparse neural network, because not every input node is connected to every node in the function/hidden layer. When every input node is connected to every hidden node, it is called a dense (or fully connected) neural network.
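One way to see the difference is to count connections. The sketch below encodes the input-to-hidden edges of the housing example as a plain dictionary and compares them with what a dense layer over the same nodes would have. The helper names `dense_connections` and `count_edges` are made up for this illustration.

```python
# Which inputs feed each hidden neuron in the sparse housing example.
SPARSE_CONNECTIONS = {
    "family": ["size", "bedrooms"],
    "commute": ["zip"],
    "school": ["zip", "income"],
}
INPUTS = ["size", "bedrooms", "zip", "income"]


def dense_connections(inputs, hidden):
    """In a dense layer, every hidden neuron is connected to every input."""
    return {h: list(inputs) for h in hidden}


def count_edges(connections):
    """Total number of input-to-hidden connections."""
    return sum(len(sources) for sources in connections.values())


dense = dense_connections(INPUTS, SPARSE_CONNECTIONS.keys())
print(count_edges(SPARSE_CONNECTIONS))  # 2 + 1 + 2 = 5
print(count_edges(dense))               # 3 neurons * 4 inputs = 12
```

Each connection carries its own weight, so the dense version has more parameters to learn for the same set of nodes.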
There is also only one hidden layer in between, so we call it a shallow neural network. If there is more than one hidden/function layer, we call it a deep neural network.
Now let’s look at a real fully connected neural network with multiple hidden/function layers. Here each function is a combination of weights and a bias; these are called the parameters of the network, and they are where the network’s knowledge lives.
flowchart LR
    %% Input Layer
    subgraph H1["Hidden layer 1"]
        h1["h₁<br/>z₁ = W₁·x + b₁"]
        h2["h₂<br/>z₂ = W₂·x + b₁"]
        h3["h₃<br/>z₃ = W₃·x + b₁"]
        h4["h₄<br/>z₄ = W₄·x + b₁"]
    end
    %% Hidden Layer 2 (3 neurons)
    subgraph Y["Output layer"]
        y["y<br/>y = W₈·g + b₃"]
    end
    %% Fully Connected: Input → Hidden 1
    %% Fully Connected: Hidden 1 → Hidden 2
    h1 --> g1
    h1 --> g2
    h1 --> g3
    h2 --> g1
    h2 --> g2
    h2 --> g3
    h3 --> g1
    h3 --> g2
    h3 --> g3
    h4 --> g1
    h4 --> g2
    h4 --> g3
    %% Fully Connected: Hidden 2 → Output
    g1 --> y
    g2 --> y
    g3 --> y
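To make the idea of weights and biases as parameters concrete, here is a rough sketch of a fully connected network with 4 neurons in the first hidden layer, 3 in the second, and 1 output. The input size of 4, the random initial weights, and the names `dense`, `init_layer`, and `forward` are all assumptions made for this example. Each layer here is purely linear (W·x + b).

```python
import random


def dense(inputs, weights, biases):
    """One fully connected layer: z_j = sum_i(weights[j][i] * inputs[i]) + biases[j]."""
    return [
        sum(w * x for w, x in zip(neuron_weights, inputs)) + b
        for neuron_weights, b in zip(weights, biases)
    ]


def init_layer(n_in, n_out, rng):
    """Random weights and zero biases for a layer with n_in inputs and n_out neurons."""
    weights = [[rng.uniform(-1.0, 1.0) for _ in range(n_in)] for _ in range(n_out)]
    biases = [0.0] * n_out
    return weights, biases


def forward(x, layers):
    """Pass the input through every layer in order."""
    for weights, biases in layers:
        x = dense(x, weights, biases)
    return x


rng = random.Random(0)
layer_sizes = [4, 4, 3, 1]  # assumed inputs, hidden layer 1, hidden layer 2, output
layers = [
    init_layer(n_in, n_out, rng)
    for n_in, n_out in zip(layer_sizes, layer_sizes[1:])
]

y = forward([1.0, 2.0, 3.0, 4.0], layers)

# Every weight and every bias is one learnable parameter.
n_params = sum(len(w) * len(w[0]) + len(b) for w, b in layers)
print(len(y), n_params)  # 1 output; (4*4+4) + (4*3+3) + (3*1+1) = 39 parameters
```

Even this tiny network has 39 numbers to tune, which is where its "knowledge" is stored.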
There is one catch we’re missing: the network can produce negative housing prices. That’s why we need a guardrail in place, a function that rectifies this. It is called ReLU (Rectified Linear Unit), defined as ReLU(z) = max(0, z), which prevents a node’s output from being negative. A node then computes ReLU(w_i * x_i + b_i): we apply the linear function first and the non-linear function second. The non-linear function is called an activation function. When the linear part hands its value to the activation function and the result is non-zero, we say the neuron activates or fires. This non-linearity is what lets the model handle more complex problems, since without it a stack of linear layers collapses into a single linear function. ReLU is not the only choice: a single neuron that applies the sigmoid activation to its linear function is known as logistic regression.
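A small sketch of a neuron with a ReLU activation may help here. The weights, bias, and inputs are arbitrary numbers chosen so that one call produces a negative pre-activation value and the other a positive one.

```python
def relu(z: float) -> float:
    """Rectified Linear Unit: negative values become 0, positive values pass through."""
    return max(0.0, z)


def neuron(inputs, weights, bias):
    """Linear step first (w . x + b), then the non-linear activation."""
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    return relu(z)


# z = 0.5*1.0 + (-1.0)*2.0 + 0.0 = -1.5, so ReLU clamps it to 0.0
print(neuron([1.0, 2.0], [0.5, -1.0], 0.0))

# z = 0.5*1.0 + 1.0*2.0 + 0.5 = 3.0, so ReLU passes it through unchanged
print(neuron([1.0, 2.0], [0.5, 1.0], 0.5))
```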
To understand this better, let’s take another example, this time of logistic regression. Say we want to classify whether a given image is of a cat or not. We take the input image, read out its RGB pixel values and flatten them into an array of numbers, pass that array to neurons with some weights and a bias, and predict the output. On its own, the linear output can be any real number, but logistic regression should only output a value between 0 and 1. So we apply the sigmoid activation function to the output, which squashes it into the range between 0 and 1. We can then classify the image as cat or not by seeing how close the output is to each extreme.
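Here is a toy version of that pipeline, assuming a tiny 2x2 RGB image, hand-picked weights and bias, and a 0.5 decision threshold. None of these values come from a trained model; they only show the shape of the computation.

```python
import math


def sigmoid(z: float) -> float:
    """Squash any real number into the range 0 to 1 (written to avoid overflow)."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def flatten_image(image):
    """Turn rows of (R, G, B) pixels into one flat list of values scaled to 0..1."""
    return [channel / 255 for row in image for pixel in row for channel in pixel]


def predict_cat_probability(pixels, weights, bias):
    """Logistic regression: linear function followed by a sigmoid."""
    z = sum(w * p for w, p in zip(weights, pixels)) + bias
    return sigmoid(z)


# Hypothetical 2x2 image: red, green, blue, and white pixels.
image = [
    [(255, 0, 0), (0, 255, 0)],
    [(0, 0, 255), (255, 255, 255)],
]
pixels = flatten_image(image)  # 12 values, one per color channel
weights = [0.5] * len(pixels)  # made-up weights
bias = -2.0                    # made-up bias

p = predict_cat_probability(pixels, weights, bias)  # z = 0.5 * 6 - 2.0 = 1.0
is_cat = p >= 0.5              # assumed threshold
print(round(p, 3), is_cat)
```

The closer p is to 1, the more confident the model is that the image is a cat; the closer to 0, the more confident it is that it is not.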
During training, we take input vectors and pass them through the network, with the same or a different activation function per layer (or sometimes even per neuron). This layer-by-layer processing is called forward propagation, and at the end we get an output y. We compare this output with the ground truth using a loss function, which measures how far the prediction is from the actual output. We then propagate this error back from the output layer to the input layer, working out how much each weight and bias contributed to it, and update the weights and biases to minimize the loss. This is the network learning from its mistakes, and the process is called backpropagation.
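The forward and backward passes can be sketched on the smallest possible network: one hidden neuron with ReLU followed by one linear output neuron. The squared-error loss, the starting parameters, and the single training example are assumptions picked so the arithmetic is easy to follow by hand.

```python
def relu(z):
    return max(0.0, z)


def forward(x, params):
    """Forward propagation: input -> hidden neuron (ReLU) -> output neuron."""
    w1, b1, w2, b2 = params
    z = w1 * x + b1        # linear step of the hidden neuron
    h = relu(z)            # activation of the hidden neuron
    y_hat = w2 * h + b2    # linear output neuron
    return z, h, y_hat


def squared_error(y_hat, y):
    """Loss: how far the prediction is from the ground truth."""
    return (y_hat - y) ** 2


def backward(x, y, params, cache):
    """Backpropagation: apply the chain rule from the output back to the input."""
    w1, b1, w2, b2 = params
    z, h, y_hat = cache
    d_y_hat = 2.0 * (y_hat - y)      # dLoss/dy_hat
    d_w2 = d_y_hat * h               # output layer gradients
    d_b2 = d_y_hat
    d_h = d_y_hat * w2               # error passed back to the hidden neuron
    d_z = d_h if z > 0 else 0.0      # ReLU lets the gradient through only if z > 0
    d_w1 = d_z * x                   # hidden layer gradients
    d_b1 = d_z
    return d_w1, d_b1, d_w2, d_b2


params = (1.0, 0.0, 1.0, 0.0)  # w1, b1, w2, b2
x, y = 2.0, 5.0                # one training example

cache = forward(x, params)
loss = squared_error(cache[2], y)
grads = backward(x, y, params, cache)
print(loss)   # prediction is 2.0, target is 5.0, so loss = 9.0
print(grads)  # (-12.0, -6.0, -12.0, -6.0)
```

All four gradients are negative, which says that increasing any of the parameters would reduce the loss for this example.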
The algorithm a neural network uses to update its weights and biases is called gradient descent. Training repeats this cycle many times. Once we are satisfied with the output and the loss, we deploy the model; in the real world, the model only does forward propagation.
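To tie this together, here is a minimal gradient descent loop that fits the y = mx + b model from the housing example. The four data points (generated from y = 2x + 1), the learning rate, and the number of epochs are assumptions chosen so that this toy problem converges.

```python
def predict(x, m, b):
    """Forward propagation for the single-neuron model."""
    return m * x + b


def train(xs, ys, learning_rate=0.05, epochs=2000):
    """Fit m and b by gradient descent on the mean squared error."""
    m, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        errors = [predict(x, m, b) - y for x, y in zip(xs, ys)]
        # Gradients of mean((m*x + b - y)^2) with respect to m and b.
        grad_m = sum(2.0 * e * x for e, x in zip(errors, xs)) / n
        grad_b = sum(2.0 * e for e in errors) / n
        # Step against the gradient to reduce the loss.
        m -= learning_rate * grad_m
        b -= learning_rate * grad_b
    return m, b


# Toy data generated from y = 2x + 1.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]

m, b = train(xs, ys)
print(round(m, 3), round(b, 3))

# After training ("deployment"), only the forward pass is used.
print(round(predict(5.0, m, b), 3))
```

With these settings the learned m and b should land very close to 2 and 1, so the prediction for x = 5 comes out near 11. The learning rate matters: too large a value can make the updates diverge instead of settling.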