All you need to know about: The Artificial Neural Networks (Part I)
"Artificial Intelligence is the new electricity." - Andrew Ng
Ever since humans started using computers for their daily needs, we have wondered: what if, one day, computers become more intelligent than humans? Such questions have inspired everyone, including the folks in Hollywood, which is why people sometimes cite movies like Terminator to argue that Artificial Intelligence (AI) is anything but good. Let's not dive into that debate; instead, let's talk about how simple code and a lot of mathematics can be used to build an artificial brain! In this article we will look at how artificial neural networks (ANNs) work and at the impressive tasks a machine can perform with this technology. So let's jump right in!
The simplest network
Firstly, it is important to define what we mean by a neuron. A neuron is a fundamental unit that processes an input, linearly or non-linearly, to produce an output. Consider a simple neural network with only 2 layers: an input layer and an output layer. Each of these layers contains only one neuron, as shown in Figure 1. The input neuron receives an input, which is passed on to the output neuron. The channel, path, or link connecting these 2 neurons is called a weight, and it can take any value. The weight is simply multiplied with the input to give a result, as shown in the following equation:
\(y = wx\)
where \(y\) is the desired result, \(w\) is the weight and \(x\) is the input value. It is easy to see that this is the equation of a line passing through the origin in Cartesian space. To generalise further, we add a bias term so that the line's intercept can be placed anywhere in the Cartesian plane. Our equation therefore becomes:
\(y=wx+b\)
Now the final ingredient is the activation function. The output \(y\) must be processed so that it yields 1 or 0 (in the case of a binary classification task), or some other value with respect to a threshold. Many activation functions can be employed (linear, step, tanh, etc.); here we will use the sigmoid activation, defined by the equation:
\(f(y) = \frac{1}{1+e^{-y}}\)
The speciality of this activation is that its output is greater than 0.5 whenever \(y>0\) and less than 0.5 otherwise, so with a threshold of 0.5 we can read the result as 1 or 0. We thus arrive at the equation of our simplest neural network:
\(f(wx+b) = \frac{1}{1+e^{-wx-b}}\)
that is, with a threshold of 0.5, the condition \(wx+b>0\) gives the result 1, and 0 otherwise.
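The single-neuron network described above can be sketched in a few lines of Python. The weight and bias values below are arbitrary illustrative choices, not values from the article:

```python
import math

def sigmoid(y):
    # Squash any real value into the range (0, 1)
    return 1.0 / (1.0 + math.exp(-y))

def neuron(x, w, b):
    # The simplest network: one weight, one bias, sigmoid activation
    return sigmoid(w * x + b)

# With w = 2 and b = -1, the boundary wx + b = 0 sits at x = 0.5
print(neuron(1.0, 2.0, -1.0))  # argument positive -> output above 0.5
print(neuron(0.0, 2.0, -1.0))  # argument negative -> output below 0.5
```

Applying a 0.5 threshold to the outputs reproduces the 1-or-0 behaviour described above.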
Figure 1: A neural network with 2 neurons (1 input and 1 output), where w is the weight, x is the input, b is the bias and f(y) is the output.
Congratulations! Your very first simple neural network is done. However, life is never this easy, and neither are the problems we solve in real life. We therefore need to devise a more complicated network to bring in non-linearity, which is what we will do next.
Network with multiple neurons in a layer
There is absolutely no restriction on the number of neurons in any given layer. Consider the example shown in Figure 2. The two input neurons contribute to the output neuron according to their weights. The equation of the entire network is:
\(f(y)=f(w_0x_0+w_1x_1+b)\)
where \(f\) is an activation function. Let's now trace the flow of this network. \(f(y)\) will be greater than 0.5 only when its argument is positive.
Figure 2: A neural network with 2 neurons in the input layer.
This means that, for a constant bias \(b\), the output exceeds 0.5 when \(w_0x_0+w_1x_1>-b\). This can be visualised by assigning arbitrary values to the network and plotting a heat map, as shown in Figure 3.
Figure 3: A decision boundary created by the ANN shown in Figure 2. Here yellow depicts the value +1 and deep blue signifies the value 0.
Note that when we set both weights equal, we instruct the network to treat both inputs fairly. If one input neuron carries very noisy values, we simply shrink its weight so that it does not affect our output as wildly. Figure 5 gives a basic intuition of how complicated the decision boundary can become as we increase the number of neurons and layers in the network. This also highlights the fact that the number of neurons in the input layer equals the dimensionality of the data; decision boundaries beyond the 3rd dimension are impossible to visualise.
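The two-input neuron of Figure 2 can be sketched the same way. The equal weights and the bias below are arbitrary illustrative values; the point is that the 0.5 crossover of the output traces the straight line \(w_0x_0+w_1x_1+b=0\), which is exactly the decision boundary of Figure 3:

```python
import math

def sigmoid(y):
    return 1.0 / (1.0 + math.exp(-y))

def two_input_neuron(x0, x1, w0=1.0, w1=1.0, b=-1.0):
    # Output neuron fed by two weighted inputs, as in Figure 2.
    # Equal weights mean both inputs are treated fairly.
    return sigmoid(w0 * x0 + w1 * x1 + b)

# The boundary is the line x0 + x1 = 1 for these parameters
print(two_input_neuron(1.0, 1.0))  # above the line -> output > 0.5
print(two_input_neuron(0.0, 0.0))  # below the line -> output < 0.5
```

Evaluating this function over a grid of \((x_0, x_1)\) points and colouring by the output is precisely how a heat map like Figure 3 is produced.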
Network with multiple neurons and multiple hidden layers
Now it's time to take things to the next level. So far our neural network has had just 2 layers, the input and output layers. However, we can attach another layer after the output layer so that a new output layer is created. The contents of the previous output layer now remain hidden from the user, and it is therefore called a hidden layer. We can create multiple hidden layers in a network, as shown in Figure 4.
The fundamental reason to increase the number of hidden layers is to achieve non-linearity in tasks such as classification, regression and prediction. The following equations show how complicated our decision boundary becomes:
\(f(y) = f(x_0'w_0' + x_1'w_1' + x_2'w_2' +b)\)
\(f(y) = f(f(x_0w_{00}+x_1w_{10}+x_2w_{20}+x_3w_{30}+b_0)w_0' + ...)\)
Figure 5: Decision boundary created by an ANN with a (2:30:10:2) network.
Note that \(x_0'\) and \(w_0'\) are not derivatives; the primes are just a convention for the parameters of the hidden layer. Always remember: the more hidden layers a network has, the greater the computational power it can achieve, and the greater that power, the more detail with which we can simulate or process any event or problem. However, this comes at a great cost, which is worth a discussion in future articles.
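The nested equations above amount to applying the same weighted-sum-plus-activation step layer by layer. Here is a minimal forward-pass sketch with one hidden layer of 3 neurons fed by 4 inputs, matching the shape of the equations; all parameter values are arbitrary illustrative choices:

```python
import math

def sigmoid(y):
    return 1.0 / (1.0 + math.exp(-y))

def dense(inputs, weights, biases):
    # One fully connected layer: each neuron sums its weighted
    # inputs, adds its bias, then applies the sigmoid activation
    return [sigmoid(sum(w * x for w, x in zip(ws, inputs)) + b)
            for ws, b in zip(weights, biases)]

x = [0.5, -1.0, 2.0, 0.0]                       # 4 input values
W_hidden = [[0.2, -0.5, 0.1,  0.3],             # 3 hidden neurons,
            [0.7,  0.1, -0.2, 0.4],             # each with 4 weights
            [-0.3, 0.8, 0.5, -0.1]]
b_hidden = [0.0, -0.1, 0.2]

hidden = dense(x, W_hidden, b_hidden)           # the primed x' values
output = dense(hidden, [[0.6, -0.4, 0.9]], [0.1])[0]
print(output)                                   # a value in (0, 1)
```

The hidden activations play the role of \(x_0', x_1', x_2'\) in the equations: they are never shown to the user, only fed forward into the final neuron.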
So I hope this post helped you build a small but strong foundation in Deep Learning. But wait a minute: if we say we are creating an ANN (an artificial brain, for instance), how can we miss the brain's most important function, learning? That is exactly what we will talk about in the next part, i.e. how to train our neural network (make it learn). We will see how giving our network the freedom to tune its weights and biases according to the environment turns out to be a very clever step in establishing a powerful AI which will dominate the world.
Just kidding.