Decoding the magic of ANNs

1751022147 Malaya Rout
Share the Reality


I have come across various ways of defining Artificial Neural Networks (ANNs), and many of them miss a fundamental characteristic: an ANN is a machine learning model. Like all supervised machine learning models, an ANN learns the relationship between the inputs (a set of independent variables) and the output (the dependent variable) when provided with a dataset of historical examples of inputs and their corresponding outputs. 

ANNs can be used for classification and regression tasks. A classification task predicts the “class” of the dependent variable. A regression task predicts the “value” of the dependent variable. Put another way, the dependent variable is categorical in a classification task and continuous in a regression task. For a multiclass classification task, the number of neurons in the output layer typically equals the number of classes of the dependent variable (a binary classifier can also get by with a single output neuron). In most cases, there is just one neuron in the output layer for a regression task. An LLM can be viewed as a multiclass classification model that predicts the next token. The number of neurons in its output layer equals the number of distinct tokens in the model’s vocabulary, not the number of words in the English language.
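The difference in output-layer size can be sketched in a few lines of plain Python. This is only an illustration: the three class names, the raw scores and the use of softmax to turn scores into probabilities are assumptions made for the example, not details taken from the article.

```python
import math

def softmax(scores):
    """Turn raw output-layer scores into probabilities that sum to 1."""
    highest = max(scores)
    # Subtracting the largest score keeps exp() numerically stable.
    exps = [math.exp(s - highest) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# Classification: three hypothetical classes, hence three output neurons.
class_names = ["cat", "dog", "bird"]
output_scores = [2.0, 1.0, 0.1]  # made-up raw scores, one per output neuron

probabilities = softmax(output_scores)
predicted_class = class_names[probabilities.index(max(probabilities))]

# Regression would instead use a single output neuron whose raw value
# is itself the prediction, so no softmax step is needed.
```

An LLM follows the same pattern, except that the list of classes is the tokenizer’s vocabulary, which can run to tens of thousands of entries.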

ANNs have seven key components. First, they have neurons, which are the basic computational units that produce outputs from inputs. Second, there is an input layer, which receives the raw data and in which each neuron corresponds to one independent variable (called a feature). Third, there are hidden layers. Most of the computation takes place in these layers, and this is what we fondly refer to as learning by the machine. Fourth, the output layer produces the prediction. 

Fifth, weights are the parameters that get adjusted during the learning process. They denote the connection strength between neurons. Remember, when we say an LLM has seven billion parameters, we mean it has roughly seven billion learnable values, the vast majority of which are connection weights. Sixth, activation functions introduce non-linearity into the network, enabling it to model complex relationships between the inputs and outputs. Seventh, biases are additional learnable parameters, typically one per neuron, that shift the weighted sum, so a neuron’s output can be offset even when all its inputs are zero. 
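To make the components above concrete, here is a minimal sketch of a single neuron in plain Python. The sigmoid activation, the function names and all numeric values are illustrative assumptions; the article does not prescribe any of them.

```python
import math

def sigmoid(z):
    """A common non-linear activation that squashes any number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def neuron(inputs, weights, bias):
    """Weighted sum of the inputs, plus the bias, passed through the activation."""
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    return sigmoid(z)

# Hypothetical values: three input features, so three weights and one bias.
weights = [0.5, -0.3, 0.8]
bias = 0.1

output = neuron([1.0, 2.0, 3.0], weights, bias)

# With all inputs at zero, the weighted sum vanishes and only the bias
# decides the output.
output_all_zero = neuron([0.0, 0.0, 0.0], weights, bias)
output_no_bias = neuron([0.0, 0.0, 0.0], weights, 0.0)
```

Comparing `output_all_zero` with `output_no_bias` shows the role of the bias: without it, every all-zero input would be stuck at the same fixed output.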

In an ANN, each neuron applies a non-linear function to a weighted sum of its inputs, and the network transforms the input data layer by layer in this way. The output from one neuron is transmitted to other neurons through the weighted connections. The job of the learning algorithm is to find the values of the weights (and biases) that best describe the relationship between the input to the input layer and the output from the output layer. That search is what we call the learning process. In conversational terms, we say the model gained experience from the historical examples.
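The learning process can be sketched with the smallest possible case: one neuron, one weight and one bias. The article does not name a specific algorithm, so gradient descent on the mean squared error is an assumption here, as are the made-up rule y = 2x + 1 that generates the historical examples and the choice to leave out the activation function for brevity.

```python
# Historical examples: (input, output) pairs from the hypothetical rule y = 2x + 1.
examples = [(x, 2.0 * x + 1.0) for x in [0.0, 1.0, 2.0, 3.0, 4.0]]

weight, bias = 0.0, 0.0   # start with no knowledge of the relationship
learning_rate = 0.05      # assumed step size

for epoch in range(2000):
    grad_weight = 0.0
    grad_bias = 0.0
    for x, y in examples:
        error = (weight * x + bias) - y   # prediction minus the known output
        grad_weight += 2 * error * x      # slope of the squared error w.r.t. weight
        grad_bias += 2 * error            # slope of the squared error w.r.t. bias
    # Nudge both parameters against the average slope to reduce the error.
    weight -= learning_rate * grad_weight / len(examples)
    bias -= learning_rate * grad_bias / len(examples)
```

After the loop, `weight` and `bias` end up very close to 2 and 1: the neuron has recovered the relationship purely from the examples. Real networks repeat the same idea across all their weights and biases at once.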

Various network topologies exist to address specific computational needs. In feedforward networks, information flows in one direction from input to output without feedback loops. Recurrent Neural Networks (RNNs) have feedback loops that act as memory, which lets them process sequential data. Convolutional Neural Networks (CNNs) process grid-like data such as images. 
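The “memory” of a recurrent network can be illustrated with a single recurrent neuron that feeds its previous output back into itself. This is a toy sketch, not a full RNN: the tanh activation, the function names and the weight values are assumptions chosen for the example.

```python
import math

def recurrent_step(x, hidden, w_in=0.5, w_hidden=0.8, bias=0.0):
    """Combine the current input with the previous hidden state (the feedback loop)."""
    return math.tanh(w_in * x + w_hidden * hidden + bias)

def run_sequence(sequence):
    """Process a sequence one element at a time, carrying the hidden state forward."""
    hidden = 0.0  # empty memory before the first element arrives
    for x in sequence:
        hidden = recurrent_step(x, hidden)
    return hidden

forward_result = run_sequence([1.0, 2.0, 3.0])
reverse_result = run_sequence([3.0, 2.0, 1.0])
```

The two results differ even though both sequences contain the same numbers, because the hidden state depends on the order in which the inputs arrived. A feedforward neuron, which has no feedback term, sees each input in isolation and cannot make that distinction.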

Before I forget, there are a few more things to highlight. Deep learning models are a subset of ANNs. When there is only one hidden layer (a total of three layers, including the input and output layers), the ANN is usually called a shallow network rather than a deep learning model. When there is more than one hidden layer, it is called a deep learning ANN architecture. When every neuron in one layer connects to every neuron in the subsequent layer, the network is called fully connected or dense. Otherwise, it is called partially connected. 
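For a fully connected network, both the depth and the parameter count follow directly from the layer sizes. The sketch below assumes one bias per non-input neuron and uses the article’s threshold of more than one hidden layer for “deep”; the function names and layer sizes are hypothetical.

```python
def count_parameters(layer_sizes):
    """Return (weights, biases) for a fully connected network.

    layer_sizes lists the neurons per layer, from the input layer to the output layer.
    """
    # Every neuron connects to every neuron in the next layer.
    weights = sum(a * b for a, b in zip(layer_sizes, layer_sizes[1:]))
    # One bias for each neuron outside the input layer.
    biases = sum(layer_sizes[1:])
    return weights, biases

def is_deep(layer_sizes):
    """More than one hidden layer counts as deep, per the definition above."""
    hidden_layers = len(layer_sizes) - 2  # exclude the input and output layers
    return hidden_layers > 1

shallow_net = [4, 5, 3]     # 4 features, one hidden layer of 5, 3 classes
deep_net = [4, 8, 8, 3]     # same inputs and outputs, two hidden layers

shallow_params = count_parameters(shallow_net)  # (35, 8)
deep_params = count_parameters(deep_net)        # (120, 19)
```

The same arithmetic, scaled up to far wider and deeper layers, is how a model’s headline parameter count reaches the billions.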

Have fun exploring artificial neural networks. Have a happy Diwali.





Disclaimer

Views expressed above are the author’s own.







