A neural network (Artificial Neural Network, ANN, or NN) is a very popular and powerful method used for modelling the relationship between a multivariate input variable x and a multivariate output variable y. An NN is generally considered a non-linear regression model whose form is given by a network structure. The network structure was inspired by the brain tissue of higher organisms, in which each neuron is connected through so-called synapses to several other neurons. An electrical current (the information signal) flows through the synapses, is processed by the neuron, and is transmitted through other synapses to other neurons.
Biological neuron: cell body, input dendrites, output axon, synapses (connections to other neurons)
The artificial neural network tries to copy the structure and functionality of the biological neural structure and models it mathematically. In an ANN the nodes are, by analogy, called neurons. Each input variable $x_i$ entering the j-th neuron is multiplied by a weighting factor $w_{ji}$. The sum of the weighted input variables, $z_j = w_{0j} + \sum_i w_{ji} x_i$, is then transformed by the neuron using an activation function. The activation function expresses the intensity of the neuron's response to a change in its input. The most commonly used activation functions include the logistic function,
$\sigma_j(z) = \dfrac{1}{1 + e^{-z}}$,
which is similar to the biological sensory response. For example, there is practically no difference whether you touch an object at 50 K or 150 K (both are too cold) or at 2000 K or 4000 K (both are too hot), but you can distinguish very precisely between 90 and 100°F (about 32 and 38°C), because this is the range that carries vital information. The weights $w_{ji}$ represent the intensity of the information flow between a variable and a neuron or, in multi-layer networks, between neurons in adjacent layers. By analogy with biological neurons these links are sometimes called synapses; they can be interpreted as the significance of the variables and visualized in a plot.
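The computation performed by a single neuron can be sketched in Python as follows. This is only an illustration of the weighted sum and the logistic activation described above; the function names `logistic` and `neuron_output` are hypothetical and are not part of QCExpert.

```python
import math

def logistic(z):
    """Logistic activation 1 / (1 + e^(-z)), written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def neuron_output(x, weights, bias):
    """Weighted sum z = bias + sum_i w_i * x_i, transformed by the logistic function."""
    z = bias + sum(w * xi for w, xi in zip(weights, x))
    return logistic(z)

# The response saturates far from zero and is most sensitive around zero.
for z in (-10.0, -0.5, 0.0, 0.5, 10.0):
    print(z, logistic(z))
```

The printed values show the saturation the text describes: large changes of z far from zero barely change the output, while small changes near zero are clearly distinguished.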
Structure of an artificial neuron
Possible architecture of an ANN with 1 hidden neuron layer
Commonly used activation function of a neuron σ(z)
Output variables are predicted as a weighted linear combination of the outputs of the neurons in the last hidden layer. A neural network is therefore formally a special case of multiple nonlinear regression; in practice it can be regarded as non-parametric regression. If the network contained no hidden-layer neurons, only input and output variables, it would be a linear regression model. The network is optimized to satisfy the least-squares criterion: the weights are set so that the sum of squared differences between the predicted and the measured values of the output variables is minimal. This is the aim of the iterative optimization process, called learning or training, which searches for the best values of all weights. QCExpert uses an adaptive derivative Gauss-Newton algorithm to optimize the net. A trained network can then be used to predict output variables for newly specified input variables. The neural network model is local, which means that its prediction ability declines sharply outside the range of the independent variables used to train it.
Prediction capability of an ANN drops dramatically in the areas where no training data are available
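The model and the least-squares criterion can be written down as a short Python sketch for a network with one hidden layer. It is a minimal illustration under stated assumptions, not QCExpert's implementation: the names `predict` and `residual_sum_of_squares` and the layout of the weight lists are hypothetical.

```python
import math

def logistic(z):
    """Numerically stable logistic activation."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def predict(x, hidden_weights, hidden_biases, output_weights, output_biases):
    """One hidden layer; each output is a linear combination of the hidden activations."""
    hidden = [logistic(b + sum(w * xi for w, xi in zip(ws, x)))
              for ws, b in zip(hidden_weights, hidden_biases)]
    return [b + sum(v * h for v, h in zip(vs, hidden))
            for vs, b in zip(output_weights, output_biases)]

def residual_sum_of_squares(X, Y, params):
    """Sum of squared differences between predicted and measured outputs over all rows."""
    total = 0.0
    for x, y in zip(X, Y):
        y_hat = predict(x, *params)
        total += sum((p - t) ** 2 for p, t in zip(y_hat, y))
    return total

# Two predictors, two hidden neurons, one response (all weights chosen arbitrarily).
params = ([[0.0, 0.0], [0.0, 0.0]], [0.0, 0.0], [[2.0, 4.0]], [1.0])
print(predict([1.0, 2.0], *params))
print(residual_sum_of_squares([[1.0, 2.0]], [[3.0]], params))
```

Training amounts to searching for the weights in `params` that make the value returned by `residual_sum_of_squares` as small as possible.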
A typical procedure for using a neural network may be as follows.
- Select a group of predictors (independent variables, X) that we believe may affect the dependent variables Y, and a group of dependent variables that should depend on the predictors. Every row must contain values of all dependent and independent variables. The number of rows is denoted N.
- Select the architecture of the neural network: the number of layers and the number of neurons in each layer. There is no straightforward rule for the best architecture; it is usually appropriate to use a number of neurons roughly corresponding to the number of variables. Single-hidden-layer networks are recommended where we assume a linear or slightly non-linear relationship. Two-layer networks can be suitable for strongly nonlinear relationships. Using networks with more than 3 layers is usually not very effective. Keep in mind that a very complex network carries a high risk of an overparameterized model that is ambiguous, unstable, or difficult to optimize. The number of data rows should be at least ten times greater than the number of neurons in the network; otherwise there is a risk of overfitting and the prediction ability may decrease. Usual architectures for middle- to large-scale problems are networks with 2 to 20 neurons and 1 to 3 layers.
- Optimize the parameters of the network, the so-called "learning" of the neural network. During this process the optimization algorithm tries to find a set of weights for which the predicted values agree as closely as possible with the entered dependent variables. The agreement is measured by the sum of squares, as in other least-squares regression methods. In general, it cannot be guaranteed that the solution found is the best possible one. It is therefore advisable to run the optimization several times to obtain a lower residual sum of squares (RSS). Optimization starts from random values of the weights, so it is natural that the weights found in each run may be completely different. Even completely different combinations of weights can provide a virtually identical prediction model with the same RSS.
- If information about the reliability of prediction is required, we can use cross-validation. In this technique we select a so-called training (or learning) subset of the data, say P·N rows (0 < P < 1), to be used to train the network. The remaining (1 - P)·N rows, the testing or validation data, are then used to validate the network, i.e. to check whether the values predicted for the validation data are close to the actual dependent data.
- The success of the neural network can be assessed from the decrease of the sum of squares during the optimization process, from the fit plots of prediction, and from the thickness of the lines connecting the neurons (the thickness is proportional to the absolute value of the weight, which is interpreted as the intensity of information flow downward from predictors to response).
- Prediction: a trained network may be used to predict the response variables. Put new values of the independent variables on the input of the network. The structure of the variables must be the same as that used to train the ANN, and the values should lie in the same range. The network will predict the output values.
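The training and validation steps of this procedure can be sketched in Python. This is a minimal illustration under loud assumptions: it uses plain batch gradient descent instead of the Gauss-Newton algorithm used by QCExpert, a single hidden layer, a single response, and made-up data; all function names are hypothetical.

```python
import math
import random

def logistic(z):
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def forward(p, x):
    """Hidden activations and the predicted response for one row x."""
    hidden = [logistic(b + sum(w * xi for w, xi in zip(ws, x)))
              for ws, b in zip(p["W"], p["b"])]
    return hidden, p["c"] + sum(v * h for v, h in zip(p["v"], hidden))

def rss(p, X, y):
    return sum((forward(p, x)[1] - t) ** 2 for x, t in zip(X, y))

def train(X, y, n_hidden, rng, epochs=2000, rate=0.2):
    """Gradient descent on the mean squared error (assumption: NOT Gauss-Newton)."""
    n, d = len(X), len(X[0])
    # Random starting weights: every run begins from a different point.
    p = {"W": [[rng.uniform(-0.5, 0.5) for _ in range(d)] for _ in range(n_hidden)],
         "b": [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)],
         "v": [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)],
         "c": rng.uniform(-0.5, 0.5)}
    for _ in range(epochs):
        gW = [[0.0] * d for _ in range(n_hidden)]
        gb, gv, gc = [0.0] * n_hidden, [0.0] * n_hidden, 0.0
        for x, t in zip(X, y):
            hidden, y_hat = forward(p, x)
            e = 2.0 * (y_hat - t) / n           # derivative of the MSE w.r.t. y_hat
            gc += e
            for j, h in enumerate(hidden):
                gv[j] += e * h
                delta = e * p["v"][j] * h * (1.0 - h)
                gb[j] += delta
                for i, xi in enumerate(x):
                    gW[j][i] += delta * xi
        p["c"] -= rate * gc
        for j in range(n_hidden):
            p["v"][j] -= rate * gv[j]
            p["b"][j] -= rate * gb[j]
            for i in range(d):
                p["W"][j][i] -= rate * gW[j][i]
    return p

def split_rows(X, y, fraction, rng):
    """Use P*N randomly chosen rows for training and the remaining (1 - P)*N for validation."""
    idx = list(range(len(X)))
    rng.shuffle(idx)
    k = int(round(fraction * len(X)))
    tr, va = idx[:k], idx[k:]
    return [X[i] for i in tr], [y[i] for i in tr], [X[i] for i in va], [y[i] for i in va]

def fit_with_restarts(X, y, n_hidden, restarts, rng):
    """Train several times from random weights and keep the run with the lowest RSS."""
    runs = [train(X, y, n_hidden, rng) for _ in range(restarts)]
    return min(runs, key=lambda p: rss(p, X, y))

rng = random.Random(0)
X = [[i / 29.0] for i in range(30)]             # made-up data: N = 30 rows, one predictor
y = [row[0] ** 2 for row in X]
X_tr, y_tr, X_va, y_va = split_rows(X, y, 0.8, rng)   # P = 0.8
best = fit_with_restarts(X_tr, y_tr, n_hidden=2, restarts=3, rng=rng)
print("training RSS:", rss(best, X_tr, y_tr))
print("validation RSS:", rss(best, X_va, y_va))
```

With 24 training rows and 2 hidden neurons the example respects the rule of thumb of at least ten rows per neuron. A validation RSS that is much larger per row than the training RSS would suggest that the network does not generalize well.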
The ANN training process
Using trained ANN for predicting unknown response
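Because the model is local, it can be useful to check that new predictor values lie inside the range of the training data before trusting a prediction. The following Python sketch shows one simple way to do this; the helper names `training_ranges` and `out_of_range` are hypothetical and the data are made up.

```python
def training_ranges(X):
    """Per-predictor (min, max) over the rows used for training."""
    return [(min(column), max(column)) for column in zip(*X)]

def out_of_range(x_new, ranges):
    """Indices of predictors in x_new that fall outside the training range."""
    return [i for i, (value, (low, high)) in enumerate(zip(x_new, ranges))
            if value < low or value > high]

X_train = [[1.0, 10.0], [2.0, 30.0], [3.0, 20.0]]
ranges = training_ranges(X_train)
print(out_of_range([2.5, 35.0], ranges))   # second predictor exceeds its training maximum
```

A non-empty result means that the network would be extrapolating for the listed predictors, so its prediction should be treated with caution.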
The Neural Networks module has several consecutive dialog boxes in which the parameters of the calculation are set. In the first dialog box, the columns of independent and dependent variables are selected.
Step 1: Select predictors and responses, optionally choose the data for prediction. Check “Display weights” to visualize the importance of predictors and the predictability of responses.
Step 2: Design the ANN architecture, choose the number of hidden layers and the number of neurons in each layer. Optionally, specify the part P of the data to be used for training (the rest will be used for cross-validation).
Optional: Define special predictor transformation
Optional: Modify the termination conditions
Step 3: Run the optimization (training) process and observe how successful the ANN is. Left: no cross-validation. Right: with cross-validation; validation data errors are shown in green. Afterwards, you may save the model for later use, train the net again with a different set of initial weights, run the interactive Prediction panel, or press OK to get the results.
Optional: Interactive prediction panel. Type new predictor values, or select a row of original data by „^“ or „v“, alter the predictor values and observe the changes in predicted response values.
Graphical output
Y-prediction plot. Plot of agreement between the measured response and the prediction for each of the response variables. The closer the points are scattered to the line, the better the prediction of this variable. The quality of prediction usually varies from variable to variable. This plot is an overall assessment of the success of the ANN model. The plot on the left shows good quality of fit.
If the data do not show a clear trend (like the two plots on the left), then this response variable cannot be described well with this ANN model. This can be due to premature termination of the optimization before the optimum weights were reached, to a network that is too simple to identify a possibly more complex dependence, or sadly (and most probably) to the fact that this variable simply does not depend on the selected predictors.
Graphical representation of the network architecture. If the checkbox “Display weights” was checked, the thickness of the synapses (connection lines) represents the absolute value of the corresponding weight and thus the amount of information that flows down between two neurons. From the thickness of the synapses going from the predictors we can assess their significance (the thicker the lines, the more significant the variable). Greater weight values on the input to the response nodes (thick lines going to the response nodes) suggest the quality of prediction of each dependent variable. The color of a synapse shows only the sign of the weight (red = negative weight, blue = positive weight), which is of little practical interest in complicated nets but may be of use in simple ones. Variable nodes are labelled with the column names if the appropriate checkbox was checked.
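The mapping from weights to line thickness and color can be sketched in Python as follows. This is an illustration only: the function names are hypothetical, the maximum line width is arbitrary, and summing the absolute weights leaving a predictor is an assumed way of condensing the visual impression into one number, not a quantity reported by QCExpert.

```python
def synapse_style(weight, max_abs_weight, max_width=6.0):
    """Line width proportional to |weight|; the color encodes only the sign."""
    width = max_width * abs(weight) / max_abs_weight if max_abs_weight > 0 else 0.0
    color = "red" if weight < 0 else "blue"
    return width, color

def predictor_significance(input_weights):
    """Sum of |weights| leaving each predictor (assumed summary measure).

    input_weights[j][i] is the weight linking predictor i to hidden neuron j.
    """
    n_inputs = len(input_weights[0])
    return [sum(abs(row[i]) for row in input_weights) for i in range(n_inputs)]

W = [[1.0, -0.2],    # weights into hidden neuron 1
     [-2.0, 0.1]]    # weights into hidden neuron 2
largest = max(abs(w) for row in W for w in row)
print([[synapse_style(w, largest) for w in row] for row in W])
print(predictor_significance(W))
```

In this made-up example the first predictor has much thicker synapses than the second, so it would be read as the more significant variable.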
Plot of the training (network optimization) process, which generally decreases the sum of squares of the differences between the predicted and the actual values of the dependent variables, with the number of iterations on the x-axis. If cross-validation is chosen, the prediction error is plotted as well (green). From the development of the maximum prediction error curve we can assess the quality of the model and the data.