Abstract: This paper presents an algorithm to optimize artificial neural networks structure based on constructive method. And an improved BP algorithm, LMBP algorithm is introduced about simplest BP neural network for function approximation .By rules of error changing based on quadratic error and gradient reducing, analyzing the optimization of the network's structure and adding hidden neurons or adding network layers one by one adaptively, as a result a proper structure of the network is got. Simulation experiments are provided to compare the approach with RAN algorithm for solving function approximation. The results show the efficient of this algorithm.
Key words: BP neural network; structure optimizing; LMBP algorithm; Resource Allocation Network (RAN) algorithm
At present, the theoretical and application research of neural network have been well developed, and the neural network has been infiltrated into almost all the engineering application field. However, there is no good way to solve the configuration of neural network design systematically. The network configuration design problem is so important. If there are too many nodes in neural network, the framework is bulkiness, and the generalization ability is decreased, which will result in the difficulty in simulation. On the other hand, if there is few nodes in the network and a good model that can solve all the questions can't be formed, which causes non-convergence of the model. Usually, this type of research has many shortcomings .
Many scholars have brought out some creative research in optimizing the neural network structure, such as the Tiling algorithm, Upting algorithm, CC algorithm etc. The Tiling algorithm cannot extend to real number mapping; the Upting algorithm needs the size of the sample in advance; and the CC algorithm leads to over-simulation. Besides these, Pruning algorithm can be used, but it increases the calculation. The algorithm based on evolution has the problem of abstract coding and hard to explain. In practical, Trail and error method is applied. It is hard to set a proper network structure if sense or experience is the only consideration.
This paper presents a new way to optimize neural network using function approximation, and it is fundamentally an incremental method. Compared to Resources Allocation Network (RAN) brought by Platt, they are ameliorated BP algorithm and they assume a simplest network in advance. The differences between them are: RAN allocates new calculation node when the "un-modeling" sample is discovered and this way of serial study may allocate hidden node to "bad" samples. The algorithm presented in this paper is fundamentally a collateral algorithm. The LMBP algorithm is introduced at first; the judgment of study error is ameliorated; and the ability of the network is checked dynamically. If some generation has gone the errors cannot reach the predetermined requirements and they do not change when tracking the weight and threshold value, this means the structure of the network does not satisfy the problem and the quantity of the hidden units should increase. If it still not satisfies the problem, then the layer of the network should increase. The simulation experiment indicates that this algorithm is efficient and stable and can be popularized. The comparative experiment also indicates the network optimized by this algorithm is better than RAN does.
2. Theory and algorithm of network structure modification in the function-approximation neural network
To a 3-layer BP network, the following universal approximation theory has been proved by a lot of scholars: a 3-layer BP network which contains a hidden unit can approximate any continuous functions in a bounded area if there are sufficient hidden nodes. Chen Tianping from China points out: the boundness of neural function (not the continuity) is the adequate condition that ensures random approximation. Although universal approximation indicates that the neural network can realize random approximation with only on hidden unit, this does not mean that it is the most reasonable for a 3-layer network structure. As a matter of fact, to the same target function, it is said that a 4-layer neural network uses less nerve cell than the 3-layer network uses.
The BP algorithm is the most widely applied in neural network algorithm, but it is based on steepest descent method which leads to the inherent shortcomings: it is easy to fall into partial minimization, the convergences speed is slow and it causes surge effect. In order to get over the shortcomings mentioned above, an ameliorated BP algorithm-LMBP algorithm is used. In the optimization theory, Levenberg-Marquart algorithm is a very effective method. It lies in the middle of Newton algorithm and the steepest descent algorithm from the convergence speed and astringency's point of view. The related information of LM algorithm is listed in bibliography  and .
2.1 The amelioration to LMBP algorithm in network structure optimization.
In LMBP algorithm, the learning step size is the key to the effect of the algorithm. When increases to a certain limit, the learning error will not decrease and the weight vector will not change, it is meaningless to continuously learning, which is the exact time to adjust network structure. This changing rule makes LM algorithm useful in neural network structure optimization.
In the research, the input vector is one-dimension, the output vector is also one-dimension. The expected output , and the actual output vector is . If the target function is:
( 1 )
In this equation, is the number of sample, and is the error vector between expected output and actual output. moves towards.
The Z element is:
In the LM algorithm, the weight modification formula  can be get:
At first the defined network is 1-1, and the weight vector is:
In order to simplify the calculation, the threshold is treated like weight, and the size of the neural network embodies to the weight vector.
In the iteration process in formula 2, the step sizeis changing continuously. The traditional method is used. Let's set first, after using formula (2) the error either increases or decreases. The first situation: if the error decreases, then should be kept. This process is iterated untildecreases to 1/10. The other situation: if the error increases, then is kept, and ten times, and is calculated according to formula (2). When the later situation happens frequently, increases continuously and is not updated. By this time, the error does not decrease after iteration. If this happens to certain algebra, the neural network structure should be modified: to add new nerve cells in the hidden unit. The quantity of increased weight vector dimension depends on the quantity of nerve cells before and after the hidden unit. Suppose the quantity of the nerve cells before the hidden unit is m, and that after the hidden unit is l, then the quantity of increased weight vector should be m+l+1 (1 stands for the threshold value). At the very first, we only defined two layers and now one more layer is added, so the weight vector of diversification of network size 1-1-1 changes to:
Actually, the hidden layer and hidden nodes decide the size of the network, and the size of the network is related to its capability. So the bigger the size of neural network is, the more free parameters there will be in the network, vice versa. The number of free parameters in the network can be increased by enlarging the size of hidden nerve cells, which is propitious to approximate the target function. In the following learning process, we will re-evaluate a new weight vector and put it into formula (2). If the error could not reach the anticipation and the power could not been modified after some generations, we could modify the network scale by modifying power value vector. In one word, suppose the size of neural network is:
j=i+1, k=j+1, if we add one nerve cell in layer j, the weight vector will be:
We can set an upper limit to the quantity of hidden nerve cells, and when some hidden layer nerve cells grow to its upper limit, a new hidden layer is added and formula (3) is used only changing the subscript. When the neural network reaches a proper size and the error declines quickly, the expected requirements will satisfied soon. By this time the neural network structure is the appropriate one.
2.2 Use the ameliorated LMBP algorithm to optimize network structure
The ameliorated LMBP algorithm is used in optimizing network structure. In the research process, a 4-layer network is mostly used. The algorithm is the following:
A smallest network is provided: 1-1, namely one input and one output, there is no hidden unit;
Train the network using the ameliorated LM algorithm, export the simulated function and track the mean variance of expected output and actual output;
If the mean variance meets the requirements, the algorithm goes to end. If not, switch to step (4);
If the network structure reaches its upper limit, switch to step (5), or else changes the structure of formula (3)ï¼š
If it is a 2-layer structure, then changes to 3-layer;
If the network reaches its 3-layer upper limit, then changes to 4-layer, or else, adds nerve cells in the first hidden unit;
If it is a 4-layer network, then adds nerve cells in the second hidden unit and keeps the other structure stable;
Switch to step (2)
The algorithm ends and shows the result.
3. Simulation and analysis in the lab
In order to verify the validity of this algorithm, we carried out a simulation experiment in a MTLAB platform. In the following experiment, the learning samples are random distributed and all the test samples obey uniform distribution. The mean variance of expected output and actual output-test error substitutes the generalized error. The experiment includes three aspects:
Experiment 1 verifies the validity of the optimization algorithm. The output of time, precision and generalization capability are included. The object of the experiment is the Hermit multinomial:
In the training process, we assume the training sample and the test sample is 100, and the required precision of the algorithm is <=0.01. Table 1 and Figure 1to 3 show the result. In Figure 1, the light blue curve stands for the expected curve of the sample, while the red curve stands for the actual output curve of test sample after training. Figure 2 shows the movement of error in the network (corresponds to the glossy sects). We need to adjust the network structure until the error meets the requirements. Figure 3 is the changing curve of test error. In Table 1, the network structure 1-5-1 stands for the five nerve cells in the first hidden unit in a 3-layer neural network with no second hidden unit, and the description of the other neural network is the same. From these results, this algorithm is valid and stable.
Table 1: output results of Hermit Function using ameliorated LM algorithm
Time of experiment
Network structure after training
Figure 1 Target curve and neural network output curve after learning Figure 2 Training error curve in the learning process
Figure 3 Test error curve in the learning process
Experiment 2 compares with the problem of how a 3-layer RAN deals with the Hermit multinomial. In the MATLAB platform, we export the curve of how a 3-layer RAN deals with the approximation of Hermit multinomial. Figure 4 to Figure 6 indicate the results of RAN algorithm training while the ameliorated LMBP algorithm results are showed in Figure 1 to Figure 3. RAN algorithm has 400 training samples and 101 test samples. After learning, the hidden unit is 31 and the test error is 1.5488 and 400eras are trained.
Figure 4 the target curve and neural network output curve after learning Figure 5 the movement of hidden units in the learning process
Figure 6 the movement of test error in the learning process
From the parameters and the output figures we can notify that the number of hidden units is much bigger than that of 4 in Experiment 1 which uses LMBP algorithm to test error. The size of the network is much bigger as well. The test error which represents generalization capability is 1.5488; this is much bigger than the mean test error in Experiment 1.
Experiment 3 tests the generalization capability of network structure optimization algorithm. It is proved that the algorithm can train proper network structure for different functions. We choose three different functions to simulate and Table 2 gives the results. This experiment shows that the algorithm can train proper network structure for any functions.
Table 2 the results of different functions using ameliorated LMBP algorithm
Structure after network training
f = x3
f = x*sin(10*x)*cos(x)
This paper uses the BP network which is theoretically mature to find out a useful way to optimize neural network effectively. The paper starts from function approximation and uses the ameliorated BP algorithm-LMBP algorithm, and then uses the rule of error movement to ameliorate the algorithm to optimize network structure dynamically. The experiments and comparison of the experiments indicate that this algorithm is effective, stable and generalized. The network structure trained by this algorithm is much simpler, more capable of generalization, and more astringent. In our research, we also find that the ameliorated LMBP algorithm can optimize different size of the network for different functions. The future study will be focused on the complexity of the function that affects the network structure.