
Types of Neural Networks


A neural network is a computational model that processes information in a way loosely modelled on the human brain. The most common types of neural networks explained here are:

  1. Recurrent Neural Networks
  2. Recursive Neural Networks
  3. Kohonen Self-organizing Networks
  4. Deep Belief Networks

RECURRENT NEURAL NETWORK:

  1. Introduction:

An early variant of this type of network was invented by John Hopfield and is called the Hopfield network. The term Recurrent Neural Network comes from the extra connections in the network that feed output values back to the input side. This allows data to keep flowing within the system, which helps the network work more efficiently on applications that require continuity. Applications such as weather prediction, stock prediction, and speech and image recognition benefit greatly from recurrent neural networks. This network type was developed in the 1980s, mainly to perform the role of a content-addressable memory system.

The main difference between a normal neuron and a recurrent neuron is the presence of three sets of weights instead of two.

 

[Figure: a recurrent neuron F(x) with an input connection, an output connection, and a feedback connection.]

 

Here there is a weight set on the input side, a weight set on the output side, and another weight set on the feedback connection.

  2. Network Architecture:

The general architecture of the Recurrent Neural Network can be used to explain how this specific network operates. The recurrent neural network works by reusing its hidden layer at every step: the same parameters are shared across all steps, unlike other networks where different parameters are used for each layer. This makes it well suited to sequential data[1].

[Figure: a recurrent network with inputs I1 and I2 feeding a hidden layer that has a recurrent (feedback) connection, which in turn produces the output O.]

The process is to feed the current input, together with the previous hidden value, into the hidden layer, whose result then goes to the output. The input layer transfers the values to the hidden layer, where a nonlinear function such as ReLU or tanh is applied, and at the output layer a softmax classifier is applied.
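As a rough illustration of this forward step, the sketch below (a minimal NumPy example; the matrix names W_xh, W_hh, W_hy and all layer sizes are made up for illustration) combines the current input with the previous hidden value through a tanh nonlinearity and applies a softmax at the output.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())          # subtract the max for numerical stability
    return e / e.sum()

def rnn_step(x_t, h_prev, W_xh, W_hh, W_hy, b_h, b_y):
    """One forward step of a simple recurrent cell."""
    h_t = np.tanh(W_xh @ x_t + W_hh @ h_prev + b_h)   # new hidden state
    y_t = softmax(W_hy @ h_t + b_y)                   # output distribution
    return h_t, y_t

# Example with made-up sizes: 4 input features, 3 hidden units, 2 output classes.
rng = np.random.default_rng(0)
W_xh, W_hh, W_hy = rng.normal(size=(3, 4)), rng.normal(size=(3, 3)), rng.normal(size=(2, 3))
b_h, b_y = np.zeros(3), np.zeros(2)

h = np.zeros(3)                       # initial hidden state
for x in rng.normal(size=(5, 4)):     # a sequence of 5 input vectors
    h, y = rnn_step(x, h, W_xh, W_hh, W_hy, b_h, b_y)
```

The same weight matrices are reused at every step of the loop, which is the parameter sharing described above.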

  3. Training the Network:

This type of network is trained using a slightly modified version of backpropagation called Backpropagation Through Time (BPTT). The gradient calculated at one step does not depend on the values from the current step alone, but also on those from previous steps. Training is done by separating the data into a training set and a test set; a loss function is defined and the results are validated on the test set.
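To make the idea that the gradient at one step also needs values from earlier steps concrete, here is a minimal sketch of BPTT for a tiny tanh network with a simplified squared-error loss placed directly on the hidden states (the names, sizes, and loss choice are illustrative assumptions, not the exact setup of the cited sources).

```python
import numpy as np

def bptt(xs, targets, W_xh, W_hh):
    """Forward pass of a tiny tanh RNN followed by Backpropagation Through Time.
    The loss here is squared error on the hidden state at every step; a real
    network would also back-propagate through an output layer."""
    # Forward pass: keep every hidden state, since BPTT needs them later.
    hs = [np.zeros(W_hh.shape[0])]
    for x in xs:
        hs.append(np.tanh(W_xh @ x + W_hh @ hs[-1]))
    # Backward pass: the gradient at step t uses values from step t and step t+1.
    dW_xh, dW_hh = np.zeros_like(W_xh), np.zeros_like(W_hh)
    carry = np.zeros(W_hh.shape[0])                 # gradient flowing back from later steps
    for t in reversed(range(len(xs))):
        dh = (hs[t + 1] - targets[t]) + carry       # local error plus error from the future
        delta = dh * (1.0 - hs[t + 1] ** 2)         # back through the tanh nonlinearity
        dW_xh += np.outer(delta, xs[t])
        dW_hh += np.outer(delta, hs[t])
        carry = W_hh.T @ delta                      # pass gradient to the previous step
    return dW_xh, dW_hh

rng = np.random.default_rng(0)
xs, targets = rng.normal(size=(5, 4)), rng.normal(size=(5, 3))
W_xh, W_hh = rng.normal(size=(3, 4)), rng.normal(size=(3, 3))
grads = bptt(xs, targets, W_xh, W_hh)
```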

Recurrent Neural Networks have been modified and used as various networks. Some of the popular types of Recurrent Neural Networks are:

  1. LSTM Networks
  2. Bidirectional Recurrent Neural Networks
  3. Hopfield Networks
  4. Gated Recurrent Unit
  5. Continuous Time Recurrent Neural Network

LSTM Networks:

  1. Introduction:

These are Long Short-Term Memory units, introduced in 1997 by Sepp Hochreiter and Jürgen Schmidhuber. This type of recurrent network was created mainly to solve the vanishing gradient problem and to help the network learn long-term dependencies. LSTMs can remember patterns over a long duration of time, which gives them their name.

  2. Network Architecture:

The architecture of an LSTM network consists of cells which act as memory blocks, and two important values are transferred from one cell to the next: the cell state and the hidden state. The data in each cell can be modified using three types of gates[2]. They are:

  1. Forget gate
  2. Input gate
  3. Output gate

Forget Gate: The forget gate is used to edit or delete information present in the cell state. This is done by multiplying the cell state by a filter, and it is necessary for improving the performance of the network. This gate has two inputs: h(t-1), the value of the previous hidden state, and x(t), the current input.

Input Gate: The input gate obtains new data and adds it to the existing data in the cell state. A sigmoid function regulates the values, a tanh function creates a candidate vector, and the regulated values are multiplied with that vector. The input gate also has two inputs: h(t-1), the previous hidden state, and x(t), the current input.


Output Gate: The output gate collects the information present in the cell state and presents it as output. A tanh function develops a vector from the cell state, and a filter built with a sigmoid function regulates which values are to be displayed. The final step multiplies the regulated values with the vector before displaying them. This gate also has two inputs: h(t-1), the previous hidden state, and x(t), the current input.
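To make the interaction of the three gates concrete, here is a minimal sketch of a single LSTM step (the parameter dictionaries W, U and b, and all sizes, are hypothetical): the forget gate scales the old cell state, the input gate regulates the tanh candidate vector, and the output gate filters the tanh of the new cell state.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W, U, b):
    """One LSTM step. W, U, b are dicts of per-gate parameters (keys 'f', 'i', 'o', 'g')."""
    f = sigmoid(W['f'] @ x_t + U['f'] @ h_prev + b['f'])     # forget gate
    i = sigmoid(W['i'] @ x_t + U['i'] @ h_prev + b['i'])     # input gate
    o = sigmoid(W['o'] @ x_t + U['o'] @ h_prev + b['o'])     # output gate
    g = np.tanh(W['g'] @ x_t + U['g'] @ h_prev + b['g'])     # candidate values
    c_t = f * c_prev + i * g          # forget part of the old memory, add regulated new memory
    h_t = o * np.tanh(c_t)            # expose a filtered view of the cell state
    return h_t, c_t

# Tiny usage example with made-up sizes: 4 inputs, 3 hidden units.
rng = np.random.default_rng(0)
W = {k: rng.normal(size=(3, 4)) for k in 'fiog'}
U = {k: rng.normal(size=(3, 3)) for k in 'fiog'}
b = {k: np.zeros(3) for k in 'fiog'}
h, c = np.zeros(3), np.zeros(3)
for x in rng.normal(size=(5, 4)):
    h, c = lstm_step(x, h, c, W, U, b)
```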

Bidirectional Recurrent Neural Networks:

  1. Introduction:

This is an extension of the recurrent neural network which connects information from two different states (the past and the future). This type of recurrent neural network was introduced in 1997 by Schuster and Paliwal, mainly to give the network access to information from future time steps. The network can be built by placing two recurrent neural networks on top of each other, one running forward in time and one running backward. There are also Deep Bidirectional Recurrent Neural Networks, which consist of multiple Bidirectional Recurrent Neural Networks stacked on top of each other[6].

The network is constructed so that one part of the neurons is responsible for the positive (forward) time phase and the other part is responsible for the negative (backward) time phase.

  2. Training the Network:

A Bidirectional recurrent neural network can be trained with the same methods used for ordinary Recurrent Neural Networks, since the neurons handling the opposite time phases are almost completely independent. Only in the case of Backpropagation Through Time is the method slightly modified, by updating the forward and backward layers separately.

 

 

[Figure: forward-phase and backward-phase hidden layers unrolled over Time(t-1), Time(t) and Time(t+1).]
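A minimal sketch of this idea, assuming a simple tanh recurrent step (the helper names run_rnn and bidirectional are invented here for illustration): one pass reads the sequence forward, a second pass reads it backward, and the two hidden states for each time step are concatenated.

```python
import numpy as np

def run_rnn(xs, W_xh, W_hh, b_h):
    """Run a simple tanh RNN over a sequence and return all hidden states."""
    h = np.zeros(W_hh.shape[0])
    states = []
    for x in xs:
        h = np.tanh(W_xh @ x + W_hh @ h + b_h)
        states.append(h)
    return states

def bidirectional(xs, fwd_params, bwd_params):
    """Concatenate hidden states from a forward pass and a reversed (backward) pass."""
    forward = run_rnn(xs, *fwd_params)
    backward = run_rnn(xs[::-1], *bwd_params)[::-1]   # realign to the original time order
    return [np.concatenate([f, b]) for f, b in zip(forward, backward)]

rng = np.random.default_rng(1)

def make_params():
    return rng.normal(size=(3, 4)), rng.normal(size=(3, 3)), np.zeros(3)

outputs = bidirectional(list(rng.normal(size=(5, 4))), make_params(), make_params())
```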

Hopfield Networks:

This network was invented by John Hopfield in 1982. It is a fully connected recurrent neural network with only one layer: each neuron in that layer is connected to every other neuron in the same layer, and every neuron has an activation value of either 1 or -1. This type of network can be updated either synchronously or asynchronously.

This network is trained using learning rules: the input patterns are taken and the weights are derived from them. The Hebbian learning rule is the most commonly used rule for training this network. Hopfield networks can be used for pattern recognition, image recognition, and medical image analysis.
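The sketch below illustrates, under simplifying assumptions, how Hebbian weight construction and asynchronous recall might look for a tiny Hopfield network storing ±1 patterns (the pattern sizes and the number of update sweeps are arbitrary choices).

```python
import numpy as np

def hebbian_weights(patterns):
    """Build Hopfield weights from ±1 patterns with the Hebbian rule (zero diagonal)."""
    n = patterns.shape[1]
    W = sum(np.outer(p, p) for p in patterns) / n
    np.fill_diagonal(W, 0)
    return W

def recall(W, state, sweeps=10):
    """Asynchronous update: flip one neuron at a time towards a stored pattern."""
    state = state.copy()
    for _ in range(sweeps):
        for i in np.random.permutation(len(state)):
            state[i] = 1 if W[i] @ state >= 0 else -1
    return state

patterns = np.array([[1, -1, 1, -1, 1, -1],
                     [1, 1, 1, -1, -1, -1]], dtype=float)
W = hebbian_weights(patterns)
noisy = np.array([1, -1, 1, -1, -1, -1])   # corrupted version of the first pattern
print(recall(W, noisy))                     # ideally converges back to the stored pattern
```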

Gated Recurrent Unit:

This is a type of recurrent neural network that is similar to Long Short-Term Memory units in the sense that gating is used, albeit in a slightly different way. The network has an update gate and a reset gate. The update gate regulates how much information is allowed to pass through, and the reset gate decides how much of the past data to discard. The Gated Recurrent Unit is a viable alternative to the LSTM when more efficiency is required at a slight cost in performance, because the number of gates is reduced from three to two.
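A minimal sketch of a single GRU step, analogous to the LSTM sketch above (the parameter names and sizes are again illustrative only).

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gru_step(x_t, h_prev, W, U, b):
    """One GRU step with an update gate (z) and a reset gate (r)."""
    z = sigmoid(W['z'] @ x_t + U['z'] @ h_prev + b['z'])             # update gate
    r = sigmoid(W['r'] @ x_t + U['r'] @ h_prev + b['r'])             # reset gate
    h_hat = np.tanh(W['h'] @ x_t + U['h'] @ (r * h_prev) + b['h'])   # candidate state
    return (1 - z) * h_prev + z * h_hat                              # blend old and candidate state

# Tiny usage example with made-up sizes: 4 inputs, 3 hidden units.
rng = np.random.default_rng(0)
W = {k: rng.normal(size=(3, 4)) for k in 'zrh'}
U = {k: rng.normal(size=(3, 3)) for k in 'zrh'}
b = {k: np.zeros(3) for k in 'zrh'}
h = np.zeros(3)
for x in rng.normal(size=(5, 4)):
    h = gru_step(x, h, W, U, b)
```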

Continuous-Time Recurrent Neural Network:

This network contains neurons, each connected to every other neuron in the network with a time delay, and the activation of each neuron evolves continuously over time. Such networks can be used efficiently to solve constraint-based problems and are capable of approximating dynamical systems.
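One common way to express such a network is as a system of differential equations integrated numerically; the following is a hedged sketch of a typical continuous-time recurrent update using simple Euler integration (the equation form, time constants, and sizes are standard textbook choices rather than details taken from this essay's references).

```python
import numpy as np

def ctrnn_euler_step(y, W, tau, theta, I, dt=0.01):
    """One Euler-integration step: each neuron's state y decays towards the
    weighted, squashed activity of every other neuron plus an external input I,
    at a rate set by its time constant tau."""
    dydt = (-y + W @ np.tanh(y + theta) + I) / tau
    return y + dt * dydt

rng = np.random.default_rng(6)
n = 4
y, W = np.zeros(n), rng.normal(size=(n, n))            # fully connected, including self-connections
tau, theta, I = np.ones(n), np.zeros(n), rng.normal(size=n)
for _ in range(1000):                                  # simulate the dynamics forward in time
    y = ctrnn_euler_step(y, W, tau, theta, I)
```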

Applications of Recurrent Neural Networks:

  1. Can be used in a wide variety of ways in the field of Natural Language Processing.
  2. Can be used for image recognition, image generation and analysis.
  3. Text generation, text recognition, music recognition and machine translation.
  4. Image captioning, sampling and stock prediction.

RECURSIVE NEURAL NETWORK:

  1. Introduction:

A recursive neural network is a hierarchy-based neural network with no involvement of time: the execution steps flow through the tree-like structure of the network rather than through time as in recurrent neural networks. This type of neural network was developed in the 1990s for the purpose of solving Natural Language Processing problems. Recursive networks can be used effectively to learn logical structures and to operate on trees[3].

  2. Network Architecture:

The network has a tree-like hierarchical structure in which the weights are shared among all the nodes. The network has parent nodes, and the nodes connected below them are child nodes that are processed in the same way as the parent nodes. This structure helps it solve hierarchical, structure-based problems very effectively.

[Figure: a tree-structured recursive network in which leaf nodes (Zleaf1, Zleaf2) are combined through branch nodes (zbranch, ybranch, vbranch, xbranch) across the first, second and third hierarchical states.]

This diagram illustrates the hierarchical structure handled by recursive neural networks. The network can be trained effectively using Backpropagation Through Structure, a variant of Backpropagation Through Time, the method used for training recurrent neural networks.
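As a rough sketch of the shared-weight composition described above (the dimensionality and the compose helper are assumptions for illustration), each parent vector is produced by applying the same weight matrix to the concatenation of its two children.

```python
import numpy as np

def compose(left, right, W, b):
    """Merge two child representations into a parent representation with shared weights."""
    return np.tanh(W @ np.concatenate([left, right]) + b)

# A tiny tree ((leaf1, leaf2), leaf3) with 4-dimensional node vectors.
d = 4
rng = np.random.default_rng(2)
W, b = rng.normal(size=(d, 2 * d)), np.zeros(d)
leaf1, leaf2, leaf3 = rng.normal(size=(3, d))
parent = compose(compose(leaf1, leaf2, W, b), leaf3, W, b)   # same W, b reused at every node
```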

  3. Training the Network:

Training involves applying Stochastic Gradient Descent to data drawn randomly from the training set, and multiple iterations are performed to fully optimize the objective function.

Applications of Recursive Neural Networks:

  1. Most commonly used for sentence parsing, image parsing and paraphrase detection.
  2. Can be used for image segmentation.
  3. Extensively used in NLP-related tasks.

DEEP BELIEF NETWORKS:

  1. Introduction:

A Deep Belief Network is a variation of the neural network in which the network is pretrained and mainly unsupervised learning is used. The network is a generative model with stochastic (random) variables, and the connections between its top layers are undirected. This type of network was developed by Geoffrey Hinton, Simon Osindero and Yee-Whye Teh in 2006 to improve performance on image recognition, image generation, motion capture and image retrieval.

  2. Network Architecture:

In the general architecture, these undirected layers are formed by Restricted Boltzmann Machines, and the stacked Restricted Boltzmann Machines communicate with both their next layer and their previous layer. The network can be said to contain two important parts:

  1. Belief Net
  2. Restricted Boltzmann Machine

The Belief Net contains stochastic binary units, and all the layers have weight sets. A bias and the weighted input decide whether the state of a unit is 0 or 1. This net addresses two major problems: the inference problem and the learning problem. The inference problem is the problem of identifying the states of the variables that have not been observed. The learning problem is the problem of estimating and adjusting the weights between the variables so that the observed values are reproduced as the result.
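The role of the bias and the weighted input can be sketched as follows for a single stochastic binary unit (the parent states, weights, and bias value are made up for illustration).

```python
import numpy as np

def unit_state(parent_states, weights, bias, rng=np.random.default_rng()):
    """Stochastic binary unit: the bias plus the weighted input from its parents
    sets the probability of the state being 1, and the state is then sampled."""
    p_on = 1.0 / (1.0 + np.exp(-(bias + weights @ parent_states)))
    return int(rng.random() < p_on)

state = unit_state(np.array([1, 0, 1]), np.array([0.5, -1.2, 0.8]), bias=-0.3)
```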

[Figure: a Restricted Boltzmann Machine with visible units V1, V2 and V3 connected to hidden units H1, H2, H3 and H4.]

The Restricted Boltzmann Machine is a stochastic neural network containing binary units; it can analyze and learn the probability distribution of its input values. A Restricted Boltzmann Machine contains a layer of visible units and a layer of hidden units, and a set of weights is assigned to the connections between the visible and hidden units. These RBMs can be trained using an algorithm called Contrastive Divergence. They have few layers, and there are no connections between units within a layer, which keeps learning fast. The energy function E is calculated by taking the negative summation over every visible unit, hidden unit and the weight connecting them:

E = − Σx Σy (vx · hy · wxy)
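A small sketch of this energy computation for the three-visible-unit, four-hidden-unit example shown in the diagram (the weight values are random placeholders).

```python
import numpy as np

def rbm_energy(v, h, W):
    """Energy of a (visible, hidden) configuration: the negative sum of
    v_x * h_y * w_xy over all connections (biases omitted, as in the text)."""
    return -float(v @ W @ h)

v = np.array([1, 0, 1])          # three visible units
h = np.array([0, 1, 1, 0])       # four hidden units
W = np.random.default_rng(3).normal(size=(3, 4))
print(rbm_energy(v, h, W))
```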

The diagram above shows the connections of a Restricted Boltzmann Machine with three visible units and four hidden units. A Deep Belief Network consists of Restricted Boltzmann Machines stacked on top of one another: the RBMs are used for the pre-training stage, and a feed-forward network is attached to them to perform the final fine-tuning.

[Figure: a typical Deep Belief Network built from three stacked RBMs, mapping the data X through hidden layers H1, H2 and H3.]

This is the architecture of a typical Deep Belief Network, where the RBMs are used initially and are then connected to a feed-forward network.

  3. Training the Network:

This type of network is trained using the contrastive divergence method, which applies Gibbs sampling inside the gradient descent process[4]. The method has four major steps (a sketch of a single update follows the list below):

  1. A training sample is taken, the hidden probabilities are computed, and a hidden vector is sampled from them.
  2. The positive and negative gradients are calculated.
  3. The weight matrix is updated.
  4. The biases are updated analogously.
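Here is a minimal sketch of one such update (CD-1) on a single binary training vector, with illustrative sizes and learning rate; the biases are updated with the same positive-minus-negative pattern as the weights.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def cd1_update(v0, W, b_h, b_v, lr=0.1, rng=np.random.default_rng()):
    """One CD-1 step on a single training vector v0."""
    # Step 1: hidden probabilities for the data vector, and a sampled hidden state.
    p_h0 = sigmoid(b_h + v0 @ W)
    h0 = (rng.random(p_h0.shape) < p_h0).astype(float)
    # One step of Gibbs sampling: reconstruct the visible units, then the hidden ones.
    p_v1 = sigmoid(b_v + h0 @ W.T)
    p_h1 = sigmoid(b_h + p_v1 @ W)
    # Step 2: positive and negative gradients (outer products).
    positive, negative = np.outer(v0, p_h0), np.outer(p_v1, p_h1)
    # Step 3: update the weight matrix.
    W += lr * (positive - negative)
    # Step 4: update the biases analogously.
    b_v += lr * (v0 - p_v1)
    b_h += lr * (p_h0 - p_h1)
    return W, b_h, b_v

rng = np.random.default_rng(5)
W = rng.normal(scale=0.1, size=(6, 4))      # 6 visible units, 4 hidden units
b_h, b_v = np.zeros(4), np.zeros(6)
for v0 in rng.integers(0, 2, size=(20, 6)).astype(float):   # binary training vectors
    W, b_h, b_v = cd1_update(v0, W, b_h, b_v)
```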

Each of the layers is trained as a Restricted Boltzmann Machine, one by one, in sequential order; this is called the greedy layer-wise training method. The method is effective since the modelling is split into two tasks: the first task is to learn the generative weights, and the second is to model the posterior distribution, which results in a more efficient procedure. Training can be carried out effectively using the MNIST database, a large collection of handwritten digit images widely used for training networks.

Applications of Deep Belief Networks:

  1. Face recognition
  2. Image recognition
  3. Handwriting recognition
  4. Motion capture analysis
  5. Speech recognition
  6. Acoustic modelling

KOHONEN SELF-ORGANIZING NETWORK:

  1. Introduction:

A Kohonen Network is also called a Kohonen Self-Organizing Map (KSOM). This type of network was developed to cluster related data together, and it was introduced by Dr. Teuvo Kohonen in the early 1980s. The type of learning followed here is called competitive learning, which maps the weights to the input information. These self-organizing maps can be built using either supervised or unsupervised learning[5].

[Figure: inputs X1 and X2 connected to output units Y1 and Y2.]

Unsupervised learning is more commonly used to analyze and interpret patterns in data that are not labelled. This process does not produce an error signal, and hence the output cannot be verified as accurate or not.

  2. Network Architecture:

This is the basic architecture of the Self-Organizing Map. The topology consists of a 1D or 2D array of output nodes which are connected to the input nodes, and the output nodes are arranged in an ordered lattice. Each node of the 1D or 2D lattice has a specific (x, y) coordinate and holds a weight vector.

  3. Training the Network:

The training algorithm involves a few steps which are,

  1. The weights and the learning rate are initialized.
  2. The current input pattern is presented.
  3. The winning output unit is found by calculating the squared Euclidean distance.
  4. The weights and the learning rate are updated.
  5. The neighbourhood topology is reduced, and a check is done to see whether the stopping condition has been reached.

The winning output unit j is the one with the smallest distance between its weight vector and the input vector, calculated as the summation

Dj = Σi (wij − xi)²

where xi is the i-th component of the input and wij is the weight connecting input i to output unit j.
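A compact sketch of one training step, assuming a small 2D lattice with a Gaussian neighbourhood (the lattice size, learning rate, and radius are arbitrary choices).

```python
import numpy as np

def som_step(x, W, lr, radius):
    """One training step: find the winning unit by squared Euclidean distance,
    then pull the winner and its lattice neighbours towards the input."""
    rows, cols, _ = W.shape
    d = ((W - x) ** 2).sum(axis=2)               # D_j for every output unit
    win = np.unravel_index(d.argmin(), d.shape)  # winning unit's lattice coordinate
    for r in range(rows):
        for c in range(cols):
            dist2 = (r - win[0]) ** 2 + (c - win[1]) ** 2
            influence = np.exp(-dist2 / (2 * radius ** 2))   # neighbourhood function
            W[r, c] += lr * influence * (x - W[r, c])
    return W

rng = np.random.default_rng(4)
W = rng.random((3, 3, 2))          # 3x3 output lattice, 2-dimensional inputs
for x in rng.random((100, 2)):     # present input patterns one at a time
    W = som_step(x, W, lr=0.1, radius=1.0)
```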

[Figure: inputs X1 and X2 connected through a weight matrix to a 2D output layer lattice of units Y1 to Y9.]

Applications of Kohonen Self-Organizing Network:

  1. Speech recognition and prediction
  2. Priority-based analysis
  3. Seismic analysis

REFERENCES:

[1] Gang Chen, "A Gentle Tutorial of Recurrent Neural Network with Error Backpropagation", arXiv, 14 January 2018.

[2] Abdelhadi Azzouni et al., "A Long Short-Term Memory Recurrent Neural Network Traffic Matrix Prediction", arXiv, 8 June 2017.

[3] Ozan Irsoy and Claire Cardie, "Deep Recursive Neural Networks for Compositionality in Language", NIPS, 2014.

[4] Marc'Aurelio Ranzato et al., "Sparse Feature Learning for Deep Belief Networks", NIPS, 2008.

[5] Teuvo Kohonen, "The Self-Organizing Map", IEEE, 1990.

[6] M. Schuster and K. K. Paliwal, "Bidirectional Recurrent Neural Networks", IEEE Transactions on Signal Processing, 1997.

 
