Network Intrusion Detection Using Machine Learning Computer Science Essay


In this cyber era, the internet has become a vital means of communication in almost every profession. With the increased use of network technology, its security has become a critical issue, as computers in organizations contain highly confidential information and sensitive data. The technique used to monitor network security is known as network anomaly detection: it detects network intrusions by monitoring the behavior of network traffic and checking whether it is normal or abnormal. Different techniques are used to implement network intrusion detection systems.

This work trains two different machine learning techniques, one supervised and one unsupervised, for network intrusion detection: Naïve Bayes classification (supervised learning) and Self Organizing Maps (unsupervised learning). The KDD Cup 99 dataset is used for the intrusion detection problem. As the KDD Cup 99 dataset contains symbolic as well as numeric attributes, two symbolic conversion methods are applied to the symbolic attributes: conditional probability conversion and indicator variable conversion. The two machine learning techniques are trained on both types of converted dataset, and their intrusion detection accuracy is then compared.

Keywords:

Table of Contents

Undertaking

Acknowledgements

List of figures

List of tables

Chapter 1: Introduction

1.1. Overview

1.2. Network Intrusion

1.3. Network Intrusion Detection

1.4. Approach

1.5. Objectives

1.6. Organization of the Thesis

Chapter 2: Machine Learning Approaches

2.1. Naive Bayes

2.2. Self Organizing Maps

2.2.1. Learning Rule for SOM

Chapter 3: KDD Cup 1999 Dataset

3.1. Introduction

3.2. Classification of Features

3.2.1. Basic Features

3.2.2. Content Features

3.2.3. Traffic Features

3.3. Symbolic features

3.3.1. Protocol features

3.3.2. Service features

3.3.3. Flag features

3.4. Numeric features

Chapter 4: Implementation

4.1. Collection of the dataset

4.2. Data Processing

4.2.1. Conditional probability

4.2.2. Indicator Variables

4.3. Training and Testing Phase

4.3.1. Self Organizing Map

4.3.2. Naïve Bayes

4.4. Performance Evaluation

Chapter 5: Experimental Results and Discussion

Chapter 6: Conclusion

Abbreviations

Annexure

List of figures

Fig 2.1: Structure of SOM

Fig 2.2: SOM Learning Example

List of tables

Table 3.1: Symbolic Features

Table 3.2: Flag Features

Table 4.1: Sample Dataset

Table 4.2: Converted Dataset

Table 4.3: Sample Dataset

Table 4.4: Converted Dataset

Chapter 1: Introduction

This chapter describes network intrusions and their harmful effects on the security of organizational information. It also describes the significance and necessity of Network Intrusion Detection Systems in addressing the problem of network intrusions, and explains the approach taken to implement a Network Intrusion Detection System.

Overview

In this cyber era, the internet has become a vital means of communication in almost every profession, offering an efficient and cheap method of communication in every important field of life. With the increased use of network technology, its security has become a critical issue, as computers in organizations contain highly confidential information and sensitive data. However, the Internet Protocol (IP), on which the whole internet is based, is insecure and vulnerable to viruses and hackers. Data confidentiality and security are among the most important issues for almost all organizations, especially in sensitive fields such as the military, avionics, and nuclear power centers. Every day, companies face new and unidentified security threats from various kinds of network intruders and hackers. Network security has therefore become one of the biggest challenges for highly sensitive organizations.

Network Intrusion

A network intrusion is a suspicious and sudden deviation from the normal behavior of the network. An intrusion threatens the confidentiality, integrity, and availability of a network system, and can be defined as "Any set of actions that attempt to compromise the integrity, confidentiality or availability of information resources" [1]. Network intrusion covers different kinds of network attacks, data loss, worms, unacceptable use of policies, and abnormalities in the usual behavior of network traffic. The four major types of network attacks are:

Remote to User attacks (R2L): An attack in which the privileges of a user on a remote computer are exploited. The attacker does not have an account on the target machine but tries to gain unauthorized access from a remote machine, for example by guessing passwords [2]. Examples are xlock, guess_password, phf, sendmail, xsnoop, etc. [6].

Denial of Service (DoS): An attack in which the intruder makes memory or computing resources too busy to respond to legitimate requests, so that the network cannot serve its users. The users are thus denied the services of their systems. Examples are apache, smurf, neptune, back, mailbomb, udpstorm, etc. [6].

User to Root Attacks (U2R): An attack in which the attacker attempts to gain the privileges of the administrative (root) user via a local user account. The attacker already has a local user account, obtained either legally or by illegal means, and then tries to gain administrative access to the machine [4]. One example is a buffer overflow in any form [2]; other examples are perl and xterm [6].

Probing: An attack in which the attacker first identifies the weaknesses of a system so that they can be exploited later. This is done by gathering information about the hosts and the network of computers; its purpose is to exploit security weaknesses. Examples are saint, portsweep, mscan, nmap, etc. [6].

Network Intrusion Detection

A Network Intrusion Detection System inspects incoming network traffic and identifies suspicious activity on the network. Network intrusion detection is the most important and most widely used network security technique: it identifies attacks by monitoring network behavior and then taking necessary actions against them. Network intrusion detection systems are classified as host based or network based, depending on the data sources used [2]. Host-based systems use the records and data maintained by the operating system; with these records, the system can monitor things like system logs, user account records, and file systems [5]. Network-based intrusion detection systems use network traffic as their data source and monitor the network traffic [2].

In order to determine the behavior of network traffic, the Network Intrusion Detection System must know the rules for normal activity on the network, so that suspicious behavior can later be compared against them to identify an anomaly. Network Intrusion Detection Systems can be divided into two categories based on the detection technique; these categories are given below.

Misuse detection: In a misuse detection system, the information obtained from the network is compared with a database of attack signatures. A signature is a set of rules that defines a network attack. Misuse detection techniques are mostly used for commercial purposes or in industries operating network systems.

Anomaly detection: In an anomaly detection system, the normal behavior of the network is defined first, and the network is then monitored by comparing traffic against this definition. Behavior that deviates from the normal behavior is marked as an attack. Anomaly detection techniques are mostly used in research and academia for implementing network intrusion detection systems, due to their theoretical ability to address different kinds of attacks [3].

Approach

The basic idea in this thesis is to use two different machine learning techniques, one supervised and one unsupervised, to analyze network behavior and identify suspicious attacks. For supervised learning, the Naïve Bayes algorithm is trained on the dataset, and for unsupervised learning, a Self Organizing Map is trained on it. By using two different techniques, the efficiency and accuracy of each technique in intrusion detection can be identified.

The dataset chosen is the KDD Cup 1999 dataset, as it was specifically extracted for intrusion detection problems. It was derived from the 1998 DARPA Intrusion Detection Evaluation Program, prepared and managed by MIT Lincoln Labs, and contains many connection records with symbolic as well as numeric attributes. A newer version of the dataset, NSL-KDD, is used for the implementation of the intrusion detection systems; it makes many improvements over the KDD Cup 1999 dataset [7]. Some numeric features span large ranges, and the symbolic features must be converted to numeric form before they can be used by machine learning techniques. The dataset is therefore preprocessed: large numeric ranges are scaled down to a smaller range, and symbolic features are converted to numeric features using symbolic conversion techniques. Two different symbolic conversion methods are used in this work: the conditional probability and indicator variable conversion methods. After training, the effect of each symbolic conversion, combined with each machine learning algorithm, is compared on the test dataset.

Objectives

The main objectives of this research work can be summarized as follows

Collection of the Dataset

Partitioning the dataset into training and test data

Preprocessing the dataset

Training two different machine learning techniques (supervised and unsupervised) on the dataset

Applying the trained machine learning methodologies on the test data

Comparing the results of the two techniques

Organization of the Thesis

The rest of the thesis is organized into five chapters. Chapter 2 describes the machine learning techniques. Chapter 3 discusses the chosen dataset. Chapter 4 presents the implementation details, including the symbolic conversion techniques for the dataset. Chapter 5 discusses the results achieved, and Chapter 6 presents the conclusion and future work.

Chapter 2: Machine Learning Approaches

This chapter discusses the theoretical details of the machine learning techniques chosen to implement the Network Intrusion Detection System. It describes two machine learning techniques, one supervised and the other unsupervised.

Naive Bayes

Naïve Bayes classification is a supervised learning technique. In supervised learning, the aim is to train a system to map inputs to outputs, given that the correct output values are provided by a supervisor [9]. The Naïve Bayes classifier is based on Bayesian classification. Bayes' rule calculates the posterior probability P(C|x) from the likelihood P(x|C), the prior P(C), and the evidence P(x) as follows:

P(C|x) = P(x|C) P(C) / P(x)   [9]

Where C is the class and x is the input.

The Naïve Bayes classifier is a Bayesian network based on the assumption that all the input attributes are conditionally independent given the target value [10]. Given a series of n attributes, the Naïve Bayes classifier makes 2n! independence assumptions [6]. It reduces a multivariate problem to a group of univariate problems. In Naïve Bayes, a new instance is described by a tuple of n attribute values (a1, a2, …, an), where n is the dimension of the input instance.

vMAP = argmax (vj ∈ V) P(vj | a1, a2, …, an)   [10]

vNB = argmax (vj ∈ V) P(vj) ∏i P(ai | vj)   [10]

where vnb denotes the output value generated by the Naïve Bayes classifier and vj is a target value that the new instance can take from the set V [2]. P(vj) is the prior probability of the target value, and P(ai|vj) is the conditional probability that a particular feature f has attribute ai given the target value vj. In our case, V contains the two target values normal and attack.

During training, the Naïve Bayes classifier requires only one scan of the connection vectors (input instances); it does not need to be trained over multiple iterations.

The Naive Bayes classifier is designed for use when features are independent of one another within each class, but it appears to work well in practice even when that independence assumption is not valid. It classifies data in two steps:

Training step: Using the training samples, the method estimates the parameters of a probability distribution, assuming features are conditionally independent given the class.

Prediction step: For any unseen test sample, the method computes the posterior probability of that sample belonging to each class, then classifies the test sample according to the largest posterior probability.

The class-conditional independence assumption greatly simplifies the training step, since the one-dimensional class-conditional density can be estimated for each feature individually. While class-conditional independence between features does not hold in general, research shows that this optimistic assumption works well in practice. It allows the Naive Bayes classifier to estimate the parameters required for accurate classification while using less training data than many other classifiers, which makes it particularly effective for datasets containing many predictors or features.

In the testing phase, the conditional probabilities found during training are applied to each new input instance. The Naïve Bayes formula is applied to the attributes of the input to find the output value vnb for both the attack class and the normal class. The input is labeled as attack or normal depending on which class yields the maximum vnb.

Consider the example of deciding whether one should play tennis under certain weather conditions [10]. Four features are used to decide: outlook, temperature, humidity, and wind. We must predict the output yes or no for the target value playtennis. To do this, we create a lookup table for each feature, with one row for each attribute of that feature. Each lookup table has three columns: one lists the attribute, and the other two contain the probability of that attribute given playtennis equals yes, P(a = a1|playtennis = yes), and given playtennis equals no, P(a = a1|playtennis = no), where a is the feature and a1 is the particular attribute in that row. Once the Naïve Bayes classifier has been trained on a dataset, the lookup tables are used in the testing phase: when a new input instance arrives, the probabilities from the lookup tables are substituted into the Naïve Bayes equations.
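The lookup-table training and argmax prediction described above can be sketched in Python. This is an illustrative toy, not the thesis's MATLAB implementation; the weather rows below are invented for demonstration, and no probability smoothing is applied.

```python
from collections import Counter, defaultdict

# Toy training rows for the playtennis example:
# (outlook, temperature, humidity, wind) -> playtennis.
# These rows are illustrative, not the full table from [10].
data = [
    (("sunny", "hot", "high", "weak"), "no"),
    (("sunny", "hot", "high", "strong"), "no"),
    (("overcast", "hot", "high", "weak"), "yes"),
    (("rain", "mild", "high", "weak"), "yes"),
    (("rain", "cool", "normal", "weak"), "yes"),
    (("rain", "cool", "normal", "strong"), "no"),
    (("overcast", "cool", "normal", "strong"), "yes"),
]

# Training step: one scan of the data builds the class priors P(v)
# and per-feature lookup tables holding counts for P(a | v).
priors = Counter(label for _, label in data)
tables = defaultdict(Counter)
for features, label in data:
    for i, a in enumerate(features):
        tables[(i, label)][a] += 1

def p_attr_given_class(i, a, label):
    # P(feature i has attribute a | class = label)
    return tables[(i, label)][a] / priors[label]

# Prediction step: v_nb = argmax_v P(v) * prod_i P(a_i | v)
def classify(features):
    total = sum(priors.values())
    best, best_score = None, -1.0
    for label in priors:
        score = priors[label] / total
        for i, a in enumerate(features):
            score *= p_attr_given_class(i, a, label)
        if score > best_score:
            best, best_score = label, score
    return best

print(classify(("sunny", "cool", "high", "strong")))
```

Only counting is needed to train; no search through a hypothesis space takes place, which matches the observation below about Naïve Bayes.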

Naïve Bayes is a highly practical learning method. Its performance is comparable to neural networks and decision trees in many domains, and Naïve Bayes methods are widely used in text classification problems. An interesting difference between Naïve Bayes and neural network techniques is that Naïve Bayes needs no explicit search in the space of hypotheses [10]; the classifier is formed simply by counting the frequencies of various data combinations in the training dataset.

Self Organizing Maps

The Self Organizing Map (SOM) is a data analysis and visualization technique in machine learning proposed by Professor Kohonen (1990, 1995). The idea of the SOM is to reduce high-dimensional data to 1 or 2 dimensions so that it can be easily understood and visualized by humans; the data is converted into a graphical representation in which its features can easily be identified. A Self Organizing Map is a self-learning neural network: its neurons compete over which represents the data better, and the winner is the neuron that best matches the input. It is unsupervised learning because little or no human intervention is required during the learning process; there is no supervisor providing correct output values, only input data. A self organizing map creates a 1- or 2-dimensional map, clusters similar items together on the map, and arranges the resulting clusters on a grid. It is a competitive network whose goal is to transform an input data set of arbitrary dimension into a 1- or 2-dimensional topological map [11]. By competitive we mean that the neurons in the SOM most similar to the input are adjusted to match the input more closely. The resulting map displays the similarities of items or concepts.

Fig 2.1: Structure of SOM [8]

The SOM is basically a single-layer feed-forward network in which each input attribute at the input layer is connected to all the units of the output layer. The output layer is usually a 2-dimensional grid of output units; the input layer has more dimensions than the output layer. Each neuron of the output layer is represented by a weight vector whose dimension equals that of the input instance. Each neuron is connected to adjacent neurons by a neighborhood relation, which defines the structure of the map.

Learning Rule for SOM

The learning phase of the SOM starts by initializing the weight vectors with random values. After initialization, the following steps are taken to train the SOM.

Choose a vector x from the training data in cyclic order and present it to the map.

Distance Measure: Calculate the distance of the input vector to every neuron of the SOM to find the best matching unit. The commonly used distance is the Euclidean distance, given as

dij = sqrt( Σ k=1..n (xk − wij,k)² )

where dij is the distance of the n-dimensional input x to the neuron wij of the output layer, and i and j are the coordinates of the weight vector on the map. The SOM neuron with the minimum distance to the input is designated the winner neuron d(k1,k2), where k1 and k2 are the indices of the winner neuron.

Update Rule: Once the best matching unit has been found, the next step is to update the winner neuron and its neighbors to be more like the input neuron. The update rule is given as

wij(t+1) = wij(t) + α(t) h(ρ,t) (x(t) − wij(t))

where α(t) is a learning-rate function that decreases with time, and h(ρ,t) is the neighborhood function, given (in its common Gaussian form) as

h(ρ,t) = exp(−ρ² / (2σ²))

Here ρ is the Euclidean distance of the neighborhood neuron from the winner neuron, and σ is the radius of the neighborhood.

Repeat from step 1 for m epochs with k inputs each; these steps are repeated until the training comes to an end.
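The training loop above can be sketched in Python. This is an illustrative toy, not the thesis's MATLAB code: the map size, the fixed α and σ, and the Gaussian neighborhood are simplifying assumptions, since in practice α(t) and σ decrease over time.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative parameters (assumed, not the thesis's settings):
# a 4x4 map with 1-dimensional inputs, as in the example below.
grid, dim = 4, 1
epochs, sigma, alpha = 15, 1.0, 0.5

# Step 0: initialize the codebook of neuron weight vectors randomly.
codebook = rng.uniform(0, 16, size=(grid, grid, dim))

def train(codebook, inputs):
    for t in range(epochs):
        for x in inputs:  # step 1: present each training vector
            # Step 2: Euclidean distance of x to every neuron,
            # then pick the winner (best matching unit).
            d = np.linalg.norm(codebook - x, axis=2)
            k1, k2 = np.unravel_index(np.argmin(d), d.shape)
            # Step 3: move the winner and its neighbors toward x,
            # weighted by a Gaussian neighborhood h(rho, t).
            for i in range(grid):
                for j in range(grid):
                    rho = np.hypot(i - k1, j - k2)  # grid distance to winner
                    h = np.exp(-rho**2 / (2 * sigma**2))
                    codebook[i, j] += alpha * h * (x - codebook[i, j])
    return codebook

trained = train(codebook, [np.array([6.4])])
print(abs(trained - 6.4).min())  # the winner ends up very close to 6.4
```

After repeated presentations of the same input, the winning neuron converges toward the input value, which is exactly the behavior used in the worked example below.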

The basic SOM algorithm can be demonstrated by a simple example [2]. Consider a simple 4-by-4 map with one-dimensional inputs. An input x = 6.4 is sent to the map, with σ = 1 and α(t) = 0.5. After calculating the Euclidean distance of the input to each neuron of the SOM, we find that the weight vector with value 6 is the winner neuron. The winner neuron and its 4 neighbors are then updated using the update rule. The figure below graphically demonstrates the updating process.

[Figure: 4-by-4 grids of neuron weight values before and after the update]

Fig 2.2: SOM Learning Example

The Self Organizing Map is one of the most popular neural network techniques and is widely used in a variety of applications, many of which involve data analysis. Many commercial network intrusion detection applications also use the SOM algorithm to detect attack connections in network packets. A SOM can also be used as a preprocessor for a supervised learning algorithm [9].

Chapter 3: KDD Cup 1999 Dataset

This chapter describes the KDD Cup 1999 dataset, which has been chosen for training and testing in our NIDS. It was selected because it is widely used in network intrusion detection research.

Introduction

The KDD Cup 1999 dataset is based upon the version of a dataset used by the 1998 DARPA Intrusion Detection Evaluation Program, which consisted of simulated intrusions in a military network environment. The KDD'99 dataset consists of a large number of connection vectors, estimated at above 500,000 connections. The connection is defined by [] as follows:

A connection is a sequence of TCP packets starting and ending at some well defined times, between which data flows to and from a source IP address to a target IP address. [Task Description]

Each connection has 41 features and is classified either as normal or as an attack connection. The attack types can be categorized into the 4 classes discussed in Chapter 1:

DOS

R2L

U2R

Probing

The training data contains 24 attack types, and the test data contains an additional 14 attack types. Each record of the dataset contains 41 features:

34 features are numerical.

7 features are symbolic.

The features can be categorized into 3 classes:

Basic Features

Content Features

Traffic Features

Classification of Features

The features of connection records in KDD Cup 1999 dataset are categorized into three main categories which are discussed below.

Basic Features

This group contains the features that can be extracted from a TCP/IP connection [3]. Examples include the duration of the connection, the protocol type (e.g., tcp), the service of the connection, and the amount of data transferred.

Content Features

These features are extracted from the payload of the connections [2]. They enable one to identify suspicious behavior in a connection, for example the number of failed login attempts, or whether root access was obtained.

Traffic Features

These features are based on a window interval and are divided into two categories which are given as follows

Time Based Traffic features

These include only connections from the past 2 seconds that have the same service as the current connection [3].

Host Based Traffic features

These features are derived from the past 100 connections. They are used for catching attacks that span longer than 2 seconds [2].

Symbolic features

The KDD'99 dataset has 7 symbolic features, of which 4 are binary and the other 3 have more than 2 attributes. We consider the latter three features for symbolic conversion in this work. The symbolic features are listed in the table below, along with their number of attributes.

Table 3.1: Symbolic Features

No   Feature Name     Number of attributes
1    Protocol Type    3
2    Service          70
3    Flag             11
4    Land             2
5    Logged in        2
6    is_host_login    2
7    is_guest_login   2

Protocol features

The protocol feature describes the protocol type of the connection. There are 3 protocol types, represented symbolically in the KDD'99 dataset as follows:

TCP

UDP

ICMP

Service features

The service feature describes the different types of services available for connections to use. There are 70 services. Each of the 70 services has been grouped into one of eight clusters depending on its utilization by TCP port [1]:

Services that are used to remotely access other machines.

File transfer services e.g., ftp.

Mail transfer services, e.g., smtp.

Web Services like web server, http.

Services used to obtain statistics of the system.

Name servers services.

Services for other protocols, like ICMP.

Flag features

The flag feature describes the status of the connection. There are 13 flags, but 11 of them are used in the KDD'99 dataset. These flags are further clustered into 6 groups, described in the table below.

Table 3.2: Flag Features [1]

Cluster   Name     Description
F1        S0       Connection attempt seen, no reply
          REJ      Connection attempt rejected
F2        S1       Connection established but not terminated
          SF       Normal establishment and termination
          OTH      No SYN seen, just midstream traffic
F3        S2       Connection established and close attempt seen by originator
          RSTO     Connection established, originator aborted
F4        S3       Connection established and close attempt seen by responder
          RSTR     Connection established, responder aborted
F5        RSTOS0   Originator sent a SYN followed by a RST, SYN ACK not seen by the responder
          SH       Originator sent a SYN followed by a FIN, SYN ACK not seen by the responder
F6        RSTRH    Responder sent a SYN ACK followed by a RST, SYN not seen by the originator
          SHR      Responder sent a SYN ACK followed by a FIN, SYN not seen by the originator

These are the symbolic features included in the KDD'99 dataset. The other 4 symbolic features are binary, taking the values 0 and 1; they are listed in Table 3.1 above. The three symbolic features protocol, service, and flag are converted to numeric features using symbolic conversion methods.

Numeric features

The KDD'99 dataset has 34 numeric features with different ranges.

Chapter 4: Implementation

This chapter describes the implementation of Network Intrusion Detection System using two machine learning techniques on the dataset that is preprocessed using two different symbolic conversion methods.

A brief description of the steps that would be followed is given below,

Collection of the training and test dataset from NSL-KDD [7] dataset.

Conversion of ASCII values to numeric values to be loaded in MATLAB 7.0.

Implementation of two different dataset preprocessing algorithms.

Implementation of Machine Learning Algorithms.

Training the Machine Learning Algorithms on preprocessed dataset.

Testing the Machine Learning Algorithms on the test dataset.

Comparison of results.

Collection of the dataset

The NSL-KDD [7] dataset is a new version of the KDD Cup 1999 dataset that includes many improvements; chiefly, it reduces the redundancy of many connection vectors in the KDD Cup 1999 dataset. The training dataset is selected from NSL-KDD to include 500 connection vectors, of which 250 are attack and 250 are normal connections. A small number of vectors is chosen because processing in MATLAB 7.0 is quite slow. The test dataset also includes 500 connection vectors. The dataset contains ASCII labels, marking normal connections with the word "normal" and attack connections with the word "attack"; the word "normal" is replaced by the number 0 and the word "attack" by the number 1. The three symbolic features protocol, service, and flag are also converted to numeric values before preprocessing.
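The label conversion can be sketched as follows (an illustrative Python fragment, not the MATLAB preprocessing code):

```python
# Replace the ASCII class labels with numeric values:
# "normal" -> 0, "attack" -> 1, before loading the vectors.
LABELS = {"normal": 0, "attack": 1}

def encode_labels(rows):
    return [LABELS[r] for r in rows]

print(encode_labels(["normal", "attack", "normal"]))  # → [0, 1, 0]
```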

Data Processing

The KDD Cup 1999 dataset includes several symbolic features, but most machine learning techniques cannot handle symbolic features directly. To address this problem, the symbolic features in the KDD Cup 1999 dataset are converted to numeric values using two different approaches: the conditional probability and indicator variable symbolic conversion methods [1].

Besides symbolic features, the KDD Cup 1999 dataset also contains features with a large range of numeric values; for example, src_bytes and dst_bytes span the range 0 to 1.3 billion. The range of these features was reduced to 0.0–10.0 using logarithmic scaling (base 10).
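The logarithmic scaling can be sketched as follows (a Python illustration; the +1 offset is an assumption to keep log10 defined at zero):

```python
import numpy as np

# Compress large-range features such as src_bytes / dst_bytes
# (0 .. ~1.3e9) into roughly 0.0 .. 10.0 with a base-10 logarithm.
def log_scale(x):
    return np.log10(np.asarray(x, dtype=float) + 1.0)

print(log_scale([0, 1_300_000_000]))  # → about [0.0, 9.11]
```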

Conditional probability

In this approach, each symbolic feature is replaced by an array of conditional probabilities of each class given that the symbolic feature f has a specific attribute a [1]. For an attribute a, this array is given as

[P(C1 | f = a), P(C2 | f = a), …, P(Cn | f = a)]

where each feature has m attributes and n is the number of classes, which is 2 in our case, i.e., normal and attack.

The conditional probability conversion approach is applied to the symbolic features of the KDD'99 dataset: the protocol, service, and flag features. When the KDD Cup 1999 dataset is preprocessed using conditional probability conversion, the 41 features of each connection vector increase to 44, as the 3 symbolic features, protocol (3 attributes), service (70 attributes), and flag (11 attributes), are each replaced by their conditional probability vector, and each conditional probability vector contains two values. Let us take the protocol feature as an example to understand the conditional probability approach more clearly.

Example

The protocol feature has 3 different attributes

Protocol (F2) = {tcp, udp, icmp}

The 3 attributes are replaced by conditional probabilities as follows.

Table 4.1: Sample Dataset

f1   f2    ...   f41   Attack/Normal
X1   udp   ...   Y1    Normal
X2   tcp   ...   Y2    Normal
X3   udp   ...   Y3    Attack

By applying the conditional probability symbolic conversion, we get the following result

udp = [P(Normal|f2 = udp), P(Attack|f2 = udp)]

tcp = [P(Normal|f2 = tcp), P(Attack|f2 = tcp)]

icmp = [P(Normal|f2 = icmp), P(Attack|f2 = icmp)]

Table 4.2: Converted Dataset

f1   f21 (Normal)   f22 (Attack)   ...   f41   Attack/Normal
X1   1/2            1/2            ...   Y1    Normal
X2   1              0              ...   Y2    Normal
X3   1/2            1/2            ...   Y3    Attack
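The conversion above can be sketched in Python (illustrative only; it reproduces the converted values of Table 4.2 from the three sample rows of Table 4.1):

```python
from collections import Counter

# The three sample rows of Table 4.1: (protocol f2, class label).
rows = [("udp", "Normal"), ("tcp", "Normal"), ("udp", "Attack")]

attr_counts = Counter(a for a, _ in rows)  # how often each attribute occurs
joint = Counter(rows)                      # how often each (attribute, class) occurs

def encode(attr):
    # Replace the symbolic value by [P(Normal|f2=attr), P(Attack|f2=attr)]
    return [joint[(attr, c)] / attr_counts[attr] for c in ("Normal", "Attack")]

print(encode("udp"))   # → [0.5, 0.5]
print(encode("tcp"))   # → [1.0, 0.0]
```

With these three rows, udp appears once as Normal and once as Attack, giving [1/2, 1/2], while tcp appears only as Normal, giving [1, 0], matching Table 4.2.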

Indicator Variables

The indicator variable conversion method uses binary coding to represent the absence or presence of each attribute of a symbolic feature: 0 represents the absence of an attribute and 1 its presence [1]. For example, a feature with three attributes can be represented by the codes 001, 010, and 100.

When the KDD Cup 1999 dataset is preprocessed using indicator variable conversion, the 41 features of each connection vector increase to 122, as the 3 symbolic features, protocol (3 attributes), service (70 attributes), and flag (11 attributes), are each replaced by their indicator variable vector.

Let's have an example of Protocol Feature to get a more clear understanding of the Indicator Variables approach.

Example

The protocol feature has 3 different attributes

Protocol (F2) = {tcp, udp, icmp}

The 3 attributes are replaced by indicator variables as follows.

Table 4.3: Sample Dataset

f1   f2    ...   f41   Attack/Normal
X1   udp   ...   Y1    Normal
X2   tcp   ...   Y2    Normal
X3   udp   ...   Y3    Attack

udp = [0 0 1]

tcp = [0 1 0]

icmp = [1 0 0]

Table 4.4: Converted Dataset

f1   f21   f22   f23   ...   f41   Attack/Normal
X1   0     0     1     ...   Y1    Normal
X2   0     1     0     ...   Y2    Normal
X3   0     0     1     ...   Y3    Attack
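The indicator-variable conversion can be sketched in Python (illustrative; the attribute ordering icmp, tcp, udp is chosen to match the codes shown above):

```python
# One-hot (indicator variable) encoding of the protocol feature.
# Ordering icmp, tcp, udp gives udp=[0,0,1], tcp=[0,1,0], icmp=[1,0,0].
attrs = ["icmp", "tcp", "udp"]

def one_hot(value):
    # 1 marks the present attribute, 0 marks the absent ones.
    return [1 if a == value else 0 for a in attrs]

print(one_hot("udp"))   # → [0, 0, 1]
print(one_hot("tcp"))   # → [0, 1, 0]
```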

The dataset is preprocessed in the testing phase using the same method which has been used in the training phase.

Training and Testing Phase

Both the supervised and the unsupervised machine learning techniques are trained on the preprocessed training dataset: the Naïve Bayes algorithm as the supervised technique and the SOM as the unsupervised one. After training both techniques on the two types of preprocessed dataset, their performance in detecting attack connections is compared.

Self Organizing Map

Learning with a SOM requires initialization of the SOM as a codebook of neurons, training of the SOM according to the algorithm discussed above, and finally a test phase. Each phase is described below.

Codebook Initialization

The codebook array in the code represents the neurons of the self organizing map. The codebook is selected to have 30 by 30 dimensions, giving 900 neurons, each with the same dimensionality as an input instance. The input dimensionality depends on the conversion technique applied to the dataset: with conditional probability conversion an input instance has 44 dimensions, so the codebook neurons also have 44 dimensions; with indicator variable conversion an input instance has 122 dimensions, so the codebook neurons also have 122 dimensions. The codebook neurons are initialized using the rand function of MATLAB 7.0.
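The initialization can be sketched as follows; NumPy's uniform random generator stands in for MATLAB's rand, and the function name and seed are illustrative.

```python
import numpy as np

def init_codebook(rows=30, cols=30, dim=44, seed=0):
    """Random 30x30 codebook; dim is 44 (conditional probability conversion)
    or 122 (indicator variable conversion)."""
    rng = np.random.default_rng(seed)
    return rng.random((rows, cols, dim))  # uniform in [0, 1), like MATLAB's rand

codebook = init_codebook()
```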

Training Phase

The self organizing map is trained on 500 connection vectors taken from the NSL-KDD dataset, classified as normal or attack. The SOM is trained on the dataset preprocessed by the two conversion methods discussed above. The SOM has 30 by 30 dimensions, and training runs for 15 epochs. The neighborhood radius of the SOM is chosen to be 15. The learning rate α(t) is 0.1 at the start and decreases with time in each epoch. The Mexican hat function is chosen as the update rule.

h(ρ, t) = (1 − ρ²/σ²(t)) · exp(−ρ²/(2σ²(t)))

Here h(ρ, t) is the Mexican hat function, ρ is the distance of a neighbor to the winner neuron, calculated using the Euclidean distance formula, and σ(t) is the neighborhood radius, which is 15 at the start and decreases with each epoch.

After the update rule is calculated for each neighbor, the neighbors are updated with the formula

Wij(t + 1) = Wij(t) + α(t) · h(ρ, t) · (X(t) − Wij(t))

Here Wij(t) is the weight vector of the SOM neuron at time t and X(t) is the input vector at time t.

The training phase is carried out in 15 epochs of 500 inputs each. During each epoch, the codebook vector is updated according to the weights of the neurons. After 15 epochs we obtain an updated self organizing map, which is used in the testing phase.
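A single training step of the procedure described above can be sketched as follows. The thesis implementation is in MATLAB; this Python/NumPy version is illustrative, and details such as the use of grid coordinates for the neighbor distance ρ are assumptions of the sketch.

```python
import numpy as np

def mexican_hat(rho, sigma):
    """Mexican hat neighborhood function of distance rho and radius sigma."""
    r2 = (rho / sigma) ** 2
    return (1.0 - r2) * np.exp(-r2 / 2.0)

def train_step(codebook, x, alpha, sigma):
    """One SOM update: find the winner neuron, then pull its neighbors toward x."""
    rows, cols, dim = codebook.shape
    flat = codebook.reshape(-1, dim)
    winner = np.argmin(np.linalg.norm(flat - x, axis=1))
    wr, wc = divmod(winner, cols)
    # Distance of every neuron on the map grid to the winner.
    ii, jj = np.meshgrid(np.arange(rows), np.arange(cols), indexing="ij")
    rho = np.sqrt((ii - wr) ** 2 + (jj - wc) ** 2)
    h = mexican_hat(rho, sigma)[:, :, None]
    # W(t+1) = W(t) + alpha * h * (x - W(t))
    return codebook + alpha * h * (x - codebook)

rng = np.random.default_rng(1)
codebook = rng.random((30, 30, 44))
x = rng.random(44)
updated = train_step(codebook, x, alpha=0.1, sigma=15.0)
```

Repeating this step for all 500 inputs, while shrinking alpha and sigma each epoch, gives one full training epoch.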

Testing Phase

In the testing phase, the dataset contains 500 connection vectors, half normal and half attack. Each new input from the test dataset is given to the system to determine whether it is an attack connection or a normal connection.

In testing with the SOM, the updated codebook vector is used to classify each connection vector. The Euclidean distance is computed from each connection vector to every neuron in the SOM. The neuron with the minimum distance to the input vector is marked, and it is determined whether it has classified the input correctly. The numbers of false positives (attacks classified as normal) and false negatives (normal classified as attacks) are then counted, and finally the accuracy of classification using the Self Organizing Map is determined.
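Classification by the minimum-distance neuron can be sketched as below. The thesis does not spell out how individual neurons receive class labels, so the per-neuron labels here are a hypothetical illustration on a tiny map.

```python
import numpy as np

def classify(codebook, neuron_labels, x):
    """Label a connection vector by its closest neuron (minimum Euclidean distance)."""
    dim = codebook.shape[-1]
    flat = codebook.reshape(-1, dim)
    bmu = np.argmin(np.linalg.norm(flat - x, axis=1))
    return neuron_labels.reshape(-1)[bmu]

# Hypothetical 1x2 "map": one neuron near the origin labeled Normal,
# one near all-ones labeled Attack.
codebook = np.array([[[0.1, 0.1], [0.9, 0.9]]])
labels = np.array([["Normal", "Attack"]])
pred = classify(codebook, labels, np.array([0.95, 0.85]))
```

Counting the predictions that disagree with the true class, separately for attack and normal inputs, yields the false positive and false negative totals reported later.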

Naïve Bayes

Learning with the Naïve Bayes algorithm requires a training phase and a test phase. Each phase is described below.

Training Phase

The Naïve Bayes classifier is also trained on 500 connection vectors, classified into the two classes Normal and Attack. In the training phase, P(vj) and P(ai|vj) are calculated for each connection vector in the training dataset. The learning algorithm is trained on the two types of preprocessed dataset. In both cases, a vector is created for each feature whose size is double the number of attributes of the feature, because each element holds the conditional probability of one attribute of the feature given that the class is Normal or Attack: the first half of the vector holds the conditional probabilities of the attributes given the class Normal, and the other half holds the conditional probabilities given the class Attack.
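The estimation of P(vj) and P(ai|vj) described above can be sketched as follows; the nested dictionaries play the role of the stored per-feature probability vectors, and all names and the toy data are illustrative (the thesis implementation is in MATLAB 7.0).

```python
from collections import Counter, defaultdict

def train_naive_bayes(rows, labels):
    """Estimate class priors P(vj) and per-feature conditionals P(ai | vj)."""
    n = len(labels)
    class_totals = Counter(labels)
    priors = {c: k / n for c, k in class_totals.items()}
    cond = defaultdict(lambda: defaultdict(Counter))  # feature -> class -> value counts
    for row, y in zip(rows, labels):
        for f, v in enumerate(row):
            cond[f][y][v] += 1
    probs = {
        f: {c: {v: k / class_totals[c] for v, k in vc.items()}
            for c, vc in by_class.items()}
        for f, by_class in cond.items()
    }
    return priors, probs

# Toy training set: two features, three connections.
rows = [["udp", "low"], ["tcp", "low"], ["udp", "high"]]
labels = ["Normal", "Normal", "Attack"]
priors, probs = train_naive_bayes(rows, labels)
```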

Testing Phase

In the testing phase, the same dataset of 500 connection vectors is used for Naïve Bayes as for the SOM. The vectors of conditional probabilities stored during the training phase are used in testing. For each input from the test dataset, the probability that the connection is normal and the probability that it is an attack are computed, and the two are compared to determine which is larger. If the probability of a normal connection is greater, the input is classified as normal; otherwise, as attack. Each probability is obtained by taking the product of the probabilities, found in the training phase, for each attribute of a feature.
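The comparison of class probabilities can be sketched as below. The probability tables are hard-coded hypothetical values standing in for what the training phase stores; note that unseen attribute values receive probability 0 here, since the thesis does not mention smoothing.

```python
def classify_nb(priors, probs, row):
    """Choose the class maximizing P(class) * product over features of P(value | class)."""
    best_class, best_score = None, -1.0
    for c, prior in priors.items():
        score = prior
        for f, v in enumerate(row):
            # Unseen values get probability 0 (no smoothing described in the thesis).
            score *= probs.get(f, {}).get(c, {}).get(v, 0.0)
        if score > best_score:
            best_class, best_score = c, score
    return best_class

# Hypothetical stored tables for a single symbolic feature.
priors = {"Normal": 0.5, "Attack": 0.5}
probs = {0: {"Normal": {"udp": 0.5, "tcp": 0.5}, "Attack": {"udp": 1.0}}}
pred = classify_nb(priors, probs, ["udp"])
```

For the input udp, the attack score 0.5 × 1.0 exceeds the normal score 0.5 × 0.5, so the connection is classified as attack.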

Performance Evaluation

After completing the training and testing phases for both machine learning algorithms, the results are compared. The comparison shows which of the two techniques gives the minimum false positive rate in the testing phase, and which symbolic conversion technique has the better effect on system performance. Finally, the accuracy of both techniques is evaluated. Four combinations of results are produced: SOM with the conditional probability conversion, SOM with the indicator variable conversion, Naïve Bayes with the conditional probability conversion, and Naïve Bayes with the indicator variable conversion.

Chapter 5: Experimental Results and Discussion

The SOM and Naïve Bayes algorithms are implemented in MATLAB 7.0. The training dataset contains 500 connection vectors, of which 250 are normal connections and 250 are attack connections; the test dataset also contains 500 connection vectors. As MATLAB 7.0 is quite slow, only a small number of connection vectors were used for training and testing. With the conditional probability conversion technique each connection has 44 attributes, and with the indicator variable conversion technique each connection has 122 attributes.

The Self Organizing Map was chosen to have 30 by 30 dimensions. It was trained on the 500 connection vectors for 15 epochs with a neighborhood radius of 15; the learning rate was 0.1 at the start and decreased with time. The Mexican hat function was used as the neighborhood update function.

After testing using SOM and Naïve Bayes machine learning algorithm on dataset preprocessed using indicator variables and conditional probability conversion techniques, we get the following results for false positive rate and accuracy (%).

Table 5.1: Results for detecting attack connections

Machine Learning Algorithm | Conversion Technique | False Positive | Accuracy (%)
Self Organizing Map | Indicator Variables | 14/250 | 94.4
Self Organizing Map | Conditional Probability | 15/250 | 94
Naïve Bayes | Indicator Variables | 4/250 | 98.4
Naïve Bayes | Conditional Probability | 113/250 | 54.8

The table above shows that the best results are obtained by training the Naïve Bayes approach on the dataset preprocessed with the Indicator Variable symbolic conversion technique. The results obtained by training the Self Organizing Map on both types of preprocessed dataset are almost identical and good. The worst results come from training the Naïve Bayes algorithm on the dataset preprocessed with the Conditional Probability conversion approach. The results also suggest that the Indicator Variable conversion approach is better than the Conditional Probability conversion approach.
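The accuracy column of Table 5.1 follows directly from the error counts, since each class has 250 test connections; a quick check (illustrative Python):

```python
def accuracy_pct(errors, total=250):
    """Accuracy (%) over one class of the test set, given the error count."""
    return round(100.0 * (total - errors) / total, 1)

# Error counts from Table 5.1, in row order.
results = [accuracy_pct(e) for e in (14, 15, 4, 113)]
```

This recovers the reported 94.4, 94.0, 98.4 and 54.8 percent figures.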

The corresponding results for the false negative rate and accuracy (%) are given below.

Table 5.2: Results for detecting normal connections

Machine Learning Algorithm | Conversion Technique | False Negative | Accuracy (%)
Self Organizing Map | Indicator Variables | 3/250 | 98.8
Self Organizing Map | Conditional Probability | 16/250 | 93.6
Naïve Bayes | Indicator Variables | 198/250 | 20.8
Naïve Bayes | Conditional Probability | 130/250 | 48

As observed from the false negative results, in this case the Self Organizing Map outperforms the Naïve Bayes algorithm. The false negative rate is less of a drawback, as it does not directly affect the security of an organization, whereas the false positive rate can affect the performance of the detection system, since more connections must be scanned and verified.

Chapter 6: Conclusion

This thesis demonstrates the implementation of a network intrusion detection system using machine learning techniques with different symbolic conversion methods for the dataset. The work shows the significant effect of the different symbolic conversions applied to the dataset. Since the Self Organizing Map is an unsupervised learning technique, it might be expected to perform better than the supervised Naïve Bayes, but Naïve Bayes outperforms many neural network algorithms.

The effect of the indicator variable and conditional probability conversions is almost the same for detecting attacks with SOM learning, but it varies greatly for Naïve Bayes learning. The conditional probability conversion increases the dimension of each symbolic feature to 2, as each symbolic feature is replaced by two attributes, one per class (Normal and Attack). The increase from indicator variables is much larger, since each symbolic feature is replaced by as many binary attributes as it has values; for example, a feature with 70 values is replaced by a 70-element indicator vector. Hence indicator variables greatly increase the dimensionality of the dataset. MATLAB was used for implementation, so processing was quite slow; for this reason, training and testing were performed on a small dataset. In future, the C language can be used for faster processing and for implementing a real-time Intrusion Detection System.
