Associative Implementation For Network Intrusion Detection Computer Science Essay

Published:

The use of network to connect to internet on computers has widely increased in recent times and so has the attacks generating from the network. The need for Network Intrusion Detection system is highly desirable to prevent attacks generating from the network as computers are not always protected against such attacks. These NIDS system checks the packet headers and the packet contents to detect attacks and it provides a better security against attacks coming from networks as compared to firewalls. NIDS requires an efficient string or pattern matching algorithms to detect attacks as the packets header and contents are checked against a large number of potential attacks. And at the same time NIDS must maintain its speed with the network speed.

A lot of approaches have been used for NIDS to support high speed networks like some ASIC based commercial products, use of hardware for string or pattern matching algorithm to improve the efficiency of NIDS for high speed networks as software based NIDS can only support network speed of few hundred Mbps. In this report we suggest an idea to improve the performance of NIDS by adding hardware for the string matching algorithm as it is the most computationally expensive part of the NIDS. Adding hardware will provide the benefit of replicating the hardware algorithm and allows it to run in parallel. One limitation would be that of hardware resources needed. In this report we present an idea of first converting the string matching algorithm into Finite automata which was done on software using java. Then we provided a hardware implementation for that automata using VHDL and tested it via simulation.

INTRODUCTION

Lady using a tablet
Lady using a tablet

Professional

Essay Writers

Lady Using Tablet

Get your grade
or your money back

using our Essay Writing Service!

Essay Writing Service

As the use of computers connected to a network has increased drastically in recent times, so it becomes essential to protect computers against any attacks coming from the network. These attacks mainly include attempts to hack into computers through network or attempts to make the service unavailable to the user. These attacks are generally prevented by intrusion detection which is either host based or network based. Host based intrusion detection monitors traffic coming from the network on the host computer while network based intrusion detection monitors traffic on network. However firewall provide the basic network security but firewall only checks the packet header on the basis of IP address and TCP port number . But some servers like web and mail servers are allowed through a firewall so it becomes essential to check the content inside the packet as well with the packet header.

Network Intrusion detection systems (NIDS) detects suspicious network traffic and attacks originating from the internet by monitoring the network. NIDS not only checks the packet header but also checks the packet content for signatures to detect attacks, these signatures are just a pattern or string of data in packet which may be considered of potential threat. This is achieved by running intrusion detection on a dedicated machine connected to the network or possibly on a network router or firewall. An advantage of using NIDS is that it can detect the attack on the network itself and thus prevents it from causing problems in PC, as PC are not always protected for such attacks.

A lot of NIDS have been developed in the past and mostly it is software based. Software based NIDS are good for detecting attacks for slow networks or with networks with less traffic. But for high speed networks a hardware resource is needed. Another way would be to run a host based Intrusion detection as well with the software based NIDS, however this approach will increase the load on the host computer and will affect its performance and also it requires a software to be download on host computer which is not always possible as it is not possible to download software in embedded systems.

In this project we suggested a way to improve the performance of NIDS for high speed networks by adding hardware that will perform the computationally intensive parts of these operations - such as the string or pattern matching. One of the benefits of using hardware is that we can replicate our hardware algorithms and allow these to run in parallel depending upon the hardware resources. A number of different approaches have been used to implement this string or pattern matching in hardware. In this project we used the idea of converting the matching operation into a Finite Automata and then providing a hardware implementation for that automata.

Lady using a tablet
Lady using a tablet

Comprehensive

Writing Services

Lady Using Tablet

Plagiarism-free
Always on Time

Marked to Standard

Order Now

In this project we converted the string matching operation which is the most computationally expensive part of the NIDS into Finite automata based on KMP string matching algorithm and then the automata was implemented as a 'state transition table'. To provide the hardware implementation for the automata we used the idea of associative memory or "content addressable memory" (CAM). We presented the memory with a data item that we are searching for (the search key) and it returned us the address of that item in memory. Then to associate a particular piece of output data with an output address, we added an additional output table to store these values. We then used this form of CAM to map one data item to another: the input data is looked up in the CAM to give an address which is used as an input to the RAM to select a piece of data. We then used this as an alternative way to store out state transition table, by using the current state and data input as the key input and the next state as the output.

BACKGROUND

SNORT

An example of such type of network intrusion detection system is Snort - lightweight intrusion detection system by martin Roesch [1]. Lightweight intrusion detection system means it can be easily deployed in any mode of network and it is generally used for small network with less network traffic. Snort uses 'rules' that specific checking network data packets for specific IP address and port numbers, and also require us to search selected packets for various strings or patterns of data that are identified as being specific to a particular attack.

Snort contains its detection rules in two fields known as header and content. Header checks the source and destination IP addresses and ports and content checks the packet payload for one or more specified pattern. The content rule is generally in an ASCII, mixed text or in Hex which is enclosed between "|" symbol. The Snort system contains Logging and alerting subsystem also. The logging option can be used to log packets for a given IP address and the alert is used to raise an alert for security. For example,

log tcp any any -> 11.1.1.0/35 67

The above rule will log all incoming traffic for port 67 going to IP address 11.1.0.1

alert tcp any any -> 11.1.1.0/35 67(content: "abc|4a1f|";

msg: "Incoming Traffic";)

The above rule will check for packets going to port 67 and IP address 11.1.1.0/35 and contain the pattern abc|4a1f|.

But as the speed of networks is quite high these days, lightweight intrusion detection system can consume large amount of processor time. Also, as Snort uses rules for intrusion detection, so with high speed networks it may only be possible to handle limited number of intrusion detection rules.

SEARCH PATTERN ALGORITHMS

Network Intrusion detection systems uses string or pattern matching algorithm to inspect the data coming from the network to detect the attacks. One of the most common and efficient algorithm used is the Boyer Moore algorithm [2]. The Boyer Moore algorithm searches the characters of the pattern from right to left, and if any mismatch occurs it shifts the text to the right using two shift functions the good-suffix shift [3] and the bad character shift [3]. However, according to Boyer Moore algorithm, each pattern is independently compared with the data which effects the performance because a substring can be repeated in multiple patterns and each time the pattern will be compared the repeated substring will be compared too.

Another efficient string matching algorithm is the algorithm suggested by A.V. Aho and M.J. Corasick [4]. This algorithm is generally used to find the occurrence of any number of keywords in a string. The algorithm consists of two parts, first part is to construct a finite pattern matching machine based on the set of keywords and then the string is inserted as an input in the finite pattern matching machine and it signals whenever there's a match for the keywords in the string.

Another one of the most efficient algorithm for pattern matching in string is Knuth, Morris, & Pratt algorithm [5], it is used to find the occurrence of a given pattern of string within another string. This algorithm searches the occurrence of given pattern of string of characters within another string and in case of a mismatch it doesn't backtrack and scan the previously searched characters again and based on the information of previous match of characters it predicts the position in the string where the next match should begin thus saving time and improving performance. For example, searching for a pattern abcab in string abcacabcab would be as follows:

abcacabcab

|

abcab

Lady using a tablet
Lady using a tablet

This Essay is

a Student's Work

Lady Using Tablet

This essay has been submitted by a student. This is not an example of the work written by our professional essay writers.

Examples of our work

The pattern is placed on the left and the search begins by scanning the leftmost character of the string now there's a match for four characters of the pattern until the character c in the string. Now there is no need to remember the previously scanned characters of the string as the position of the pattern justifies that and the pattern can be shifted five places to the right and the match is found.

abcacabcab

abcab

However, Software based NIDS can only support networks with limited speed for example a software based NIDS with 500 rules would find it difficult to run those rules for a network with network speed of 100Mbps, but as the speed of networks are quite fast these days up to 2Gbps, some networks even have the speed of 10Gbps these days .So, software based NIDS cannot support the bandwidth of high speed networks and the only way to make software based NIDS to run on high speed network would be do eliminate some of the rules or compromise on leave some of the packets but both of these solutions will compromise on the security of the network.

HARDWARE

One way to improve the performance of NIDS for high speed networks would be to add hardware to it. Since the most expensive part of NIDS is the string or pattern matching operations, so NIDS can be improved by adding hardware that will perform the processor intensive parts of these operations - such as the string or pattern matching. . A benefit of using hardware being that subject to limitations in the amount of hardware resources we can replicate our hardware algorithms and allow these to run in parallel. A number of different approaches have been used to implement this string or pattern matching in hardware, many of which consist of converting the matching operation into a Finite Automata and then providing a hardware implementation for that automata.

EXISTING WORK

There are several commercial products available in the market for packet inspection to perform matching like ClassiPI from PMC-Sierra [12]. It is a classification processor for networks which is used for packet classification and to perform packet scanning on strings and on regular expressions also for content searches on packets and it is implemented as Application specific integrated circuit (ASIC).

An approach was taken by Gokhale et. al. [11] which uses the idea of CAM (Content Addressable memory) In this approach the data in the packets are checked against the contents of a CAM and it produces a match vector if a match is found and then it associates the match vector with the data and forwards it for further processing. In this approach the contents in the CAM can be changed so this approach is dynamically reconfigurable.

An Approach by Cho et. al [8] used deep packet filtering to search for multiple string patterns and also introduced the idea of using ROM. In this approach the incoming data from the network is searched for a string prefix and the matched prefix is used as an address by the ROM to search for remaining part of the string. Their approach also reduced the area cost as they used logic reuse techniques eliminating duplicate patterns and also by sharing the substring comparators.

An approach by Baker and Prasanna [13] also uses the idea of CAM, this approach uses the method of pre-processing of complex patterns to achieve better clock frequency and also to reduce the area size as identical patterns will be put together. Their approach is designed to support large database of patterns using graph based partitioning and min-cut partitioning. They proved that their approach can also be used to perform comparisons on multi-byte input of data. Their methodology also uses an AND function to AND the decoded inputs and produce a "match".

Another interesting approach was used by Attig, Dharmapurikar and Lockwood [14] which uses the idea of using Bloom filters for string matching. A bloom filter when used with FPGA allows to search for large number of strings. In this approach a hash table is also used to check if the match occurred is a false match or an exact match. This approach has a drawback that false positive matches may exist.

FINITE AUTOMATA APPROACHES

One such approach to convert the matching operation to a FPGA based automata was first suggested by Sidhu and Prasanna [7], it involves constructing hardware NFA (Nondeterministic Finite Automata) for a given regular expression. The NFA approach requires less time and save space as compared to constructing a DFA (Deterministic Finite Automata) and allow us to match even complex regular expression without converting it to DFA. This is achieved by dynamic reconfiguration and parallelism.

Another such approach was suggested by Franklin et. al. [9] which uses a similar approach as suggested by Sidhu and Prasanna of constructing a NFA for a given regular expression. However, Franklin et. al. approach extends the Sidhu and Prasanna approach by using large regular expressions and adding some of the metacharacters like "?", ".", and "0" and if there is any change in the rule the design of FPGA needs to be compiled again while Sidhu and Prasanna supports dynamic reconfiguration. The Franklin et. al approach used 8 bit data item and reported the operating frequency of 30MHz-127MHz.

An approach was taken by Moscola et. al [10] to construct a DFA(Deterministic Finite Automata) for a given regular expression, there can be only one active state at a time with DFA so this approach is used for byte by byte matching and DFA is generally considered to be more compact than NFA, the authors proved this by generating state machines for regular expressions from some rules extracted from the spam assassin program and then generating a NFA and DFA for the regular expressions and it was found that the number of states in DFA was less than the number of states in NFA.

IMPLEMENTATION

In this project we will be using FSM (Finite state machine) approach for string matching. We can implement many of the string matching algorithms using the FSM approach but as we will be using hardware to implement the state machine, so it is advisable to use algorithms that process the data sequentially in an order rather than operating on the data in a random order because hardware will perform better if a data stream is supplied to it rather than it accessing random data from the packet. Also, it is suggested to use only one data character per clock cycle as no matter how much data will be operated by the system it will help in maintaining the speed of the system with the network speed. A better approach to implement FSM is to use a table to map data input and current state to next state. The data input and the current state will be in register and the table will be used to map data input and current state to next state. Based on these facts, in this project we will be using KMP algorithm for string matching to generate data for FSM table.

KMP

The KMP algorithm can be easily implemented on FSM by using a prefix function. This prefix function can be represented graphically as a state transition diagram. Here the automata is always in one of its states (numbered circles) and each input character will cause transitions from one state to another - along the labelled edges between states. A simple automata matching the string "abc" is as follows:

0

0

1

2

3

a

b

c

a

a

b,c, z

b, z

c, z

b,c,z

Notes: z is used to represent all characters other than a, b or c

State 3 is the 'final' state, which indicates a match

a

There is a initial state of '0' which is the starting state and then there are three more states each for one character of string "abc". starting from initial state If the input 'a' is given at any state then the next state would be '1' and for all other inputs which causes a mismatch then it goes back to the previous state which is '0'. Similarly, if it is at state '1' and an input 'b' is given then it goes to the next state which is '2' and if it is at state '2' and an input 'c' is given it reaches the 'final' state which will indicate a 'match'. but, KMP algorithm does not always operate on one data character per clock cycle like in case of a mismatch it has to compare more than one input data character For example if the automata is at state '2' so the next input character would be compared to either 'c' or 'a' and in case of a mismatch the next state would be '0' and then input character would be compared to 'a'. But KMP algorithm can be adapted to overcome this problem and process only one data item in one clock cycle by recursively resolving the next state for automata, allowing for multiple failed matches. However, Baker and Prasanna [15] suggested an approach using dual comparators and a buffer which will ensure that only one input data character will be processed in one clock cycle. In this project we will using a table based implementation of the FSM to ensure that only one input data character will be processed in one clock cycle.

One way that we can implement our automata above for a table based FSM would be as a simple 'state transition table'. So for our automata above, this would be as follows:

Next state

State

0

1

2

Input

a

1

1

1

b

0

2

0

c

0

0

3

z

0

0

0

In this project we achieved this by using a java program, so in the program to generate a state transition table for the string "abc", we used a overlap function of java

We build the overlap function in java as follows:

int[] overlap = new int[states+1];

overlap[0] = -1;

for (int i=0; i<search.length(); i++)

{

overlap[i+1] = overlap[i] + 1;

while( overlap[i+1] >0 && search.charAt(i) != search.charAt(overlap[i+1]-1))

{

overlap[i+1] = overlap[overlap[i+1]-1] + 1;

}

}

System.out.println("overlap function: ");

for(int j=0; j<=states; j++)

{

System.out.print(" " + overlap[j]);

}

System.out.println("\n");

Now, to create single character 1 tick FSM we just created an array in java as follows

int [ ] [ ] fsmtable = new int[ALPHABETSIZE][states];

System.out.println("Automata table\n State");

System.out.print(" ");

for(int s=0; s<states; s++)

System.out.print(s + ", ");

System.out.println();

for(int c=0; c<ALPHABETSIZE; c++)

{

boolean nonZero = false;

for(int s=0; s<states; s++)

{

fsmtable[c][s] = next(c, s, states, overlap, search);

if(fsmtable[c][s] != 0)

nonZero = true;

}

if(nonZero)

{

System.out.print((char)c + ":");

for(int s=0; s<states; s++)

{

System.out.print(fsmtable[c][s] + ", ");

}

System.out.println();

}

}

}

Now to recursively resolve the next state for automata, allowing for multiple failed matches we used the following method in java

private int next(int c, int st, int states, int[] overlap, String pattern)

{

if(st == states-1)

{

In special case when we are in the final state so there is no more to match, we used the return method in java

return( next(c, overlap[st], states, overlap, pattern) );

}

if(((char)c) == pattern.charAt(st))

{

If character matches ok

if(st+1 == states)

return(overlap[st+1]);

else

return(st+1);

}

else if(st != 0)

{

Now, if the character doesn't match and we are on in the 0 (idle) state then again the overlap table was used to work out possible matches

return( next(c, overlap[st], states, overlap, pattern) );

}

else

{

return(0);

}

So, we achieved just a two dimensional array used as a lookup table from this java program. So for our automata above, the state transition table we achieved using the overlap function in this java program is as follows:

Also, this project uses the idea of associative memory or "content addressable memory" (CAM). With this type of memory we present the memory with a data item that we are searching for (the search key) and it returns us the address of that item in memory.

Data

Search key

Match

Address

Address out

CAM

If we wish to associate a particular piece of output data with an output address, then we just add an additional output table to store these values, such as below:

Data

Search key

Data out

Address out

CAM

RAM

Address in

Data

out

We can then use this form of CAM to map one data item to another: the input data is looked up in the CAM to give an address which is used as an input to the RAM to select a piece of data. We can use this as an alternative way to store our state transition table, by using the current state and data input as the key input and the next state as the output:

Key input

Current state

Current Input

0

a

1

a

1

b

2

a

2

c

3

a

-

-

The final line shows a 'default' result if no other key entries match.

So, the next step is to work out all the possible combinations of input and current state where a 'next state' occurs from the 'state transition table' to create contents for CAM as given in the diagram below. so, the CAM can map input data and state to next state.

This was also done in this project using a java program. So, in the java program we achieved this by modifying our automata table using an if condition where the next state is not equal to '0'.so we get all the possible combinations for the next state.

if(fsmtable[c][s] != 0) {

System.out.print((char)c + " ");

System.out.println(" "+s+" "+fsmtable[c][s]);

state[count++] = s;

}

And we got desired output as follows from the java program

Now the next step was to create the contents for the state half of the CAM which contains data for the state. For this, first we worked out all possible states based on the results above and then created a state table as given in the diagram below

* represents all other values when there is no match.

So, this was done in the java program by just creating an array for the state and filling it up with the values for state.

int [] state = new int [STATELENGTH];

The next step was to work out all the values for the state in bit pattern to put those values in the CAM and to build the state half of the CAM. For that first of all we calculated the value of state as per the address. For example, for address 1, we first check if state match a value of '1' so for all the states whose value is '1' we get a '1' and for all other we put '0' then we get a bit pattern and we put this bit pattern in address '1' as shown in the diagram below.

The same way we repeated this for all values of the state input and we achieved the content for the state half in a bit pattern.

This was done in the java program using the function repBit in java which converts an integer value into binary value.

private void repBit(int i) {

i = (i & 0x00ff);

for(int bit = 31; bit >= 0; bit--) {

if ((i & (1 << bit)) > 0)

System.out.print("1");

else

System.out.print("0");

}

System.out.println();

}

Similarly, we worked out all the possible values of the input data in bit pattern to build the input half of the CAM as per the diagram shown below. But, for the input values we used ASCII (American Standard Code for Information Interchange) table as the system can only use numbers. So, for input a,b,c we checked if the input match a value of 'a' and if does match we get a '1' otherwise '0' if doesn't match a value of 'a' and similarly we worked out the values for 'b' and 'c' also. So, the values will be as follows as shown in the diagram:

we modified our java program for the input values by using the same function repBit in java to convert the value for input a,b,c into binary and for the ASCII table we used a for loop.

for(char alphabet=0; alphabet<255;alphabet++)

{

if(alphabet == 97)

{

System.out.print(alphabet+" ");

repBit(79);

}

else if(alphabet == 98)

{

System.out.print(alphabet+" ");

repBit(80);

}

else if(alphabet == 99)

{

System.out.print(alphabet+" ");

repBit(96);

}

else

System.out.println(alphabet+" 0000000");

Now the next step is to convert these bit values for the state and input half of the CAM into hexadecimal value. for this we used a function called Integer.toHexstring in java to convert the bits values into hexadecimal value. This was done in java as follows

System.out.print(Integer.toHexString(0x10000 |i).substring(1).toUpperCase()+" ");

HARDWARE IMPLEMENTATION

Now as we have the data for the state and input generated from the state transition table using java program. The next step was to build hardware to put those data in and to map the current state and input data to next state and to signal a 'match' if it reaches the final state.

The design below was used to build our automata in hardware.

This project implemented such a system in VHDL and tested this via simulation. The VHDL design was targeted at a Xilinx Virtex 5 FPGA and the resource use and performance of the design was tested by synthesis using Xilinx design tools.

Here we used a RAMB16_S36_S36 dual port BRAM primitive to hold the values of input data and current state. Then a VHDL 'AND' operator is used on the two input buses. A priority encoder is also used to determine the number of the first input that is set to '1'. Then we just defined a table of constants in VHDL and the synthesis tools generated the required ROM using LUT primitives.

For our hardware component we used dual port BRAM (Block RAM) with RAMB16_S36_S36 primitive. The advantage of using a dual port BRAM is that it consists of two ports A and B which can be used independently which means that it can be used as two separate pieces of memory. We can write the data in one or both the ports and in a similar way data can be read from one or both the ports. RAMB16 consists of 16kb of storage area which means each port of the BRAM has 16384 bits of memory for the data. Each port of BRAM is fully synchronous for the write operation and has individual clock associated with it. The EN handles the write operation for both the ports, so if enable is low no data will be written. Below is the diagram for BRAM with its ports and description of all the ports.

Diagram. Dual Port BRAM

Port Name Description

DI[A|B] Data Input Bus

DIP[A|B] Data Input Parity Bus, can be used for additional data inputs

ADDR[A|B] Address Bus

WE[A|B] Byte-wide Write Enable

EN[A|B] When inactive no data is written to the block RAM and the

output bus remains in its previous state

SSR[A|B] Synchronous Set/Reset for either latch or register modes

CLK[A|B] Clock Input

DO[A|B] Data Output Bus

DOP[A|B] Data Output Parity Bus, can be used for additional data

outputs

RAMB16_S36_S36 primitives of BRAM consists of a 9 bit (8+1) address bus for each port as shown in the diagram below, so in our design we set the top address bit to '0' on one port and '1' on the other to use BRAM as two separate pieces of memory effectively.

Diagram : BRAM ports for RAMB16_S36_S36 primitive

We Initialized the memory content of RAMB16 primitive using INIT_xx attributes and altogether there are 64 INIT_xx attributes (INIT_00 to INIT_3F) which consists of 64 hex values and 256 bit for each INIT_xx i.e. 16384 bits altogether. If no value is given to initialize INIT_xx attribute, it is automatically configured to all 0. We used the values generated in hexadecimal from our state transition table in the java program to configure the INIT_xx attributes.

We also used a ROM (Read-only Memory) in our design in VHDL, the constant ROM has 32 entries, each of which is a std_logic_vector (5 downto 0). Also subtype' BITVECTOR was used which is just an alias for std_logic_vector.  This avoids upsetting the 'type' operator that doesn't like complex types such as std_logic_vector (5 downto 0). ROM was just initialized with 32 entries of 6-bit numbers using Table to output the address of Next state and to signal a match if it reaches the final state.

So, In the hardware VHDL design the input data and the state data generated from the state transition table in java in hexadecimal was stored in the BRAM ports A and B using INIT_xx attributes and it outputs a 32 bit values and the output of BRAM was "AND" together and provided as a input to priority encoder where in the priority encoder we used a 'when' loop to determine the first input that is set to '1'. The priority encoder outputs a 5 bit address which is used to look for a value in the table held in ROM to map the next state based on the current state and input data and if the value 'match' the final state it outputs a '1' bit match signal otherwise it goes back again as an current state input to BRAM.

RESULTS AND EVALUATION

Conversion of string matching operation using KMP string matching algorithm into a finite Automata and then the automata was implemented as a 'state transition table' which was used as a lookup table. This was done in java and compiled and showed the following results.

Then the input data and current state data was worked out from the 'table' based on next state

And then a state table was created to get the values for the state part of the CAM and then was converted into bits pattern and then into hexadecimal values to be used as data to initialize the BRAM for hardware implementation.

Now these hexadecimal values for state data can be used to directly initialize the BRAM with state data using INIT_xx attributes of BRAM by copying these values from the java program and pasting it to the desired INIT_xx attributes.

Similarly, we worked out the values for input data in java, but for the input we used ASCII tables. As per ASCII table, there are 256 characters and character a, b and c is at value 97, 98 and 99 so, for the input data we got following output from the java program.

Now these values can be used directly to initialize the BRAM with the input data by using the INIT_xx attributes. By just copying and pasting the hexadecimal values in the INIT_xx attributes.

The Hardware design was written in VHDL. First the VHDL code was written for each component then a top file was created to connect all the components together. Then the design for implementation was synthesized successfully using Xilinx and showed the following device utilization summary.

 

Device Utilization Summary

Slice Logic Utilization

Used

Available

Utilization

Note(s)

Number of Slice LUTs

30

19,200

1%

 

    Number used as logic

30

19,200

1%

 

        Number using O6 output only

30

 

 

 

Number of occupied Slices

9

4,800

1%

 

    Number of occupied SLICEMs

0

1,280

0%

 

Number of LUT Flip Flop pairs used

30

 

 

 

    Number with an unused Flip Flop

30

30

100%

 

    Number with an unused LUT

0

30

0%

 

    Number of fully used LUT-FF pairs

0

30

0%

 

    Number of slice register sites lost

        to control set restrictions

0

19,200

0%

 

Number of bonded IOBs

10

220

4%

 

Number of BlockRAM/FIFO

1

32

3%

 

    Number using BlockRAM only

1

 

 

 

        Number of 36k BlockRAM used

1

 

 

 

As it can be observed from the device utilization summary that we have just used 30 LUTs out of the total 19,200 available LUTs that means we have just used around less than 1% of the available LUTs and also we have used just 1 BRAM only out of 32 BRAM available on the board. This suggests that one Xilinx Virtex 5 FPGA can support a large number of such string matching operations. The Virtex 5 FPGA component we used is a smaller one but some Virtex 5 FPGA has around 1032 BRAM Blocks. So, we can implement a large number of string searches on an FPGA board.

Now for the simulation, we created a test bench in VHDL to provide the design with a clock input. The simulations results are as follows.

As it can be observed from the simulation that as the clock input goes high 'clk = 1' data input is being generated.

The next step would be to put the actual data from the string matching operation for the input data and state into this design and get the simulation for that. However due to time constraints we were not being able to get the simulation with the actual data.

CONCLUSION

In this report we focused on the string matching techniques in NIDS. As we observed that string matching is the most computationally expensive part of the NIDS and software based NIDS can only support network bandwidth of few hundred Mbps we proposed an idea of using hardware to perform the string matching for NIDS for high speed networks. We also discussed other various approaches by which string or pattern matching has been done in the past for NIDS.

In our project we converted the string matching operation into a Finite Automata. The Finite Automata executed the string matching operation based on KMP algorithm for string matching. Then we implemented our automata as a 'state transition table' which is just a two dimensional array used as a lookup table. Software was written in java to perform the matching operation in Automata and we got the desired data which was used as input data for our hardware design to which is tested via simulation. In our hardware design we used the concept of CAM and we implemented CAM using BRAM, we initialized the BRAM with the input data and state data we generated from our automata table and a table stored in ROM in the VHDL design was used to map the input and state data to next state data and it signals a 'match' if a match occurs. For this hardware design VHDL code were written and the design was tested and simulated. We observed from this test and simulation that adding hardware for string matching increases the speed of the system at less area cost as we used BRAM which can be used as two separate pieces of memory to store data. Also, as the hardware implementation is table based so if there is change in the string we can just update the data in table and we don't have to build the design for hardware implementation again. This is quite important as if the system needs to be update to search for new string it will save time and resource.

FUTURE WORK

Future work in this project is mainly to do design level simulation which is the RTL model simulation and the primitive simulation because to implement this design on a FPGA we need to simulate the design using RTL model and primitive simulation as behavioral simulation doesn't necessarily always work on FPGA and then the main aim would be to build an actual hardware from this VHDL design. In this project we put the data in the VHDL files for look up table before we synthesize it. However for the actual hardware the data in the look up table needs to be updated fast while the system is working. so to avoid synthesizing the design again or to build the design again every time there is an update on the look up table we can just use some external logic to allow us to write our configurable data to the BRAM at boot-time> we can use a multiplexer to boot input to the address ports and using WE (Write Enable) to boot enable to write the data. An example is shown in the diagram below.

Also in this project we just used a simple string 'abc' for the string matching. Future developments can include operating on large 'strings' to perform the matching in hardware.

Another work would be to find a way to reduce the state transition table as it is held in memory (ROM) as it uses a lot of memory resources. One way would be to build a logic design for it but it would be difficult to implement as every time there is change in the string the design would need to be build again for the new string.

One way to improve this project would be to perform parallel string matching using partitioning to improve the search rate for string matching on multi-byte data. As for large input data the automata becomes quite complex. This approach would involve a set of finite automata and each automata would operate on one byte of data from multi-byte data and will search for other parts of the string. Then the results from all the automata are combined to check if the string has been matched or not.

Also it may be useful to explore further techniques to minimize or compress the Automata to reduce the size BRAM used.

Another interesting approach would be to add other approach along with the approach used in this project like using bloom filters. Bloom filters provide cost effective and fast solution for string matching but have a drawback of showing false matches. We can combine these two techniques together in some way so that an exact match can be produced using the technique used in this project and at the same time using bloom filters to achieve low cost and effective string matching solution.