Managing An Intrusion Detection System Computer Science Essay



An Intrusion Detection System (IDS) is a program that analyzes what happens during an execution and tries to find indications that computer resources have been misused. This paper focuses on some of the challenges in managing existing intrusion detection systems and on ways to improve them that provide high accuracy in detecting intrusions and a low false alarm rate.

1. Introduction

Most intrusion detection systems detect signatures of known attacks by searching for attack-specific keywords in network traffic. Many of these systems suffer from high false-alarm rates (often hundreds of false alarms per day) and poor detection of new attacks. This poor performance can be improved by using a combination of preferential (discriminative) training and generic keywords. Generic keywords are selected to detect attack preparations, the actual break-in, and the actions after a break-in.

Preferential training weights keyword counts to discriminate between the few attack sessions in which keywords are known to occur and the many normal sessions in which keywords may occur in other contexts. This approach was used to improve the baseline keyword intrusion detection system used to detect user-to-root attacks in the 1998 DARPA Intrusion Detection Evaluation. It reduced the false alarm rate by approximately two orders of magnitude (to roughly one false alarm per day) and increased the detection rate to approximately 80%. The improved keyword system detects new as well as old attacks in this database and has roughly the same computation requirements as the original baseline system. Both generic keywords and preferential training were required to obtain this large improvement in performance.

Heavy reliance on the Internet and worldwide connectivity has greatly increased the potential damage that can be inflicted by remote attacks launched over the Internet. It is very difficult to prevent such attacks with security policies, firewalls, or other mechanisms, because system and application software always contains unknown weaknesses or bugs, and because complex, often unforeseen, interactions between software components and network protocols are continually exploited by attackers. Intrusion detection systems are designed to detect the intrusions and attacks that inevitably occur despite security precautions.

The most common approach to intrusion detection, often called "signature verification," detects previously seen, known attacks by looking for invariant signatures left by these attacks. These signatures may be found either in host-based audit records on a victim machine or in the stream of network packets sent to and from the victim and captured by a "sniffer," which stores all important packets for on-line or future examination. The Network Security Monitor (NSM) was an early signature-based intrusion detection system that found attacks by searching for a particular set of keywords in network traffic captured using a sniffer.

Early versions of the Network Security Monitor (NSM) were the foundation for many government and commercial intrusion detection systems, including NetRanger and NID. This kind of system is popular because one sniffer can monitor traffic to many workstations and because the computation required to reconstruct network sessions and search for keywords is not excessive. In practice, these systems can have high false-alarm rates (for example, hundreds of false alarms per day) because it is often difficult to select by hand a set of keywords that successfully detects actual attacks while avoiding false alarms on normal traffic. Moreover, these signature-based systems must be updated often to detect new attacks as they are encountered.

Two research systems that took part in the 1998 DARPA off-line intrusion detection evaluation provided good performance for user-to-root attacks, in which local users on a UNIX host illegally obtained root-level privileges. Their performance was much better than that of an un-tuned baseline keyword reference system, which had a detection rate of only 20% at a false-alarm rate above 100 false alarms per day. The intention of the research described in this paper was to determine whether the simple baseline keyword system could be modified to obtain a similar performance improvement and to analyze the factors that contribute to enhanced performance.

Practical methods that succeed in enhancing the baseline system could also be used to improve current commercial and government keyword-based systems. The two most important factors explored in this research were adding new generic keywords to the existing system to identify actions associated with attack components, and using preferential neural network training. This work focused on analysis of network traffic obtained using a sniffer, on attacks aimed at UNIX hosts, and on attacks in which a local user illegally obtained root privileges on a victim machine.

Table: Attack types in test data

2. Background

This work was made possible by the availability of the large-scale, realistic intrusion detection database created under DARPA funding in 1998. More than two months of network traffic was generated that resembled the traffic observed flowing between U.S. Government sites and the Internet. Traffic and attacks were generated on a network that simulated thousands of UNIX hosts and hundreds of users. Many attacks were launched from outside the simulated site against SunOS, Linux, and Solaris UNIX victim hosts on the inside. Sniffing data containing all bytes transmitted to and from the simulated site was used for intrusion detection system development and testing, along with attack labels and timing information. The seven weeks of training data in this database were used for training and the two weeks of test data were used for testing.

The seven kinds of user-to-root attacks shown in the table above were included in this database. Five of these attacks were considered "old" because they were present in the training data, and two were "new" attacks that appeared only in the test data. These attacks employ several mechanisms to illegally obtain root-level privileges, including buffer overflows, system misconfigurations, and race conditions. Some of the attacks were made stealthy both by hiding the text of attack Perl, shell, and C source code exploit scripts and by spreading the attack across several telnet sessions. All attacks included telnet sessions to victim machines. The seven weeks of training data contained approximately 1,200 telnet sessions and 34 attack instances, while the two weeks of test data contained approximately 12,900 telnet sessions and 35 attack instances (most of the 12,900 test sessions are part of denial-of-service and password-guessing attacks). Only training data was used for system development.

3. Improving the existing Intrusion Detection System

The data acquired by sniffing the network is first processed to produce transcripts containing all the bytes sent to and from victim hosts during telnet sessions. The characters in each telnet session that were returned from the destination to the source are then processed to count the number of times each keyword on a pre-defined list occurs in that session. The total keyword count across all keywords is the first output. This total is used as a reference. It corresponds to the keyword count score produced by the un-tuned keyword baseline reference system that was part of the 1998 DARPA evaluation. It is also similar to the session warning score generated by many keyword-based intrusion detection systems. Ideally, this count would be monotonically related to the probability of an attack in a telnet session. In the enhanced system, keyword counts are further processed by neural networks. One network weights keyword counts to provide an improved estimate of the posterior probability of an attack in each telnet session, and the other network attempts to classify known attacks, thereby providing an attack name.
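The baseline score described above, a total keyword count per session, can be sketched in a few lines of Python. This is only an illustration: the four-entry keyword list, the sample transcript, and the function names are assumptions, not the 58-keyword list or the code used in the evaluation.

```python
from collections import Counter

# Hypothetical keyword list: a few of the keywords quoted in this essay.
KEYWORDS = ["passwd", "shadow", "permission denied", "uudecode"]


def keyword_counts(transcript, keywords=KEYWORDS):
    """Count non-overlapping occurrences of each keyword in one session."""
    text = transcript.lower()
    return Counter({kw: text.count(kw.lower()) for kw in keywords})


def baseline_score(transcript, keywords=KEYWORDS):
    """Baseline warning score: the total count across all keywords."""
    return sum(keyword_counts(transcript, keywords).values())


# Made-up transcript of the text returned to the user in one telnet session.
session = (
    "cat /etc/passwd\n"
    "Permission denied\n"
    "cat /etc/shadow\n"
    "Permission denied\n"
)
counts = keyword_counts(session)
score = baseline_score(session)
```

A session is flagged when its score exceeds a chosen threshold. Because every keyword contributes equally, a keyword that is common in normal sessions raises the score as much as one that appears only in attacks, which is the weakness the weighted network is meant to address.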

All pattern classification experiments were performed using the LNKnet pattern classification software. Early experiments were performed with 58 keywords selected to be representative of the keywords used in keyword-based intrusion detection systems at the time of the 1998 DARPA evaluation. These include keywords that detect suspicious actions (for example "passwd", "shadow", "permission denied", "+ +") and keywords that detect well-known attacks (such as "from: |", "login: guest"). This keyword list was then expanded based on a hand examination of attacks in the training data. The examination showed that most attacks included downloading attack code, attack preparation, the actual break-in, in which a new root shell was often created, and actions performed after root-level privilege was obtained. Keyword selection first involved looking for words or word strings that occur primarily during attacks. Keywords were added to detect downloading of attack code (such as "uudecode", ">ftp get"), setup actions (for example "chmod", "gcc"), a new root shell (for instance "root:", "# "), a debug statement that some buffer overflow code prints when a root shell is successfully created ("Jumping to address"), the clear-text names of attack exploit source code (for example "fdformat", "ffbconfig"), and some of the actions performed after the break-in (such as "linsniff", ".rhost"). In addition, strings were added that identify the operating system of the victim to help classify attacks (for example "SunOS UNIX", "Red Hat Linux"). In total, 31 new keywords were added to the 58 old keywords. An effort was made to select generic keywords that would generalize across attacks and that would be difficult to hide. For example, the responses from system programs were used as keywords instead of the input strings typed by users. Moreover, the strings that programs such as uuencode use to delimit text transfers were used as keywords instead of the commands that set up these transfers. Regular expression string matching was used to anchor keywords to the beginning of lines, to allow an arbitrary number of spaces between multiple words, and to allow arbitrary digits in some strings.
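The three matching rules mentioned above (anchoring to the start of a line, flexible spacing between words, and arbitrary digits) can be illustrated with Python's `re` module. This is a sketch under stated assumptions: the `<N>` placeholder for "any run of digits", the function names, and the sample transcript are all hypothetical and do not come from the original system.

```python
import re

DIGITS = "<N>"  # hypothetical placeholder meaning "any run of digits"


def keyword_pattern(keyword, anchored=False):
    """Compile one keyword into a regular expression."""
    words = []
    for word in keyword.split():
        # Escape literal text; each placeholder becomes one or more digits.
        literal_parts = [re.escape(part) for part in word.split(DIGITS)]
        words.append(r"\d+".join(literal_parts))
    # Any number of spaces or tabs may separate the words of a keyword.
    body = r"[ \t]+".join(words)
    # With re.MULTILINE, "^" matches at the start of every line.
    prefix = "^" if anchored else ""
    return re.compile(prefix + body, re.MULTILINE)


def count_matches(keyword, transcript, anchored=False):
    """Number of places in the transcript where the keyword matches."""
    return len(keyword_pattern(keyword, anchored).findall(transcript))


# Made-up transcript for illustration only.
transcript = (
    "$ ./exploit\n"
    "Jumping  to   address 0xefffe9b8\n"
    "# id\n"
    "uid=0(root) gid=0(root)\n"
    "$ echo Jumping to address\n"
)
anchored_hits = count_matches("Jumping to address", transcript, anchored=True)
all_hits = count_matches("Jumping to address", transcript)
uid_hits = count_matches("uid=<N>(root)", transcript)
```

Anchoring makes the keyword match program output that begins a line while ignoring the same words typed as part of a command, which fits the stated preference for system responses over user input.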

Tenfold cross-validation experiments were carried out using multi-layer perceptron classifiers and the training data to choose both the keywords and the network topology for the detection neural network. Most experiments used stochastic gradient descent learning, a squared-error cost function, twenty epochs of training, and a step size of 0.01. Keywords were chosen using both weight-magnitude pruning for networks with no hidden nodes and forward-search feature selection. Good detection performance and a low false alarm rate were obtained using thirty keywords, a single-layer network with no hidden units, and a sigmoid output unit.
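The detection network described above, with no hidden units, a sigmoid output, a squared-error cost, and stochastic gradient descent, is small enough to sketch in plain Python. The experiments themselves used LNKnet; the code below is not that software. The two-keyword toy data and all names are assumptions made for illustration; only the twenty epochs and the 0.01 step size come from the text.

```python
import math
import random


def sigmoid(z):
    # Clamp the input so math.exp cannot overflow on extreme values.
    z = max(-60.0, min(60.0, z))
    return 1.0 / (1.0 + math.exp(-z))


def predict(weights, bias, counts):
    """Estimated posterior probability of an attack for one session."""
    return sigmoid(bias + sum(w * x for w, x in zip(weights, counts)))


def train(sessions, labels, epochs=20, step=0.01, seed=0):
    """Stochastic gradient descent on the squared error 0.5 * (p - y)**2."""
    rng = random.Random(seed)
    weights = [0.0] * len(sessions[0])
    bias = 0.0
    order = list(range(len(sessions)))
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            p = predict(weights, bias, sessions[i])
            # Derivative of the squared error with respect to the net input.
            grad = (p - labels[i]) * p * (1.0 - p)
            weights = [w - step * grad * x
                       for w, x in zip(weights, sessions[i])]
            bias -= step * grad
    return weights, bias


# Made-up keyword counts per session; columns are ("uudecode", "root").
sessions = [[2, 0], [1, 0], [0, 3], [0, 1], [0, 2]]
labels = [1, 1, 0, 0, 0]  # 1 = attack session, 0 = normal session
weights, bias = train(sessions, labels)
attack_score = predict(weights, bias, [2, 0])
normal_score = predict(weights, bias, [0, 3])
```

In this toy data the keyword seen only in attack sessions ends up with a positive weight and the keyword seen only in normal sessions ends up with a negative one, so a session's score depends on which keywords occur and not just on how many.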

The false alarm rate increased with 35 or more keywords, and the detection rate decreased with 25 or fewer keywords. The ten most important keywords with strongly positive weights, which are associated with attacks, were all new keywords devised to detect transmission of attack scripts to target machines (such as "cat >", "uudecode < "), a new root shell ("uid=0(root)", "bash# "), and post-attack actions or components of attack scripts (for example "linsniff", "ffbconfig"). Some of the keywords originally selected to detect attacks were given strongly negative weights (for instance "#", "root", "/etc/motd"), which indicates that these keywords are more common in normal sessions than in attack sessions. This result shows the importance of preferential training and of testing keywords on normal sessions to determine their ability to detect attacks while generating few false alarms. Choosing keywords solely on the basis of attack sessions may lead to excessive false alarms, since keywords that seem essential to attacks may be commonly used in other, normal contexts.
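One step of weight-magnitude pruning, as used for keyword selection, might look like the sketch below. The weights are invented values chosen only to show the ranking; they are not results from the experiments, and the function name is hypothetical. In practice the network would presumably be retrained after each pruning step, which this sketch omits.

```python
def prune_by_weight_magnitude(weights_by_keyword, keep):
    """Keep the `keep` keywords whose trained weights are largest in magnitude."""
    ranked = sorted(weights_by_keyword.items(),
                    key=lambda item: abs(item[1]),
                    reverse=True)
    return [keyword for keyword, _ in ranked[:keep]]


# Invented weights for illustration only.
example_weights = {
    "uudecode <": 2.1,
    "linsniff": 1.7,
    "root": -1.2,
    "passwd": 0.05,
    "shadow": -0.02,
}
kept = prune_by_weight_magnitude(example_weights, keep=3)
```

Ranking by magnitude keeps strongly negative keywords as well as strongly positive ones, because a keyword that reliably signals a normal session is as useful for suppressing false alarms as an attack keyword is for detection.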

A neural network similar to the one described above, with no hidden units, also provided the best attack classification performance. Classification performance on the training data was not perfect because both stealthy versions of the fdformat attack, which contained no indication of the attack type, were misclassified as eject attacks. Attack and script file names in both of these stealthy attacks had been chosen at random by the attackers, and all attack exploit scripts and attack actions were disguised.

4. Results

Receiver Operating Characteristic (ROC) curves were produced for three systems: a reference system using the total keyword count based on the 58 previously used keywords, a second reference system using the total keyword count based on all 89 previously used and new keywords, and a neural network system using thirty old and new keywords. The first is the baseline reference system, which uses the total count of all previously used keywords. The second is an enhanced baseline system that uses the supplementary generic keywords but simply counts their occurrences. The third also draws on the new keywords but uses preferential neural network training to weight keyword counts.

These curves are generated by varying a detection threshold for each system, which determines the level that a system's output score must exceed for a session to be labeled an attack. The detection rate is the number of attack sessions with scores above the threshold divided by the number of attacks, expressed as a percentage, and the false alarm rate is the number of normal sessions with scores above the threshold divided by the number of days in the training data. The detection threshold is swept over a range that produces detection rates from zero to 100% to generate each ROC.
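The threshold sweep described above can be written directly from its definition. The scores, labels, and two-day span below are made-up values for illustration, and the function names are hypothetical; the sketch only shows how each point on such a curve is computed.

```python
def roc_points(scores, is_attack, num_days):
    """Return (false alarms per day, detection rate in percent) per threshold."""
    num_attacks = sum(is_attack)
    # Sweep from the highest score down; -inf labels every session an attack.
    thresholds = sorted(set(scores), reverse=True) + [float("-inf")]
    points = []
    for threshold in thresholds:
        detected = sum(1 for s, a in zip(scores, is_attack)
                       if a and s > threshold)
        false_alarms = sum(1 for s, a in zip(scores, is_attack)
                           if not a and s > threshold)
        points.append((false_alarms / num_days,
                       100.0 * detected / num_attacks))
    return points


def false_alarms_needed(points, target_detection):
    """Lowest false-alarm rate at which the detection rate reaches the target."""
    return min(fa for fa, det in points if det >= target_detection)


# Made-up session scores: two attack sessions among eight, over two days.
scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.2, 0.1, 0.05]
is_attack = [True, False, True, False, False, False, False, False]
points = roc_points(scores, is_attack, num_days=2)
rate_for_80_percent = false_alarms_needed(points, 80.0)
```

Reading off the false-alarm rate needed to reach a fixed detection rate, as `false_alarms_needed` does, is how operating points such as "one false alarm per day at 80% detection" are compared between systems.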

The baseline keyword counting system with old keywords requires a high false-alarm rate (greater than 50 false alarms per day) to detect more than 80% of the attacks. Adding new keywords lowers the false-alarm rate of the baseline keyword counting system to roughly 10 false alarms per day at an 80% detection rate, and using a neural network to weight the keyword counts of a smaller set of 30 keywords lowers the false-alarm rate to an acceptable and practical level of roughly one false alarm per day. These results demonstrate that a dramatic reduction in false-alarm rates can be produced by a combination of adding new keywords and discriminative training. The availability of training data with ground truth and with both normal and attack sessions was essential to permit both the selection of new keywords and discriminative training. These results also demonstrate that even well-designed keyword-based systems can perform poorly if keywords are outdated and primarily specific to individual attacks.

Attack classification performance during testing was measured for the five attack types that were in the training data. Classification was perfect for attacks that could be identified solely by the identity of the victim operating system (loadmodule and perl) and when attack exploit scripts were clearly visible in telnet transcripts. All stealthy instances of ffbconfig and fdformat attacks were, however, misclassified as eject attacks. The performance of the enhanced keyword system with new keywords and discriminative training was also compared for old attacks that occurred in the training data and for new, previously unseen attacks that occurred only in the test data. A low false-alarm rate of roughly one false alarm per day is sufficient to detect 80% of both types of attacks.

This surprising result demonstrates that the keyword-based intrusion detection system can generalize and detect new attacks even when trained only on old attacks. Generalization occurs because this system is designed to detect attack setup actions, actions taken to hide attack exploit script transmission, and actions taken after a successful attack that are common to many attacks. Although the mechanism used to obtain root-level privilege changed between old and new attacks, these characteristics of the attacks were similar for old and new attacks.

The performance of the improved system with discriminative training and new keywords was compared for attacks made stealthy by distributing them across multiple sessions and for normal attacks completed in a single telnet session. It is more difficult to detect attacks that are distributed across multiple sessions because each individual session contains fewer keywords. A false-alarm rate of roughly one false alarm per day is sufficient to detect 80% of the attacks that occur completely within one session. This rate must increase to roughly 10 per day to detect 80% of the multi-session attacks.

The performance of the enhanced intrusion detection system was also compared for attacks that were made stealthy by disguising the contents of attack exploit shell, Perl, and C source code scripts and for attacks where the exploit scripts were clearly visible in telnet sessions. Even though it is more difficult to identify attacks when attack scripts are disguised, 80% of both types of attacks can be identified at a false alarm rate of approximately one false alarm per day.

The above results again demonstrate the benefit of including keywords that identify attack setup and post-attack actions instead of depending on attack-specific keywords that are visible primarily in attack scripts.

5. Summary and Discussion

Large improvements in performance were obtained for a simple keyword intrusion detection system using a combination of new keywords and discriminative training. New keywords were added to detect actions common to many attacks, and simple neural network discriminative training was used to produce output posterior probabilities that distinguish telnet sessions containing normal actions from those containing attacks. The improved system had a high detection rate of roughly 80% at a low false alarm rate of roughly one false alarm per day. Improved performance required using training data from the DARPA intrusion detection evaluation database for both normal sessions and attack sessions. Attack sessions are necessary to derive new keywords, and both normal and attack sessions are required to support discriminative training.

The enhanced system that uses new keywords and discriminative training could detect old attacks as well as new attacks not included in the training data, stealthy attacks where attack exploit scripts were hidden, and (to a lesser extent) attacks distributed across multiple sessions. Many existing keyword-based intrusion detection systems could be improved by adding similarly selected keywords and by using discriminative training to weight keyword counts. These modifications need not increase computation requirements, because the number of keywords does not necessarily need to grow and the keyword score weighting requires very little computation. Future work is planned to evaluate this approach on actual network traffic, to extend keywords to detect additional pre- and post-break-in actions, and to extend the system to integrate information across multiple telnet sessions and network services.