The activities of businesses, the military and governments rely increasingly on web technologies and applications. The ease of implementation and use of these technologies has made them an essential component of online commercial sites, intranet and extranet applications, and the internet services offered by companies. Today, new applications are almost systematically developed with web technologies. At the same time, hackers have found more subtle ways to attack web applications. According to international statistics, SQL Injection and Cross-Site Scripting are the most common vulnerabilities of web applications. The consequences of these attacks are serious: sensitive information can be stolen, or authentication systems can be bypassed. To mitigate the situation, several techniques have been adopted. In this paper, a security solution based on an Artificial Neural Network is proposed to protect web applications against these attacks.
Keywords: Artificial Neural Networks (ANN), SQL Injection, Cross-Site Scripting, Datasets, Web Application Firewall.
"Google says it suspects Microsoft is doing this by using Internet Explorer 8 and the Bing toolbar, both of which send user data to Microsoft, to watch how people use Google."
"MySQL.com was hacked using blind SQL injection by two hackers that go by "TinKode" and "Ne0h". MySQL's parent company Sun/Oracle has also been attacked by the same hackers. Both tables and emails were dumped from their databases, but no passwords."
"The php.net team announced that the server of the php.net developer wiki has been hacked by unidentified attackers who stole account credentials."
"Dr Mallya's website www.mallyainparliament.com has been hacked and the Pakistani flag has been placed with a dire message from an organization known as the Pakistan Cyber Army."
The examples above show that even the most powerful applications are attacked by hackers.
To provide maximum security for web applications, specific solutions should be implemented. One of these solutions is the Web Application Firewall (WAF). Most WAFs filter incoming user requests against a set of predefined rules and signatures. Pattern matching is mainly achieved using regular expressions, as in ModSecurity, the most famous WAF. However, with the rapid development of web applications, the number of threats and defined attack signatures is increasing dramatically. Accordingly, traditional pattern matching techniques (particularly regular expressions) are no longer effective. There is an urgent need to adopt a new pattern matching technique that meets current security requirements.
This paper goes through the concept of artificial neural networks and how to apply it in the form of a web application firewall. The focus is on proving the concept of utilizing an ANN in a web application security framework. Moreover, the framework has to be able to deal with the dynamic nature of web application attacks and signatures, including complicated patterns such as SQL injection signatures with all possible evasion techniques. More importantly, the time spent filtering incoming requests should not affect the performance of the web application.
The Open Web Application Security Project (OWASP) Top 10 Web Application Security Risks for 2010 are:
A1: Injection
A2: Cross-Site Scripting (XSS)
A3: Broken Authentication and Session Management
A4: Insecure Direct Object References
A5: Cross-Site Request Forgery (CSRF)
A6: Security Misconfiguration
A7: Insecure Cryptographic Storage
A8: Failure to Restrict URL Access
A9: Insufficient Transport Layer Protection
A10: Unvalidated Redirects and Forwards
Many of the above attacks can be avoided by careful configuration of the server and the php.ini file.
Introduction to SQL Injection
In the OWASP Top Ten 2007 web application vulnerabilities, Injection Flaws were ranked the second most prevalent vulnerability. They rose to the number one most critical vulnerability in the OWASP Top Ten 2010 release. This reflects the seriousness of this type of vulnerability. In the Injection Flaw family, SQL Injection is particularly common and can compromise web applications in various ways. Basically, SQL injection attacks occur when web applications directly use user input to build an SQL query to access the backend database without proper validation of the input. To perform SQL injection, hackers can use different techniques. These techniques can be classified into several main categories, as explained below.
In a tautology attack, the attacker injects SQL tokens into the user input so that the selection clause of the SQL query always evaluates to true:
Select * from users where username='admin' or 1=1 --' and password='';
In a union query attack, the attacker injects a UNION query into the SQL query to retrieve additional data:
Select bookTitle, ISBN from books where bookID = 1 UNION Select "hack", balance from accounts where accNo = 3456 --;
In a piggy-backed query attack, the attacker injects additional statements to be executed alongside the original query:
Select * from users where username=''; drop table accounts -- and password=''
This type relies on the error message returned by the web server to reveal more information about the database:
Select * from books where bookID = convert (int,(select top 1 name from sysobjects where xtype ='u'));
This type of attack often relies on differences in the web server's response time to infer other information about the database:
Select * from users where username='hello1'; select if( user() like 'root@%', benchmark(1000000,sha1('test')), 'false' ); --' and password=''
This technique is used to bypass defenses that escape special characters (such as quotes and dashes) or filter certain keywords:
Select * from books where bookID = 1 ; exec(char(0x730065006c00650063007400200040004000760065007200730069006f006e00));
This runs sp_msdropretry [foo drop table logs select * from sysobjects], [bar].
The SQL injection techniques listed above are used by hackers for different purposes: bypassing a login system, modifying a table in a database (using SQL statements such as insert, delete, and update), shutting down SQL Server, obtaining database information from returned error messages or by inference, or executing stored procedures. Moreover, this can lead to further damage. For instance, after obtaining the login credentials of the administrator/root of a website by updating the database or extracting valuable information from an error message, the hacker can log in with administrator privileges and perform sensitive actions. The next section will shed light on advanced techniques used by hackers to bypass traditional security defense systems.
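The tautology attack described above can be reproduced in a few lines. The following is a minimal sketch in Python using the standard sqlite3 module and an in-memory database. The table contents, the function names login_vulnerable and login_parameterized, and the use of SQLite in place of the backend database are all assumptions made for illustration. The sketch contrasts a query built by string concatenation with a parameterized query, which is a standard defense and not part of the approach proposed in this paper.

```python
import sqlite3

def make_db():
    # Hypothetical in-memory users table, for illustration only.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (username TEXT, password TEXT)")
    conn.execute("INSERT INTO users VALUES ('admin', 's3cret')")
    return conn

def login_vulnerable(conn, username, password):
    # User input is pasted straight into the SQL text, so it can change the query.
    query = ("SELECT * FROM users WHERE username='" + username +
             "' AND password='" + password + "'")
    return conn.execute(query).fetchall()

def login_parameterized(conn, username, password):
    # Placeholders keep user input as data; it is never parsed as SQL.
    query = "SELECT * FROM users WHERE username=? AND password=?"
    return conn.execute(query, (username, password)).fetchall()

conn = make_db()
payload = "admin' or 1=1 --"  # the tautology input from the example above
rows_vulnerable = login_vulnerable(conn, payload, "")
rows_safe = login_parameterized(conn, payload, "")
print(len(rows_vulnerable), len(rows_safe))
```

With the concatenated query the payload comments out the password check and the admin row is returned without a password; with placeholders the same input matches no row.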
Cross-site scripting (XSS)
Cross-site scripting (XSS) is a type of computer security vulnerability typically found in web applications that enables malicious attackers to inject client-side script into web pages viewed by other users. An exploited cross-site scripting vulnerability can be used by attackers to bypass access controls such as the same origin policy. Cross-site scripting carried out on websites accounted for roughly 80% of all security vulnerabilities documented by Symantec as of 2007. The impact may range from a petty nuisance to a significant security risk, depending on the sensitivity of the data handled by the vulnerable site and the nature of any security mitigations implemented by the site's owner.
Cross-Site Scripting (XSS) attacks occur when:
Data enters a web application through an untrusted source, most frequently a web request.
The data is included in dynamic content that is sent to a web user without being validated for malicious code.
XSS attacks can generally be categorized into two categories: stored and reflected. There is a third, much less well-known type of XSS attack called DOM-based XSS, which is discussed separately.
Stored XSS Attacks
Stored attacks are those where the injected code is permanently stored on the target servers, such as in a database, in a message forum, visitor log, comment field, etc. The victim then retrieves the malicious script from the server when it requests the stored information.
Reflected XSS Attacks
Reflected attacks are those where the injected code is reflected off the web server, such as in an error message, search result, or any other response that includes some or all of the input sent to the server as part of the request. Reflected attacks are delivered to victims via another route, such as in an e-mail message, or on some other web server. When a user is tricked into clicking on a malicious link or submitting a specially crafted form, the injected code travels to the vulnerable web server, which reflects the attack back to the user's browser. The browser then executes the code because it came from a "trusted" server.
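The reflection step described above can be illustrated with a small sketch. The Python code below uses only the standard html module. The page template and the function names render_search_page and render_search_page_escaped are hypothetical, and output escaping is shown as one common mitigation rather than as the method proposed in this paper.

```python
import html

def render_search_page(query):
    # Vulnerable: the request parameter is echoed into the page unchanged.
    return "<p>Results for: " + query + "</p>"

def render_search_page_escaped(query):
    # html.escape turns <, >, & and quotes into entities, so the browser
    # displays the input as text instead of running it as script.
    return "<p>Results for: " + html.escape(query) + "</p>"

payload = "<script>alert(document.cookie)</script>"
unsafe_page = render_search_page(payload)
safe_page = render_search_page_escaped(payload)
print(unsafe_page)
print(safe_page)
```

In the first page the script tag reaches the browser intact, which is what makes the reflected attack possible; in the second it arrives as inert text.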
XSS Attack Consequences
The consequence of an XSS attack is the same regardless of whether it is stored or reflected (or DOM-based). The difference is in how the payload arrives at the server. Do not be fooled into thinking that a "read only" or "brochureware" site is not vulnerable to serious reflected XSS attacks. XSS can cause a variety of problems for the end user that range in severity from an annoyance to complete account compromise. The most severe XSS attacks involve disclosure of the user's session cookie, allowing an attacker to hijack the user's session and take over the account. Other damaging attacks include disclosing end user files, installing Trojan horse programs, redirecting the user to some other page or site, or modifying the presentation of content. An XSS vulnerability allowing an attacker to modify a press release or news item could affect a company's stock price or lessen consumer confidence. An XSS vulnerability on a pharmaceutical site could allow an attacker to modify dosage information, resulting in an overdose.
As the number of web application security vulnerabilities and incidents increases day by day, some solutions have emerged to mitigate the situation. These solutions often take the form of a Web Application Scanner (WAS) or a Web Application Firewall (WAF). A WAS is computer software that searches for vulnerabilities in web applications before these web applications are published online. Since a WAS is not meant to work as a real-time filtering mechanism for incoming traffic, it does not affect the performance of the web applications. However, a WAS cannot protect the web applications on the fly, and fixing what it finds requires modifying the code of the web applications, which is often laborious and tedious. Besides, if the source of the web applications is not accessible when the test is run after publishing the web application, then the detected vulnerabilities might not be mitigated. To protect web applications on the fly, web application firewalls are used. A web application firewall (WAF) is an appliance, server plug-in, or filter that applies a set of rules to an HTTP conversation. Generally, these rules cover common attacks such as Cross-site Scripting (XSS) and SQL Injection. By customizing the rules to the application, many attacks can be identified and blocked. The effort to perform this customization can be significant, and the rules need to be maintained as the application is modified.
Positive vs. Negative Security models of WAF
The two approaches to security most often mentioned in the context of application security, positive and negative (fig. 1), are diametrically opposed in all of their characteristic behaviors, but they are structured very similarly. Both positive and negative security approaches operate according to an established set of rules. Access Control Lists (ACLs) and signatures are two implementation examples of positive and negative security rules, respectively. Positive security moves away from the "blocked" end of the spectrum, following an "allow only what I know" methodology. Every rule added to a positive security model increases what is classified as known behavior, and thus allowed, and decreases what is blocked, or what is unknown. Therefore, a positive security model with nothing defined should block everything and relax (i.e., allow broader access) as the acceptable content contexts are defined. At the opposite end of the spectrum, negative security moves towards "block what I know is bad," meaning it denies access based on what has previously been identified as content to be blocked, running opposite to the known/allowed positive model. Every rule added to the negative security policy increases the blocking behavior, thereby decreasing what is both unknown and allowed as the policy is tightened. Therefore, a negative security policy with nothing defined would grant access to everything, and be tightened as exploits are discovered. Although negative security does retain some aspect of known data, negative security knowledge comes from a list of very specific repositories of matching patterns. As data is passed through a negative security policy, it is evaluated against individual known "bad" patterns. If a known pattern is matched, the data is rejected; if the data flowing through the policy is unidentifiable, it is allowed to pass.
Negative security policies do not take into account how the application works; they only notice what accesses the application and whether that access violates any negative security patterns.
Fig. 1 Models of WAF: web application security under the positive model (what is allowed) and the negative model (what is denied)
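The difference between the two models can be made concrete with a short sketch. The Python code below is illustrative only: the two deny signatures, the allow rule (a book ID of one to six digits), and the function names are hypothetical and are not taken from any real WAF rule set.

```python
import re

# Negative model: block what is known to be bad (hypothetical signatures).
DENY_SIGNATURES = [
    re.compile(r"\bunion\b.*\bselect\b", re.IGNORECASE),
    re.compile(r"<script\b", re.IGNORECASE),
]

# Positive model: allow only what is known to be good (hypothetical rule:
# a book ID made of 1 to 6 digits).
ALLOW_RULE = re.compile(r"[0-9]{1,6}")

def negative_model_allows(value):
    # Anything that matches no known-bad pattern passes.
    return not any(sig.search(value) for sig in DENY_SIGNATURES)

def positive_model_allows(value):
    # Anything that does not match the known-good rule is blocked.
    return ALLOW_RULE.fullmatch(value) is not None

samples = ["42", "1 UNION SELECT balance FROM accounts", "1; drop table logs"]
for s in samples:
    print(s, negative_model_allows(s), positive_model_allows(s))
```

The third sample shows the trade-off discussed above: it matches no known signature, so the negative model lets it through, while the positive model blocks it because it is not a plain numeric ID.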
Artificial Neural Networks
An Artificial Neural Network (ANN) is a massively parallel distributed processor that consists of a set of interconnected neurons. Like a human brain, an ANN has the ability to learn through a training process to obtain knowledge and make that knowledge available for later use.
The basic component of an ANN is the neuron. Each neuron has three important components (fig. 2): a set of synaptic connections (which are represented by a set of synaptic weights and bias); a propagation function (Σ) which is a linear combination of the input elements modified by the set of synaptic weights and bias; and an activation function (φ) which takes the output of the propagation function as its input and generates the output of the neuron. It is the set of synaptic weights and bias that stores the knowledge acquired during the learning phase.
Fig. 2 Model of A Neuron 
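The neuron model above can be written out directly. The following Python sketch is a minimal illustration: the sigmoid is assumed as the activation function (the text does not name one), and the input, weight, and bias values are hypothetical numbers chosen only to exercise the functions.

```python
import math

def propagate(inputs, weights, bias):
    # Propagation function: weighted sum of the inputs plus the bias.
    return sum(x * w for x, w in zip(inputs, weights)) + bias

def sigmoid(v):
    # One common choice of activation function (an assumption here).
    return 1.0 / (1.0 + math.exp(-v))

def neuron_output(inputs, weights, bias):
    # The activation function is applied to the propagation result.
    return sigmoid(propagate(inputs, weights, bias))

# Hypothetical values, not taken from the trained network in this paper.
inputs = [1.0, 0.5]
weights = [0.4, -0.8]
bias = 0.0
print(propagate(inputs, weights, bias), neuron_output(inputs, weights, bias))
```

Training changes only the weights and the bias, which is why they are said to store the knowledge of the network.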
The way neurons are connected to one another defines the architecture of an ANN. In this research, Multilayer Feedforward Networks (MLN) were used. The architecture of an MLN is shown in fig. 3.
With their learning ability, ANNs can be trained to perform different engineering tasks, including pattern recognition, pattern association, function approximation, control, filtering, and beamforming. Among these learning tasks, pattern recognition is the one of interest in this research. Pattern recognition is a process in which a pattern or input is assigned to one of a set of predefined categories or classes. Several algorithms can be used to train an ANN for a pattern recognition task, such as Back Propagation, Radial-basis Function, and Support Vector Learning. Among them, Back Propagation is the algorithm specifically devised to train a multilayer perceptron. The algorithm has been implemented in Matlab, which is a popular tool for training ANNs. MLNs trained with Back Propagation have been used in different fields such as Intrusion Detection Systems and Image Processing. The classification accuracies in all these applications are higher than 90%; in particular, the application of an MLN in an intrusion detection approach has an accuracy of 99.25%.
Fig.3 A Multilayer Feedforward Network (MLN) 
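The forward pass through a multilayer feedforward network can be sketched as repeated application of the neuron computation, layer by layer. The Python code below is only an illustration: the 2-3-1 layout, the weights, and the biases are hypothetical and untrained, the sigmoid activation is an assumption, and Back Propagation training is not shown.

```python
import math

def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))

def layer_forward(inputs, weights, biases):
    # One row of weights and one bias per neuron in the layer.
    return [sigmoid(sum(x * w for x, w in zip(inputs, row)) + b)
            for row, b in zip(weights, biases)]

def mln_forward(inputs, layers):
    # Feed the output of each layer into the next one.
    activations = inputs
    for weights, biases in layers:
        activations = layer_forward(activations, weights, biases)
    return activations

# Hypothetical network: 2 inputs, 3 hidden neurons, 1 output neuron.
layers = [
    ([[0.5, -0.2], [0.1, 0.3], [-0.4, 0.8]], [0.0, 0.1, -0.1]),
    ([[0.7, -0.5, 0.2]], [0.05]),
]
output = mln_forward([1.0, 0.0], layers)
print(output)
```

Training with Back Propagation would adjust the numbers in layers so that the output moves toward the desired class for each training pattern.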
The success of ANNs in intrusion detection systems has motivated this paper to investigate a new solution for the challenging limitations of Web Application Firewalls (WAF). More importantly, ANNs offer a path to a scalable solution for WAFs based on some of their intrinsic features: the ability to learn and store empirical knowledge; nonlinearity; the ability to generalize solutions; the ability to adapt when the context changes; computational performance; and a massively parallel structure.
In the proposed system, an Artificial Neural Network is designed with the MatLab tool (fig. 4). The Artificial Neural Network configuration is given in table 1.
Adaption Learning Function
Number Of Layers
Number of Input Neurons
Total Number of Neurons
Table 1 Settings of Artificial Neural Network in MatLab
Fig.4 Snapshot of Designed Artificial Neural Network in MatLab
The Artificial Neural Network is trained with a number of attack keywords for web applications. The trained ANN then acts as a web application firewall that can filter attack keywords from the data entered by users in web forms. Fig. 5 below shows the basic model of our proposed system.
Fig. 5 Model of Proposed System (components include a securely configured server, which protects against many simple attacks, and the Artificial Neural Network component)
To implement this system, we use the XAMPP package on the Windows 7 operating system. XAMPP contains the following open source tools as a bundle (table 2).
Dynamic Programming Language
Table 2 Main tools installed in Server System
Server Side Settings to Avoid some simple attacks
The server is configured so that it avoids many simple attacks; many of these measures are PHP configuration settings. The settings are shown below.
register_globals set to off
safe_mode set to off
display_errors set to off
Disable these functions: system(), exec(), passthru(), shell_exec(), proc_open(), popen().
Revoke privileges such as DELETE and DROP from the MySQL user.
open_basedir set for /tmp and /htdocs or /www
expose_php set to off
allow_url_fopen set to off
allow_url_include set to off
magic_quotes_gpc set to on
Convert user-entered form data into a format that can be used as input to the Artificial Neural Network. The basic data preprocessing functions are:
Remove C-like comments
Remove the commented strings from the user-entered data.
UNI/*anything */ON/* anything */ SE/* anything */LE/* anything*/CT
UNION SELECT
Remove string concatenation
Remove the concatenation symbols from the user-entered data.
EXEC ("IN" + "SERT" + " IN" + "TO…"…)
EXEC ("INSERT INTO ..")
Divide the string into words and convert those words into decimal format, because decimal format is more suitable as Artificial Neural Network input.
Ex: "SELECT" is converted to "83 69 76 69 67 84"
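The three preprocessing steps above can be sketched as follows. This is a minimal Python illustration and not the implementation used in the experiments: the function names are hypothetical, the regular expressions are simple assumptions that handle only the patterns shown in the examples, and each character is converted with its decimal character code.

```python
import re

def remove_c_comments(text):
    # Drop every /* ... */ span, including ones that split a keyword.
    return re.sub(r"/\*.*?\*/", "", text, flags=re.DOTALL)

def remove_concatenation(text):
    # Join adjacent string literals: "IN" + "SERT" becomes "INSERT".
    return re.sub(r'"\s*\+\s*"', "", text)

def word_to_decimal(word):
    # Decimal character codes separated by spaces.
    return " ".join(str(ord(ch)) for ch in word)

def preprocess(text):
    # Apply both clean-up steps, split into words, convert each word.
    cleaned = remove_concatenation(remove_c_comments(text))
    return [word_to_decimal(w) for w in cleaned.split()]

print(remove_c_comments("UNI/*anything */ON/* anything */ SE/* anything */LE/* anything*/CT"))
print(remove_concatenation('EXEC ("IN" + "SERT" + " IN" + "TO ..")'))
print(word_to_decimal("SELECT"))
```

The comment removal has to run before the text is split into words, because the comments are placed precisely so that a keyword such as UNION is never seen whole.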
Artificial Neural Network Component
The Artificial Neural Network is trained so that it can filter attack keywords from user-entered data. It is trained with a dataset containing different keywords; table 4 below shows some sample keywords from the data set.
Other Restricted words
Table 4 Sample training data set
If the Artificial Neural Network finds attack keywords, then instead of rejecting the data, the decision making component replaces them with alternative words that are safe to store in the database. The data is modified as shown in table 5.
Table 5 sample mapping table of restricted words and Alternative Words
Fig. 6 below shows the basic flow of data.
User enters data as:
"SELECT name FROM STUDENT";
Due to magic_quotes_gpc set to on, the data is transformed into
\"SELECT name FROM STUDENT\";
\" to 92 34
SELECT to 83 69 76 69 67 84
Artificial Neural Network Component
For Input: 83 69 76 69 67 84
Output: 0
Convert Decimal to String
83 69 76 69 67 84 to SELECT
Using the mapping table, replace the restricted word with the alternative word
SELECT to CHOOSE
Execute the command with the user-entered data, for example storing it in the database.
Fig. 6 Basic flow of data
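The flow in fig. 6 can be sketched end to end as follows. This Python code is an illustration under stated assumptions, not the system built in this paper: the trained ANN is replaced by a plain lookup (the hypothetical function is_restricted), the mapping table contains only the SELECT to CHOOSE pair given in the text, and the escaping added by magic_quotes_gpc is left out for brevity.

```python
# Mapping of restricted words to alternative words. Only SELECT -> CHOOSE
# comes from the example in the text; a real table would hold more entries.
ALTERNATIVES = {"SELECT": "CHOOSE"}

def to_decimal(word):
    # "SELECT" -> "83 69 76 69 67 84"
    return " ".join(str(ord(ch)) for ch in word)

def to_string(decimal):
    # "83 69 76 69 67 84" -> "SELECT"
    return "".join(chr(int(code)) for code in decimal.split())

def is_restricted(decimal):
    # Stand-in for the trained ANN: a plain lookup, NOT a neural network.
    return to_string(decimal).upper() in ALTERNATIVES

def filter_input(text):
    # Replace each restricted word instead of rejecting the whole input.
    words_out = []
    for word in text.split():
        decimal = to_decimal(word)
        if is_restricted(decimal):
            word = ALTERNATIVES[to_string(decimal).upper()]
        words_out.append(word)
    return " ".join(words_out)

print(filter_input("SELECT name FROM STUDENT"))
```

Replacing the word rather than rejecting the request keeps the rest of the user's text intact while making the stored value harmless as SQL.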
In this approach, the experiments give promising results: some configurations reach 100% accuracy on both the training data and the validation data.
The Artificial Neural Network can distinguish between normal and malicious content based on the training data. This approach gives promising results in both accuracy and processing time. The Artificial Neural Network has the ability to detect new bad patterns, even those it was not trained on. The key point is that the ANN can be re-trained over time to incorporate more "knowledge".
The solution also has some limitations. The quality of a trained ANN often depends on its architecture and the way the ANN is trained. More importantly, the quality of the trained ANN also depends on the quality of the training data used and the features that are extracted from the data.
The solution can be extended to include user input from any part of an HTTP request (not just the request line and the request body), such as headers, to gain more control over session handling. Also, protocols other than HTTP can be considered to generalize the solution, such as accepting input from Web Services protocols like SOAP, and a technology can be developed to filter images uploaded by users.