Semantics based web application security

Research and Development Theme:

The number of cybercrime threats has grown tremendously due to the significant

advancements in the World Wide Web (WWW) and its applications. Information security vendor

Symantec reported that during 2008, they observed more than 18 million drive-by download

attacks and more than 23 million misleading application attacks [42]. These crimes are

emerging as a major threat to e-business, e-health and other WWW applications on the

Internet. Recent surveys show that about 80% of Web-based attacks are being deployed at

the application layer of the OSI model and more than 90% of Web applications are vulnerable to

these attacks. Enormous effort has been made to mitigate these attacks through various

security mechanisms in the form of scanners, intrusion detection systems, encryption

devices, and firewalls. However, these measures are unable to mitigate many of the threats

which specifically aim to compromise Web application security. In particular, existing

solutions cannot capture the context of online users' requests (queries) in relation to the Web

applications and underlying protocol. Our research findings show that capturing the context is

an essential requirement to design and implement effective defense mechanisms against

Web application attacks. We propose a semantics-based Web Application Security system

architecture. The proposed system is a novel approach based on a paradigm shift from

existing network security techniques. We use ontologies to specify the context of attacks

through semantics, and define rules to detect the attacks effectively. By using the ontological

representation, we can significantly improve the detection of most important Web application

attacks such as XSS (cross-site scripting), SQL Injection, and Directory Traversals. The

proposed system architecture can be refined and expanded with nominal effort to cater for

more attacks. The ontologies for our proposed system will be developed in Web Ontology

Language (OWL). The JENA API will be used to analyze users' requests to detect

complex as well as zero-day Web application attacks based upon known attack methods.

The initial attack detection capability of the system will be evaluated by generating attacks using

the Paros tool. Final evaluation of the system will be carried out in cooperation with industry

partners in real scenarios on Web servers, and comparison will be made to the existing

network security and web security analysis tools like ModSecurity, SecureSphere, and

Application for ICT-Related Development and Research Grant Page 4

Snort [43, 45, 47]. The implementation of our work will be released as an open-source

product, with the ability to detect and analyze only the most common web application attacks,

including XSS, SQL Injection, and Directory Traversal. The system will be designed to

handle other web application attacks too, but these extensions may be reserved for future or

commercial development of our work. It is important to mention that we have already

designed and implemented an ontology-based prototype as a proof of concept for the

detection of some basic Web attacks.

Project Status:

(Please mark)

[x] New    [ ] Modification to previous Project

[ ] Extension of existing project

Project Duration:

Expected Starting Date: 01/09/2009

Planned Duration in months: 24 months

Executive Summary:

The aim of this project is to provide an effective and open source intrusion detection system

for Web-based attacks. The exponential increase in cyber crimes with the expansion of Web

applications has become the most important security concern for e-business, e-health and

other Web applications on the Internet. A recent survey shows that about 80% of Web-based

attacks are being deployed at the application layer of the OSI model and more than 90% of

Web applications are vulnerable to these attacks. Various security mechanisms in the form

of intrusion detection systems, encryption devices, and firewalls have been deployed but

tend to be less effective against the Web-based threats, due to their extremely flexible

nature. In order to mitigate application-level attacks, the system needs to grasp the context of

the information content (e.g., a web page or script) and be able to filter that content on the basis

of its consequences for the target applications. This proposal introduces new concepts and

an architecture to use semantics for detecting and preventing attacks at the application layer

(specifically, attacks through HTTP). The proposed system will be capable of performing

intrusion detection through the ontological representation of attacks, application protocols

such as HTTP and associated data; furthermore, it allows automatic generation of attack

rules. By building the attack model using ontologies, the system will significantly improve

attack detection capability and should be able to detect Web attacks which appear to be

generalized forms of existing attack techniques (i.e., zero day attacks based on existing

methods). We have already developed a prototype ontology model of application layer

attacks for the HTTP protocol. The proof-of-concept prototype uses Description Logic based

Web Ontology Language (OWL) for knowledge representation and it is implemented on top

of the JENA framework. The prototype system is deployed and evaluated as a surrogate

proxy in front of the Web server to detect and protect Web applications from application layer

attacks like Cross Site Scripting (XSS) and SQL injection. System evaluation shows

significantly improved detection capability compared with some existing techniques

and solutions, and significant search space reduction; it also helps in

eliminating many problems associated with existing techniques. We are confident that through

this research project we will provide a significantly improved ontology-based intrusion detection

system that works at the application layer.

Scope, Introduction and Background of the Project

A. Scope of the Project:

Web Application Security is a sub-domain of information security, which deals with securing

web resources in terms of confidentiality, integrity and availability of web-based information.

Information security is divided into a number of domains, such as network security,

application security, database security, and operating system security. Application security

can be further sub-divided into the domains of peer-to-peer security (security of

information/content that is shared with or accessible to everyone else in the peer-to-peer

environment, and vice versa), e-mail security (protection of electronic mail) and web

application security. Web application security represents one of the more exposed and

challenging tasks in the present scenario of information security.

Figure 1: Domain of this project

Web application security is the domain of this project; it deals with attacks mounted

on Web resources by hackers.

According to the Web Application Security Consortium, "Web application security covers the

technology layers starting with the web server and follows through to the software created to

run online banks, e-commerce, auctions, webmail, etc. As a general rule, if the application

communicates over http, it is under the scope of web application security" [34].

Initially, the proposed IDS will focus on Web applications that use only the HTTP

protocol. However, the research outcomes will be independent of Web application and protocol

and can therefore later be applied to secure web applications in heterogeneous environments.

B. Introduction:

Web application security has become increasingly important in the last decade due to

the massive increase in the development, deployment and use of web application technologies

(such as e-business, e-sciences and e-health). A security assessment by the Application

Defense Center, which included more than 250 Web applications from e-commerce, online

banking, enterprise collaboration, and supply chain management sites, concluded that at

least 92% of Web applications are vulnerable to some form of attack [1]. Another survey

found that about 75% of all attacks against Web servers target Web-based applications [2].

SQL injection and cross-site scripting are two of the most

common security vulnerabilities that plague web applications today [3]. On April 24,

2008 hundreds of thousands of Web servers were hacked, including several at the United

Nations and in the UK government, through exploitation of vulnerabilities in Microsoft IIS [4].

According to the National Vulnerability Database (NVD), a repository for documented

network and software security threats, there are over 18,500 vulnerabilities in web-based

applications, including 2,147 cross-site scripting (XSS), 2,757 buffer overflow and 1,600

SQL injection vulnerabilities [5].

Unvalidated input is the major source of attacks at the web application level. According

to the Open Web Application Security Project (OWASP), four of the top ten vulnerabilities for

Web applications are input validation problems [6]. The vulnerability caused by unchecked

input can allow a hacker to "inject" code that bypasses or modifies the originally intended

functionality of the program in order to gain information, escalate privileges, or obtain unauthorized access

to a system.

For example, in an XSS attack, it is possible for a hacker to obtain sensitive information (via

cookies) belonging to a user that is accessing a trusted website. Another web-based attack

exploits poorly implemented server-side web applications, and can enable a hacker to get

superuser (root) privileges through accessing executables on the website host such as the

shell, chmod, ps or other UNIX commands.

In an SQL Injection attack, a database query is manipulated through user (client) input. For

example, the following statement containing an SQL query may be used by a web

application:

aa = "SELECT * FROM users WHERE username = '" + UserName + "' AND password = '" + PassWord + "'"

The UserName and PassWord fields are substituted according to user input strings

provided on a form on a legitimate web page. However, suppose that the user entered the

following strings on the website form:

Username: admin

Password: anything' OR 'x' = 'x

The SQL query is expanded to:

SELECT * FROM users WHERE username = 'admin' AND password = 'anything' OR 'x' = 'x'

Since the expression 'x' = 'x' is always true, this query will return all the users in the USERS

table. Similarly, a hacker can use DELETE, DROP or UNION commands to manipulate and

modify web application server databases through malicious input.
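
The login query above can be reproduced in a few lines. The following is a minimal sketch, assuming an in-memory SQLite database with a hypothetical two-row users table; the function names and sample credentials are illustrative only and are not part of the proposed system. It contrasts string concatenation with a parameterized query, which is one common way to keep user input out of the SQL text.

```python
import sqlite3

# Hypothetical in-memory database standing in for the application's user store.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (username TEXT, password TEXT)")
conn.executemany(
    "INSERT INTO users VALUES (?, ?)",
    [("admin", "s3cret"), ("alice", "wonderland")],
)

def login_vulnerable(username, password):
    # Unsafe: user input is concatenated directly into the SQL text.
    query = (
        "SELECT * FROM users WHERE username = '" + username
        + "' AND password = '" + password + "'"
    )
    return conn.execute(query).fetchall()

def login_parameterized(username, password):
    # Safer: values are passed separately and never parsed as SQL.
    query = "SELECT * FROM users WHERE username = ? AND password = ?"
    return conn.execute(query, (username, password)).fetchall()

injected = "anything' OR 'x' = 'x"
vulnerable_rows = login_vulnerable("admin", injected)
safe_rows = login_parameterized("admin", injected)

# The concatenated query returns every row; the parameterized one returns none.
print(len(vulnerable_rows), len(safe_rows))
```

Because AND binds more tightly than OR in SQL, the injected condition makes the whole WHERE clause true for every row, which is why the first call returns the full table.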

Traditional security solutions such as web scanners provide the first line of defense against web

attacks and detect the "well-known" security flaws whose signatures are present. Current

network security scanners lack semantic knowledge about web applications and are thus

unable to make intelligent decisions about data leakage or business logic flaws [7], failing to

detect many critical vulnerabilities [8]. Signature-based solutions usually maintain a "white

list" and a "black list" containing signatures of benign inputs and signatures of malicious

attack vectors respectively. These lists need regular signature updates, and are unable to

detect variations of known attack techniques (hence no zero-day attacks can be

detected). Consequently, such solutions may generate many false positives and false

negatives [9].
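
The weakness of signature matching can be illustrated with a small sketch. The black list and payloads below are hypothetical and far simpler than the rule sets of real products; the point is only that a literal signature catches the exact string it was written for and misses a trivial variant or an encoded form of the same attack.

```python
from urllib.parse import unquote

# Hypothetical black list of lowercase attack signatures.
BLACKLIST = ["<script>", "' or '1'='1"]

def signature_match(payload):
    """Return True if any black-listed signature occurs in the payload."""
    lowered = payload.lower()
    return any(signature in lowered for signature in BLACKLIST)

known_attack = "' OR '1'='1"
variant_attack = "' OR 'x' = 'x"                      # same technique, new literal
encoded_attack = "%3Cscript%3Ealert(1)%3C/script%3E"  # URL-encoded XSS payload

for payload in (known_attack, variant_attack, encoded_attack):
    print(repr(payload), signature_match(payload))

# Decoding before matching catches the encoded form,
# but the variant of the SQL injection still slips through.
print(signature_match(unquote(encoded_attack)))
```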

Furthermore, most network security solutions analyze network traffic packets individually,

and do not continually monitor each and every network flow for possible security violations.

Due to this lack of contextual information, these network solutions are ineffective in

mitigating application level attacks, whereas a semantic system can intelligently understand

the application's context, the actual data and contextual nature of attacks. A more effective

system should validate the input syntactically and semantically: syntax-based validation

enforces size or content restrictions, while semantics-based validation may focus on

specific data types, specific formats, and understanding potentially dangerous and malicious

commands or content with respect to their context and consequences.
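
The two validation levels can be sketched as follows. This is an illustrative example under stated assumptions, not the proposed system: the field names (age, username), the length limit, and the token pattern are all hypothetical. The syntactic check looks only at size and character set, while the semantic check depends on what the field means and where its value will be used.

```python
import re

MAX_LEN = 64                                  # hypothetical size restriction
PRINTABLE_ASCII = re.compile(r"[\x20-\x7e]*")  # hypothetical content restriction

# Tokens that are dangerous when a value ends up inside an SQL string literal.
SQL_TOKENS = re.compile(r"('|--|;|\b(or|union|drop|delete)\b)", re.IGNORECASE)

def syntactic_ok(value):
    """Size and content restrictions, independent of the field's meaning."""
    return len(value) <= MAX_LEN and PRINTABLE_ASCII.fullmatch(value) is not None

def semantic_ok(field, value):
    """Checks that depend on the field's type and on how the value is used."""
    if field == "age":
        # Expected data type: a plausible positive integer.
        return value.isascii() and value.isdigit() and 0 < int(value) < 150
    if field == "username":
        # Context: the value is placed in an SQL query by the application.
        return SQL_TOKENS.search(value) is None
    return False  # unknown fields are rejected

def validate(field, value):
    return syntactic_ok(value) and semantic_ok(field, value)

print(validate("age", "42"))
print(validate("username", "admin' OR 'x'='x"))
```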

Ontology-driven software systems are capable of showing a shared understanding of

structured information about the concepts within a specific domain and also provide the

reasoning and ability to analyze the information automatically. Ontology-driven software

systems can also specify the various semantic relationships among different concepts,

mitigating the interoperability issue and allowing reuse and progressive evolution. The

proposed system maintains concepts for different entities such as protocols, data, and attacks,

and describes their relationships in the form of ontologies. This enables the reasoning that

is the basis for an efficient and robust detection system. Specific rules are generated by

capturing the context of the domain and the relationship between the entities.
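
The kind of reasoning an ontology enables can be hinted at with a toy model. The sketch below is only a stand-in written with plain Python dictionaries; the proposal itself uses OWL and the JENA framework, and the class names, the subclass hierarchy, and the "targets" relation here are hypothetical. It shows how a property stated once for a general attack class is inherited by its more specific subclasses.

```python
# Hypothetical subclass hierarchy: child class -> parent class.
SUBCLASS_OF = {
    "SQLInjection": "InjectionAttack",
    "XSS": "InjectionAttack",
    "InjectionAttack": "WebAttack",
    "DirectoryTraversal": "WebAttack",
}

# Hypothetical relation: attack class -> HTTP request components it targets.
TARGETS = {
    "InjectionAttack": {"QueryString", "MessageBody"},
    "DirectoryTraversal": {"RequestURI"},
}

def ancestors(cls):
    """Return all superclasses of cls, nearest first."""
    chain = []
    while cls in SUBCLASS_OF:
        cls = SUBCLASS_OF[cls]
        chain.append(cls)
    return chain

def is_a(cls, ancestor):
    return cls == ancestor or ancestor in ancestors(cls)

def targeted_components(cls):
    """Collect the components targeted by cls, including inherited ones."""
    components = set()
    for name in [cls] + ancestors(cls):
        components |= TARGETS.get(name, set())
    return components

# SQLInjection has no targets of its own; it inherits them from InjectionAttack.
print(sorted(targeted_components("SQLInjection")))
print(is_a("XSS", "WebAttack"))
```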

Unlike traditional IDS systems, we plan to develop the Web Application

Firewall as a reverse proxy, keeping in view the positive aspects of a web application firewall.

This also addresses PCI 6.6, an important clause introduced by the Payment Card

Industry (PCI) Data Security Standard [52], under which organizations have to fulfill one of two

requirements before December 2009; otherwise, monthly fines may range from $5,000 to

$100,000 for missed deadlines. The requirements are as follows:

  • All the custom application code must be reviewed for common vulnerabilities by an

organization that specializes in application security.

  • An application layer firewall must be deployed in front of Web applications.

We have selected a Web Application Firewall rather than code review for

the following reasons:

• It entails less cost

• More flexible solution

• Less resource utilization

• Less time consuming

We will provide some detail for each aforementioned reason below:

It entails less cost:

According to Jeremiah Grossman, the CTO of WhiteHat Security, the annual

average cost is approximately 40,000 USD in consulting fees for each small to medium-sized

Web application. Similarly, according to Robert Begg, CEO of Digital Defense, each line of

source code costs approximately five USD to review. So the cost associated with code review is

huge, and in large projects where the code base extends to millions of lines, it consumes a

substantial part of the organization's budget.

More flexible solution:

• Deploying an application layer firewall gives you a single point of control so that you

can specify what content is allowed to users.

• External clients are not aware of the names of the content or actual web

servers, which allows you to easily change content servers or to make host name


• It can be installed without any impact on the existing infrastructure.

Less resource utilization:

Code review draws your project into an endless and costly find, fix and test loop that ties up

your overall organizational resources. In contrast, an application layer firewall gives you

protection from the most common attacks for all the application servers deployed in the

organization's internal network. Moreover, it does not engage organizational resources and

allows them to perform more productive tasks.

Less time consuming:

The code review option of PCI 6.6 requires organizations to get their source code reviewed line by line.

Such a code review is very time consuming, especially for very large projects. This activity

more often than not results in missed project deadlines.

We are developing a Web Application Firewall (WAF) that complies with Payment Card

Industry Data Security Standard (PCI DSS) Requirement 6.6. The scope of this standard

includes a wide range of requirements, thus requiring a great deal of resources both in terms

of manpower and finances, as the project scope is very large. The team working on

the project shall do its best with limited resources to deliver a state-of-the-art

application within the given time span and to enter the Web application security market

in time. The following clauses introduced in PCI DSS may help in further understanding the scope

of the project, the need for a WAF, and the importance of its timely delivery to the market:

For a WAF to be compliant with PCI 6.6, it must:

• React appropriately to OWASP Top Ten vulnerabilities

• Inspect web application input and respond to it based on active policy or rules,

and log the action taken.

• Prevent data leakage; that is, inspect web application output and respond

to it based on the active policy or rules, and log the action taken.

• Enforce both positive and negative security models. The positive model defines

what is acceptable, whereas the negative model defines what is NOT allowed.

• Inspect both web page content, such as Hypertext Markup Language (HTML),

Dynamic HTML (DHTML), and Cascading Style Sheets (CSS), and the underlying

protocols that deliver content, such as Hypertext Transfer Protocol (HTTP) and

Hypertext Transfer Protocol over SSL (HTTPS). (In addition to

SSL, HTTPS includes Hypertext Transfer Protocol over TLS.)

• Inspect web services messages, if web services are exposed to the public Internet.

Typically this would include Simple Object Access Protocol (SOAP) and eXtensible

Markup Language (XML), both document and RPC-oriented models, in addition to


• Inspect protocols (proprietary or standardized) or data constructs (proprietary or

standardized) that are used to transmit data to or from a web application, when such

protocols or data are not otherwise inspected at another point in the message flow.

Note: Proprietary protocols present challenges to current application firewall

products, and customized changes may be required. If an application's messages do

not follow standard protocols and data constructs, it may not be reasonable to ask

that an application firewall inspect that specific message flow. In these cases,

implementing the code review/vulnerability assessment option of Requirement 6.6 is

probably the better choice.

• Defend against threats that target the WAF itself.

• Support SSL and/or TLS termination, or be positioned such that encrypted

transmissions are decrypted before being inspected by the WAF. Encrypted data

streams cannot be inspected unless SSL is terminated ahead of the inspection


• Prevent and/or detect session token tampering, for example by encrypting session cookies, hidden form fields or other data

elements used for session state maintenance.

• Automatically receive and apply dynamic signature updates from a vendor or other

source. In the absence of this capability, there should be procedures in place to

ensure frequent update of WAF signatures or other configuration settings.

• Fail open (a device that has failed allows traffic to pass through uninspected) or fail

closed (a device that has failed blocks all traffic), depending on active policy. Note:

Allowing a WAF to fail open must be carefully evaluated as to the risk of exposing

unprotected web application(s) to the public Internet. A bypass mode, in which

absolutely no modification is made to the traffic passing through it, may be applicable

in some circumstances. (Even in "fail open" mode, some WAFs add tracking headers,

clean up HTML that they consider to violate standards, or perform other actions. This

can negatively impact troubleshooting efforts.)

• In certain environments, the WAF should support Secure Sockets Layer (SSL) client

certificates and provide client authentication via certificates.

• Some ecommerce applications may require FIPS (Federal Information Processing

Standards) hardware key store support. If this is a consideration in your environment,

make sure that the WAF vendor supports this requirement in one of their systems

and be aware that this feature may drastically increase the cost of the solution.
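
Two of the requirements listed above, the positive and negative security models and the fail open / fail closed behaviour, can be sketched together. The parameter names, patterns, and function names below are hypothetical and serve only to illustrate the idea; a real WAF policy would be far richer.

```python
import re

# Positive model (hypothetical): only these parameters, in these formats, are allowed.
ALLOWED_PARAMS = {
    "id": re.compile(r"\d{1,6}"),
    "q": re.compile(r"[^\x00-\x1f]{0,100}"),
}

# Negative model (hypothetical): patterns that are never allowed.
DENY_PATTERNS = [re.compile(r"<script", re.IGNORECASE), re.compile(r"\.\./")]

def inspect(params):
    """Return 'allow' or 'block' for a dict of request parameters."""
    for name, value in params.items():
        rule = ALLOWED_PARAMS.get(name)
        if rule is None or rule.fullmatch(value) is None:
            return "block"   # rejected by the positive model
        if any(pattern.search(value) for pattern in DENY_PATTERNS):
            return "block"   # rejected by the negative model
    return "allow"

def handle(params, fail_open=False, inspector=inspect):
    """Apply the inspector; if it fails, follow the fail open / fail closed policy."""
    try:
        return inspector(params)
    except Exception:
        return "allow" if fail_open else "block"

def broken_inspector(params):
    # Simulates an inspection engine that has failed.
    raise RuntimeError("inspection engine unavailable")

print(handle({"id": "42", "q": "shoes"}))
print(handle({"q": "<script>alert(1)</script>"}))
print(handle({"id": "42"}, fail_open=True, inspector=broken_inspector))
```

In this sketch the free-text parameter q passes the positive model but is still caught by the negative one, which is why the requirement asks for both. The last call shows the risk of failing open: uninspected traffic is allowed through.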

Anatomy of a Web Application

The web is a worldwide network providing a highly interactive environment for electronic

communication to billions of users globally through a diverse range of applications. A Web

application is a computer program providing services to website visitors for submission and

retrieval of information over the Internet. The web services are accessed and interacted with

via Web browsers.

The major components of web applications include code that resides on the Web servers,

application servers, databases, and backend systems of an organization. A simple model

of a web application is shown in Figure 2 [50].

Figure 2: A Simple Web Application Model

Web browsers interact with the web application by sending requests to the web application

server through the HTTP protocol. The Web application server processes the request,

typically by issuing a query to a database to retrieve the required information,

which is subsequently sent as a response to the web browser for the end user.

The traffic generated on the WWW mostly relies on the HTTP protocol for communication. An

HTTP request mainly consists of two parts: the message header and the message body. The

message header contains general information such as the client software name, referrer, and

executing script path, while the body is made up of name-value pairs for the controls on the

submitted form. Due to the stateless nature of the HTTP protocol, the server cannot by itself tell

which requests belong to which user. Instead, the server distinguishes between users through a session ID. This

session ID is valid only for a given time slot.
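
The request structure and the time-limited session ID described above can be sketched as follows. The raw request, the host name, the 30-minute lifetime, and the function names are all assumptions made for illustration; real servers parse HTTP far more carefully and usually carry the session ID in a cookie.

```python
import secrets
import time
from urllib.parse import parse_qs

# Hypothetical form submission: header lines, a blank line, then the body.
RAW_REQUEST = (
    "POST /login HTTP/1.1\r\n"
    "Host: example.org\r\n"
    "User-Agent: demo-client\r\n"
    "Content-Type: application/x-www-form-urlencoded\r\n"
    "\r\n"
    "username=admin&password=secret"
)

def parse_request(raw):
    """Split a raw request into request line, header dict, and form fields."""
    head, _, body = raw.partition("\r\n\r\n")
    request_line, *header_lines = head.split("\r\n")
    headers = dict(line.split(": ", 1) for line in header_lines)
    fields = {name: values[0] for name, values in parse_qs(body).items()}
    return request_line, headers, fields

SESSION_TTL = 1800  # seconds; hypothetical session lifetime
sessions = {}       # session ID -> (user, expiry timestamp)

def create_session(user, now=None):
    now = time.time() if now is None else now
    session_id = secrets.token_hex(16)  # unpredictable identifier
    sessions[session_id] = (user, now + SESSION_TTL)
    return session_id

def lookup_session(session_id, now=None):
    """Return the user for a valid session ID, or None if unknown or expired."""
    now = time.time() if now is None else now
    entry = sessions.get(session_id)
    if entry is None or now >= entry[1]:
        sessions.pop(session_id, None)
        return None
    return entry[0]

request_line, headers, fields = parse_request(RAW_REQUEST)
print(request_line, headers["Host"], fields)
```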

Figure 3 explains the mechanisms used by a Web application for the protection of sensitive

information flowing from the user's browser to the database. The user is authenticated by a username

and password. This information travels through the network from the browser to the server (in

encrypted form if HTTPS/SSL is used). A firewall filters undesirable network traffic, and the

Web server validates the input. In case the data has any semantic or syntactical errors (e.g.,

invalid format or invalid result of type checking), the server raises an exception. The

application server performs the auditing and logging activity and keeps the records of all

transactions. Finally, sensitive information received by the server may be stored in encrypted


Semantics based Web Application Security:

Concept, Design and Implementation

School of Electrical Engineering and Computer Sciences A center of excellence for quality education and research

List of Abbreviations and Acronyms

EE External Evaluators

ICT Information and Communication Technologies

IPR Intellectual Property Rights

JPD Joint Project Director (Co-Director)

PD Project Director

PI Principal Investigator (Organization)

"Principal Investigator" means the person, company, partnership,

undertaking, concern, association of persons, body of individuals,

consortium or joint venture which receives funding from the

Company to execute a research and development project.