Method Against Non Control Data Attacks Computer Science Essay


This essay has been submitted by a student. This is not an example of the work written by our professional essay writers.

A non-control-data attack is a new kind of attack that does not alter the target program's control data. Instead, attackers corrupt other application data, including user identity data, configuration data, user input data, and decision-making data. Existing defenses are mostly based on static analysis and therefore require source code. This paper proposes an improved pointer taint analysis method based on dynamic taint analysis. An attack is detected when an invalid pointer made up of external data is dereferenced. We implemented a tool on the Pin dynamic binary instrumentation framework, so it works on commodity software. Experiments show that our method detects control-data attacks and most non-control-data attacks.


With the rapid development of computers and the Internet, various types of computer viruses, worms, and other malicious programs have become major threats to computer systems. The best-known worms, such as the Morris Worm [31], Code Red [23], and SQL Slammer, spread rapidly over the Internet and caused huge economic losses to computer users worldwide. Attackers often break into systems by exploiting security vulnerabilities that result from memory corruption errors, e.g., buffer overflows, double frees, integer overflows, and format string vulnerabilities.

Most memory corruption attacks overwrite a value in memory that is subsequently loaded into the processor's program counter. Attackers can then alter the control flow and execute shellcode they have injected, or existing code such as libc on UNIX-style systems. Control data include return addresses, function pointers, and jump targets. These attacks are called control-data attacks because they manipulate data that affect a program's flow of control. Many defensive techniques have been proposed against control-data attacks, such as stack canary checks, non-executable stacks, and address space layout randomization.

Since successful control-data attacks are becoming harder to mount, many attackers have shifted their focus to new techniques. In this paper, we analyze a new type of attack: non-control-data attacks. Instead of control data, these attacks corrupt user input, user identity data, configuration data, or decision-making data. Attackers must be familiar with the program's data area and need application-specific knowledge. Although non-control-data attacks are hard to construct, they are still realistic threats to computer systems, because most existing defenses, which target control data, cannot detect them.

Dynamic taint analysis (DTA) has recently drawn widespread attention in systems security research. It monitors a program while the program executes. Its main idea is to mark external data with a taint tag, track the data's propagation, and mark all data derived from tainted values as tainted. Finally, it checks whether tainted data are used in an unsafe way. Based on this technique, we propose pointer taint analysis, a variant of dynamic taint analysis, to defend against non-control-data attacks. We use a taint tag and a pointer tag to mark memory data, and both tags are propagated during the program's execution. An attack is detected when an invalid pointer made up of external data is dereferenced. We implemented a tool on the Pin dynamic binary instrumentation framework, so it works on commodity software.

Related Work

Non-Control-Data Attacks vs. Control-Data Attacks

Control-data attacks usually exploit buffer overflows or other vulnerabilities to alter data such as return addresses or function pointers. Such data are loaded into the program counter and used as the address of the next instruction. Fig. 1(a) shows an example of a control-data attack: a server reads a request from the network and then calls the corresponding handler. An attacker writes more data than reqbuf can hold, overruns the buffer's boundary, overwrites the adjacent function pointer handler, and thus makes the server call a different function.

Non-control-data attacks also exploit vulnerabilities to achieve their goals. However, they corrupt data values that are not directly related to the program's control flow. An example of an attack against non-control data is shown in Fig. 1(b): the server stores the address of its host name in the variable name. By overflowing cl_name, an attacker can change the name pointer. The reply string from the server then contains the client's string followed by the contents of a memory region chosen by the attacker, which can leak sensitive data.

Chen et al. [] were the first to demonstrate that non-control-data attacks, just like control-data attacks, are a serious threat against many real applications. They pointed out that attackers can successfully breach computers by overwriting many different types of critical data. Their experiments show that many widely used server programs, e.g., WU-FTPD and Null HTTPD, have vulnerabilities that allow such attacks.

Existing Solutions Against Non-Control-Data Attacks

There are several existing defenses against such attacks. Static methods add protection structures and require recompilation, so they need source code. Dynamic methods are generally based on dynamic taint analysis and track the spread of data in the application. However, the points at which these attacks corrupt data vary widely, which poses a new challenge for dynamic defenses.

YARRA is a conservative extension to C. YARRA programmers introduce special type declarations and ascribe these special types to their critical data structures. YARRA guarantees that such critical data are written only through pointers with the given static type. YARRA's semantics have been proved sound and sufficiently strong. A prototype compiler and runtime system for YARRA have been implemented.

ValueGuard builds on the concepts introduced by StackGuard and extends them to cover all variables instead of only the return address. The ValueGuard framework rewrites the application's source code and inserts canary values in front of all variables. Whenever a variable is used, ValueGuard verifies the integrity of that variable's canary. A changed canary signals a non-control-data attack.
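The per-variable canary idea can be illustrated with a minimal sketch. It assumes one secret canary word stored immediately in front of each protected variable and checked before each use. The names guarded_int, vg_init, and vg_check are hypothetical. ValueGuard itself rewrites source code automatically, and this sketch does not attempt that.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical layout: the canary sits in front of (below) the value,
   so a linear overflow from a lower buffer must cross the canary first. */
typedef struct {
    uint32_t canary;
    int value;
} guarded_int;

/* One process-wide secret; a real system would draw it from a strong RNG
   at startup rather than use a fixed constant. */
static const uint32_t vg_secret = 0xA5C3F00Du;

/* Initialize a protected variable together with its canary. */
void vg_init(guarded_int *g, int v) {
    g->canary = vg_secret;
    g->value = v;
}

/* Verify the canary before the protected value is used. */
bool vg_check(const guarded_int *g) {
    return g->canary == vg_secret;
}
```

An overflow that rewrites the canary, for example with 0x41414141, makes vg_check return false before the corrupted value is consumed.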

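#include <unistd.h>   /* read */

struct req {
    char reqbuf[64];
    void (*handler)(char *);
};

void do_req(int fd, struct req *r) {
    /* the overflow: up to 128 bytes are read into the 64-byte reqbuf */
    read(fd, r->reqbuf, 128);
    /* the (possibly overwritten) handler pointer is then called */
    r->handler(r->reqbuf);
}

(a) control-data attack

#include <stdio.h>    /* sprintf */
#include <unistd.h>   /* read */

void serve(int fd) {
    char *name = MyHost;   /* MyHost: the server's host name, defined elsewhere */
    char cl_name[64];
    char svr_reply[1024];
    /* the overflow: up to 128 bytes are read into the 64-byte cl_name */
    read(fd, cl_name, 128);
    sprintf(svr_reply, "hello %s, I am %s", cl_name, name);
    svr_send(fd, svr_reply, 1024);   /* svr_send: defined elsewhere */
}

(b) non-control-data attack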

Figure 1. Control-data attacks vs. non-control-data attacks

Data Space Randomization (DSR) randomizes the representation of different data objects. One way to modify the data representation is to XOR each data object in memory with a unique random mask ("encryption") and to unmask it before use ("decryption"). DSR is implemented as a source-to-source transformation of C programs.
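The masking step can be sketched as follows. This is a minimal illustration that assumes one 32-bit random mask per object, applied on every store and removed on every load. The names masked_u32, dsr_store, and dsr_load are hypothetical. In the actual transformation these operations are inserted into the program's source code rather than called explicitly.

```c
#include <stdint.h>

/* A data object as it sits in memory: always in masked form. */
typedef struct {
    uint32_t stored;   /* value XOR mask */
} masked_u32;

/* "Encryption": mask the value before it is written to memory. */
void dsr_store(masked_u32 *obj, uint32_t mask, uint32_t value) {
    obj->stored = value ^ mask;
}

/* "Decryption": unmask the value just before it is used. */
uint32_t dsr_load(const masked_u32 *obj, uint32_t mask) {
    return obj->stored ^ mask;
}
```

An attacker who overwrites obj->stored with a raw value X, without knowing the mask, causes the program to read X ^ mask. That result is unpredictable to the attacker, so the corruption loses its intended effect.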

Write Integrity Testing (WIT) combines static points-to analysis with runtime instrumentation. At compile time, it computes the CFG (control-flow graph) and the set of objects that each instruction can write. It then assigns a color to each object in memory and to each write instruction in the program, so that the color of a memory location always matches the color of the instructions allowed to write it. At runtime, WIT checks this property before each write.
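The runtime color check can be sketched over a toy address space. This minimal illustration assumes a color table with one byte per 8-byte memory slot and colors already produced by the static analysis. The granularity, table size, and the names wit_color_object and wit_write_allowed are assumptions for illustration only.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SLOT_SIZE 8u    /* assumed granularity: one color per 8-byte slot */
#define MEM_SLOTS 64u   /* size of the toy address space, in slots */

/* Color of each memory slot, filled in when objects are allocated. */
static uint8_t color_table[MEM_SLOTS];

/* Give the object at [addr, addr + size) the color chosen for it at compile
   time. Assumes addr and size are multiples of SLOT_SIZE. */
void wit_color_object(size_t addr, size_t size, uint8_t color) {
    for (size_t a = addr; a < addr + size && a / SLOT_SIZE < MEM_SLOTS; a += SLOT_SIZE)
        color_table[a / SLOT_SIZE] = color;
}

/* Check inserted before a write instruction whose static color is insn_color:
   the write is allowed only if the target slot has the same color. */
bool wit_write_allowed(size_t addr, uint8_t insn_color) {
    if (addr / SLOT_SIZE >= MEM_SLOTS)
        return false;
    return color_table[addr / SLOT_SIZE] == insn_color;
}
```

An overflowing write issued by an instruction of color 1 that strays into an adjacent object of color 2 fails the check, even though the target address is perfectly valid memory.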

Edward uses a security policy based on bounds-check recognition (BR) to detect non-control-data attacks and other buffer overflows. Under this scheme, tainted data must pass a bounds check before they can be safely dereferenced as a pointer. However, several papers have criticized this technique for suffering from many false positives and false negatives.
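The BR rule can be sketched on a single tracked value. This minimal illustration assumes a per-value record of whether the value is tainted and whether it has been through a bounds check. Treating any comparison of a tainted value as a bounds check is a simplifying assumption, and br_value, br_compare, and br_deref_allowed are hypothetical names.

```c
#include <stdbool.h>
#include <stdint.h>

/* A value together with its taint state and whether a bounds check was seen. */
typedef struct {
    uint32_t value;
    bool tainted;
    bool bounds_checked;
} br_value;

/* Model of a comparison such as `cmp v, limit`: under BR, comparing a
   tainted value against a bound is recognized as a bounds check. */
void br_compare(br_value *v, uint32_t limit, bool *below) {
    *below = v->value < limit;
    if (v->tainted)
        v->bounds_checked = true;
}

/* Dereference check: tainted data may serve as a pointer only after a bounds check. */
bool br_deref_allowed(const br_value *v) {
    return !v->tainted || v->bounds_checked;
}
```

br_compare records that a comparison happened regardless of its outcome, so a tainted value that fails the check is still accepted afterwards. Weaknesses of this kind are one plausible source of the false negatives mentioned above.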

Katsunuma [] proposed an approach for preventing buffer overflows based on a pointer injection (PI) policy. PI enforces that untrusted data should never directly supply a pointer value. Instead, tainted data must always be combined with a legitimate pointer from the application before they can be dereferenced. This method relies on a hardware implementation.

Pointer Taint Analysis

Basic dynamic taint analysis is effective against control-data attacks, but it leaves non-control-data attacks undetected. Studies show that although control-data attacks and most non-control-data attacks differ in nature, they all rely on dereferencing a pointer: they write data to, or read data from, an address that the attacker has altered. In short, these attacks all involve an insecure dereference of a pointer manipulated by the attacker. For instance, format string attacks often follow this pattern: the format function interprets the attacker-supplied format string and then displays the contents of a carefully constructed memory address. The GHTTPD HTTP server and WU-FTPD attack cases described by Chen et al. [9] also use an attacker-provided value as a pointer and dereference it.

We propose pointer taint analysis based on this attack signature. It extends basic dynamic taint analysis and therefore has the same three components: taint introduction, taint propagation, and taint sinks. Taint introduction marks data that come from external sources, e.g., the network, the file system, or user input, with a taint tag. Taint propagation carries the taint property through the system to all data derived from tainted values. A taint sink is a location in the code where checks are performed.

Here we define two important concepts.

Definition 1. Taint tag (T-tag): a property of memory data indicating whether the data are tainted. True means the value is tainted; false means it is clean.

Definition 2. Pointer tag (P-tag): a property of memory data indicating whether the data are a legitimate pointer. True means the value is a legitimate pointer; false means it is not.


We designed the Dynamic Pointer Taint Analysis tool (DPTA) on the Pin DBI framework. Pin is a dynamic binary instrumentation framework that supports Linux and Windows executables for the IA-32, Intel(R) 64, and IA-64 architectures. Pin provides a rich API with which developers can build powerful tools, called pintools. Pintools can be thought of as plugins that modify the code generation process inside Pin.

DPTA is a pintool that can perform basic dynamic taint analysis but mainly focuses on pointer taint analysis. DPTA monitors system calls and marks data values with T-tags and P-tags. Tags are kept in a separate memory area, the tagmap. DPTA uses Pin's instrumentation API to inspect every instruction before the JIT compiler translates it, and it inserts analysis routines. The tagmap is updated as T-tags and P-tags propagate through the system. When memory is accessed through jump/call/ret or dereferencing instructions, DPTA checks whether an attack is taking place.

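Figure 2. DPTA system overview (diagram not reproduced). Labeled components: sample application instructions (mov ebx, 0x0a; mov eax, [esp+0x10]; call eax), Pin's code cache with added analysis code, function calls, system calls (I/O), user space, and kernel space.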

Taint Introduction

T-tags are introduced exactly as in basic dynamic taint analysis. We mark all data that come from a suspect source with a taint tag. Suspect sources include the network, files, and user input, and new suspect sources can also be defined. All other data are marked untainted.

For P-tags, we must identify whether data values are legitimate pointers. P-tags are initialized only at root pointer assignments, where a pointer is set to a valid memory address that is not derived from another pointer. Root pointer assignments fall into two categories: static root pointer assignments and dynamic root pointer assignments. Only root pointers and pointers derived from root pointers are considered legitimate pointers.

DPTA stores data tags in a tagmap, a shadow memory that holds the two properties of every byte stored in memory. We tag data at byte granularity, since a byte is the smallest addressable unit of memory on most architectures. Each byte carries two single-bit tags: the T-bit for the T-tag and the P-bit for the P-tag. The T-bit is set to 1 if the data are tainted and to 0 otherwise. The P-bit is set to 1 if the data can be used as a legitimate pointer and to 0 otherwise.
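A byte-granular tagmap with two single-bit tags per byte can be sketched as follows. This minimal illustration assumes a small flat toy address space and packs the tags of four application bytes into each shadow byte. The names tagmap, tag_get, tag_set, and tag_taint_range are hypothetical. A real pintool would map the whole process address space, which this sketch does not attempt.

```c
#include <stddef.h>
#include <stdint.h>

#define TOY_MEM_SIZE 4096u   /* size of the toy address space in bytes */
#define T_BIT 0x1u           /* taint tag */
#define P_BIT 0x2u           /* pointer tag */

/* Two tag bits per application byte, four application bytes per shadow byte.
   All functions assume addr < TOY_MEM_SIZE. */
static uint8_t tagmap[TOY_MEM_SIZE / 4];

/* Bit position of the tag pair for addr inside its shadow byte. */
static unsigned tag_shift(size_t addr) { return (unsigned)(addr % 4) * 2; }

/* Read the tag pair (T in bit 0, P in bit 1) of one byte. */
unsigned tag_get(size_t addr) {
    return (tagmap[addr / 4] >> tag_shift(addr)) & 0x3u;
}

/* Overwrite the tag pair of one byte, leaving its neighbours untouched. */
void tag_set(size_t addr, unsigned tags) {
    unsigned s = tag_shift(addr);
    tagmap[addr / 4] = (uint8_t)((tagmap[addr / 4] & ~(0x3u << s)) | ((tags & 0x3u) << s));
}

/* Taint introduction: mark [addr, addr + len) as external data,
   e.g. the buffer filled by a read() system call. */
void tag_taint_range(size_t addr, size_t len) {
    for (size_t i = 0; i < len; i++)
        tag_set(addr + i, tag_get(addr + i) | T_BIT);
}
```

Packing keeps the shadow memory at a quarter of the application's size. The cost is a shift and a mask on every tag access.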

Pointer Identification

Our method depends on accurately identifying legitimate pointers in order to initialize the P-tags of their memory locations. As mentioned under taint introduction, we distinguish between static and dynamic root pointer assignments.

Dynamic root pointers. A dynamic root pointer is initialized with a valid address in dynamically allocated memory. All pointers to dynamically allocated memory are derived from the return values of a few system calls, such as brk, mmap, and shmat, so we set the P-tag of these return values to true. For example, we use the Pin API to bind the brk system call to a hook function, post_brk_hook. Pointers to stack memory are all derived from the stack pointer (SP), so we also set the P-tag of the stack pointer register at process startup.

(void)syscall_set_post(&syscall_desc[__NR_brk], post_brk_hook);   /* run post_brk_hook after each brk returns */
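The hook's effect can be sketched without Pin. The following self-contained example does not use the real Pin or hooking API. It assumes a toy per-register P-tag array and hypothetical syscall identifiers, and shows how a post-syscall hook could mark the returned address as a root pointer and how the stack pointer could be seeded at startup. The names syscall_id, post_syscall_hook, init_stack_root, and reg_ptag are all hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical identifiers: three syscalls that hand out memory, one that does not. */
typedef enum { SC_BRK, SC_MMAP, SC_SHMAT, SC_READ } syscall_id;

/* Toy register file: only the registers this sketch needs. */
enum { TREG_EAX, TREG_ESP, NUM_TREGS };

/* P-tag of each register (T-tags omitted for brevity). */
static bool reg_ptag[NUM_TREGS];

/* Called after a syscall returns; ret is the raw value left in EAX.
   On Linux, raw syscall errors (e.g. from mmap) are values in [-4095, -1]. */
void post_syscall_hook(syscall_id id, intptr_t ret) {
    bool is_error = ret < 0 && ret >= -4095;
    switch (id) {
    case SC_BRK:
    case SC_MMAP:
    case SC_SHMAT:
        /* A successful return value is a dynamic root pointer. */
        reg_ptag[TREG_EAX] = !is_error;
        break;
    default:
        /* Other syscalls (e.g. read) return counts, not pointers. */
        reg_ptag[TREG_EAX] = false;
        break;
    }
}

/* At process startup, SP is the root of all pointers into the stack. */
void init_stack_root(void) {
    reg_ptag[TREG_ESP] = true;
}
```

Any pointer later computed from EAX or ESP by the propagation rules then inherits a legitimate P-tag.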

Static root pointers. The data section of an object file contains pointers initialized to statically allocated memory addresses, and the code section contains instructions that initialize pointers. All references to statically allocated memory are recorded in the relocation table, so static root pointers could be identified accurately given full relocation tables. In practice, however, full relocation tables are not available. DPTA runs on x86, where static root pointer assignments are performed by instructions such as movl that assign a 32-bit constant to a register.

We therefore first obtain the start and end addresses of the static area. DPTA then inspects each movl instruction and checks its operands. If the operands are a 32-bit constant and a register, and the constant lies within the static area, DPTA treats the operation as a static root pointer assignment.
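The heuristic can be expressed as a small predicate. This minimal sketch assumes the instruction has already been decoded into a few boolean facts. A real pintool would obtain them from Pin's instruction-inspection API, and the static-area bounds could come from, e.g., the ELF section headers (an assumption). The names decoded_insn, static_area, and is_static_root_ptr_assign are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, already-decoded view of one instruction. */
typedef struct {
    bool is_mov;        /* opcode is a mov variant such as movl */
    bool dst_is_reg;    /* destination operand is a register */
    bool src_is_imm32;  /* source operand is a 32-bit immediate */
    uint32_t imm;       /* value of the immediate */
} decoded_insn;

/* Bounds of the static data area, as the half-open range [start, end). */
typedef struct {
    uint32_t start;
    uint32_t end;
} static_area;

/* True if the instruction looks like a static root pointer assignment:
   mov reg, imm32 where imm32 falls inside the static area. */
bool is_static_root_ptr_assign(const decoded_insn *in, const static_area *sa) {
    return in->is_mov && in->dst_is_reg && in->src_is_imm32 &&
           in->imm >= sa->start && in->imm < sa->end;
}
```

Because the test depends only on the value of the constant, an ordinary integer that happens to fall inside the static area would also receive a P-tag. This is a limitation of working without relocation information.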

Taint Propagation

T-tag propagation is similar to basic dynamic taint analysis. If tainted data are involved in an arithmetic or logical operation, the result is marked tainted. When a tainted value is copied, the destination becomes tainted. Some special instructions cannot be handled by these primitives. For example, "xor op1, op1" is a common idiom for clearing a register, so we clear its output operand's tag by setting its T-bit to 0.

P-tag propagation is more complex. Any operation that could reasonably produce a valid pointer should propagate the P-tag. Adding an offset to, or subtracting an offset from, a base address is a common pointer operation. Subtracting one base address from another, by contrast, yields an offset rather than a pointer. Thus, for add/subtract instructions, if exactly one of the source operands carries a P-tag, the destination operand's P-bit is set to 1; otherwise, it is set to 0. We propagate the P-bit for data transfer instructions such as mov and movd, since copying a pointer should copy its P-bit as well. An AND instruction that extracts the high-order bits of a base address still yields a base address, so the operand's P-bit is preserved. Other arithmetic and logical instructions do not produce valid pointers, so the P-bit of the destination operand is set to 0.

Table 1. Tag propagation

T-tag propagation:
  arithmetic instructions (mul op1, op2): T(op1) = T(op1) ∨ T(op2)
  data transfer instructions (mov op1, op2): T(op1) = T(op2)
  logical instructions (and op1, op2): T(op1) = T(op1) ∨ T(op2)
  special instructions (xor op1, op1): T(op1) = 0

P-tag propagation:
  add/subtract instructions (add op1, op2): P(op1) = P(op1) ⊕ P(op2)
  other arithmetic instructions (mul op1, op2): P(op1) = 0
  and instructions extracting a base address (and op1, 0x11..00): P(op1) = P(op1)
  other logical instructions (or op1, op2): P(op1) = 0
  data transfer instructions (mov op1, op2): P(op1) = P(op2)
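The rules in Table 1 can be collected into a single propagation function. This minimal sketch assumes the caller has already classified each instruction into one of the table's rows and passes the current tags of the destination (op1) and source (op2) operands. The names tags, op_class, and propagate are hypothetical.

```c
#include <stdbool.h>

/* Tag pair for one operand: t = tainted, p = legitimate pointer. */
typedef struct {
    bool t;
    bool p;
} tags;

/* Instruction classes distinguished by Table 1. */
typedef enum {
    OP_MOV,          /* data transfer: mov op1, op2 */
    OP_ADD_SUB,      /* add/sub op1, op2 */
    OP_OTHER_ARITH,  /* e.g. mul op1, op2 */
    OP_AND_MASK,     /* and op1, <mask keeping the high-order bits> */
    OP_OTHER_LOGIC,  /* e.g. or op1, op2, or and with a non-mask operand */
    OP_XOR_SELF      /* xor op1, op1 (register clearing idiom) */
} op_class;

/* Return the new tags of op1 given the current tags of op1 (dst) and op2 (src). */
tags propagate(op_class op, tags dst, tags src) {
    tags out;
    switch (op) {
    case OP_MOV:
        out.t = src.t;
        out.p = src.p;
        break;
    case OP_ADD_SUB:
        out.t = dst.t || src.t;
        /* pointer +/- offset is a pointer; pointer - pointer is an offset */
        out.p = dst.p != src.p;
        break;
    case OP_AND_MASK:
        out.t = dst.t || src.t;
        out.p = dst.p;  /* extracting a base address keeps the P-bit */
        break;
    case OP_XOR_SELF:
        out.t = false;
        out.p = false;
        break;
    case OP_OTHER_ARITH:
    case OP_OTHER_LOGIC:
    default:
        out.t = dst.t || src.t;
        out.p = false;
        break;
    }
    return out;
}
```

For example, adding a tainted offset to a clean base pointer yields a tagged value with both T and P set. That combination is the one the dereference policy treats as legitimate.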

Attacks Detection

DPTA detects attacks by monitoring jump instructions and pointer dereferences. The detection policy for jump instructions is the same as in basic dynamic taint analysis: DPTA monitors jump/call/ret instructions and checks the T-tag of the target operand. If the T-tag is true, DPTA raises an alert.

The pointer dereference policy, summarized in Table 2, defines which data may be used as an address. If the T-tag is false, the data are not tainted and the dereference is legitimate regardless of the P-tag. If the T-tag is true and the P-tag is false, the data are tainted and not a valid pointer, so the dereference is treated as an attack. If both the T-tag and the P-tag are true, the data are tainted but form a valid pointer, so the dereference is also legitimate.

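Table 2. Attack detection policy for data used as an address

  T-tag false, P-tag false: legitimate
  T-tag false, P-tag true: legitimate
  T-tag true, P-tag false: attack
  T-tag true, P-tag true: legitimate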
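Both checks reduce to a small decision function. This minimal sketch assumes the tags of the value being used as an address have already been read from the tagmap. The names addr_use and is_attack are hypothetical.

```c
#include <stdbool.h>

/* How a value is being used as an address. */
typedef enum { USE_JUMP_TARGET, USE_DEREF_ADDRESS } addr_use;

/* Return true if using a value with T-tag t and P-tag p in this way is an attack.
   Jump/call/ret targets: any tainted target is an attack (basic DTA policy).
   Dereferences: only tainted values without a P-tag are attacks (Table 2). */
bool is_attack(addr_use use, bool t, bool p) {
    if (use == USE_JUMP_TARGET)
        return t;
    return t && !p;
}
```

The jump rule ignores the P-tag entirely. A tainted value that happens to be a valid pointer is therefore still rejected as a control-flow target, while the same value is accepted as a data address.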




Our test environment is a host equipped with two 3.07 GHz CPUs and 4 GB of RAM each, running Ubuntu Linux with kernel version 3.4. Our aim is to quantify the security protection coverage and the performance of DPTA.

Security Protection Coverage

We chose four typical control-data attacks and four non-control-data attacks as test cases. The results in Figure 3 show that DPTA detects the control-data attacks and most of the non-control-data attacks. DPTA cannot handle attacks that do not corrupt pointers.

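Figure 3. Security protection coverage of DPTA (per-case results not reproduced). Control-data test cases: stack overflow (two cases), double free, BSS overflow. Non-control-data test cases: format string attack against user identity data; heap corruption attack against configuration data (Null httpd [5]); stack buffer overflow attack against user input data; integer overflow attack against decision-making data.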


Performance Overhead

We selected five programs to evaluate the overhead of our system and of Libdft, a DTA tool developed by V. P. Kemerlis et al. As shown in Figure 4, DPTA imposes a slowdown ranging from 1.5x to 28x (12.5x on average). DPTA's overhead is higher than Libdft's because of the strict monitoring policy we use: DPTA monitors commonly used data transfer instructions (MOV etc.) in addition to jump instructions, which causes heavy overhead. However, our system can defend against non-control-data attacks, while Libdft cannot.

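Figure 4. Performance overhead (plot not reproduced). Workloads: archiving 160 MB of data, decompressing 20 MB of data, compressing 20 MB of data, finding a pattern in a 160 MB file, and encrypting 16 MB of data.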