Bug Detection And Taint Analysis Computer Science Essay


Abstract. There has been a rapid increase in software and information technology systems. Malicious individuals have also developed a wide range of methods to attack systems. Bugs in programs are exploited to cause vulnerabilities in systems. Bug detecting tools have been developed, and are still being researched, to detect and eliminate software bugs from programs quickly. Bug detecting tools can perform analysis statically or dynamically. Taint analysis is a technique that can be used to detect bugs and software vulnerabilities and to prevent attacks. This paper addresses the techniques and tools used to detect bugs in programs.

Keywords: bug, bug detection, taint, taint analysis, static analysis, dynamic analysis, information flow, data flow, control flow

1 Introduction

This paper describes and compares several tools for detecting bugs and performing taint analysis. All these tools have differences and similarities in methodology and techniques.

The continuous advances in the information technology sector and the ubiquitous nature of modern-day technology have made a huge impact on our daily lives. An example of the day-to-day use of technology is long distance communication, achieved in a fraction of a second whether by email or by mobile phone. Information has become more accessible, business activities run automatically and there are constant improvements in military defence operations and equipment. These are all diverse ways, and different sectors, in which information technology has had a huge impact. Though not visible, computer software plays a critical role in these and many other systems. Computer hardware would be unable to communicate or interact with the user or agent without software.

Consumers therefore expect their software products to meet the requirements and specifications provided. Notably, the development of software is complex and intellectually taxing. Software developers and companies nevertheless have a duty and responsibility to provide consumers with bug-free software, or software with minimal bugs, irrespective of whether the software is open-source, free or commercial. A program that runs as expected, however, may not be bug free. A bug can be defined as an error that makes a command or program perform in a way it is not specified or expected to perform, and that may pose a security risk or threat to the system.

The presence of bugs can have an adverse effect on the user of software and also on the software developer or software company. An example of the effect of a software bug in a system is the case of Therac-25, a computerized radiation therapy system, in which patients were exposed to massive radiation overdoses between June 1985 and January 1987, resulting in severe injuries and death [1]. Another example is the cyber attacks on Google that originated from China in February 2010. These were attributed to several vulnerabilities, one of which was linked to a bug in Internet Explorer [2]. Software companies also make themselves financially liable and expose themselves to lawsuits when their software products have flaws that can cause severe security risks.

Bug types can be divided into classes in different ways, but Lu et al [5] divide bugs into memory-related, concurrency and semantic bugs. Memory-related bugs can be classified further into buffer overflow, stack smashing, memory leak, uninitialized read and double free. Data race bugs, atomicity-related bugs and deadlocks are further classifications of concurrency bugs.

Several types of bug detecting tools currently exist. These tools find different types and ranges of bugs in code and therefore have different tradeoffs. The tools also differ in the techniques they use to find bugs. Some of the techniques used are type system analysis, model checking, and control-flow and data-flow analysis. Some bug finding tools use bug patterns found in previously analysed programs [8]. An ideal bug detection tool should produce few false positives and false negatives, should be easy to use and implement, and must also detect bugs early.

Taint analysis is the labelling of data from an untrusted source as tainted. The tainted data is then monitored and tracked as the program executes, to detect when it is being used for an attack and where it propagates in the program. Taint analysis can provide information on a vulnerability, its cause, its effect and its origin. Taint analysis produces few false positives or negatives. False positives are warnings about correct code; false negatives are failures to warn about incorrect code.

Taint analysis is used to prevent software vulnerabilities such as buffer overruns, format string vulnerabilities, cross-site scripting, SQL injection and command injection. It is also used in software debugging and software testing. Taint analysis can be divided into two classes: static taint analysis and dynamic taint analysis. Static taint analysis is performed at either the source code level or the binary code level. Dynamic taint analysis can take either a coarse-grained or a fine-grained approach. [4]

The rest of the paper is organized as follows. Section 2 provides an overview of bugs, their causes and bug detection tools, including their different types (Section 2.1). Sections 2.2 to 2.4 describe different types of static tools, dynamic tools and tools that integrate both forms of analysis. Section 3 describes taint analysis and gives examples of tools or frameworks that can be used to perform taint analysis. Section 4 describes related work and Section 5 concludes the paper.

2 Bug detection tools

This section provides general background knowledge on bugs, their causes and bug detection tools. Different bug detection tools are also described and evaluated in this section.

2.1 Background

Bugs in software can have a huge effect during the execution of a program. When a program runs as expected, that does not mean it is bug free. An undetected bug may produce inaccurate results and cause severe errors. The presence of a bug in a program can cause any of the following errors: boundary errors, calculation errors, race conditions, state errors, exception errors, input validation errors and many more. Bugs may be caused by human error in software design and development, and sometimes by compilers producing wrong code or by incorrect function specifications. Nakashima et al [3] stated that 49.5% of software bugs were caused by the designer's carelessness.

A lot of research has gone into developing techniques that automatically detect bugs in programs. There are different types of bug detection tools, and they differ in the way they function and the types of bugs they can find. Bug detection tools can be divided into static and dynamic tools. Static techniques are more concerned with information-flow control and are not used on running programs, whereas dynamic analysis emphasizes control-flow accuracy over data flow and is applied to a program at runtime. There is a third classification of tools called model checking. Model checking was formerly classified among static tools but is sometimes used during a program's execution. [5]

Bug detecting tools can also be divided according to the type of rules they use in detecting bugs [5]. They can be classified as programming-rule-based, statistic-rule-based or annotation-based tools. Violations of programming rules are reported by programming-rule-based tools. Statistic-rule-based tools use statistical methods such as logistic regression, conditional probability and hypothesis testing to compare large numbers of successful and failed program executions. Annotation-based tools use annotated software to detect inconsistencies and bugs. Certain tools also work only on the specific languages they were built to analyse.

2.2 Static-analysis based bug detecting tools

This section of the paper considers some bug detecting tools that use static analysis and how they function.

2.2.1 Extended Static Checker for Java

ESC/Java [6], the extended static checker for Java, is an annotation-based static tool that runs at compile time. It uses theorem proving to detect bugs: software is annotated in an annotation language, and ESC/Java produces warnings where the code is not consistent with the written annotations. An annotated Java program is passed into the front end, parsed and subsequently type-checked. The front end produces abstract syntax trees that are passed to the translator, and a type-specific background predicate that is passed to the theorem prover. The type-specific background predicate encodes, in first-order logic, the types and fields of the Java class.

The translator converts its input into a simple intermediate language and produces guarded commands. The guarded commands are passed to the VC generator, which produces verification conditions that are passed to the theorem prover. The built-in automatic theorem prover is invoked and produces prover results that are passed to the postprocessor. The postprocessor processes the prover results and provides the appropriate warning output to the user. ESC/Java identifies invariants stated explicitly in the Java Modelling Language (JML) and implicit pre-conditions and post-conditions in Java operations [7].
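To illustrate the kind of contract ESC/Java reasons about, the sketch below uses a hypothetical Python decorator that checks a precondition and postcondition. Note the difference in mechanism: ESC/Java verifies such annotations statically, at compile time, whereas this sketch can only check them at runtime. The `requires`/`ensures` names merely echo JML keywords and are not from any tool.

```python
# Conceptual sketch only: ESC/Java checks annotations like these statically.
# Python has no JML, so this hypothetical decorator enforces a precondition
# and postcondition at runtime instead.

def contract(requires=None, ensures=None):
    def wrap(fn):
        def checked(*args):
            if requires is not None:
                assert requires(*args), "precondition violated"
            result = fn(*args)
            if ensures is not None:
                assert ensures(result, *args), "postcondition violated"
            return result
        return checked
    return wrap

# JML analogue: //@ requires n >= 0;  //@ ensures \result*\result <= n;
@contract(requires=lambda n: n >= 0,
          ensures=lambda r, n: r * r <= n < (r + 1) * (r + 1))
def int_sqrt(n):
    r = 0
    while (r + 1) * (r + 1) <= n:
        r += 1
    return r
```

Calling `int_sqrt(-1)` violates the precondition, which is the kind of inconsistency ESC/Java would flag without ever running the code.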

ESC/Java is unsound both at the user level and at the programming language level, so it may miss some errors; it is also incomplete, so it may give spurious and incorrect warnings [7]. Producing annotations for annotation-based tools can also be a time consuming task. Though not perfect, ESC/Java is cost effective. It improves programs by detecting appropriate errors in them. It is cheaper than full program verification, which can theoretically catch all errors, yet it finds more errors than conventional static checkers. Because it uses an automatic theorem prover, it can find errors that are normally located only at runtime. It also warns about race conditions and deadlocks, which are synchronization errors in concurrent programs.

2.2.2 FindBugs

FindBugs [8] is another example of a static tool for Java. It uses bug pattern detectors to produce warnings about potential bugs that are similar to previously seen bugs. Each bug detector deals with one or more of the following issues: single-threaded correctness, thread or synchronization correctness, performance, security, and suspicious code or questionable programming practice. FindBugs implements 45 different bug pattern detectors.

Some of the detectors do not scan through the actual program code; they analyze the class structure and inheritance hierarchy of the program. Some detectors use control-flow analysis and others use data-flow analysis. Some bug pattern detectors scan linearly through the bytecode of the program's classes.

FindBugs aims for simplicity, precision and effectiveness. It can be extended by writing custom bug pattern detectors [9]. However, FindBugs produces false positives: warnings that should not have been produced. Sifting through warnings to determine which are false positives can be time consuming for the programmer.
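The bug-pattern idea can be sketched in a few lines. The toy detector below is illustrative only — FindBugs inspects Java bytecode and class structure, while this sketch walks a Python syntax tree looking for one pattern, `x == None`, which should normally be written `x is None`.

```python
# Toy bug-pattern detector (not FindBugs): reports the line numbers of
# comparisons that use == against the literal None.
import ast

def find_eq_none(source):
    warnings = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Compare):
            operands = [node.left] + node.comparators
            uses_eq = any(isinstance(op, ast.Eq) for op in node.ops)
            has_none = any(isinstance(o, ast.Constant) and o.value is None
                           for o in operands)
            if uses_eq and has_none:
                warnings.append(node.lineno)  # line of the suspicious compare
    return warnings
```

Like a FindBugs detector, this reports a *pattern* that is usually a bug, so some warnings may be false positives (here, code that deliberately relies on `__eq__`).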

2.2.3 PREfix

PREfix [10] is a programming-rule-based tool. It is based on static analysis and was developed to find dynamic programming errors in C or C++ programs. Program code is parsed into abstract syntax trees by the front-end C/C++ compiler. PREfix selects a graph and the paths to cover during simulation. Errors are reported after the simulation of individual functions. A model for each function is obtained by combining the summaries from individual path simulations. The model defines the behaviour of the function. This methodology produces precise and detailed error warnings.

PREfix relies on selective path selection, model abstraction, model simplification and analysis parameterization. However, it is restricted in the types of bugs it can locate.

2.2.4 Argus

The last static bug detecting tool discussed in this section is Argus, a statistic-rule-based tool. Statistic-rule-based tools use statistical methods such as conditional probability and hypothesis testing to collate data about software bugs by reviewing results from a number of passing and failing executions. Unlike some statistical bug detection tools that work offline and are limited by the number of failing executions available to them, Argus works online. Working online can, however, add complexity and problems for systems with limited connectivity.

Argus uses anomaly detection at runtime to detect bugs. It builds statistical models from runtime statistics gathered over executions in its training mode, and detects bugs in a single execution in its detection mode by comparing the runtime statistics against the learned statistical model. Error warnings are produced when there is a significant difference between the runtime statistics and the statistical models.
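A toy version of this two-mode scheme (not Argus's actual algorithm) can make the idea concrete: training mode records a runtime statistic over passing executions, and detection mode flags any execution whose statistic deviates from the learned mean by more than three standard deviations.

```python
# Illustrative anomaly detector in the spirit of statistic-rule-based tools.
import statistics

class AnomalyModel:
    def __init__(self):
        self.samples = []

    def train(self, value):                 # training mode: record statistic
        self.samples.append(value)

    def is_anomalous(self, value, k=3.0):   # detection mode: compare to model
        mean = statistics.mean(self.samples)
        stdev = statistics.pstdev(self.samples)
        return abs(value - mean) > k * stdev

model = AnomalyModel()
for v in [100, 102, 98, 101, 99]:  # e.g. loop iteration counts of passing runs
    model.train(v)
```

A run whose statistic lands far outside the learned distribution would be reported as a potential bug; one close to the mean would not.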

2.3 Dynamic-analysis based bug detecting tools

This section of the paper considers some bug detecting tools that use dynamic analysis.

2.3.1 DIDUCE

DIDUCE [11], the Dynamic Invariant Detection and Checking Engine, is a statistic-rule-based dynamic bug detection tool. DIDUCE dynamically extracts invariants from the executions of Java programs. The learned invariants are then compared with the program's behaviour, and irregularities found in the comparison are reported as warnings. DIDUCE is designed to locate memory related bugs.

For usage, the program files are listed, and specifications of which program points to instrument may also be provided by the user. The instrumented class files are inserted at the beginning of the classpath and DIDUCE is run. DIDUCE can run in two modes, a training mode and a checking mode, and the user selects which mode to run in. The difference between the two is that in the training mode DIDUCE learns the program's invariants, while in the checking mode violations of those invariants are reported as warnings. The training mode also performs checking.

The user can ask DIDUCE to report only invariant violations above a certain degree of severity. DIDUCE can not only find errors but can also provide the root cause of errors. It can be used to detect algorithmic errors, hidden errors and errors in inputs. DIDUCE has a high time overhead.

2.3.2 Purify

Purify [12] is an example of a programming-rule-based dynamic bug detection tool: it locates bugs at run time. It inserts checking instructions and checking logic into the program code. The checking instructions detect memory access errors in memory read and write commands, while the inserted checking logic verifies the system call interface. Purify can also detect memory leaks by watching how memory is used; it uses garbage collection techniques for this.

To detect memory access errors, Purify watches memory accesses during the running of a program and tracks the state code of each memory byte. An invalid state transition during an access raises a flag, and a warning or message is sent to the user. Memory leaks are somewhat more difficult to detect: memory that is no longer in use should be released. The garbage collection technique uses a garbage detector and a garbage reclaimer. In the algorithm used by Purify, blocks referenced from the data and stack segments are marked. Then the heap is successively checked for allocated blocks that are no longer referenced by the program, and these blocks are reported as leaks.
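The per-byte state tracking can be modelled with a small state machine. This is only a toy model in the spirit of Purify's state codes — Purify itself instruments compiled C/C++ object code — where each address is unallocated ("red"), allocated but uninitialized ("yellow"), or initialized ("green").

```python
# Toy shadow-state machine for memory access checking (illustrative only).
UNALLOC, UNINIT, INIT = "red", "yellow", "green"

class ShadowMemory:
    def __init__(self):
        self.state = {}

    def malloc(self, addr):
        self.state[addr] = UNINIT            # allocated but not yet written

    def write(self, addr):
        if self.state.get(addr, UNALLOC) == UNALLOC:
            raise MemoryError(f"write to unallocated address {addr}")
        self.state[addr] = INIT

    def read(self, addr):
        s = self.state.get(addr, UNALLOC)
        if s == UNALLOC:
            raise MemoryError(f"read of unallocated/freed address {addr}")
        if s == UNINIT:
            raise MemoryError(f"read of uninitialized address {addr}")

    def free(self, addr):
        self.state[addr] = UNALLOC           # later accesses are use-after-free
```

Reading a freshly allocated byte, or any byte after it is freed, triggers an error, mirroring the uninitialized-read and use-after-free checks described above.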

Purify is easy to use [5]. However, it was designed to detect memory leaks and access errors, and therefore cannot detect all types of errors; as shown in [5], it misses, for example, stack buffer overflows, meaning it does not work as well with stack objects as with heap objects. It also has run time overhead.

2.4 Combining static and dynamic-analyses based bug detecting tools

This section of the paper considers the effect of combining static and dynamic analysis to produce better and more effective results.

Static bug detecting tools are good at finding implementation errors such as null pointer dereferences, but they cannot find all types of errors, such as logical or semantic errors. They scan source code or binary code directly, making execution of the program unnecessary, and are capable of detecting generic errors such as buffer overflows. They have no run time overhead, but this approach makes it possible for static analysis to produce many false positives and false negatives. The false positives give too many warnings about non-existent bugs, which are not useful and are cumbersome and time consuming for the programmer to review. Static tools are also not as precise as dynamic tools.

Dynamic bug detecting tools use test cases, since they execute during the running of the program. This approach reduces the number of false positives and false negatives. The choice of test cases determines how accurate the analysis will be. In dynamic analysis, the programmer can decide on which paths, procedure calls, types or branches to run the tool. However, not all test cases and conditions can be covered, and some bugs may be missed; the missed bugs may lie on paths that were not executed during the testing run. Dynamic tools are also affected by run time overhead.

Static and dynamic analyses differ in methodology and suit different situations. A bug detecting tool that merges the advantages of both analyses and minimizes or eliminates their disadvantages would be very effective and efficient in detecting bugs. Such a tool should be sound and complete. A sound analysis claims only true sentences and a complete analysis claims all true sentences: an analysis is sound if claimtrue(p) ⇒ true(p) and complete if true(p) ⇒ claimtrue(p). An analysis that is sound for bug detection is complete for proving correctness, and one that is sound for proving correctness is complete for bug detection.

Research into models that combine both static and dynamic analysis would be constructive and valuable. A proper understanding of the two types of analysis, their duality and their differences would make it easier to carry approaches over from one domain to the other. Ernst [13] identified three ways of integrating the two analyses: aggregation (pre- and post-processing), analogous analyses, and hybrid analyses. In the aggregation method, the input of one analysis is derived from the output of the other. This can also be extended through repetition: a static analysis output can become the input of a dynamic analysis, and the output from the dynamic analysis in turn becomes the input of a static analysis. When the output of a dynamic analysis is input to a static analysis, features observed at run time can be verified. When a static output is fed into a dynamic analysis, instrumentation requirements are reduced and code can be tested more comprehensively. Static and dynamic analysis can also each be applied to the parts of the program code they deal with best.

In analogous analyses, the same program has both static and dynamic approaches applied to it, and the results from the different analyses have varied characteristics. An example is program slicing: determining which parts of a program contribute to computed values. Program slicing can be done both statically and dynamically. Type-checking, confirming that the right operations are performed on the right types of values, is another analysis that can be done both statically and dynamically. In memory checking, for example, Purify [12] works with run time instrumentation while LCLint [14] works statically with a compile-time dataflow analysis. Specification checking and generation can likewise be done either statically or dynamically. Theorem-proving is an example of static specification checking; static specification generation can be done manually or by abstract interpretation, while dynamic specification generation can be done by invariant detection. In each case the two sides perform analogous analyses with different rules. For techniques where only one type of analysis exists, research can be done to discover the counterpart analysis, whether static or dynamic.

In hybrid analyses, dynamic and static analyses are blended into one tool, and users may be able to choose which of the analyses to use at a particular time. Subset descriptions of static and dynamic analyses are unified in hybrid analyses. This approach allows users to apply different approaches to other domains, provides the opportunity to optimize the different types of analyses for the various types of bugs, and enables a constructive comparison of the results obtained by either approach.

The following sections describe different tools that use techniques that combine both static and dynamic analysis.

2.4.1 Check 'n' Crash

Check 'n' Crash [7] is a static-dynamic tool that uses post-processing (aggregation): the output of a program processed with the static ESC/Java tool is post-processed with a dynamic step. Check 'n' Crash consists of ESC/Java and the JCrasher [16] tool. It takes the names of the Java files to be tested as input. ESC/Java is called and produces error conditions. Each error condition is treated as a constraint system, and ESC/Java is extended to parse and solve the constraint system. The solution is a set of variable assignments that satisfies the error condition. The variable assignments are compiled by JCrasher into concrete JUnit test cases, which JUnit then executes. If an exception is thrown, the error forecast by ESC/Java exists; if no exception is thrown by the test case execution, the error does not exist and a false positive was produced. Check 'n' Crash eradicates false positive warnings (language-level unsoundness) and improves the quality of error reports.
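The aggregation step above can be sketched end to end. Everything here is a stand-in: the hand-written predicate plays the role of ESC/Java's error condition, brute-force search plays the role of the constraint solver, and running the function plays the role of the JCrasher/JUnit step that confirms or refutes the warning.

```python
# Sketch of the Check 'n' Crash pipeline: error condition -> solve for a
# concrete input -> execute to confirm the bug or expose a false positive.
def divide(a, b):
    return a // b

error_condition = lambda a, b: b == 0   # "denominator may be zero"

def solve(condition, candidates):
    # stand-in for a constraint solver: brute-force a satisfying assignment
    for a, b in candidates:
        if condition(a, b):
            return a, b
    return None

def confirm(f, args):
    try:
        f(*args)
        return False    # no exception: the warning was a false positive
    except Exception:
        return True     # exception thrown: the predicted bug is real

args = solve(error_condition, [(1, 2), (4, 0), (3, 1)])
```

Because a warning is only reported when the generated test actually throws, spurious static warnings are filtered out, which is the point of the post-processing step.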

2.4.2 DSD-Crasher

In DSD (dynamic-static-dynamic) Crasher [7, 21], the output of an initial dynamic analysis is input into the Check 'n' Crash tool. It addresses user-level soundness by removing user-level false warnings. The Daikon tool (the initial dynamic analysis stage) [17] uses an existing regression test suite to infer likely program invariants. These invariants are output as JML annotations that are input into the Check 'n' Crash tool, which runs the generated test cases and reports violations. DSD-Crasher has an enhanced bug detecting ability and produces fewer false warnings than Check 'n' Crash, and therefore more accurate error reports.

2.4.3 JNuke

JNuke [15] is a framework that combines static and dynamic analyses for bug detection. The framework is based on a generic analysis algorithm and combines runtime verification, model checking and counter-example exploration. The static part of the algorithm looks for defects, and a programmer writes test cases for the reported defects. The static environment uses abstract interpretation. Runtime verification is then applied to the programmer's test cases using the dynamic side of the same algorithm.

JNuke's static part [23] converts the bytecode into a set of instructions and uses code instrumentation to produce a file that can run on a Java debugger or VM. The VM is used for model checking and for analysis by backtracking. Checkpoints are provided to store differences between states.

2.4.4 DART

Another example of a tool that merges static and dynamic analyses is DART. Directed Automated Random Testing (DART) [18] comprises three main techniques: static source code parsing, dynamic analysis and automatic generation of a test driver. DART works on programs automatically and does not require manually written test cases. It works on C programs and guarantees sound warnings, offering a trade-off between static and dynamic bug detecting tools. A constraint solver or theorem prover is not required to implement DART.

Static and dynamic analysis can thus be merged to good effect, though doing so may not always be cost-effective.

3 Taint Analysis

3.1 Background

Taint analysis is a type of information-flow analysis in which data is marked tainted if it originates from, or is derived arithmetically from, malicious or untrusted sources and could pose a security threat at sensitive sinks: security-sensitive or vulnerable places in a program. The tainted data is monitored to reveal how it is used in the program, where it flows and how it propagates.
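A minimal model of this idea (a hypothetical sketch, not any particular tool) is a string wrapper carrying a taint flag: data from untrusted sources is flagged, the flag propagates through operations such as concatenation, and a sensitive sink refuses flagged input.

```python
# Minimal taint-propagation model: source -> propagation -> sink.
class Tstr(str):
    tainted = False

def taint(s):
    t = Tstr(s)
    t.tainted = True           # mark data from an untrusted source
    return t

def concat(a, b):
    out = Tstr(str(a) + str(b))
    # taint propagates: the result is tainted if either operand is
    out.tainted = getattr(a, "tainted", False) or getattr(b, "tainted", False)
    return out

def sql_sink(query):
    if getattr(query, "tainted", False):
        raise ValueError("tainted data reached SQL sink")
    return "executed"

user_input = taint("'; DROP TABLE users; --")   # attacker-controlled string
query = concat("SELECT * FROM users WHERE name = ", user_input)
```

The query built from user input stays tainted all the way to the sink, where it is rejected; a query built purely from trusted constants passes.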

Taint analysis has been used to detect vulnerabilities in web applications and to detect and prevent different types of attacks such as buffer overflows, SQL and command injections, cross-site scripting (XSS) and format string attacks. Taint analysis has also been explored for bug detection and software testing, and it can be used to ensure secure information flow.

3.2 Different uses of taint analysis

3.2.1 Taint-style attack detection and prevention

Web applications are continually exposed to attacks by malicious users who carry out harmful activities for varied reasons. A web application typically has a user interface, usually a browser, and a back-end database. Malicious attackers may enter strings into the web application to gain unauthorized access or even cause a denial-of-service attack. Firewalls installed in systems still allow certain attacks to go undetected. These attacks are called tainted object propagation attacks.

Some of the attacks in which harmful data is used to manipulate a web application are described here. In cross-site scripting (XSS), authenticated user details are obtained from cookies and malicious scripts are executed in the user's browser. In SQL injection, SQL instructions are passed to the database for execution, and in command injection, input fields are used to run shell commands. Web cache poisoning, another form of attack, can be carried out by HTTP response splitting. These forms of attack can leak confidential information.

When harmful data is inserted into web applications, cookies can be poisoned, the HTTP header can be manipulated to send crafted HTTP requests to the application, and hidden fields in forms can be given harmful values. Parameters can also be tampered with. Taint analysis takes data from the user, taints it and monitors the tainted data to prevent it from propagating to and being used in sensitive parts of the program.

3.2.2 Information-flow analysis

Taint analysis can be used to check that tainted data does not reach sensitive sinks in a program, and to see how tainted data is used in a system. For example, taint analysis can be used to classify data by sensitivity or security clearance level. Top-secret data can be labelled tainted and monitored so that it does not flow to a lower classification level, such as an unclassified or intermediate-secrecy level. The monitoring can use access control policies defined in the system.
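The classification check above amounts to a simple lattice rule. The levels and policy below are illustrative only, not taken from any specific access-control standard: each label has a numeric level, and a flow from a high level to a strictly lower one is rejected.

```python
# Illustrative "no write down" flow check over a three-level hierarchy.
LEVELS = {"unclassified": 0, "intermediate": 1, "top-secret": 2}

def flow_allowed(src_label, dst_label):
    # information may flow sideways or upward, never downward
    return LEVELS[src_label] <= LEVELS[dst_label]
```

Under this rule, top-secret data flowing to an unclassified destination is blocked, while flows to the same or a higher level are permitted.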

3.2.3 Bug detection and software testing

Masri et al [19] proposed debugging algorithms that use program slicing and information flow analysis, or tainting. The dynamic analysis detects data flows that are insecure, and program slicing determines which part of the program is responsible for the insecure data flow.

Taint analysis can be divided into dynamic taint analysis and static information-flow analysis. This section describes different tools that implement either of the two approaches.

3.3 Static taint analysis

3.3.1 TAJ

TAJ [16] is a static taint analysis tool for Java. It detects four main security issues: cross-site scripting (XSS), injection flaws, malicious file execution, and information leakage with improper error handling. These forms of attack can be prevented by monitoring data that comes from untrusted sources before it propagates, through data flow and/or control flow.

TAJ works in two phases: a pointer analysis and a hybrid slicing algorithm that monitors the tainted data. In the first stage of the analysis, TAJ applies a certain level of object sensitivity to the methods it analyzes and also performs call graph construction.

3.3.2 Pixy

Pixy works statically to detect cross-site scripting (XSS) vulnerabilities in PHP code using data flow analysis. PHP is widely used for developing web applications, so a taint analysis tool that detects vulnerabilities in PHP code is very useful. Pixy uses flow-sensitive data flow analysis to detect taint-style vulnerabilities, working on the control flow. A parse tree of the PHP program is created using JFlex, a Java lexical analyser, and the Java parser Cup. The created parse tree is converted into a linear form and kept as a control flow graph.

Pixy then applies alias analysis, literal analysis and path pruning, since simple application of taint analysis to the linearized form of the parse tree may not produce precise results. In alias analysis, taint assigned to a variable must also be propagated to its aliases. The literal values held by constants and variables at specific program points must also be known to the taint analysis. Path pruning uses the results obtained from the literal analysis to evaluate branch conditions. Pixy, however, does not support PHP's object-oriented features and also produces some false positives.

3.4 Dynamic taint analysis

3.4.1 DYTAN

DYTAN [22], the Dynamic Taint Analyzer, is a dynamic taint analysis framework. In dynamic taint analysis, data is tainted at run time. DYTAN runs on binaries and therefore does not need the program source code. DYTAN's framework is organized around the three elements that characterize dynamic taint analysis: taint sources, the propagation policy and taint sinks.

Taint sources are the memory locations that can be tainted. These could be data read from a file or a network connection, or variable or function values. DYTAN allows its users to decide which memory locations should be tainted. The variable, whether global or local to a procedure, is identified by specifying its offset from the program's base address.

A function can also be named as a taint source, in which case the user must provide the name of the function to be tainted. Data from I/O streams can likewise be declared a taint source. DYTAN supports three types of I/O stream: the keyboard, the network and the filesystem. The user can ask for all data from any of these sources to be tainted, or for only specific I/O streams to have their data tainted; the network name, IP address or IP name and port must be provided to identify a unique network, and the absolute paths of files must be provided for their unique identification. The number of taint markings that can be linked to a source is not limited, but the higher the number, the more space overhead is used. Distinct markings can be used to separate the different taint sources.

The propagation policy describes how tainted data is propagated through the program at run time. The policy must cover the taint sources, the tainted data and the parts of the program affected by the tainted values. DYTAN allows its users to define the mapping function and the areas that would be affected by the tainted data. Affected areas can be identified through data-flow or control-flow. Data-flow handles the explicit propagation of taint, while control-flow handles propagation that results from control-flow dependencies; this is implicit propagation. Users can decide which propagation type to use (implicit or explicit propagation) and evaluate the differences.
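The difference between the two propagation types can be made concrete with a small sketch (a hypothetical Python model, not DYTAN's mechanism): `y` receives the tainted value directly, while `z` is only influenced through a branch that tests the tainted value.

```python
# Hypothetical sketch: x is tainted input. y copies x (data flow);
# z is assigned under a branch that tests x (control flow only).
def propagate(x, x_tainted):
    y = x                    # explicit flow: y inherits x's taint
    y_tainted = x_tainted

    if x == 1:               # implicit flow: z depends on x only
        z = 1                # through the branch condition
    else:
        z = 0

    z_tainted_explicit = False       # data-flow-only policy misses z
    z_tainted_implicit = x_tainted   # data+control-flow policy taints z
    return y_tainted, z_tainted_explicit, z_tainted_implicit

assert propagate(1, True) == (True, False, True)
```

An attacker can still leak information through `z`, which is why a user may choose to enable control-flow based propagation despite its higher cost.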

Taint sinks are the points at which the taint checks are to be performed. Each taint sink must have its own ID, memory location, code location and the operations to be performed at the sink. To use DYTAN, the taint sources, propagation policy and taint sinks must be provided through a configuration file in XML format. The configuration is used to instrument the x86 executable. Control flow graphs are created for control-flow based taint propagation. While the instrumented program runs, two libraries are used: DYTAN's own library and a library provided by the user, which contains the checking operations and the methods for combining taint markings.

DYTAN is quite flexible and provides users with ways of customising the system. Users can specify which memory locations to taint, which data to taint and how the propagation must take place. Users can also decide on the type of tainting to perform: either data-flow based tainting alone, or data-flow and control-flow based tainting combined.

3.3.4 TaintCheck

TaintCheck [4] runs dynamic taint analysis on a program in its own environment; it does not require source code or special recompilation. TaintCheck has four components: TaintSeed, TaintTracker, TaintAssert and the Exploit Analyzer. TaintSeed marks data from untrusted sources as tainted; by default, data from network sources is marked tainted. TaintSeed can, however, be configured through an extended policy to mark inputs from other sources as well. With logging disabled, TaintSeed allocates a single bit per memory location to denote that it is tainted. With logging enabled, a Taint data structure is created for each tainted memory location; the structure holds the system call number, a snapshot of the stack and a copy of the written data.

TaintTracker monitors the instructions that operate on tainted data to determine whether their results should also be tainted; that is, TaintTracker tracks the propagation of the tainted data. TaintTracker distinguishes three categories of instructions: data movement instructions, arithmetic instructions and instructions that neither move data nor perform arithmetic. Examples of data movement instructions are LOAD, STORE, PUSH and POP. By default, the destination of a data movement instruction is tainted if any byte of the data at the instruction's source is tainted.

The destination of an arithmetic instruction is tainted if any byte of its source operands is tainted. XOR, SUB and ADD are examples of arithmetic instructions, while NOP and JUMP neither move data nor perform arithmetic. TaintTracker adds instrumentation before data movement and arithmetic instructions. When the result of an operation is tainted, the shadow memory of the result is linked to the Taint structure of the tainted operand, or a new Taint structure is constructed that still links back to the previous one. The new Taint structure holds appropriate information such as the operand location and a snapshot of the stack. The fourth part of the TaintCheck framework, the Exploit Analyzer, can then use these links to trace how the tainted data was propagated.
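The default propagation rules above can be summarised in a short shadow-memory sketch. This is a hypothetical Python model of the policy, not TaintCheck's binary instrumentation:

```python
# Hypothetical shadow-memory model of the default policy: the
# destination of a MOV or ADD is tainted if any source byte is.
shadow = {}  # address -> True if the byte at that address is tainted

def is_tainted(addr, size):
    return any(shadow.get(addr + i, False) for i in range(size))

def mov(dst, src, size):
    # Data movement: destination taint mirrors source taint.
    for i in range(size):
        shadow[dst + i] = shadow.get(src + i, False)

def add(dst, src1, src2, size):
    # Arithmetic: tainted if ANY byte of either operand is tainted.
    t = is_tainted(src1, size) or is_tainted(src2, size)
    for i in range(size):
        shadow[dst + i] = t

shadow[0x2000] = True            # one tainted byte from the network
mov(0x3000, 0x2000, 4)           # MOV propagates the taint
add(0x4000, 0x3000, 0x5000, 4)   # ADD with one tainted operand
assert is_tainted(0x4000, 4)
```

In the real system the shadow state also carries the links to Taint structures described above, so that propagation paths can be reconstructed afterwards.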

The third part of the TaintCheck framework is TaintAssert, which determines whether tainted data is used suspiciously. If there is suspicious activity, TaintAssert notifies the Exploit Analyzer, which performs further tests. TaintAssert checks that tainted data is not used as a jump address, such as a return address or function pointer: instrumentation is placed before each jump instruction to check the data that determines the jump target, which must not be tainted. TaintAssert also checks that format string arguments do not use tainted data; this prevents format string attacks that can be used to leak data or perform other malicious actions. TaintAssert can further check that arguments to system calls are not tainted, and can perform application-specific or library-specific checks. By default, the jump address and format string checks are performed; TaintAssert must be configured to perform system call argument and application- or library-specific checks.
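The jump-address check can be sketched as a sink assertion over shadow memory. Again this is a hypothetical Python model in the spirit of TaintAssert, not its actual instrumentation:

```python
# Hypothetical sink check: before an indirect jump, the bytes that
# hold the jump target must be untainted.
shadow = {}  # address -> True if the byte at that address is tainted

class TaintedJumpError(Exception):
    pass

def check_jump_target(addr, size=4):
    if any(shadow.get(addr + i, False) for i in range(size)):
        raise TaintedJumpError(f"tainted jump target at {addr:#x}")

shadow[0x7f00] = True       # attacker overwrote a return address
try:
    check_jump_target(0x7f00)
    caught = False
except TaintedJumpError:
    caught = True
assert caught
```

A triggered check would be the point at which TaintAssert hands control to the Exploit Analyzer.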

The final part of the system, the Exploit Analyzer, uses the detected tainted data that could potentially exploit the system to report how the potential exploit originated and what it tried to achieve. The information provided by the earlier stages of TaintCheck (TaintSeed and TaintTracker), together with backtracking through the Taint structures, is used to derive the path from the tainted data's source to its subsequent use. Another key feature of the Exploit Analyzer is the way it derives attack signatures: it allows an attack to be executed in a restricted environment after the attack has been detected, and the semantic information gathered about the attack can be used to generate accurate signatures.

4 Related work

Lu et al. [5] proposed a benchmark called BugBench for evaluating software tools. Their paper compared Valgrind, Purify and CCured, tools meant to detect the same type of bug: memory-related bugs. There were significant differences in the results: some tools were more successful than others, and some incurred a large overhead. This is because the tools implement their analyses in different ways.

Wagner et al. [24] reviewed FindBugs, PMD and QJ Pro, compared bug finding tools to testing, and compared the defects found using the different tools.

5 Conclusion and Future work

This paper reviewed several bug detecting tools and taint analysis frameworks, drawing out their unique advantages, disadvantages and uses. No single tool is a silver bullet for detecting bugs and finding vulnerabilities in programs or software; each tool has its advantages and disadvantages. Likewise, no bug is so tough that it cannot be discovered, provided the right tool is applied to it. Combining several tools with different tradeoffs can improve the quality of the analysis and reduce both false negatives and false positives, and existing tools can be improved to address their shortcomings.

This provides an opportunity for future work on developing tools that are easy to use, effective and efficient.