Statistical Debugging Of Dynamic Programming Languages Computer Science Essay

Published: Last Edited:

This essay has been submitted by a student. This is not an example of the work written by our professional essay writers.

Dynamic programming languages are considered as the development environments having strong expressive power and have no type declarations [24]. Dynamic programming languages are less typed in nature and this is the reason that they reduce the declaration of variables before their actual usage during execution process [24] [25]. In comparison with the dynamic typing, Static typing proceeds to define variables and its type during the declaration process which will be checked during the compilation time [26]. However in dynamic programming languages the type check of variables and objects is performed during the runtime. Moreover in dynamic programming languages the class structures and hierarchies can be adjusted during the run time [27].

Software debugging of source code and associated modules is an important and continuous activity performed by developers during development of software applications. In accordance to traditional approaches, debugging is normally performed manually in two steps [1]:

Analyzing the execution trace of a software application to figure out the cause of bug or failure.

Visualizing the unexpected response of the application during the testing.

Two types of debugging approaches are followed in normal scenarios to improve the quality of software [2]:

Static Analysis Based Approach:

Static analysis is performed by analyzing the program source code and by static identification of faults. The bugs identified in static analysis approach tend to contain a high rate of false positives [2].

Dynamic Analysis Based Approach:

Following the dynamic based approach for bug detection, a bug tracking is performed at runtime during the execution of a program. Dynamic analysis keeps track of the run time behavior and hence figures out the actual bug pattern/behavior by comparing the runtime behavior of correct and incorrect executions respectively [2]. The run time behavior of the fault occurrence is captured by utilizing the software practice. During the execution of a program, whenever a bug or failure is encountered, stack information including root cause of the bug and its description is collected and reported to a server automatically for analysis and follow up.

In the Dynamic Analysis based approach compilation level checking is implemented based on language specification and hence can detect common programming errors [2]. However there are other bugs which are more related to verification of correctness of coding methods and functions for execution of desired requirements being implemented [3]. Bug localization is the initial step towards automated debugging [1]. In this technique most of the source code is separated from the rest of the code that need further debugging. Selection and integration of an effective bug localization technique in the source code identifies the bug locations with accurate trace out information, hence it presents the appropriate summary related to a bug. This information can be fruitful in understanding the cause of bugs in a particular source code of the software application. The Statistical method of software debugging can handle the unsure and inconsistent bug existence by presenting the most accurate traces of software failures [5].

Many previous fault localization techniques based on the dynamic debugging revealed that actual fault indicator is different from the specified or pointed trace location [8] [9] [10] [11]. Initially invariant based technique was studied to understand the invariant properties related to a program's correct execution state that can be fruitful to relate the invariant properties of a program faults to variations respectively [12]. Another algorithm "DELTA BEBUGGING" was proposed by Zeller and Hilderbrandt, which detach related variables of associated functions and narrowing the states of multiple predicates implementation based on their pass and fail executions [13]. Algorithm presented by Zeller and Hilderbrandt algorithm was based on localization of the fault by presentation of a program states using memory graphs [13]. Agrawal et al. offered a bug localization technique implemented as Xslice [15] [16]. According to Xslice, bug localization can be analyzed by subtracting the distinct correct execution state from a distinct failed execution states respectively [15] [16]. Pan and Spafford in their studies proposed a chain based family of heuristics to bug affected areas of source code of a software program by using dynamic slicing technique [16] [17].

Later on the Tarantula technique was presented by Jones et al [18], based on which all the pass and fail trace information related to the testing of a program for bug localization will be collected and distinct statements for both pass and fail scenarios are highlighted [16]. Afterward Gupta et al. presented an algorithm "SLICECHOP" by combining the DELTA DEBUGGING [13] and the Dynamic Slicing [17] techniques previously proposed by Zeller and Pan respectively [19]. Basic theme of the SLICECHOP is to discover a simplified failure inducting input from the provided failing case by implementing Zeller and Hildebrandt's algorithm [20]. Base on these failure inducting inputs backward and forward dynamic slice are calculated from the incorrect output [16]. The insertion or intersecting point of a backward dynamic slice and forward dynamic slice i.e. the chop, is assumed as the bug localization trace summary [16]. Liblit shares the principal of predicate-based dynamic analysis [21]. Liblit focus on a set of dynamic execution statistics and analysis performed on those statistics [22]. It monitors a program's state, code-flow, control flow and data flow at run time and identifies major reason of a program failure [22]. Liblit proposed a method to solve complete list of predicates with their respective scores [23]. SOBER is a predicate based dynamic analysis technique [21]. SOBER can detect statistically more faults and bugs in a code than any other state-of-the-art technique like Liblit et al [16]. SOBER evaluates each predicate in each execution and provides more detailed information [16]. During each incorrect execution, SOBER ranks each predicate according to the behavior of predicate [16] [21]. SOBER technique is based on estimation of predicate likelihood either it is true or false [23].

Statistical debugging technique sketch out the program's source code to gather data flow, program flows, program state trace information and then perform the statistical analysis on collected information to figure out the bugs [6]. This instrumentation technique for the bug localization by using statistical debugging includes the insertion of some predicate based code before the decisive logical structures - loop statements, function calls and return values - of a program [7]. Base on the pass and fail - True / False - test case values of predicates, program execution trace out and related bug information is collected to figure out and modify the bug's actual location [6]. However there are few issues associated to the bug detection predicates. One of those problems is that the predicate implementation might affect the execution and termination state of a program [7]. Another problem with the bug detection predicates application is for complex software systems. Huge numbers of predicates are implemented, of which many are rationally repeated [6]. In order to work out on these above mentioned problems, specific algorithms or statistical instrumentation techniques are required.

Aims and objectives

The overall aim of thesis is to investigate the existing studies and tools related to statistical debugging and its application to modern dynamic programming languages like Ruby. To achieve this aim following objectives are set as goals to be achieved

Identify and evaluate the existing statistical debugging techniques and algorithms for dynamic programming languages.

To implement the most mature statistical debugging algorithm and adapt them to the Ruby language.

Evaluation and analysis of SOBER and LIBLIT algorithms on RUBY language programs.

Comparative validation of selected debugging algorithm on programs developed in RUBY language.

Research questions

What are the existing statistical debugging techniques adaptable for Dynamic programming languages?

How many existing techniques for statistical debugging are adapted to work in the context of various dynamic programming languages?

How to adapt statistical debugging algorithm SOBER and LIBLIT in RUBY?

How will SOBER and LIBLIT perform when these will be applied on sample programs developed in RUBY using default ratio?

Expected outcomes

The thesis aims to present a report with following outcomes:

A review of statistical debugging algorithms designed for fault and bug localization in dynamic programming languages.

Comparative analysis of existing statistical debugging approaches for effective fault localization by default ratio.

Analysis of results by implementing debugging algorithm SOBER on RUBY language sample and seeded fault programs.

Research Methodology

Reviewing and analyzing different algorithms and then implementing on RUBY require different research techniques. Techniques will not only focus on different studies and their results but also on implementation of SOBER in RUBY. Therefore there will be combination of different research techniques to answer all research questions. Techniques will focus on studies related to different statistical debugging techniques, implementation of SOBER in RUBY, analysis of results obtained from SOBER implementation in RUBY.

First research will focus on different techniques related to statistical debugging. During this research implementation of different techniques will be considered and their corresponding accuracy and precision will be analyzed. In second phase, information gathered from first phase will be used to figure out most appropriate and widely used techniques and those techniques will then be considered and discussed. This phase depends on the preliminary studies and information gathered in 1st phase. In third phase SOBER will be implemented for RUBY programming language for a simple program first then on a complete program with seeded bugs and in forth phase implementation results of SOBER in RUBY will be analyzed.

Figure 1: Research Design

Research Question

Research Method

Question # 01 &Question # 02

Literature Survey

Question # 03

Experiment, Test

Question # 04

Analysis of data and evaluation of results




Less familiarity with debugging algorithms

Explore algorithms available for statistical debugging and their implementation in software application

Limited case studies analytical data is available on which further conclusion can be made

Detailed literature review and analysis of case studies for software applications on which statistical debugging algorithms are implemented

Critical schedule due to lack of development experience in RUBY

Keep deadlines manage in order to compensate the threat of delay due to practice sessions of RUBY language

Time plan

Scheduled Milestones and Meetings:

20100117: Start writing the proposal

20100121: Meeting with supervisor

20100129: First draft of proposal to supervisor

20100201: Work on supervisor feed back

20100202: Update proposal draft

20100208: Final draft of proposal to supervisor

Phase 1:

20100209: Literature review

20100210:Report writing of analysis from literature review

20100301: End of literature review

Phase 2:

20100302: Algorithm selection

20100307:Observations from previous case studies and analysis

20100313:Report writing of comparative studies between available debugging


Phase 3:

20100320: RUBY Training and Algorithm implementation

20100410: Algorithm validation

Phase 4:

20100413: Cold test of implemented algorithm for simple programs

20100415: Cold test of implemented algorithm for bug seeded programs

20100417: Final cold test for experimentation on sample application

20100419: Report writing of analysis and statistical data collected from experiments.

20100514: Tell supervisor that we are OK for presentation

20100521: Send final report draft to opponents

20100523: Design presentation