Study On A Reverse Engineering Software Tool Computer Science Essay



Source code is written by application development teams and is then maintained and enhanced by application maintenance teams. In most cases these two teams differ in mindset and are geographically separated. Code interpretation is therefore a time-consuming activity in the maintenance phase of the software development lifecycle. Rigi is a mature research tool that provides functionality to reverse engineer software systems.

By virtue of the visualization techniques provided by Rigi, large systems can be analysed, interactively explored, summarized and documented. We discuss the two contrasting approaches that are available for visualizing software structures in the Rigi graph editor. In this paper we also describe Rigi's main components and functionalities and assess Rigi's impact on reverse engineering research. Furthermore we study the integration of different reverse engineering tools to create the "perfect reverse engineering" tool.

1. Introduction

Manual interpretation of large legacy systems is a cumbersome process. One way of understanding such a system is through computer-aided reverse engineering. Although there are many forms of reverse engineering, the common goal is to extract information from existing software systems to better understand them. Large bodies of code are represented in a form in which many of their structural and functional characteristics can be analyzed. This analysis can then be used to improve subsequent development, ease maintenance and aid project management. This helps defend against brittle software systems that resist graceful change. Problems can be exposed and corrected if reverse engineering is applied as a precaution during software development. As maintenance and re-engineering costs for large legacy software systems increase, the importance of reverse engineering will grow accordingly [4]. Numerous reverse engineering tools have been developed to help program understanding and thus facilitate software maintenance by providing methods to uncover the design of software systems [3]. The usability of these tools is critical to their effectiveness. Rigi's effectiveness is mainly due to the structured graphical notation it presents for code. This form of code representation is more intuitive and also less intimidating than unending lines of poorly structured code.

1.1 Reverse Engineering

Reverse engineering is the process of extracting and interpreting the architecture of source code. With reverse engineering, a system's original logic can be analyzed. This understanding is necessary at the early stages to ensure smooth code enhancement.

Rigi's reverse engineering process involves parsing a subject software system, resulting in a graph where nodes represent a program's functions and data types, and arcs represent dependencies among them. A hierarchy is then imposed on these graphs by building subsystem abstractions [9]. Software maintainers can subsequently use this divide-and-conquer approach to understand these sub-architectures individually.
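
The idea of imposing a hierarchy by building subsystem abstractions can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names, the subsystem assignment and the helper lift_arcs are hypothetical and are not taken from Rigi. The sketch collapses artifact-level arcs into arcs between subsystems, which is the view a maintainer would study first.

```python
from collections import defaultdict

# Hypothetical artifact-level dependency arcs: (source, target).
arcs = [
    ("main", "list_insert"),
    ("main", "list_print"),
    ("list_insert", "node_alloc"),
    ("list_print", "format_item"),
]

# Hypothetical assignment of each artifact to a subsystem.
subsystem_of = {
    "main": "app",
    "list_insert": "list",
    "list_print": "list",
    "node_alloc": "memory",
    "format_item": "io",
}


def lift_arcs(arcs, subsystem_of):
    """Collapse artifact-level arcs into subsystem-level arcs with counts."""
    lifted = defaultdict(int)
    for src, dst in arcs:
        a, b = subsystem_of[src], subsystem_of[dst]
        if a != b:  # arcs inside one subsystem are hidden by the abstraction
            lifted[(a, b)] += 1
    return dict(lifted)


subsystem_arcs = lift_arcs(arcs, subsystem_of)
for (a, b), n in sorted(subsystem_arcs.items()):
    print(f"{a} -> {b} ({n} underlying arcs)")
```

Each subsystem can then be opened and studied on its own, with the lifted arcs summarizing how it depends on the rest of the system.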

1.2 Program Understanding & Software Maintenance

The application of reverse engineering technologies to large-scale legacy software systems offers the opportunity to understand millions of lines of code more easily. There are diverse program understanding techniques such as structural re-documentation, pattern matching and run-time analysis. Effective understanding is also necessary for factoring and optimizing code [1]. It also helps in increasing a code's portability to different platforms.

A well interpreted software system can be used to exploit reusable components and add new features to the existing structure. Applications such as medical instrumentation and process control require a level of accuracy that is only possible with a thorough understanding of the software. Typically software gets developed, maintained and re-engineered many times during its lifetime. Because of these changes during evolution, the software architecture often degrades in that the code no longer follows the original design criteria. This is especially true for legacy software systems, which are typically large, complex, poorly structured, and resist change.

Some software systems are badly written from the start. The reason can be either the inexperience of the developers or the enormous pressure on them to ship the product, even if the remaining bugs and poor documentation later cause maintenance nightmares [5]. Somebody still has to understand these high-entropy systems. Program understanding is therefore an indispensable part of software preservation [7]. It can leverage the software developer's actions and knowledge, and augment the software development and maintenance processes.

Section 2 describes the visualization techniques available in Rigi, followed by its extensibility features. Section 3 outlines an integrated reverse engineering tool kit for program understanding. Section 4 elaborates on current practices in the industry. Section 5 highlights the ongoing research associated with Rigi. Section 6 concludes.

2. Rigi

Reverse engineering is used to understand a system through the analysis of its internal components. Rigi is one such reverse engineering tool; it helps decode legacy systems by providing a graphical representation of the underlying code. In other words, Rigi operates on source code and generates a graph where the nodes represent the procedures and the edges represent the procedure calls in the source code [13].

The Rigi user interface is a graph editor called rigiedit, which is used to analyze and modify the resulting graph. Initially rigiedit consists of an empty workbench window and a root window. rigiedit is a programmable interface that can be adapted to user requirements using Tcl/Tk. Rigi stores and retrieves the graph in the Rigi Standard Format (RSF) [13].
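
As a rough illustration of how a stored graph might be read back, the following Python sketch parses a handful of triples into node types and call edges. It assumes a simplified three-column form (relation, source, target); the sample triples and the helper names parse_triples and build_graph are invented for this example, and the actual format is described in the documentation shipped with the tool [13].

```python
from collections import defaultdict

# Invented sample in a simplified three-column form: relation, source, target.
sample = """\
type main Function
type mylistprintf Function
call main mylistprintf
call main malloc
"""


def parse_triples(text):
    """Split each well-formed line into a (relation, source, target) triple."""
    triples = []
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 3:  # skip blank or malformed lines
            triples.append(tuple(parts))
    return triples


def build_graph(triples):
    """Return node types and adjacency lists keyed by relation name."""
    node_types = {}
    edges = defaultdict(lambda: defaultdict(list))
    for relation, source, target in triples:
        if relation == "type":
            node_types[source] = target
        else:
            edges[relation][source].append(target)
    return node_types, edges


node_types, edges = build_graph(parse_triples(sample))
print(node_types)
print(edges["call"]["main"])
```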

Figure 1: Rigi Workbench Window

Figure 2: Rigi Root Window

The Rigi workbench (Figure 1) can be used to customize features of Rigi by writing scripts using the Rigi Command Library (RCL). The Rigi root window (Figure 2) represents the root node in the hierarchy. A description of the RCL commands is provided with the tool [13]. The following two subsections describe two alternative approaches for exploring software hierarchies in Rigi.

2.1 Visualization

Understanding code with the help of graphs allows the developer to spend less time interpreting it and thus to begin the development phase of the project lifecycle sooner. However, as the size of the software system increases, the graph becomes unwieldy. Rigi uses advanced graphics and abstraction techniques to reduce the complexity of the resulting graphs. It supports two visualization techniques for better interpretation of the graph. The first is the multiple window approach and the second is SHriMP, an acronym for Simple Hierarchical Multi-Perspective view [2]. Both techniques are explained in the following sections.

2.1.1 Multiple Window Approach

In this technique Rigi generates a series of graphs that represent the system hierarchy. Clicking on a node of the graph opens another window that contains the children of that node, and this continues until the leaf nodes of the hierarchy are reached. The limitation of this approach is that the user can get lost in space: when multiple windows are open it becomes difficult for the software engineer to study the structure of the system [3]. The following figures illustrate this concept.

Figure 3: representing Rigi node.

Figure 4: representing Base node

Figure 5: representing src node

Figure 6: representing the children of src.

The above figures show Rigi's behavior on sample code provided by the author. Rigi produces Figure 3 as its first graph (the root node). Double-clicking it yields Figure 4, and this continues until Figure 6 is reached. The edges in the graph represent the functional dependencies between the procedures. This works well for small programs, but in reality there may be millions of lines of code to reverse engineer, and thus thousands of windows depicting parent-child relationships. Reverse engineering such code this way would be an uphill task. This limitation is known as the "lost in space" problem [3].
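
The growth in the number of windows can be made concrete with a small Python sketch. The hierarchy below loosely mirrors the Rigi, Base and src nodes of Figures 3 to 6; the leaf names and the function windows_needed are hypothetical. The sketch assumes that the root is shown in one window and that expanding each composite node opens exactly one further window.

```python
# Hypothetical hierarchy: each composite node maps to its children.
hierarchy = {
    "Rigi": ["Base"],
    "Base": ["src"],
    "src": ["main", "list_insert", "list_print"],
}


def windows_needed(hierarchy, root):
    """Count the windows opened when every composite node is expanded once."""
    count = 1  # the window that shows the root itself
    stack = [root]
    while stack:
        node = stack.pop()
        children = hierarchy.get(node, [])
        if children:
            count += 1  # double-clicking a composite node opens one window
            stack.extend(children)
    return count


print(windows_needed(hierarchy, "Rigi"))
```

Under these assumptions the count equals one plus the number of composite nodes, so a system with thousands of subsystems leads to thousands of windows.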

2.1.2 SHriMP

SHriMP is an acronym that stands for Simple Hierarchical Multi-Perspective view. SHriMP is implemented in the Tcl/Tk language and can easily be integrated into a system that supports the Tcl/Tk library. It is a visualization technique that alleviates the limitation of the multiple window approach by using two algorithms, namely the nested graph algorithm and the SHriMP fish-eye view algorithm [2]. SHriMP is used to magnify the graph so that a particular section of it can be studied. The following two subsections briefly discuss the algorithms used by SHriMP.

Nested Graphs

Nested graphs are similar to the graphs discussed so far, with one exception: they contain a special kind of node called a composite node. The composite node implicitly communicates the parent-child relationship in the graph, thereby reducing the complexity of the graph to a certain extent [2]. However, because screen space is finite, the composite nodes need to be resized depending on the size of the graph.

SHriMP Fish-eye View

The nested graph approach increases the time complexity of the algorithm, as time is spent constructing the composite nodes [2]. The SHriMP fish-eye view incorporates a simple zoom-in and zoom-out technique that can be used for inspecting certain features of the graph. When enlarging, the node under examination grows in size while the other nodes shrink at the same time. The exact opposite is observed while zooming out.

When this algorithm works on a grid or tree layout, nodes that were parallel remain parallel after enlarging.
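
The grow-and-shrink behavior described above can be sketched in one dimension. The following Python fragment is a deliberate simplification and not the published SHriMP algorithm: it assumes the nodes sit in a single row, that the total width stands for the fixed screen space, and that all non-focus nodes shrink by the same ratio. The function name fisheye_widths is hypothetical.

```python
def fisheye_widths(widths, focus, factor):
    """Scale widths[focus] by `factor` and rescale the rest uniformly
    so that the total width (the available screen space) is unchanged."""
    total = sum(widths)
    rest = total - widths[focus]
    grown = widths[focus] * factor
    if rest <= 0 or grown >= total:
        raise ValueError("the focus node would leave no room for the others")
    shrink = (total - grown) / rest
    return [grown if i == focus else w * shrink for i, w in enumerate(widths)]


row = [100.0, 100.0, 100.0, 100.0]
zoomed = fisheye_widths(row, focus=1, factor=2.0)    # zoom in on node 1
restored = fisheye_widths(zoomed, focus=1, factor=0.5)  # zoom back out
print(zoomed)
print(restored)
```

Because every non-focus node is scaled by the same ratio, nodes that started with equal sizes stay equal, which is a one-dimensional analogue of parallel nodes remaining parallel.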

The visualization approach also has limitations. The rendering speed, that is, the speed at which the graph is produced, is slow. This drawback seems innocuous for small graphs but is a serious problem for larger ones [2].

Figure 7 depicts the fish-eye visualization technique incorporated by Rigi [13]. The fish-eye view is used to dig into a node. In other words, fish-eye views are used to study the structure of a particular node.

Figure 7: fish-eye view

2.1.3 Recommendation

A problem often faced while using Rigi is that the user loses track during program discovery. When there are thousands of lines of code, the user gets lost among the numerous nodes [3]. Users may forget their main goal and fail to find the abstractions they set out to explore. They often open several windows of the same view, failing to recognize that these views were already available. In such a scenario a find-and-explore strategy would be more suitable, and a way of emphasizing the relationship between the open windows and the corresponding composite nodes is needed. Some users also misinterpret the parent-child relationships in the overview as call or data dependencies. The reason behind this ambiguity is the similarity between the main and overview windows [6]. The two could be distinguished simply by giving the different window types different background colors.

SHriMP views make the interface slow to respond because of the complex fish-eye view they render. Since SHriMP views are based on direct manipulation, users expecting immediacy are disturbed by the slow response. SHriMP views also present a complex view of the entire code [12]. Exploring such views requires considerable skill and background knowledge of the program's logic, which defeats the purpose of easy program understanding. It is thus desirable to have a filtering approach for these intimidating views. Figure 8 is a complex SHriMP view of IBM's SQL/DS graph in Rigi [15].

In general, both the Multiple Window and SHriMP interfaces have pros and cons. Thus Rigi should include the ability to easily switch between interfaces when reverse engineering a software system [3].

Figure 9 illustrates a hybrid approach to improve the overall functionality of Rigi. The figure shows a call dependency tree rooted at the main function of a small program written in C. The program implements a linked list and contains a node named mylistprintf. In conventional Rigi browsing, a user works through either the SHriMP or the multiple window view. In both cases the user bears the extra effort of maintaining a mental map between the code and the graphical layout. Hybrid strategies such as the one in Figure 9 aid code interpretation by relating the graph directly to the code [5]. The user can zoom into the mylistprintf node to reach its source code. Thus the tool maintains the mental map for the user and reduces the developer's effort.

2.2 Extensibility

Most tools provide a fixed set of features. A frequent user of a tool often wishes for additional features that suit particular needs. A user may notice that the same set of operations is being repeated and would prefer a shorter sequence of steps to carry them out. In another scenario, the user wants to view the graph structures from a different perspective. In short, it is desirable that a tool supports extensibility so that users can tailor it to their own requirements.

Rigi supports a programmable approach to reverse engineering in which user interaction can be customized, which greatly improves the usability of the system. Different users have different cognitive abilities, so a static reverse engineering approach cannot suit everyone.

To meet this goal, a successful reverse engineering environment must provide mechanisms through which users can add functionality. Rather than introducing yet another command language, Rigi exposes its operations through the Rigi Command Library (RCL), which is scripted in Tcl/Tk.

The SHriMP interface is implemented in the Tcl/Tk language and currently has been integrated into the Rigi system [8]. Since SHriMP (through Rigi) is end-user programmable, the layout strategy can be dynamically changed. As a result, extending the Rigi editor with new visualization techniques, such as SHriMP, is feasible.

Figure 8: SHriMP view of IBM's SQL/DS graph in Rigi

Figure 9: Hybrid Visualization Approach

3. Integrated Reverse-Engineering Tool Kit

As the amount of legacy code grows, the importance of understanding it grows as well. Understanding legacy code involves developing mental models of the system's behavior. Several reverse engineering tools assist in understanding legacy code. Each tool has its own predefined structure and environment. The drawbacks of one tool are often addressed in its next version. However, no single tool can be called the "perfect reverse engineering tool" [14].

Researchers have proposed integrating different reverse engineering tools into an environment called RevEngE (Reverse Engineering Environment). RevEngE comprises three such tools, namely Ariadne, Art and Rigi [14].

3.1 Ariadne

The Ariadne tool includes features for pattern matching, design recovery and program analysis. It includes techniques that perform code localization and system clustering [14]. These two terms are explained below.

Code localization involves computing a feature vector for every program fragment of the system and then using pattern matching to find the best match between two code fragments.
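
A minimal Python sketch of matching fragments by feature vector follows. The essay does not state which features or which similarity measure Ariadne uses, so everything here is an assumption made for illustration: the four features, the fragment names, and the choice of cosine similarity as the matching criterion.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


# Hypothetical features per fragment: (statements, calls, branches, loops).
fragments = {
    "parse_header": (12, 4, 3, 1),
    "parse_footer": (11, 4, 3, 1),
    "draw_window": (40, 15, 2, 0),
}


def best_match(name, fragments):
    """Return the other fragment whose feature vector is most similar."""
    target = fragments[name]
    scored = [(cosine_similarity(target, vec), other)
              for other, vec in fragments.items() if other != name]
    score, other = max(scored)
    return other, score


match, score = best_match("parse_header", fragments)
print(match, round(score, 3))
```

A high score only flags a candidate pair; a maintainer would still inspect the two fragments before treating them as the same logic.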

Clustering is achieved by studying the data types and variable names of two program fragments and then suggesting ways to consolidate data so as to avoid redundancy.

The source code that is fed as input to the Ariadne engine is represented in the repository as an Abstract Syntax Tree (AST). Nodes in the AST represent the statements in the source code, while the arcs represent the relationships between the nodes (program statements) [14]. For example, a node corresponding to a switch statement will have n outgoing arcs, where n is equal to the number of switch cases. To study each node in detail, fine-grained analysis techniques such as slicing and metric calculation are applied.
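
The switch example can be written down as a tiny tree structure in Python. This is a hand-built sketch, not Ariadne's repository format: the AstNode class, the node kinds and the helpers make_switch and count_nodes are all hypothetical. The node count stands in for the kind of simple metric that a fine-grained analysis might compute.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class AstNode:
    kind: str                      # e.g. "switch", "case", "call"
    label: str = ""
    children: List["AstNode"] = field(default_factory=list)


def make_switch(selector, cases):
    """Build a switch node with one outgoing arc per case."""
    node = AstNode("switch", selector)
    for value, body in cases:
        node.children.append(AstNode("case", value, [AstNode("call", body)]))
    return node


def count_nodes(node):
    """Count every node in the subtree, a very simple size metric."""
    return 1 + sum(count_nodes(child) for child in node.children)


switch = make_switch("op", [("'+'", "add"), ("'-'", "sub"), ("'*'", "mul")])
print(len(switch.children), count_nodes(switch))
```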

3.2 Art

Art is used to analyze textual redundancy that may appear in legacy code. Art extracts substrings from each local context so as to ensure that redundant matches are not missed due to different generation methods. After collecting substrings from the entire system, Art translates the resulting substrings into a form that can be used for deeper analysis [14].
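
The general idea of detecting textual redundancy can be sketched as follows. This Python fragment is a stand-in technique and not Art's actual algorithm: it assumes that a "substring" is a window of k consecutive non-blank lines, that whitespace differences should be ignored, and that any window seen more than once is worth reporting. The file names and contents are invented.

```python
from collections import defaultdict


def repeated_windows(files, k=2):
    """Map each k-line window (whitespace-normalized) to the places it
    occurs, keeping only windows seen more than once."""
    seen = defaultdict(list)
    for name, text in files.items():
        lines = [" ".join(line.split()) for line in text.splitlines()]
        lines = [line for line in lines if line]  # ignore blank lines
        for i in range(len(lines) - k + 1):
            # i is the index into the non-blank lines of this file
            seen[tuple(lines[i:i + k])].append((name, i))
    return {window: places for window, places in seen.items()
            if len(places) > 1}


files = {
    "a.c": "x = read();\nif (x < 0)   x = 0;\nwrite(x);\n",
    "b.c": "y = 1;\nx = read();\nif (x < 0) x = 0;\n",
}
clones = repeated_windows(files)
for window, places in clones.items():
    print(window, places)
```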

3.3 Rigi

Rigi is incorporated in the RevEngE toolset primarily to provide a visualization interface that can be used to reverse engineer legacy code. Rigi takes the output of Ariadne and Art as its input and generates the final graph [14].

However, thanks to its scripting facility, Rigi can also be used to perform a wide spectrum of other tasks. The RevEngE toolset is one step towards achieving the "perfect reverse engineering tool" [14].

4. Current Practices in Industry

In industry, a tool has to meet certain fixed expectations. The tools used are responsible for producing deliverables that are immediately used by clients. Users therefore expect a tool to meet the following requirements [1]:

bring results within the current release cycle

offer industrial strength robustness and scalability

come with long-term support

integrate into the normal flow of development

provide evidence of success in other industrial projects

Industry works under stringent deadlines. It cannot afford the time needed to guess at the logic behind source code. Manually creating subsystem abstractions and then excavating the original structure behind lengthy legacy code is a demanding process. Instead, a reverse engineering tool should do these tasks.

Unfortunately, most software documentation is not well structured. Documentation in older systems consisted predominantly of inline and block comments. These gave a perspective on the data structures and algorithms used, but it was a narrow perspective that reflected the author's point of view of the code and did not serve well in general [4]. Current industry practice has developed formal methods of documentation that include more intuitive representations of the code than simple inline commentary.

The importance of reverse engineering is therefore increasing in current industrial practice. Reverse engineering has gained a foothold in the early stages of software development in order to support a long life for the source code.

5. Ongoing Research in Rigi

Rigi's programmable environment can be integrated with a popular World Wide Web browser to support hyperstructure hotlists [12]: an approach to managing link complexity, organizing conceptual themes, and aiding Internet navigation through the use of multiple virtual webs.

5.1 Hyperstructure Hotlists

When one attempts to understand a large body of information, the external structure of the system is just as important as the internal structure of each individual object. This holds especially when the system contains a large number of objects.

Identifying such objects, understanding their structure, and integrating them with the World Wide Web is the idea behind hyperstructure hotlists [12].

Hyperstructure hotlists differ from traditional hotlists in that traditional hotlists are used only for comprehending the internal structure of the objects associated with the system [12], whereas hyperstructure hotlists are used for analyzing the structure of the entire system and not just its internal artifacts.

Thus the hyperstructure hotlist is a concept that applies Rigi's reverse engineering techniques to a web browser in order to identify relevant objects associated with the system. Consider the problem that hyperstructure hotlists try to solve. Almost all recent browsers let the user save bookmarks, which store links to websites of interest. This works well when the number of links is small. However, when the number of bookmarks is large, the list becomes unwieldy. This is referred to as the "hyperstructure problem" [12]. Since there is no efficient method for clustering the links, users tend to forget why a particular page appears in the bookmark list (hotlist).

Researchers have proposed a solution that integrates a reverse engineering tool such as Rigi with a web browser [12]. Rigi displays a structured tree where the nodes represent the links and the arcs represent the cross references between different links. Rigi uses coloring techniques to highlight the cross referencing between the leaf nodes and the subnets [12]. Figure 10 shows the hyperstructure hotlists for the bookmarks of a Netscape browser.
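
The graph that such a tool would display can be approximated with a short Python sketch. It assumes that the links found on each bookmarked page are already known; the bookmark names, the link data and both helper functions are hypothetical and do not reflect how the Rigi integration actually gathers its data.

```python
# Hypothetical bookmarks and the links found on each bookmarked page.
links_on_page = {
    "rigi-home": ["shrimp-home", "uvic-cs", "tcl-home"],
    "shrimp-home": ["rigi-home"],
    "tcl-home": [],
    "news-site": ["weather"],
}


def cross_references(links_on_page):
    """Keep only the links that point from one bookmark to another."""
    bookmarks = set(links_on_page)
    return sorted((src, dst)
                  for src, targets in links_on_page.items()
                  for dst in targets
                  if dst in bookmarks and dst != src)


def unreferenced(links_on_page):
    """Bookmarks that take part in no cross reference at all."""
    arcs = cross_references(links_on_page)
    touched = {node for arc in arcs for node in arc}
    return sorted(set(links_on_page) - touched)


print(cross_references(links_on_page))
print(unreferenced(links_on_page))
```

Bookmarks joined by arcs form natural clusters, while isolated ones are the entries a user is most likely to have forgotten the reason for.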

Figure 10: representing hyperstructure hotlists.

Thus monitoring the cross references through a graphical representation is more convenient for a user than analyzing them in textual form.

6. Conclusion

Rigi is an interactive, visual tool designed to help developers better understand and re-document their software. Rigi includes parsers to read the source code of the subject software and produce a graph of extracted artifacts. However, parsing remains the biggest problem for the Rigi tool suite. Better feedback from the parser to the user during parsing and tighter integration with the rest of the Rigi tool suite might help to avoid such problems in the future.