Systematic Literature Review in Software Engineering


This essay has been submitted by a student. This is not an example of the work written by our professional essay writers.

Systematic literature review, also called systematic review, was introduced in software engineering as a research strategy that helps in finding, evaluating and combining the results of relevant empirical studies on a topic of interest [4]. A systematic review produces evidence that helps practitioners adopt appropriate technologies and avoid inappropriate ones, so that software products and/or processes can be improved [17]. This paper provides a general introduction to systematic reviews and recommendations for improving the current systematic review guidelines produced by Kitchenham [1]. By reviewing several related articles in both software engineering and medicine, including guidelines on how to conduct a systematic review, experience reports from conducting systematic reviews, studies that use systematic review as a research methodology and other associated articles, I found that there are still some limitations and unclear issues, especially for inexperienced reviewers. I have therefore tried to include lessons learnt, useful best practices and helpful information that could be valuable for anyone who wants to perform a systematic review.

1. Introduction

Empirical software engineering requires the scientific use of qualitative and quantitative data to understand and improve software products and processes. When conducting empirical studies, a process that guides the steps to be performed and a research strategy are essential elements [11]. After traditional methods such as experiments, case studies and surveys have been used to obtain study outcomes, a systematic literature review (also called a systematic review) can be used to identify the relevant empirical studies and combine their results to provide reliable evidence for researchers on a particular topic of interest.

This paper aims to give an overview of systematic reviews, the process of conducting one according to the existing guidelines, and suggestions for improving those guidelines. The paper is structured as follows: section 1 introduces systematic reviews, explains why they are needed, and discusses how they differ from narrative reviews as well as their pros and cons. Section 2 explains in detail the process of conducting a systematic review according to Kitchenham's guidelines [1]. Since the guidelines still have some limitations and gaps that could be addressed, section 3 describes these limitations and provides suggestions for improvement. Finally, section 4 presents some concluding remarks.

1.1 What is a systematic literature review?

A systematic literature review (SLR) is the methodology that researchers use to identify, aggregate and evaluate all available information relevant to a specific research topic [1, 17]. An SLR is mainly intended to provide an unbiased and systematic approach to answering a structured question focused on a research topic. The individual studies used in a systematic review are called primary studies; the systematic review itself is considered a secondary study [1]. The review process is formal, with strict procedures and a fixed sequence, and each step must be well defined and reproducible by other researchers. The selection criteria for primary studies and the types of acceptable information are defined beforehand and thoroughly reviewed. Primary studies are identified using an explicit search strategy. Because the search methodology and selection criteria are described in a review protocol, they are transparent to the reader, which means other researchers can replicate the review systematically. Finally, the selected primary studies are analyzed and aggregated to form answers to the research questions [1].

1.2 Why are systematic reviews needed?

Systematic reviews were originally introduced in the medical and clinical field as a way to help clinical practitioners find answers to questions relating to their practice. Before systematic reviews were adopted, researchers had difficulties in several areas where traditional reviews could not satisfy their needs. There was no structured way to review primary studies or to ensure that all related evidence had been evaluated [15, 16]. Summarizing and evaluating the knowledge from primary studies was very difficult, partly because each study may use a different design and organize its information differently. The results of a traditional review are also difficult to evaluate when the studies report conflicting results [15, 17]. Readers of a review often doubt the quality of the researchers' work because the review method is not clear and explicit. In addition, because of the broad scope of a traditional review, its results can easily be biased: without well-defined selection guidelines, the reviewer's choices may not be consistent [15]. The systematic review methodology was introduced to address these difficulties.

The systematic review process has increasingly been recognized, and has replaced traditional reviews, in many academic fields including software engineering, since it provides an effective way to summarize and assess research results [17]. Researchers can use a systematic review to provide a background that helps position new research activities, something that was difficult to do with traditional reviews [1]. A systematic review helps reduce reviewer bias because it uses objective and reproducible criteria for selecting primary sources, together with a rigorous assessment of those sources. It also helps researchers combine the results of several small studies to reach more precise and dependable conclusions. Moreover, it helps identify gaps that researchers could investigate further [1].

1.3 Differences between systematic and unsystematic reviews

There are several key differences between systematic reviews and unsystematic (traditional) reviews. This section discusses each of these differences.

First, the two types of review search for primary studies in very different ways. In a traditional review, the search usually covers a wide range of sources, with no strict rules on how to search. In contrast, the search in a systematic review is tightly focused on the topic: researchers must identify the question and predefine search rules that other researchers can reproduce [6, 15].

Second, the processes for selecting primary sources differ. In a systematic review, the selection process must be predetermined: researchers decide which types of resources are acceptable for the review, so that selection is explicit and transparent, and then select primary sources according to these strict rules and criteria. In a traditional review, by contrast, the selection process has no specified criteria and depends on the researchers' experience [1, 17].

Finally, a systematic review applies strict rules to evaluate each selected primary source, whereas evaluation in a traditional review varies with each researcher's methods [1].

1.4 Advantages and disadvantages of systematic reviews

Although the systematic review method offers researchers several advantages, it still has some disadvantages compared with traditional reviews. Researchers need to be aware of these and select the type of review that is appropriate for their situation. This section discusses both the advantages and the disadvantages of systematic reviews.

Advantages:

- One of the major advantages of a systematic review is that it improves the precision and completeness of the result. Because the systematic review process uses a well-defined method to search for and select primary studies, the resulting set of sources is less biased than in a traditional review [1, 6].

- The systematic review process helps researchers determine whether the results of the selected studies are consistent. If they are, they provide a strong answer to the researchers' questions. If not, researchers can identify the gap and study the sources of variation. This is difficult to achieve through a traditional review [1, 17].

- A systematic review can apply statistical techniques (meta-analysis) to combine data from several primary studies, which gives a more precise answer to the researchers' questions than any single primary study. With a traditional review, it is more difficult for researchers to compare and draw conclusions from the results of several primary studies [1, 15].
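As an illustration of the kind of statistical combination meant here, the sketch below pools effect sizes from several primary studies with a fixed-effect, inverse-variance weighted mean, which is one common meta-analysis technique. The effect sizes and variances are hypothetical, and a real synthesis would also need to check heterogeneity between studies (for example with a random-effects model) before pooling.

```python
import math
from typing import Sequence, Tuple


def fixed_effect_pool(effects: Sequence[float],
                      variances: Sequence[float]) -> Tuple[float, float]:
    """Pool per-study effect sizes with inverse-variance weights.

    Returns (pooled_effect, standard_error). Assumes a fixed-effect model,
    i.e. that all studies estimate the same underlying effect.
    """
    if len(effects) != len(variances) or not effects:
        raise ValueError("need one variance per effect and at least one study")
    if any(v <= 0 for v in variances):
        raise ValueError("variances must be positive")
    # More precise studies (smaller variance) receive larger weights.
    weights = [1.0 / v for v in variances]
    total_weight = sum(weights)
    pooled = sum(w * e for w, e in zip(weights, effects)) / total_weight
    # The variance of the pooled estimate is the reciprocal of the total weight.
    return pooled, math.sqrt(1.0 / total_weight)


# Hypothetical effect sizes and variances from three small primary studies.
effects = [0.20, 0.40, 0.30]
variances = [0.01, 0.04, 0.02]
pooled, se = fixed_effect_pool(effects, variances)
print(f"pooled effect = {pooled:.3f}, SE = {se:.3f}")  # pooled effect = 0.257, SE = 0.076
```

The pooled standard error (about 0.076) is smaller than that of the most precise single study (0.1), which is the sense in which combining studies gives a more precise answer.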


Disadvantages:

- The narrow focus and predefined method that are advantages of a systematic review can become disadvantages in some cases, since the process does not allow broad coverage of the evidence on a research topic. Researchers must therefore consider their situation carefully and choose the technique appropriately. In general, a traditional review is more useful than a systematic review if researchers want a broader perspective on their research topic [15].

- A traditional review is also more useful when the topic is very new, since few primary studies will be available; in that case a traditional review covers more of the available information than a systematic review [15].

- Because of its strict process and methodology, a systematic review usually requires more time and effort from researchers than a traditional review [1].

2. Systematic review guidelines

Having first been introduced in medical research, systematic reviews were brought into the software engineering field by B. Kitchenham (2004). Since software engineering research methods are less rigorous and less dependent on experiments than medical research, the systematic review process had to be revised to suit the characteristics of software engineering studies. The guidelines emphasize the differences from medical systematic reviews and show software engineering researchers how to perform a systematic review, specifying the activities involved. In this section, I refer to the guidelines suggested by Kitchenham [1], which divide the systematic review process into three main phases: Planning the review, Conducting the review and Reporting the review. Each phase consists of a sequence of stages. Carrying out each phase involves iteration, feedback and refinement before moving on to the next stage and finally reaching a satisfactory outcome, as illustrated in figure 1. Note that the guidelines also describe some optional stages; I refer only to the stages that are essential.

Fig. 1. Phases in Systematic Review

2.1 Planning the review

The outcome of the first phase is a review protocol: a plan that defines the research questions the review will address and the basic review procedures. The planning phase consists of the following stages:

- Identification of the need for a review

- Specifying the research question(s)

- Developing a review protocol

- Evaluating the review protocol

2.1.1 Identification of the need for a review

Before conducting the review, researchers must explain why a systematic review could answer the research questions or be useful for further research. The need for a systematic review typically arises from the demand to summarize, in a fair way, all existing information about some phenomenon. The aim may be to draw more general conclusions than individual studies allow, or to prepare the ground for further research activities. In particular, Kitchenham notes that researchers should first make sure that a new systematic review is really needed [1], by looking for existing systematic reviews related to the topic of interest. If a suitable review already exists, a new one may not be needed at all; even if it is, an already published systematic review can help in constructing the protocol.

2.1.2 Specifying the research question(s)

This is the most important stage of the systematic review process. The research questions can be seen as the goal of the review, since they drive the whole systematic review process. In particular, the search process aims to identify primary studies that address the research questions, and the data extraction and analysis processes must extract and synthesize the data in a way that answers those questions.

Kitchenham notes that asking the right question is the key issue in any systematic review. She provides guideline questions to help construct the right questions, and discusses the characteristics and types of research questions that are suitable for systematic reviews. To structure the questions in detail, the PICOC (Population, Intervention, Comparison, Outcomes and Context) criteria are used to define the elements of a review question. Kitchenham also discusses the different kinds of experimental design from which acceptable studies can be derived. In particular, she discusses whether systematic reviews in software engineering should accept only primary studies of one particular type.

2.1.3 Developing a review protocol

A review protocol is a concrete plan that details the process and strategy for performing a particular systematic review. A predefined protocol is crucial to minimize the possibility of researcher bias. The protocol contains all the essential elements of the review and some other planning information: background, research questions, planned search strategy, study selection criteria and procedures, quality assessment criteria and procedures, data extraction strategy, data synthesis strategy and project timetable.
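The list of protocol elements lends itself to a simple completeness check while the protocol is being drafted. The sketch below models a protocol as a Python dataclass with one text field per element named above and reports which elements are still empty; the class and field names are hypothetical and are not part of Kitchenham's guidelines.

```python
from dataclasses import dataclass, fields
from typing import List


@dataclass
class ReviewProtocol:
    """Hypothetical record with one field per protocol element."""
    background: str = ""
    research_questions: str = ""
    search_strategy: str = ""
    selection_criteria_and_procedures: str = ""
    quality_assessment_criteria_and_procedures: str = ""
    data_extraction_strategy: str = ""
    data_synthesis_strategy: str = ""
    project_timetable: str = ""

    def missing_elements(self) -> List[str]:
        """Return the names of elements that are still empty or blank."""
        return [f.name for f in fields(self) if not getattr(self, f.name).strip()]


draft = ReviewProtocol(
    background="Why a review of effort estimation studies is needed.",
    research_questions="RQ1 (to be refined during piloting)",
    search_strategy="Digital libraries, search strings and time span.",
)
print(draft.missing_elements())
```

For this draft the check reports the five elements from the selection criteria onwards, which is a reminder that the protocol is not yet ready for evaluation.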

Kitchenham suggests piloting the review protocol during its development to find mistakes in the data search procedures, which helps to improve the review methodology.

2.1.4 Evaluating the review protocol

Because the review protocol is central to the systematic review, it should be evaluated before it is executed. It can be evaluated by asking researchers or experts to review the protocol, and all reviewers must reach agreement on it.

2.2 Conducting the review

This is the execution phase, which follows the plan defined in the review protocol. The final outcomes of the systematic review are generated at the end of this phase. Conducting the review involves the following stages:

- Identification of research

- Selection of primary studies

- Study quality assessment

- Data extraction and monitoring

- Data synthesis

2.2.1 Identification of research

Since a systematic review aims to find as many of the available publications relating to the research question as possible and to draw conclusions in a fair manner, generating the search strategy and dealing with publication bias are the critical issues Kitchenham discusses in this stage. The search strategies defined in the protocol are used to discover the relevant publications. In general, search strategies are developed iteratively, through trial searches using different combinations of search terms and through consultation with relevant experts. Typically, search terms are obtained by breaking the research questions down into individual elements based on the PICOC criteria and then creating a list of synonyms and related words for each element. Another good way to derive search terms is to analyze journal headings. Kitchenham notes that the search strategy should also be designed to find studies reporting negative results, in order to counter publication bias, i.e. the tendency for positive results to be published more often than negative ones. Other major concerns are completeness and repeatability. Kitchenham stresses that the review process must be transparent and replicable: documenting the review, including the search terms, in sufficient detail allows the study to be replicated and lets external readers assess the search.
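The decomposition step can be sketched as follows. Each PICOC element of a hypothetical research question on software cost estimation is mapped to a list of synonyms; elements that the question does not restrict (here, Context) are left empty and skipped. The question, the terms and the function name are illustrative assumptions; in a real review the lists would come out of trial searches and consultation with experts.

```python
from typing import Dict, List

# Hypothetical question: "Is model-based cost estimation more accurate
# than expert judgement in software projects?"
picoc_terms: Dict[str, List[str]] = {
    "Population": ["software project", "software development"],
    "Intervention": ["software cost estimation", "software effort estimation"],
    "Comparison": ["expert judgement", "expert judgment"],  # spelling variants
    "Outcomes": ["accuracy", "estimation error"],
    "Context": [],  # not restricted in this hypothetical question
}


def term_groups(picoc: Dict[str, List[str]]) -> List[List[str]]:
    """Return one group of synonyms per PICOC element that has terms.

    Terms are lower-cased and de-duplicated within a group, keeping their
    order. Synonyms in a group are later OR-ed and the groups AND-ed.
    """
    groups = []
    for terms in picoc.values():
        unique = list(dict.fromkeys(t.strip().lower() for t in terms if t.strip()))
        if unique:
            groups.append(unique)
    return groups


for group in term_groups(picoc_terms):
    print(group)
```

Keeping the synonym lists per element, rather than as one flat list, makes it easy to document the search terms in the protocol and to try different combinations during trial searches.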

2.2.2 Selection of primary studies

The purpose of the selection process is to assess whether the retrieved primary studies are actually relevant to the research questions, so that those providing direct evidence for the review can be identified. This process should follow the plan defined in the protocol. Kitchenham explains that study selection is a multistage process. First, based on the research questions, the researchers define inclusion and exclusion criteria that identify directly relevant studies. These criteria should be piloted to ensure that they are interpreted reliably and correctly. The exclusion criteria are applied first, to remove irrelevant studies, and Kitchenham suggests keeping a record of the excluded publications together with the reason for each exclusion. The inclusion criteria are then applied to the remaining studies. Kitchenham also discusses how to increase the trustworthiness of the process in order to reduce the possibility of bias.
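The multistage selection described above can be sketched as a small filtering routine. It applies the exclusion criteria first, logs every rejected publication with its reason, and then applies the inclusion criteria to what remains; studies that fail an inclusion criterion are logged as well. The `Candidate` fields, the criteria and the data are hypothetical; real criteria usually need human judgement and cannot all be reduced to predicates.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Candidate:
    """A retrieved publication (hypothetical, minimal fields only)."""
    key: str
    title: str
    year: int
    is_empirical: bool


# Each criterion is a (reason, predicate) pair; names are illustrative.
Criterion = Tuple[str, Callable[[Candidate], bool]]


def select(candidates: List[Candidate],
           exclusion: List[Criterion],
           inclusion: List[Criterion],
           ) -> Tuple[List[Candidate], List[Tuple[str, str]]]:
    """Apply exclusion criteria first, then inclusion criteria.

    Returns (selected, log), where log records every rejected publication
    together with the reason for its rejection.
    """
    log: List[Tuple[str, str]] = []
    remaining = []
    for c in candidates:
        reason = next((r for r, hit in exclusion if hit(c)), None)
        if reason is None:
            remaining.append(c)
        else:
            log.append((c.key, reason))
    selected = []
    for c in remaining:
        failed = next((r for r, ok in inclusion if not ok(c)), None)
        if failed is None:
            selected.append(c)
        else:
            log.append((c.key, f"not included: {failed}"))
    return selected, log


papers = [
    Candidate("P1", "An experiment on effort estimation", 2003, True),
    Candidate("P2", "Position paper on estimation", 2002, False),
    Candidate("P3", "Case study of cost models", 1988, True),
]
exclusion = [("published before 1990", lambda c: c.year < 1990)]
inclusion = [("reports empirical data", lambda c: c.is_empirical)]
selected, log = select(papers, exclusion, inclusion)
print([c.key for c in selected])  # ['P1']
print(log)  # [('P3', 'published before 1990'), ('P2', 'not included: reports empirical data')]
```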

2.2.3 Study quality assessment

After the inclusion and exclusion criteria have been applied, it is also important to assess the quality of the selected primary studies. Kitchenham explains why quality assessment matters: for example, it allows researchers to investigate whether differences in quality explain differences between study results, and to weight the importance of each study when synthesizing the results. She also discusses the hierarchy of evidence described in medical guidelines. Based on the assumptions of those guidelines, this hierarchy can be used to restrict the types of study included in a systematic review and provides a basis for an initial quality evaluation. At the top of the hierarchy is evidence from systematic reviews and controlled experiments, which the medical field considers more reliable than evidence at the bottom, such as expert opinion. However, this assumption was later shown not always to hold. Kitchenham then describes how to define and use quality instruments. Detailed quality assessment is usually based on checklists, which are constructed by considering the factors that could bias study results.
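A quality checklist is often turned into a numeric score so that it can be used, for example, to weight studies during synthesis. The sketch below assumes a hypothetical three-point scale (yes = 1, partly = 0.5, no = 0) and a hypothetical set of checklist questions; neither is prescribed by the text above, and a real checklist would be derived from the factors that could bias the results of the study types being reviewed.

```python
from typing import Dict

# Hypothetical checklist; real questions depend on the study types reviewed.
CHECKLIST = [
    "Are the aims of the study clearly stated?",
    "Is the data collection method described?",
    "Are threats to validity discussed?",
]

# Assumed three-point scale for each answer.
SCORES = {"yes": 1.0, "partly": 0.5, "no": 0.0}


def quality_score(answers: Dict[str, str]) -> float:
    """Return the study's quality score as a fraction of the maximum (0..1).

    Raises KeyError if a checklist question has not been answered, and
    ValueError for an answer that is not on the scale.
    """
    total = 0.0
    for question in CHECKLIST:
        answer = answers[question].lower()
        if answer not in SCORES:
            raise ValueError(f"unknown answer {answer!r} for {question!r}")
        total += SCORES[answer]
    return total / len(CHECKLIST)


study_a = {CHECKLIST[0]: "yes", CHECKLIST[1]: "partly", CHECKLIST[2]: "no"}
print(quality_score(study_a))  # 0.5
```

Normalizing by the number of questions keeps scores comparable if the checklist is later extended during piloting.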

2.2.4 Data extraction and monitoring

Once the primary studies have been selected, the next step is to extract the relevant information. The extraction should be performed as defined in the review protocol, which describes the data extraction forms used to collect data from the selected primary studies as well as the extraction procedure. Kitchenham discusses what the data collection form should contain: not only the information needed to answer the review questions and to apply the quality assessment criteria, but also basic information such as the name of the reviewer, the date of data extraction and the publication details. Importantly, the extraction form must be piloted before use. Kitchenham suggests that two or more researchers perform data extraction independently and then resolve disagreements about the data either by consensus or by involving an additional researcher. If each paper cannot be assessed by at least two researchers, some checking technique, such as re-checking a random sample of primary studies, should be used to ensure that the data are extracted correctly.
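When two researchers extract data independently, the first step in resolving disagreements is to find them. The sketch below models a hypothetical extraction form carrying the basic information mentioned above (reviewer, date, publication details) plus a dictionary of extracted items, and lists the items on which two forms for the same publication differ. The field names, the example publication and the item names are assumptions for illustration; the flagged items would then be settled by consensus or by an additional researcher.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Dict, List


@dataclass
class ExtractionForm:
    """Hypothetical data extraction form for one primary study."""
    reviewer: str
    extraction_date: date
    publication: str  # bibliographic details of the primary study
    # Extracted items used for the research questions and quality criteria.
    data: Dict[str, str] = field(default_factory=dict)


def disagreements(a: ExtractionForm, b: ExtractionForm) -> List[str]:
    """List the data items on which two independent extractions differ.

    Items recorded by only one reviewer also count as disagreements.
    """
    if a.publication != b.publication:
        raise ValueError("forms refer to different publications")
    keys = sorted(set(a.data) | set(b.data))
    return [k for k in keys if a.data.get(k) != b.data.get(k)]


form1 = ExtractionForm("Reviewer A", date(2008, 3, 1), "Smith 2005",
                       {"study type": "experiment", "subjects": "students"})
form2 = ExtractionForm("Reviewer B", date(2008, 3, 2), "Smith 2005",
                       {"study type": "experiment", "subjects": "professionals"})
print(disagreements(form1, form2))  # ['subjects']
```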

Monitoring the data is also important at this stage. Kitchenham notes that multiple publications of the same study should not be counted separately in a systematic review, since this can bias the results. It is sometimes necessary to contact the authors to confirm whether publications refer to the same study, and also to obtain missing data or unpublished data that the review needs.
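Spotting candidate duplicate publications can be partly automated before the manual check. The sketch below uses a hypothetical heuristic, not one taken from the guidelines: it flags pairs of publications with the same first author and similar titles, measuring title similarity with Python's standard `difflib.SequenceMatcher`. The 0.8 threshold and the example records are assumptions, and every flagged pair still has to be confirmed by reading the papers or contacting the authors.

```python
from difflib import SequenceMatcher
from itertools import combinations
from typing import List, Tuple

# (key, first author surname, title) for each selected publication.
Publication = Tuple[str, str, str]


def possible_duplicates(pubs: List[Publication],
                        threshold: float = 0.8) -> List[Tuple[str, str]]:
    """Flag pairs with the same first author and similar titles.

    Flagged pairs are only candidates: whether two papers report the same
    study must still be confirmed manually.
    """
    flagged = []
    for (k1, a1, t1), (k2, a2, t2) in combinations(pubs, 2):
        if a1.lower() != a2.lower():
            continue
        similarity = SequenceMatcher(None, t1.lower(), t2.lower()).ratio()
        if similarity >= threshold:
            flagged.append((k1, k2))
    return flagged


# Hypothetical records.
pubs = [
    ("P1", "Jones", "Effort estimation with analogies: an experiment"),
    ("P2", "Jones", "Effort estimation with analogies: a replicated experiment"),
    ("P3", "Lee", "Effort estimation with analogies: an experiment"),
]
print(possible_duplicates(pubs))  # [('P1', 'P2')]
```

Note that P2 may well be a genuine replication rather than a duplicate report, which is exactly why the heuristic only flags pairs instead of removing them.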

2.2.5 Data synthesis

Data synthesis aims to collate and summarize the data extracted from the selected primary studies. As with the other stages, the activities to be performed should be defined in the review protocol. Kitchenham discusses various options for combining data from different types of study and suggests performing sensitivity analyses to find out how the synthesis results are affected when some studies are of higher quality than others, for example by repeating the synthesis using only the higher-quality studies.
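A simple form of sensitivity analysis repeats the synthesis with and without the lower-quality studies and compares the results. The sketch below assumes a fixed-effect, inverse-variance pooled mean as the synthesis method and a quality score between 0 and 1 for each study; the data and the 0.5 threshold are hypothetical.

```python
from typing import List, Tuple

# (effect size, variance, quality score in 0..1) per study.
Study = Tuple[float, float, float]


def pooled_effect(studies: List[Study]) -> float:
    """Fixed-effect, inverse-variance weighted mean of the effect sizes."""
    weights = [1.0 / var for _, var, _ in studies]
    return sum(w * eff for w, (eff, _, _) in zip(weights, studies)) / sum(weights)


def sensitivity(studies: List[Study], min_quality: float) -> Tuple[float, float]:
    """Return (result with all studies, result with higher-quality studies only)."""
    high_quality = [s for s in studies if s[2] >= min_quality]
    if not high_quality:
        raise ValueError("no study meets the quality threshold")
    return pooled_effect(studies), pooled_effect(high_quality)


# Hypothetical data: the third study has a large effect but low quality.
studies = [(0.30, 0.02, 0.9), (0.25, 0.03, 0.8), (0.90, 0.02, 0.3)]
all_studies, high_only = sensitivity(studies, min_quality=0.5)
print(f"all studies: {all_studies:.2f}, high quality only: {high_only:.2f}")
```

Here the pooled effect falls from about 0.51 to 0.28 when the low-quality study is removed, which would signal that the overall conclusion depends heavily on that single study.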

2.3 Reporting the review

The purpose of this last phase is to write up the results of the review. The guidelines describe three main stages in this phase:

- Specifying dissemination mechanism

- Formatting the main report

- Evaluating the report

The final report must not only answer the review questions; the researchers also need to specify a dissemination strategy so that the results reach their audience efficiently. Kitchenham presents seven mechanisms for disseminating the results of systematic reviews:

a) Academic journals and/or conferences

b) Practitioner-oriented journals and/or magazines

c) Press Releases to the popular and specialist press

d) Short summary leaflets

e) Posters

f) Web pages

g) Direct communication to affected bodies

The results are generally documented in one of two forms: a technical report, or a conference or journal paper. Once written, the report must be evaluated; Kitchenham discusses evaluation techniques for each type of report, one effective technique being peer review. The structure and contents of the reports are given in Kitchenham's guidelines [1].

3. Improvement suggestions on Systematic review guidelines

This section presents limitations of, and suggestions for improving, each step of the systematic review guidelines for software engineering. The recommendations are based on lessons learned and experiences reported in various articles that use systematic review as their literature review technique.

The study reported in [7] reveals that a significant reason for the publication of low-quality systematic reviews is that some people conducting them do not understand exactly what a systematic review is or how to perform it. As a result, their reviews lack clear research questions, an explicit search strategy and so on.

The following suggestions should always be kept in mind before starting and while performing a systematic review [7, 8]:

- Study the guidelines thoroughly, e.g. Kitchenham's guidelines [1].

- Review several example SLRs and experience reports to help you understand the process.

- Make sure you understand each step and can justify everything you do.

- Record as many of the decisions made while conducting the systematic review as possible, since this information will be needed when writing the final report.

The authors of [2] suggest training the reviewers in systematic reviewing as the very first step, to familiarize them with the specific terminology of the area in which they will conduct the review. This also helps the reviewers better understand the review process and its activities.

3.1 Planning the review

During this phase, the main activities are specifying the purpose of conducting a systematic review, formulating the research questions, and developing and evaluating a review protocol.

3.1.1 Identification of the need for a review

To produce a clear statement of the objective of the review, researchers should use a checklist to help identify the reasons for the review and confirm that it is needed. Several useful checklists are provided in Kitchenham's guidelines [1].

In addition to identifying the rationale for their review, Staples and Niazi [3] collected information from case studies, surveys and reports to check whether their intended research questions could be answered by a systematic review. They found this very useful, since it showed them which research questions were common and which were uncommon. Researchers sometimes believe a question is typical and answerable from existing research, but in later phases, when they search for related literature, they find that it is very uncommon and have to discard it.

Another crucial point is that researchers should try to identify existing systematic reviews related to their topic of interest in order to avoid conducting a duplicate review. However, published systematic reviews are much harder to find in software engineering than in medicine, since software engineering has no powerful scientific database collecting systematic reviews of empirical studies comparable to the Cochrane Library, which stores a large number of systematic reviews of medical research. Although several services currently provide access to software engineering publications, these publications still have the following limitations [17]:

- The available studies are limited and poorly integrated, since many researchers in this field focus on generating results in their own style rather than following a structured review process.

- It is difficult to combine the results of software engineering reviews because their quality is highly variable and there is no agreed standard for systematic reviews in this field.

- There are no well-accepted guidelines: although some guidelines have been proposed, they neither address all the necessary topics nor provide sufficient detail.

3.1.2 Specifying the research question(s)

As mentioned before, the research questions are specified as part of the review protocol and are used to construct the search strings for finding related primary studies. They are usually revised repeatedly while the review protocol is piloted, but should not be changed once the protocol has been finalized.

The most important point when formulating the research questions is to make them as clear and concrete as possible. Besides structuring each question using the PICOC criteria described in [1], it is essential to state the rationale for each question. Tabulating each question and its purpose, as in table 1, can be helpful.

Table 1. Research question

Research question Purpose



Brereton et al. [4] recommend that, during protocol construction, researchers should expect to refine their research questions, both to improve their own understanding and to make the automated search more effective. Several systematic reviews, for example the 'Systematic literature review of guidelines for conducting systematic literature reviews' in [4], first define a few research questions and later, after investigating some information sources, extend them into more detailed questions.

According to [5], research questions are not only questions that the review needs to answer; they can also be questions that provide insight into the topic area for better comprehension. Staples and Niazi [3] confirm this: their research questions were part of a larger research project, and performing the systematic review on these questions helped them better understand the project background. Selecting clear and narrow research questions also helps to confine the scope of a systematic literature review.

Brereton et al. [4] propose another method that can help to scope the research questions: a systematic pre-review mapping study. The idea is to map out the kinds of studies that have been conducted in relation to the systematic review question. The mapping process can be considered a quick data extraction in which the studies are not described in much detail. Further information about mapping studies can be found in [12].

3.1.3 Developing a review protocol

As explained in section 2, the protocol describes the plan for conducting the review, including, for example, the procedures to be performed, the search strategy for selecting primary studies, the allocation of reviewers to specific activities and the quality assessment criteria for evaluating primary studies. Without a protocol, processes such as the selection of primary studies or data analysis and synthesis may be driven by researcher bias [14]. And because repeatability is one of the key features of a systematic review, a well-documented review protocol is needed to guarantee that the review can be reproduced.

Experience with systematic reviews shows that developing a review protocol is an iterative process that needs several revisions before the protocol is complete. Researchers should therefore expect the protocol to change, expect the process to take a long time, and allocate time accordingly.

Brereton et al. [4] suggest that all members of the systematic review team participate actively in constructing the review protocol, which gives all of them insight into the protocol and helps them understand the data extraction process.

Piloting the review protocol is also strongly recommended, not only because it helps uncover misunderstandings and mistakes in the data extraction and aggregation process, but also because it may show that the researchers need to change the method planned for addressing the research questions.

As mentioned above, the search strategy must be documented in the protocol, enabling readers of the review to assess how accurate and complete it is. However, because existing software engineering search engines do not support systematic reviews as well as those in medicine, software engineering researchers have to carry out resource-dependent searches. In particular, they may have to use a different search string for each searchable source, since sources have different search forms and search syntaxes [4].

The following are recommendations for improving the search strategy:

- Searches should be applied to titles as well as abstracts [6]. However, before deciding to accept or reject a primary study, it is very useful to examine its summary and content as well, since in software engineering titles and abstracts are often not very indicative of content [6] and are not reliable enough on their own for selecting primary studies [4].

- A search strategy should be built from multiple keywords, and it is essential to try various combinations of terms to obtain a successful search [6].

- Using a more standardized vocabulary improves the search results [6].

- For some software engineering topics, publications in related fields should also be searched, e.g. information systems, psychology, economics, quality and artificial intelligence [6].

- To retrieve as many relevant publications as possible, synonyms of the main search term should also be searched [6].

- When searching by study type, also search for synonyms of that study type [6].

- Adding more general terms to the synonyms of the key search term is likely to find more relevant studies. However, it also tends to increase the number of irrelevant articles, so it should only be used when sufficient review resources are available to detect and reject irrelevant articles [6].

- It is not necessary to search all available fields, because doing so does not noticeably improve the results and requires considerable effort [6].

- Construct search strings using the Boolean operator AND to link the key terms and OR to group synonyms [4]. An example could be the following:

(experiment OR "empirical study") AND ("software cost estimation" OR "software effort estimation")

- Include the search fields that typically contain the key terms of the study type, such as the title and abstract [6].

- To obtain the key search terms, break down the research questions into individual terms relating to the type of study, the technology of interest, and the response being measured [4].

- After deriving the key search terms, use various combinations of them to perform trial searches [6].

- Specifying the range of publication years in the search string can help reduce the number of irrelevant articles [4].

- Be careful when using the "Basic" or "Advanced" search forms, because some search engines produce different results from them even when the search terms are the same [3].
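The AND/OR rule for building search strings can be expressed as a small helper that turns groups of synonyms into a query. This is a minimal sketch: the function name `build_search_string` is hypothetical, and the convention of wrapping multi-word phrases in double quotes is an assumption, since each search engine has its own syntax [4] and the output will usually need adapting per source.

```python
def build_search_string(term_groups: list[list[str]]) -> str:
    """Join synonyms with OR inside each group, and the groups with AND.

    Assumption: double quotes mark an exact phrase, which many but not all
    search engines accept.
    """
    def quote(term: str) -> str:
        # Quote multi-word phrases so they are matched as a whole.
        return f'"{term}"' if " " in term else term

    groups = ["(" + " OR ".join(quote(t) for t in synonyms) + ")"
              for synonyms in term_groups]
    return " AND ".join(groups)


query = build_search_string([
    ["experiment", "empirical study"],
    ["software cost estimation", "software effort estimation"],
])
print(query)
```

Run as written, this reproduces the example string given in the recommendations above, which makes it easy to regenerate the query consistently when synonyms are added during trial searches.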

Some search resources cover several fields, each using different terminology. Some search terms, particularly acronyms, may return a very large number of articles from other fields; this can exceed the resource's limits, so that the returned list of publications is truncated [3].

The search results must be assessed. One popular and useful approach is to pick some papers at random from the list returned by the search and apply the selection criteria to them. The search criteria and keywords can be considered good enough if some of the sampled papers would be selected under the selection criteria; otherwise, refine the search criteria and query strings.

Another way to check the completeness of the search strings is to use primary studies that are already known to be relevant. If the search detects many of the known primary studies, the search keywords are good enough; if not, the search criteria and keywords have to be refined.
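Both checks can be made concrete. The sketch below draws a reproducible random sample of search hits for manual screening and measures what fraction of the known relevant studies the search retrieved. The paper identifiers are hypothetical, and the recall threshold `RECALL_TARGET` is an assumption for illustration; the guidelines discussed here do not prescribe a value.

```python
import random


def sample_for_screening(results: list[str], k: int, seed: int = 0) -> list[str]:
    """Draw a reproducible random sample of search hits to screen by hand."""
    rng = random.Random(seed)
    return rng.sample(results, min(k, len(results)))


def known_study_recall(results: list[str], known: set[str]) -> float:
    """Fraction of already-known relevant studies that the search retrieved."""
    if not known:
        raise ValueError("need at least one known relevant study")
    return len(known & set(results)) / len(known)


# Hypothetical identifiers; in practice these might be DOIs.
results = ["S1", "S2", "S3", "S5", "S8"]
known = {"S1", "S2", "S4", "S5"}
RECALL_TARGET = 0.8  # assumed threshold, not taken from the guidelines

recall = known_study_recall(results, known)
if recall < RECALL_TARGET:
    missed = sorted(known - set(results))
    print(f"recall {recall:.2f}: refine the search; missed {missed}")
```

Here three of the four known studies are found, so the script reports a recall of 0.75 and lists S4 as missed, which signals that the keywords should be refined. Looking at why a known study was missed often suggests the synonym that needs to be added.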

3.1.4 Evaluating the review protocol

Since the review protocol is the key element in conducting a systematic review, it needs to be assessed. In addition to peer review by internal team members, the following suggestions may help in evaluating the protocol:

- Seek independent assessment and validation from an external reviewer, preferably an expert in the related field who has experience with systematic reviews.

- Share the protocol within the community to get more ideas and suggestions [7].

- Piloting the data extraction process and the form specified in the protocol is a good way to evaluate the protocol, and it also helps in performing the data extraction itself.

3.2 Conducting the review

Before carrying out the review, it is beneficial to validate the search strategy and to pilot the study selection. This is particularly necessary when you are not familiar with the topic of the systematic review.

During the review process, [8] advises recording as much information about the search and selection process as possible, since this information is needed to write the final report: for example, the search strategy used, the databases searched, the date coverage of each database, the order of precedence used when removing duplicate publications, and the number of duplicates removed.
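One way to make this record keeping systematic is to log each search in a structured record and to remove duplicates with an explicit precedence order, so that the number of duplicates removed is counted rather than estimated. This is a minimal sketch: the class names, fields, dates, and database names below are hypothetical examples, and papers are assumed to be identified by a shared key such as a DOI.

```python
from dataclasses import dataclass, field


@dataclass
class SearchRecord:
    database: str
    search_string: str
    date_run: str      # date the search was executed (ISO format)
    coverage: str      # date coverage reported by the database
    hits: list[str]    # identifiers of retrieved papers, e.g. DOIs


@dataclass
class SearchLog:
    precedence: list[str]  # databases in the order they win when removing duplicates
    records: list[SearchRecord] = field(default_factory=list)
    duplicates_removed: int = 0

    def deduplicate(self) -> dict[str, str]:
        """Map each unique paper id to the database its record is kept from."""
        kept: dict[str, str] = {}
        self.duplicates_removed = 0
        # Every logged database must appear in `precedence` (index() raises otherwise).
        ordered = sorted(self.records, key=lambda r: self.precedence.index(r.database))
        for rec in ordered:
            for paper_id in rec.hits:
                if paper_id in kept:
                    self.duplicates_removed += 1
                else:
                    kept[paper_id] = rec.database
        return kept


log = SearchLog(precedence=["IEEE Xplore", "ACM Digital Library"])
log.records.append(SearchRecord("ACM Digital Library", '"effort estimation"',
                                "2010-03-01", "1985-2010", ["d1", "d2"]))
log.records.append(SearchRecord("IEEE Xplore", '"effort estimation"',
                                "2010-03-01", "1988-2010", ["d2", "d3"]))
kept = log.deduplicate()
print(kept, log.duplicates_removed)
```

Because the log stores the search string, run date, and coverage alongside the hits, most of the items listed above for the final report can be read directly from it.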

3.2.1 Identification of research

In this execution phase, the researchers perform the search according to the search strategy defined in the protocol. The search strategy can vary depending on how restricted or complete the results need to be. In some cases, the researchers require only articles published in a certain period and from specific sources, so the search strategy has to specify the archives to be used and the years to be searched. In other cases, the researchers need to obtain as complete a set of articles as possible; a restricted search cannot be used, and the search strategy is different.

The terminology used in software engineering still lacks standardization, which hinders effective searching. The review team should therefore search several sources, since it is impossible to find all primary studies in a single one. For example, [9] recommends that manual searches of software engineering journals are as necessary as automated searches of digital libraries in order to obtain the most complete set of papers relevant to software cost estimation. Nevertheless, manual search has limitations that need to be taken into account, e.g. the search effort required and the possibility of missing some relevant studies.

Keep in mind that the search terms defined in the protocol might not work with all search engines, because each source is built differently. Brereton et al. [4] found that in some search engines, the interpretation of a Boolean search string depends on the order of the terms but not on brackets. It is therefore acceptable to use a different set of keywords for each search engine, each derived from the terms initially specified in the protocol.

3.2.2 Selection of primary studies

The purpose of this stage is to select the papers relevant to the topic of interest by examining the candidate primary studies collected in the previous stage. Typically, the process is as follows:

1. At least two researchers first review the studies found by the initial searches in order to reject irrelevant papers. A paper should be retained if the researchers cannot agree on rejecting it.

2. The remaining papers are then reviewed against the inclusion/exclusion criteria, and any disagreement has to be resolved.
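The two-stage decision rule above can be sketched as follows. In the first stage a paper is dropped only when both reviewers reject it; in the second stage, papers on which the reviewers' inclusion decisions differ are flagged for resolution. The function names and paper identifiers are hypothetical, and both reviewers are assumed to have screened the same set of papers.

```python
def stage_one(keep_a: dict[str, bool], keep_b: dict[str, bool]) -> set[str]:
    """Papers retained after initial screening.

    True means the reviewer thinks the paper may be relevant. A paper is
    rejected only when both reviewers reject it, so any disagreement keeps it.
    Assumes both reviewers screened the same papers.
    """
    return {p for p in keep_a if keep_a[p] or keep_b[p]}


def stage_two_disagreements(kept: set[str], include_a: dict[str, bool],
                            include_b: dict[str, bool]) -> set[str]:
    """Papers whose inclusion/exclusion decisions differ and must be resolved."""
    return {p for p in kept if include_a[p] != include_b[p]}


kept = stage_one({"P1": True, "P2": False, "P3": False},
                 {"P1": False, "P2": False, "P3": True})
to_resolve = stage_two_disagreements(kept, {"P1": True, "P3": False},
                                     {"P1": True, "P3": True})
print(sorted(kept), sorted(to_resolve))
```

Retaining a paper on disagreement errs on the side of recall: a wrongly kept paper costs one more full-text review, while a wrongly rejected one is lost from the review.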

Because the results of the same empirical study may be reported in more than one article, reviewers should check carefully for duplicated empirical studies. Another way to identify empirical studies in software engineering is to analyze the bibliographic references of other empirical studies by means of a keyword search, for example [10]:

(elicitation OR "requirements gathering" OR "requirements acquisition") AND

(capture OR empirical OR experiment OR study OR review OR evaluation)

Additionally, highlighting relevant statements during the selection process makes data extraction faster and easier.

3.2.3 Study quality assessment

To evaluate the quality of each selected study, one basic approach is to use quality criteria or a checklist. One popular set of criteria for quality assessment is the one proposed by DARE (the CRD Database of Abstracts of Reviews of Effects), which is already included in Kitchenham's guidelines [1]. It consists of four basic questions:

1. Is the literature search likely to have covered all relevant studies?

2. Are the inclusion and exclusion criteria described appropriately?

3. Did the reviewers assess the quality/validity of the included studies?

4. Are the basic data adequately described?

Each paper should be assessed by at least two reviewers, and each question is scored according to whether the study presents clear, unambiguous findings based on evidence. Any disagreements have to be discussed until agreement is reached. For questions that are hard to score or whose answer is unknown, try contacting the authors of the paper for further information so that the question can be re-scored properly.
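The scoring procedure can be sketched for one paper and two reviewers. The scale used here (Y = 1, P = 0.5 for "partly", N = 0, and U for unknown) is an assumption chosen for illustration; the text above only says that each answer is scored. The sketch withholds a score while any question is disputed or unknown, and reports which questions need discussion and which need a query to the authors.

```python
# Assumed scale, not taken from the text above: Y = yes, P = partly, N = no, U = unknown.
SCORES = {"Y": 1.0, "P": 0.5, "N": 0.0}


def score_paper(answers_a: list[str], answers_b: list[str]):
    """Combine two reviewers' answers to the four DARE questions for one paper.

    Returns (score, to_discuss, to_ask_authors) with 1-based question numbers.
    The score stays None until the reviewers agree on every question and no
    agreed answer is unknown.
    """
    if len(answers_a) != len(answers_b):
        raise ValueError("both reviewers must answer every question")
    pairs = list(zip(answers_a, answers_b))
    to_discuss = [q + 1 for q, (a, b) in enumerate(pairs) if a != b]
    to_ask_authors = [q + 1 for q, (a, b) in enumerate(pairs) if a == b == "U"]
    if to_discuss or to_ask_authors:
        return None, to_discuss, to_ask_authors
    return sum(SCORES[a] for a in answers_a), [], []


print(score_paper(["Y", "P", "N", "Y"], ["Y", "P", "N", "Y"]))
print(score_paper(["Y", "P", "U", "Y"], ["Y", "Y", "U", "Y"]))
```

The first call yields a score of 2.5 out of 4; the second returns no score, question 2 for discussion, and question 3 to raise with the authors before re-scoring.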

The complete checklists for assessing the quality of quantitative studies and of qualitative studies are provided in Kitchenham's guidelines [1].

After the studies have been assessed, it is essential to validate the results. Two basic validation exercises can be performed:

1. Inter-rater reliability: the inter-rater reliability test can be run on the papers found in the initial search. One group of primary researchers examines each paper in detail to identify the relevant papers that should be accepted. Another group of independent researchers then examines a random selection of accepted and rejected papers, and the two groups cross-check whether they agree on those papers. A high level of agreement significantly increases confidence in the acceptance/rejection decisions [2, 13].

2. Independent assessment: an independent expert in the topic of interest verifies how each paper addresses the research questions. This method is generally performed as a final validation of the set of accepted papers. Any disagreements between the primary researchers and the independent expert must be discussed. If some papers still cannot be agreed on, a third arbitrator is needed to judge whether they should be selected.
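The text above speaks of a "level of agreement" without naming a statistic. One common choice, assumed here rather than taken from the cited sources, is Cohen's kappa, which corrects the observed agreement for the agreement two raters would reach by chance. The sketch computes it for accept/reject decisions on the same randomly selected papers; the decision lists are hypothetical.

```python
def cohens_kappa(rater_a: list[bool], rater_b: list[bool]) -> float:
    """Cohen's kappa for two raters making accept (True) / reject (False) decisions."""
    n = len(rater_a)
    if n == 0 or n != len(rater_b):
        raise ValueError("need two equally long, non-empty decision lists")
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    p_a = sum(rater_a) / n  # proportion accepted by the primary researchers
    p_b = sum(rater_b) / n  # proportion accepted by the independent researchers
    expected = p_a * p_b + (1 - p_a) * (1 - p_b)  # agreement expected by chance
    if expected == 1:
        # Both raters used a single category throughout; kappa is undefined,
        # and this sketch treats it as perfect agreement by convention.
        return 1.0
    return (observed - expected) / (1 - expected)


# Hypothetical decisions on ten randomly selected papers.
primary = [True, True, True, False, False, False, True, False, True, False]
independent = [True, True, False, False, False, False, True, False, True, True]
print(round(cohens_kappa(primary, independent), 2))
```

In this example the groups agree on 8 of 10 papers, but because half of the agreement would be expected by chance, kappa comes out at about 0.6. Reporting kappa alongside raw agreement makes the confidence claim easier to judge.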

3.2.4 Data extraction and monitoring

Once the data extraction strategy and forms are defined, they should be piloted before actual use in order to reduce the possibility of bias [4]. There are two fundamental approaches to piloting data extraction:

1. The medical standard recommends that each of two reviewers extract data from an article independently, after which the two extraction forms are compared. Any disagreements have to be discussed and resolved.

2. Brereton et al. (2005) suggest a second method, in which a single reviewer extracts the data and another reviewer checks the extraction. They [4] found this method somewhat faster than the first, so it may be helpful when there is a large number of primary studies to review. However, Kitchenham [5] refers to a systematic literature review which suggests that this method can cause problems when the data are complex or there is a large number of primary studies.
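Whichever approach is used, the comparison step amounts to finding the fields on which two versions of an extraction form differ: two independent forms in the first method, or the extractor's form and the checker's corrected copy in the second. A minimal sketch, assuming each form is stored as a dictionary from field name to extracted value; the field names are hypothetical.

```python
def compare_forms(form_a: dict[str, str], form_b: dict[str, str]) -> dict[str, tuple]:
    """Return the fields whose values differ between two extraction forms.

    A field missing from one form is reported with None for that form.
    """
    all_fields = form_a.keys() | form_b.keys()
    return {f: (form_a.get(f), form_b.get(f))
            for f in sorted(all_fields) if form_a.get(f) != form_b.get(f)}


form_a = {"study_type": "experiment", "subjects": "32 students", "outcome": "MMRE"}
form_b = {"study_type": "experiment", "subjects": "30 students", "outcome": "MMRE",
          "threats": "small sample"}
diffs = compare_forms(form_a, form_b)
for name, (a, b) in diffs.items():
    print(f"{name}: {a!r} vs {b!r}")
```

Each reported field is a disagreement to discuss and resolve; a high number of differences during piloting suggests the form or its instructions need revising before full extraction starts.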

To record the information derived from the selected studies correctly, the following practices are advisable:

- Follow the procedures described in the protocol, and ask for clarification if the guidelines are unclear or you do not understand them.

- Do not let any weaknesses in the instructions pass on to other reviewers, as they could cause the extraction to be performed incorrectly.

- Use reference management software designed to handle the bibliographic details of each study, such as EndNote.

- Classifying papers into categories according to their properties helps in the data analysis and aggregation phase.

- Record how each study addresses the research questions in a separate form.

- To summarize the data properly, be prepared to refine the data model during data extraction, where doing so is not too difficult [3].

- Watch out for duplicate papers: they report the results of the same study even though their titles or abstracts differ [5]. Keep only the most complete one and discard the rest.

- Do not hesitate to contact the authors of a study for further information if some data are missing or more detailed data are required.
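Several of these practices (classifying papers, recording how each addresses the research questions, and keeping only the most complete of several papers that report the same study) can be supported by a simple extraction record. Because duplicate papers may have different titles and abstracts, this sketch does not try to detect them automatically; it assumes reviewers have tagged each paper with a hypothetical `study_id` for the underlying study, and it measures completeness simply as the number of non-empty extracted fields. All class, field, and identifier names are illustrative.

```python
from dataclasses import dataclass, fields


@dataclass
class Extraction:
    paper_id: str
    study_id: str         # hypothetical: assigned by reviewers to the underlying study
    category: str = ""    # classification used later in analysis and aggregation
    rq_answers: str = ""  # how the paper addresses the research questions
    results: str = ""

    def completeness(self) -> int:
        """Number of non-empty extracted fields (identifiers excluded)."""
        return sum(bool(getattr(self, f.name)) for f in fields(self)
                   if f.name not in ("paper_id", "study_id"))


def keep_most_complete(records: list[Extraction]) -> list[Extraction]:
    """Keep one record per study: the most complete (the first one wins ties)."""
    best: dict[str, Extraction] = {}
    for rec in records:
        current = best.get(rec.study_id)
        if current is None or rec.completeness() > current.completeness():
            best[rec.study_id] = rec
    return list(best.values())


records = [
    Extraction("P1", "S1", category="experiment"),
    Extraction("P2", "S1", category="experiment", results="fewer defects"),
    Extraction("P3", "S2", rq_answers="RQ1"),
]
kept = keep_most_complete(records)
print([r.paper_id for r in kept])
```

Counting filled fields is only a rough proxy for "most complete"; reviewers may prefer to judge completeness by hand and use the record only to document the decision.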

3.2.5 Data synthesis

In this stage, the data extracted in the previous stage are combined in an appropriate way to answer the research questions. In medicine, most systematic reviews focus on formal meta-analysis of quantitative data, whereas in IT and software engineering they tend to be purely qualitative [4]. However, Kitchenham's guidelines [1] are mostly written at a comparatively high level and do not state clearly how much should be done in the data synthesis process. It would therefore be advantageous for the guidelines to give more advice on categorizing and summarizing qualitative data, and to include more information on statistical methods for quantitative analysis and further details on how to perform meta-analysis.

Several methods for data aggregation exist in the literature, such as content analysis, meta-ethnography, meta-analysis, case surveys, and vote-counting techniques [2]. The final outcome of the synthesis is a set of pieces of knowledge answering each review question, which together form the final results of the systematic literature review.

One basic approach to data synthesis is to record the data in tabular form; the researchers should also make clear how the aggregated data actually address the research questions.
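Vote counting, one of the techniques named above, fits naturally with a tabular record of the extracted data: for each research question, count how many studies report a positive, negative, or no effect. The sketch below assumes a hypothetical extraction table of (study, research question, outcome) rows with three outcome labels chosen for illustration. Vote counting ignores sample sizes and effect sizes, so such a table summarizes the direction of the evidence rather than its strength.

```python
from collections import Counter

OUTCOMES = ("positive", "negative", "no effect")  # assumed outcome labels

# Hypothetical extraction table: (study id, research question, reported outcome).
rows = [
    ("S1", "RQ1", "positive"),
    ("S2", "RQ1", "positive"),
    ("S3", "RQ1", "no effect"),
    ("S1", "RQ2", "negative"),
    ("S4", "RQ2", "positive"),
]


def vote_count(rows: list[tuple[str, str, str]]) -> dict[str, Counter]:
    """Tally the reported outcomes per research question."""
    table: dict[str, Counter] = {}
    for _study, rq, outcome in rows:
        table.setdefault(rq, Counter())[outcome] += 1
    return table


def print_table(table: dict[str, Counter]) -> None:
    """Print one row per research question and one column per outcome."""
    print(f"{'RQ':<6}" + "  ".join(f"{o:>9}" for o in OUTCOMES))
    for rq in sorted(table):
        print(f"{rq:<6}" + "  ".join(f"{table[rq][o]:>9}" for o in OUTCOMES))


table = vote_count(rows)
print_table(table)
```

Keeping the study identifiers in the source rows, rather than only the counts, lets readers trace each tally back to the primary studies that support it.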

3.3 Reporting the review

Writing the review report is not an easy task. Although the review protocol can be used as a source for the final report, as suggested by the medical guidelines [1], a common problem is the lack of necessary record keeping during the review process. In that case, the review team members may have to decide jointly on what should be written [4]. To prevent this problem, it is advisable for the review team to keep detailed, formal records of the information produced throughout the review process.

Brereton et al. [4] note that 'The software engineering community needs to establish mechanisms for publishing systematic literature reviews which may result in papers that are longer than those traditionally accepted by many software engineering outlets or that have appendices stored in electronic repositories'. I fully agree with this, because another common problem in writing the report is that researchers are forced to fit the entire report within a specific length limit. It would therefore be useful to publish some details of a review on a trusted website.

4. Conclusion

This paper presented an introduction to the systematic literature review as a methodology for synthesizing empirical studies in software engineering. It explained the motivation for performing a systematic review, and it described the differences between conducting a systematic review and an unsystematic one, as well as the advantages and disadvantages of systematic reviews. Most importantly, it pointed out the limitations and problems found in the existing guidelines and in many systematic review articles. The suggestions provided, which form the main part of the paper, are intended to help people, especially novices who would like to conduct a systematic review, better understand the review process, leading to effective implementation and, ultimately, reliable research results.