The checkpoint involved writing code for branch prediction using a single-bit predictor, a two-bit predictor, a two-level GAg predictor and a two-level PAg predictor. The report explains the algorithms used to implement these techniques and analyses the results obtained from each.
The benchmarks used in this checkpoint are 401.bzip2, 403.gcc, 429.mcf and 445.gobmk. Bzip2 is a benchmark in the compression category. Each input set is compressed and decompressed at three different blocking factors ("compression levels"), with the end result of the process being compared to the original data after each decompression step. GCC is a C language optimizing compiler, and in the benchmark GCC executes different instructions. The 429.mcf benchmark, written in C, is derived from MCF, a vehicle scheduling program for public mass transportation. The Gobmk benchmark is based on artificial intelligence and is derived from the game Go. This benchmark executes instructions related to analyzing a game in an arbitrary position.
Algorithms for prediction techniques:
The algorithms for the single-bit and two-bit predictors are largely the same. The only difference is that the two-bit predictor uses a saturating counter to decide whether the next branch will be taken, whereas the single-bit predictor bases the decision on the result of the previous execution of that branch.
The Branch Target Buffer (BTB) has 1024 entries, as given in the assignment. Along with the instruction address, each BTB entry contains a valid bit and bits that count how many times the entry's instruction address has been replaced.
The GAg predictor contains an 8-bit history register holding the outcomes of the previous 8 branches. This branch history register is global. The pattern history table has 256 entries, each updated by a saturating 2-bit counter (as in the two-bit predictor). A branch is predicted using the state of the saturating counter in the history table entry indexed by the current value of the branch history register.
The PAg predictor contains a global history table of 4096 entries. The history register is not global; instead there is a per-address history register for each branch address. The history table is updated by saturating counters. The value in the current branch's history register selects the history table entry used to predict the outcome of the executing branch.
Description of Programs:
Single bit predictor- The program for the single-bit predictor is written in the file onebit.cpp. The Branch Target Buffer, represented by BTB in the program, has 1024 entries, and each entry contains the fields valid, prediction, tag and replacement. The valid bit is set when a branch is first seen and the BTB is updated. Prediction holds the one-bit prediction for the next execution of the branch. The tag contains the program counter of the branch or, in the case of a taken branch, the address of the branch target. The output file gives the number of branches executed, the number of taken branches encountered, the number of correct predictions, the number of BTB replacements and the percentage of correct predictions.
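The BTB entry and the one-bit predict/update logic described above can be sketched as follows. The field names mirror the report (valid, prediction, tag, replacement); the indexing function (program counter modulo table size) is an assumption, since the report does not state how indices are computed.

```cpp
#include <array>
#include <cstdint>

// Sketch of a BTB entry for the one-bit predictor; field names follow the report.
struct BTBEntry {
    bool     valid = false;      // set once the entry holds a real branch
    bool     prediction = false; // outcome of the branch's previous execution
    uint64_t tag = 0;            // branch address stored in the entry
    uint32_t replacement = 0;    // count of address replacements in this slot
};

constexpr std::size_t kBTBSize = 1024;
std::array<BTBEntry, kBTBSize> BTB;

// Assumed indexing: PC modulo the table size.
std::size_t btbIndex(uint64_t pc) { return pc % kBTBSize; }

// Predict taken iff the branch was taken on its previous execution.
bool predictOneBit(uint64_t pc) {
    const BTBEntry& e = BTB[btbIndex(pc)];
    return e.valid && e.tag == pc && e.prediction;
}

// After the branch resolves, record its outcome (allocating or replacing
// the BTB entry on a miss).
void updateOneBit(uint64_t pc, bool taken) {
    BTBEntry& e = BTB[btbIndex(pc)];
    if (!e.valid || e.tag != pc) {
        if (e.valid) ++e.replacement;  // another branch occupied this slot
        e.valid = true;
        e.tag = pc;
    }
    e.prediction = taken;
}
```

The single stored bit simply echoes the last outcome, which is why a loop branch that alternates direction mispredicts every time under this scheme.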
Two bit predictor- The program for the two-bit predictor is written in the file twobit.cpp. Most of this algorithm is the same as the single-bit predictor. Here, along with the valid, prediction and replacement-count fields, each BTB entry is associated with a saturating 2-bit counter. The counter value is updated after every branch execution using an update function: it is incremented for taken branches and decremented for not-taken branches. The counter value is used to predict the outcome of the currently executing branch: values 10 and 11 predict the branch as taken, and values 00 and 01 predict it as not taken.
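The saturating 2-bit counter behaviour described above can be sketched as a small self-contained type. The initial state (strongly not taken) is an assumption; the report does not say how the counters are initialized.

```cpp
#include <cstdint>
#include <initializer_list>

// Two-bit saturating counter: states 00/01 predict not taken,
// 10/11 predict taken; increment on taken, decrement on not taken.
struct TwoBitCounter {
    uint8_t state = 0;  // 0..3; starts at "strongly not taken" (assumed)

    bool predict() const { return state >= 2; }

    void update(bool taken) {
        if (taken) { if (state < 3) ++state; }
        else       { if (state > 0) --state; }
    }
};

// Helper for illustration: run a sequence of outcomes through a fresh
// counter and return the resulting prediction.
bool finalPrediction(std::initializer_list<bool> outcomes) {
    TwoBitCounter c;
    for (bool taken : outcomes) c.update(taken);
    return c.predict();
}
```

Because two consecutive mispredictions are needed to flip the prediction, a single anomalous outcome (such as a loop exit) does not disturb a well-established direction.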
GAg predictor: The program is written in the file Gag.cpp. The predictor contains a branch target buffer with space to store the branch address, a valid bit and a count of buffer replacements. There is a global history register and a global history table. The history register stores the outcomes of previous branches (taken or not taken), and the history table contains a saturating counter for each possible history register pattern. The history table has 256 entries, so the history register needs 8 bits to represent the branch-outcome history. Outcomes are handled as in the two-bit predictor: the outcome updates the saturating counter in the history table and is shifted into the rightmost bit of the history register.
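The GAg prediction and update path described above can be sketched as follows, using the report's parameters (8-bit global history, 256-entry pattern history table of 2-bit counters). The counter encoding and zero-initialized state are assumptions.

```cpp
#include <array>
#include <cstdint>

// GAg: one global history register indexes one global pattern history table.
constexpr int kHistoryBits = 8;
constexpr std::size_t kPHTSize = 1u << kHistoryBits;  // 256 entries

uint8_t globalHistory = 0;            // outcomes of the last 8 branches
std::array<uint8_t, kPHTSize> PHT{};  // 2-bit saturating counters, assumed 0-init

// Predict from the counter selected by the current history pattern.
bool predictGAg() {
    return PHT[globalHistory] >= 2;   // 10/11 -> taken
}

// Update the selected counter, then shift the outcome into the
// rightmost bit of the history register.
void updateGAg(bool taken) {
    uint8_t& ctr = PHT[globalHistory];
    if (taken) { if (ctr < 3) ++ctr; }
    else       { if (ctr > 0) --ctr; }
    globalHistory = (uint8_t)((globalHistory << 1) | (taken ? 1 : 0));
}
```

Note that every branch in the program shifts into the same history register, which is exactly the source of the interference discussed later in the report.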
PAg predictor: The program is in the file Pag.cpp. The branch target buffer contains an address field called tag, a valid bit and a buffer replacement count. The predictor has a history register dedicated to each branch address and a global pattern history table. The outcome of a branch execution is recorded in the corresponding history register, since there is a history register per branch address. The outcome also updates the saturating counter of the history table entry indexed by the current history register pattern. The history table contains 4096 entries, so each history register is 12 bits. The valid bit in the above programs indicates whether an entry holds valid data obtained from the execution of previous branches or is unchanged since initialization.
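The PAg structure described above can be sketched as follows, with a 12-bit history register per branch address and one shared 4096-entry pattern history table. Keeping the per-address registers in a map is an assumption for brevity; the report stores them in the BTB alongside the tag and valid bit.

```cpp
#include <array>
#include <cstdint>
#include <unordered_map>

// PAg: per-address history registers, one shared pattern history table.
constexpr int kPagHistBits = 12;
constexpr std::size_t kPagPHTSize = 1u << kPagHistBits;  // 4096 entries

std::unordered_map<uint64_t, uint16_t> pagHistory;   // 12-bit history per branch
std::array<uint8_t, kPagPHTSize> pagPHT{};           // shared 2-bit counters

bool predictPAg(uint64_t pc) {
    uint16_t h = pagHistory[pc] & (kPagPHTSize - 1);
    return pagPHT[h] >= 2;           // 10/11 -> taken
}

// Update the shared counter selected by this branch's history, then shift
// the outcome into this branch's (and only this branch's) history register.
void updatePAg(uint64_t pc, bool taken) {
    uint16_t& h = pagHistory[pc];
    uint8_t& ctr = pagPHT[h & (kPagPHTSize - 1)];
    if (taken) { if (ctr < 3) ++ctr; }
    else       { if (ctr > 0) --ctr; }
    h = (uint16_t)(((h << 1) | (taken ? 1 : 0)) & (kPagPHTSize - 1));
}
```

Because only the pattern history table is shared, one branch's outcome can influence another branch's prediction only through counter aliasing, not through the history registers, which matches the report's later argument that PAg suffers less interference than GAg.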
The benchmarks 401.bzip2, 403.gcc, 429.mcf and 445.gobmk are used to test prediction accuracy. Building onebit.cpp, twobit.cpp, Gag.cpp and Pag.cpp produced the tool files onebit.so, twobit.so, Gag.so and Pag.so respectively, which were executed with the pin command. This execution produces .sh files, and running the .sh files gives the final output files containing information such as the number of branches encountered and the prediction accuracy. The following command shows an example of executing a .so file with the pin command for 10 billion instructions.
bench-run.pl --bench spec-cpu2006:int:401.bzip2:train --build base --prefix "pin -t
/home/students/sapo1601/twobit/obj-intel64/twobit.so -o twobit_bzip.out -l 10000000000 --" --log LOG
The outputs of the execution are in the files 5593_bzip2.out, 5593_gcc.out, 5593_mcf.out and 5593_gobmk.out, with similar files for the other prediction techniques. They contain information such as the total number of branches executed, the number of taken branches, the number of correct predictions, the prediction accuracy, the number of replacements, and the valid bit and replacement count of each branch target buffer entry.
TASK 2: Idea of a new scheme for branch prediction
The two-bit and GAg prediction techniques have their own advantages and disadvantages. When predicting the outcome of a branch, the two-bit predictor considers only the preceding results of that particular branch and ignores the results of all other branches. In GAg the prediction is based on the results of previous branches as well. This can be disadvantageous because independent branches experience interference from other branches in this scheme. To avoid this problem there are hybrid prediction schemes that take the better of the two techniques.
Executing both schemes to identify the better of the two requires more resources and memory, and may not be worthwhile for programs where the majority of branches are correlated and very few are independent. Running both techniques for all branches does not optimize resources; this can be avoided by monitoring the outcomes of each branch.
The technique in this scheme is basically the GAg scheme with certain modifications: interference is reduced for the independent branches by applying a bimodal prediction scheme only to those branches. The program is written in the file gag_edit.cpp. The algorithm for this scheme is described below:
1. Start the prediction for all branches using the GAg scheme and monitor the results of each branch. The number of wrong predictions for each branch is stored in a separate variable (BTB[index].wrong is used in the program to count prediction errors for each branch).
2. An error level is set for each branch. If the error count of a branch crosses that level, it is inferred that the branch is experiencing interference from other branches, and from that instant the prediction scheme is changed for that particular branch.
3. The level at which the prediction technique for a branch switches from GAg to the bimodal technique should be chosen with the applications and required accuracy in mind. This technique can be developed as an application-specific rather than a general technique.
4. When the error level is reached, the bimodal prediction technique is used for that branch in the succeeding iterations. If the address in the BTB entry later changes, the error count is reset to 0 and the GAg technique is restarted for that branch.
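The steps above can be sketched as follows. This is a minimal illustration, not the report's gag_edit.cpp: it assumes the same 8-bit GAg structures as earlier, a direct-mapped table, and the 500-error threshold mentioned later; the `wrong` field mirrors BTB[index].wrong.

```cpp
#include <array>
#include <cstdint>

// Hybrid GAg/bimodal sketch: branches whose GAg error count crosses a
// threshold switch to a private 2-bit (bimodal) counter.
constexpr int kErrorLevel = 500;
constexpr std::size_t kTableSize = 256;

struct HybridEntry {
    uint64_t tag = 0;
    bool     valid = false;
    int      wrong = 0;    // mispredictions under GAg for this branch
    uint8_t  bimodal = 0;  // private 2-bit counter, used after switching
};

uint8_t hybHistory = 0;                      // global history register
std::array<uint8_t, 256> hybPHT{};           // GAg pattern history table
std::array<HybridEntry, kTableSize> hybBTB;  // direct-mapped (assumed)

bool predictHybrid(uint64_t pc) {
    const HybridEntry& e = hybBTB[pc % kTableSize];
    if (e.valid && e.tag == pc && e.wrong >= kErrorLevel)
        return e.bimodal >= 2;       // step 4: switched to bimodal
    return hybPHT[hybHistory] >= 2;  // step 1: default is GAg
}

void updateHybrid(uint64_t pc, bool taken) {
    HybridEntry& e = hybBTB[pc % kTableSize];
    if (!e.valid || e.tag != pc) {   // step 4: replacement resets the error
        e = HybridEntry{};
        e.valid = true;
        e.tag = pc;
    }
    bool gagPrediction = hybPHT[hybHistory] >= 2;
    if (gagPrediction != taken) ++e.wrong;  // step 2: count GAg errors
    // Keep the private bimodal counter warm for a possible switch.
    if (taken) { if (e.bimodal < 3) ++e.bimodal; }
    else       { if (e.bimodal > 0) --e.bimodal; }
    // GAg structures are always updated, so other branches are unaffected
    // by a branch having switched to bimodal prediction.
    uint8_t& ctr = hybPHT[hybHistory];
    if (taken) { if (ctr < 3) ++ctr; }
    else       { if (ctr > 0) --ctr; }
    hybHistory = (uint8_t)((hybHistory << 1) | (taken ? 1 : 0));
}
```

A well-behaved branch never accumulates enough errors to switch, so it continues to benefit from global correlation, while a branch suffering heavy interference quietly falls back to its own local counter.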
In the program gag_edit.cpp the error for each branch is monitored using the variable BTB[index].wrong. The maximum error level is set to 500. The branch history register and pattern history table are updated even while a branch is being predicted with the two-bit technique, to make sure that other branches are not affected by the change.
The program was executed for a few benchmarks, and the prediction accuracy was observed to be better than the GAg scheme. I have observed that this technique performs better only if the application has mostly correlated branches and a few independent ones; if the application has a larger number of independent branches, the two-bit predictor would perform better.
Table 2 shows the improved performance when the GAg technique is modified to give better output for independent branches.
We can observe in Table 1 that the performance of the GAg technique improved after it was modified to suit the few independent branches in the applications. The prediction accuracy improved in the 401.bzip2, 403.gcc and 445.gobmk applications, whereas it decreased negligibly in 429.mcf. The performance can be improved further by tuning the error level to the application. This technique is useful when there are a few independent branches among many correlated branches, as bimodal alone or GAg alone is not enough and applying both of them to every branch requires more resources. The technique first detects the branches that cross the error level and applies the bimodal technique only to those branches, keeping the GAg technique for the remaining correlated branches.
The results of the programs are shown in the tables. The prediction accuracy varies with the application and the technique used for prediction. The prediction accuracy of each technique on each application is calculated and shown in Table 2.
The miss rate is given by the ratio of the number of branch target addresses that were not found in the BTB to the total number of branch instruction executions. The branch miss rates for each application are shown in Table 3.
The miss rate is independent of the prediction technique; it depends on the number of entries in the BTB and the type of benchmark used.
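The miss-rate definition above amounts to a simple ratio over the counters the tools already collect. A small helper makes the computation explicit; the function name and the sample counts are illustrative, not values from the report's output files.

```cpp
#include <cstdint>

// Miss rate as defined above: BTB lookups whose branch target address was
// not found, divided by the total number of branch instruction executions.
double missRate(uint64_t btbMisses, uint64_t totalBranches) {
    if (totalBranches == 0) return 0.0;  // avoid dividing by zero
    return static_cast<double>(btbMisses) / static_cast<double>(totalBranches);
}
```

For example, 25 misses over 1000 executed branches gives a miss rate of 0.025, regardless of which predictor produced the counts.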
Increasing the bits in the history register of the GAg scheme:
The GAg scheme includes a global history register and a global history table; the behavior of all branches is recorded in the same history register and the same pattern history table. The effect of lengthening the global history register can therefore be positive or negative. In some cases, slightly increasing the number of bits captures more correlated branches in the history register, and the prediction accuracy may improve. But in most cases the number of branches correlated with the currently executing branch is small, and increasing the number of history bits hurts prediction accuracy, because a longer history register admits more unrelated branches that interfere with the current branch. For independent branches, increasing the bits only increases the number of interfering branches, and the accuracy falls.
The following table shows the performance of the GAg prediction technique when the history register is 8, 10 and 12 bits long.
From Table 4 we can see that in the above three applications the performance of the GAg scheme degrades as the number of bits in the history register is increased. The reason is the interference caused by unrelated branches, which affects the prediction of the currently executing branch.
Application with most global correlation:
We can observe that in the single-bit and two-bit predictors the prediction is based on the behavior of only that particular branch in previous instances; the outcomes of other branches are not taken into consideration when predicting the currently executing branch. In GAg and PAg, however, the outcomes of other branches are also taken into consideration. Thus there is no global correlation in the single-bit and two-bit predictors, whereas GAg and PAg use global correlation for branch prediction.
Between GAg and PAg, GAg uses a global history register and a global pattern history table, whereas PAg uses only a global pattern history table and a separate history register for each branch. In PAg the outcome of a branch updates only the corresponding history register, and the other history registers are unaffected. Although recording this outcome in the pattern history table can affect the predictions of other branches, the influence is quite small. In GAg, by contrast, the outcome of a branch is recorded in the global history register and the global history table, which influences the predictions of all branches to a greater extent than in PAg. Thus the effect of global correlation is stronger in GAg, and for this reason its performance decreases.