The realization of functions such as log() and antilog() in hardware is presented, motivated by their importance in several computing applications. We present an approach to compute log() and antilog() in hardware, based on a table lookup followed by an interpolation step. The interpolation step is implemented on a field-programmable gate array (FPGA), resulting in an area-efficient, fast design. We perform interpolation efficiently, without the need to perform multiplication or division, and our method performs both the log() and antilog() operations using the same hardware architecture. We compare our work with existing methods and show that our approach results in significantly lower memory resource utilization for the same approximation errors. Our method also scales very well with an increase in the required accuracy, compared to existing techniques.
Keywords- Field-programmable gate arrays (FPGAs), floating point arithmetic, logarithmic arithmetic, VLSI.
I. INTRODUCTION
The functions log() and antilog() find use in many areas such as digital signal processing (DSP), 3-D computer graphics, scientific computing, artificial neural networks, logarithmic number systems (LNS), and other multimedia applications. Our approach provides a good solution for field-programmable gate array (FPGA)-based applications that require high accuracy at a low cost in terms of required lookup table (LUT) size. Hence, the use of dedicated hardware to compute log() and antilog() is of great value. Two well-researched and widely used methods for generating the logarithm function are digit-recurrence algorithms and LUT-based approaches. Of these, the digit-recurrence methods are efficient from an area and accuracy perspective but have longer latencies and convergence problems. The LUT-based methods, often combining LUTs with polynomial approximations, are widely used to approximate the logarithm and antilogarithm functions. The main objective of all these works is to use minimum circuit area while retaining the accuracy of the approximation.
The main idea of our approach is to use LUTs along with linear or quadratic interpolation, approximating the multiplication required for interpolation while computing a more accurate log() and antilog(). We show that the most cost-effective implementation is a LUT with linear interpolation, implemented in a manner that optimizes area and delay while providing good accuracy. Our method generates the logarithm of a number, and we show that a similar methodology can be used to generate the antilogarithm of a number. The number format used is similar to the IEEE 754 single-precision floating-point format, which has 32 bits: the leading bit is the sign bit, followed by an 8-bit exponent E and a 23-bit mantissa M. The value of a number X represented in this format is given by
X = 1.m × 2^(E-127), 0 < m < 1 (1)
We also assume that the number is positive, since the logarithm of a negative number does not exist. We target 10 or more bits of accuracy in this work.
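As a concrete illustration of the format in (1), the following Python sketch (assuming the host's standard IEEE 754 single-precision layout; this is an illustration, not part of the hardware design) extracts E and m from the 32-bit word:

```python
import struct

def decode_single(x):
    """Split a positive IEEE 754 single-precision number into its biased
    exponent E and mantissa fraction m, so that x = 1.m * 2**(E - 127)."""
    bits = struct.unpack(">I", struct.pack(">f", x))[0]
    E = (bits >> 23) & 0xFF            # 8-bit exponent field
    m = (bits & 0x7FFFFF) / 2**23      # 23-bit mantissa as a fraction in [0, 1)
    return E, m
```

For example, 6.5 = 1.625 × 2^2 decodes to E = 129 and m = 0.625.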
II. PREVIOUS WORK
An approach for approximating the binary logarithm of a number was given by Mitchell. In his method, the logarithm of a number is found by attaching the mantissa part of the number to the exponent part. This method is extremely easy to implement but gives an absolute error as high as 0.086, which corresponds to only 3.5 bits of accuracy. Various authors have reduced this error by using error-correction techniques implemented with simple logic gates, without involving multiplications or divisions. Although these methods are better than Mitchell's approach, they all give fewer than seven accurate bits. Compared to these methods, our approach is applicable for applications that require much higher accuracy. In recent times, most elementary functions are generated using LUTs. This approach, initially proposed by Brubaker, involves the computation of a function using a single LUT. The accuracy of the approach depends solely on the size of the LUT used. Another method improves Brubaker's method by concatenating the lower-order bits of the mantissa to the value looked up from the table, giving only a small error improvement of one bit. Kmetz proposes to store the error that occurs due to the Mitchell approximation in the table and add it to the mantissa of the number. This results in a further improvement in error (by 3 bits) with the overhead of an add operation.
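Mitchell's approximation and its worst-case error can be checked numerically. The sketch below is a plain Python illustration (not the hardware design): it attaches the mantissa fraction to the exponent and scans the error over the mantissa range:

```python
import math

def mitchell_log2(x):
    """Mitchell's approximation: log2(x) ~ E + m for x = (1 + m) * 2**E."""
    E = math.floor(math.log2(x))      # exponent = position of the leading one
    m = x / 2**E - 1                  # mantissa fraction, 0 <= m < 1
    return E + m

# Scan the error log2(1 + m) - m over [0, 1): it peaks near 0.086 at m ~ 0.4427.
max_err = max(abs(math.log2(1 + i / 10**5) - i / 10**5) for i in range(10**5))
```

For exact powers of two (m = 0) the approximation is exact; the maximum error of about 0.086 matches the 3.5-bit accuracy figure quoted above.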
In our approach, we follow the same scheme of storing the error values of the Mitchell approximation in a table. Other methods make use of bigger LUTs to give more accurate results with a good speed of computation. We try to find the optimum table size to use for a required accuracy. Our approach focuses on using a smaller table along with a simple linear function to interpolate between the table values. The main contribution of this work is to approximate the multiplication required for the interpolation by a method that is fast and results in a small error. The multipliers in our target FPGA are restricted to a speed of 135 MHz, and by avoiding their use we can achieve much greater speeds, as our results indicate. Our method also lends itself elegantly to performing the antilog() operation using the same hardware architecture and accuracy as log().
Fig. 1: Error due to the Mitchell approximation
We show that a good improvement in error performance can be achieved with the use of a small LUT and some additional hardware. We compare our results with existing methods.
III. OUR APPROACH
This work uses a LUT-based approach combined with a linear interpolation to generate the logarithm of a number. The multiplication required in this linear interpolation is avoided, resulting in an area and delay reduction. The idea is described in the following sections.
A. Interpolation Approach
Mitchell's approximation is given by the following equation:
log2(1 + m) ≈ m, 0 < m < 1 (2)
The error due to this approximation is given by:
Em = log2(1 + m) - m, 0 < m < 1 (3)
The error curve shown in Fig. 1 is sampled at 2^t points (depending on the size of the LUT required). These values are rounded to the required word width and stored in the LUT. The LUT is addressed by the first t bits of the mantissa portion of the number, and the remaining k-t bits give the decimal value used for interpolation. We investigate the option of interpolating between the values stored in the table.
This is done by the following equation:
log2(1 + m) ≈ m + a + (b - a) · n1 / 2^(k-t), 0 < m < 1 (4)
Here m is the mantissa part, a is the error value from the table addressed by the first t bits, and b is the table value adjacent to a. Also, k is the total number of MSBs of the mantissa used for the interpolation, and n1 is the decimal value of the last k-t bits of the mantissa. Here the size of the LUT is 64 words and the width of each word is 16 bits. We investigate the case where a limited number of interpolation bits are used: we tabulate the maximum error incurred when the previous algorithm is implemented for t, t+1, and so on up to t+8 mantissa bits, ignoring the rest. This is the same as doing different levels of interpolation from 0 to 8.
[Diagram: the k-bit mantissa is split into its first t bits, which index the 2^t-entry LUT, and the remaining k-t bits, whose decimal value is used for the interpolation.]
We find an error value from the table based on the first t bits and interpolate between this value and the next, based on the remaining k-t bits.
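To make the indexing concrete, the following sketch builds the 2^t-entry error table and evaluates (4) directly, using an exact multiply for clarity (the hardware replaces this multiply, as described next). The parameters t = 6 and k = 13 follow from the 64-word table above:

```python
import math

T, K = 6, 13                           # t index bits, k mantissa bits used
# Error table: Em = log2(1 + m) - m sampled at 2**T points (plus the right
# endpoint, where the error is exactly 0, so b exists for every interval).
lut = [math.log2(1 + i / 2**T) - i / 2**T for i in range(2**T + 1)]

def log2_1p(m):
    """Evaluate (4): log2(1 + m) ~ m + a + (b - a) * n1 / 2**(K - T)."""
    idx = int(m * 2**T)                # first T mantissa bits -> LUT index
    n1 = int(m * 2**K) % 2**(K - T)    # decimal value of the next K - T bits
    a, b = lut[idx], lut[idx + 1]
    return m + a + (b - a) * n1 / 2**(K - T)
```

With 64 intervals, the linear-interpolation error stays well below 2^-10, consistent with the accuracy target stated earlier.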
The third term in (4) requires a multiplication. In order to circumvent the multiplication (which is expensive in terms of area and delay), we investigate options for approximating it when interpolating between any two adjacent values stored in the table. The essential idea is that the multiplication of (b-a) and n1 is simplified by taking the antilogarithm of the sum of the logarithm of (b-a) and the logarithm of n1. In order to perform this operation with a small delay, we consider the following options.
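The identity behind this replacement is simply u · v = antilog2(log2(u) + log2(v)); in exact arithmetic:

```python
import math

def mul_via_logs(u, v):
    """Product of two positive numbers via the log-add-antilog identity
    that the interpolator uses in place of a hardware multiplier."""
    return 2.0 ** (math.log2(u) + math.log2(v))
```

The hardware cost then shifts from a multiplier to the cost of approximating the two logarithms and the antilogarithm, which is what the options below trade off.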
1) log2(b-a) may be approximated by either of the following two options:
a) Mitchell approximation;
b) LUT: For the LUT option, the constants log2(b-a) for each of the intervals are stored in the original LUT. Recall that we stored the error values obtained by using the Mitchell approximation for the log function. The log2(b-a) values are stored along with the error values and are indexed by the same address lines.
2) Antilog of (log2(n1) + log2(b-a)) may be approximated by either of the following two options:
a) Mitchell approximation;
b) LUT: To obtain the antilogarithm of a number by this method, we need to construct another LUT. The error due to the Mitchell approximation of the antilogarithm function is stored in this LUT. This antilog LUT is utilized to compute the antilogarithm of a number, since the multiplication of (b-a) and n1 is performed by taking the antilogarithm of the sum of the logarithm of (b-a) and the logarithm of n1.
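Using the Mitchell options (1a and 2a) on both sides, the multiplier-free product can be sketched as follows. This illustrates the principle only; the LUT options above further correct these approximations:

```python
import math

def mitchell_log2(x):
    """Option 1a: Mitchell approximation of log2(x), x > 0."""
    E = math.floor(math.log2(x))
    return E + x / 2**E - 1

def mitchell_antilog2(y):
    """Option 2a: Mitchell approximation of the antilog, built from the
    same idea in reverse: 2**y ~ 2**floor(y) * (1 + frac(y))."""
    E = math.floor(y)
    return 2.0**E * (1 + (y - E))

def approx_mul(u, v):
    """Multiplier-free product via log-add-antilog, Mitchell on both sides."""
    return mitchell_antilog2(mitchell_log2(u) + mitchell_log2(v))
```

For powers of two the result is exact; in general the relative error is bounded by the Mitchell errors of the three approximations involved.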
Fig. 2: Block diagram of the log engine
Fig. 3: Architecture of the interpolation block
Fig. 2 shows the block diagram of the log() engine, which is essentially an implementation of (4). The architecture of the interpolator block is shown in Fig. 3. Here we only show the operations on the mantissa m. One of the adders is a three-input fixed-point adder, as shown in Fig. 3; it is implemented as two adders. Since (b-a) takes both negative and positive values for 0 < m < 1, the log2(b-a) values stored in the lookup tables are actually the logarithms of the absolute values of (b-a). It is found that (b-a) changes sign from positive to negative for m > 0.4427. This is equivalent to comparing the decimal value of the first six bits of the mantissa with 28, as shown in Fig. 3. Hence, if the first six bits have a value greater than 28, the comparator block sends a control signal to the ADD/SUB block instructing it to perform a subtract operation. The leading-one detector (LOD) block is used to find log2(n1). The LOD detects the first bit that has the value 1; it then uses the remaining bits as the mantissa portion to access the lookup table. Since 7 bits are given as input to the LOD in this example, the LOD finds the first position that has the value 1, and the remaining bits, which can be as wide as 6 bits, are used to access the LUT. The integer part of log2(n1), which is indicated by the position of the first bit with value 1, is sent directly to the three-input fixed-point adder to be added to the integer part of log2(b-a). The output of the adder after the antilog stage has to be shifted to the right or left depending on the integer output of the fixed-point adder. Also, there is a 2^7 term in the denominator of (4), and this is accounted for by a constant right shift of 7 bits at the output of the adder of the antilog LUT. In Fig. 3, the blocks representing a LUT for log2(b-a) and a LUT for log2(n1) are identical and can be implemented using a dual-port RAM. Also note that the address of each of the three LUTs shown in Fig. 3 has a width of 6 bits.
In each case, the 6-bit address word is obtained by rounding off the value of the next LSB in the word from which the address word is derived. The round() operation is implemented using an adder. All the quantities that are rounded off in this manner are annotated as such in Fig. 3.
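The LOD behavior described above can be modeled in a few lines. The helper below is a hypothetical software model of the block, not the RTL; bits is the 7-bit input word:

```python
def leading_one_detector(bits, width=7):
    """Return (position of the most significant 1, the bits below it).
    The position gives the integer part of log2(n1); the remaining bits
    (up to width - 1 wide) address the log LUT."""
    for pos in range(width - 1, -1, -1):
        if bits & (1 << pos):
            return pos, bits & ((1 << pos) - 1)  # strip the leading one
    return None, 0                               # n1 == 0: no leading one
```

For instance, the 7-bit word 1011010 yields position 6 with the 6 remaining bits 011010 used as the LUT address.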
The following are the most important features and applications of the log() and antilog() computations.
The computations of log() and antilog() are used in many areas such as digital signal processing (DSP), 3-D computer graphics, scientific computing, artificial neural networks, logarithmic number systems (LNS), and other multimedia applications.
The method provides a good solution for field-programmable gate array (FPGA)-based applications that require high accuracy at a low cost in terms of required lookup table (LUT) size.
It gives results accurate to 10 bits in the fractional part without the need to perform multiplication or division operations.
It yields an area-efficient, fast design with low latency.
IV. CONCLUSION
The implementation of functions such as log() and antilog() in hardware is important, due to their prevalence in several computing applications. In this paper, we presented an approach to compute log() and antilog() in hardware. Our approach is based on a LUT, followed by an interpolation step. The novelty of our approach lies in the fact that we find the log() of a number efficiently using interpolation, without the need to explicitly perform multiplication or division. While computing the logarithm of a number, we perform the multiplication required during the interpolation step by utilizing an antilogarithm operation, which is itself performed using a LUT. We compare our work with existing methods and show that our approach results in significantly lower memory utilization for the same accuracy. Our method also scales extremely well to accommodate higher accuracies. One interesting extension would be to explore the use of LUTs of sizes other than powers of 2, which would give our method more scalability.