Unexpected Mathematics: Benford’s Law and Other Surprising Distributions
Essay in Mathematics, 18/05/20
Benford’s law is a surprising mathematical result which at first seems rather counter-intuitive. It describes the distribution of the leading digits in a large set of data. A simple example displays its initial peculiarity. Imagine we look at the current share price of every company on the FTSE 350, an index of the 350 largest UK companies. Within this set of data, the first digit of each share price can be any digit between 1 and 9 ($d\in \{1,\dots ,9\}$). The average person would believe that each price has an equal chance of starting with each digit between 1 and 9, so if one of the 350 prices were selected at random, the probability that the first digit is 1 would be about $\frac{1}{9}$ (11.1%) and the probability of the first digit being 9 would also be about $\frac{1}{9}$ (11.1%). However, this is in fact not the case at all. If you were to calculate this for data that follows Benford’s law, the probability of the first digit being 1 would actually be closer to 30% than 11%. Furthermore, the probability of the leading digit being 9 would be only just over 4%!
I initially read about this strange distribution in an economics context, and I was keen to investigate the mathematics behind it and to test its limits and applications. The physicist Frank Benford investigated it in 1938, following an observation first recorded by the astronomer Simon Newcomb in 1881: the pages closer to the beginning of books of log tables were more worn than those closer to the end, meaning that people were looking up numbers starting with 1 much more often than numbers starting with higher digits. Benford tested this idea across data such as newspapers, populations and river lengths, and found more or less the same result every time: numbers starting with 1 turned up approximately 30% of the time. Eventually, this was formed into a mathematical law with an equation that gives the probability of each digit between 1 and 9 occurring as the leading digit.
The aim of this investigation is to explore the explanations and applications behind Benford’s Law, to touch upon other equally strange distributions and to examine whether they link to Benford’s law in any way. NB: For the purposes of this exploration, all logarithms are assumed to be base 10 if the base is not stated.
Why Study Leading Digits?
Before discussing the law’s explanations and applications, it is first useful to understand how leading digits are studied in mathematics and why they matter. Scientific notation (also known as standard form) is a pivotal part of this. Any positive number x can be expressed in the form $S(x)\times {10}^{n}$, in which $1\le S(x)<10$ and n is an integer: the number is written as a value of at least 1 but less than 10, multiplied by the power of 10 that restores its original size. For example, in this format the number 865,000,000 would be expressed as $8.65\times {10}^{8}$. The number in front of the power of 10 is known as the significand,^{1} and its first digit is the leading digit of the number. This system allows numbers spanning many different magnitudes to be expressed in a similar fashion, for example when comparing atomic radii to planetary radii.
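To make the notation concrete, here is a minimal Python sketch that splits a number into its significand and exponent and reads off the leading digit. The function names `scientific_parts` and `leading_digit` are my own choices for illustration, and the sketch relies on Python's built-in scientific-notation formatting rather than any special library.

```python
def scientific_parts(x: float) -> tuple[float, int]:
    """Return (S(x), n) such that |x| = S(x) * 10**n with 1 <= S(x) < 10."""
    if x == 0:
        raise ValueError("zero has no significand")
    # Formatting in scientific notation yields e.g. '8.650000000000000e+08'.
    mantissa, exponent = f"{abs(x):.15e}".split("e")
    return float(mantissa), int(exponent)


def leading_digit(x: float) -> int:
    """Return the first significant digit (1 to 9) of a non-zero number."""
    significand, _ = scientific_parts(x)
    return int(significand)


print(scientific_parts(865_000_000))
print(leading_digit(0.00042))
```

The same helper works for very small and very large values, which is exactly why scientific notation is convenient when a data set spans many orders of magnitude.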
An Overview of the Law
From his experiments, Benford found the approximate probability of each digit occurring as the leading digit. The pattern is shown on the graph below:
[Graph: probability of each digit from 1 to 9 occurring as the leading digit.]^{2}
For the purposes of explaining the law and deriving its equation, the leading digit (between 1 and 9) is represented as d, and the probability of d being the leading digit is represented as P(d).
The basic explanation of the law states that P(d) is proportional to the space between d and d+1 on a logarithmic scale. A logarithmic scale is one which is non-linear^{3} and based on orders of magnitude, meaning that each step along the scale multiplies the previous value by a constant rather than adding a constant to it.
Deriving the Equation of the Law
By understanding logarithmic scales we can begin to better understand how the percentages in Benford’s law are derived. When we are working with many values spanning multiple orders of magnitude, as Benford’s law does, the basic explanation states that:
The leading digit d of a number will be 1 when its significand $S(x)$ satisfies $\log 1\le \log S(x)<\log 2$ and, similarly, d will be 9 when $\log 9\le \log S(x)<\log 10$. On a linear scale the difference between 2 and 1 would be equal to the difference between 10 and 9. However, on a logarithmic scale the differences are as follows:
$\log 2-\log 1=0.301$
$\log 10-\log 9=0.0458$
If we apply this to all the numbers between 1 and 9 the results are as follows:
Logarithmic Interval | Difference of Interval
$\log 2-\log 1$ | 0.301
$\log 3-\log 2$ | 0.176
$\log 4-\log 3$ | 0.125
$\log 5-\log 4$ | 0.0969
$\log 6-\log 5$ | 0.0792
$\log 7-\log 6$ | 0.0669
$\log 8-\log 7$ | 0.0580
$\log 9-\log 8$ | 0.0512
$\log 10-\log 9$ | 0.0458
These log calculations are in fact the probabilities of each number from 1 to 9 occurring as the leading digit! This can be seen on the graph of the results below, which follows the exact same pattern as the graph shown above.
From this we can see that the probability P(d) is given by subtracting the log of the digit from the log of the digit plus one, i.e.:
$P(d)={\log}_{10}(d+1)-{\log}_{10}(d)={\log}_{10}\left(\frac{d+1}{d}\right)$
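As a quick check of the formula, the Python sketch below evaluates P(d) for every digit from 1 to 9 and confirms that the nine probabilities add up to 1. The name `benford_probability` is my own and is used only for illustration.

```python
import math


def benford_probability(d: int) -> float:
    """P(d) = log10((d + 1) / d) for a leading digit d between 1 and 9."""
    if not 1 <= d <= 9:
        raise ValueError("a leading digit must be between 1 and 9")
    return math.log10((d + 1) / d)


probabilities = {d: benford_probability(d) for d in range(1, 10)}

for d, p in probabilities.items():
    print(f"P({d}) = {p:.4f}")

# The nine intervals tile the whole stretch from log 1 to log 10, so the sum is 1.
print(f"total = {sum(probabilities.values()):.4f}")
```

The total is 1 because the logarithmic intervals telescope: together they cover exactly the distance from log 1 = 0 to log 10 = 1.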
Further explanations and details of the Law
Aside from the initial explanation of the law and the derivation of the equation, there are more detailed explanations of how the law works. One of these is the geometric explanation.^{1} This approach follows the idea that a quantity growing at a constant rate will spend a greater amount of time ‘hanging around’ the lower leading digits than the higher ones. To better explain this, I will use an economically minded example of a geometric sequence: compound interest. A geometric sequence is one in which there is a constant ratio r between each term ${U}_{n}$ and the next term ${U}_{n+1}$.^{4} Therefore the general rule is ${U}_{n}={U}_{1}{r}^{n-1}$. For this compound interest example, let us assume I invest $2000 in 2019 for my retirement in a very generous savings account with an annual interest rate of 7% for a long term of 60 years. This is modelled by the equation ${U}_{n}=2000\times {1.07}^{n}$. Note the absence of the subtraction of 1 from n in the exponent. This is because n here counts the number of years of interest that have been added, so n = 0 gives the initial deposit and the subtraction of 1 is not needed. This model shows that the balance in the savings account at the end of the 60 years will be $2000\times {1.07}^{60}=\$115,892.85$. However, we are more interested in where the balance lies at the end of each year over the whole period than in the final figure alone. See the appendices for the full balance sheet at the end of each year. When we examine this table from a Benford perspective, we can see that the balance does indeed tend to linger at low first digits and pass quickly through the higher ones. For example, the period during which the balance is between $10,000 and $20,000 lasts about 10 years, from 2043 to 2053, and throughout it the first digit on the balance sheet is 1. The table below illustrates this for balances between $10,000 and $100,000.
Value interval | Time Spent (years, counted from the year-end balances in the appendix)
$10,000-$20,000 | 10
$20,000-$30,000 | 6
$30,000-$40,000 | 4
$40,000-$50,000 | 4
$50,000-$60,000 | 2
$60,000-$70,000 | 3
$70,000-$80,000 | 2
$80,000-$90,000 | 1
$90,000-$100,000 | 2
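The link between this table and the formula for P(d) can be made explicit with a short calculation. The Python sketch below is an illustration under one assumption of mine: the balance is treated as growing continuously at 7% per year, so the time spent with leading digit d is the time needed to grow by a factor of (d+1)/d. Counting whole year-end balances, as the table does, gives rounder and slightly different figures. The function name `years_with_leading_digit` is hypothetical.

```python
import math

GROWTH_FACTOR = 1.07  # 7% annual interest, as in the savings example


def years_with_leading_digit(d: int, growth_factor: float = GROWTH_FACTOR) -> float:
    """Years needed to grow from d * 10**k to (d + 1) * 10**k at a constant rate."""
    return math.log10((d + 1) / d) / math.log10(growth_factor)


# Years needed for the balance to grow tenfold, i.e. to pass through all nine digits.
years_per_decade = 1 / math.log10(GROWTH_FACTOR)

for d in range(1, 10):
    years = years_with_leading_digit(d)
    share = years / years_per_decade
    print(f"digit {d}: {years:5.2f} years, {share:.3f} of each tenfold increase")
```

The growth rate cancels out of the share: whatever the interest rate, the fraction of time spent on leading digit d is log((d+1)/d), which is exactly Benford's P(d).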
From a reflective standpoint, perhaps this is why we as humans focus so heavily on financial achievements which essentially get us ‘back to one’, such as setting targets at one million or one billion. A more recent example of this is Apple making headlines for being the first US company to reach a market value of $1 trillion. Maybe the fact that financially we spend such a long time at these lower leading digits makes them more psychologically valuable to us once we reach them at the next order of magnitude.
Scale Invariance
Another aspect of Benford’s Law which adds to its uniqueness is its scale invariance. What I mean by this is that if a data set follows Benford’s law, it will tend to continue to follow Benford’s law when every value is multiplied by the same constant, for example when the units are changed. For example, if I took the data set used in the previous explanation, the list of the investment balance year upon year, and converted it into every commonly used currency in the world, from the Euro to the Pound and the Vietnamese Dong, the chances are the data would continue to satisfy Benford’s law in almost every currency. Since each value in the list is multiplied by the same exchange rate, the converted data still spans just as many orders of magnitude, which is the main condition for Benford’s Law to apply.
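The currency idea can be tried out numerically. The Python sketch below is only an illustration under assumptions of mine: the exchange rates are made-up round numbers rather than real quotes, and the savings model is run for a hypothetical 1000 years instead of 60 so that the data spans enough orders of magnitude for the digit frequencies to settle down.

```python
import math
from collections import Counter


def leading_digit(x: float) -> int:
    """First significant digit of a non-zero number, read from scientific notation."""
    return int(f"{abs(x):.15e}"[0])


def leading_digit_shares(values) -> dict[int, float]:
    """Fraction of the values that start with each digit from 1 to 9."""
    counts = Counter(leading_digit(v) for v in values)
    total = sum(counts.values())
    return {d: counts[d] / total for d in range(1, 10)}


benford = {d: math.log10(1 + 1 / d) for d in range(1, 10)}

# Hypothetical long run of the 7% savings model (an assumption, not the 60-year table).
balances = [2000 * 1.07 ** n for n in range(1000)]

# Made-up conversion rates, used only to rescale the data.
rates = {"USD": 1.0, "EUR": 0.9, "GBP": 0.8, "VND": 25000.0}

for currency, rate in rates.items():
    shares = leading_digit_shares(b * rate for b in balances)
    worst = max(abs(shares[d] - benford[d]) for d in range(1, 10))
    print(f"{currency}: share of leading 1s = {shares[1]:.3f}, largest gap to Benford = {worst:.3f}")
```

In each currency the share of leading 1s should stay close to 30%, because multiplying by a constant only slides the data along the logarithmic scale without changing how it is spread.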
Extensions of the law: Digits beyond the first
Another aspect of Benford’s Law is that it can be extended to later digits rather than just the first digit of the number.^{5} It is possible to calculate the probability of a digit occurring as the 2^{nd} or 3^{rd} digit. To do this we must write the equation as a sum in sigma notation, which allows us to express a series of additions compactly. If we have a digit d between 0 and 9 (NB: zero can now be included; it cannot be the first digit of a number, but it can certainly be a later digit) then the probability that d will be the n^{th} digit of a number, for n of at least 2, is given by the equation:
$\sum _{x={10}^{n-2}}^{{10}^{n-1}-1}{\log}_{10}\left(1+\frac{1}{10x+d}\right)$
In which d represents a digit from 0 to 9, n represents the position of the digit whose probability is being calculated, and x runs over every possible block of digits that can come before d. However, this is only particularly useful up to the 3^{rd} digit, as beyond the 3^{rd} digit the distribution becomes closer to what one would expect, with each digit appearing close to 10% of the time, i.e. a uniform distribution.
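The sum is tedious by hand but short in code. The Python sketch below evaluates it for the 2nd and 3rd digits; the function name `digit_probability` is my own, and I have assumed that x runs over the (n-1)-digit numbers from 10^(n-2) up to 10^(n-1) - 1, which is the range for which the probabilities of the ten digits add up to 1.

```python
import math


def digit_probability(d: int, n: int) -> float:
    """Probability that the n-th significant digit equals d under Benford's law."""
    if n == 1:
        if not 1 <= d <= 9:
            raise ValueError("the first digit must be between 1 and 9")
        return math.log10(1 + 1 / d)
    if not 0 <= d <= 9:
        raise ValueError("later digits must be between 0 and 9")
    first = 10 ** (n - 2)   # smallest block of digits that can precede d
    stop = 10 ** (n - 1)    # exclusive upper bound of that range
    return sum(math.log10(1 + 1 / (10 * x + d)) for x in range(first, stop))


for n in (2, 3):
    row = ", ".join(f"{d}: {digit_probability(d, n):.4f}" for d in range(10))
    print(f"digit position {n} -> {row}")
```

Comparing the two rows shows the flattening described above: the spread between the most and least likely digit is already much smaller in the 3rd position than in the 2nd.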
Applications of Benford’s Law:
Benford’s law has one major application which makes it particularly useful: fraud detection. Because Benford’s law appears in so many naturally occurring sets of numbers, a large set of data which would be expected to follow the law but does not, particularly financial data, can be flagged as possibly fraudulent. Programs which test for compliance with Benford’s Law are used by tax authorities or banks during audits, or to check whether data submitted to them is possibly fraudulent. Benford’s Law was also used in analysing the results of the 2009 Iranian election for signs of fraud.^{6} This raises the question of whether it is moral to use mathematical laws in legal proceedings or as evidence in prosecutions. This debate becomes even more pressing when there is a degree of uncertainty within the law, or there are limitations to the law, as will be discussed below.
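To give a feel for what such a compliance test might look like, here is a minimal Python sketch. It is not the method of any particular tax authority or audit program: I have assumed a standard chi-square goodness-of-fit test on the leading digits, with the 5% critical value for 8 degrees of freedom (15.507), and both data sets are invented for illustration.

```python
import math
from collections import Counter

CRITICAL_VALUE_5_PERCENT = 15.507  # chi-square distribution, 8 degrees of freedom


def leading_digit(x: float) -> int:
    """First significant digit of a non-zero number."""
    return int(f"{abs(x):.15e}"[0])


def benford_chi_square(values) -> float:
    """Chi-square statistic comparing leading-digit counts with Benford's law."""
    digits = [leading_digit(v) for v in values if v != 0]
    counts = Counter(digits)
    total = len(digits)
    statistic = 0.0
    for d in range(1, 10):
        expected = total * math.log10(1 + 1 / d)
        statistic += (counts[d] - expected) ** 2 / expected
    return statistic


# Invented data: steady 7% growth over a hypothetical 1000 years.
growth_data = [2000 * 1.07 ** n for n in range(1000)]
# Invented data: every whole number from 100 to 999, so each digit leads equally often.
evenly_spread_data = list(range(100, 1000))

for name, data in (("growth", growth_data), ("evenly spread", evenly_spread_data)):
    statistic = benford_chi_square(data)
    verdict = "consistent with" if statistic < CRITICAL_VALUE_5_PERCENT else "deviates from"
    print(f"{name}: chi-square = {statistic:.1f}, {verdict} Benford's law")
```

A large statistic only says that the digits are unusual. It would flag a data set for closer inspection, which is a long way from proving that anyone has committed fraud.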
Limitations to Benford’s Law:
Not every set of data follows Benford’s law: examples include telephone numbers, human height in meters or feet, and the page numbers of small documents. Benford’s law also does not apply to numbers which are assigned or invented by people, or to data restricted to a specific range. The chance of Benford’s Law being useful depends heavily on how many orders of magnitude the data set spans. For example, human height in meters or feet doesn’t follow the law as it spans only one order of magnitude. In meters almost all human heights will start with a 1, possibly with a few that start with 2 or are less than 1. The same applies if human height is measured in feet: there would have to be a human over 3 meters tall in order to cross the 10 ft boundary into the next order of magnitude! Also, if the data is spread evenly over an extremely large number of orders of magnitude, then the law may not apply either. For example, Benford’s law wouldn’t apply to the set of all real numbers, the argument being that if these numbers continue forever then the probability of each digit from 1 to 9 being the leading digit will be the same.
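The height example is easy to check with a small Python sketch. The heights here are not real measurements: I have simply assumed one person at every centimetre from 1.40 m to 2.10 m, and the conversion factor of 3.28084 feet per meter is the only outside figure used.

```python
from collections import Counter

FEET_PER_METER = 3.28084


def leading_digit(x: float) -> int:
    """First significant digit of a non-zero number."""
    return int(f"{abs(x):.15e}"[0])


# Hypothetical heights: one value per centimetre from 1.40 m to 2.10 m.
heights_m = [cm / 100 for cm in range(140, 211)]
heights_ft = [h * FEET_PER_METER for h in heights_m]

digits_m = Counter(leading_digit(h) for h in heights_m)
digits_ft = Counter(leading_digit(h) for h in heights_ft)

print("leading digits in meters:", dict(sorted(digits_m.items())))
print("leading digits in feet:  ", dict(sorted(digits_ft.items())))
```

In both units only two or three leading digits ever occur, and changing the unit changes which digits they are, so this data is neither Benford-distributed nor scale invariant.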
Further analysis: Similar Laws?
Benford’s law is surprisingly not alone in its strangeness. Contrary to what one may think after reading about the uniqueness of Benford’s law, there are a few other patterns and principles which appear in many different areas of life. Some of these have mathematical patterns which could link to Benford’s law. One of these is Zipf’s Law, which relates to language and literature rather than numerical data. Zipf’s Law states that in a large set of words, the second most frequent word will appear about half as often as the most frequent word, and the third most frequent word will appear about a third as often as the most frequent word. Essentially, the frequency of a word will be inversely proportional to its rank in the list of most frequent words. For example, the most common word in the English language, ‘the’, accounts for about 7% of all words and appears about twice as often as the second most common word, ‘of’, which accounts for about 3.5% of all words. An equation for Zipf’s law has been created in the context of the English language which states that, in a language of X words, the frequency of the word with rank k is given by:
$\frac{1/k}{\sum _{n=1}^{X}1/n}$
In which X is the number of words in the English language and k is the rank of the word, with k = 1 for the most common word. Some have argued that Benford’s law is simply a special case of Zipf’s law; however, I personally believe they should be held as separate laws. Zipf’s law could better be considered as literature’s version of Benford’s law.
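The equation can be evaluated directly. In the Python sketch below the vocabulary size of 1000 words is an arbitrary assumption of mine, chosen only to keep the example quick, and the function name `zipf_frequency` is hypothetical.

```python
def zipf_frequency(k: int, vocabulary_size: int) -> float:
    """Predicted share of all word occurrences taken by the word of rank k."""
    if not 1 <= k <= vocabulary_size:
        raise ValueError("rank must be between 1 and the vocabulary size")
    # Denominator: the sum of 1/n over every rank n from 1 to X.
    normaliser = sum(1 / n for n in range(1, vocabulary_size + 1))
    return (1 / k) / normaliser


VOCABULARY_SIZE = 1000  # assumed size, for illustration only

for rank in (1, 2, 3, 10):
    share = zipf_frequency(rank, VOCABULARY_SIZE)
    print(f"rank {rank}: {share:.4f} of all words")
```

Whatever vocabulary size is chosen, the word of rank 1 comes out twice as frequent as the word of rank 2 and three times as frequent as the word of rank 3, and the shares of all the ranks add up to 1.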
Conclusion
Overall, Benford’s Law is deeply rooted in the way numbers are distributed in the real world and its useful applications cannot be denied. The law, which at first seems strange and inexplicable, can indeed be explained and analysed, as I have demonstrated throughout this investigative report. The geometric analysis behind Benford’s Law is key to its explanation. Understanding Benford’s Law is extremely useful to me as a student deeply interested in the field of economics and finance. I had always been curious about how institutions such as HMRC are able to detect fraud and prosecute those who evade tax or commit fraudulent actions. Through conducting this exploration, I have been able to gain a greater understanding of mathematics while also being able to explore this economic aspect of fraud detection. I now also have a greater understanding of how mathematics can connect with other fields: even literature, which the average person might say is the ‘furthest you can get from mathematics’, is seen to have a mathematical distribution through Zipf’s Law. This exploration continues to demonstrate how mathematics is rooted in every part of life even if we cannot notice it at first.
Bibliography:
1 = http://assets.press.princeton.edu/chapters/s10527.pdf
3 = “Slide Rule Sense: Amazonian Indigenous Culture Demonstrates Universal Mapping Of Number Onto Space”
4 = Fannon, P. (2012). Mathematics for the IB Diploma Standard Level. Cambridge: Cambridge University Press. p155
6 = https://physicsworld.com/a/benfords-law-and-the-iranian-e/
7 = https://aclweb.org/anthology/W98-1218
Appendices:
Table of financial investment:
Year | Balance
2019 | $2,057.18
2020 | $2,201.19
2021 | $2,355.27
2022 | $2,520.14
2023 | $2,696.55
2024 | $2,885.31
2025 | $3,087.28
2026 | $3,303.39
2027 | $3,534.63
2028 | $3,782.05
2029 | $4,046.79
2030 | $4,330.07
2031 | $4,633.17
2032 | $4,957.50
2033 | $5,304.52
2034 | $5,675.84
2035 | $6,073.15
2036 | $6,498.27
2037 | $6,953.14
2038 | $7,439.86
2039 | $7,960.65
2040 | $8,517.90
2041 | $9,114.15
2042 | $9,752.14
2043 | $10,434.79
2044 | $11,165.23
2045 | $11,946.80
2046 | $12,783.07
2047 | $13,677.89
2048 | $14,635.34
2049 | $15,659.81
2050 | $16,756.00
2051 | $17,928.92
2052 | $19,183.94
2053 | $20,526.82
2054 | $21,963.70
2055 | $23,501.16
2056 | $25,146.24
2057 | $26,906.47
2058 | $28,789.93
2059 | $30,805.22
2060 | $32,961.59
2061 | $35,268.90
2062 | $37,737.72
2063 | $40,379.36
2064 | $43,205.92
2065 | $46,230.33
2066 | $49,466.45
2067 | $52,929.11
2068 | $56,634.14
2069 | $60,598.53
2070 | $64,840.43
2071 | $69,379.26
2072 | $74,235.81
2073 | $79,432.32
2074 | $84,992.58
2075 | $90,942.06
2076 | $97,308.00
2077 | $104,119.56
2078 | $111,407.93
2079 | $115,892.85