Benford’s law: Mathematics delights when results are unanticipated

Dhruv Sharma
Feb 9, 2019

At an American travel and leisure company, the audit director noticed something odd in the claims made by the healthcare department.
When the first two digits of the healthcare payments were checked for conformity to Benford’s law, they showed an unusual spike in numbers starting with 65.
A careful audit then revealed 13 fraudulent checks for amounts between $6,500 and $6,599, confirming the fraud.

In his book The Golden Ratio: The Story of Phi, Mario Livio writes,
“Mathematics more often tends to delight when it exhibits the unanticipated results rather than expected ones”,
before giving the example above and a delightful mention of Benford’s law.

What is Benford’s law:

Benford’s law is an observation about, and a prediction of, the frequency distribution of leading digits in many real-life sets of numerical data.
Simply stated, in many naturally occurring collections of numbers, the most significant digit is likely to be small.

A dataset of numbers obeying the law has 1 as the most significant digit in nearly 30% of its entries, while 9 leads in only about 5%.
That is quite different from what I was expecting, which was each of the nine digits leading in about 11% of the dataset.

Although the law generalizes to multiple digits as well, its predictions for the most significant digit are as below:

  • 1 - 30.1%
  • 2 - 17.6%
  • 3 - 12.5%
  • 4 - 9.7%
  • 5 - 7.9%
  • 6 - 6.7%
  • 7 - 5.8%
  • 8 - 5.1%
  • 9 - 4.6%
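
The percentages above come from the usual statement of the law: the share of numbers whose leading digits are n is log10(1 + 1/n). Here is a minimal sketch that reproduces the list and also covers the multi-digit case (the function name `benford_share` is my own, not from any library):

```python
import math


def benford_share(leading: int) -> float:
    """Predicted share of numbers whose leading digits spell `leading`.

    leading=1..9 gives the first-digit law; leading=10..99 gives the
    first-two-digits version, and so on.
    """
    if leading < 1:
        raise ValueError("leading digits must form a positive integer")
    return math.log10(1 + 1 / leading)


# Reproduce the first-digit list.
for digit in range(1, 10):
    print(f"{digit} - {benford_share(digit) * 100:.1f}%")

# The first-two-digits shares (10..99) also add up to 1.
two_digit_total = sum(benford_share(n) for n in range(10, 100))
```

Calling `benford_share(65)` gives the share expected for numbers starting with 65, which is well under 1%, so a visible spike at 65 stands out quickly.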

Which datasets obey the law, and under what conditions:

  • The larger the dataset, the better its chances of following Benford’s law.
  • Datasets spanning multiple orders of magnitude are highly likely to follow the law.
  • Examples include the populations of various places, the death tolls of various earthquakes, or the various financial numbers of a firm.
  • Surprisingly, even Fibonacci numbers obey the law.
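
The Fibonacci claim is easy to check yourself. Below is a small sketch that counts the leading digits of the first 1000 Fibonacci numbers and prints them next to the Benford prediction. The sample size of 1000 and the helper names are my own choices for illustration.

```python
import math
from collections import Counter


def leading_digit(n: int) -> int:
    """Most significant digit of a positive integer."""
    return int(str(n)[0])


def fibonacci(count: int):
    """Yield the first `count` Fibonacci numbers, starting 1, 1, 2, 3, ..."""
    a, b = 1, 1
    for _ in range(count):
        yield a
        a, b = b, a + b


COUNT = 1000  # arbitrary sample size for this sketch
digits = Counter(leading_digit(f) for f in fibonacci(COUNT))

for d in range(1, 10):
    observed = digits[d] / COUNT
    predicted = math.log10(1 + 1 / d)
    print(f"{d}: observed {observed:.1%}, Benford {predicted:.1%}")
```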

Types of datasets that rebel against the law:

  • Phone numbers, quite obviously, as they start with some definite prefix
  • Human adult heights and weights
  • IQ scores, and a lot more.
  • Your programming language’s rand() function doesn’t comply with Benford’s law either.
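
To see the rand() point in action, here is a rough sketch using Python’s `random` module as a stand-in for rand(). The seed, the sample size and the range 1 to 999,999 are arbitrary choices of mine; the range is picked so that every leading digit is equally likely.

```python
import random
from collections import Counter

rng = random.Random(2019)  # fixed seed so the run is repeatable
SAMPLES = 100_000

# Uniform integers from a fixed range: each leading digit 1..9 covers
# exactly 111,111 of the 999,999 possible values.
draws = [rng.randint(1, 999_999) for _ in range(SAMPLES)]
counts = Counter(int(str(n)[0]) for n in draws)
uniform_shares = {d: counts[d] / SAMPLES for d in range(1, 10)}

for d, share in uniform_shares.items():
    print(f"{d}: {share:.1%}")
```

Every digit should land close to 11%, which is the flat distribution I was originally expecting, and nothing like the 30% down to 5% slope that Benford’s law predicts.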

Why a dataset may not obey the law:

  • The law doesn’t hold well when numbers are drawn from a defined range.
  • It applies not to properly random sources of data but to data produced by random life processes.
  • More or less, it requires processes that exhibit exponential growth, just as the Fibonacci numbers do.
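
The exponential growth point can be illustrated with a toy process: a value that grows by a fixed percentage at every step. In the sketch below, the 5% growth rate and the 2000 steps are assumptions I picked for the demo, not numbers from any real dataset.

```python
import math
from collections import Counter

GROWTH = 1.05  # hypothetical 5% growth per step
STEPS = 2000   # enough steps to span roughly 42 orders of magnitude


def leading_digit(x: float) -> int:
    """Most significant digit of a positive float, via scientific notation."""
    return int(f"{x:e}"[0])


value = 1.0
counts = Counter()
for _ in range(STEPS):
    counts[leading_digit(value)] += 1
    value *= GROWTH

growth_shares = {d: counts[d] / STEPS for d in range(1, 10)}
for d, share in growth_shares.items():
    print(f"{d}: observed {share:.1%}, Benford {math.log10(1 + 1 / d):.1%}")
```

A value that grows by a constant factor spends more steps between 1 and 2 than between 9 and 10, because doubling takes longer than adding one ninth. That is the intuition behind the small digits winning.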

Benford’s law in Controversial Iranian election 2009:

Mahmoud Ahmadinejad, the incumbent Iranian president at the time, beat his main rival Mir-Hossein Mousavi in the presidential election of 2009. By 12 June, protests had broken out in Iran following the announcement that Ahmadinejad had won 63% of the votes.

On 14 June the Iranian Ministry released the results for 366 voting areas, giving Ahmadinejad over 24 million votes and Mir-Hossein Mousavi around 13 million votes. There were large irregularities in the results, and people were surprised.

Although world leaders raised concerns about the situation in Iran, Ahmadinejad was named president once more on August 5.

The thing of interest here is the work of cosmologist Boudewijn Roukema, from the Nicolaus Copernicus University in Poland.
He noticed a strange anomaly: 7 occurs as a first digit more often than Benford’s law would predict in the votes for Mehdi Karroubi, who came in fourth place.

Roukema went on to publish a research paper on this anomaly.
P.S. I haven’t read it yet 😅

Benford’s law in “The Accountant” starring Ben Affleck:

I watched the entire movie for a couple of scenes [timeline: 30 to 43 minutes].
Damn! The movie was nice 😄
Affleck’s character combs through years of a firm’s financial data and identifies fraudulent transactions thanks to the unusual frequency of the number 3 in them.

As Ernie Cooper (a forensic accountant who investigates internal fraud in firms and a former FBI special agent) put it when reviewing the movie for TheWrap:

“It was very realistic when he came in and went to the conference room and he gets all those boxes.”
“That’s very, very classic. You look at the numbers, you look at the years, you look at trends. You look at revenue, you look at expenses. That analytics is what he was doing mentally.”

Wolff (Ben Affleck) identified a series of suspicious transactions by the unusual frequency of the number 3 in their dollar values.
Cooper said that was a use of Benford’s law, which lays out the predicted distribution of numbers in a naturally occurring set of data.

“You apply Benford’s law to all your data,” Cooper said.
“Say your disbursements to vendors for four or five years. You’re going to have millions of transactions. And what it does — exactly what he does — it identifies patterns that are against the normal. Because numbers go a certain way.”
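
To get a feel for what “applying Benford’s law to all your data” could look like, here is a rough sketch of a first-two-digits check. It is not the method used in the movie or by any real auditor. The payment amounts are entirely synthetic: the “honest” ones are generated to follow Benford’s law, and I then pad the data with 300 made-up payments between $6,500 and $6,599, echoing the healthcare story at the top of this post. All names and numbers in the code are hypothetical.

```python
import math
import random
from collections import Counter


def expected_share(first_two: int) -> float:
    """Benford prediction for numbers starting with the two digits `first_two`."""
    return math.log10(1 + 1 / first_two)


def first_two_digit_excess(amounts):
    """Map each two-digit prefix (10..99) to observed share minus expected share."""
    counts = Counter(int(str(a)[:2]) for a in amounts if a >= 10)
    total = sum(counts.values())
    return {n: counts[n] / total - expected_share(n) for n in range(10, 100)}


rng = random.Random(42)  # fixed seed so the run is repeatable

# Synthetic "honest" payments in whole dollars, spread evenly on a log
# scale between $100 and $100,000, which makes them follow Benford's law.
honest = [int(10 ** rng.uniform(2, 5)) for _ in range(20_000)]

# Synthetic "suspicious" payments, all between $6,500 and $6,599.
padded = [rng.randint(6500, 6599) for _ in range(300)]

excess = first_two_digit_excess(honest + padded)
suspect = max(excess, key=excess.get)
print(f"Largest excess: amounts starting with {suspect} "
      f"({excess[suspect]:+.2%} versus the Benford prediction)")
```

A spike like this is only a pointer to where to look. In the opening story it took a careful audit of the actual checks to confirm the fraud.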
