C++: Benford’s Law

Benford’s law, also called the first-digit law, describes the frequency distribution of leading digits in many (but not all) real-life sources of data. In this distribution, the digit 1 occurs as the first digit about 30% of the time, while larger digits occur in that position less frequently: 9 is the first digit less than 5% of the time. This distribution of first digits is the same as the widths of gridlines on a logarithmic scale. Benford’s law also covers the expected distribution of digits beyond the first, which approaches a uniform distribution as the digit position increases.

This result has been found to apply to a wide variety of data sets, including electricity bills, street addresses, stock prices, population numbers, death rates, lengths of rivers, physical and mathematical constants, and processes described by power laws (which are very common in nature). It tends to be most accurate when values are distributed across multiple orders of magnitude.

A set of numbers is said to satisfy Benford’s law if the leading digit $d$ ($d \in \{1, \ldots, 9\}$) occurs with probability

$$P(d) = \log_{10}(d+1) - \log_{10}(d) = \log_{10}\left(1 + \frac{1}{d}\right)$$
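
As a quick check of the formula, the expected probabilities can be computed directly. The following is only a minimal sketch; the function names benfordExpected and benfordTotal are made up for this illustration and are not part of the task. Because the logarithms telescope, the nine probabilities should add up to log10(10) = 1.

```cpp
#include <cassert>
#include <cmath>

// Expected Benford probability of leading digit d (hypothetical helper name).
// Only defined for d in 1..9, which the assertion enforces.
double benfordExpected( int d ) {
	assert( d >= 1 && d <= 9 ) ;
	return std::log10( 1.0 + 1.0 / d ) ;
}

// Sum of the nine expected probabilities; mathematically this is
// log10(10/1) = 1, because consecutive terms cancel.
double benfordTotal( ) {
	double total = 0.0 ;
	for ( int d = 1 ; d <= 9 ; d++ )
		total += benfordExpected( d ) ;
	return total ;
}
```

Calling benfordExpected( 1 ) gives roughly 0.301 and benfordExpected( 9 ) roughly 0.046, matching the "about 30%" and "less than 5%" figures quoted above.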

For this task, write (a) routine(s) to calculate the distribution of first significant (non-zero) digits in a collection of numbers, then display the actual vs. expected distribution in the way most convenient for your language (table / graph / histogram / whatever).
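
One possible shape for such a routine, independent of any big-number library, is sketched below. It assumes the numbers are available as plain decimal strings without an exponent part; the names leadingDigit and countLeadingDigits are hypothetical and chosen only for this example.

```cpp
#include <array>
#include <string>
#include <vector>

// First significant digit of a decimal string: the first character in '1'..'9'.
// Signs, leading zeros and the decimal point are skipped.
// Returns 0 when the string has no non-zero digit (for example "0").
// Assumption: no exponent notation such as "0e7" is used.
int leadingDigit( const std::string & text ) {
	for ( char c : text )
		if ( c >= '1' && c <= '9' )
			return c - '0' ;
	return 0 ;
}

// counts[ d ] is the number of entries whose first significant digit is d ;
// counts[ 0 ] collects the entries that have none.
std::array<int , 10> countLeadingDigits( const std::vector<std::string> & numbers ) {
	std::array<int , 10> counts { } ;
	for ( const std::string & s : numbers )
		counts[ leadingDigit( s ) ]++ ;
	return counts ;
}
```

Dividing counts[ d ] by the number of entries that have a non-zero digit gives the observed frequency to compare against the expected value for d.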

Use the first 1000 numbers from the Fibonacci sequence as your data set. No need to show how the Fibonacci numbers are obtained. You can generate them or load them from a file; whichever is easiest. Display your actual vs expected distribution.
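
The 1000th Fibonacci number has 209 decimal digits, so built-in integer types are not enough. If no big-number library is at hand, one way to generate the data set is schoolbook addition on decimal strings. This is only a sketch of that alternative, not the approach used by the solution on this page; addDecimal and fibonacciStrings are hypothetical names, and the sequence is assumed to start 1, 1, 2, 3, ...

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Adds two non-negative decimal numbers given as digit strings,
// working from the last digit to the first with a carry.
std::string addDecimal( const std::string & a , const std::string & b ) {
	std::string sum ;
	int carry = 0 ;
	auto i = a.rbegin( ) ;
	auto j = b.rbegin( ) ;
	while ( i != a.rend( ) || j != b.rend( ) || carry ) {
		int digit = carry ;
		if ( i != a.rend( ) ) digit += *i++ - '0' ;
		if ( j != b.rend( ) ) digit += *j++ - '0' ;
		sum.push_back( static_cast<char>( '0' + digit % 10 ) ) ;
		carry = digit / 10 ;
	}
	std::reverse( sum.begin( ) , sum.end( ) ) ;//digits were collected back to front
	return sum ;
}

// The first n Fibonacci numbers as decimal strings, starting 1 , 1 , 2 , 3 , ...
std::vector<std::string> fibonacciStrings( int n ) {
	std::vector<std::string> fibs ;
	std::string a = "1" ;
	std::string b = "1" ;
	for ( int k = 0 ; k < n ; k++ ) {
		fibs.push_back( a ) ;
		std::string next = addDecimal( a , b ) ;
		a = b ;
		b = next ;
	}
	return fibs ;
}
```

With the numbers stored as strings, the leading digit of each one is simply its first character, so no further conversion is needed before counting.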

//To cope with the big numbers, this program uses the Class Library for Numbers (CLN).
//With a prepackaged CLN installed, it can be compiled with "g++ -std=c++11 yourprogram.cpp -o yourprogram -lcln"
#include <cln/integer.h>
#include <cln/integer_io.h>
#include <iostream>
#include <algorithm>
#include <vector>
#include <iomanip>
#include <sstream>
#include <string>
#include <cstdlib>
#include <cmath>
#include <map>
using namespace cln ;

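//function object that returns the next Fibonacci number on every call ;
//it keeps its own copies of the two most recent values
class NextNum {
	public :
	NextNum ( const cl_I & a , const cl_I & b ) : first( a ) , second ( b ) { }
	cl_I operator( )( ) {
		cl_I result = first + second ;
		first = second ;
		second = result ;
		return result ;
	}
	private :
	cl_I first ;
	cl_I second ;
} ;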

//counts how often each leading decimal digit occurs among the numbers in fibos
void findFrequencies( const std::vector<cl_I> & fibos , std::map<int , int> &numberfrequencies  ) {
	for ( const cl_I & bignumber : fibos ) {
		std::ostringstream os ;
		fprintdecimal ( os , bignumber ) ;//from header file cln/integer_io.h
		//the numbers are non-negative, so the first character is a digit
		int firstdigit = os.str( )[ 0 ] - '0' ;
		//operator[] creates a missing entry with count 0 before incrementing
		numberfrequencies[ firstdigit ]++ ;
	}
}

int main( ) {
	//data set: F(1) .. F(1000) = 1 , 1 , 2 , 3 , ... ; F(0) = 0 is left out
	//because it has no non-zero leading digit
	std::vector<cl_I> fibonaccis( 1000 ) ;
	fibonaccis[ 0 ] = 1 ;
	fibonaccis[ 1 ] = 1 ;
	cl_I a = 1 ;
	cl_I b = 1 ;
	//the generator stores copies of a and b and advances that internal state
	//on every call ; a and b themselves stay unchanged
	std::generate_n( fibonaccis.begin( ) + 2 , 998 , NextNum( a , b ) ) ;
	std::cout << std::endl ;
	std::map<int , int> frequencies ;
	findFrequencies( fibonaccis , frequencies ) ;
	std::cout.precision( 3 ) ;//three significant digits for both columns
	std::cout << "                found                    expected\n" ;
	for ( int i = 1 ; i < 10 ; i++ ) {
		double found = static_cast<double>( frequencies[ i ] ) / fibonaccis.size( ) ;
		double expected = std::log10( 1 + 1 / static_cast<double>( i )) ;
		std::cout << i << " :" << std::setw( 16 ) << std::right << found * 100 << " %" ;
		std::cout << std::setw( 26 ) << std::right << expected * 100 << " %\n" ;
	}
	return 0 ;
}
Output:
                found                    expected
1 :            30.1 %                      30.1 %
2 :            17.7 %                      17.6 %
3 :            12.5 %                      12.5 %
4 :             9.6 %                      9.69 %
5 :               8 %                      7.92 %
6 :             6.7 %                      6.69 %
7 :             5.6 %                       5.8 %
8 :             5.3 %                      5.12 %
9 :             4.5 %                      4.58 %

Content is available under GNU Free Documentation License 1.2.