
C++: Entropy

Calculate the information entropy (Shannon entropy) of a given input string.

Entropy is the expected value of the measure of information content in a system. In general, the Shannon entropy of a variable X is defined as:

H(X) = \sum_{x\in\Omega} P(x) I(x)

where the information content is I(x) = -\log_b P(x). If the base of the logarithm is b = 2, the result is expressed in bits, a unit of information. Therefore, given a string S of length n in which P(s_i) is the relative frequency of each character, the entropy of the string in bits is:

H(S) = -\sum_{i=1}^{n} P(s_i) \log_2 (P(s_i))

For this task, use “1223334444” as an example. The result should be around 1.84644 bits.
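As a check, the example string has character counts 1 \to 1, 2 \to 2, 3 \to 3 and 4 \to 4 over n = 10 characters, giving relative frequencies 0.1, 0.2, 0.3 and 0.4, so the sum works out as:

H(S) = -(0.1 \log_2 0.1 + 0.2 \log_2 0.2 + 0.3 \log_2 0.3 + 0.4 \log_2 0.4) \approx 1.84644 \text{ bits}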

#include <string>
#include <map>
#include <iostream>
#include <cmath>

int main( int argc , char *argv[ ] ) {
	// Use the command-line argument if one is given, otherwise the task's example.
	std::string teststring = argc > 1 ? argv[ 1 ] : "1223334444" ;
	// Count the occurrences of each character.
	std::map<char , int> frequencies ;
	for ( char c : teststring )
		frequencies[ c ] ++ ;
	int numlen = teststring.length( ) ;
	double infocontent = 0 ;
	for ( const auto& p : frequencies ) {
		double freq = static_cast<double>( p.second ) / numlen ;
		infocontent -= freq * std::log2( freq ) ;
	}
	std::cout << "The information content of " << teststring
		<< " is " << infocontent << " !\n" ;
	return 0 ;
}
Output:
The information content of 1223334444 is 1.84644 !


Content is available under GNU Free Documentation License 1.2.