# C++: Entropy

Calculate the information entropy (Shannon entropy) of a given input string.

Entropy is the expected value of the measure of information content in a system. In general, the Shannon entropy of a variable X is defined as:

$H(X) = \sum_{x\in\Omega} P(x) I(x)$

where the information content is $I(x) = -\log_b P(x)$. If the base of the logarithm is $b = 2$, the result is expressed in bits, a unit of information. Therefore, given a string $S$ of length $n$ where $P(s_i)$ is the relative frequency of each character, the entropy of the string in bits is:

$H(S) = -\sum_{i=1}^{n} P(s_i) \log_2 P(s_i)$

For this task, use “1223334444” as an example. The result should be around 1.84644 bits.
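As a check on that figure, the character counts in "1223334444" are 1 → 1, 2 → 2, 3 → 3, and 4 → 4 over $n = 10$ characters, so the sum works out to:

$H(S) = -\left(\tfrac{1}{10}\log_2\tfrac{1}{10} + \tfrac{2}{10}\log_2\tfrac{2}{10} + \tfrac{3}{10}\log_2\tfrac{3}{10} + \tfrac{4}{10}\log_2\tfrac{4}{10}\right) \approx 1.84644$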

```cpp
#include <cmath>
#include <iostream>
#include <map>
#include <string>

int main( int argc , char *argv[ ] ) {
    // Fall back to the task's example string when no argument is given.
    const std::string teststring = ( argc > 1 ) ? argv[ 1 ] : "1223334444" ;

    // Count the occurrences of each character.
    std::map<char , int> frequencies ;
    for ( char c : teststring )
        frequencies[ c ]++ ;

    const int numlen = teststring.length( ) ;
    double infocontent = 0 ;
    for ( const std::pair<const char , int> &p : frequencies ) {
        const double freq = static_cast<double>( p.second ) / numlen ;
        // Accumulate -P(s_i) * log2( P(s_i) ); std::log2 avoids
        // shadowing the C library's log2 with a hand-rolled version.
        infocontent -= freq * std::log2( freq ) ;
    }

    std::cout << "The information content of " << teststring
              << " is " << infocontent << " !\n" ;
    return 0 ;
}
```

Output:
```
The information content of 1223334444 is 1.84644 !
```


Content is available under GNU Free Documentation License 1.2.