Calculate the information entropy (Shannon entropy) of a given input string.
Entropy is the expected value of the measure of information content in a system. In general, the Shannon entropy of a variable X is defined as:

H(X) = −∑ₓ P(x) · log_b P(x)

where the information content is I(x) = −log_b P(x). If the base of the logarithm is b = 2, the result is expressed in bits, a unit of information. Therefore, given a string S of length n in which P(sᵢ) is the relative frequency of each distinct character sᵢ, the entropy of the string in bits is:

H(S) = −∑ᵢ P(sᵢ) · log₂ P(sᵢ)
For this task, use “1223334444” as an example. The result should be around 1.84644 bits.
#include <string>
#include <map>
#include <iostream>
#include <cmath>

int main( int argc , char *argv[ ] ) {
   // Use the command-line argument if given, otherwise the task's example string.
   std::string teststring = ( argc > 1 ) ? argv[ 1 ] : "1223334444" ;

   // Count the occurrences of each character.
   std::map<char , int> frequencies ;
   for ( char c : teststring )
      frequencies[ c ]++ ;

   int numlen = teststring.length( ) ;
   double infocontent = 0 ;
   for ( const std::pair<const char , int> &p : frequencies ) {
      // Relative frequency of this character.
      double freq = static_cast<double>( p.second ) / numlen ;
      infocontent -= freq * std::log2( freq ) ;   // accumulate -P(s) * log2 P(s)
   }

   std::cout << "The information content of " << teststring
             << " is " << infocontent << " !\n" ;
   return 0 ;
}
- Output:
The information content of 1223334444 is 1.84644 !
Content is available under GNU Free Documentation License 1.2.