C++: Multisplit

It is often necessary to split a string into pieces based on several different (potentially multi-character) separator strings, while still retaining the information about which separators were present in the input.

This is particularly useful when doing small parsing tasks.
The task is to write code to demonstrate this.

The function (or procedure or method, as appropriate) should take an input string and an ordered collection of separators.

The order of the separators is significant:
Separator order represents matching priority, with the first separator having the highest priority. Where it is ambiguous which separator applies at a given point (e.g., because one separator is a prefix of another), the highest-priority separator should be used. Separators may be matched any number of times, and the function should return an ordered sequence of substrings.

Test your code using the input string "a!===b=!=c" and the separators "==", "!=" and "=".

For these inputs the string should be parsed as "a" (!=) "" (==) "b" (=) "" (!=) "c", where matched delimiters are shown in parentheses, and separated strings are quoted, so our resulting output is "a", empty string, "b", empty string, "c". Note that the quotation marks are shown for clarity and do not form part of the output.

#include <iostream>
#include <string>
#include <boost/tokenizer.hpp>

int main( ) {
	std::string str( "a!===b=!=c" ) , output ;
	typedef boost::tokenizer<boost::char_separator<char> > tokenizer ;
	// char_separator works character by character, not on multi-character
	// separator strings: here '=' is a dropped delimiter and '!' is a
	// kept delimiter, returned as its own token.
	boost::char_separator<char> separator ( "==" , "!=" ) , sep ( "!" ) ;
	// First pass: drop every '=' and keep the '!' tokens,
	// concatenating the results into the intermediate string "a!b!c".
	tokenizer mytok( str , separator ) ;
	for ( tokenizer::iterator tok_iter = mytok.begin( ) ;
			tok_iter != mytok.end( ) ; ++tok_iter )
		output.append( *tok_iter ) ;
	// Second pass: split the intermediate string on '!'.
	tokenizer nexttok ( output , sep ) ;
	for ( tokenizer::iterator tok_iter = nexttok.begin( ) ;
			tok_iter != nexttok.end( ) ; ++tok_iter )
		std::cout << *tok_iter << " " ;
	std::cout << '\n' ;
	return 0 ;
}
Output:
a b c

Content is available under GNU Free Documentation License 1.2.