Thursday, April 28, 2011

Int tokenizer

I know there are string tokenizers but is there an "int tokenizer"?

For example, I want to split the string "12 34 46" and have:

list[0]=12

list[1]=34

list[2]=46

In particular, I'm wondering whether Boost.Tokenizer does this, although I couldn't find any examples that didn't use strings.

From Stack Overflow
  • What you're looking for is 2 separate actions. First tokenize the string, then convert each token to an int.

  • I am not sure you can do this without using a string or char*, because you have to put both numbers and spaces into the same set...

  • Yes there is: use a stream, e.g. a stringstream:

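    // needs <sstream> and <vector>; assumes `using namespace std;`
    vector<int> list;
    stringstream sstr("12 34 46");
    int i;
    while (sstr >> i)        // >> skips whitespace; the loop ends at the first failed extraction
        list.push_back(i);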
    

    Alternatively, you can also use STL algorithms and/or iterator adapters combined with constructors:

    vector<int> list = vector<int>(istream_iterator<int>(sstr), istream_iterator<int>());
    
    GMan : Your second version is sexy :O
    avakar : Sexy, but unnecessarily verbose. `vector<int> list(istream_iterator<int>(sstr), istream_iterator<int>());` would do just fine. :)
    Konrad Rudolph : avakar: oddly, *no*, your code doesn’t work. You either need to use the explicit constructor (as done by me) or include an extra pair of braces around one of the arguments; otherwise, your code will *not* work – instead, this is the declaration of a function prototype called `list` with return type `vector<int>`. Try it out!
    Konrad Rudolph : I meant parentheses, not braces.
    Milan : I have worked on a similar project where we were forced to split "numbers". If you choose to use a stringstream for this, it will make your program at least 5-10 times slower than using char arrays / strings to find and split the numbers yourself and converting to int at the end. I measured this performance loss on gcc 4.3 and VS 2008 on a C2D setup, running the function that splits ints a few billion times against the string-splitting one.
    avakar : Konrad, indeed it won't work. Kudos for spotting that. I'd need a compiler to point it out to me.
  • You will want to split the string up at spaces, then use boost::lexical_cast to convert the text form of the number into the binary form.

    #include <boost/lexical_cast.hpp>
    #include <boost/tokenizer.hpp>
    #include <algorithm>
    #include <iostream>
    #include <iterator>
    #include <string>
    #include <vector>
    
    int main(void)
    {
        // the integers
        const std::string intText = "12 34 46";
        std::vector<int> intArray;
    
        // tokenize integers
        boost::tokenizer<> tokens(intText);
    
        // read integers
        for (boost::tokenizer<>::const_iterator it = tokens.begin();
             it != tokens.end(); ++it)
        {
            intArray.push_back(boost::lexical_cast<int>(*it));
        }
    
        // print out array
        std::copy(intArray.begin(), intArray.end(), std::ostream_iterator<int>(std::cout, "\n"));
    }
    
    GMan : This is the longer version of Konrad's answer. His is best if you know you'll be using spaces, but since you said "boost", here's your boost answer. This one will let you change your delimiters easily.
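
For readers without Boost, the two ideas in the answers above (let a stream both tokenize and convert, or split first and convert each token) can be sketched with the standard library alone. This is only an illustration, not code from the thread: the function names `parse_ints` and `split_ints` are made up here, and `std::stoi` assumes a C++11 compiler. The first function also shows the extra parentheses Konrad mentions, which keep the iterator-range constructor from being read as a function declaration.

```cpp
#include <iterator>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: whitespace-separated integers.
// The stream does the tokenizing and the conversion in one step.
std::vector<int> parse_ints(const std::string& text)
{
    std::istringstream in(text);
    // The extra parentheses around the first argument stop the compiler
    // from parsing this line as the declaration of a function `values`.
    std::vector<int> values((std::istream_iterator<int>(in)),
                            std::istream_iterator<int>());
    return values;
}

// Hypothetical helper: any single-character delimiter.
// Split first, then convert each token (the "two separate actions").
std::vector<int> split_ints(const std::string& text, char delimiter)
{
    std::vector<int> values;
    std::istringstream in(text);
    std::string token;
    while (std::getline(in, token, delimiter)) {
        if (token.empty())
            continue;  // skip empty fields such as the middle of "1,,2"
        // std::stoi throws std::invalid_argument if the token
        // does not start with a number.
        values.push_back(std::stoi(token));
    }
    return values;
}
```

Calling `parse_ints("12 34 46")` or `split_ints("12,34,46", ',')` should both give a vector holding 12, 34 and 46. Whether either is fast enough for a hot loop is a separate question (see Milan's comment); it is worth measuring before choosing.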
