-1

I am looking for the best implementation of a string tokenizer. I have seen many implementations, but some of them don't work with multiple delimiters in a row. I could write it myself, but I don't know all of the existing functions, so perhaps this has already been implemented in a correct and fast way.
For example, I need to split a string such as

"This__should_______be____split_into____seven___strings"

In this case the delimiter is an underscore. What is the most correct and elegant way to do this?

EDIT

Sorry, I forgot to mention: I need to do this with the standard library only, without external libraries such as Boost.

uftsyo

2 Answers

1

Using the ever-useful Boost string algorithms:

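#include <algorithm>
#include <string>
#include <vector>
#include <boost/algorithm/string.hpp>

std::vector<std::string> words;
std::string sentence = "This__should_______be____split_into____seven___strings";
boost::split(words, sentence, boost::is_any_of("_"));
// Consecutive delimiters produce empty strings; remove all of them.
words.erase(
    std::remove_if(
        words.begin(), words.end(),
        [](const std::string &s){return s.empty();}),
    words.end());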


Edit: Given the updated requirements:

std::vector<std::string> words;
std::string word;
std::string sentence = "This__should_______be____split_into____seven___strings";
for (char c : sentence)
{
    switch (c)
    {
    case '_':
    {
        // Only a non-empty word is stored, so leading and repeated
        // delimiters never produce empty strings.
        if (!word.empty())
        {
            words.push_back(word);
            word.clear();
        }
        break;
    }
    default:
    {
        word += c;
        break;
    }
    }
}
// Store the last word when the sentence does not end with a delimiter.
if (!word.empty())
{
    words.push_back(word);
}
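
If more than one delimiter character is needed, the same idea can be sketched with std::string::find_first_not_of and std::string::find_first_of from the standard library. The helper name split_tokens and its parameters are made up for illustration; treat this as a minimal sketch rather than a tuned implementation:

```cpp
#include <string>
#include <vector>

// Hypothetical helper (not part of the standard library): splits `text` on
// any character in `delims`, treating a run of delimiters as one separator.
std::vector<std::string> split_tokens(const std::string &text,
                                      const std::string &delims)
{
    std::vector<std::string> tokens;
    // Skip leading delimiters to find the start of the first token.
    std::string::size_type start = text.find_first_not_of(delims);
    while (start != std::string::npos)
    {
        // The token ends at the next delimiter, or at the end of the string
        // (in which case `end` is npos and substr takes the remainder).
        std::string::size_type end = text.find_first_of(delims, start);
        tokens.push_back(text.substr(start, end - start));
        // Skip the run of delimiters that follows this token.
        start = text.find_first_not_of(delims, end);
    }
    return tokens;
}
```

Calling split_tokens(sentence, "_") on the sentence above should give the seven words with no empty strings, and a call such as split_tokens(line, "_,") would split on underscores and commas alike.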


sjdowling
0

Simple C tokenizer, tested and works with the given string. You can use this method in C++ too. Note: it will work only with null-terminated strings.

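#include <stdio.h>
#include <string.h>

const char *text = "This__should_______be____split_into____seven___strings";
const char *p = text;
char buf[20];
while (*p != '\0')
{
    const char *start;
    size_t len;

    /* Skip a run of delimiters. */
    while (*p == '_')
        ++p;

    if (*p == '\0')
        break;

    /* Scan to the end of the current token. */
    start = p;
    while (*p != '\0' && *p != '_')
        ++p;

    /* Truncate tokens that would not fit in buf. */
    len = (size_t)(p - start);
    if (len > sizeof buf - 1)
        len = sizeof buf - 1;
    strncpy(buf, start, len);
    buf[len] = '\0';
    printf("%s\n", buf);
}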
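
To use the same skip-then-scan loop from C++, one possible sketch copies each token into a std::string instead of a fixed-size buffer, so token length is not limited. The function name split_c_string and its parameters are hypothetical, chosen only for this example:

```cpp
#include <string>
#include <vector>

// Hypothetical helper: walks a null-terminated string with a pointer and
// collects every non-empty run of non-delimiter characters.
std::vector<std::string> split_c_string(const char *text, char delim)
{
    std::vector<std::string> tokens;
    if (text == nullptr)
        return tokens;

    const char *p = text;
    while (*p != '\0')
    {
        // Skip a run of delimiters.
        while (*p != '\0' && *p == delim)
            ++p;
        if (*p == '\0')
            break;

        // Scan to the end of the current token and copy it out.
        const char *start = p;
        while (*p != '\0' && *p != delim)
            ++p;
        tokens.push_back(std::string(start, p));
    }
    return tokens;
}
```

With the string above and '_' as the delimiter this should return the same seven words; as in the C version, the input must be null-terminated.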