I have some user input following this format:

Playa Raco#path#5#39.244|-0.257#0-23

The `#` acts as a field separator, and `|` separates the latitude from the longitude. I would like to extract each field. Note that the strings can contain spaces. I tried the `%[^\n]%*c` format with `scanf`, adding `#` and `|`, but it doesn't work because it matches the whole line.

I would like to keep this as simple as possible. I know I could do this by reading each char, but I'm curious about best practices and whether `scanf` or a similar function can handle this.

Norhther
  • A common approach is to use [strtok](https://man7.org/linux/man-pages/man3/strtok.3.html) – kaylum Jun 30 '21 at 23:43
  • If you are curious as to why something didn't work, then maybe show us that something? – HAL9000 Jun 30 '21 at 23:45
  • @HAL9000 I'm not curious about why it didn't work. `[^\n]%*c` is going to match the whole line. I'm asking about best practices. If you want, I can add a working function that extracts the tokens with a linear search, but that's not what I'm asking. – Norhther Jun 30 '21 at 23:46
  • I hate to recommend `scanf`, but `%[^#]#%[^#]#%[^#]#%[^#]#%[^|]|%[^#]#%s` ought to work. (And just looking at that ghastly mess -- I can't believe I just typed it -- reminds me why I hate to recommend `scanf`!) – Steve Summit Jun 30 '21 at 23:46
  • @SteveSummit Well... That works, not sure if it's best practices tho, but I appreciate it! Thanks! – Norhther Jun 30 '21 at 23:51
  • Or, alternatively, `%[^#]#%[^#]#%d#%lf|%lf#%s`. – Steve Summit Jun 30 '21 at 23:53
  • But, "best practice"? Depends who you ask. Some people like `scanf`, and for them, the jawbreaker formats I've just constructed would indeed be, if not "best", then at least acceptable practice. But if your heart's not set on `scanf`, then kaylum is right, `strtok` would be an excellent starting point for a better practice. – Steve Summit Jun 30 '21 at 23:55
  • @SteveSummit for this case, I'm gonna say that your first comment was not a "good practice" (from my point of view) because of its verbose nature. I'm definitely going to check `strtok`, but I would also accept your second comment. – Norhther Jun 30 '21 at 23:58
  • Me, I'd use a variation on the [`getwords` function discussed here](https://www.eskimo.com/~scs/cclass/notes/sx10h.html), but unfortunately there's [nothing like it in the standard library](https://stackoverflow.com/questions/49372173). – Steve Summit Jul 01 '21 at 00:00
  • Another option would be a regular expression parser. – Steve Summit Jul 01 '21 at 00:02
  • @Norhther, I was trying to comment on the fact that you had tried adding `#` and `|` without telling us how. Sorry for the confusion. – HAL9000 Jul 01 '21 at 00:35
  • @SteveSummit nothing wrong with the approach, but better `fgets()` then `sscanf()` rather than `scanf()` alone. Using a `scanf()`/`sscanf()` approach has the benefit of not modifying the original string. `strcspn()`/`strspn()` can be used to do the same thing as `strtok()` but without modifying the original string. You may as well write up your solution as an answer, or we need to find a dupe and close. – David C. Rankin Jul 01 '21 at 02:15
  • @DavidC.Rankin I said "`scanf`", but I meant "any member of the *scanf family" -- and, yes, of course, `sscanf` is the preferred choice here. I wouldn't have had time to write up an answer tonight, so I'm glad you did. – Steve Summit Jul 01 '21 at 04:29
  • @SteveSummit - glad to do it, but I just wanted to make sure I gave you first shot if you had the time `:)` – David C. Rankin Jul 01 '21 at 04:35

1 Answer

As mentioned in the comments, there are many ways to parse the information from the string. You can walk a pair of pointers down the string, testing each character and taking the appropriate action. You can use `strtok()`, but note that `strtok()` modifies the original string, so it cannot be used on a string literal. You can use `sscanf()` to parse the values from the string. Or you can use any combination of `strcspn()`, `strspn()`, `strchr()`, etc., and then manually copy each field between a start and an end pointer.
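To make the `strtok()` caveat concrete, here is a minimal sketch of that route. The `struct place` type and the `parse_strtok()` and `copy_field()` helpers are names invented for this illustration, not anything from the question. Note that `strtok()` treats `#` and `|` as interchangeable here and silently skips empty fields, so this version is looser than the input format strictly allows.

```c
#include <stdlib.h>
#include <string.h>

#define MAXC 16     /* field size, adjust as necessary */

/* Hypothetical record type used only for this sketch. */
struct place {
    char name[MAXC], path[MAXC], last[MAXC];
    int num;
    double lat, lon;
};

/* Copy src into dst only if it fits, terminator included. */
static int copy_field (char *dst, size_t size, const char *src)
{
    if (strlen (src) >= size)
        return 0;
    strcpy (dst, src);
    return 1;
}

/* Split `line` in place on '#', '|' and '\n'. strtok() overwrites each
 * delimiter with '\0', so `line` must be a writable array.
 * Returns 1 on success, 0 on a missing, extra, oversized or
 * non-numeric field.
 */
int parse_strtok (char *line, struct place *p)
{
    char *tok[6], *end;
    size_t n = 0;

    for (char *t = strtok (line, "#|\n"); t; t = strtok (NULL, "#|\n")) {
        if (n == 6)             /* more than six fields */
            return 0;
        tok[n++] = t;
    }
    if (n != 6)                 /* fewer than six fields */
        return 0;

    if (!copy_field (p->name, sizeof p->name, tok[0]) ||
        !copy_field (p->path, sizeof p->path, tok[1]) ||
        !copy_field (p->last, sizeof p->last, tok[5]))
        return 0;

    /* each numeric token must be consumed completely */
    p->num = (int) strtol (tok[2], &end, 10);
    if (end == tok[2] || *end) return 0;
    p->lat = strtod (tok[3], &end);
    if (end == tok[3] || *end) return 0;
    p->lon = strtod (tok[4], &end);
    if (end == tok[4] || *end) return 0;

    return 1;
}
```

The caller has to pass a writable buffer, e.g. `char buf[] = "Playa Raco#path#5#39.244|-0.257#0-23";`, because the delimiters in it are overwritten during the call.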

However, your question also says "I would like to keep this as simple as possible...", and that points directly to `sscanf()`. You simply need to validate the return value and you are done. For example, you could do:

#include <stdio.h>

#define MAXC 16     /* adjust as necessary */

int main (void) {
    
    const char *str = "Playa Raco#path#5#39.244|-0.257#0-23";
    char name[MAXC], path[MAXC], last[MAXC];
    int num;
    double lat, lon;
    
    if (sscanf (str, "%15[^#]#%15[^#]#%d#%lf|%lf#%15[^\n]",
                name, path, &num, &lat, &lon, last) == 6) {
        printf ("name : %s\npath : %s\nnum  : %d\n"
                "lat  : %f\nlon  : %f\nlast : %s\n",
                name, path, num, lat, lon, last);
    }
    else
        fputs ("error: parsing values from str.\n", stderr);
}

(note: the `%[..]` conversion does not consume leading whitespace, so if there is a possibility of leading whitespace or a space following `'#'` before a string conversion, include a space in the format string, e.g. `" %15[^#]# %15[^#]#%d#%lf|%lf# %15[^\n]"`)
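As a rough illustration of that note, the sketch below wraps the whitespace-tolerant format in a function. `struct place` and `parse_skip_ws()` are hypothetical names made up for this example.

```c
#include <stdio.h>

#define MAXC 16     /* adjust as necessary */

/* Hypothetical record type used only for this sketch. */
struct place {
    char name[MAXC], path[MAXC], last[MAXC];
    int num;
    double lat, lon;
};

/* Same conversions as above, with a space ahead of each %[...] so that
 * sscanf() skips any whitespace before the field starts.
 * Returns the sscanf() count: 6 means every field was converted.
 */
int parse_skip_ws (const char *str, struct place *p)
{
    return sscanf (str, " %15[^#]# %15[^#]#%d#%lf|%lf# %15[^\n]",
                   p->name, p->path, &p->num, &p->lat, &p->lon, p->last);
}
```

With the input `"  Playa Raco# path#5#39.244|-0.257# 0-23"` the stored fields carry no leading spaces. Whitespace that comes before a `#` is still kept, since `%[^#]` stops only at the delimiter, so `"Playa Raco #path..."` stores the name with its trailing space.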

Each string field of the input is stored in a 16-character array. Looking at the format string, you will note that the read of each string is limited to 15 characters (plus the nul-terminating character) to ensure you do not attempt to store more characters than your arrays can hold (that would invoke Undefined Behavior). Since six conversions are requested, you validate the conversion by ensuring the return is 6.
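One way to keep the field width and the array size from drifting apart when you adjust them is to build the format string from a single constant with the preprocessor's stringizing operator. This is only a sketch: `FLDW`, `STR()`, `struct place` and `parse_place()` are names chosen for this example.

```c
#include <stdio.h>

#define FLDW 15             /* max characters per string field */
#define MAXC (FLDW + 1)     /* array size: field width plus '\0' */

#define STR_(x) #x          /* two-level expansion so that      */
#define STR(x)  STR_(x)     /* STR(FLDW) becomes "15"           */

/* Hypothetical record type used only for this sketch. */
struct place {
    char name[MAXC], path[MAXC], last[MAXC];
    int num;
    double lat, lon;
};

/* Expands to "%15[^#]#%15[^#]#%d#%lf|%lf#%15[^\n]" for FLDW == 15.
 * Returns the sscanf() count: 6 means every field was converted.
 */
int parse_place (const char *str, struct place *p)
{
    return sscanf (str,
                   "%" STR(FLDW) "[^#]#%" STR(FLDW) "[^#]#%d#%lf|%lf#%"
                   STR(FLDW) "[^\n]",
                   p->name, p->path, &p->num, &p->lat, &p->lon, p->last);
}
```

The return check also catches an over-long name or path: after 15 characters are stored, the literal `#` in the format fails to match the next input character and the conversion count stops short of 6. The final field is the exception. Nothing follows it in the format, so an over-long last field is truncated to 15 characters while the return is still 6.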

Example Use/Output

Taking this approach, the program above outputs:

./bin/parse_sscanf
name : Playa Raco
path : path
num  : 5
lat  : 39.244000
lon  : -0.257000
last : 0-23

No one way is necessarily "better" than another so long as you validate the conversions and protect the array bounds for any character arrays filled. However, as far as "simple as possible" goes, it's hard to beat `sscanf()` here -- and it doesn't modify your original string, so it is safe to use with string literals.
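For comparison, here is a hedged sketch of the `strcspn()` route mentioned at the start, which also leaves the original string untouched. `struct place`, `next_field()` and `parse_strcspn()` are hypothetical helpers written for this illustration, and the numeric checks are deliberately minimal.

```c
#include <stdlib.h>
#include <string.h>

#define MAXC 16     /* adjust as necessary */

/* Hypothetical record type used only for this sketch. */
struct place {
    char name[MAXC], path[MAXC], last[MAXC];
    int num;
    double lat, lon;
};

/* Copy the text from *sp up to `delim` into dst, then advance *sp
 * past the delimiter. Returns 0 if the delimiter is missing or the
 * field does not fit in dst.
 */
static int next_field (const char **sp, char delim, char *dst, size_t size)
{
    const char set[2] = { delim, '\0' };
    size_t len = strcspn (*sp, set);    /* length up to the delimiter */

    if ((*sp)[len] != delim || len >= size)
        return 0;
    memcpy (dst, *sp, len);
    dst[len] = '\0';
    *sp += len + 1;
    return 1;
}

/* Parse without modifying str. Returns 1 on success, 0 on failure. */
int parse_strcspn (const char *str, struct place *p)
{
    char num[MAXC], lat[MAXC], lon[MAXC], *end;
    const char *s = str;
    size_t len;

    if (!next_field (&s, '#', p->name, sizeof p->name) ||
        !next_field (&s, '#', p->path, sizeof p->path) ||
        !next_field (&s, '#', num, sizeof num) ||
        !next_field (&s, '|', lat, sizeof lat) ||
        !next_field (&s, '#', lon, sizeof lon))
        return 0;

    /* last field runs to the newline or the end of the string */
    len = strcspn (s, "\n");
    if (len >= sizeof p->last)
        return 0;
    memcpy (p->last, s, len);
    p->last[len] = '\0';

    /* each numeric field must be consumed completely */
    p->num = (int) strtol (num, &end, 10);
    if (end == num || *end) return 0;
    p->lat = strtod (lat, &end);
    if (end == lat || *end) return 0;
    p->lon = strtod (lon, &end);
    if (end == lon || *end) return 0;

    return 1;
}
```

Compared with the single `sscanf()` call, this is considerably more code for the same result, which is the trade-off the paragraph above describes.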

David C. Rankin