For once I thought I had found a good use for sscanf(), but after reading about how it handles integers, it appears not. Given a string that should look like 123,456,678, I thought I could safely and concisely parse it with this code:

unsigned int x[3];
if( sscanf( s, "%u,%u,%u", x+0, x+1, x+2 ) == 3 )
    …

If conversion fails I'm not really interested in knowing why, nor am I worried about getting incorrect data. If there's something other than numbers in there, scanf() should surely create a matching error and abort, and it knows I'm looking for an unsigned integer, so anything negative should also be a matching error? Nope.

I got suspicious when I read the description of the conversion specifier %u: "Matches an optionally signed decimal integer." Why would a sign not be a matching error? What happens if the input is signed?

Quoting from ISO/IEC 9899:201x 7.21.6.2 ¶ 10, The fscanf function (emphasis mine):

Except in the case of a % specifier, the input item (or, in the case of a %n directive, the count of input characters) is converted to a type appropriate to the conversion specifier. If the input item is not a matching sequence, the execution of the directive fails: this condition is a matching failure. Unless assignment suppression was indicated by a *, the result of the conversion is placed in the object pointed to by the first argument following the format argument that has not already received a conversion result. If this object does not have an appropriate type, or if the result of the conversion cannot be represented in the object, the behavior is undefined.

This reads as if scanf() treats every integer-looking conversion specifier the same, reads the input as some kind of signed integer of unspecified size, and then writes to the output, bypassing all normal conversions.

For example, converting any integer (negative or positive) into an unsigned integer of smaller size is well defined according to the normal implicit conversions, but not with scanf():

unsigned int x;
x = -1;                   /* Well defined: (-1) + (UINT_MAX+1) = UINT_MAX */
sscanf( "-1", "%u", &x ); /* Undefined behavior? */

Please tell me I'm wrong and that I have missed some part of the standard. One thing that I can't really find a reference to is this part of the section quoted above: "the input item (…) is converted to a type appropriate to the conversion specifier". If the conversion specifier is %u then anything negative is of course not appropriate, nor is anything that does not fit into an unsigned integer. However, I could not find anything in the standard telling me exactly what an "appropriate type" is.

I found a handful of questions dealing with this directly or indirectly, but not in much detail. The question most similar to mine is C: How to prevent input using scanf from overflowing? but it's framed in a way that's not as specific. A few answers (1, 2) mention the issue but offer no detail or references.

The goal of my question is to get an answer detailing exactly why this cannot be interpreted in any way other than undefined behavior, and preferably some rationale as to why this makes sense - fully knowing that some things in C are inconsistent and I have to accept it.
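
For comparison, the strict behavior I was hoping to get from sscanf() can be sketched by hand with strtoul(). This is only a minimal sketch under my own assumptions: the helper names parse_uint_field and parse_triple are made up for illustration, and rejecting signs, whitespace and trailing characters is my choice, not something the standard prescribes for scanf().

```c
#include <ctype.h>
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

/* Hypothetical helper: parses one unsigned decimal field at *p.
   No sign or leading whitespace is accepted.
   Returns 1 on success (and advances *p), 0 on failure. */
int parse_uint_field(const char **p, unsigned int *out)
{
    char *end;
    unsigned long v;

    /* Reject '-', '+', whitespace and an empty field up front,
       since strtoul() itself would accept the first three. */
    if (!isdigit((unsigned char)**p))
        return 0;

    errno = 0;
    v = strtoul(*p, &end, 10);
    if (errno == ERANGE || v > UINT_MAX)
        return 0;               /* does not fit in unsigned int */

    *out = (unsigned int)v;
    *p = end;
    return 1;
}

/* Hypothetical helper: parses exactly "N,N,N" into x[0..2].
   Returns 1 on success, 0 on any deviation from that format. */
int parse_triple(const char *s, unsigned int x[3])
{
    for (int i = 0; i < 3; i++) {
        if (i > 0 && *s++ != ',')
            return 0;           /* missing separator */
        if (!parse_uint_field(&s, &x[i]))
            return 0;
    }
    return *s == '\0';          /* no trailing characters allowed */
}
```

With this sketch, parse_triple("123,456,678", x) succeeds, while inputs such as "-1,2,3" or an out-of-range number are rejected without relying on how scanf() handles them.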

pipe
  • Yes, it can fail if the original string `s` is NULL or not appropriately terminated. – wildplasser Feb 24 '21 at 00:51
  • See also [scanf %u negative number?](https://stackoverflow.com/questions/38684386/scanf-u-negative-number). – dxiv Feb 24 '21 at 00:52
  • How do you figure `scanf` “writes to the output *bypassing* all normal conversions”? The standard says it does a conversion. Conversions are specified in C 2018 6.3. For `%u`, the appropriate type is `unsigned`. So matched input of `-1` will result in −1 being converted to `unsigned`. Conversion of −1 to `unsigned` will yield `UINT_MAX`. – Eric Postpischil Feb 24 '21 at 01:31
  • A problem here is that it never specifies how the input sequence is converted to the destination type. E.g. it doesn't say that this happens as if by `strtoul`. – M.M Feb 24 '21 at 01:32
  • Re “… I could not find anything in the standard telling me exactly what an "appropriate type" is”: Is the type that is supposed to be passed (by address) for the conversion. For `%u`, it is `unsigned`. – Eric Postpischil Feb 24 '21 at 01:33
  • @EricPostpischil I don't agree that's implied. Other parts of the specification of `scanf` contradict the "as if by `strtoul`" approach, e.g. the predicate "if the result of the conversion cannot be represented in the object" can never be satisfied, because the results of `strtoul` and family are always representable in the result type (with that result perhaps being `ULONG_MAX` with errno set to `ERANGE`, for example). Also, what about `%llu` with input `ULONG_MAX+1` (as a string)? If converted as if by `strtoul` the result should be `0`. – M.M Feb 24 '21 at 02:47
  • We *could* come up with some reasonable theory about how it should behave, but the fact is that the standard doesn't actually specify whatever theory we settle on. It is just a reasonable interpolation of an underspecification. – M.M Feb 24 '21 at 02:48
  • @M.M: The predicate is satisfiable for `%f`. – Eric Postpischil Feb 24 '21 at 03:10
  • I take this extreme comment thread as a sign that my question is somewhat valid - there's at least _arguably_ some confusion and unclarity. Hope I have time to digest it before it's cleared out... – pipe Feb 24 '21 at 05:07
  • http://www.open-std.org/jtc1/sc22/wg14/www/docs/n2483.pdf – cpplearner Feb 24 '21 at 10:29
  • @M.M: Upon further study, I think I am wrong about a number-to-number conversion being involved. I think the standard intends a direct conversion from the numeral in the string to the final type. (With either interpretation, `scanf` may have undefined behavior given a sufficiently large input, either because it is too big for the `unsigned` destination or because it is too big for the widest integer the implementation supports.) I will delete my inapplicable comments. – Eric Postpischil Feb 25 '21 at 23:54
  • @EricPostpischil we're back to square one with the behaviour on input `-1` then, since this is out of range for `unsigned int` and it's not clear whether the intent is that out-of-range input is undefined behaviour, or whether the intent is to behave like `strtoul`. – M.M Feb 26 '21 at 01:10
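
The `strtoul` behaviour referred to in these comments can be observed directly. A minimal sketch, assuming only the standard library: the wrapper name `demo_strtoul` is made up for illustration, and it says nothing about what `scanf` is required to do with the same input.

```c
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

/* Hypothetical wrapper: calls strtoul() in base 10 and reports
   through *range_error whether errno was set to ERANGE. */
unsigned long demo_strtoul(const char *s, int *range_error)
{
    unsigned long v;

    errno = 0;                      /* strtoul() never clears errno itself */
    v = strtoul(s, NULL, 10);
    *range_error = (errno == ERANGE);
    return v;
}
```

For the input `"-1"`, `strtoul` negates the value in the return type, so the wrapper returns `ULONG_MAX` with no range error. For a numeral too large for `unsigned long`, it also returns `ULONG_MAX`, but with `ERANGE` set. In both cases the result is representable in the return type, which is the point made above about the "cannot be represented" predicate.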
