Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

<iomanip>: std::put_time should not crash on invalid/out-of-range tm struct values #4882

Closed
TheStormN opened this issue Aug 8, 2024 · 6 comments · Fixed by #4883
Closed
Labels
bug Something isn't working fixed Something works now, yay!

Comments

@TheStormN
Copy link
Contributor

TheStormN commented Aug 8, 2024

Describe the bug

If an invalid or out-of-range tm struct is passed to std::put_time() it will lead to crash.

Command-line test case

C:\Temp>type repro.cpp
#include <iostream>
#include <iomanip>
#include <sstream>
#include <ctime>

int main() {
    using namespace std;

    time_t t = time(nullptr);
    tm currentTime;
    localtime_s(&currentTime, &t);
    currentTime.tm_hour = 25; // set invalid hour

    stringstream ss;
    ss << put_time(&currentTime, "%Y-%m-%d-%H-%M");
}

C:\Temp>cl /EHsc /W4 /WX /std:c++20 .\repro.cpp
Microsoft (R) C/C++ Optimizing Compiler Version 19.36.32546 for x64
Copyright (C) Microsoft Corporation.  All rights reserved.

repro.cpp
Microsoft (R) Incremental Linker Version 19.36.32546.0
Copyright (C) Microsoft Corporation.  All rights reserved.

/out:repro.exe
repro.obj

C:\Temp>.\repro.exe
!!!App Crashed!!!

Expected behavior

The program should NOT crash.

Additional context

The crash is happening because of strftime which validates the data and if something is wrong it leads to crash if no handler is set.

@StephanTLavavej
Copy link
Member

Please note that cplusplus.com is widely regarded as low-quality, versus cppreference.com which is widely preferred. (Compare https://cplusplus.com/reference/iomanip/put_time/ to https://en.cppreference.com/w/cpp/io/manip/put_time .) However, even cppreference is unofficial (a community-edited wiki) and occasionally contains inaccuracies or simplifications. The C++ Working Paper (currently WG21-N4988) is the ultimate source of authority for our implementation work.

In this case, the important questions are "what do we do for out-of-range fields", "do we set failbit or badbit or nothing", and "do time_put versus put_time() have different behavior regarding their error bits".

N4988 [ext.manip]/10 says that put_time() calls time_put::put() with an ostreambuf_iterator and then

if (end.failed())
  str.setstate(ios_base::badbit);

[locale.categories.general]/2 says: "The put() members make no provision for error reporting. (Any failures of the OutputIterator argument can be extracted from the returned iterator.)"

C23 7.29.3.5 "The strftime function"/3 says: "If any of the specified values is outside the normal range, the characters stored are unspecified."

Disclaimer: Despite working on this codebase for 17 years, I still find it difficult to understand iostreams/locales and their Standardese. However, this wording appears to be saying to me:

  • The C Standard says that strftime has unspecified behavior for out-of-range values. This is distinct from undefined behavior. Unspecified means that the function has to do something (and can't crash), but the Standard doesn't say what has to happen, and it can change at any time for any reason. Storing the characters "MEOW" would be valid for unspecified behavior, for example.
  • The C++ wording indicates that the time_put facet can't report errors, only the returned ostreambuf_iterator can communicate errors. put_time() translates iterator .failed() into badbit (which is reasonable - badbit indicates that something went wrong with the actual process of writing). But in this case, nothing is causing the iterator itself to fail to write.
  • I conclude that we shouldn't be reporting errors at all in this case ( 😕 ❔ ❕ 🤔 ) and should instead be doing something like replacing out-of-range values with zero-ish values.

Corrections to my interpretation would be gratefully appreciated.

@StephanTLavavej StephanTLavavej added the bug Something isn't working label Aug 8, 2024
@TheStormN
Copy link
Contributor Author

To me, badbit means that something got broken with the stream itself, which is not the case here.
So, I think we should communicate the error via the returned ostreambuf_iterator that parsing failed, but the stream is still in a good shape, i.e. failbit.

Silently replacing values seem me like undefined behavior. It would be best to not print anything and just report failure.

I'm also open to other interpretations or suggestions.

@StephanTLavavej
Copy link
Member

We need to figure out what the Standards say we must/can do. Judgement calls can be a factor when deciding what to do within the bounds of what the Standards say we can do, but otherwise we are absolutely constrained by the normative text.

What you're describing sounds like inventing behavior that we "think" should happen - which would be appropriate in a library we're designing ourselves, but not in the Standard Library which has been (literally) designed by committee.

Note - I'm not saying you're wrong on this specific point, just that it needs to be justified by citing specific paragraphs in the Standards. I can't find such paragraphs that say that failbit is allowable here.

Silently replacing values seem me like undefined behavior.

Considering the C Standard in isolation, this is incorrect by definition. The Standard decides what's undefined (anything can happen, including crashing, including backwards in time) versus unspecified, versus implementation-defined, versus well-defined. The sentence I cited clearly says that out-of-range values result in unspecified characters being stored. That not only means that silent replacement is something that implementations can decide to do, it also means that implementations aren't allowed to do other things, such as crash or report failure (by returning 0 from strftime).

As I mentioned, I am less certain about the whole interaction between the C and C++ Standards here, but that uncertainty is "on me" - either there should be no actual ambiguity in what they require/allow here, or there's a defect in the library wording (hopefully not!).

Although it isn't a definitive technique, we can also check the observable behavior of the other implementations libstdc++ and libc++ to see what they do - if they are united then it suggests that the Standard says their behavior is correct, if they diverge then that suggests that the Standard allows a range of behavior, or there are implementation bugs.

@TheStormN
Copy link
Contributor Author

TheStormN commented Aug 8, 2024

Yes, sorry, I didn't wanted to imply that we should implement stuff how we "feel" it. I guess I just over trusted the "community-wikies" and I'm quite new into reading the actual standard papers. They are like studying law in university. :) But I guess there is no escaping here. I will try to dig in more tomorrow.

I will also cross-reference what the other major compilers do as you suggested.

@TheStormN
Copy link
Contributor Author

TheStormN commented Aug 9, 2024

@StephanTLavavej So I've spend a few hours researching what should be done here and to be honest it is not clearly defined. For example all functions which are supposed to create time object from string will do failbit if something is wrong in the string and no valid object can be generated. However, this is not explicitly stated for the put functions. The best I found is the generic statement about Formatted output functions [ostream.formatted]:

31.7.6.3 Formatted output functions [ostream.formatted]
    31.7.6.3.1 Common requirements [ostream.formatted.reqmts]
        1 Each formatted output function begins execution by constructing an object of class sentry. If that object
          returns true when converted to a value of type bool, the function endeavors to generate the requested
          output. If the generation fails, then the formatted output function does setstate(ios_base::failbit),
          which can throw an exception. If an exception is thrown during output, then ios_base::badbit is set290
          in *this’s error state. If (exceptions()&badbit) != 0 then the exception is rethrown. Whether or not
          an exception is thrown, the sentry object is destroyed before leaving the formatted output function. If no
          exception is thrown, the result of the formatted output function is *this.
        2 The descriptions of the individual formatted output functions describe how they perform output and do not
          mention the sentry object.
        3 If a formatted output function of a stream os determines padding, it does so as follows. Given a charT
          character sequence seq where charT is the character container type of the stream, if the length of seq is
          less than os.width(), then enough copies of os.fill() are added to this sequence as necessary to pad
          to a width of os.width() characters. If (os.flags() & ios_base::adjustfield) == ios_base::left
          is true, the fill characters are placed after the character sequence; otherwise, they are placed before the
          character sequence.

Where If the generation fails, then the formatted output function does setstate(ios_base::failbit), which can throw an exception. is the only thing which can lead to my initial assumption.

I've also tested what other compilers are doing(GCC 14 and Clang 18). They are pretty much forwarding all the input to strftime which is doing its undefined behavior thing. So for example if you set invalid month like 15, they will just output 16(15+1). In case a name is requested for an invalid value, they just print a single ?. The ios state is always in good() shape. I've prepared an example of that - https://godbolt.org/z/qzMdr3jf9
However I would not say this behavior is much compiler specific. It is C library specific - I guess glibc.

So, how we handle that situation for MS STL std::put_time is again quite open, but ofc we should not crash.

Given all that I've changed my mind a bit how we should handle this stuff here. We can also keep the stream in good() condition, no matter what. However, replicating the exact behavior of the other compilers will make us duplicate more and more of what strftime is doing which seems a bit of an overkill in general.

As a middle ground I propose for every invalid tm struct value to just print single ?, no matter if the representation is requested as numeric(with or without leading zero) or name. This way we can keep the code more simple and not have to make unneeded magics just to fully replicate other compilers undefined behavior.

@TheStormN
Copy link
Contributor Author

Ok, I've pushed changes to my PR to update it to use my new understandings. I'm still open to suggestions for the best approach given the standard is not very explicit.

@TheStormN TheStormN changed the title <iomanip>: std::put_time should set failbit on invalid/out-of-range tm struct, instead of crash <iomanip>: std::put_time should not crash on invalid/out-of-range tm struct values Aug 10, 2024
@StephanTLavavej StephanTLavavej added the fixed Something works now, yay! label Aug 25, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working fixed Something works now, yay!
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants