Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

<iomanip>: std::put_time should copy unknown conversion specifiers instead of crash #4820

Closed
TheStormN opened this issue Jul 8, 2024 · 13 comments · Fixed by #4840
Closed
Labels
bug Something isn't working fixed Something works now, yay!

Comments

@TheStormN
Copy link
Contributor

TheStormN commented Jul 8, 2024

Describe the bug

I know that behind the scenes std::put_time does use strftime which leads to crash if some kind of invalid input is provided. I haven't exactly read the standard, but the cplusplus.com docs on the function says if anything wrong happens, it should set the failbit on the stream and not crash the program. While undefined behavior is allowed for strftime on invalid input, this does not seems to be the case for std::put_time and this is why I'm reporting this as a bug.

Now, some ideas from my research on the topic. According to MS docs the only way to recover from strftime error case is to use _set_invalid_parameter_handler/_set_thread_local_invalid_parameter_handler https://learn.microsoft.com/en-us/cpp/c-runtime-library/reference/set-invalid-parameter-handler-set-thread-local-invalid-parameter-handler?view=msvc-170

The first one seems unfeasible as it will modify the global CRT invalid handler for C functions. The second option about thread local storage might be a better option if set and unset before/after calling std::put_time and using it to properly flag the stream error. This would still trigger an assert in debug mode, but at least it will not crash release versions.

But in general, the best solution for me would be for the UCRT team to modify the existing strftime implementations or create a completely new variant which can handle errors in a more graceful manner, but if this is not possible at least the STL team should try to handle this case somehow by perhaps using the workaround I've proposed.

Command-line test case

C:\Temp>type repro.cpp
#include <iostream>
#include <iomanip>
#include <sstream>
#include <ctime>

int main() {
    std::time_t t = std::time(0);
    std::tm* now = std::localtime(&t);
    std::ostringstream output;
    output << std::put_time(now, "%E%J%P"); // some random invalid formatting string

    if(output.good()) {
        std::cout << "All good\n";
    } else {
        std::cout << "Date time formatting failed due to invalid input\n";
    }
}

C:\Temp>cl /EHsc /W4 /WX /std:c++20 .\repro.cpp
Microsoft (R) C/C++ Optimizing Compiler Version 19.36.32546 for x64
Copyright (C) Microsoft Corporation.  All rights reserved.

repro.cpp
Microsoft (R) Incremental Linker Version 19.36.32546.0
Copyright (C) Microsoft Corporation.  All rights reserved.

/out:repro.exe
repro.obj

C:\Temp>.\repro.exe
!!!App Crashed!!!

Expected behavior

That would be for the program to NOT crash and output "Date time formatting failed due to invalid input".

(Edited: See analysis below.)

STL version

VS 2022 17.6.16

@frederick-vs-ja
Copy link
Contributor

frederick-vs-ja commented Jul 8, 2024

While undefined behavior is allowed for strftime on invalid input

See WG14-N3220 7.29.3.5/2:

The strftime function places characters into the array pointed to by s as controlled by the string pointed to by format. The format shall be a multibyte character sequence, beginning and ending in its initial shift state. The format string consists of zero or more conversion specifiers and ordinary multibyte characters. A conversion specifier consists of a % character, possibly followed by an E or O modifier character (described later), followed by a character that determines the behavior of the conversion specifier. All ordinary multibyte characters (including the terminating null character) are copied unchanged into the array. If copying takes place between objects that overlap, the behavior is undefined. No more than maxsize characters are placed into the array.

IIUC "%E%J%P" is just a valid input for strftime. The %J part is just not a conversion specifier, and thus should be copied to the output as-is.

but the cplusplus.com docs on the function says if anything wrong happens, it should set the failbit on the stream and not crash the program

The standard wording for put_time is specified in [ext.manip]/10 and [locale.time.put.members]/1. Quoting the latter:

Effects: The first form steps through the sequence from pattern to pat_end, identifying characters that are part of a format sequence. Each character that is not part of a format sequence is written to s immediately, and each format sequence, as it is identified, results in a call to do_put; thus, format elements and other characters are interleaved in the output in the order in which they appear in the pattern. Format sequences are identified by converting each character c to a char value as if by ct.narrow(c, 0), where ct is a reference to ctype<charT> obtained from str.getloc(). The first character of each sequence is equal to '%', followed by an optional modifier character mod and a format specifier character spec as defined for the function strftime. If no modifier character is present, mod is zero. For each valid format sequence identified, calls do_put(s, str, fill, t, spec, mod).

Ditto, there's nothing wrong with put_time. As a result, the program should neither crash nor set failbit. (But you posted the program with a wrong parameter order - the output operand should be std::put_time(now, "%E%J%P").)

I believe it's UCRT's strftime that is buggy, and we can still rely on it if it gets fixed. However, a workaround would be needed if we want put_time to behave correctly as soon as possible.


Personal thought on workaround:

Don't call strftime/_Strftime in UCRT at this moment. Write a "corrected strftime" instead.

@TheStormN
Copy link
Contributor Author

@frederick-vs-ja Thanks a lot for the input. Yes my example got a bit wrong while I was trying to do a minimal version of it. I've edited my post to correct it.

Personal thought on workaround:

Don't call strftime/_Strftime in UCRT at this moment. Write a "corrected strftime" instead.

Unfortunately this would not solve all the cases. For example std::format does also call std::put_time which calls strftime. I guess there might be more similar cases. Going to chase a huge codebase for all variants which lead to strftime is a bit an overkill.

From past experiences I think requesting any kind of behavioral changes to the UCRT usually gets ignored. Perhaps the best approach here would be for the STL team to create their own version and not use the UCRT provided one. This way not only they will avoid crashes but will also better conform to the C++ standard.

@frederick-vs-ja
Copy link
Contributor

Going to chase a huge codebase for all variants which lead to strftime is a bit an overkill.

Fortunately, MSVC STL is not so that huge. I think this is the only concentrated relavent entry to UCRT. std::format eventually enters here when formatting chrono types.

_Count = _Strftime(&_Str[0], _Str.size(), _Fmt, _Pt, _Tnames._Getptr());

AFAIK _Strftime doesn't call strftime, but both of them call _Wcsftime_l which raises non-conforming errors.

Perhaps it's still doable to add a _Strftime_v2 function which behaves as corrected _Strftime/strftime.

@TheStormN
Copy link
Contributor Author

Ah, as a workaround you meant the STL codebase. I thought you implied the application code on which I'm working on.

Yep, I think this is doable and not too complicated. Also it should not affect binary compatibility.

@StephanTLavavej Any thoughts on this one?

@StephanTLavavej
Copy link
Member

StephanTLavavej commented Jul 10, 2024

@frederick-vs-ja I think you missed this paragraph, WG14-N3220 7.29.3.5/6:

If a conversion specifier is not one of the ones previously specified, the behavior is undefined.

I think that the second part of your analysis is correct - while C says that unrecognized conversion specifiers are UB, C++ says that unrecognized conversion specifiers are copied unchanged ([locale.time.put.members]/1 "Each character that is not part of a format sequence is written to s immediately").

It seems that no UCRT changes are necessary here - instead we should alter the STL's wrapper layer to recognize unknown conversion specifiers and ensure that they're copied unchanged (after 5 seconds of thought, escaping them with %% would be minimally intrusive but I don't know how easy/viable that is).

@StephanTLavavej StephanTLavavej changed the title <iomanip>: std::put_time should set failbit instead of crash on invalid input <iomanip>: std::put_time should copy unknown conversion specifiers instead of crash Jul 10, 2024
@StephanTLavavej StephanTLavavej added the bug Something isn't working label Jul 10, 2024
@TheStormN
Copy link
Contributor Author

TheStormN commented Jul 10, 2024

@StephanTLavavej Will that also cover the cases where some more broken string is passed? For example, lets say I just forward user input to the std::put_time and in all cases I would not expect a crash(which strftime() will do) but some error which can be handled. In general I think in C++ no matter what is the format string, the app should not crash.

P.S. I think a much better solution would be for the UCRT team to alter the UB of strftime() to be more C++ friendly. :)

@StephanTLavavej
Copy link
Member

Can you provide a specific example of a "broken string" that does not involve an unknown conversion specifier?

@TheStormN
Copy link
Contributor Author

TheStormN commented Jul 11, 2024

Hey, for example "%Y-%-m-%-d" but I guess that will also be taken care of by the escaping? Anyway, whichever workaround you choose, still the checks which strftime is already doing will have to be replicated. I think if the UCRT team alters the UB to conform to C++ we will have a big win here.

@StephanTLavavej
Copy link
Member

Yeah, %- is an unknown conversion specifier, so I think that's the same issue.

Getting UCRT changes is extremely difficult - I think we should pursue STL changes here.

@TheStormN
Copy link
Contributor Author

TheStormN commented Jul 11, 2024

In that case I would recommend to not use strftime at all and create a complete STL rewrite.

I guess you should have access to UCRT source code, so coping most stuff from there would make this task much easier.

P.S. This will also solve the issues with std::format crashes when used with Chrono types.

@TheStormN
Copy link
Contributor Author

TheStormN commented Jul 14, 2024

@StephanTLavavej I've submitted a PR with simple fix: #4840

However, by looking at the code there I see significant effort to avoid _Strftime crashing. I still do think that if UCRT can't be changed, an new STL version of that function should be created which will not even require hacks to expand the string if more space is needed. Also strftime on Windows is really slow as it is doing char->wchar->char conversion and because of the current way std::put_time is using it by literally invoking it for each format specifier.

So is there any interest in doing STL native version of time_put to avoid calling _Strftime is general? This will also greatly benefit chrono related formatting.

@StephanTLavavej
Copy link
Member

Improved performance and resistance to UCRT/STL specification divergence would be worthwhile, but that has to be balanced against the complexity and maintenance burden of such a change. Our immediate reactions were "rarrrrrgh!" (@barcharcraz) and "uhhhhhhhh?" (@StephanTLavavej) 😹.

We wouldn't want to encourage anyone to work on that, but we won't reject the idea out of hand.

@TheStormN
Copy link
Contributor Author

Well, the fmt library on which the std::format implementation is based already did it and as you already have an agreement with the author, we can take some code from there. :))

@StephanTLavavej StephanTLavavej added the fixed Something works now, yay! label Aug 8, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working fixed Something works now, yay!
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants