CLI toml-test decoder #125

Closed
moorereason opened this issue Dec 22, 2021 · 11 comments
@moorereason

Is your feature request related to a problem? Please describe.

I've been doing differential fuzzing against a few TOML decoders to see where they may handle inputs differently. I would like to add tomlplusplus into the mix.

Describe the solution you'd like

Can tomlplusplus provide a CLI toml-test decoder? The decoder takes a given TOML input via STDIN and outputs annotated JSON to STDOUT. Details are in the toml-test README.
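
To make that interface concrete, here is a rough, hypothetical sketch of such a decoder built on toml++ and nlohmann/json (the header path, the use of key.str(), and the exact API calls are assumptions and not guaranteed to match any shipped tool; date/time types and round-trip float formatting are omitted for brevity):

// Hypothetical toml-test decoder sketch: read TOML from STDIN, print toml-test's
// tagged JSON on STDOUT, exit 0 on success and nonzero if the input was rejected.
// Assumes toml++ v3 (header name varies by version) and nlohmann/json.
#include <iostream>
#include <sstream>
#include <string>
#include <toml++/toml.h>
#include <nlohmann/json.hpp>

using json = nlohmann::json;

static json to_tagged_json(const toml::node& node)
{
    switch (node.type())
    {
        case toml::node_type::table:
        {
            json obj = json::object();
            for (auto&& [key, val] : *node.as_table()) // v3 table keys expose str()
                obj[std::string{ key.str() }] = to_tagged_json(val);
            return obj;
        }
        case toml::node_type::array:
        {
            json arr = json::array();
            for (auto&& val : *node.as_array())
                arr.push_back(to_tagged_json(val));
            return arr;
        }
        case toml::node_type::string:
            return { { "type", "string" }, { "value", node.as_string()->get() } };
        case toml::node_type::integer:
            return { { "type", "integer" }, { "value", std::to_string(node.as_integer()->get()) } };
        case toml::node_type::floating_point:
        {
            std::ostringstream ss; // a real decoder needs round-trip precision here
            ss << node.as_floating_point()->get();
            return { { "type", "float" }, { "value", ss.str() } };
        }
        case toml::node_type::boolean:
            return { { "type", "bool" }, { "value", node.as_boolean()->get() ? "true" : "false" } };
        default:
            // date, time and date-time variants omitted in this sketch
            return { { "type", "unsupported" }, { "value", "" } };
    }
}

int main()
{
    try
    {
        const toml::table tbl = toml::parse(std::cin);
        std::cout << to_tagged_json(tbl).dump(4) << "\n";
        return 0; // toml-test treats exit code 0 as "input accepted"
    }
    catch (const toml::parse_error& err)
    {
        std::cerr << err.description() << "\n";
        return 1; // nonzero exit signals the input was rejected
    }
}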

Additional context

I've filed several bug reports in other projects and submitted new test cases to toml-test based upon this fuzzing.

Thanks for considering.

moorereason added the feature label on Dec 22, 2021
@marzer (Owner) commented Dec 22, 2021

There's no toml-test decoder for toml++ currently, though my test suite consumes the test inputs from toml-test and generates C++ unit tests from them, e.g. conformance_burntsushi_valid.cpp

I update the conformance tests every time I do any work on the library so it's generally up-to-date.

"Why did you bother writing a C++ code generator and not just provide an encoder/decoder?", I hear you ask.

It made sense at the time. Third-party libs in C++ can be very annoying, with or without CMake and friends, so in the early days of toml++ this approach seemed easier. I am going to do another batch of toml++ work at some point in January, so if I get time I'll revisit this.

@moorereason (Author)

I'm not advocating that you stop generating unit tests. Every(?) TOML package I've looked at generates local unit tests from the suite. However, I've only found 5 packages that have a toml-test decoder implementation.

I find a toml-test decoder useful for differential fuzzing and submitting bug reports. For example, I'm not a C++ programmer, but I can use a Go fuzzer to find and report bugs in a C++ decoder such as toml11 (e.g. ToruNiina/toml11#180 & ToruNiina/toml11#181). No C++ skills required! 😄

@marzer (Owner) commented Dec 22, 2021

I'm not advocating that you stop generating unit tests.

Oh, certainly. I didn't think you were. Apologies if I gave you that impression. I was just explaining why this repo does the seemingly harder thing first, instead of the probably easier thing, heh.

I do see the appeal. There are even a few tests I have to manually skip the code generation for because various C++ compilers don't like the in-code equivalent (e.g. GCC doesn't like whitespace after a trailing \ in a string literal, and there are a few tests in toml-test that have those). I'll get around to it eventually, hopefully as part of the next release.

marzer added the v3 label on Dec 22, 2021
marzer added a commit that referenced this issue Jan 3, 2022
…ML to streams

also:
- added `date_time` converting constructors from `date` and `time`
- added `is_key<>` and `is_key_or_convertible<>` metafunctions
- exposed `TOML_NAMESPACE_START` and `TOML_NAMESPACE_END` macros to help with ADL specialization scenarios
- added encoder and decoder for `toml-test` (closes #125)
marzer added the implemented in v3 label and removed the v3 label on Jan 3, 2022
@marzer (Owner) commented Jan 3, 2022

@moorereason I've implemented this in the v3 branch - see v3/toml-test for setup info.

Every(?) TOML package I've looked at generates local unit tests from the suite.

Just re-read this. How do you figure that? It certainly isn't true for the C++ TOML ecosystem. In fact toml++ is the only one that chose that route, as far as I can tell - the others (the still-maintained ones, at least) appear to have done the sensible thing in the first place and built toml-test decoders 😄

In any case, toml++ has now joined the club and provided both an encoder and decoder for toml-test. Let me know how you get on with it.

@moorereason (Author)

Sweet! I'm trying to get through my existing corpus, and I'm getting a UTF-8 error on a case.

Given an input of `0="\u0800"`, the tt_decoder returns:

terminate called after throwing an instance of 'nlohmann::detail::type_error'
  what():  [json.exception.type_error.316] invalid UTF-8 byte at index 1: 0x80

Not sure what to do with that.

@marzer (Owner) commented Jan 4, 2022

O_o

Are you using any custom tests that aren't currently a part of toml-test? I built it from master today and didn't experience that.

If so, can you share the toml/json files?

@moorereason (Author)

Are you using any custom tests that aren't currently a part of toml-test?

Yes, but I'm doing differential fuzzing against other toml-test decoder implementations via STDIN and comparing the results.

The TOML document in question is 10 characters long:

0="\u0800"

I've tested that input with 6 other decoders (in Go, C99, C++, Fortran, and Dart), and they all return valid and equal results.

@marzer (Owner) commented Jan 4, 2022

Oh, right, fuzzing. OK, well, I'll do some digging and see what's going on.

@marzer (Owner) commented Jan 4, 2022

Ah, I see, it's the JSON library I'm using. Doesn't like the JSON generated from that snippet. Guess I need to RTFM to figure out why 😅
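
For anyone hitting the same thing: nlohmann::json's dump() validates UTF-8 by default and throws type_error.316 when a string value contains bytes that don't form a valid sequence. A minimal standalone reproduction (independent of toml++) might look like:

// Standalone sketch reproducing the error above: dump() rejects strings that
// are not valid UTF-8 when using the default (strict) error handler.
#include <iostream>
#include <string>
#include <nlohmann/json.hpp>

int main()
{
    nlohmann::json j;
    j["0"] = std::string{ "\x80" }; // a lone continuation byte is not valid UTF-8
    try
    {
        std::cout << j.dump() << "\n";
    }
    catch (const nlohmann::json::type_error& e)
    {
        std::cerr << e.what() << "\n"; // "[json.exception.type_error.316] invalid UTF-8 byte ..."
    }
    return 0;
}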

marzer added a commit that referenced this issue Jan 4, 2022
also:
- fixed extended-precision fractional times causing parse error instead of truncating per the spec (closes #127)
- fixed some non-spec vertical whitespace being accepted as line breaks (closes #128)
- added `format_flags::allow_unicode_strings`
@marzer (Owner) commented Jan 4, 2022

@moorereason That's fixed now in the v3 branch.

@marzer (Owner) commented Jan 4, 2022

Oh, since you're adding new upstream tests to toml-test, you might like to know that the Unicode-related crash in the JSON lib was actually caused by my lib converting the Unicode scalar escape sequence \u0800 into the wrong UTF-8 byte sequence.

I had some unit tests in place for those conversions but they weren't broad enough. I've added a bunch more now:

SECTION("github/issues/125") // https://github.com/marzer/tomlplusplus/issues/125
{
parse_expected_value(FILE_LINE_ARGS, R"("\u0800")"sv, "\xE0\xA0\x80"sv);
parse_expected_value(FILE_LINE_ARGS, R"("\u7840")"sv, "\xE7\xA1\x80"sv);
parse_expected_value(FILE_LINE_ARGS, R"("\uAA23")"sv, "\xEA\xA8\xA3"sv);
parse_expected_value(FILE_LINE_ARGS, R"("\uA928")"sv, "\xEA\xA4\xA8"sv);
parse_expected_value(FILE_LINE_ARGS, R"("\u9CBF")"sv, "\xE9\xB2\xBF"sv);
parse_expected_value(FILE_LINE_ARGS, R"("\u2247")"sv, "\xE2\x89\x87"sv);
parse_expected_value(FILE_LINE_ARGS, R"("\u13D9")"sv, "\xE1\x8F\x99"sv);
parse_expected_value(FILE_LINE_ARGS, R"("\u69FC")"sv, "\xE6\xA7\xBC"sv);
parse_expected_value(FILE_LINE_ARGS, R"("\u8DE5")"sv, "\xE8\xB7\xA5"sv);
parse_expected_value(FILE_LINE_ARGS, R"("\u699C")"sv, "\xE6\xA6\x9C"sv);
parse_expected_value(FILE_LINE_ARGS, R"("\u8CD4")"sv, "\xE8\xB3\x94"sv);
parse_expected_value(FILE_LINE_ARGS, R"("\u4ED4")"sv, "\xE4\xBB\x94"sv);
parse_expected_value(FILE_LINE_ARGS, R"("\u2597")"sv, "\xE2\x96\x97"sv);
}
Might be worth adding a similar set to toml-test (I don't know if it would fit with the test format, though).
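
As background on why the failure showed up as a stray 0x80 continuation byte: U+0800 is the lowest code point that requires a three-byte UTF-8 sequence, so an encoder bug right at that boundary is easy to hit. A standalone sketch of the scalar-to-UTF-8 conversion (illustrative only, not toml++'s internal code; surrogates and out-of-range values are not checked) looks like:

// Illustrative Unicode-scalar-to-UTF-8 encoder; U+0800 encodes as E0 A0 80.
#include <cassert>
#include <string>

std::string encode_utf8(char32_t cp)
{
    std::string out;
    if (cp <= 0x7F)
        out += static_cast<char>(cp);                 // 1 byte:  0xxxxxxx
    else if (cp <= 0x7FF)
    {
        out += static_cast<char>(0xC0 | (cp >> 6));   // 2 bytes: 110xxxxx 10xxxxxx
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
    else if (cp <= 0xFFFF)
    {
        out += static_cast<char>(0xE0 | (cp >> 12));  // 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
    else
    {
        out += static_cast<char>(0xF0 | (cp >> 18));  // 4 bytes: 11110xxx + three continuations
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
    return out;
}

int main()
{
    assert(encode_utf8(U'\u0800') == "\xE0\xA0\x80"); // matches the first test case above
    return 0;
}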

marzer closed this as completed in f3bd22b on Jan 11, 2022