CLI toml-test decoder #125

Closed
moorereason opened this issue Dec 22, 2021 · 11 comments
@moorereason

Is your feature request related to a problem? Please describe.

I've been doing differential fuzzing against a few TOML decoders to see where they may handle inputs differently. I would like to add tomlplusplus into the mix.

Describe the solution you'd like

Can tomlplusplus provide a CLI toml-test decoder? The decoder takes a given TOML input via STDIN and outputs annotated JSON to STDOUT. Details are in the toml-test README.
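
To make that interface concrete, here is a rough, hypothetical sketch of such a decoder built on toml++ and nlohmann/json (the header path, the use of key.str(), and the exact API calls are assumptions and not guaranteed to match any shipped tool; date/time types and round-trip float formatting are omitted for brevity):

// Hypothetical toml-test decoder sketch: read TOML from STDIN, print toml-test's
// tagged JSON on STDOUT, exit 0 on success and nonzero if the input was rejected.
// Assumes toml++ v3 (header name varies by version) and nlohmann/json.
#include <iostream>
#include <sstream>
#include <string>
#include <toml++/toml.h>
#include <nlohmann/json.hpp>

using json = nlohmann::json;

static json to_tagged_json(const toml::node& node)
{
    switch (node.type())
    {
        case toml::node_type::table:
        {
            json obj = json::object();
            for (auto&& [key, val] : *node.as_table()) // v3 table keys expose str()
                obj[std::string{ key.str() }] = to_tagged_json(val);
            return obj;
        }
        case toml::node_type::array:
        {
            json arr = json::array();
            for (auto&& val : *node.as_array())
                arr.push_back(to_tagged_json(val));
            return arr;
        }
        case toml::node_type::string:
            return { { "type", "string" }, { "value", node.as_string()->get() } };
        case toml::node_type::integer:
            return { { "type", "integer" }, { "value", std::to_string(node.as_integer()->get()) } };
        case toml::node_type::floating_point:
        {
            std::ostringstream ss; // a real decoder needs round-trip precision here
            ss << node.as_floating_point()->get();
            return { { "type", "float" }, { "value", ss.str() } };
        }
        case toml::node_type::boolean:
            return { { "type", "bool" }, { "value", node.as_boolean()->get() ? "true" : "false" } };
        default:
            // date, time and date-time variants omitted in this sketch
            return { { "type", "unsupported" }, { "value", "" } };
    }
}

int main()
{
    try
    {
        const toml::table tbl = toml::parse(std::cin);
        std::cout << to_tagged_json(tbl).dump(4) << "\n";
        return 0; // toml-test treats exit code 0 as "input accepted"
    }
    catch (const toml::parse_error& err)
    {
        std::cerr << err.description() << "\n";
        return 1; // nonzero exit signals the input was rejected
    }
}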

Additional context

I've filed several bug reports in other projects and submitted new test cases to toml-test based upon this fuzzing.

Thanks for considering.

moorereason added the feature label on Dec 22, 2021
@marzer (Owner) commented Dec 22, 2021

There's no toml-test decoder for toml++ currently, though my test suite consumes the test inputs from toml-test and generates C++ unit tests from them, e.g. conformance_burntsushi_valid.cpp

I update the conformance tests every time I do any work on the library so it's generally up-to-date.

"Why did you bother writing a C++ code generator and not just provide an encoder/decoder?", I hear you ask.

It made sense at the time. Third-party libs in C++ can be very annoying, with or without CMake and friends, so in the early days of toml++ this approach seemed easier. I am going to do another batch of toml++ work at some point in January, so if I get time I'll revisit this.

@moorereason (Author)

I'm not advocating that you stop generating unit tests. Every(?) TOML package I've looked at generates local unit tests from the suite. However, I've only found 5 packages that have a toml-test decoder implementation.

I find a toml-test decoder useful for differential fuzzing and submitting bug reports. For example, I'm not a C++ programmer, but I can use a Go fuzzer to find and report bugs in a C++ decoder such as toml11 (e.g. ToruNiina/toml11#180 & ToruNiina/toml11#181). No C++ skills required! 😄

@marzer (Owner) commented Dec 22, 2021

I'm not advocating that you stop generating unit tests.

Oh, certainly. I didn't think you were. Apologies if I gave you that impression. I was just explaining why this repo does the seemingly harder thing first, instead of the probably easier thing, heh.

I do see the appeal. There are even a few tests I have to manually skip the code generation for because various C++ compilers don't like the in-code equivalent (e.g. GCC doesn't like whitespace after a trailing \ in a string literal, and there are a few tests in toml-test that have those). I'll get around to it eventually, hopefully as part of the next release.

marzer added the v3 label on Dec 22, 2021
marzer added a commit that referenced this issue Jan 3, 2022
…ML to streams

also:
- added `date_time` converting constructors from `date` and `time`
- added `is_key<>` and `is_key_or_convertible<>` metafunctions
- exposed `TOML_NAMESPACE_START` and `TOML_NAMESPACE_END` macros to help with ADL specialization scenarios
- added encoder and decoder for `toml-test` (closes #125)
marzer added the implemented in v3 label and removed the v3 label on Jan 3, 2022
@marzer (Owner) commented Jan 3, 2022

@moorereason I've implemented this in the v3 branch - see v3/toml-test for setup info.

Every(?) TOML package I've looked at generates local unit tests from the suite.

Just re-read this. How do you figure that? It certainly isn't true for the C++ TOML ecosystem. In fact toml++ is the only one that chose that route, as far as I can tell - the others (the still-maintained ones, at least) appear to have done the sensible thing in the first place and built toml-test decoders 😄

In any case, toml++ has now joined the club and provided both an encoder and decoder for toml-test. Let me know how you get on with it.

@moorereason (Author)

Sweet! I'm trying to get through my existing corpus, and I'm getting a UTF-8 error on a case.

Given an input of `0="\u0800"`, the tt_decoder returns:

terminate called after throwing an instance of 'nlohmann::detail::type_error'
  what():  [json.exception.type_error.316] invalid UTF-8 byte at index 1: 0x80

Not sure what to do with that.

@marzer (Owner) commented Jan 4, 2022

O_o

Are you using any custom tests that aren't currently a part of toml-test? I built it from master today and didn't experience that.

If so, can you share the toml/json files?

@moorereason (Author)

Are you using any custom tests that aren't currently a part of toml-test?

Yes, but I'm doing differential fuzzing against other toml-test decoder implementations via STDIN and comparing the results.

The TOML document in question is 10 characters long:

0="\u0800"

I've tested that input with 6 other decoders (in Go, C99, C++, Fortran, and Dart), and they all return valid and equal results.

@marzer (Owner) commented Jan 4, 2022

Oh, right, fuzzing. OK, well, I'll do some digging and see what's going on.

@marzer (Owner) commented Jan 4, 2022

Ah, I see, it's the JSON library I'm using. Doesn't like the JSON generated from that snippet. Guess I need to RTFM to figure out why 😅
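
For anyone hitting the same thing: nlohmann::json's dump() validates UTF-8 by default and throws type_error.316 when a string value contains bytes that don't form a valid sequence. A minimal standalone reproduction (independent of toml++) might look like:

// Standalone sketch reproducing the error above: dump() rejects strings that
// are not valid UTF-8 when using the default (strict) error handler.
#include <iostream>
#include <string>
#include <nlohmann/json.hpp>

int main()
{
    nlohmann::json j;
    j["0"] = std::string{ "\x80" }; // a lone continuation byte is not valid UTF-8
    try
    {
        std::cout << j.dump() << "\n";
    }
    catch (const nlohmann::json::type_error& e)
    {
        std::cerr << e.what() << "\n"; // "[json.exception.type_error.316] invalid UTF-8 byte ..."
    }
    return 0;
}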

marzer added a commit that referenced this issue Jan 4, 2022
also:
- fixed extended-precision fractional times causing parse error instead of truncating per the spec (closes #127)
- fixed some non-spec vertical whitespace being accepted as line breaks (closes #128)
- added `format_flags::allow_unicode_strings`
@marzer (Owner) commented Jan 4, 2022

@moorereason That's fixed now in the v3 branch.

@marzer (Owner) commented Jan 4, 2022

Oh, since you're adding new upstream tests to toml-test, you might like to know that the Unicode-related crash in the JSON lib was actually caused by my lib converting the Unicode scalar escape sequence \u0800 into the wrong UTF-8 byte sequence.

I had some unit tests in place for those conversions but they weren't broad enough. I've added a bunch more now:

SECTION("github/issues/125") // https://github.com/marzer/tomlplusplus/issues/125
{
parse_expected_value(FILE_LINE_ARGS, R"("\u0800")"sv, "\xE0\xA0\x80"sv);
parse_expected_value(FILE_LINE_ARGS, R"("\u7840")"sv, "\xE7\xA1\x80"sv);
parse_expected_value(FILE_LINE_ARGS, R"("\uAA23")"sv, "\xEA\xA8\xA3"sv);
parse_expected_value(FILE_LINE_ARGS, R"("\uA928")"sv, "\xEA\xA4\xA8"sv);
parse_expected_value(FILE_LINE_ARGS, R"("\u9CBF")"sv, "\xE9\xB2\xBF"sv);
parse_expected_value(FILE_LINE_ARGS, R"("\u2247")"sv, "\xE2\x89\x87"sv);
parse_expected_value(FILE_LINE_ARGS, R"("\u13D9")"sv, "\xE1\x8F\x99"sv);
parse_expected_value(FILE_LINE_ARGS, R"("\u69FC")"sv, "\xE6\xA7\xBC"sv);
parse_expected_value(FILE_LINE_ARGS, R"("\u8DE5")"sv, "\xE8\xB7\xA5"sv);
parse_expected_value(FILE_LINE_ARGS, R"("\u699C")"sv, "\xE6\xA6\x9C"sv);
parse_expected_value(FILE_LINE_ARGS, R"("\u8CD4")"sv, "\xE8\xB3\x94"sv);
parse_expected_value(FILE_LINE_ARGS, R"("\u4ED4")"sv, "\xE4\xBB\x94"sv);
parse_expected_value(FILE_LINE_ARGS, R"("\u2597")"sv, "\xE2\x96\x97"sv);
}
Might be worth adding a similar set to toml-test (I don't know if it would fit with the test format, though).
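
As background on why the failure showed up as a stray 0x80 continuation byte: U+0800 is the lowest code point that requires a three-byte UTF-8 sequence, so an encoder bug right at that boundary is easy to hit. A standalone sketch of the scalar-to-UTF-8 conversion (illustrative only, not toml++'s internal code; surrogates and out-of-range values are not checked) looks like:

// Illustrative Unicode-scalar-to-UTF-8 encoder; U+0800 encodes as E0 A0 80.
#include <cassert>
#include <string>

std::string encode_utf8(char32_t cp)
{
    std::string out;
    if (cp <= 0x7F)
        out += static_cast<char>(cp);                 // 1 byte:  0xxxxxxx
    else if (cp <= 0x7FF)
    {
        out += static_cast<char>(0xC0 | (cp >> 6));   // 2 bytes: 110xxxxx 10xxxxxx
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
    else if (cp <= 0xFFFF)
    {
        out += static_cast<char>(0xE0 | (cp >> 12));  // 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
    else
    {
        out += static_cast<char>(0xF0 | (cp >> 18));  // 4 bytes: 11110xxx + three continuations
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
    return out;
}

int main()
{
    assert(encode_utf8(U'\u0800') == "\xE0\xA0\x80"); // matches the first test case above
    return 0;
}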

marzer closed this as completed in f3bd22b on Jan 11, 2022