Add corrupted files in bad_data #48
bad_data/README.md
Outdated
@@ -22,3 +22,7 @@ These are files used for reproducing various bugs that have been reported.

* PARQUET-1481.parquet: tests a case where a schema Thrift value has been
  corrupted
* arrow_issue_41321.parquet: test case of https://github.com/apache/arrow/issues/41321
Maybe try to unify the file naming in this directory? We already have PARQUET-1481.parquet (a JIRA reference), so perhaps something like ARROW-GH-41321.parquet? (related: #57)
Let me edit it 🤔
done
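For reference, the naming convention discussed in this thread could be checked with a small script. This is only a minimal sketch: the pattern below (an uppercase tracker key such as `PARQUET` or `ARROW-GH`, followed by an issue number) is an assumption inferred from the two names mentioned above, and `parse_bad_data_name` is a hypothetical helper, not something that exists in the repository.

```python
import re

# Assumed convention (not stated by the project): "<TRACKER>-<number>.parquet",
# where TRACKER is a JIRA project key (e.g. PARQUET) or a GitHub-style
# prefix ending in "-GH" (e.g. ARROW-GH).
NAME_PATTERN = re.compile(r"^(?P<tracker>[A-Z]+(?:-GH)?)-(?P<number>\d+)\.parquet$")


def parse_bad_data_name(filename):
    """Return (tracker, issue_number) if the name fits the assumed pattern, else None."""
    match = NAME_PATTERN.match(filename)
    if match is None:
        return None
    return match.group("tracker"), int(match.group("number"))


if __name__ == "__main__":
    for name in ("PARQUET-1481.parquet",
                 "ARROW-GH-41321.parquet",
                 "arrow_issue_41321.parquet"):
        print(name, "->", parse_bad_data_name(name))
```

Under this assumed pattern, the original `arrow_issue_41321.parquet` name is rejected while the two uppercase names are accepted.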
I will merge this in 1 day if there are no objections. 7.1k is a bit large here, but not too large, since generating a file like this is also hard.
Merged! Thanks all!
Those 2 files triggered libparquet C++ issues apache/arrow#41317 and apache/arrow#41321. They were generated through a local run of oss-fuzz on synthetic test data from the GDAL regression test suite, and can be licensed under Apache-2.