Add corrupted files in bad_data #48

Merged 4 commits on Aug 15, 2024

Contributor

@rouault rouault commented May 8, 2024

Those 2 files triggered the libparquet C++ issues apache/arrow#41317 and apache/arrow#41321. They were generated through a local run of oss-fuzz on synthetic test data from the GDAL regression test suite, and can be licensed under Apache-2.
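
For readers who want a quick sanity check on files like these, the following is a minimal sketch of a framing check, assuming the standard unencrypted Parquet layout (a leading `PAR1` magic, then data, then footer metadata, a 4-byte little-endian footer length, and a trailing `PAR1` magic). The function name `check_parquet_framing` is hypothetical and not part of any library. It is not how the linked issues were found, and a fuzzed file may well pass this check and still fail deeper in the reader.

```python
import struct

MAGIC = b"PAR1"  # magic bytes of an unencrypted Parquet file


def check_parquet_framing(data: bytes) -> str:
    """Return 'ok' or a short description of the first framing problem found.

    Hypothetical helper for illustration: it inspects only the outer framing
    and never parses the Thrift footer or any page data.
    """
    # Smallest possible file: leading magic + 4-byte footer length + trailing magic.
    if len(data) < 12:
        return "too short"
    if data[:4] != MAGIC:
        return "bad leading magic"
    if data[-4:] != MAGIC:
        return "bad trailing magic"
    # The 4 bytes before the trailing magic hold the footer length (little-endian).
    (footer_len,) = struct.unpack("<I", data[-8:-4])
    # The footer must fit between the leading magic and the length field.
    if footer_len > len(data) - 12:
        return "footer length exceeds file size"
    return "ok"
```

A caller could read a file with `open(path, "rb").read()` and pass the bytes in. A result of `"ok"` only means the outer framing is plausible, so a real reader still has to validate the metadata and pages.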

@mapleFU mapleFU requested review from pitrou and wgtmac August 13, 2024 10:25
@mapleFU mapleFU changed the title Add 2 corrupted files Add corrupted files in bad_data Aug 13, 2024

@alamb alamb left a comment

Makes sense to me. Thank you @rouault and @mapleFU

I think we may need a parquet reviewer to approve / merge this PR as well

@@ -22,3 +22,7 @@ These are files used for reproducing various bugs that have been reported.

* PARQUET-1481.parquet: tests a case where a schema Thrift value has been
corrupted
* arrow_issue_41321.parquet: test case of https://github.com/apache/arrow/issues/41321
Member

Maybe try to unify the file naming in this directory? We already have PARQUET-1481.parquet (a JIRA reference) so perhaps something like ARROW-GH-41321.parquet? (related: #57)

Member

Let me edit it 🤔

Member

done

Member

mapleFU commented Aug 14, 2024

I will merge this in one day if there are no objections. 7.1k is a bit large here, but not too large, since generating a file like this is also hard.

@mapleFU mapleFU requested a review from pitrou August 15, 2024 02:55
@mapleFU mapleFU merged commit 89ec47e into apache:master Aug 15, 2024
Member

mapleFU commented Aug 15, 2024

Merged! Thanks all!

5 participants