Ensure support for uncompressed ARC files #101

Closed
anjackson opened this issue Mar 18, 2014 · 9 comments · Fixed by iipc/webarchive-commons#14

Comments

@anjackson
Member

Following this report on the mailing list, it seems OpenWayback does not cope with uncompressed ARC files at present.

Therefore, we need another unit test like this one (in iipc/webarchive-commons):

https://github.com/iipc/webarchive-commons/blob/master/src/test/java/org/archive/io/warc/WARCReaderFactoryTest.java

...but for ARCs.

@anjackson anjackson added this to the 2.0.0.BETA.2 Release milestone Mar 18, 2014
@anjackson anjackson added the bug label Mar 18, 2014
@egh
Contributor

egh commented Mar 18, 2014

I added a test for this in a local branch, and it seems to work fine out of the box. I'm not sure what the issue is.

@egh
Contributor

egh commented Mar 18, 2014

Was able to duplicate error using the blackbox example ARC, uncompressed, with the default WAR build. Let me see if I can write a test.

@egh
Contributor

egh commented Mar 18, 2014

Iterating over uncompressed ARCs works. It's probably something to do with seeking.

@hhockx hhockx mentioned this issue Mar 19, 2014
@csrster
Contributor

csrster commented Mar 19, 2014

I have a possible fix based on a change in ResourceFactory.getResource(). Should I contribute it through the fork/branch/pull-request process?

You can reproduce the bug in a unit test like this:

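    public void testGetResource() throws Exception {
        // Path to an uncompressed ARC file.
        String file = "myfile.arc";
        // Placeholder as posted: substitute the real byte offset of a record in that file.
        Long offset = 123xxx765L;
        ArcResource resource = (ArcResource) ResourceFactory.getResource(file, offset);
        // Fails when the parser has read past the end of the record.
        assertTrue(resource.getArcRecord().getPosition() < resource.getRecordLength());
    }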

The bug manifests itself as reading beyond the end of the ARC record while parsing the HTTP headers.
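To illustrate the failure mode described here, the following is a minimal, self-contained sketch in plain Java. It is not the OpenWayback or webarchive-commons code. It assumes the record's offset and length are already known, and the names `BoundedRecordDemo`, `BoundedInputStream`, and `readRecord` are hypothetical. The idea is that in an uncompressed file nothing marks where a record ends, so a reader that seeks to an offset has to enforce the declared length itself, otherwise header parsing can run into the next record.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; not part of OpenWayback or webarchive-commons.
final class BoundedRecordDemo {

    /** Exposes at most `limit` bytes of the wrapped stream, then reports end of stream. */
    static final class BoundedInputStream extends InputStream {
        private final InputStream in;
        private long remaining;

        BoundedInputStream(InputStream in, long limit) {
            this.in = in;
            this.remaining = limit;
        }

        @Override
        public int read() throws IOException {
            if (remaining <= 0) {
                return -1; // record boundary reached
            }
            int b = in.read();
            if (b >= 0) {
                remaining--;
            }
            return b;
        }

        @Override
        public int read(byte[] buf, int off, int len) throws IOException {
            if (len == 0) {
                return 0;
            }
            if (remaining <= 0) {
                return -1; // record boundary reached
            }
            int n = in.read(buf, off, (int) Math.min(len, remaining));
            if (n > 0) {
                remaining -= n;
            }
            return n;
        }
    }

    /** Seeks to `offset` in an uncompressed "file" and reads exactly one record of `length` bytes. */
    static String readRecord(byte[] file, long offset, long length) throws IOException {
        InputStream raw = new ByteArrayInputStream(file);
        if (raw.skip(offset) != offset) {
            throw new IOException("offset beyond end of file");
        }
        InputStream bounded = new BoundedInputStream(raw, length);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[8192];
        int n;
        while ((n = bounded.read(buf, 0, buf.length)) != -1) {
            out.write(buf, 0, n);
        }
        return new String(out.toByteArray(), StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) throws IOException {
        // Two records stored back to back with no compression framing between them.
        String first = "HTTP/1.1 200 OK\r\n\r\nfirst";
        String second = "HTTP/1.1 404 Not Found\r\n\r\nsecond";
        byte[] file = (first + second).getBytes(StandardCharsets.ISO_8859_1);
        System.out.println(readRecord(file, first.length(), second.length()));
    }
}
```

Without the length bound, a read started at the first record's offset would continue straight into the second record, which is the kind of overrun the assertion in the test above is meant to catch.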

csrster added a commit to netarchivesuite/openwayback-netarchivesuite that referenced this issue Mar 19, 2014
@egh
Contributor

egh commented Mar 19, 2014

Thanks, Colin!

csrster added a commit to netarchivesuite/openwayback-netarchivesuite that referenced this issue Mar 26, 2014
…ee the comment on line 174 of ResourceFactory.java.
@egh
Contributor

egh commented Apr 2, 2014

See closed pull request #104.

@csrster
Contributor

csrster commented Apr 3, 2014

All good by me.

@anjackson
Member Author

So, trying to reproduce this in webarchive-commons alone, I found that uncompressed ARCs should never have worked when invoked the fpath way. The referenced pull request for webarchive-commons addresses that...

BUT its symptoms are not consistent with the issue reported here. Not sure how to proceed unless we can make this bug more reproducible.

@anjackson
Member Author

Probably sensible to wait for the next BETA release to re-test this issue.

@anjackson anjackson modified the milestones: 2.0.0.BETA.3 Release, 2.0.0.BETA.2 Release May 8, 2014
@kris-sigur kris-sigur modified the milestones: 2.0.0.BETA.3 Release, 2.0.0 Release Sep 5, 2014