When a warc.gz file fails validation, I can't see the detailed error message, e.g.:
/home/prod/warc validate ./470779-431-20241101123640123-00000-kb-prod-har-013.kb.dk.warc.gz
Total time: 22.941703205s, files: 1, records: 25546, processed: 25546, errors: 1, duplicates: 0
Is that not possible?
When I use jwat, I get the following:
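Summary of '/home/release_software_dist/PROD/har-013/470779-431-20241101123640123-00000-kb-prod-har-013.kb.dk.warc.gz'
GZip.Warnings: 0
Warc.isValid: true
Warc.Records: 25546
Warc.Errors: 0
Warc.Warnings: 0
Job summary
GZip files: 0
Arc files: 0
Warc files: 0
Errors: 0
Warnings: 0
RuntimeErr: 1
Skipped: 0
Time: 00:00:37 (37507 ms.)
TotalBytes: 91.8 mb
AvgBytes: 2.4 mb/s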
Exception while processing '/home/release_software_dist/PROD/har-013/470779-431-20241101123640123-00000-kb-prod-har-013.kb.dk.warc.gz'
StartOffset: 96236201 (0x5bc72a9)
Offset: 96280576 (0x5bd2000)
java.io.IOException: java.util.zip.DataFormatException: Data missing!
at org.jwat.gzip.GzipReader$GzipEntryInputStream.read(GzipReader.java:645)
at java.io.BufferedInputStream.fill(BufferedInputStream.java:246)
at java.io.BufferedInputStream.read1(BufferedInputStream.java:286)
at java.io.BufferedInputStream.read(BufferedInputStream.java:345)
at java.io.FilterInputStream.read(FilterInputStream.java:133)
at java.io.PushbackInputStream.read(PushbackInputStream.java:186)
at org.jwat.common.ByteCountingPushBackInputStream.read(ByteCountingPushBackInputStream.java:124)
at java.io.FilterInputStream.read(FilterInputStream.java:133)
at java.io.PushbackInputStream.read(PushbackInputStream.java:186)
at org.jwat.common.ByteCountingPushBackInputStream.read(ByteCountingPushBackInputStream.java:124)
at org.jwat.common.FixedLengthInputStream.read(FixedLengthInputStream.java:103)
at java.io.BufferedInputStream.fill(BufferedInputStream.java:246)
at java.io.BufferedInputStream.read1(BufferedInputStream.java:286)
at java.io.BufferedInputStream.read(BufferedInputStream.java:345)
at java.io.FilterInputStream.read(FilterInputStream.java:133)
at java.io.PushbackInputStream.read(PushbackInputStream.java:186)
at org.jwat.common.ByteCountingPushBackInputStream.read(ByteCountingPushBackInputStream.java:124)
at java.security.DigestInputStream.read(DigestInputStream.java:161)
at java.io.FilterInputStream.read(FilterInputStream.java:133)
at java.io.FilterInputStream.read(FilterInputStream.java:133)
at java.io.PushbackInputStream.read(PushbackInputStream.java:186)
at org.jwat.common.ByteCountingPushBackInputStream.read(ByteCountingPushBackInputStream.java:124)
at org.jwat.common.ByteCountingPushBackInputStream.read(ByteCountingPushBackInputStream.java:119)
at org.jwat.archive.ManagedPayload.manageRecord(ManagedPayload.java:254)
at org.jwat.archive.ManagedPayload.manageWarcRecord(ManagedPayload.java:219)
at org.jwat.tools.tasks.test.TestFile2.apcWarcRecordStart(TestFile2.java:199)
at org.jwat.archive.ArchiveParser.parse(ArchiveParser.java:154)
at org.jwat.tools.tasks.test.TestFile2.processFile(TestFile2.java:58)
at org.jwat.tools.tasks.test.TestTask$TaskRunnable.run(TestTask.java:266)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:750)
Caused by: java.util.zip.DataFormatException: Data missing!
at org.jwat.gzip.GzipReader.readInflated(GzipReader.java:566)
at org.jwat.gzip.GzipReader$GzipEntryInputStream.read(GzipReader.java:641)
... 33 more
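The jwat trace points at a gzip-level failure ("Data missing!") in the member that starts at StartOffset. As an independent cross-check, a minimal sketch like the one below walks the concatenated gzip members of a warc.gz file with Python's standard `zlib` module and reports the start offset of the first member that does not inflate cleanly. The function name `find_bad_gzip_member` and the chunk size are illustrative choices, not part of jwat or of the `warc` tool, and the sketch reads the whole file into memory for simplicity.

```python
import zlib


def find_bad_gzip_member(data, chunk_size=65536):
    """Return (start_offset, message) for the first gzip member that fails
    to inflate, or None if every member decompresses cleanly.

    Hypothetical helper: it assumes the file is a plain sequence of
    concatenated gzip members, as is usual for record-compressed warc.gz.
    """
    view = memoryview(data)
    offset = 0
    while offset < len(view):
        # wbits=31 tells zlib to expect a gzip header and verify the trailer.
        inflater = zlib.decompressobj(wbits=31)
        pos = offset
        try:
            # Feed the member in chunks until zlib reports end of stream.
            while not inflater.eof and pos < len(view):
                chunk = view[pos:pos + chunk_size]
                inflater.decompress(chunk)
                pos += len(chunk)
        except zlib.error as exc:
            return offset, str(exc)
        if not inflater.eof:
            return offset, "member is truncated (end of stream not reached)"
        # Bytes left over after the trailer belong to the next member.
        offset = pos - len(inflater.unused_data)
    return None
```

Calling it as `find_bad_gzip_member(open(path, "rb").read())` returns `None` for a clean file. For a damaged one, the returned offset can be compared with the StartOffset that jwat prints.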
where the output includes the error detail but the summary info for all files has been removed.
v3 is a refactor of v2 in which the output logic has been simplified. The summary information will come back at some point. In addition, we would like the program's exit code to reflect the validation status, so that warchaeology is better suited for use in scripts.
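Until the exit code carries the validation status, a script could derive one from the summary line. The sketch below is only a workaround under stated assumptions: it relies on the `warc validate <file>` invocation and the `errors: N` field exactly as they appear in the output quoted in this issue, it reads both stdout and stderr because the thread does not say which stream carries the summary, and the 0/1/2 return values are a convention invented here, not the tool's behaviour.

```python
import re
import subprocess
from typing import Optional

# Matches the "errors: 1" field of the summary line quoted in this issue.
SUMMARY_RE = re.compile(r"\berrors:\s*(\d+)")


def error_count(output: str) -> Optional[int]:
    """Return the error count from the last summary line, or None if absent."""
    matches = SUMMARY_RE.findall(output)
    return int(matches[-1]) if matches else None


def validate(path: str, warc_bin: str = "warc") -> int:
    """Run `warc validate` and map the summary to a script-friendly status.

    Assumed convention: 0 = no errors, 1 = errors reported, 2 = no summary found.
    """
    proc = subprocess.run(
        [warc_bin, "validate", path], capture_output=True, text=True
    )
    count = error_count(proc.stdout + proc.stderr)
    if count is None:
        return 2
    return 1 if count > 0 else 0
```

A wrapper script would then end with `sys.exit(validate(path))`, so that shell constructs such as `&&` or `set -e` react to a failed validation.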