Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Test 481 reports error in valgrind memcheck #612

Closed
stweil opened this issue Sep 29, 2015 · 12 comments
Closed

Test 481 reports error in valgrind memcheck #612

stweil opened this issue Sep 29, 2015 · 12 comments

Comments

@stweil
Copy link
Contributor

stweil commented Sep 29, 2015

Running the test manually with a debug version of opj_compress built from latest git code:

/usr/bin/valgrind "--tool=memcheck" --leak-check=full --show-reachable=yes --verbose --track-origins=yes --num-callers=50 bin/opj_compress -i /openjpeg-data/input/nonregression/Bretagne2.ppm -o /tmp/Bretagne2_3.j2k -EPH -M 38

Result:

==11019== Conditional jump or move depends on uninitialised value(s)
==11019==    at 0x4E5EAB5: opj_mqc_byteout (mqc.c:219)
==11019==    by 0x4E5EBAB: opj_mqc_renorme (mqc.c:241)
==11019==    by 0x4E5ED47: opj_mqc_codelps (mqc.c:269)
==11019==    by 0x4E5F196: opj_mqc_encode (mqc.c:408)
==11019==    by 0x4E67BA4: opj_t1_enc_clnpass_step (t1.c:853)
==11019==    by 0x4E6838A: opj_t1_enc_clnpass (t1.c:991)
==11019==    by 0x4E69FB9: opj_t1_encode_cblk (t1.c:1617)
==11019==    by 0x4E69CDF: opj_t1_encode_cblks (t1.c:1536)
==11019==    by 0x4E71B96: opj_tcd_t1_encode (tcd.c:2038)
==11019==    by 0x4E704CA: opj_tcd_encode_tile (tcd.c:1226)
==11019==    by 0x4E47608: opj_j2k_write_sod (j2k.c:4296)
==11019==    by 0x4E569E9: opj_j2k_write_first_tile_part (j2k.c:10472)
==11019==    by 0x4E562A4: opj_j2k_post_write_tile (j2k.c:10258)
==11019==    by 0x4E55A21: opj_j2k_encode (j2k.c:10009)
==11019==    by 0x4E604E3: opj_encode (openjpeg.c:737)
==11019==    by 0x407183: main (opj_compress.c:1864)
==11019==  Uninitialised value was created by a heap allocation
==11019==    at 0x4C28C20: malloc (vg_replace_malloc.c:296)
==11019==    by 0x4E6FFAD: opj_tcd_code_block_enc_allocate_data (tcd.c:1083)
==11019==    by 0x4E6FCC5: opj_tcd_init_tile (tcd.c:1008)
==11019==    by 0x4E6FE71: opj_tcd_init_encode_tile (tcd.c:1041)
==11019==    by 0x4E55DCD: opj_j2k_pre_write_tile (j2k.c:10110)
==11019==    by 0x4E55822: opj_j2k_encode (j2k.c:9956)
==11019==    by 0x4E604E3: opj_encode (openjpeg.c:737)
==11019==    by 0x407183: main (opj_compress.c:1864)
[...]

All other reported contexts seem to be caused by the same uninitialized tile data.

@mayeut
Copy link
Collaborator

mayeut commented Sep 29, 2015

This report duplicates #438 but is much clearer. Closing #438

@stweil
Copy link
Contributor Author

stweil commented Sep 30, 2015

The bug occurs after a code sequence like this:

opj_mqc_byteout reads *bp, increments bp, writes to *bp
opj_mqc_flush increments bp without writing to *bp
opj_mqc_byteout reads *bp which was never written to

Other sequences which work look like this:

opj_mqc_byteout reads *bp, increments bp, writes to *bp
opj_mqc_flush increments bp without writing to *bp
opj_mqc_restart_init_enc decrements bp
opj_mqc_byteout reads *bp which was written to before

So the opj_mqc_restart_init_enc is missing in the error case.

@boxerab
Copy link
Contributor

boxerab commented Apr 20, 2016

This problem only occurs in BYPASS and RESTART encode modes, which are known to be broken anyways. I think we should disable these two modes until they are working correctly, and mention this in the release notes. Then the release can proceed more quickly - seems like many people are waiting for it. My 2cents.

@julienmalik
Copy link
Collaborator

See also #674

@julienmalik
Copy link
Collaborator

I took a little time to investigate further.
This uninitialized read is actually at the end of the code block buffer, not at the start.

Allocation is done here :
https://github.com/uclouvain/openjpeg/blob/master/src/lib/openjp2/tcd.c#L1070-L1091

I don't have enough knowledge of the internals here. I understand p_code_block->data is supposed to store encoded stream, so I don't get why it is allocated based on a number of pixels, and there is no comment in the code in tcd.c.
Except if it supposes encoded stream is always shorter than raw data stream, which might be wrong here with BYPASS and RESTART mode ? (I have no idea what these "modes" mean).

@boxerab
Copy link
Contributor

boxerab commented Apr 22, 2016

How do you know that the uninitialized read happens at the end of the code block buffer?

@julienmalik
Copy link
Collaborator

By debugging

@boxerab
Copy link
Contributor

boxerab commented Apr 22, 2016

Well, in order to resolve this, we need to first fix bypass and restart mode

@julienmalik
Copy link
Collaborator

I tend to think they are the same. As said in #674 the out of bounds read happens only in RESTART and BYPASS mode, not with the other modes.

Extending the allocated buffer, even a lot, does not change anything.
The error happens at during encoding of the first code block, when the mq coder reaches the end of the buffer, whatever its size. It seems like in this mode the mq coder enter some sort of infinite loop and fills up the buffer.

@boxerab
Copy link
Contributor

boxerab commented Apr 22, 2016

Julien, thanks for pursuing this. Do you have time to check a few things?

  1. does this infinite loop happen for the 2.1 release from 2014? The reason I ask is perhaps the infinite loop is due to my PR that was recently merged.

Also, this is how I fixed the valgrind error in my fork:

GrokImageCompression/grok@3d9ee7a

Can you try these changes and see if the problem goes away?

Thanks!
Aaron

@boxerab
Copy link
Contributor

boxerab commented May 2, 2016

if (pass->term && bpno > 0)

is the important line. For bpno == 0, there is no termination. That is why you are seeing the encoder go into an endless loop. Simply remove bpno > 0 and it should fix the valgrind error. Of course, lossless encoding may still be lossy, but that is a bigger issue.

julienmalik added a commit to julienmalik/openjpeg that referenced this issue May 2, 2016
@julienmalik julienmalik mentioned this issue May 2, 2016
detonin added a commit that referenced this issue May 3, 2016
RESTART mode is now working, BYPASS still broken but much better
@mayeut mayeut modified the milestones: OPJ v2.2.0, OPJ v2.1.1 Jul 13, 2016
@rouault
Copy link
Collaborator

rouault commented Jun 13, 2017

Fixed per #949

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

6 participants