Clarify the name tokeniser uncomp_len calculation (PR #803) #803

jkbonfield · 2025-01-07T14:45:07Z

This includes all visible read name bytes plus 1 termination byte per name (e.g. '\0').

Fixes #802
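As a rough illustration of the rule above, the expected uncomp_len for a block of read names can be computed as shown below. This is a minimal Python sketch, not code from htslib or the spec. It assumes each name is supplied as a bytes object without its terminator and contains no embedded NUL. `name_tok_uncomp_len` is a hypothetical helper name.

```python
def name_tok_uncomp_len(names: list[bytes]) -> int:
    """Hypothetical helper: visible read-name bytes plus one terminator byte per name."""
    # Each name contributes its own length plus 1 for its termination byte
    # (e.g. '\0'). Which byte is used as the terminator does not change the count.
    return sum(len(name) + 1 for name in names)


names = [b"read1", b"read2", b"r3"]
print(name_tok_uncomp_len(names))  # (5+1) + (5+1) + (2+1) = 15
```

This gives the same result as appending one terminator byte to every name and measuring the concatenated buffer.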

github-actions · 2025-01-07T14:48:22Z

Changed PDFs as of bece1f7: CRAMcodecs (diff).

CRAMcodecs.tex

cmnbroad · 2025-01-07T15:04:26Z

Looks good - thanks!

This includes all visible read name bytes plus 1 termination byte per name (e.g. '\0'). Fixes samtools#802

github-actions · 2025-01-07T16:04:22Z

Changed PDFs as of 4982e03: CRAMcodecs (diff).

zaeleus · 2025-01-22T20:14:31Z

CRAMcodecs.tex

+the number of read names.  This is followed the array elements
+themselves.  Note the uncompressed size is calculated as the sum of


The serialised data stream starts with two unsigned little endian 32-bit integers... This is followed the array elements themselves.

This is unrelated to the length calculation, but note there is also a 1 byte flag between the 2 integers and the data stream:

Bytes  Type    Name
4      uint32  uncomp_length
4      uint32  num_reads
1      uint8   use_arith
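The byte layout listed above can be read with Python's struct module. This is a minimal sketch that assumes the three fields are packed contiguously and little-endian, as the review comment describes. It only splits off the header and does not decode the token streams that follow. `parse_name_tok_header` and `NAME_TOK_HEADER` are hypothetical names, not part of htslib or the spec.

```python
import struct

# Hypothetical layout constant: uint32 uncomp_length, uint32 num_reads,
# uint8 use_arith, little-endian. '<' disables padding, so the size is 4 + 4 + 1 = 9.
NAME_TOK_HEADER = struct.Struct("<IIB")


def parse_name_tok_header(buf: bytes) -> tuple[int, int, int, bytes]:
    """Return (uncomp_length, num_reads, use_arith, remaining token-stream bytes)."""
    if len(buf) < NAME_TOK_HEADER.size:
        raise ValueError("buffer too short for name tokeniser header")
    uncomp_length, num_reads, use_arith = NAME_TOK_HEADER.unpack_from(buf, 0)
    return uncomp_length, num_reads, use_arith, buf[NAME_TOK_HEADER.size:]


# Example: 15 uncompressed bytes, 3 names, use_arith = 0, then two placeholder bytes
# standing in for the token-stream data.
stream = struct.pack("<IIB", 15, 3, 0) + b"\x01\x02"
print(parse_name_tok_header(stream))  # (15, 3, 0, b'\x01\x02')
```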

jkbonfield · 2025-01-28

Agreed. It also had a poor and nebulous "array elements" term which I expanded on.

This includes all visible read name bytes plus 1 termination byte per name (e.g. '\0'). Fixes samtools#802.

Also clarify the name tokeniser serialisation description. Acknowledge the 1-byte "use_arith" field and replace the nebulous "array elements" with more descriptive text about token streams.

github-actions · 2025-01-28T21:01:32Z

Changed PDFs as of 7de0ae0: CRAMcodecs (diff).

cmnbroad reviewed Jan 7, 2025

CRAMcodecs.tex (outdated)

jkbonfield added the sam and cram labels and removed the sam label, Jan 7, 2025

jkbonfield added a commit to jkbonfield/hts-specs that referenced this pull request Jan 7, 2025

Clarify the name tokeniser uncomp_len calculation (PR samtools#803)

4982e03

jkbonfield force-pushed the name-tok-size branch from bece1f7 to 4982e03, January 7, 2025 16:02

zaeleus reviewed Jan 22, 2025

jkbonfield force-pushed the name-tok-size branch from 4982e03 to 7de0ae0, January 28, 2025 20:59
