Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

HPCC-33601 Document lz4s and lz4shc index compression and options #19605

Open
wants to merge 1 commit into
base: candidate-9.10.x
Choose a base branch
from

Conversation

JamesDeFabia
Copy link
Contributor

@JamesDeFabia JamesDeFabia commented Mar 11, 2025

Type of change:

  • This change is a bug fix (non-breaking change which fixes an issue).
  • This change is a new feature (non-breaking change which adds functionality).
  • This change improves the code (refactor or other change that does not change the functionality)
  • This change fixes warnings (the fix does not alter the functionality or the generated code)
  • This change is a breaking change (fix or feature that will cause existing behavior to change).
  • This change alters the query API (existing queries will have to be recompiled)

Checklist:

  • My code follows the code style of this project.
    • My code does not create any new warnings from compiler, build system, or lint.
  • The commit message is properly formatted and free of typos.
    • The commit message title makes sense in a changelog, by itself.
    • The commit is signed.
  • My change requires a change to the documentation.
    • I have updated the documentation accordingly, or...
    • I have created a JIRA ticket to update the documentation.
    • Any new interfaces or exported functions are appropriately commented.
  • I have read the CONTRIBUTORS document.
  • The change has been fully tested:
    • I have added tests to cover my changes.
    • All new and existing tests passed.
    • I have checked that this change does not introduce memory leaks.
    • I have used Valgrind or similar tools to check for potential issues.
  • I have given due consideration to all of the following potential concerns:
    • Scalability
    • Performance
    • Security
    • Thread-safety
    • Cloud-compatibility
    • Premature optimization
    • Existing deployed queries will not be broken
    • This change fixes the problem, not just the symptom
    • The target branch of this pull request is appropriate for such a change.
  • There are no similar instances of the same problem that should be addressed
    • I have addressed them here
    • I have raised JIRA issues to address them separately
  • This is a user interface / front-end modification
    • I have tested my changes in multiple modern browsers
    • The component(s) render as expected

Smoketest:

  • Send notifications about my Pull Request position in Smoketest queue.
  • Test my draft Pull Request.

Testing:

Unit Test:
https://github.com/JamesDeFabia/github-action-dev-build/actions/runs/13777976447

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copilot reviewed 2 out of 2 changed files in this pull request and generated no comments.

Copy link

Jira Issue: https://hpccsystems.atlassian.net//browse/HPCC-33601

Jirabot Action Result:
Workflow Transition To: Merge Pending
Updated PR

Copy link
Member

@g-pan g-pan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

few comments inline

role="bold">'inplace:lz4shc'</emphasis> </emphasis></entry>

<entry>The default compression in versions after versions 9.6.90,
9.8.66 ,and 9.10.12. Causes inplace compression on the key fields
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

spacing s/b: " 9.8.66, and "

Lempel-Ziv-Welch algorithm. It remains the default for backward
compatibility.</entry>
<entry>A variant of the Lempel-Ziv-Welch algorithm. This was the
the default compression prior to versions 9.6.90, 9.8.66 ,and
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

spacing s/b: " 9.8.66, and "

Lempel-Ziv-Welch algorithm. It remains the default for backward
compatibility.</entry>
<entry>A variant of the Lempel-Ziv-Welch algorithm. This was the
the default compression prior to versions 9.6.90, 9.8.66 ,and
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

spacing s/b: " 9.8.66, and "

@@ -412,18 +433,82 @@ BUILD(VehicleKey3);
without decompression. The original index compression implementation
decompresses the rows when they are read from disk.</para>

<para>The inplace index compression format (introduced in versions 9.6.90,
9.8.66 ,and 9.10.12 9.2.0 or later) improves compression and reduces build
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

spacing, comma s/b: " 9.8.66, 9.10.12, and 9.2.0 or "

9.8.66 ,and 9.10.12 9.2.0 or later) improves compression and reduces build
time. These formats require an engine that supports it. In other words,
<emphasis role="bold">if you build an index using the lz4s or lz4shc
formats, you must use a platform later than 9.6.90, 9.8.66 ,and 9.10.12 to
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

spacing s/b: " 9.8.66, and "

read those indexes. </emphasis></para>

<para>If you attempt to read an index with the inplace compression format
on a system that does not support them, you will receive an error
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

awkward grammar: an index -- support them

Copy link
Member

@g-pan g-pan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks good now

Copy link
Member

@ghalliday ghalliday left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks. A few comments.

@@ -441,8 +441,8 @@

<entry><para>Optional. Specifies the index should be compressed
using the type of compression specified. If omitted, the default
is <emphasis role="bold">LZW</emphasis>, a variant of the
Lempel-Ziv-Welch algorithm. </para></entry>
is <emphasis role="bold">'inplace:lz4shc'</emphasis>.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This has not changed yet - the default is still LZW

<entry><emphasis role="bold"><emphasis
role="bold">'inplace:lz4shc'</emphasis> </emphasis></entry>

<entry>The default compression in versions after versions 9.6.90,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The default compression for inplace indexes.
Both this and lz4s are only supported in those versions and later.

role="bold">'inplace:lz4s'</emphasis> </emphasis></entry>

<entry>Causes inplace compression on the key fields and lz4s
compression on the payload. This uses the streaming API to build
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This uses the stream LZ4 API to avoid recompressing the data and reduce the index build times.

Similar for lz4shc below.

@@ -903,25 +925,86 @@ BUILD(FilterDsLib1);
without decompression. The original index compression implementation
decompresses the rows when they are read from disk.</para>

<para>The inplace index compression format (introduced in versions 9.6.90,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It isn't the inplace index compression that was introduced it was already supported. I am not sure that either of these paragraphs are needed.


<para>If you attempt to read an index with the inplace compression format
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This paragraph is still true. Possibly should be extended with "or an inplace compression format"

<row>
<entry><emphasis role="bold">hclevel</emphasis></entry>

<entry>An integer between 0 and 9 to specify the level of
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Range is 2 to 12.


<entry>An integer between 0 and 9 to specify the level of
compression. The default is 3. Higher levels increase compression
times, but may be cost-effective.</entry>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Higher levels increase the compression, but also increase the compression times.

I'm not sure if the cost comment is worthwhile, but if it is it needs to be clear.

This may be cost effective depending on the length of time the data is stored, and the storage costs compared to the compute costs to build the index.

<entry><emphasis role="bold">maxrecompress</emphasis></entry>

<entry>Specifies the number of times the entire input dataset
should be compressed to free up space. Increasing the number
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

compressed -> recompressed


<row>
<entry><emphasis role="bold"><emphasis
role="bold">'inplace:lz4s'</emphasis> </emphasis></entry>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there any way to common up this text - otherwise you will have to apply the same edits to both.

…ions

Signed-off-by: Jim DeFabia <jamesdefabia@lexisnexis.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants