HPCC-33601 Document lz4s and lz4shc index compression and options #19605
base: candidate-9.10.x
Copilot reviewed 2 out of 2 changed files in this pull request and generated no comments.
Jira Issue: https://hpccsystems.atlassian.net//browse/HPCC-33601 Jirabot Action Result: |
few comments inline
role="bold">'inplace:lz4shc'</emphasis> </emphasis></entry> | ||
|
||
<entry>The default compression in versions after versions 9.6.90, | ||
9.8.66 ,and 9.10.12. Causes inplace compression on the key fields |
spacing s/b: " 9.8.66, and "
Lempel-Ziv-Welch algorithm. It remains the default for backward | ||
compatibility.</entry> | ||
<entry>A variant of the Lempel-Ziv-Welch algorithm. This was the | ||
the default compression prior to versions 9.6.90, 9.8.66 ,and |
spacing s/b: " 9.8.66, and "
Lempel-Ziv-Welch algorithm. It remains the default for backward | ||
compatibility.</entry> | ||
<entry>A variant of the Lempel-Ziv-Welch algorithm. This was the | ||
the default compression prior to versions 9.6.90, 9.8.66 ,and |
spacing s/b: " 9.8.66, and "
@@ -412,18 +433,82 @@ BUILD(VehicleKey3); | |||
without decompression. The original index compression implementation | |||
decompresses the rows when they are read from disk.</para> | |||
|
|||
<para>The inplace index compression format (introduced in versions 9.6.90, | |||
9.8.66 ,and 9.10.12 9.2.0 or later) improves compression and reduces build |
spacing, comma s/b: " 9.8.66, 9.10.12, and 9.2.0 or "
9.8.66 ,and 9.10.12 9.2.0 or later) improves compression and reduces build | ||
time. These formats require an engine that supports it. In other words, | ||
<emphasis role="bold">if you build an index using the lz4s or lz4shc | ||
formats, you must use a platform later than 9.6.90, 9.8.66 ,and 9.10.12 to |
spacing s/b: " 9.8.66, and "
read those indexes. </emphasis></para> | ||
|
||
<para>If you attempt to read an index with the inplace compression format | ||
on a system that does not support them, you will receive an error |
awkward grammar: an index -- support them
1862d88 to 762c1f6
762c1f6 to dfa3307
looks good now
Thanks. A few comments.
@@ -441,8 +441,8 @@ | |||
|
|||
<entry><para>Optional. Specifies the index should be compressed | |||
using the type of compression specified. If omitted, the default | |||
is <emphasis role="bold">LZW</emphasis>, a variant of the | |||
Lempel-Ziv-Welch algorithm. </para></entry> | |||
is <emphasis role="bold">'inplace:lz4shc'</emphasis>. |
This has not changed yet - the default is still LZW
<entry><emphasis role="bold"><emphasis | ||
role="bold">'inplace:lz4shc'</emphasis> </emphasis></entry> | ||
|
||
<entry>The default compression in versions after versions 9.6.90, |
The default compression for inplace indexes.
Both this and lz4s are only supported in those versions and later.
role="bold">'inplace:lz4s'</emphasis> </emphasis></entry> | ||
|
||
<entry>Causes inplace compression on the key fields and lz4s | ||
compression on the payload. This uses the streaming API to build |
This uses the stream LZ4 API to avoid recompressing the data and reduce the index build times.
Similar for lz4shc below.
@@ -903,25 +925,86 @@ BUILD(FilterDsLib1); | |||
without decompression. The original index compression implementation | |||
decompresses the rows when they are read from disk.</para> | |||
|
|||
<para>The inplace index compression format (introduced in versions 9.6.90, |
It isn't the inplace index compression that was introduced; it was already supported. I am not sure that either of these paragraphs is needed.
|
||
<para>If you attempt to read an index with the inplace compression format |
This paragraph is still true. Possibly should be extended with "or an inplace compression format"
<row> | ||
<entry><emphasis role="bold">hclevel</emphasis></entry> | ||
|
||
<entry>An integer between 0 and 9 to specify the level of |
Range is 2 to 12.
|
||
<entry>An integer between 0 and 9 to specify the level of | ||
compression. The default is 3. Higher levels increase compression | ||
times, but may be cost-effective.</entry> |
Higher levels increase the compression, but also increase the compression times.
I'm not sure if the cost comment is worthwhile, but if it is, it needs to be clear.
This may be cost effective depending on the length of time the data is stored, and the storage costs compared to the compute costs to build the index.
<entry><emphasis role="bold">maxrecompress</emphasis></entry> | ||
|
||
<entry>Specifies the number of times the entire input dataset | ||
should be compressed to free up space. Increasing the number |
compressed -> recompressed
|
||
<row> | ||
<entry><emphasis role="bold"><emphasis | ||
role="bold">'inplace:lz4s'</emphasis> </emphasis></entry> |
Is there any way to common up this text? Otherwise you will have to apply the same edits to both.
…ions Signed-off-by: Jim DeFabia <jamesdefabia@lexisnexis.com>
dfa3307 to 02b0cd8
Type of change:
Checklist:
Smoketest:
Testing:
Unit Test:
https://github.com/JamesDeFabia/github-action-dev-build/actions/runs/13777976447