Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

please add support for sqlite archives #75

Closed
Qriist opened this issue May 9, 2022 · 29 comments
Closed

please add support for sqlite archives #75

Qriist opened this issue May 9, 2022 · 29 comments
Assignees
Labels
enhancement New feature or request

Comments

@Qriist
Copy link

Qriist commented May 9, 2022

Hey there, I'm requesting the addition of the sqlar_compress() and sqlar_uncompress() interfaces. They are already part of the SQLite standard and looks to be just one include file.

sqlar overview: https://www.sqlite.org/sqlar.html
inclusion code: https://sqlite.org/src/file/ext/misc/sqlar.c

Thanks in advance!

@utelle
Copy link
Owner

utelle commented May 9, 2022

Hey there, I'm requesting the addition of the sqlar_compress() and sqlar_uncompress() interfaces. They are already part of the SQLite standard and looks to be just one include file.

Actually, support for SQLite Archive Files is part of the SQLite shell. Activating this feature is possible by defining the symbol SQLITE_HAVE_ZLIB when compiling the shell (shell.c). However, obviously this adds an external dependency on the ZLIB library.

Admittedly, the shell binaries of the official SQLite distribution have this feature enabled, while the binaries coming with SQLite3 Multiple Ciphers releases currently haven't. I will consider to build binaries with SQLite Archive support.

However, I'm a bit reluctant to add the sqlar extension to the core library, as this extension only adds the sqlar_compress() and sqlar_uncompress() SQL functions without any support for the sqlar management table. And as said above it would add the dependency on the ZLIB library.

@Qriist
Copy link
Author

Qriist commented May 9, 2022

Thanks for the quick reply!

I'd like to clarify that my request title was poorly phrased, and I have no need of the shell interface whatsoever - I'm not using the shell for any part of my dev process. I am only interested in the sqlar functions.

With that in mind...

Actually, support for SQLite Archive Files is part of the SQLite shell.

The shell understands the specifically formatted sqlar table but the functions are added to the DLL itself. Accordingly, the sqlar_x() interfaces should be available to use for any transaction which desires it, on any table with any data.

I believe that any "mandatory/expected" (for lack of a better term) sqlar support ends at ensuring the functions work as intended, and does not extend to any advanced parsing of the sqlar table.

However, I do understand your reticence on adding what could be percieved as adding a half-baked feature and appreciate your consideration all the same. :)

For what it's worth, my use case is very near to the one described on the sqlar overview page: I'm making an api cache system and it'd be a huge boon to store the response text compressed inside the database, as opposed to the current uncompressed method.

@utelle
Copy link
Owner

utelle commented May 9, 2022

I'd like to clarify that my request title was poorly phrased, and I have no need of the shell interface whatsoever - I'm not using the shell for any part of my dev process. I am only interested in the sqlar functions.

Ok, I understand.

In principle, the sqlar extension could be built as a loadable extension, and the application could then load it at runtime. However, personally I usually try to avoid runtime-loadable extensions, because they impose a security risk. So, it is certainly preferrable to have the option to link extensions into the core library, as is done for quite a few other extensions already.

The question is how the dependency on ZLIB should be handled. Personally, I would prefer a solution that doesn't depend on any external library. One way to accomplish that would be to add the single-file ZLIB replacement miniz to the project.

With that in mind...

Actually, support for SQLite Archive Files is part of the SQLite shell.

The shell understands the specifically formatted sqlar table but the functions are added to the DLL itself. Accordingly, the sqlar_x() interfaces should be available to use for any transaction which desires it, on any table with any data.

That's right. The sqlar_x SQL functions just compress or decompress data, but don't depend on any specific database table.

However, I do understand your reticence on adding what could be percieved as adding a half-baked feature and appreciate your consideration all the same. :)

Well, since the extension is included in the official SQLite shell the feature can't be called half-baked, I'd say. In principle, I have no problem to include the sqlar extension, at least as an optional extension not enabled by default.

For what it's worth, my use case is very near to the one described on the sqlar overview page: I'm making an api cache system and it'd be a huge boon to store the response text compressed inside the database, as opposed to the current uncompressed method.

Being able to compress/decompress data within the database is certainly a useful feature. I will make some tests with miniz in the near future. If that works without major problems, the feature could be added in one of the next releases.

@utelle utelle added the enhancement New feature or request label May 9, 2022
@utelle utelle self-assigned this May 9, 2022
@utelle
Copy link
Owner

utelle commented May 9, 2022

In the meantime I made a few tests. The ZLIB replacement miniz works without any problems for the extensions compress, sqlar, and zipfile.

I will update the project, so that the components compress, sqlar, and zipfile can be enabled by specifying a precompiler symbol. Additionally, it will be possible to decide whether the original ZLIB library or miniz should be used. The source code for the latter will be added to the repository.

@Qriist
Copy link
Author

Qriist commented May 10, 2022

Cool! Thank you very much for entertaining this request. ^_^

@utelle
Copy link
Owner

utelle commented May 10, 2022

I expect to commit the necessary changes within the next couple of days. Stay tuned.

utelle added a commit that referenced this issue May 10, 2022
The SQLite extensions COMPRESS, SQLAR, and ZIPFILE can be optionally enabled. All these extensions depend on the ZLIB library. The MINIZ library is included as a replacement for the ZLIB library, eliminating the external dependency, if desired.
@utelle
Copy link
Owner

utelle commented May 10, 2022

Commit 5068c89 is a first implementation of your request.

To enable the SQLite extensions COMPRESS, SQLAR, and/or ZIPFILE you have to specify the respective precompiler symbol:

  • SQLITE_ENABLE_COMPRESS
  • SQLITE_ENABLE_SQLAR
  • SQLITE_ENABLE_ZIPFILE

All these extensions depend on the ZLIB library. There are 2 options:

  1. Use the ZLIB library of the system. Define the preprocessor symbol SQLITE_HAVE_ZLIB and add the library reference for the compiler and linker.
  2. Use the MINIZ replacement for ZLIB. The source files are included. Define the preprocessor symbol SQLITE3MC_USE_MINIZ=1.

Please give it a try and report any issues.

@Qriist
Copy link
Author

Qriist commented May 11, 2022

Please give it a try and report any issues.

Thanks! I've done what I can but I'm not very strong in either C or Visual Studio. While I've managed to compile the new sqlite3mc_dll, it doesn't appear to have enabled functions. I'll walk you through which steps I've taken.

To enable the SQLite extensions COMPRESS, SQLAR, and/or ZIPFILE you have to specify the respective precompiler symbol:

  • SQLITE_ENABLE_COMPRESS
  • SQLITE_ENABLE_SQLAR
  • SQLITE_ENABLE_ZIPFILE

It looks like those 3 functions are just booleans so I defined them as follows. I'm almost certain I've done this part wrong. >.>
image

  1. Use the MINIZ replacement for ZLIB. The source files are included. Define the preprocessor symbol SQLITE3MC_USE_MINIZ=1.

I added the preprocessor definition to this field in the project properties. I'm reasonably sure I've done this part right.
image

For sake of completeness, here is the build output:

Build started...
1>------ Build started: Project: sqlite3mc_dll, Configuration: Release Win64 x64 ------
1>sqlite3mc.c
1>C:\Users\Qriist\source\repos\SQLite3MultipleCiphers\src\compress.c(97,13): warning C4018: '<': signed/unsigned mismatch
1>   Creating library ..\bin\vc17\dll\release\sqlite3mc_x64.lib and object ..\bin\vc17\dll\release\sqlite3mc_x64.exp
1>sqlite3mc_vc17_dll.vcxproj -> C:\Users\Qriist\source\repos\SQLite3MultipleCiphers\bin\vc17\dll\release\sqlite3mc_x64.dll
1>Done building project "sqlite3mc_vc17_dll.vcxproj".
========== Build: 1 succeeded, 0 failed, 0 up-to-date, 0 skipped ==========

Thanks in advance for your help, and sorry for my deficiencies in this realm.

@utelle
Copy link
Owner

utelle commented May 11, 2022

I've done what I can but I'm not very strong in either C or Visual Studio. While I've managed to compile the new sqlite3mc_dll, it doesn't appear to have enabled functions.

How did you detect that the SQL functions were not enabled? What error messages did you get?

I'll walk you through which steps I've taken.

To enable the SQLite extensions COMPRESS, SQLAR, and/or ZIPFILE you have to specify the respective precompiler symbol:

  • SQLITE_ENABLE_COMPRESS
  • SQLITE_ENABLE_SQLAR
  • SQLITE_ENABLE_ZIPFILE

It looks like those 3 functions are just booleans so I defined them as follows. I'm almost certain I've done this part wrong. >.> image

This is ok, although you could define the symbols in the project settings (C/C++-Preprocessor-Preprocessor definitions) - as you did for the symbol SQLITE3MC_USE_MINIZ - instead of hardcoding them in the source.

At runtime the SQL functions

  • compress, uncompress (COMPRESS extension),
  • sqlar_compress, sqlar_uncompress (SQLAR extension),
  • zipfile (ZIPFILE extension)

will be automatically registered for any new database connection.

  1. Use the MINIZ replacement for ZLIB. The source files are included. Define the preprocessor symbol SQLITE3MC_USE_MINIZ=1.

I added the preprocessor definition to this field in the project properties. I'm reasonably sure I've done this part right. image

Yes, looks all correct. However, I would define the other symbols also here.

For sake of completeness, here is the build output:

Build started...
1>------ Build started: Project: sqlite3mc_dll, Configuration: Release Win64 x64 ------
1>sqlite3mc.c
1>C:\Users\Qriist\source\repos\SQLite3MultipleCiphers\src\compress.c(97,13): warning C4018: '<': signed/unsigned mismatch
1>   Creating library ..\bin\vc17\dll\release\sqlite3mc_x64.lib and object ..\bin\vc17\dll\release\sqlite3mc_x64.exp
1>sqlite3mc_vc17_dll.vcxproj -> C:\Users\Qriist\source\repos\SQLite3MultipleCiphers\bin\vc17\dll\release\sqlite3mc_x64.dll
1>Done building project "sqlite3mc_vc17_dll.vcxproj".
========== Build: 1 succeeded, 0 failed, 0 up-to-date, 0 skipped ==========

Apart from the warning about a sign mismatch the build succeeded. The warning can be ignored, unless you expect extremely large blobs (> (2^63-1) bytes). The warning can be eliminated easily by changing int i, j; to unsigned int i, j; in line 57 of compress.c.

In the build process I can't see anything being wrong. So please describe, what you tried thereafter, what failed, and which error messages you got.

@Qriist
Copy link
Author

Qriist commented May 11, 2022

This is ok, although you could define the symbols in the project settings (C/C++-Preprocessor-Preprocessor definitions) - as you did for the symbol SQLITE3MC_USE_MINIZ - instead of hardcoding them in the source.

Oh, I'll fix that then. You used a different phrase ("precompiler symbol") so I was unsure what you were refering to. Thanks for clarifying! :)

Apart from the warning about a sign mismatch the build succeeded. The warning can be ignored, unless you expect extremely large blobs (> (2^63-1) bytes). The warning can be eliminated easily by changing int i, j; to unsigned int i, j; in line 57 of compress.c.

I'll make that adjustment ^_^

Now for the actual SQL portion:

How did you detect that the SQL functions were not enabled? What error messages did you get?

The first thing I did was just replace the DLL in my project where I'm writing the aforementioned cache db. The way the library I'm using functions is it will return a 0 if there's a failure. Therefore, I have simple error popups while I'm in the nuts and bolts phase to know which part is specifically failing.

This is in AHK btw. I know you probably aren't fluent in AHK but you'll hopefully be able to understand the main points. 🤞


/*	prepares the SQL insertion, working as intended*/
this.acDB.exec("BEGIN TRANSACTION;")
if !this.acDB.exec(SingleRecordSQL("cacheTable",insObj,"fingerprint"))	
	msgbox % "insObj failure`n" acDB.ErrorCode

/*	working code that doesn't compress - currently commented out
;last_insert_rowid() will select the just-created diskId from cacheTable
this.acDB.exec("INSERT INTO sqlar(name,mode,mtime,sz,data) VALUES ((SELECT last_insert_rowid()),777," timestamp ",length(" sqlQuote(post) ")," sqlQuote(post) ");")
	msgbox % "non-compressed data failure`n" acDB.ErrorCode
*/

/*	statment intended to replace the above one with - identical except for the sqlar_compress() in the final parameter  */
if !this.acDB.exec("INSERT INTO sqlar(name,mode,mtime,sz,data) VALUES ((SELECT last_insert_rowid()),777," timestamp ",length(" sqlQuote(post) "),sqlar_compress(" sqlQuote(post) "));")	;last_insert_rowid() will select the just-created diskId from cacheTable
	msgbox % "sqlar_compress failure`n" acDB.ErrorCode
			;;;;	^---This is where the insertion fails. 
			;;;;     I'm not sure what the exact error is because my library isn't returning it. :-/

this.acDB.exec("COMMIT;")			

/* side note: I tried using compress() in place of sqlar_compress() but it also didn't work. */

On a lark just now, I replaced my SQLiteStudio's ( https://sqlitestudio.pl ) dll with the sqlar one, and it immediately worked with sqlar_compress/sqlar_uncompress and compress/uncompress. It did not give me its typical "no such function" error.

There is a curiosity though: the sqlar_compress() function does not appear to be, well, compressing. I made a set of insertions using a large amount of easy-to-compress data (pastebin with relevant SQL) and SQLite Studio does not display binary data in the second row.
(quick key: NC = No Compression, SC=sqlar_compress(), CC=compress())
image

using length() also reveals the discrepency:
image

So I think we are looking at two distinct problems:

  1. I can't load the compression functions in my project - maybe there's a flag or something I need to set at runtime?
  2. sqlar_compress() appears to just be passing through the data without compressing.

Sorry for the info dump. ^^;

@Qriist
Copy link
Author

Qriist commented May 11, 2022

Apart from the warning about a sign mismatch the build succeeded. The warning can be ignored, unless you expect extremely large blobs (> (2^63-1) bytes). The warning can be eliminated easily by changing int i, j; to unsigned int i, j; in line 57 of compress.c.

made the changes per your recommendation, that particular warning persists. (Very low priority compared to the other issues - I'm just documenting.)
image

@utelle
Copy link
Owner

utelle commented May 11, 2022

Apart from the warning about a sign mismatch the build succeeded. The warning can be ignored, unless you expect extremely large blobs (> (2^63-1) bytes). The warning can be eliminated easily by changing int i, j; to unsigned int i, j; in line 57 of compress.c.

made the changes per your recommendation, that particular warning persists. (Very low priority compared to the other issues - I'm just documenting.) image

Arrgh, I was misled by my own Visual C++ compiler and was simply too quick and didn't check carefully enough, which changes are required.

  1. You must revert the change in line 57 of compress.c, that is, unsigned int i, j; back to int i, j;, otherwise you would experience an endless loop in line 63, if you call the SQL function compress.
  2. Instead change int i; to unsigned int i; in line 92 of compress.c.

Sorry for the inconveniences.

@Qriist
Copy link
Author

Qriist commented May 11, 2022

  1. You must revert the change in line 57 of compress.c, that is, unsigned int i, j; back to int i, j;, otherwise you would experience an endless loop in line 63, if you call the SQL function compress.
  2. Instead change int i; to unsigned int i; in line 92 of compress.c.

Good news!
image
That change got things to compile with zero warning... AND it fixed my own project's issue with accessing the compression functions! I'm freaking stoked! :D

So now we are just down to whatever issue remains with sqlar_compress(). I confirmed that this new DLL, in both my project and in SQLite Studio, still does not store compressed content.

Thinking out loud for a moment... We know:

  • compress() works
  • sqlar_compress() recieves the data
  • sqlar_compress() passes the data, uncompressed BUT intact

so it's probably something as simple as sqlar_compress() returning the wrong value after executing compress()

Sorry for the inconveniences.

Think nothing of it - you're the one helping me, not the other way around. :)

@utelle
Copy link
Owner

utelle commented May 11, 2022

This is ok, although you could define the symbols in the project settings (C/C++-Preprocessor-Preprocessor definitions) - as you did for the symbol SQLITE3MC_USE_MINIZ - instead of hardcoding them in the source.

Oh, I'll fix that then.

It's not a bug to define the symbols in the source code, but usually it is more convenient to set them in the project settings.

You used a different phrase ("precompiler symbol") so I was unsure what you were refering to. Thanks for clarifying! :)

Sorry, I meant to write "preprocessor", and unfortunately I don't have a DWIM-computer (DWIM = Do What I Mean). 😁

Apart from the warning about a sign mismatch the build succeeded. The warning can be ignored, unless you expect extremely large blobs (> (2^63-1) bytes). The warning can be eliminated easily by changing int i, j; to unsigned int i, j; in line 57 of compress.c.

I'll make that adjustment ^_^

Sorry, a different change is required - please see my previous post.

Now for the actual SQL portion:

How did you detect that the SQL functions were not enabled? What error messages did you get?

The first thing I did was just replace the DLL in my project where I'm writing the aforementioned cache db. The way the library I'm using functions is it will return a 0 if there's a failure. Therefore, I have simple error popups while I'm in the nuts and bolts phase to know which part is specifically failing.

Usually, SQLite's error codes and messages help to identify SQL errors.

This is in AHK btw. I know you probably aren't fluent in AHK but you'll hopefully be able to understand the main points. 🤞

You are right that I don't know much about AHK, but it's certainly not rocket science.

/*
  [...]
*/

/*	statment intended to replace the above one with - identical except for the sqlar_compress() in the final parameter  */
if !this.acDB.exec("INSERT INTO sqlar(name,mode,mtime,sz,data) VALUES ((SELECT last_insert_rowid()),777," timestamp ",length(" sqlQuote(post) "),sqlar_compress(" sqlQuote(post) "));")	;last_insert_rowid() will select the just-created diskId from cacheTable
	msgbox % "sqlar_compress failure`n" acDB.ErrorCode
			;;;;	^---This is where the insertion fails. 
			;;;;     I'm not sure what the exact error is because my library isn't returning it. :-/

this.acDB.exec("COMMIT;")			

/* side note: I tried using compress() in place of sqlar_compress() but it also didn't work. */

At first glance I have no clue why the insert fails. And without knowing the SQLite error code and message it is hard to tell what is going on. You could try to issue the command in a SQLite shell or in a tool like SQLiteStudio, and see what messages you get.

On a lark just now, I replaced my SQLiteStudio's ( https://sqlitestudio.pl ) dll with the sqlar one, and it immediately worked with sqlar_compress/sqlar_uncompress and compress/uncompress. It did not give me its typical "no such function" error.

At least that is good news.

There is a curiosity though: the sqlar_compress() function does not appear to be, well, compressing.

Strange. Admittedly, up to now I tested only with the archive command of the SQLite shell and that worked as expected.

I made a set of insertions using a large amount of easy-to-compress data (pastebin with relevant SQL)

I will use your test data for further experiments.

So I think we are looking at two distinct problems:

  1. I can't load the compression functions in my project - maybe there's a flag or something I need to set at runtime?

Not that I know of. Since the extensions are statically linked with the SQLite library itself, the functions implemented by the extensions are automatically registered for each new database connection.

  1. sqlar_compress() appears to just be passing through the data without compressing.

This is really weird. I will make some more experiments to check whether I can somehow reproduce this behaviour.

@Qriist
Copy link
Author

Qriist commented May 11, 2022

your big long post responding to my big long post

I know you were probably typing when I posted my last, but most things (save sqlar_compress()'s weird behavior) seem to be fixed now. Just pushing a notification so you hopefully don't waste your time on something already fixed. :)

@utelle
Copy link
Owner

utelle commented May 11, 2022

Good news! image That change got things to compile with zero warning...

That was to be expected.

AND it fixed my own project's issue with accessing the compression functions! I'm freaking stoked! :D

Main thing is that the functions can now be accessed.

So now we are just down to whatever issue remains with sqlar_compress(). I confirmed that this new DLL, in both my project and in SQLite Studio, still does not store compressed content.

At the moment, I have no clue why compressing doesn't work as expected.

Thinking out loud for a moment... We know:

  • compress() works

The function compress compresses the data unconditionally.

  • sqlar_compress() recieves the data
  • sqlar_compress() passes the data, uncompressed BUT intact

The function sqlar_compress makes a check whether compressing really results in a smaller blob. This is described in the comment in sqlar.c:

** If the type of X is SQLITE_BLOB, and compressing that blob using
** zlib utility function compress() yields a smaller blob, return the
** compressed blob. Otherwise, return a copy of X.

However, your test data look highly compressible. Therefore I don't understand why it is stored uncompressed.

so it's probably something as simple as sqlar_compress() returning the wrong value after executing compress()

I'll try to look into this.

@utelle
Copy link
Owner

utelle commented May 11, 2022

your big long post responding to my big long post

I know you were probably typing when I posted my last,

Yes, indeed. 😄

but most things (save sqlar_compress()'s weird behavior) seem to be fixed now. Just pushing a notification so you hopefully don't waste your time on something already fixed. :)

If I understand it correctly, sqlar_compress still doesn't compress your data. That is really surprising, because the archive feature of the SQLite shell uses that function and it worked in my prior tests. I will do a few more tests using your test data. I'll let you know if I can come up with any new insights.

@Qriist
Copy link
Author

Qriist commented May 11, 2022

Usually, SQLite's error codes and messages help to identify SQL errors.

The wrapper class has a few properties directly accessible after each method call. acDB.ErrorCode up above is my attempt to get the actual error code that was, in theory, set after the failed call, along with my custom "plain-text location" that spawned the error in the first place. The now-fixed problem apparently wasn't being handled properly by the class. ¯_(ツ)_/¯

@Qriist
Copy link
Author

Qriist commented May 11, 2022

the archive feature of the SQLite shell uses that function and it worked in my prior tests.

I've never actually used the shell. I'll give it a shot myself and report what happens.

@utelle
Copy link
Owner

utelle commented May 11, 2022

Usually, SQLite's error codes and messages help to identify SQL errors.

The wrapper class has a few properties directly accessible after each method call. acDB.ErrorCode up above is my attempt to get the actual error code

SQLite's error codes are often quite generic, but most of the times the error message strings contain useful hints, what went wrong.

that was, in theory, set after the failed call, along with my custom "plain-text location" that spawned the error in the first place. The now-fixed problem apparently wasn't being handled properly by the class. ¯_(ツ)_/¯

Proper error handling can be a pain ...

@Qriist
Copy link
Author

Qriist commented May 11, 2022

SQLite shell uses that function

So I just went and grabbed the latest shell, spent a little time figuring out how to get useful results. Settled on a way to get consistent data...

I used this exact test script

As before, the first insertion is plaintext, while the second insertion uses the sqlar_compress() function.
image
image

Here's where things get weird: both insertions are the same raw size in the database. Whatever the flaw in this function is, I don't think it has anything to do with what you've very generously written...
image

I think we have uncovered an implementation bug with SQLite itself.

@utelle
Copy link
Owner

utelle commented May 11, 2022

So I just went and grabbed the latest shell, spent a little time figuring out how to get useful results. Settled on a way to get consistent data...

I used this exact test script

Thanks for providing the test script. I will use it to track down what exactly is going on. This may take some time...

As before, the first insertion is plaintext, while the second insertion uses the sqlar_compress() function.

Here's where things get weird: both insertions are the same raw size in the database. Whatever the flaw in this function is, I don't think it has anything to do with what you've very generously written...

It is indeed very unlikely that the problem has anything to do with the encryption feature. Nevertheless, I'm curious why there is a problem with this function and what's causing it.

I think we have uncovered an implementation bug with SQLite itself.

Maybe, but we have to prove it, if possible.

@utelle
Copy link
Owner

utelle commented May 11, 2022

Well, in the meantime I checked what's going on. Actually, we didn't find a bug in SQLite.

I think the problem arises from the somewhat misleading description of the SQLite Archive Feature:

  • The sqlar_compress(X) function attempts to compress a copy of the string or blob X using the Default algorithm and returns the result as a blob. If the input X is incompressible, then a copy of X is returned. This routine is used when inserting content into an SQLite Archive.

One can get the impression that the argument X of sqlar_compress can be a text string or a blob and will be compressed in both cases, if possible. However, this is not true. If X is not a blob, sqlar_compress will always simply return a copy of X.

Your application must take care of passing the data as a blob. That is, you will have to write the call to sqlar_compress as follows:

sqlar_compress(CAST('very long textstring' AS BLOB))

The comment in the implementation file sqlar.c makes clear that the function argument must be a blob, if any compression should take place:

  • If the type of X is SQLITE_BLOB, and compressing that blob using zlib utility function compress() yields a smaller blob, return the compressed blob. Otherwise, return a copy of X.

@Qriist
Copy link
Author

Qriist commented May 12, 2022

I think the problem arises from the somewhat misleading description of the SQLite Archive Feature

  • The sqlar_compress(X) function attempts to compress a copy of the string or blob X using the Default algorithm and returns the result as a blob.

One can get the impression that the argument X of sqlar_compress can be a text string or a blob and will be compressed in both cases, if possible. However, this is not true. If X is not a blob, sqlar_compress will always simply return a copy of X.

Yeah, that overview documentation is jank! So much time wasted due to a poor description. -.-

sqlar_compress(CAST('very long textstring' AS BLOB))

That worked!
cmd_sVBoIWOCx1

We're finally past the compression stage, lol.

...Which brings me to the next issue: sqlar_uncompress() fails on stored blobs. It does, however, return stored strings as expected. (BC = Blob Cast, current test script)
cmd_sVBoIWOCx1

This failure occurs even when the BC insert is the only row, achievable by commenting/deleting the NC+SC inserts
cmd_sVBoIWOCx1

@Qriist
Copy link
Author

Qriist commented May 12, 2022

UPDATE: ignore all that. There was an errant tab character in my test data.
brave_jJVSUZkhYa

Removing that fixes the uncompress() issue:
cmd_XoC18oV9fc

I think we've solved everything!

@utelle
Copy link
Owner

utelle commented May 12, 2022

I think we've solved everything!

Good news. I will soon prepare a release and then will close this issue.

@Qriist
Copy link
Author

Qriist commented May 12, 2022

Thank you for all of your hard work on this obscure function! I really appreciate it. ^_^

@Qriist
Copy link
Author

Qriist commented May 15, 2022

Quick status report: the sqlar features are working flawlessly! 26k+ individual row inserts comprising data pulled from api endpoints and every one was as fast (or maybe even faster!) than just dumping to disk, with individual record reads being virtually instantaneous. The compressed database comes out to around 40% of uncompressed size - a noticeable reduction!

Thanks again for taking the time to enable this - was a huge help.

PS: I figured you might find my database schema interesting, given that I'm your first use case. ;)

Everything is tracked by the fingerprint, which is a concatonated hash of the remote url and any outgoing headers the user sets. Getting useful data is as simple as SELECTing from vRecords based on any of the criteria you'd normally look for.

I'm not using the strict sqlar spec in my project because it didn't make sense to split up the two blobs I'm maintaining per row (plus there was a massive performance hit when I experimented with doing so.)

-- Table: simpleCacheTable
CREATE TABLE simpleCacheTable (
    fingerprint       TEXT    PRIMARY KEY
                              UNIQUE,
    url               TEXT,
    headers           TEXT,
    responseHeaders   BLOB,
    responseHeadersSz INTEGER,
    timestamp         INTEGER,
    expiry            INTEGER,
    mode              INTEGER,
    data              BLOB,
    dataSz            INTEGER
);


-- View: vRecords
CREATE VIEW vRecords AS
    SELECT fingerprint,
           url,
           headers,
           sqlar_uncompress(responseHeaders, responseHeadersSz) AS responseHeaders,
           timestamp,
           expiry,
           sqlar_uncompress(data, dataSz) AS data
      FROM simpleCacheTable;

@utelle
Copy link
Owner

utelle commented May 16, 2022

Quick status report: the sqlar features are working flawlessly!

I'm glad to hear that the feature now works to your expectations.

PS: I figured you might find my database schema interesting, given that I'm your first use case. ;)

Thanks for describing your use case. This will most likely help other, too, to use this feature.

Today I made a new release, Version 1.4.4, that includes the source code of the extensions COMPRESS, SQLAR, and ZIPFILE (as well as the source code of the TCL interface asked for by another developer). With this release I think this issue can be closed.

@utelle utelle closed this as completed May 16, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

No branches or pull requests

2 participants