`btree_index` in existing database #129

tgmichel · 2022-09-22T16:01:03Z

If I change the configuration for a column from default (btree_index = false) to btree_index = true and the db already existed, will it automatically index that column when I open the db?

The text was updated successfully, but these errors were encountered:

cheme · 2022-09-23T08:01:59Z

The db should not open because this is not a supported change (if you look in 'metadata' file in the db repo, all config written should be check on start and an error raised if one of these config is different).

Changing it is not really doable because when btree_index = false we do no store the keys, but just the hash of the keys.
Changing in the other direction is doable but not automatic (the migration should be fairly easy to write: read all key value and add them to new column (IIRC I did have some similar process in the admin to rewrite a whole column in a slightly different context).

cheme · 2022-09-23T08:05:41Z

Note that with btree_index = falseand for the state column, we can access all info (since we store directly hash the uniform = true flag indicate that we don't hash the key because we know keys are already 32 byte resulting from a proper hash function).
So we can iterate on all key value in this case, but the iteration need to be part of a migration (the primitives do not support iteration on a live database).

cheme · 2022-09-23T08:07:53Z

Probably changing a bit

parity-db/admin/src/lib.rs

Line 62 in 3e988a7

use parity_db::Options;

(to force dest to use btree_index = false) should work.
(but only when uniform = true and the hash function output is 32bytes)

tgmichel · 2022-09-23T08:43:33Z

Thank you @cheme. For an existing db that was created with the default btree_index = false column option, how can I iterate over the keys of a given column?

Also

Changing in the other direction is doable but not automatic

If that's the case shouldn't btree_index = true be the default Option?

cheme · 2022-09-23T08:49:26Z

btree_index = true is really not focus on performance. So it is slow. The point was initially to cover some needs of polkadot.
With time I see more use of this though, maybe at some point we should do some easy performance gain changes (for instance using fix size indexed table for nodes was way faster but also more code change). Or even propose a different index (I had some experimental radix ones).
But for this feature the initial scope was as small and simple as possible.

tgmichel · 2022-09-23T09:22:49Z

I see thank you. Just to clarify then, if I configured a db with default column options, I understand is not possible to iterate over the keys of a column using the public api?

cheme · 2022-09-23T09:36:22Z

I see thank you. Just to clarify then, if I configured a db with default column options, I understand is not possible to iterate over the keys of a column using the public api?

Yes you can't.
It should be possible to iterate on hash(key), but it is not implemented and not that usefull (this iteration is currently only doable in the migrate process). (for uniform column you iterate on 32 first byte of your key).
Actually I think it may become something user wants; keeping the keys attached to value. So it would be possible to iterate in a random order on both key and value.
Currently one can write the key with the value, but there is more efficient ways and better api to put in place if it becomes something needed.

tgmichel · 2022-09-23T09:57:23Z

Another question, when you mentioned btree_index = true is slow, I understand the performance penalty exists on write?

cheme · 2022-09-23T10:01:06Z

Write and read.
By design it will always be slower than btree_index = false, I think even if doing all possible optimization.
By choice it is not as efficient as it can be so it's way slower than rocksdb for instance.
It was really a small feature for a limited use case.

tgmichel · 2022-09-23T10:06:37Z

From a user prespective, I think having the possibility of iterating the keys of a column is a must for most data migrations and it would be a neat addition to the existing api.

You mentioned the case of uniform columns, which hold the original unhashed 32 first bytes of the key:

So we can iterate on all key value in this case, but the iteration need to be part of a migration (the primitives do not support iteration on a live database)

Could you elaborate a bit more on iteration need to be part of a migration? If I understood correctly only btree indexed column can provide an iterator.

cheme · 2022-09-23T10:18:46Z

From a user prespective, I think having the possibility of iterating the keys of a column is a must for most data migrations and it would be a neat addition to the existing api.

yes I hear this a bit (really should have trace the past conversation where the question arise), I also understand that it may be a bit awkward to not be able to retrieve the original keys.
cc @arkpar @Tpt

Could you elaborate a bit more on iteration need to be part of a migration? If I understood correctly only btree indexed column can provide an iterator.

Current migration iteration only look at the files and skip all intermediate in memory changes (commit overlay and maybe log overlay), so it requires to flush everything first (and complete running reindexing).
Implementing it to run live should be doable, but the implementation will be likely to be a bit awkward:

ordering will be random
due to different layer we would need to keep trace of some returned keys to avoid returning value twice (means may use a bit of memory sometime).
When column get reindexed, we would need to read from two files (or more) with redundant content

So that's why we currently only have it for migration.
Edit: I mean there is internal process to iterate on content for non btree column, but it is not exposed due to being very limited.

arkpar · 2022-09-29T14:17:33Z

As @cheme explained, migration from hash index to btree is not possible in general. Hash tables don't even store original keys, only their hashes. You'd need to perform the migration on the higher level. E.g. if your keys are block hashes you can enumerate them pu following parent hashes and reinsert into a new database.

tgmichel mentioned this issue Sep 23, 2022

Frontier DB block mapping one-to-many polkadot-evm/frontier#862

Merged

arkpar closed this as completed Sep 29, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

`btree_index` in existing database #129

`btree_index` in existing database #129

tgmichel commented Sep 22, 2022

cheme commented Sep 23, 2022

cheme commented Sep 23, 2022

cheme commented Sep 23, 2022 •

edited

Loading

tgmichel commented Sep 23, 2022

cheme commented Sep 23, 2022

tgmichel commented Sep 23, 2022

cheme commented Sep 23, 2022 •

edited

Loading

tgmichel commented Sep 23, 2022

cheme commented Sep 23, 2022 •

edited

Loading

tgmichel commented Sep 23, 2022

cheme commented Sep 23, 2022 •

edited

Loading

arkpar commented Sep 29, 2022

btree_index in existing database #129

btree_index in existing database #129

Comments

tgmichel commented Sep 22, 2022

cheme commented Sep 23, 2022

cheme commented Sep 23, 2022

cheme commented Sep 23, 2022 • edited Loading

tgmichel commented Sep 23, 2022

cheme commented Sep 23, 2022

tgmichel commented Sep 23, 2022

cheme commented Sep 23, 2022 • edited Loading

tgmichel commented Sep 23, 2022

cheme commented Sep 23, 2022 • edited Loading

tgmichel commented Sep 23, 2022

cheme commented Sep 23, 2022 • edited Loading

arkpar commented Sep 29, 2022

`btree_index` in existing database #129

`btree_index` in existing database #129

cheme commented Sep 23, 2022 •

edited

Loading

cheme commented Sep 23, 2022 •

edited

Loading

cheme commented Sep 23, 2022 •

edited

Loading

cheme commented Sep 23, 2022 •

edited

Loading