Implement searchv2 #3058

roman-khimov · 2024-12-18T19:26:54Z

Is your feature request related to a problem? Please describe.

I'm always frustrated when we don't have an implementation for nspcc-dev/neofs-api#314.

Describe the solution you'd like

The per-container DB should be structured like:

Given
DELIM = 0x00 // Not a valid UTF-8, can't be attribute name or value
KEY // Attribute key or special attribute that can be searched for
VALUE // Attribute value which is either a string or number, strings are used as is, numbers are converted to fixed-width (256?) BE
PREFIXA // And B/C/D, some byte prefixes
OID // Object ID

For each object the following keys are created (with no values):
PREFIXA_OID // OIDs only to list objects without filters
PREFIXB_KEY_DELIM_VALUE_DELIM_OID // The main search workhorse, created for all keys
PREFIXC_OID_KEY_DELIM_VALUE // Auxiliary for secondary filters and additional data returned

The mechanics is:

node accepting the request forwards it to other nodes as in case of original search
it accepts answers, merges them and limits to the number requested (or max)
it generates the cursor corresponding to the result returned to client (more on cursor below)
reply is ready and sent

Each node does the following:

if there are no filters we're just looping over PREFIXA in DB, the cursor is OID as is
if there are filters, the first one is magic: whatever the key/action/value is there is used to position DB cursor into PREFIXB key if match is EQUAL/NOT_EQUAL/PREFIX/NUM* (don't forget about negative numbers)
other matches can be checked for using the same key (key>N && key <M), this can shortcut the search more quickly for numerics
iterating over the DB we check all matches via PREFIXC while the first one works (when it's not, we're done)
search is limited by the number of elements requested or max (1000), so we return results earlier if we have enough elements
we add requested fields of matching elements using PREFIXC
a cursor is returned if we're not yet at the end, it's KEY_DELIM_VALUE_DELIM_OID of the last element encoded as base64 or base58, this allows to continue easily
if the first filter is NOT_PRESENT we're also looping over PREFIXA and checking PREFIXC

Describe alternatives you've considered

SQL, various other types of DBs. But the scheme above should be sufficient for our primary cases now.

Additional context

#2990, #2757, #2989, nspcc-dev/neofs-api#306

The text was updated successfully, but these errors were encountered:

roman-khimov · 2024-12-19T09:33:06Z

Caveat: creating a cursor from merged values can be non-trivial if attribute is not included into the requested list. It can be degraded to a simple OID then (complicating continuation somewhat) and in general most of use cases do need attribute values, but still.

roman-khimov · 2024-12-19T20:22:31Z

Caveat 2: numeric values might require an additional prefix anyway since we can have Index=100500 in one object and Index=abcd in another, using the same prefix they'd be mixed and we can end up treating strings as numbers.

cthulhu-rider · 2025-01-09T07:59:12Z

note: per-container DB describes virtual structure, physically it is split within existing metabases

roman-khimov · 2025-01-09T08:30:54Z

Yes, we need to limit changes to this specific feature (expose API as early as possible) and deal with associated meta code (GC and alike) in future. Search is still possible with multiple DBs since results can be merged similar to the way results from different nodes are merged.

cthulhu-rider · 2025-01-10T10:57:42Z

Attribute value which is either a string or number, strings are used as is, numbers are converted to fixed-width (256?) BE

choice is obvious for system fields. For example, owner ID is a string while payload size is an integer

for user-defined attributes it is not so obvious. Like here #3058 (comment). In current protocol, there is no way to determine whether user attribute is numeric or not. So, I rly doubt storing them in various formats is legit. But we can resolve this on search query processing. In original search, any non-integer attribute mismatches any numeric query. Do we wanna change this behaviour for SearchV2 somehow?

@roman-khimov u also mentioned some special prefix, could u pls elaborate on this thought?

roman-khimov · 2025-01-10T11:31:31Z

You can only do this content-based, just like you do this now for old search. The only difference is that the choice is made when processing the object instead of when processing the search request.

Special prefix means splitting PREFIXB into B1 and B2 for numeric and string data.

cthulhu-rider · 2025-01-13T13:54:15Z

if there are no filters we're just looping over PREFIXA in DB, the cursor is OID as is

shouldnt cursor be OID + values of requested attributes to sort/continue in PREFIXC in this case?

UPD: seems like no, missed this requirement https://github.com/nspcc-dev/neofs-api/blob/9f1f12866a4742adb7778c51bd632cd240f81262/object/service.proto#L554-L555

cthulhu-rider · 2025-02-03T09:01:48Z

i'd like to clarify primary seek in proposed algo. Consider objects:

ID1 Height:10 Weight:20
ID2 Height:10 Weight:10
ID3 Height:10 Weight:30

where ID1 < ID2 < ID3

request: FILTER Height>0 Count:1 Attributes:{Weight}

first resp: ID2 Weight:10 cursor:Height_10_ID2

on 2nd request, we position to ID2 in PREFIXB bucket. Then the cursor will go to ID3 and skip ID1, which is wrong: next resp item should be ID1 Weight:20 cursor:Height_10_ID1

this example shows that primary Seek() and Next() can go wrong. Instead, we should iterate over all KEY_DELIM_VALUE_DELIM* items. Or am i missing smth?

one more nuance

if last resp was ID1 Weight:20 cursor:Height_10_ID1, then node should skip ID2 and respond with ID3. If node stores all objects, it can restore Weight attribute of ID1 from the DB to compare other items against it. But if node does not store ID1, it'll respond with ID2 althoughts its Weight is less. For this purpose it would help to have a cursor with all requested attributes' values, not just the primary one

roman-khimov · 2025-02-03T12:16:37Z

Correct. We have two options here:

iterate whole primary attribute prefix again
relax ordering requirements for secondary attributes

Our primary use cases for now:

OID enumeration, isn't affected
block index a < x < b search, isn't affected
REST FilePath=smth and we need a timestamp
S3 FilePath=smth AND Type=smth AND maybe more AND please give me a lot of addtional attributes

Secondary attribute order does have some advantages for the REST/S3 cases. But to be fair both would benefit a bit more from the reverse order, since when we're talking about time stamps we usually need the latest and it's going to be the last. Implementing reverse result order is certainly not something we want now. We still need this to be simple and to be fast. Both REST and S3 cases are not very likely to produce a lot of results at the same time (very likely to fit into 1000 limit). So I'd opt for relaxing ordering requirements to be "primary attribute only". Easier to implement, will work good enough for current users. If we're to find other use cases we can think of (even more advanced) ordering again.

cthulhu-rider · 2025-02-03T12:21:55Z

So I'd opt for relaxing ordering requirements to be "primary attribute only". Easier to implement

full agree, lets start with this

There is a need to serve `ObjectService.SearchV2` RPC by the SN. In order not to expand the structure and configuration of the node, the best place to store metadata is metabase. Metabases are extended with per-container object metadata buckets. For each object, following indexes are created: - OID; - attribute->OID; - OID->attribute. Integers are stored specifically to reach lexicographic comparisons without decoding. New `Search` method is provided: it allows to filter out container's objects and receive specified attributes. Count is also limited, op is paged via cursor. In other words, the method follows SearchV2 behavior within single metabase. Refs #3058.

There is a need to serve `ObjectService.SearchV2` RPC by the SN. In order not to expand the structure and configuration of the node, the best place to store metadata is metabase. Metabases are extended with per-container object metadata buckets. For each object, following indexes are created: - OID; - attribute->OID; - OID->attribute. Integers are stored specifically to reach lexicographic comparisons without decoding. New `Search` method is provided: it allows to filter out container's objects and receive specified attributes. Count is also limited, op is paged via cursor. In other words, the method follows SearchV2 behavior within single metabase. Refs #3058. Signed-off-by: Leonard Lyubich <leonard@morphbits.io>

Future use-cases: - merge results from several shard's metabases; - merge results from several SNs. Refs #3058. Signed-off-by: Leonard Lyubich <leonard@morphbits.io>

There is a need to serve `ObjectService.SearchV2` RPC by the SN. In order not to expand the structure and configuration of the node, the best place to store metadata is metabase. Metabases are extended with per-container object metadata buckets. For each object, following indexes are created: - OID; - attribute->OID; - OID->attribute. Integers are stored specifically to reach lexicographic comparisons without decoding. New `Search` method is provided: it allows to filter out container's objects and receive specified attributes. Count is also limited, op is paged via cursor. In other words, the method follows SearchV2 behavior within single metabase. Refs #3058. Signed-off-by: Leonard Lyubich <leonard@morphbits.io>

Future use-cases: - merge results from several shard's metabases; - merge results from several SNs. Refs #3058. Signed-off-by: Leonard Lyubich <leonard@morphbits.io>

WIP Refs #3058. Signed-off-by: Leonard Lyubich <leonard@morphbits.io>

There is a need to serve `ObjectService.SearchV2` RPC by the SN. In order not to expand the structure and configuration of the node, the best place to store metadata is metabase. Metabases are extended with per-container object metadata buckets. For each object, following indexes are created: - OID; - attribute->OID; - OID->attribute. Integers are stored specifically to reach lexicographic comparisons without decoding. New `Search` method is provided: it allows to filter out container's objects and receive specified attributes. Count is also limited, op is paged via cursor. In other words, the method follows SearchV2 behavior within single metabase. Refs #3058. Signed-off-by: Leonard Lyubich <leonard@morphbits.io>

Test cursors deeper. Many tests fail revealing implementation bugs, mainly ordering. Fixes coming soon. This deduplicates a lot of code also. Refs #3160. Refs #3058. Signed-off-by: Leonard Lyubich <leonard@morphbits.io>