
[new feature] introduce Coprocessor #8559

Open
liguozhong opened this issue Feb 20, 2023 · 5 comments

@liguozhong (Contributor) commented Feb 20, 2023

Is your feature request related to a problem? Please describe.

Related issue: High cardinality labels
`{log_type="service_metrics"} |= "ee74f4ee-3059-4473-8ba6-94d8bfe03272"`

We analyzed the sources of our LogQL queries: 85% of the Grafana Explore log queries are traceID lookups.
A trace typically spans about 10 minutes (trace time = start~end), but because users do not know a traceID's start and end time, they usually search over 7 days of logs. In effect, "7d - 10m" of that time range is wasted, invalid search.

So we would like to introduce an auxiliary capability to eliminate this "7d - 10m" invalid search.

We have found that in the database field, such features are already implemented very maturely.

Our team has tried implementing a preQuery Coprocessor, with great success: this feature solved the problem of "Loki traceID search is very slow".

The design borrows from Google's BigTable coprocessors and HBase coprocessors.
HBase coprocessor introduction: https://blogs.apache.org/hbase/entry/coprocessor_introduction
"The idea of HBase Coprocessors was inspired by Google's BigTable coprocessors. Jeff Dean gave a talk at LADIS '09 (http://www.scribd.com/doc/21631448/Dean-Keynote-Ladis2009, pages 66-67)."

HBase Coprocessor 
The RegionObserver interface provides callbacks for:

preOpen, postOpen: Called before and after the region is reported as online to the master.
preFlush, postFlush: Called before and after the memstore is flushed into a new store file.
preGet, postGet: Called before and after a client makes a Get request.
preExists, postExists: Called before and after the client tests for existence using a Get.
prePut and postPut: Called before and after the client stores a value.
preDelete and postDelete: Called before and after the client deletes a value.
etc.

Describe the solution you'd like

Loki Coprocessor 
The QuerierObserver interface provides callbacks for:

`preQuery`: Called before the querier executes a query. The three parameters (logql, start, end) are passed to the Coprocessor,
which decides whether the querier actually needs to execute this query.

For example, for a traceID search with a 7d query range and `split_queries_by_interval: 2h`,
the LogQL query is actually split into 84 sub-requests; 83 of them are useless,
and only one 2h sub-request can find the logs of that traceID.
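
To make the idea concrete, here is a minimal Go sketch of what such a hook could look like; the `QuerierObserver` interface, the `QueryHint` type, and the method signature are illustrative assumptions, not an existing Loki API:

```go
package coprocessor

import (
	"context"
	"time"
)

// QueryHint is the coprocessor's verdict for one (sub-)query: either the
// querier should run it, or it can be skipped because it cannot match.
type QueryHint int

const (
	HintExecute QueryHint = iota // run the sub-query as usual
	HintSkip                     // the sub-query cannot contain the data, return empty
)

// QuerierObserver is a hypothetical pre-query hook. The query frontend would
// call PreQuery once per split sub-request before dispatching it to queriers.
type QuerierObserver interface {
	PreQuery(ctx context.Context, logql string, start, end time.Time) (QueryHint, error)
}
```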
We have tried two types of Coprocessors for this scenario.

traceID Coprocessor 1, simple text analysis:
If the traceID comes from X-Ray or OpenTelemetry ("Change default trace-id format to be
similar to AWS X-Ray (use timestamp) #1947"), the traceID itself contains a timestamp.
Given a configured maximum trace duration, the Coprocessor can combine that timestamp
with the LogQL start and end to quickly decide whether a sub-query can match at all.
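
As a rough illustration, assuming X-Ray-style trace IDs whose second segment is the hex-encoded Unix start time, and reusing the hypothetical `QuerierObserver`/`QueryHint` names from the sketch above (all names are illustrative):

```go
package coprocessor

import (
	"context"
	"regexp"
	"strconv"
	"time"
)

// traceIDPattern matches X-Ray-style trace IDs inside the LogQL line filter,
// e.g. |= "1-5f84c7a1-a006649127e371903a2de979"; the captured group is the
// trace start time as hex-encoded Unix seconds.
var traceIDPattern = regexp.MustCompile(`1-([0-9a-fA-F]{8})-[0-9a-fA-F]{24}`)

// TimestampCoprocessor skips sub-queries whose [start, end] window cannot
// overlap the lifetime of the trace derived from the embedded timestamp.
type TimestampCoprocessor struct {
	MaxTraceDuration time.Duration // longest duration a single trace may span, e.g. 10m
}

func (c TimestampCoprocessor) PreQuery(_ context.Context, logql string, start, end time.Time) (QueryHint, error) {
	m := traceIDPattern.FindStringSubmatch(logql)
	if m == nil {
		return HintExecute, nil // not a trace ID query, run it normally
	}
	epoch, err := strconv.ParseUint(m[1], 16, 64)
	if err != nil {
		return HintExecute, nil // be conservative on parse failure
	}
	traceStart := time.Unix(int64(epoch), 0)
	traceEnd := traceStart.Add(c.MaxTraceDuration)

	// A sub-query window that does not overlap [traceStart, traceEnd]
	// cannot contain logs for this trace, so it can be skipped.
	if end.Before(traceStart) || start.After(traceEnd) {
		return HintSkip, nil
	}
	return HintExecute, nil
}
```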

traceID Coprocessor 2, based on a tracing system:
If the trace is stored in a tracing system, the Coprocessor can query that system once for the traceID
and decide whether each LogQL sub-query is necessary by comparing the time distribution
in the returned result with the LogQL start and end time.
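
A sketch of this variant, assuming a generic tracing-backend client (the `TraceStore` interface and field names are made up for illustration, again reusing the hypothetical types above):

```go
package coprocessor

import (
	"context"
	"time"
)

// TraceStore is a hypothetical client for an external tracing backend
// (Jaeger, Tempo, ...) that reports the observed time span of a trace.
type TraceStore interface {
	TraceTimeRange(ctx context.Context, traceID string) (start, end time.Time, found bool, err error)
}

// TracingBackendCoprocessor asks the tracing system where the trace actually
// lives in time, then skips LogQL sub-queries outside that window.
type TracingBackendCoprocessor struct {
	Store        TraceStore
	ExtractID    func(logql string) (string, bool) // pulls the trace ID out of the line filter
	SafetyMargin time.Duration                     // pad the window to tolerate clock skew
}

func (c TracingBackendCoprocessor) PreQuery(ctx context.Context, logql string, start, end time.Time) (QueryHint, error) {
	traceID, ok := c.ExtractID(logql)
	if !ok {
		return HintExecute, nil
	}
	tStart, tEnd, found, err := c.Store.TraceTimeRange(ctx, traceID)
	if err != nil || !found {
		// On lookup failure, fall back to executing the query rather than
		// risking a false negative.
		return HintExecute, nil
	}
	tStart, tEnd = tStart.Add(-c.SafetyMargin), tEnd.Add(c.SafetyMargin)
	if end.Before(tStart) || start.After(tEnd) {
		return HintSkip, nil
	}
	return HintExecute, nil
}
```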

`preGetChunk`: ... do something.
`preGetIndex`: ... do something.
etc.

The IngesterObserver interface provides callbacks for:

`preFlush`, `postFlush`: ... do something.
etc.

Describe alternatives you've considered
Over the past six months, to address slow traceID search, we tried introducing a KV system, an inverted-index text system, and bloom filters to speed up LogQL responses, but the machine cost was too high and we eventually gave up.

Additional context
"traceID Coprocessor 1 simple text analysis" makes loki return traceID search results in 0.9s.

@jeschkies (Contributor)

Nice. I think this calls for a LID though. Would you mind creating a pull request and adding the information from this issue in the template?

@liguozhong (Contributor, Author)

Ok, I'll try to write a LID.

@liguozhong (Contributor, Author)

> Nice. I think this calls for a LID though. Would you mind creating a pull request and adding the information from this issue in the template?

LID: #8616
I tried to write a LID.

@valyala commented Jul 13, 2023

FYI, VictoriaLogs uses bloom filters to speed up queries over high-cardinality phrases inside log messages, such as "traceID=ee74f4ee-3059-4473-8ba6-94d8bfe03272". See these docs for details. Loki could use a similar technique.
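
(For illustration only: a toy Go sketch of that idea, a per-chunk bloom filter over tokens that lets a querier skip chunks which definitely do not contain a given trace ID. The sizes and hashing below are arbitrary and are not how VictoriaLogs or Loki implement it.)

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// chunkBloom is a toy per-chunk bloom filter over tokens seen in that chunk.
type chunkBloom struct {
	bits []uint64 // bit set
	k    int      // number of hash functions
}

func newChunkBloom(sizeBits, k int) *chunkBloom {
	return &chunkBloom{bits: make([]uint64, (sizeBits+63)/64), k: k}
}

// positions derives k bit positions via double hashing of two FNV variants.
func (b *chunkBloom) positions(token string) []uint {
	h1 := fnv.New64a()
	h1.Write([]byte(token))
	h2 := fnv.New64()
	h2.Write([]byte(token))
	a, c := h1.Sum64(), h2.Sum64()|1
	n := uint64(len(b.bits) * 64)
	pos := make([]uint, b.k)
	for i := 0; i < b.k; i++ {
		pos[i] = uint((a + uint64(i)*c) % n)
	}
	return pos
}

func (b *chunkBloom) Add(token string) {
	for _, p := range b.positions(token) {
		b.bits[p/64] |= 1 << (p % 64)
	}
}

// MayContain is false only when the token is definitely not in the chunk,
// so a negative lets the querier skip fetching and scanning that chunk.
func (b *chunkBloom) MayContain(token string) bool {
	for _, p := range b.positions(token) {
		if b.bits[p/64]&(1<<(p%64)) == 0 {
			return false
		}
	}
	return true
}

func main() {
	bf := newChunkBloom(1<<16, 4)
	bf.Add("ee74f4ee-3059-4473-8ba6-94d8bfe03272")
	fmt.Println(bf.MayContain("ee74f4ee-3059-4473-8ba6-94d8bfe03272")) // true
	fmt.Println(bf.MayContain("00000000-0000-0000-0000-000000000000")) // false (with high probability)
}
```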

@sandstrom

For reference, here is another issue discussing high-cardinality labels: #91
