
[new feature] introduce Coprocessor #8559

Open
liguozhong opened this issue Feb 20, 2023 · 5 comments

@liguozhong (Contributor) commented Feb 20, 2023

Is your feature request related to a problem? Please describe.

Related issue: High cardinality labels
`{log_type="service_metrics"} |= "ee74f4ee-3059-4473-8ba6-94d8bfe03272"`

We analyzed the sources of our LogQL queries: 85% of the Grafana Explore log queries are traceID lookups.
A trace typically spans about 10 minutes (trace time = start~end), but because users do not know a traceID's start and end time, they usually search over 7 days of logs. In effect, "7d - 10m" of that time range is wasted, invalid search.

So we would like to introduce an auxiliary capability to eliminate this "7d - 10m" invalid search.

We have found that in the database field, such features are already implemented very maturely.

Our team has tried implementing a preQuery Coprocessor, with great success: this feature solved the problem of "Loki traceID search is very slow".

The design borrows from Google's BigTable coprocessors and HBase coprocessors.
HBase coprocessor introduction: https://blogs.apache.org/hbase/entry/coprocessor_introduction
"The idea of HBase Coprocessors was inspired by Google's BigTable coprocessors. Jeff Dean gave a talk at LADIS '09 (http://www.scribd.com/doc/21631448/Dean-Keynote-Ladis2009, pages 66-67)."

HBase Coprocessor 
The RegionObserver interface provides callbacks for:

preOpen, postOpen: Called before and after the region is reported as online to the master.
preFlush, postFlush: Called before and after the memstore is flushed into a new store file.
preGet, postGet: Called before and after a client makes a Get request.
preExists, postExists: Called before and after the client tests for existence using a Get.
prePut and postPut: Called before and after the client stores a value.
preDelete and postDelete: Called before and after the client deletes a value.
etc.

Describe the solution you'd like

Loki Coprocessor 
The QuerierObserver interface provides callbacks for:

`preQuery`: Called before the querier executes a query. The three parameters (logql, start, end) are passed to the Coprocessor,
which decides whether the querier actually needs to execute this query.

For example, for a traceID search with a 7d query range and `split_queries_by_interval: 2h`,
the LogQL query is actually split into 84 sub-requests; 83 of them are useless,
and only one 2h sub-request can find the logs of that traceID.
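
To make the idea concrete, here is a minimal Go sketch of what such a hook could look like; the `QuerierObserver` interface, the `QueryHint` type, and the method signature are illustrative assumptions, not an existing Loki API:

```go
package coprocessor

import (
	"context"
	"time"
)

// QueryHint is the coprocessor's verdict for one (sub-)query: either the
// querier should run it, or it can be skipped because it cannot match.
type QueryHint int

const (
	HintExecute QueryHint = iota // run the sub-query as usual
	HintSkip                     // the sub-query cannot contain the data, return empty
)

// QuerierObserver is a hypothetical pre-query hook. The query frontend would
// call PreQuery once per split sub-request before dispatching it to queriers.
type QuerierObserver interface {
	PreQuery(ctx context.Context, logql string, start, end time.Time) (QueryHint, error)
}
```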
We have tried two types of Coprocessors for this scenario.

traceID Coprocessor 1, simple text analysis:
If the traceID comes from X-Ray or OpenTelemetry ("Change default trace-id format to be
similar to AWS X-Ray (use timestamp) #1947"), the traceID itself contains a timestamp.
Given a configured maximum trace duration, the Coprocessor can combine that timestamp
with the LogQL start and end to quickly decide whether a sub-query can match at all.
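
As a rough illustration, assuming X-Ray-style trace IDs whose second segment is the hex-encoded Unix start time, and reusing the hypothetical `QuerierObserver`/`QueryHint` names from the sketch above (all names are illustrative):

```go
package coprocessor

import (
	"context"
	"regexp"
	"strconv"
	"time"
)

// traceIDPattern matches X-Ray-style trace IDs inside the LogQL line filter,
// e.g. |= "1-5f84c7a1-a006649127e371903a2de979"; the captured group is the
// trace start time as hex-encoded Unix seconds.
var traceIDPattern = regexp.MustCompile(`1-([0-9a-fA-F]{8})-[0-9a-fA-F]{24}`)

// TimestampCoprocessor skips sub-queries whose [start, end] window cannot
// overlap the lifetime of the trace derived from the embedded timestamp.
type TimestampCoprocessor struct {
	MaxTraceDuration time.Duration // longest duration a single trace may span, e.g. 10m
}

func (c TimestampCoprocessor) PreQuery(_ context.Context, logql string, start, end time.Time) (QueryHint, error) {
	m := traceIDPattern.FindStringSubmatch(logql)
	if m == nil {
		return HintExecute, nil // not a trace ID query, run it normally
	}
	epoch, err := strconv.ParseUint(m[1], 16, 64)
	if err != nil {
		return HintExecute, nil // be conservative on parse failure
	}
	traceStart := time.Unix(int64(epoch), 0)
	traceEnd := traceStart.Add(c.MaxTraceDuration)

	// A sub-query window that does not overlap [traceStart, traceEnd]
	// cannot contain logs for this trace, so it can be skipped.
	if end.Before(traceStart) || start.After(traceEnd) {
		return HintSkip, nil
	}
	return HintExecute, nil
}
```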

traceID Coprocessor 2, based on a tracing system:
If the trace is stored in a tracing system, the Coprocessor can query that system once for the traceID
and decide whether each LogQL sub-query is necessary by comparing the time distribution
in the returned result with the LogQL start and end time.
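
A sketch of this variant, assuming a generic tracing-backend client (the `TraceStore` interface and field names are made up for illustration, again reusing the hypothetical types above):

```go
package coprocessor

import (
	"context"
	"time"
)

// TraceStore is a hypothetical client for an external tracing backend
// (Jaeger, Tempo, ...) that reports the observed time span of a trace.
type TraceStore interface {
	TraceTimeRange(ctx context.Context, traceID string) (start, end time.Time, found bool, err error)
}

// TracingBackendCoprocessor asks the tracing system where the trace actually
// lives in time, then skips LogQL sub-queries outside that window.
type TracingBackendCoprocessor struct {
	Store        TraceStore
	ExtractID    func(logql string) (string, bool) // pulls the trace ID out of the line filter
	SafetyMargin time.Duration                     // pad the window to tolerate clock skew
}

func (c TracingBackendCoprocessor) PreQuery(ctx context.Context, logql string, start, end time.Time) (QueryHint, error) {
	traceID, ok := c.ExtractID(logql)
	if !ok {
		return HintExecute, nil
	}
	tStart, tEnd, found, err := c.Store.TraceTimeRange(ctx, traceID)
	if err != nil || !found {
		// On lookup failure, fall back to executing the query rather than
		// risking a false negative.
		return HintExecute, nil
	}
	tStart, tEnd = tStart.Add(-c.SafetyMargin), tEnd.Add(c.SafetyMargin)
	if end.Before(tStart) || start.After(tEnd) {
		return HintSkip, nil
	}
	return HintExecute, nil
}
```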

`preGetChunk`: ... do something.
`preGetIndex`: ... do something.
etc.

The IngesterObserver interface provides callbacks for:

`preFlush`, `postFlush`: ... do something.
etc.

Describe alternatives you've considered
Over the past six months, to address slow traceID search, we tried introducing a KV system, an inverted-index text system, and bloom filters to speed up LogQL responses, but the machine cost was too high and we eventually gave up.

Additional context
"traceID Coprocessor 1 simple text analysis" makes loki return traceID search results in 0.9s.

@jeschkies (Contributor)

Nice. I think this calls for a LID though. Would you mind creating a pull request and adding the information from this issue in the template?

@liguozhong (Contributor, Author)

Ok, I'll try to write a LID.

@liguozhong (Contributor, Author)

> Nice. I think this calls for a LID though. Would you mind creating a pull request and adding the information from this issue in the template?

LID: #8616
I tried to write a LID.

@valyala commented Jul 13, 2023

FYI, VictoriaLogs uses bloom filters to speed up queries over high-cardinality phrases inside log messages, such as "traceID=ee74f4ee-3059-4473-8ba6-94d8bfe03272". See these docs for details. Loki could use a similar technique.
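
(For illustration only: a toy Go sketch of that idea, a per-chunk bloom filter over tokens that lets a querier skip chunks which definitely do not contain a given trace ID. The sizes and hashing below are arbitrary and are not how VictoriaLogs or Loki implement it.)

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// chunkBloom is a toy per-chunk bloom filter over tokens seen in that chunk.
type chunkBloom struct {
	bits []uint64 // bit set
	k    int      // number of hash functions
}

func newChunkBloom(sizeBits, k int) *chunkBloom {
	return &chunkBloom{bits: make([]uint64, (sizeBits+63)/64), k: k}
}

// positions derives k bit positions via double hashing of two FNV variants.
func (b *chunkBloom) positions(token string) []uint {
	h1 := fnv.New64a()
	h1.Write([]byte(token))
	h2 := fnv.New64()
	h2.Write([]byte(token))
	a, c := h1.Sum64(), h2.Sum64()|1
	n := uint64(len(b.bits) * 64)
	pos := make([]uint, b.k)
	for i := 0; i < b.k; i++ {
		pos[i] = uint((a + uint64(i)*c) % n)
	}
	return pos
}

func (b *chunkBloom) Add(token string) {
	for _, p := range b.positions(token) {
		b.bits[p/64] |= 1 << (p % 64)
	}
}

// MayContain is false only when the token is definitely not in the chunk,
// so a negative lets the querier skip fetching and scanning that chunk.
func (b *chunkBloom) MayContain(token string) bool {
	for _, p := range b.positions(token) {
		if b.bits[p/64]&(1<<(p%64)) == 0 {
			return false
		}
	}
	return true
}

func main() {
	bf := newChunkBloom(1<<16, 4)
	bf.Add("ee74f4ee-3059-4473-8ba6-94d8bfe03272")
	fmt.Println(bf.MayContain("ee74f4ee-3059-4473-8ba6-94d8bfe03272")) // true
	fmt.Println(bf.MayContain("00000000-0000-0000-0000-000000000000")) // false (with high probability)
}
```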

@sandstrom

For reference, here is another issue discussing high-cardinality labels: #91
