Handle a default/request pipeline and a final pipeline with minimal additional overhead #93329
Conversation
It used to make sense for this to live in the ingest service, because we avoided allocating an ingest document if the pipeline was empty. Now we already have the document regardless, so this can just live in IngestDocument anyway. In any case, this would be a rare and unusual thing to have happen at all. I don't want to drop the logic completely, but I'm also not worried about the performance implications of where it lives.
Pinging @elastic/es-data-management (Team:Data Management)
// shortcut if the pipeline is empty
if (pipeline.getProcessors().isEmpty()) {
    handler.accept(this, null);
    return;
}
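For context, this is roughly how that shortcut sits at the top of `IngestDocument.executePipeline` -- a sketch in which only the quoted lines above come from the diff, and the rest of the method shape is assumed:

```java
// Sketch for context -- only the empty-pipeline shortcut is quoted from the
// diff; the surrounding method shape is an assumption for illustration.
public void executePipeline(Pipeline pipeline, BiConsumer<IngestDocument, Exception> handler) {
    // shortcut if the pipeline is empty
    if (pipeline.getProcessors().isEmpty()) {
        handler.accept(this, null);
        return;
    }
    // otherwise hand the document to the pipeline, which reports the resulting
    // document (null if it was dropped) and any failure back via the handler
    pipeline.execute(this, handler);
}
```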
Is this going to potentially confuse people who are looking at ingest metrics? I would think it would be a pretty rare case -- how much does this optimization save us? Is it worth the potential future "why are my metrics wrong?" support tickets?
What would be wrong about the metrics?
Also note that this is the same logic as before; it's just in a slightly different place. (See c2fbb08 for the deets.)
Hmm I don't know what would be wrong about the metrics -- I thought I had traced where this would impact them last week, but now I have no idea what I was seeing.
No worries, it's certainly a valid question to ask and we are indeed in a maze of twisty passages all alike. 😄
Looks good to me!
onFinished.onResponse(null);
// update the index request's source and (potentially) cache the timestamp for TSDB
updateIndexRequestSource(indexRequest, ingestDocument);
cacheRawTimestamp(indexRequest, ingestDocument);
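For readers without the file open, a rough sketch of what those two helpers might do; the names come from the diff, but these bodies are assumptions rather than the actual implementation:

```java
// Assumed helper shapes -- illustrative only, not quoted from the PR.
private static void updateIndexRequestSource(IndexRequest indexRequest, IngestDocument ingestDocument) {
    // write the (possibly mutated) ingest source back onto the index request
    indexRequest.source(ingestDocument.getSource(), indexRequest.getContentType());
}

private static void cacheRawTimestamp(IndexRequest indexRequest, IngestDocument ingestDocument) {
    // for time series (TSDB) indices, stash the @timestamp value on the
    // request so it doesn't need to be re-parsed from the source later
    Object timestamp = ingestDocument.getSource().get("@timestamp");
    if (timestamp != null) {
        indexRequest.setRawTimestamp(timestamp);
    }
}
```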
👍 nice work
Closes #81244
Closes #92843
Closes #93118
Tightens up the document handling aspects of `executePipelines` and its callees. `innerExecute` becomes trivial and nearly drops out (renamed to `executePipeline`, where it remains just to adapt handler shapes).

At a high level, the execution goes from the old call sequence to the new one; the before and after listings from the original description are not reproduced here.
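In their place, here is a hedged sketch of the "after" shape the description implies: `executePipelines` walks the default/request pipeline(s) and then the final pipeline by recursing through an iterator, while `executePipeline` only adapts handler shapes. The signatures and control flow below are assumptions for illustration, not the PR's actual code:

```java
// Illustrative sketch -- the method names come from the PR description, but
// the signatures and bodies here are assumed.
private void executePipelines(
    Iterator<Pipeline> pipelines, // default/request pipeline(s) first, final pipeline last
    IndexRequest indexRequest,
    IngestDocument ingestDocument,
    ActionListener<Void> onFinished
) {
    if (pipelines.hasNext() == false) {
        // no pipelines left: write the document back to the request and finish
        updateIndexRequestSource(indexRequest, ingestDocument);
        cacheRawTimestamp(indexRequest, ingestDocument);
        onFinished.onResponse(null);
        return;
    }
    // executePipeline (née innerExecute) just adapts the handler shape
    ingestDocument.executePipeline(pipelines.next(), (result, e) -> {
        if (e != null) {
            onFinished.onFailure(e);
        } else if (result == null) {
            onFinished.onResponse(null); // the document was dropped
        } else {
            executePipelines(pipelines, indexRequest, result, onFinished);
        }
    });
}
```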
The difference in the flame graph is pretty clear (the before and after flame graphs are likewise not reproduced here).
And the performance is much better, as one would expect: the total time spent in any ingest code for the nightly security benchmark drops from 4,994,128 ms to 3,568,490 ms -- a decrease of 29%.
This is the direct follow-up to #93213, but I've been working up to it for a while -- #93119 and #93120 added the tests that this PR now makes pass, while #92203, #92308, and #92455 laid some of the groundwork for the eventual document listener cleanup.