Text parser optimization (~4.5x perf) #282

mfpierre · 2018-06-07T13:10:51Z

Hi,

As we use this library extensively to parse the Prometheus text format, we have observed that text parsing can be quite slow (up to 27 seconds) for big payloads, such as the output of https://github.com/kubernetes/kube-state-metrics on a big cluster, which can contain 400k+ lines.

We tried to optimize the parser by dropping the state machine and leveraging native Python string methods such as index and find, which gives us roughly a 5x speedup on average.
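As a rough illustration of that approach (not the code from this PR), the hypothetical `split_sample` helper below splits one sample line with `str.find`, `str.rindex` and `str.split` instead of walking it character by character through a state machine. It is a sketch under stated assumptions: it leaves the label string unparsed, does not unescape anything, and only does minimal validation.

```python
def split_sample(line):
    """Split 'name{labels} value [timestamp]' into its parts.

    Hypothetical helper: returns (name, raw_label_string, value, timestamp),
    leaving the label string unparsed and any escaping untouched.
    """
    brace = line.find("{")
    if brace != -1:
        # The LAST '}' closes the label set, since a quoted label value
        # may itself contain '}'.
        close = line.rindex("}")
        name = line[:brace].strip()
        labels = line[brace + 1:close]
        rest = line[close + 1:]
    else:
        # No labels: the name ends at the first whitespace.
        parts = line.split(None, 1)
        if not parts:
            raise ValueError("Empty sample line")
        name = parts[0]
        rest = parts[1] if len(parts) > 1 else ""
        labels = ""
    # split() with no argument collapses any mix of spaces and tabs.
    fields = rest.split()
    if len(fields) not in (1, 2):
        raise ValueError("Invalid sample: %r" % line)
    value = float(fields[0])
    timestamp = int(fields[1]) if len(fields) == 2 else None
    return name, labels, value, timestamp
```

For example, `split_sample('a{b="}"}\t 2 1000')` returns `('a', 'b="}"', 2.0, 1000)`: the `}` inside the quoted value does not end the label set because only the last `}` is used.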

Here are some benchmarks using timeit:

call (x100000): _parse_sample('simple_metric 1.513767429e+09')
previous implementation: 1.10845804214
new implementation: 0.277444839478
improvement: x3.99523755508

call (x100000): _parse_sample('kube_service_labels{label_app="kube-state-metrics",label_chart="kube-state-metrics-0.5.0",label_heritage="Tiller",label_release="ungaged-panther",namespace="default",service="ungaged-panther-kube-state-metrics"} 1')
previous implementation: 7.58089208603
new implementation: 1.48280119896
improvement: x5.11254785291
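Numbers in this shape can be produced with the standard `timeit` module. The sketch below is only a harness: `parse_old` and `parse_new` are hypothetical stand-ins for whichever two implementations are being compared, not the real parser functions.

```python
import timeit


def parse_old(text):
    # Hypothetical stand-in: character-by-character scan for the name.
    name = []
    for char in text:
        if char.isspace():
            break
        name.append(char)
    return "".join(name)


def parse_new(text):
    # Hypothetical stand-in: one native string method call.
    return text.split(None, 1)[0]


def compare(sample, number=100000):
    """Time both functions on the same input; return (old, new, speedup)."""
    old = timeit.timeit(lambda: parse_old(sample), number=number)
    new = timeit.timeit(lambda: parse_new(sample), number=number)
    return old, new, old / new


if __name__ == "__main__":
    old, new, speedup = compare("simple_metric 1.513767429e+09")
    print("previous implementation:", old)
    print("new implementation:", new)
    print("improvement: x%.2f" % speedup)
```

`timeit.timeit` returns the total time in seconds for all `number` calls, so figures like the ones above are presumably totals for 100,000 calls each rather than per-call times.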

For the KSM (kube-state-metrics) payload (400k lines), parsing time goes from ~27s to ~4.7s.

Note: we could get close to a 10x speedup if we dropped the handling of some edge cases (escaping, tab/space handling, etc.). Could we consider a "strict" parsing mode that we could optionally use for "good citizens"?

brian-brazil · 2018-06-07T13:15:40Z

could we consider a "strict" parsing mode that we could optionally use for "good citizens"?

Those aspects of the text format are not optional; they must be implemented to have a correct parser.

brian-brazil

Performance improvements would be great; however, the format can be represented in many ways, and this code only parses a subset of the potentially valid input. If you can manage to make it handle all of that, that'd be great.

brian-brazil · 2018-06-07T13:16:07Z

prometheus_client/core.py

@@ -181,6 +181,10 @@ def __eq__(self, other):
                self.type == other.type and
                self.samples == other.samples)

+    def __repr__(self):


A __str__ would make more sense I think

Renamed it to __str__ and made __repr__ call it, because that is useful when comparing unit test output:

E First differing element 0:
E <Metric name: a, documentation: help, type: counter, samples: [(u'a', {u'foo': u'bar'}, 1), (u'a', {u'foo': u'baez'}, 2), (u'a', {u'foo': u'buz'}, 3)]>
E <Metric name: a, documentation: help, type: counter, samples: [(u'a', {u'foo': u'bar'}, 1.0), (u'a', {u'foo': u'baz'}, 2.0), (u'a', {u'foo': u'buz'}, 3.0)]>
E
E - [<Metric name: a, documentation: help, type: counter, samples: [(u'a', {u'foo': u'bar'}, 1), (u'a', {u'foo': u'baez'}, 2), (u'a', {u'foo': u'buz'}, 3)]>]
E ? -
E
E + [<Metric name: a, documentation: help, type: counter, samples: [(u'a', {u'foo': u'bar'}, 1.0), (u'a', {u'foo': u'baz'}, 2.0), (u'a', {u'foo': u'buz'}, 3.0)]>]
E ?

instead of

E First differing element 0:
E <prometheus_client.core.CounterMetricFamily object at 0x108018c50>
E <prometheus_client.core.Metric object at 0x108018cd0>
E
E - [<prometheus_client.core.CounterMetricFamily object at 0x108018c50>]
E ? ------- ------ ^
E
E + [<prometheus_client.core.Metric object at 0x108018cd0>]

Usually __str__ calls __repr__ rather than the other way around. __repr__ should usually be an instantiable representation of the object, while __str__ is the more human-readable form.
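A minimal sketch of that convention, using a hypothetical simplified `Metric` class (not the library's actual class or its real `__repr__`): `__repr__` returns text that looks like the constructor call, and no `__str__` is defined, so `str()` falls back to `__repr__`, which is Python's default behaviour for `object.__str__`.

```python
class Metric(object):
    """Simplified, hypothetical stand-in for a metric family."""

    def __init__(self, name, documentation, typ, samples=None):
        self.name = name
        self.documentation = documentation
        self.type = typ
        self.samples = list(samples or [])

    def __repr__(self):
        # Looks like the constructor call that would rebuild an equal object.
        return "Metric(%r, %r, %r, %r)" % (
            self.name, self.documentation, self.type, self.samples)

    def __eq__(self, other):
        return (isinstance(other, Metric) and
                self.name == other.name and
                self.documentation == other.documentation and
                self.type == other.type and
                self.samples == other.samples)

    # No __str__: object.__str__ falls back to __repr__.


m = Metric("a", "help", "counter", [("a", {"foo": "bar"}, 1.0)])
print(repr(m))  # Metric('a', 'help', 'counter', [('a', {'foo': 'bar'}, 1.0)])
```

Because the repr is a valid expression, `eval(repr(m)) == m` holds here, and a failing equality assertion in a test shows both objects' full contents instead of memory addresses.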

Modified __repr__ to fit that convention; it is still much more readable in test output 👍

brian-brazil · 2018-06-07T13:16:45Z

prometheus_client/parser.py

-                slash = True
-            else:
-                result.append(char)
+def _replace_escaping(s):


Help and label values have different escaping rules (the double quote is the difference), so you need two functions for this.
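One way to keep the two rule sets apart is a single-pass, regex-based unescape per context. The function names below are hypothetical and this is not the code added in this PR. The rules follow the text format: HELP text escapes only backslash and line feed, while label values also escape the double quote. Scanning once from left to right avoids chained `str.replace` calls, which mis-handle an escaped backslash followed by `n` whichever order the replacements run in. Leaving unknown escape sequences untouched is an assumption of this sketch.

```python
import re

# HELP text: only backslash and line feed are escaped.
_HELP_ESCAPE = re.compile(r'\\([\\n])')
# Label values: backslash, double quote and line feed are escaped.
_LABEL_ESCAPE = re.compile(r'\\([\\n"])')


def _unescape(match):
    # '\n' becomes a line feed; '\\' and '\"' become the character itself.
    char = match.group(1)
    return "\n" if char == "n" else char


def unescape_help(text):
    return _HELP_ESCAPE.sub(_unescape, text)


def unescape_label_value(text):
    return _LABEL_ESCAPE.sub(_unescape, text)
```

With these, `unescape_label_value(r'say \"hi\"')` gives `say "hi"`, while `unescape_help` leaves `\"` as-is because quotes are not escaped in HELP text.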

Added another function specifically for HELP text.

brian-brazil · 2018-06-07T13:17:13Z

prometheus_client/parser.py

-    return ''.join(result)
+def _parse_labels(labels_string):
+    labels = {}
+    # return if we don't have valid labels


Please keep comments as full sentences.

brian-brazil · 2018-06-07T13:17:53Z

prometheus_client/parser.py

+
+    # we don't have labels
+    except ValueError:
+        # detect what separator is used


Any mix of any number of spaces and tabs is permitted
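In Python, `str.split()` called without a separator already treats any run of spaces and tabs as a single separator and ignores leading and trailing whitespace, whereas `split(" ")` does not. A small illustration; `split_fields` and the sample line are made up:

```python
def split_fields(rest):
    """Split the part of a sample line after the name/labels into fields."""
    # With no argument, split() collapses any mix of spaces and tabs and
    # ignores leading/trailing whitespace.
    return rest.split()


line = "metric_name \t  42\t1528377051000"
print(line.split(" "))     # ['metric_name', '\t', '', '42\t1528377051000']
print(split_fields(line))  # ['metric_name', '42', '1528377051000']
```

One caveat: `split()` also splits on other whitespace such as `\r`, `\v` or `\f`, which a stricter parser might want to reject.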

Added some additional unit tests around this 👍

brian-brazil · 2018-06-07T13:19:23Z

prometheus_client/parser.py

+        label_start, label_end = text.index("{"), text.rindex("}")
+        # the name is before the labels
+        name = text[:label_start].strip()
+        # we ignore the starting curly brace


There can be whitespace after the brace, and basically anywhere else between tokens.

This should be covered; I added one more test case in test_spaces.

brian-brazil · 2018-06-07T13:20:06Z

prometheus_client/parser.py

-    state = 'name'
+    # detect the labels in the text
+    try:
+        label_start, label_end = text.index("{"), text.rindex("}")


A label value could contain a }

This is already taken into account by the use of rindex; I added a test case to validate it.

brian-brazil · 2018-06-07T13:20:33Z

prometheus_client/parser.py

+        name = text[:label_start].strip()
+        # we ignore the starting curly brace
+        label = text[label_start + 1:label_end]
+        # the value is after the label end (ignoring curly brace and space)


There can be a trailing comma after the last "

This should already be covered; test_commas validates it.
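For comparison, here is a regex-based sketch that accepts whitespace around names, `=` and values, an optional trailing comma, and `}` or escaped quotes inside quoted values. This is a different technique from the `index`-based loop in this PR and is likely slower; `LABEL_RE` and `parse_labels` are hypothetical names, the name pattern follows the documented `[a-zA-Z_][a-zA-Z0-9_]*` rule, and values are returned still escaped.

```python
import re

# One label: optional whitespace, a name, '=', a quoted value that may
# contain escaped characters, then either a comma or the end of the string.
LABEL_RE = re.compile(
    r'\s*([a-zA-Z_][a-zA-Z0-9_]*)\s*=\s*"((?:[^"\\]|\\.)*)"\s*(?:,|$)')


def parse_labels(labels_string):
    """Parse the text between '{' and '}' into a dict of raw (escaped) values."""
    labels = {}
    pos = 0
    while pos < len(labels_string):
        match = LABEL_RE.match(labels_string, pos)
        if match is None:
            # Only whitespace may remain, e.g. after a trailing comma.
            if labels_string[pos:].strip():
                raise ValueError("Invalid labels: %r" % labels_string)
            break
        labels[match.group(1)] = match.group(2)
        pos = match.end()
    return labels
```

Input such as `a="1" b="2"` (no comma between labels) raises `ValueError`, while `a="1",` and `x=""` parse normally.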

Signed-off-by: Pierre Margueritte <mfpierre@gmail.com>

brian-brazil · 2018-06-07T15:55:55Z

prometheus_client/parser.py

+            i = 0
+            while i < len(value_substr):
+                i = value_substr.index('"', i)
+                if value_substr[i - 1] != "\\":


What if you have x="" as the label? i - 1 will be -1, which might have unexpected results.

Added a unit test for empty labels; this works fine 👍
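On the `i - 1` question: Python wraps negative indices, so `value_substr[-1]` reads the last character of the string instead of raising, and the check only misfires if that character happens to be a backslash. A check on the single preceding character also cannot tell `\"` (escaped quote) from `\\"` (escaped backslash, then closing quote). The hypothetical `find_closing_quote` below is a sketch, not the merged code: it keeps the `str.find` style but counts the run of backslashes before each quote, so it never indexes before `start`.

```python
def find_closing_quote(s, start=0):
    """Return the index of the first unescaped '"' in s at or after start.

    Hypothetical helper: s is the text just after an opening quote.
    """
    i = s.find('"', start)
    while i != -1:
        # Count the backslashes immediately before this quote.
        j = i
        while j > start and s[j - 1] == "\\":
            j -= 1
        if (i - j) % 2 == 0:
            # An even number of backslashes means they escape each other,
            # so this quote is not escaped.
            return i
        i = s.find('"', i + 1)
    raise ValueError("Unterminated label value: %r" % s)
```

For an empty value such as `x=""`, the text after the opening quote starts with `"`, so the function returns 0 without reading any earlier character.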

Signed-off-by: Pierre <mfpierre@gmail.com>

brian-brazil · 2018-06-08T07:49:01Z

Thanks!

mfpierre · 2018-07-09T15:29:52Z

Hey @brian-brazil, any plans to do a release soon? 🙇

brian-brazil · 2018-07-09T15:34:20Z

I've added it to my todo list

brian-brazil reviewed Jun 7, 2018


mfpierre force-pushed the JulienBalestra/parser branch 5 times, most recently from 41edcb8 to a25d444 June 7, 2018 14:54

Optimize parsing & ease debug/testing of the Metric class

1d7190c

Signed-off-by: Pierre Margueritte <mfpierre@gmail.com>

mfpierre force-pushed the JulienBalestra/parser branch from a25d444 to 1d7190c June 7, 2018 15:00

brian-brazil reviewed Jun 7, 2018


Fix repr and add empty label test

f378b1a

Signed-off-by: Pierre <mfpierre@gmail.com>

mfpierre changed the title ~~Text parser optimization~~ Text parser optimization (~5x perf) Jun 7, 2018

mfpierre changed the title ~~Text parser optimization (~5x perf)~~ Text parser optimization (~4.5x perf) Jun 7, 2018

brian-brazil merged commit dc15164 into prometheus:master Jun 8, 2018

CharlyF mentioned this pull request Jun 22, 2018

k8s w/ ksm integration issues DataDog/datadog-agent#1853

Closed

JulienBalestra deleted the JulienBalestra/parser branch July 9, 2018 15:47

mfpierre mentioned this pull request Jul 11, 2018

Bump prometheus client library to 0.3.0 DataDog/integrations-core#1866

Merged


This was referenced Jul 18, 2018

Fix unescaping implementation in parser #289

Closed

Fix unescaping take 2 #291

Merged

This was referenced May 6, 2019

Openmetrics text parser performance #401

Closed

optimize openmetrics text parsing (~4x perf) #402

Merged
