Save performance metrics #137

jfontan · 2018-03-25T20:29:41Z

It would be good to have saved performance metrics that we can use to understand the changes. As mentioned in #36 these metrics can be saved in Session or in Context.

Some of the data we can save is:

Time spent executing nodes
Memory consumption before and after node execution
Cache hit/miss
Child nodes to pinpoint slower paths

The proposal is adding a Performance interface to base session or context that can be used to save this data. It's done as an interface so we can have several implementations, some can get more info than others or disable collecting data in case it's too expensive.

type Performance interface {
	// Called once per query to reset the counters
	Reset()

	// Marks the start of one node execution, returns an id as there can be several
	// nodes running at the same time
	Start(nodeName, arguments string) (id int)

	// Marks the start of one node execution
	End(id int)

	// This can be used to save other values like cache hit and miss rates
	ExtraData(id int, name string, value int)

	// Gets the collected data in some structured format like json
	GetData() string
}

Then all the nodes have to be instrumented to save this data:

func (ib *IsBinary) Eval(
	ctx *sql.Context,
	row sql.Row,
) (interface{}, error) {
	// Start collecting data
	perID := ctx.Performance.Start("IsBinary", ib.String())
	defer ctx.Performance.End(perID)

	v, err := ib.Child.Eval(ctx, row)

	[...]

	blob, err := sql.Blob.Convert(v)
	if err != nil {
		return nil, err
	}

	// Add extra information that may be useful
	ctx.Performance.ExtraData(perID, "blob size", blob.Size())

	return isBinary(blob.([]byte)), nil
}

Extra tooling should be created to read and analyze this data. It would also be interesting to automate running tests with old and new code (for example master and PR) and comparing the data to find regressions.

Thoughts? @ajnavarro, @erizocosmico, @mcarmonaa

The text was updated successfully, but these errors were encountered:

ajnavarro · 2018-03-26T10:52:12Z

I like the idea and it fits on the gitbase backlog. Just a couple of changes:

The name of the interface should be something like Metrics
The first implementation of that interface should be using open metrics format, on that case using https://github.com/jaegertracing/jaeger
Check deeply the interface methods comparing them with the jaeger API and be sure that we`ll able to use all the functionality that we want to use from jaeger.

erizocosmico · 2018-03-26T10:55:37Z

Instead of jaeger, we could use opencensus (https://github.com/census-instrumentation/opencensus-go), which has exporters for jaeger and many more, if we want to avoid vendor lock-in.

OpenCensus already provides most of the functionality that's described in that interface, so we would not need to reinvent the wheel.

ajnavarro · 2018-03-26T11:19:19Z

@smola , what do you think about use opencensus instead of jaeger? for me is ok.

smola · 2018-03-26T13:54:03Z

Note that jaeger provides OpenTracing API, and if you limit yourself to that, it's not vendor-locking. On the other hand, OpenCensus has its own standard and implementations only for Golang and Java (so far). Which might be good for us. It also provides wider APIs (e.g. metrics).

Whether we go with OpenTracing or OpenCensus, we can choose Jaeger/Zipkin/Other as server.

erizocosmico · 2018-03-27T12:50:14Z

Which one did we ended up choosing?

ajnavarro · 2018-03-27T13:37:37Z

Jaeger because is compatible with OpenTracin API

Fixes src-d#137 This commit adds tracing of spans of execution in all nodes and expressions. Since expressions and functions evaluation is immediate, spans are finished inside the `Eval` function with a defer. In the case of plan nodes it's a bit more complex, as they just return an iterator that will be lazily evaluated. Measuring the span as just the time it takes to return the iterator would be wrong, so what it does is wrap all iterators with a `spanIter`, which iterates over an internal iterator and finishes the span when it's needed (there is an error, io.EOF is returned or `Close` is called) pushing some metrics like the number of processed rows or the error without needing the user to provide them. Signed-off-by: Miguel Molina <miguel@erizocosmi.co>

jfontan added the proposal proposal for new additions or changes label Mar 25, 2018

erizocosmico self-assigned this Mar 27, 2018

This was referenced Mar 27, 2018

*: implement span tracing in sql.Context #138

Merged

sql/analyzer: propagate context throughout the analyzer #139

Merged

erizocosmico mentioned this issue Mar 28, 2018

sql/(plan,expression,parse,analyzer): trace node execution #140

Merged

erizocosmico added enhancement New feature or request performance Performance improvements labels Mar 28, 2018

erizocosmico closed this as completed in #140 Apr 4, 2018

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Save performance metrics #137

Save performance metrics #137

jfontan commented Mar 25, 2018

ajnavarro commented Mar 26, 2018 •

edited

Loading

erizocosmico commented Mar 26, 2018 •

edited

Loading

ajnavarro commented Mar 26, 2018

smola commented Mar 26, 2018

erizocosmico commented Mar 27, 2018

ajnavarro commented Mar 27, 2018

Save performance metrics #137

Save performance metrics #137

Comments

jfontan commented Mar 25, 2018

ajnavarro commented Mar 26, 2018 • edited Loading

erizocosmico commented Mar 26, 2018 • edited Loading

ajnavarro commented Mar 26, 2018

smola commented Mar 26, 2018

erizocosmico commented Mar 27, 2018

ajnavarro commented Mar 27, 2018

ajnavarro commented Mar 26, 2018 •

edited

Loading

erizocosmico commented Mar 26, 2018 •

edited

Loading