add json parsing support to starlark #8168

ssoroka · 2020-09-22T22:04:47Z

eg

# Example of parsing json out of a field and modifying the metric with it.
# this is great to use in conjunction with the value parser.
#
# Example Input:
# json value="{\"label\": \"hero\", \"count\": 14}" 1465839830100400201
#
# Example Output:
# json,label=hero count=14i 1465839830100400201

load("json", "encode", "decode", "indent")

 def apply(metric):
    j = decode(metric.fields.get('value'))
    metric.fields.pop('value')
    metric.tags["label"] = j.get("label")
    metric.fields["count"] = j.get("count")
    return metric

reimda · 2020-09-22T22:33:08Z

plugins/processors/starlark/starlark.go

+
+func loadFunc(thread *starlark.Thread, module string) (starlark.StringDict, error) {
+	switch module {
+	case "json":


Suggested change

case "json":

case "starlarkjson":

This json library has a different api than the python json library. The import name here should be something other than json to avoid confusion between the two. "starlarkjson" makes sense to me since that's what it's named in its own source tree.

The import name is part of telegraf's starlark api. Whatever we choose, we'll have to stick with it to preserve script compatibility.

https://docs.python.org/3/library/json.html

interesting suggestion. Starlark is not python, and it doesn't seem we will ever have a json library other than this one. Is it a good plan to avoid all python package names? or should we expect users to adapt because they can't do anything interesting in starlark that they can do in python (no network, disk, imports, database...)?

I don't like prefixing everything with starlark (and you will end up with this ;-)) just because there is a python library with the same name. First of all I would expect people to know that they are programming starklark! That's the plugin name! Second, you will always find a python library you forgot. Imaging we add a starlark library called "foo" and at this time there is no such library in python. Soon after release someone decides to write a python library "foo" and now you have a wierd mix where you have to remember which libs to prefix and which not.

Just my 2 cent...

I understand starlark is not exactly python, but it's closely based on python and feels familiar to people who have used python. It's definitely a python derivative language and it's appealing to us and to our users almost solely because it's similar to python, not because it's different.

Most if not all of our starlark users have python experience. As soon as someone starts writing a telegraf starlark script they are confronted with the differences. Rather than hide the differences and make our users discover them on their own, one after another (which is a frustrating user experience) I think we should be clear about the things that are different.

I'm not recommending we use the Batman style of naming things (batmobile, bat signal, batarang, bat shark repellent, etc.). That would be absurd. I only suggested "starlarkjson" because that's what its author named it. I don't personally like how long the name is or that it refers to the language.

I think our approach to starlark in telegraf in general should be 1) do what the starlark project does, then 2) be compatible with python or 3) if we add something not compatible with python, avoid looking like python (this means module names, especially those in the python standard library).

After a closer look at the starlark project I found that they use "json.star" as the extension name they load it in code as module "json". It also returns starlarkjson.Module, not starlarkjson.Module.Members. This leaves the functions in the module namespace so in code it would be json.decode() instead of decode() like we have in this PR.

https://github.com/google/starlark-go/blob/6e684ef5eeee284749e13eae206cb4dd899b3472/starlark/eval_test.go#L193
https://github.com/google/starlark-go/blob/6e684ef5eeee284749e13eae206cb4dd899b3472/starlark/testdata/json.star#L5

This approach addresses my concern of confusion with the python json module and seems to be what the starlark project intended. What do you two think about doing it this way?

I join the discussion, if you don't mind. I agree with @reimda that it is worth doing what starlark is aiming for.
This will make life easier for users who will clearly see what they are working with.

If you do not point out the differences, many may have issues. Of course, there will always be those who read the processor source code and see which library is being used, but most will be confused.

Or it is worth explicitly highlighting in the README that a specific library is used that differs from python - with a link to its documentation.

To be honest that's exactly my point. All starlark code I found uses json not starlarkjson, so people will be confused if they can just apply the examples they g**gled or if they have to do something different!
But anyway, I'm happy to see the module support... :-)

The second paragraph in the readme spells it out.

The Starlark language is a dialect of Python, and will be familiar to those who have experience with the Python language. However, there are major differences.

If you haven't bothered to read the Starlark plugin README, or follow one of the examples that already shows you how to use it, I don't think naming is going to save you. I prefer the shorter naming, especially if it's in common use as srebhan pointed out. Given I expect 99% of people to copy and modify the examples, I doubt this matters too much.

Here's what I've switched to:

load("starlarkjson", "json") json.decode("{}")

Sounds good. Could we switch "starlarkjson" to "json.star"? That would make it just like the examples in the starlark repo. Sorry to recommend "starlarkjson" initially.

ssoroka added 4 commits September 22, 2020 17:56

add json parsing support to starlark

da6f21d

add more description

ebdda88

go mod tidy

5cb4f0b

use shorter syntax

504336e

ssoroka requested a review from reimda September 22, 2020 22:17

reimda suggested changes Sep 22, 2020

View reviewed changes

sjwang90 added the area/starlark label Sep 25, 2020

change up the naming

7b8ca1a

ssoroka requested a review from reimda September 29, 2020 17:51

use json.star for module name

775c0e9

reimda approved these changes Sep 29, 2020

View reviewed changes

ssoroka merged commit a475d76 into master Sep 29, 2020

ssoroka deleted the starlark-json branch September 29, 2020 21:37

arstercz pushed a commit to arstercz/telegraf that referenced this pull request Mar 5, 2023

add json parsing support to starlark (influxdata#8168)

d8d4014

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

add json parsing support to starlark #8168

add json parsing support to starlark #8168

ssoroka commented Sep 22, 2020 •

edited

Loading

reimda Sep 22, 2020

ssoroka Sep 23, 2020

srebhan Sep 23, 2020

reimda Sep 23, 2020

M0rdecay Sep 23, 2020

srebhan Sep 24, 2020

ssoroka Sep 29, 2020

reimda Sep 29, 2020

add json parsing support to starlark #8168

add json parsing support to starlark #8168

Conversation

ssoroka commented Sep 22, 2020 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

ssoroka commented Sep 22, 2020 •

edited

Loading