Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

add json parsing support to starlark #8168

Merged
merged 6 commits into from
Sep 29, 2020
Merged

add json parsing support to starlark #8168

merged 6 commits into from
Sep 29, 2020

Conversation

ssoroka
Copy link
Contributor

@ssoroka ssoroka commented Sep 22, 2020

eg

# Example of parsing json out of a field and modifying the metric with it.
# this is great to use in conjunction with the value parser.
#
# Example Input:
# json value="{\"label\": \"hero\", \"count\": 14}" 1465839830100400201
#
# Example Output:
# json,label=hero count=14i 1465839830100400201

load("json", "encode", "decode", "indent")

 def apply(metric):
    j = decode(metric.fields.get('value'))
    metric.fields.pop('value')
    metric.tags["label"] = j.get("label")
    metric.fields["count"] = j.get("count")
    return metric

@ssoroka ssoroka requested a review from reimda September 22, 2020 22:17

func loadFunc(thread *starlark.Thread, module string) (starlark.StringDict, error) {
switch module {
case "json":
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
case "json":
case "starlarkjson":

This json library has a different api than the python json library. The import name here should be something other than json to avoid confusion between the two. "starlarkjson" makes sense to me since that's what it's named in its own source tree.

The import name is part of telegraf's starlark api. Whatever we choose, we'll have to stick with it to preserve script compatibility.

https://docs.python.org/3/library/json.html

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

interesting suggestion. Starlark is not python, and it doesn't seem we will ever have a json library other than this one. Is it a good plan to avoid all python package names? or should we expect users to adapt because they can't do anything interesting in starlark that they can do in python (no network, disk, imports, database...)?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't like prefixing everything with starlark (and you will end up with this ;-)) just because there is a python library with the same name. First of all I would expect people to know that they are programming starklark! That's the plugin name! Second, you will always find a python library you forgot. Imaging we add a starlark library called "foo" and at this time there is no such library in python. Soon after release someone decides to write a python library "foo" and now you have a wierd mix where you have to remember which libs to prefix and which not.

Just my 2 cent...

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I understand starlark is not exactly python, but it's closely based on python and feels familiar to people who have used python. It's definitely a python derivative language and it's appealing to us and to our users almost solely because it's similar to python, not because it's different.

Most if not all of our starlark users have python experience. As soon as someone starts writing a telegraf starlark script they are confronted with the differences. Rather than hide the differences and make our users discover them on their own, one after another (which is a frustrating user experience) I think we should be clear about the things that are different.

I'm not recommending we use the Batman style of naming things (batmobile, bat signal, batarang, bat shark repellent, etc.). That would be absurd. I only suggested "starlarkjson" because that's what its author named it. I don't personally like how long the name is or that it refers to the language.

I think our approach to starlark in telegraf in general should be 1) do what the starlark project does, then 2) be compatible with python or 3) if we add something not compatible with python, avoid looking like python (this means module names, especially those in the python standard library).

After a closer look at the starlark project I found that they use "json.star" as the extension name they load it in code as module "json". It also returns starlarkjson.Module, not starlarkjson.Module.Members. This leaves the functions in the module namespace so in code it would be json.decode() instead of decode() like we have in this PR.

https://github.com/google/starlark-go/blob/6e684ef5eeee284749e13eae206cb4dd899b3472/starlark/eval_test.go#L193
https://github.com/google/starlark-go/blob/6e684ef5eeee284749e13eae206cb4dd899b3472/starlark/testdata/json.star#L5

This approach addresses my concern of confusion with the python json module and seems to be what the starlark project intended. What do you two think about doing it this way?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I join the discussion, if you don't mind. I agree with @reimda that it is worth doing what starlark is aiming for.
This will make life easier for users who will clearly see what they are working with.

If you do not point out the differences, many may have issues. Of course, there will always be those who read the processor source code and see which library is being used, but most will be confused.

Or it is worth explicitly highlighting in the README that a specific library is used that differs from python - with a link to its documentation.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To be honest that's exactly my point. All starlark code I found uses json not starlarkjson, so people will be confused if they can just apply the examples they g**gled or if they have to do something different!
But anyway, I'm happy to see the module support... :-)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The second paragraph in the readme spells it out.

The Starlark language is a dialect of Python, and will be familiar to those who have experience with the Python language. However, there are major differences.

If you haven't bothered to read the Starlark plugin README, or follow one of the examples that already shows you how to use it, I don't think naming is going to save you. I prefer the shorter naming, especially if it's in common use as srebhan pointed out. Given I expect 99% of people to copy and modify the examples, I doubt this matters too much.

Here's what I've switched to:

load("starlarkjson", "json")
json.decode("{}")

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good. Could we switch "starlarkjson" to "json.star"? That would make it just like the examples in the starlark repo. Sorry to recommend "starlarkjson" initially.

@ssoroka ssoroka requested a review from reimda September 29, 2020 17:51
@ssoroka ssoroka merged commit a475d76 into master Sep 29, 2020
@ssoroka ssoroka deleted the starlark-json branch September 29, 2020 21:37
arstercz pushed a commit to arstercz/telegraf that referenced this pull request Mar 5, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants