Update json_v2 parser to allow excluding/including specific array elements #9381

Closed
sspaink opened this issue Jun 16, 2021 · 15 comments · Fixed by #9449
Assignees
Labels
area/json json and json_v2 parser/serialiser related bug unexpected problem or unintended behavior docs Issues related to Telegraf documentation and configuration descriptions

Comments

@sspaink
Contributor

sspaink commented Jun 16, 2021

The json_v2 parser doesn't allow you to exclude or include an element inside an array. The parser should be updated to support index numbers in the JSON keys so that users can select individual array items. At the moment, only an entire array can be excluded or included.

e.g.

The JSON key: etd_0_estimate_0_minutes

Should select: 24

From the JSON:

{
    "root": {
        "station": [
            {
                "etd": [
                    {
                        "estimate": [
                            {
                                "minutes": "24",
                                "platform": "2",
                                "direction": "North",
                                "length": "10",
                                "color": "YELLOW",
                                "hexcolor": "#ffff33",
                                "bikeflag": "1",
                                "delay": "0"
                            }
                        ]
                    }
                ]
            }
        ]
    }
}
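
To make the request concrete, here is a minimal Python sketch of what selecting by index would mean. The helper `select` is hypothetical and handles only dot-separated keys and integer indices, a small subset of GJSON-style paths; it is not Telegraf's or GJSON's implementation. The sketch uses a dotted path rather than the underscore-joined key above, and the JSON is trimmed to two keys.

```python
import json

def select(doc, path):
    """Walk a dot-separated path; integer segments index into lists.

    Hypothetical helper for illustration only (not Telegraf or GJSON code).
    """
    node = doc
    for segment in path.split("."):
        if isinstance(node, list):
            node = node[int(segment)]  # numeric segment selects one array element
        else:
            node = node[segment]       # string segment selects an object key
    return node

doc = json.loads("""
{"root": {"station": [{"etd": [{"estimate": [
    {"minutes": "24", "platform": "2"}
]}]}]}}
""")

minutes = select(doc, "root.station.0.etd.0.estimate.0.minutes")
print(minutes)
```

Each numeric segment picks exactly one array element, which is the behavior the issue asks for.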
@sspaink added the feat and area/json_v2 labels Jun 16, 2021
@barbaranelson

Without being able to select an element within the array, you can only get the last element of the array. The array elements all have the same timestamp, the same tags, and the same field name, so InfluxDB overwrites the point each time it receives a new entry with the same timestamp, tags, and field name.
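
The overwrite can be sketched in Python, assuming (as the comment describes) that a point is identified by its measurement, tag set, and timestamp. The `write` function and the sample values are made up for illustration; this is a toy model, not InfluxDB code.

```python
# Toy model of a store where measurement + tags + timestamp identify a point.
store = {}

def write(measurement, tags, fields, timestamp):
    key = (measurement, tuple(sorted(tags.items())), timestamp)
    # Fields written under an existing key replace earlier values of the same name.
    store.setdefault(key, {}).update(fields)

ts = 1625276770000000000
for minutes in ["24", "39", "54"]:  # hypothetical array elements
    write("http", {"from_station": "COLM"}, {"minutes": minutes}, ts)

print(len(store))               # number of distinct points kept
print(list(store.values())[0])  # fields of the surviving point
```

All three writes share one key, so only the value written last remains.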

@sspaink added the bug label and removed the feat label Jun 16, 2021
@barbaranelson

barbaranelson commented Jun 16, 2021

Actually, I was able to make this work. Here's the telegraf configuration I ended up with, and it only selects the first entry in the array for each station:

[[inputs.http]]
  urls = [
    "http://api.bart.gov/api/etd.aspx?cmd=etd&orig=COLM&key=MW9S-E7SL-26DU-VV8V&dir=n&json=y",
    "http://api.bart.gov/api/etd.aspx?cmd=etd&orig=POWL&key=MW9S-E7SL-26DU-VV8V&dir=s&json=y"
  ]

  data_format = "json_v2"
  tagexclude = ["host", "url"]
  [[inputs.http.json_v2]]
    [[inputs.http.json_v2.field]]
      path = "root.station.#.etd.0.estimate.0.minutes"
    [[inputs.http.json_v2.tag]]
      path = "root.station.#.abbr"
      rename = "from_station"
    [[inputs.http.json_v2.tag]]
      path = "root.station.#.etd.#.abbreviation"
      rename = "to_station"
    [[inputs.http.json_v2.tag]]
      path = "root.station.#.etd.0.estimate.0.direction"

@barbaranelson

Actually, on further testing, this doesn't completely work. You can select the first entry in the array, but when you independently select root.station.#.etd.0.estimate.0.minutes for the minutes and root.station.#.abbr for the station, you get inconsistent data. I assume this is because the # is expanded independently for the two paths, so you get every combination of station name and minutes to depart, rather than the station name that corresponds to each minutes-to-depart value.
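
If that assumption is right, the effect would look like the following Python sketch. It only contrasts two ways of combining values and says nothing about how the parser is actually written; the station data is illustrative.

```python
from itertools import product

# Illustrative sample: each station carries its own first-departure minutes.
stations = [
    {"abbr": "COLM", "minutes": "6"},
    {"abbr": "POWL", "minutes": "16"},
]

# Each path expanded on its own yields two unrelated lists.
abbrs = [s["abbr"] for s in stations]
minutes = [s["minutes"] for s in stations]

# Combining the lists independently gives every pairing, including wrong ones.
independent = list(product(abbrs, minutes))

# Expanding once per station keeps each value with its own station.
per_station = [(s["abbr"], s["minutes"]) for s in stations]

print(independent)
print(per_station)
```

The independent expansion contains pairings such as COLM with POWL's minutes, which matches the inconsistent data described above.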

@akrantz01 self-assigned this Jun 21, 2021
@sjwang90 added the docs label Jun 22, 2021
@ghost

ghost commented Jun 29, 2021

Hey, I'm having the same problem.
I'm trying to parse a JSON array: data = [8 2 9 10 11]

Telegraf conf file:

[[inputs.mqtt_consumer.json_v2.field]]
  path = "data"
  rename = "data"
  type = "int"

Following the GJSON syntax, I tried path = "data", but the only number I get is the last one in the array.

It's possible to get the length of the array with path = "data.#", or single elements with path = "data.0" or path = "data.1", for example, but not the whole array. Thanks for the work!

@sspaink
Contributor Author

sspaink commented Jun 29, 2021

@danberg13 Thank you for trying out the new parser. If you want to gather all elements in the data array, then the path "data" should be correct and should not return only the last one in the array.

Here is an example from something I ran locally:

Input json:

{
    "data": [
        1,
        2,
        3
    ]
}

Telegraf config:

[[inputs.file]]
    files = ["./testdata/fields_and_tags/input.json"]
    data_format = "json_v2"
    [[inputs.file.json_v2]]
        [[inputs.file.json_v2.field]]
            path = "data"

Expected output:

file test=1 
file test=2
file test=3

Hopefully I understood your problem correctly? I also tried it in the gjson playground https://gjson.dev/ which you can use to try out the path syntax.

@ghost

ghost commented Jul 1, 2021

Thank you @sspaink for testing my issue.

Input:

{
    "cnt": 23,
    "data": [
        3,
        7,
        10,
        23
    ],
    "format": 0
}

Telegraf config:

[[inputs.mqtt_consumer]]
    data_format = "json_v2"
    [[inputs.mqtt_consumer.json_v2]]
        [[inputs.mqtt_consumer.json_v2.object]]
            path = "@this"
            disable_prepend_keys = true
            [inputs.mqtt_consumer.json_v2.object.fields]
                cnt = "int"
                data = "int"
                format = "int"

I'm testing the new JSON parser in combination with InfluxDB and Grafana. How do you test it locally? Maybe that is the difference, or there is another problem with the configuration, because I still get just the last number of the data input. Any ideas?

The GJSON Path seems to be correct.

Thanks for the help!

@sspaink
Contributor Author

sspaink commented Jul 2, 2021

@barbaranelson I believe I've implemented a change that addresses the issue you're facing. If you'd like, you can find the Telegraf artifacts with the changes in the link posted by the tiger bot here. I've updated the README.md in the change and added a unit test that uses BART data; you can find the config, expected output, and input JSON as reference here.

With this change you can now define multiple field and tag tables within an object table. This lets you gather multiple fields and tags using the GJSON path syntax while adhering to the JSON structure. The field and tag tables you define outside of the object table are still there and can be used to apply global fields and tags to the line protocol. Hopefully that makes sense! Just let me know if you have any trouble or if you'd like another explanation.

This is experimental and depends on another pull request I made to the open source GJSON library, but I think it will help make the json_v2 parser even better.

@sspaink
Contributor Author

sspaink commented Jul 2, 2021

@danberg13 Thank you for providing your config. I think the reason you're only seeing the last number is how InfluxDB handles duplicate points. Within the same measurement, it uses the tag set and timestamp to distinguish between points, and in your example there is no tag, so only the last one is kept. To avoid this, I think you could use the values in the data array as a tag; then each line of line protocol will be unique.

I am testing this parser locally with the unit tests associated with the parser, where I can add a config that uses the file input plugin, an input JSON, and the expected line protocol output. While working on this issue I added your config and JSON to an experimental change I am working on, and you can find the files here. If you'd like to try it out, you can find the artifacts with the changes posted by the Telegraf tiger bot here. But I don't think you need this change to get it working with the json_v2 parser available in 1.19.0. If you change your config to look like this, it should start working:

[[inputs.mqtt_consumer]]
data_format = "json_v2"
    [[inputs.mqtt_consumer.json_v2]]
        [[inputs.mqtt_consumer.json_v2.object]]
            path = "@this"
            disable_prepend_keys = true
            tags = ["data"]
        [inputs.mqtt_consumer.json_v2.object.fields]
            cnt  = "int"
            format  = "int"
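
A rough Python sketch of why the tag helps, assuming one metric is emitted per element of `data` and that integer fields carry an `i` suffix in line protocol. The exact lines Telegraf emits for this config are not confirmed in this thread, and `to_line` is a hypothetical helper.

```python
payload = {"cnt": 23, "data": [3, 7, 10, 23], "format": 0}

def to_line(measurement, tags, fields):
    """Format one metric as InfluxDB line protocol without a timestamp."""
    tag_part = ",".join(f"{k}={v}" for k, v in sorted(tags.items()))
    field_part = ",".join(f"{k}={v}i" for k, v in sorted(fields.items()))
    return f"{measurement},{tag_part} {field_part}"

# Assumption: one metric per array element, with the element used as a tag.
lines = [
    to_line("mqtt_consumer", {"data": value},
            {"cnt": payload["cnt"], "format": payload["format"]})
    for value in payload["data"]
]

# The series key (measurement plus tag set) now differs for every element.
series_keys = {line.split(" ")[0] for line in lines}

for line in lines:
    print(line)
```

Because every element produces a different series key, points with the same timestamp no longer replace each other.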

@barbaranelson

barbaranelson commented Jul 2, 2021

I tried the new build, and unfortunately it crashed. Here's the stack trace:

2021-07-02T23:35:29Z I! Starting Telegraf 
2021-07-02T23:35:29Z I! Loaded inputs: http
2021-07-02T23:35:29Z I! Loaded aggregators: 
2021-07-02T23:35:29Z I! Loaded processors: starlark
2021-07-02T23:35:29Z I! Loaded outputs: file influxdb_v2
2021-07-02T23:35:29Z I! Tags enabled: host=ip-192-168-1-68.ec2.internal
2021-07-02T23:35:29Z I! [agent] Config: Interval:10s, Quiet:false, Hostname:"ip-192-168-1-68.ec2.internal", Flush Interval:10s
2021-07-02T23:35:29Z D! [agent] Initializing plugins
2021-07-02T23:35:29Z D! [agent] Connecting outputs
2021-07-02T23:35:29Z D! [agent] Attempting connection to [outputs.influxdb_v2]
2021-07-02T23:35:29Z D! [agent] Successfully connected to outputs.influxdb_v2
2021-07-02T23:35:29Z D! [agent] Attempting connection to [outputs.file]
2021-07-02T23:35:29Z D! [agent] Successfully connected to outputs.file
2021-07-02T23:35:29Z D! [agent] Starting service inputs
2021-07-02T23:35:39Z D! [outputs.file] Wrote batch of 2 metrics in 3.579061ms
2021-07-02T23:35:39Z D! [outputs.file] Buffer fullness: 0 / 10000 metrics
2021-07-02T23:35:40Z D! [outputs.influxdb_v2] Wrote batch of 2 metrics in 1.095654397s
2021-07-02T23:35:40Z D! [outputs.influxdb_v2] Buffer fullness: 2 / 10000 metrics
2021-07-02T23:35:49Z D! [outputs.file] Wrote batch of 2 metrics in 127.328µs
2021-07-02T23:35:49Z D! [outputs.file] Buffer fullness: 0 / 10000 metrics
2021-07-02T23:35:49Z D! [outputs.influxdb_v2] Wrote batch of 2 metrics in 173.396479ms
2021-07-02T23:35:49Z D! [outputs.influxdb_v2] Buffer fullness: 0 / 10000 metrics
panic: runtime error: slice bounds out of range [:1561] with capacity 1280

goroutine 121 [running]:
github.com/influxdata/telegraf/plugins/parsers/json_v2.(*Parser).checkIfIncludedCollection(0xc0000f2f20, 0xd3, 0xc000e11200, 0x59f, 0x58)
	/go/src/github.com/influxdata/telegraf/plugins/parsers/json_v2/parser.go:410 +0x230
github.com/influxdata/telegraf/plugins/parsers/json_v2.(*Parser).expandArray(0xc0000f2f20, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, ...)
	/go/src/github.com/influxdata/telegraf/plugins/parsers/json_v2/parser.go:268 +0x57e
github.com/influxdata/telegraf/plugins/parsers/json_v2.(*Parser).processObjects(0xc0000f2f20, 0xc00029d930, 0x1, 0x1, 0x601, 0x0, 0x0, 0x0, 0x0)
	/go/src/github.com/influxdata/telegraf/plugins/parsers/json_v2/parser.go:464 +0x698
github.com/influxdata/telegraf/plugins/parsers/json_v2.(*Parser).Parse(0xc0000f2f20, 0xc0010f9100, 0x681, 0x700, 0x700, 0x0, 0x0, 0xc001248480, 0xc000a4c500)
	/go/src/github.com/influxdata/telegraf/plugins/parsers/json_v2/parser.go:128 +0x4da
github.com/influxdata/telegraf/plugins/inputs/http.(*HTTP).gatherURL(0xc000503a00, 0x6035f58, 0xc000b0c1c0, 0xc0009edd41, 0x57, 0x0, 0x0)
	/go/src/github.com/influxdata/telegraf/plugins/inputs/http/http.go:223 +0x595
github.com/influxdata/telegraf/plugins/inputs/http.(*HTTP).Gather.func1(0xc0005b0170, 0xc000503a00, 0x6035f58, 0xc000b0c1c0, 0xc0009edd41, 0x57)
	/go/src/github.com/influxdata/telegraf/plugins/inputs/http/http.go:134 +0x9f
created by github.com/influxdata/telegraf/plugins/inputs/http.(*HTTP).Gather
	/go/src/github.com/influxdata/telegraf/plugins/inputs/http/http.go:132 +0xf7

@sspaink
Contributor Author

sspaink commented Jul 3, 2021

Oh no! Thank you for trying it out. I think I know why it happened: the function checkIfIncludedCollection should have been deleted. Can you share the config you used? Then I can add it to the tests to make sure it doesn't happen again.

@barbaranelson

Here is my config.
telegraf.conf.txt

@sspaink
Contributor Author

sspaink commented Jul 3, 2021

The latest artifacts posted by the Telegraf tiger bot no longer panic; instead they produce this line protocol with the provided config:

http,etd_estimate_direction=North,from_station=COLM,to_station=ANTC minutes="6" 1625276770000000000
http,etd_estimate_direction=South,from_station=POWL,to_station=DALY minutes="16" 1625276770000000000

Link to added test

@ghost
Copy link

ghost commented Jul 4, 2021

@sspaink Thank you for the explanation and the new config!
I tried it out and it works, the data comes through.

The tag solution works, but is it possible to get the values of the data array as field values too? I think it would be nice to have the arrays in the JSON as field values. I think the old JSON parser did it automatically. Thanks!

@sspaink
Contributor Author

sspaink commented Jul 7, 2021

@danberg13 I am not sure how you could get the data array as field values without updating the JSON to include more information so you can use different tags. I don't think the old JSON parser can do it either, but I might be overlooking something. Do you have an example of using the old JSON parser that does what you want? Maybe I can use that to add the same functionality to json_v2. Because what we are discussing doesn't pertain to the problem in this issue, would you mind creating a new issue describing your problem and tagging me in it? Then we can continue the discussion there. Thanks!

@ghost

ghost commented Jul 8, 2021

I created the new issue @sspaink. Thanks!

@Hipska added the area/json label Feb 15, 2022