feat(parsers/csv): Add metadata support to CSV parser plugin #10083
Conversation
Force-pushed ed79ce1 to 2feefd9 (compare)
Hi @sspaink, I have another linter issue. It is flagging things that I have not changed in the README.md file. Would you be able to have a look, please? I know how to fix the issue, but I am not sure if I should. Thanks,
Hi, can someone also tell me why the tiger bot has given my PR the bugfix label and not the feature label? Is there a place where I could find out what other labels there are and how to add them? Cheers,
@etycomputer thanks for working on this new feature. I noticed some minor typos in the README.md, so I've requested some changes. To answer your questions:
- The bot sometimes gets the labels wrong; it's a todo item to make it more accurate by using the conventional commit message. I've fixed the labels manually in the meantime :)
- The markdown linter unfortunately doesn't just check your changes but the whole file. I decided to fix the linter errors in a separate pull request: fix(parser/csv): resolve linter issues #10093
Force-pushed bede9b0 to d937d84 (compare)
Hi @sspaink, the only part that I don't like is where I am using the different Readers. Basically, it gets messy when I am trying to read lines to skip or to parse metadata; in these cases I don't want to use the CSV reader, just a basic reader. I look forward to hearing from you, Kind regards,
Force-pushed d937d84 to c284d9b (compare)
🥳 This pull request decreases the Telegraf binary size by 0.01 % for linux amd64 (new size: 131.5 MB, nightly size: 131.5 MB). 📦 New artifacts were built from this PR.
Force-pushed d493863 to 3442aac (compare)
Force-pushed 3442aac to 75544ec (compare)
Hi @sspaink, when you have a few minutes, could you have a look at this PR? I look forward to receiving your feedback. Kind regards,
Force-pushed 5b9dd16 to 65b960f (compare)
General idea looks good. It only seems more logical that metadata would end up in metric tags instead of metric fields. Or maybe make it an option, if you don't agree.
Hey @Hipska, thanks for your rapid feedback; I really appreciate it. Yes, I agree with you: metadata is commonly going to be saved as tags. But just in case that is not so, I have made fields the default and allowed adding them as tags by listing the tag keys. The other reason for defaulting metadata to fields is that we usually know exactly which keys are going to be tags, while dynamic fields are harder to predict. Finally, the current CSV parser considers all columns to be fields unless they are flagged as tags, and I am following the same pattern. Kind regards,
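The field-by-default, tags-by-listing behaviour described above could be configured roughly like this; the option names below are assumptions inferred from the struct fields used in the unit test, and may differ from the option names in the merged implementation:

```toml
[[inputs.file]]
  files = ["example.csv"]
  data_format = "csv"

  # Rows before the header that are scanned for key/value metadata
  csv_metadata_rows = 4
  # Separators tried in order when splitting a metadata row into key and value
  csv_metadata_separators = [":", "="]
  # Characters trimmed from both ends of metadata keys and values
  csv_metadata_trim_set = " #"

  # Metadata keys (and columns) listed here become tags;
  # everything else becomes a field, matching the parser's default
  csv_tag_columns = ["type", "version"]

  csv_header_row_count = 1
  csv_skip_rows = 2
```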
Force-pushed 93b1e69 to 3dec379 (compare)
Hi @srebhan, I had a unit test that covers the metadata example you mentioned, but I have also updated that unit test to cover the empty row for skipped rows. Please look at the unit test.
p = &Parser{
	HeaderRowCount:     1,
	SkipRows:           2,
	MetadataRows:       4,
	Comment:            "#",
	TagColumns:         []string{"type", "version"},
	MetadataSeparators: []string{":", "="},
	MetadataTrimSet:    " #",
}
err = p.Init()
require.NoError(t, err)
testCSVRows := []string{
	"garbage nonsense that needs be skipped",
	"",
	"# version= 1.0\r\n",
	"",
	" invalid meta data that can be ignored.\r\n",
	"file created: 2021-10-08T12:34:18+10:00",
	"timestamp,type,name,status\n",
	"2020-11-23T08:19:27+10:00,Reader,R002,1\r\n",
	"#2020-11-04T13:23:04+10:00,Reader,R031,0\n",
	"2020-11-04T13:29:47+10:00,Coordinator,C001,0",
}
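To illustrate how the MetadataSeparators and MetadataTrimSet options in the test above interact, here is a small standalone sketch. The parseMetadataRow helper is hypothetical, not the actual parser code: it splits a metadata row on the first matching separator and trims the trim-set characters from both sides, ignoring rows with no separator.

```go
package main

import (
	"fmt"
	"strings"
)

// parseMetadataRow is an illustrative (hypothetical) helper: it tries each
// separator in order, splits the line at the first match, and trims every
// character in trimSet from both ends of the key and the value.
func parseMetadataRow(line string, separators []string, trimSet string) (key, value string, ok bool) {
	for _, sep := range separators {
		if idx := strings.Index(line, sep); idx >= 0 {
			key = strings.Trim(line[:idx], trimSet)
			value = strings.Trim(line[idx+len(sep):], trimSet)
			if key != "" {
				return key, value, true
			}
		}
	}
	// No separator found: the row carries no usable metadata and is skipped.
	return "", "", false
}

func main() {
	seps := []string{":", "="}
	trim := " #"

	k, v, ok := parseMetadataRow("# version= 1.0", seps, trim)
	fmt.Println(k, v, ok) // version 1.0 true

	k, v, ok = parseMetadataRow("file created: 2021-10-08T12:34:18+10:00", seps, trim)
	fmt.Println(k, v, ok) // file created 2021-10-08T12:34:18+10:00 true

	_, _, ok = parseMetadataRow(" invalid meta data that can be ignored.", seps, trim)
	fmt.Println(ok) // false
}
```

This matches the fixture above: the commented "# version= 1.0" row yields the version tag, while the "invalid meta data" row is silently dropped.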
Why define the same thing twice? Couldn't p.Init() be run a second time to reset the internal counters? Also, testCSVRows and the resulting tests could be reused imho.
No, Init() will not reset the counters, because we are using the variable set by the config (i.e. the MetadataRows field) directly.
Okay, but still, defining the test CSV rows and the resulting tests twice seems unnecessary and error-prone.
@etycomputer could you change it to define the test CSV only once, please?
Still not changed, and no comment given on why not.
@srebhan I discussed this with @etycomputer on Slack. Do you agree that we should only test whether the parser can handle CSVs with metadata in this test, and maybe add another test that checks compatibility with different line endings?
I'm fine either way.
Force-pushed 3dec379 to 7ea2cfe (compare)
@etycomputer I now see my fault. Even an empty line still contains the line ending.
@etycomputer I'm hoping to fix your CI error with #10497. After this (and a rebase from your side) I guess we are good to go.
No problem at all. It's your job to ask and mine to make sure everything is answered.
✨ Added support to parse metadata for the CSV parser plugin
💡 Added comments
✅ Updated the unit tests
📝 Updated the README.md
Force-pushed 7ea2cfe to ac807a2 (compare)
The reason for this change is that there is a \n at the start of the simpleCSVWithHeader
Download PR build artifacts for linux_amd64.tar.gz, darwin_amd64.tar.gz, and windows_amd64.zip. 📦 Additional PR build artifacts are available from the Artifact URLs.
I have rebased my branch and fixed all CI issues. It is ready for your final review. I did have to fix a test bug introduced on 04/02/2022 by @srebhan.
The issue is that there is an empty line at the start and one comment line followed by the header, but the test had only two SkipRows and no header rows. I have fixed the test by changing SkipRows to 3.
Looks good to me. Thanks for working on this @etycomputer!
Thank you for working on this, I have two questions for you if you could help answer them.
Thanks, everyone; it has been a long journey.
@etycomputer, we appreciate your commitment and hard work to get this over the finishing line!
Hey @etycomputer, thanks for implementing this feature! In the merged version you put all metadata in tags, but the comment and the README still point out that metadata will be fields by default.
The README should definitely be updated to reflect the actual behaviour! Please create a new issue for that (or fix it in your PR #10742). This was my opinion back when we started reviewing this PR; it has not changed since then:
Required for all PRs:
resolves #10079
Added support to parse metadata for the CSV parser plugin.