Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(parsers/csv): Add metadata support to CSV parser plugin #10083

Merged

Conversation

etycomputer
Copy link
Contributor

@etycomputer etycomputer commented Nov 9, 2021

Required for all PRs:

resolves #10079

Added support to parse metadata for the CSV parser plugin.

@telegraf-tiger telegraf-tiger bot added the fix pr to fix corresponding bug label Nov 9, 2021
@etycomputer etycomputer force-pushed the feat-add-metadata-support-to-csv-parser branch from ed79ce1 to 2feefd9 Compare November 10, 2021 03:47
@etycomputer
Copy link
Contributor Author

Hi @sspaink ,

I have another Linter issue. It is erroring things that I have not changed in the READEME.md file.

Would you be able to have a look, please?

I know how to fix the issue. But I am not sure if I should.

Thanks,
Ehsan

@etycomputer
Copy link
Contributor Author

Hi,

Can someone also tell me why the tiger not has given my job the bugfix label and not the feature label? Is there a place where I could find out what other labels there are and how to add them?

Cheers,
Ehsan

@sspaink sspaink added feat Improvement on an existing feature such as adding a new setting/mode to an existing plugin and removed fix pr to fix corresponding bug labels Nov 10, 2021
Copy link
Contributor

@sspaink sspaink left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@etycomputer thanks for working on this new feature, noticed some minor typos in the README.md so I've got some requested changed. To answer your questions:

  • The bot sometimes get the labels wrong, its a todo item to get it more accurate by using the conventional commit message. I've fixed the labels manually in the meantime :)

  • The markdown linter unfortunately doesn't just check your changes but the whole file, I decided to fix the linter errors in a separate pull request: fix(parser/csv): resolve linter issues #10093

@etycomputer etycomputer force-pushed the feat-add-metadata-support-to-csv-parser branch 4 times, most recently from bede9b0 to d937d84 Compare November 11, 2021 04:13
@etycomputer etycomputer requested a review from sspaink November 11, 2021 04:19
@etycomputer
Copy link
Contributor Author

Hi @sspaink,
Thanks for fixing the markdown linter issue on #10093 and your feedback regarding my new feature.
I have finished working on this job. I have tried to add all the scenarios that I can think of in the unit tests.

The only part that I don't like is where I am using the different Readers.
I have started learning to program in Go not more than three weeks, so if there is a more efficient way to reach the same goal please let me know.

Basically, the part that gets messy is when I am trying to read lines to skip or parse metadata. in these cases, I don't want to use the CSV reader and just want to use a basic reader.

I look forward to hearing from you,

Kind regards,
Ehsan

@etycomputer etycomputer force-pushed the feat-add-metadata-support-to-csv-parser branch from d937d84 to c284d9b Compare November 11, 2021 04:34
@telegraf-tiger
Copy link
Contributor

🥳 This pull request decreases the Telegraf binary size by -0.01 % for linux amd64 (new size: 131.5 MB, nightly size 131.5 MB)

📦 Looks like new artifacts were built from this PR.

Expand this list to get them **here**! 🐯

Artifact URLs

DEB RPM TAR GZ ZIP
amd64.deb aarch64.rpm darwin_amd64.tar.gz windows_amd64.zip
arm64.deb armel.rpm freebsd_amd64.tar.gz windows_i386.zip
armel.deb armv6hl.rpm freebsd_armv7.tar.gz
armhf.deb i386.rpm freebsd_i386.tar.gz
i386.deb ppc64le.rpm linux_amd64.tar.gz
mips.deb s390x.rpm linux_arm64.tar.gz
mipsel.deb x86_64.rpm linux_armel.tar.gz
ppc64el.deb linux_armhf.tar.gz
s390x.deb linux_i386.tar.gz
linux_mips.tar.gz
linux_mipsel.tar.gz
linux_ppc64le.tar.gz
linux_s390x.tar.gz
static_linux_amd64.tar.gz

@etycomputer etycomputer force-pushed the feat-add-metadata-support-to-csv-parser branch 3 times, most recently from d493863 to 3442aac Compare November 16, 2021 22:55
@etycomputer etycomputer changed the title feat: Add metadata support to CSV parser plugin feat(parsers/csv): Add metadata support to CSV parser plugin Nov 16, 2021
@etycomputer etycomputer force-pushed the feat-add-metadata-support-to-csv-parser branch from 3442aac to 75544ec Compare November 17, 2021 00:53
@etycomputer
Copy link
Contributor Author

etycomputer commented Nov 17, 2021

Hi @sspaink,

When you have a few minutes could you have a look at this PR.
I have fully tested it with the CSV files that I needed to parse and it is now working very well.

I look forward to receiving your feedback.

Kind regards,
Ehsan

@Hipska Hipska requested a review from srebhan November 18, 2021 09:38
@etycomputer etycomputer force-pushed the feat-add-metadata-support-to-csv-parser branch 2 times, most recently from 5b9dd16 to 65b960f Compare November 22, 2021 02:39
Copy link
Contributor

@Hipska Hipska left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

General idea looks good. Only it looks more logical that metadata would end up in metric tags instead of metric fields. Or maybe have it as an option if you don't agree.

@etycomputer
Copy link
Contributor Author

General idea looks good. Only it looks more logical that metadata would end up in metric tags instead of metric fields. Or maybe have it as an option if you don't agree.

Hey @Hipska, thanks for your rapid feedback. I really appreciate it. Yes, I agree with you too. Metadata is commonly going to be saved a tags. But just in case this is not the case I have allowed the option for it to be a field by default and adding them as tags is possible by listing the tag keys.

The other reason behind considering the default for metadata to be fields is, we usually exactly know keys that are going to be tags but dynamic fields are harder to predict.

Finally, the current CSV parse considers all columns to be fields unless they are flagged as tags, I am following the same pattern.

Kind regards,
Ehsan

@etycomputer etycomputer force-pushed the feat-add-metadata-support-to-csv-parser branch from 93b1e69 to 3dec379 Compare January 18, 2022 01:58
@etycomputer
Copy link
Contributor Author

Hi @srebhan, I had a unit test that cover the Metadata example you mentioned but I have also updated the same unit test to also cover the empty row for skipped rows. Please look at the unit test TestParseCSVFileWithMetadata(). The current code coves these scenarios at a higher layer.

Comment on lines +968 to +990
p = &Parser{
HeaderRowCount: 1,
SkipRows: 2,
MetadataRows: 4,
Comment: "#",
TagColumns: []string{"type", "version"},
MetadataSeparators: []string{":", "="},
MetadataTrimSet: " #",
}
err = p.Init()
require.NoError(t, err)
testCSVRows := []string{
"garbage nonsense that needs be skipped",
"",
"# version= 1.0\r\n",
"",
" invalid meta data that can be ignored.\r\n",
"file created: 2021-10-08T12:34:18+10:00",
"timestamp,type,name,status\n",
"2020-11-23T08:19:27+10:00,Reader,R002,1\r\n",
"#2020-11-04T13:23:04+10:00,Reader,R031,0\n",
"2020-11-04T13:29:47+10:00,Coordinator,C001,0",
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why defining the same twice? Couldn't p.Init() be run a second time to reset internal counters? Also testCSVrows and the resulting tests could be reused imho.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, Init() will not reset the counters because we are using the variable set by the config (i.e. the MetadataRows field) directly.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, but still defining the test csv rows and the resulting tests twice seems unnecessary and error prone.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@etycomputer could you change to define the test csv only once please

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Still not changed or given any comment on why not..

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@srebhan I discussed with @etycomputer about this on Slack. Do you agree we should only test if the processor can handle csv's with metadata in this test only and maybe add another test that checks the compatibility of different line endings?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm fine either way.

@etycomputer etycomputer force-pushed the feat-add-metadata-support-to-csv-parser branch from 3dec379 to 7ea2cfe Compare January 20, 2022 22:53
@srebhan
Copy link
Member

srebhan commented Jan 24, 2022

@etycomputer I now see my fault. Even an empty line contains the \n which passes the if as the string is not empty. Sorry for the noise.

@srebhan
Copy link
Member

srebhan commented Jan 24, 2022

@etycomputer I'm hoping to fix your CI error with #10497. After this (and a rebase from your side) I guess we are good to go.

@etycomputer
Copy link
Contributor Author

@etycomputer I now see my fault. Even an empty line contains the \n which passes the if as the string is not empty. Sorry for the noise.

No problem at all. It's your job to ask and mine to make sure everything is answered.

✨ Added support to parse metadata for CSV parser plugin
💡 Adding comment
✅ Updated the unit tests
📝 Updated the README.md
@etycomputer etycomputer force-pushed the feat-add-metadata-support-to-csv-parser branch from 7ea2cfe to ac807a2 Compare February 11, 2022 00:47
The reason for this change is that there is a \n at the start of the simpleCSVWithHeader
@telegraf-tiger
Copy link
Contributor

@etycomputer
Copy link
Contributor Author

Hi @srebhan and @Hipska,

I have rebased my branch and fixed all CI issues.

It is ready for you to do the final review.

I did have to fix a test bug introduced on 04/02/2022 by Rabhan.

const simpleCSVWithHeader = `
# Simple CSV with header(s)
a,b,c
1.2,3.1415,ok
`
parser := &csv.Parser{
	MetricName:  "metricName",
	SkipRows:    2,
	ColumnNames: []string{"a", "b", "c"},
	TagColumns:  []string{"c"},
}

The issue is that there is an empty line at the start and one comment followed by a header and he has only two SkipRows and no header rows.

I have fixed the test by changing the SkipRows to 3

@Hipska Hipska added the area/csv csv parser/serialiser related label Feb 15, 2022
Copy link
Member

@srebhan srebhan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me. Thanks for working on this @etycomputer!

@srebhan srebhan added the ready for final review This pull request has been reviewed and/or tested by multiple users and is ready for a final review. label Feb 16, 2022
Copy link
Contributor

@sspaink sspaink left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for working on this, I have two questions for you if you could help answer them.

@etycomputer etycomputer requested review from sspaink and removed request for Hipska February 24, 2022 00:18
@sspaink sspaink merged commit 5adecc3 into influxdata:master Feb 24, 2022
@etycomputer
Copy link
Contributor Author

Thanks, everyone; it has been a long journey.
I appreciate your commitment and contributions.

@srebhan
Copy link
Member

srebhan commented Feb 24, 2022

@etycomputer and I appreciate your commitment and hard work to get this over the finishing line!

@JackSharebourg
Copy link
Contributor

Hey @etycomputer thanks for implementing this feat!

In the merged version you put all metadata in tags, but the comment and the Readme still point out that metadata will be fields by default.
IMHO I would also recommend to change this back. Metadata should be fields by default because this is also the current way for all columns to be parsed and with the tag key list they can easily be converted when needed.
This way metadata could also be used as the measurement, since the measurement is chosen only out of fields and not out of tags.
Using a processor to change metadata tags back to fields or as measurement every time this is needed is much more complicated and error prone then putting the metadata tags in the tag list. @Hipska what do you think?

@Hipska
Copy link
Contributor

Hipska commented Feb 28, 2022

The readme should definitely be updated to reflect the actual scenario! Please create a new issue for that. (Or fix it in your PR #10742)

This was my opinion back when we started reviewing this PR, it has not been changed since then:

General idea looks good. Only it looks more logical that metadata would end up in metric tags instead of metric fields. Or maybe have it as an option if you don't agree.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/csv csv parser/serialiser related feat Improvement on an existing feature such as adding a new setting/mode to an existing plugin plugin/parser 1. Request for new parser plugins 2. Issues/PRs that are related to parser plugins ready for final review This pull request has been reviewed and/or tested by multiple users and is ready for a final review.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Adding support to parse metadata before headers to the CSV parser
6 participants