scan older files to be harvested based on mod time #4374
Can one of the admins verify this patch?
Jenkins standing by to test this. If you aren't a maintainer, you can ignore this comment. Someone with commit access, please review this and clear it for Jenkins to run.
@waynemz Thanks for sharing your implementation. This is definitely something we should discuss. Some thoughts:
@ruflin I agree the feature should be off by default given the additional performance impact. Instead of a boolean, the config option could be a string with values off/ascending/descending. @karmi I added my other email to my GitHub account.
I like the suggestion. I have in the back of my head that if we add this, pretty soon people will ask for it to be sorted by filename instead of date. Could we perhaps already take this into account now when we design the config? Or is it overkill?
@ruflin That's a great suggestion. I'll incorporate the filename as well. That will bring the possible values for scan_order to five: off (default), oldest_file_first, newest_file_first, ascending_by_name, descending_by_name.
I'll correct the go imports style issue but I could not figure out what went wrong with the test: BTW I'm using Gogland Early Access as my editor
For the config options I was more thinking of
This kind of implies we should switch to
For the failing check: Run
Thanks for pushing this forward. I left a few comments.
Could you also please add a changelog entry? We also will have to add docs for this feature. Could you also add a very short doc entry to it? The config docs are in https://github.com/elastic/beats/blob/master/filebeat/docs/reference/configuration/filebeat-options.asciidoc
@@ -7,6 +7,9 @@ import (
    "path/filepath"
    "time"
|
can you remove newline?
}

func getSortedFiles(scanOrder string, scanSort string, sortInfos []FileSortInfo) []FileSortInfo {
    switch scanOrder + "_" + scanSort {
Could you perhaps do that in two switch / case statements? So first check the scanSort and then the order inside each? It feels a bit strange to create a string for comparison. Perhaps there is an even "shorter" option.
You could move the `sort.Slice` part to after the `switch` and only define the function which will be used inside the switch cases. This will make the code cleaner, I think.
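A minimal sketch of the structure suggested in this comment: a nested `switch` only selects the comparison function, and `sort.Slice` is called once afterwards. The `fileEntry` type is a simplified, hypothetical stand-in for the PR's `FileSortInfo`, and the order values `asc` and `desc` are assumptions, not taken from the patch.

```go
package main

import (
	"fmt"
	"sort"
	"time"
)

// fileEntry is a simplified, hypothetical stand-in for the PR's FileSortInfo.
type fileEntry struct {
	name    string
	modTime time.Time
}

// sortEntries selects a comparison function in a nested switch and calls
// sort.Slice exactly once afterwards. Unknown values leave the slice as it is.
// The order values "asc" and "desc" are assumed for illustration.
func sortEntries(scanSort, scanOrder string, entries []fileEntry) []fileEntry {
	var less func(i, j int) bool

	switch scanSort {
	case "modtime":
		switch scanOrder {
		case "asc":
			less = func(i, j int) bool { return entries[i].modTime.Before(entries[j].modTime) }
		case "desc":
			less = func(i, j int) bool { return entries[i].modTime.After(entries[j].modTime) }
		}
	case "filename":
		switch scanOrder {
		case "asc":
			less = func(i, j int) bool { return entries[i].name < entries[j].name }
		case "desc":
			less = func(i, j int) bool { return entries[i].name > entries[j].name }
		}
	}

	// Sort only if a comparison function was selected.
	if less != nil {
		sort.Slice(entries, less)
	}
	return entries
}

func main() {
	base := time.Date(2017, 5, 1, 0, 0, 0, 0, time.UTC)
	entries := []fileEntry{
		{"b.log", base.Add(2 * time.Hour)},
		{"a.log", base.Add(3 * time.Hour)},
		{"c.log", base.Add(1 * time.Hour)},
	}
	for _, e := range sortEntries("modtime", "asc", entries) {
		fmt.Println(e.name)
	}
}
```

Keeping the `sort.Slice` call in one place means each case only has to state how two entries compare.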
// Scan starts a scanGlob for each provided path/glob
func (p *Prospector) scan() {

    for path, info := range p.getFiles() {
    paths := p.getFiles()
    files := getSortInfos(paths)
This should not happen in every call, only if sorting is enabled as this could have a performance impact. I would expect something like:
if config.sort.Enabled() { paths := paths.Sort()
This is heavily simplified but you get what I'm trying to explain? Do we even need to expose the fileinfo outside of the sort?
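One way to read this comment, as a sketch only: when sorting is disabled, iterate the map directly and skip all sorting work; build and sort a slice only when it is enabled. The function `forEachFile` and the use of Unix timestamps instead of `os.FileInfo` are hypothetical simplifications, not the PR's code.

```go
package main

import (
	"fmt"
	"sort"
)

// forEachFile visits every path in files. The map value is the modification
// time in Unix seconds, a simplified stand-in for os.FileInfo (assumption).
// The slice is built and sorted only when sorting is enabled, so the default
// path pays no extra cost.
func forEachFile(files map[string]int64, sortEnabled bool, visit func(path string)) {
	if !sortEnabled {
		// Default behaviour: plain map iteration, order not defined.
		for path := range files {
			visit(path)
		}
		return
	}

	paths := make([]string, 0, len(files))
	for path := range files {
		paths = append(paths, path)
	}
	// Oldest file first.
	sort.Slice(paths, func(i, j int) bool { return files[paths[i]] < files[paths[j]] })
	for _, path := range paths {
		visit(path)
	}
}

func main() {
	files := map[string]int64{"new.log": 300, "old.log": 100, "mid.log": 200}
	forEachFile(files, true, func(path string) { fmt.Println(path) })
}
```

In this shape the file metadata never leaves the function, which also touches on the question of whether it needs to be exposed outside of the sort.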
@ruflin I made the suggested changes. Can you resolve the conflict with changelog?
I fixed the goimports style but there is some build failure with the full yml file. Not sure how to resolve that.
@waynemz You can fix that by running
jenkins, test it
        sortFunc = func(i, j int) bool {
            return sortInfos[i].info.ModTime().After(sortInfos[j].info.ModTime())
        }
    default:
Would it be possible to return an explicit error on these default branches?
@tsg What do you want to happen if the default case is reached? We can abort with an error message (panic), or log an error and use the default unsorted behavior.
paths := p.getFiles()
if strings.ToLower(p.config.ScanSort) != "none" {
    sortInfos = getSortedFiles(strings.ToLower(p.config.ScanOrder),
        strings.ToLower(p.config.ScanSort),
I'm not a fan of calling `ToLower` here, as this creates some leniency in the config parsing which we don't have in other places.
@waynemz Did you see this comment?
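A sketch of what strict handling could look like without `ToLower`: the value is compared exactly against the documented options (`none`, `modtime`, `filename`) and anything else, including different casing, is rejected. The function name `validateScanSort` and the error text are hypothetical and not part of the PR.

```go
package main

import "fmt"

// validateScanSort accepts only the exact documented values.
// The function name and error message are illustrative assumptions.
func validateScanSort(value string) error {
	switch value {
	case "none", "modtime", "filename":
		return nil
	default:
		return fmt.Errorf("invalid scan.sort value %q", value)
	}
}

func main() {
	for _, v := range []string{"modtime", "ModTime"} {
		if err := validateScanSort(v); err != nil {
			fmt.Println("rejected:", err)
			continue
		}
		fmt.Println("accepted:", v)
	}
}
```

Validating once when the config is loaded would also remove the need to normalise the value on every scan.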
var path string
var info os.FileInfo

if strings.ToLower(p.config.ScanSort) != "none" {
Removing `ToLower` here should also mean less work on the default case.
LGTM. Left 2 minor comments. Also check comments from @tsg
Specifies if files should be harvested in order and how to determine the order. Possible values are modtime, filename and none. To sort by file modification time use modtime otherwise use filename.

If you specify a value other than none for this setting you can determine whether to use ascending or descending order using `scan.order`
Could you add `.` dots at the end of the sentences? Also check below.
CHANGELOG.asciidoc
Outdated
@@ -132,6 +132,8 @@ https://github.com/elastic/beats/compare/v6.0.0-alpha1...v6.0.0-alpha2[View comm
- Add `logging.files` `permissions` option. {pull}4295[4295]

*Filebeat*
- Added ability to sort harvested files. {pull}4374[4374]
- Add experimental Redis slow log prospector type. {pull}4180[4180]
This line probably sneaked in during rebasing?
@waynemz I really appreciate all the work you put into this. Thanks a lot for adding this feature.
Someone please take a look, there are no conflicts as of now.
Sorry for the late review. I left a few additional comments. It's mainly about the error management we should improve.
    return sortInfos
}

func getSortedFiles(scanOrder string, scanSort string, sortInfos []FileSortInfo) ([]FileSortInfo, string) {
The return argument should be `error` and not `string`. This is quite standard in Golang. Like this you can check
if err != nil {
...
}
}

func getSortedFiles(scanOrder string, scanSort string, sortInfos []FileSortInfo) ([]FileSortInfo, string) {
    var sortFunc func(i, j int) bool = nil
no need to set it to nil here
            return sortInfos[i].info.ModTime().After(sortInfos[j].info.ModTime())
        }
    default:
        return nil, "Unexpected value for scan.order: " + scanOrder
You can use `fmt.Errorf("Unexpected value for scan.order: %v", scanOrder)` here.
            return strings.Compare(sortInfos[i].info.Name(), sortInfos[j].info.Name()) > 0
        }
    default:
        return nil, "Unexpected value for scan.order: " + scanOrder
see comment above
        return nil, "Unexpected value for scan.order: " + scanOrder
    }
default:
    return nil, "Unexpected value for scan.sort: " + scanSort
see comment above
    sort.Slice(sortInfos, sortFunc)
}

return sortInfos, ""
return sortInfos, nil
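Taken together, the review comments on `getSortedFiles` point to a function that returns `error` instead of `string`, declares the comparison function without assigning `nil`, builds the message with `fmt.Errorf`, and returns `nil` on success. The sketch below applies those points to a reduced, hypothetical `sortNames` function working on plain file names; the order values `asc` and `desc` are assumptions.

```go
package main

import (
	"fmt"
	"sort"
)

// sortNames is a reduced, hypothetical variant of getSortedFiles that sorts
// plain file names. The order values "asc" and "desc" are assumed.
func sortNames(scanOrder string, names []string) ([]string, error) {
	// The zero value of a func variable is already nil.
	var less func(i, j int) bool

	switch scanOrder {
	case "asc":
		less = func(i, j int) bool { return names[i] < names[j] }
	case "desc":
		less = func(i, j int) bool { return names[i] > names[j] }
	default:
		return nil, fmt.Errorf("Unexpected value for scan.order: %v", scanOrder)
	}

	sort.Slice(names, less)
	return names, nil
}

func main() {
	names, err := sortNames("desc", []string{"a.log", "c.log", "b.log"})
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(names)

	if _, err := sortNames("sideways", nil); err != nil {
		fmt.Println("error:", err)
	}
}
```

With an `error` return value the caller can use the usual `if err != nil` check instead of comparing against an empty string.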
var absolutePath string
absolutePath, err = filepath.Abs(path)
if err != nil {
    logp.Err("could not fetch abs path for file %s: %s", absolutePath, err)
Should we return here on error? That would mean this function can also return an error?
paths := p.getFiles()

var err string = ""
var err error
if p.config.ScanSort != "none" {
    sortInfos, err = getSortedFiles(strings.ToLower(p.config.ScanOrder),
        strings.ToLower(p.config.ScanSort),
        getSortInfos(paths))
We should handle the error directly inside the `if` clause. In case of an error, should we return? What are we doing?
In this case we are going to print an error saying why sorting failed and continue the regular unsorted route.
SGTM
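The behaviour agreed on here (report why sorting failed, then continue in the regular unsorted order) might be sketched as follows. The names `pathsToScan` and `sortByName` are hypothetical, the standard `log` package stands in for the project's own logger, and the order values `asc` and `desc` are assumptions.

```go
package main

import (
	"fmt"
	"log"
	"sort"
)

// sortByName returns a sorted copy of paths, or an error for an unknown order.
// The order values "asc" and "desc" are assumed for illustration.
func sortByName(scanOrder string, paths []string) ([]string, error) {
	sorted := append([]string(nil), paths...)
	switch scanOrder {
	case "asc":
		sort.Strings(sorted)
	case "desc":
		sort.Sort(sort.Reverse(sort.StringSlice(sorted)))
	default:
		return nil, fmt.Errorf("unexpected value for scan.order: %v", scanOrder)
	}
	return sorted, nil
}

// pathsToScan handles the sorting error directly inside the if clause:
// it logs the reason and falls back to the original, unsorted order.
func pathsToScan(scanSort, scanOrder string, paths []string) []string {
	if scanSort != "none" {
		sorted, err := sortByName(scanOrder, paths)
		if err != nil {
			log.Printf("sorting failed, continuing unsorted: %v", err)
			return paths
		}
		return sorted
	}
	return paths
}

func main() {
	paths := []string{"b.log", "a.log"}
	fmt.Println(pathsToScan("filename", "asc", paths))
	fmt.Println(pathsToScan("filename", "bogus", paths))
}
```

Sorting a copy keeps the original order intact, so the fallback really is the unsorted route.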
// Scan starts a scanGlob for each provided path/glob
func (p *Prospector) scan() {

    for path, info := range p.getFiles() {
    var sortInfos []FileSortInfo = nil
No need to set it to nil, also follow up below
@waynemz Thanks for your work on this. LGTM. Two things came to my mind which can be done in a follow-up PR:
@waynemz Do you want to add this to this PR or better as a follow-up?
@waynemz Merged. Thanks a lot for the work on this one. I will open a follow-up PR with the above suggestions.
Thanks @ruflin
Follow up PR from elastic#4374.
config option to scan older files first.
Have a look at issue: https://discuss.elastic.co/t/filebeat-harvester-picks-up-files-in-random-order-when-scanning/86542/4