Conversation
Please also add a new test that verifies the bug fix. The test presumably should fail with the implementation before the fix and should pass after the fix.

In an effort to gain confidence in the fingerprint comparison logic, I added another test here in #145. I have yet to see a case where I strongly suspect the issue is not directly with fingerprint comparison. Tomorrow I will get my head further into this problem.
Can you clarify exactly which test is failing?
The printed bytes appear to be encoded, possibly base64. If this is the case, the string representation may be misleading, as the difference (padding) may have been introduced in the encoding process. In any case, it would be helpful to have the exact code used to create this issue. I agree with Tigran that we really need a specific test that demonstrates the problem.

I added more logs whenever it tails a new file.
These are some pretty significant code changes, and frankly I can't recommend specific improvements because I can't tell whether the original problem was solved or not. If I understand correctly, you believe you have identified the root cause of the original issue. If so, I believe a reasonable path forward would be:
This approach will allow us to establish whether the changes are solving the problem.

Thank you for suggesting a great process. It makes sense, and I have done exactly that.
This PR is really complex, and there are a lot of changes that have implications I still do not fully understand. I do not see the cause of test failures. However, I left some feedback which may help the design.
```go
	}
	f.tailingReaders[path] = []*Reader{reader}
} else { // If path exists
	f.Infow("Log rotation detected. Started watching file", "path", path)
```
Does this necessarily mean the file has rotated? Wouldn't this happen any time we have seen the same file path before?
In every poll cycle, we check the same files we have been tailing. Only when the fingerprint is different while the path is one we have already been tailing does it mean log rotation.
Did I address your concern well here?
```diff
@@ -151,10 +151,11 @@ func (c InputConfig) Build(context operator.BuildContext) ([]operator.Operator,
 		firstCheck:     true,
 		cancel:         func() {},
 		knownFiles:     make([]*Reader, 0, 10),
+		tailingReaders: map[string][]*Reader{},
```
What does the slice of Readers represent?
If it is all previous rotations of the file, then do we really need to keep them? When we detect a rotation, can we just read the rest of the old file and then close it? Do we expect that rotated files are still written to? (My understanding is that the issue we have is that a file is written to immediately before being rotated.)
Good point. Once a file is rotated, it won't be written to. I should add a workflow to stop tailing them, yeah?
```go
count := min0(f.MaxConcurrentFiles, readerCount)
count = min0(count, len(f.readerQueue))
polledReaders := f.readerQueue[:count]
```
Unfortunately, I don't think `max_concurrent_files` is really being respected with this design.

The original problem this setting was introduced for is when there are so many matching files that we run out of file descriptors. By leaving all the files open, we are not limiting the number of file descriptors.

I'm not certain that it's possible to accomplish both objectives (limiting file descriptors and tracking rotations), so perhaps we would need to set a reasonably high default for `max_concurrent_files` and then only keep that many open. (This would mean that if you are matching too many files, you may lose a small percentage of logs upon rotation.)
Obviously not an ideal situation, but I think users must be able to allocate a fixed number of file descriptors to the receiver.
You are right. So set it to something reasonably high, like 5,000? We definitely need a workflow to close out already-rotated files to support this. But even then, in a situation where there are 6,000 actively written files and the maximum is 5,000, it will only tail 5,000 files and not tail the remaining 1,000 at all.
Closing. It was fixed with #168.
fixes #85
Existing unit tests check for no missing logs, no duplicates, checkpoint recovery, etc.
But everything is captured in the `Include` parameter. With Kubernetes, however, we tail symlink files that carry other important k8s metadata in the file name itself. When the underlying file is rotated, it falls outside the `Include` match. This results in significant data loss in k8s environments.
When a file is opened with `os.Open()`, even after it is moved away, we can keep track of it and continue to tail whatever is appended to it. This is only possible if we don't close it. So the main thing I did is keep the readers open across polling cycles. Before this change, they were closed at the end of each poll and re-opened at the beginning of the next.

One thing to point out: there are two methods of file rotation, "Move/Create" and "Copy/Truncate". This only works with the "Move/Create" method.