-
Notifications
You must be signed in to change notification settings - Fork 2.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
store: Prevent store from panic after range object storage #3322
Conversation
Signed-off-by: MegaEle <ajmy789456123@gmail.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi!
When the store uses the object store as the s3 interface, there is a probability that the loadChunks or loadSeries method panic will appear in the query. It is found that the s3 interface does not return complete data, and the store should return an error instead of panic
I would be interested into hearing about the issue. The protection you propose here is ok (unless will turn out to add too much complexity) but I'm not sure I understand the root cause of this issue, which is more interesting to me. Have you opened an issue with details about it?
@@ -1908,7 +1909,11 @@ func (r *bucketIndexReader) loadSeries(ctx context.Context, ids []uint64, refetc | |||
r.mtx.Unlock() | |||
|
|||
for i, id := range ids { | |||
c := b[id-start:] | |||
seriesOffset := id - start | |||
if int(seriesOffset) > indexLength { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This just checks the offset, but the following code could overread anyway. Am I missing anything?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi, I found the following issue, we encountered the same issue.
#3057
When I debugged, I found that when using readIndexRange and readChunkRange, the length of the data is not necessarily equal to or longer than the required length of the data, but when we use it, we will set the offset according to the length, so there will be situations .
I use the s3 interface and the underlying ceph storage. I am still looking for ways to solve the root cause of the incorrect data length, but there is no progress. In addition, I hope the store will not panic in this case
I tried to modify the readChunkRange and readIndexRange methods, but I found that other upstream methods do not require the returned data length. So I only modified the place where panic occurs : )
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@MegaEle do you use radosgw, perchance? I have seen the same thing a few times but never got to it. It's probably some kind of bug there in the API in that it doesn't follow the spec. Have you ever found a way to consistently reproduce it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure if the issue I just encountered is also relevant: #3339
I use Radosgw/ceph to confirm what @GiedriusS says
Is this still relevant? If so, what is blocking it? Is there anything you can do to help move it forward? This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. |
I think #3621 is a more reasonable defensive coding measure than this one because that other change adds a checker that is closer to the actual problem. |
Is this still relevant? If so, what is blocking it? Is there anything you can do to help move it forward? This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. |
This has been fixed by #3795, thank you for your contribution! |
Signed-off-by: MegaEle ajmy789456123@gmail.com
Changes
When the store uses the object store as the s3 interface, there is a probability that the loadChunks or loadSeries method panic will appear in the query. It is found that the s3 interface does not return complete data, and the store should return an error instead of panic
Verification
I compared it after build the modified code. Under the same circumstances, the store will not panic, and the query returns the following error like:
Mint: 1601021768826 Maxt: 1602734400000: rpc error: code = Aborted desc = fetch series for block 01EMMWANSPNASD41ER54W9B2WQ: preload chunks: read range for 0: can't read chunk for required length, required 2723780, actually 48762