
[Bug] Blocks read inconsistent: expected xxx blocks, actual xxx blocks #124

Closed
xianjingfeng opened this issue Aug 4, 2022 · 13 comments

@xianjingfeng
Member

  1. If we set spark.rss.data.replica.write=2 and spark.rss.data.replica=3, data integrity cannot be guaranteed on any single shuffle server, right?
  2. But the method org.apache.uniffle.storage.handler.impl.LocalFileQuorumClientReadHandler#readShuffleData only reads from one shuffle server.
@frankliee
Contributor

frankliee commented Aug 4, 2022

Which version did you use?

Do you set spark.rss.data.replica.read=2? It ensures the bitmap metadata of blocks is written to 2 servers.

As long as the read client gets the metadata from 2 of the servers, it can check the integrity of the data from any one server.
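
For reference, a minimal sketch of the settings being discussed, set through a plain SparkConf (illustrative only; the key names come from this thread, the surrounding code is not taken from the Uniffle docs):

```java
import org.apache.spark.SparkConf;

public class ReplicaQuorumConfig {
  public static void main(String[] args) {
    // With N = 3 replicas, W = 2 write acks, R = 2 metadata reads,
    // W + R > N, so any read quorum overlaps the write quorum and the
    // client always sees a complete block bitmap.
    SparkConf conf = new SparkConf()
        .set("spark.rss.data.replica", "3")        // N: total replicas
        .set("spark.rss.data.replica.write", "2")  // W: write quorum
        .set("spark.rss.data.replica.read", "2");  // R: read quorum
  }
}
```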

@xianjingfeng
Member Author

Do you set spark.rss.data.replica.read=2

Yes

As long as the read client gets the metadata from 2 of the servers, it can check the integrity of the data from any one server.

But this step seems to execute before readShuffleData

@xianjingfeng
Member Author

Which version did you use?

internal version 0.5.0-snapshot

@frankliee
Contributor

Do you set spark.rss.data.replica.read=2

Yes

As long as the read client gets the metadata from 2 of the servers, it can check the integrity of the data from any one server.

But this step seems to execute before readShuffleData

The metadata is acquired in advance, but the data integrity check is executed only after all blocks have been fetched.
In the current implementation, the client will only fetch from “the first available” server to avoid extra read cost.
But when the data on this first server is damaged, the final check will report "read inconsistent".
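
To illustrate (a simplified sketch, not the actual Uniffle code; block IDs are simulated as plain longs instead of the real bitmap classes):

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Sketch only: blocks come from a single replica, and the
// expected-vs-actual check happens after all fetches complete.
public class ReadPathSketch {
  public static void main(String[] args) {
    Set<Long> expected = Set.of(1L, 2L, 3L);          // bitmap metadata from the read quorum
    List<Long> firstReplicaBlocks = List.of(1L, 2L);  // block 3 is damaged on this replica

    Set<Long> processed = new HashSet<>(firstReplicaBlocks);

    // A damaged first replica only surfaces here, at the final check.
    if (!processed.equals(expected)) {
      throw new IllegalStateException("Blocks read inconsistent: expected "
          + expected.size() + " blocks, actual " + processed.size() + " blocks");
    }
  }
}
```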

@xianjingfeng
Member Author

I know, but the application will fail

@jerqi
Contributor

jerqi commented Aug 4, 2022

Do you set spark.rss.data.replica.read=2

Yes

As long as the read client gets the metadata from 2 of the servers, it can check the integrity of the data from any one server.

But this step seems to execute before readShuffleData

The metadata is acquired in advance, but the data integrity check is executed only after all blocks have been fetched. In the current implementation, the client will only fetch from “the first available” server to avoid extra read cost. But when the data on this first server is damaged, the final check will report "read inconsistent".

This implementation feels a bit unreasonable to me. Should we read from the next shuffle server when the data isn't complete?
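
Something like this rough sketch (all names hypothetical; each supplier stands in for a per-replica readShuffleData call):

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.function.Supplier;

// Hypothetical fallback sketch: keep the set of still-missing block IDs
// and move on to the next replica when the current one cannot supply them all.
public class FallbackSketch {
  static void readWithFallback(Set<Long> expected, List<Supplier<Set<Long>>> replicas) {
    Set<Long> missing = new HashSet<>(expected);
    for (Supplier<Set<Long>> replica : replicas) {
      missing.removeAll(replica.get()); // blocks already read are skipped naturally
      if (missing.isEmpty()) {
        return; // all expected blocks recovered
      }
    }
    throw new IllegalStateException(
        "Blocks read inconsistent: " + missing.size() + " blocks still missing");
  }
}
```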

@xianjingfeng
Member Author

This implementation feels a bit unreasonable to me. Should we read from the next shuffle server when the data isn't complete?

I am trying to do this, and I think it needs to be fixed together with #108.

@frankliee
Contributor

I would be happy to review this PR, but you should avoid fetching redundant blocks from another server (because Spark has already consumed those blocks).
RSS provides some skipping mechanisms for localfile and HDFS.
But I'm worried about memory data. @jerqi
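
For example, a hedged sketch of that skipping idea (hypothetical names, not the existing mechanism): when falling back to another replica, drop segments whose block IDs were already consumed so no record is processed twice.

```java
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

// Hypothetical sketch: filter out segments Spark has already consumed
// before handing a fallback replica's data to the reader.
public class SkipConsumedSketch {
  record Segment(long blockId, byte[] data) {}

  static List<Segment> filterConsumed(List<Segment> fromNextReplica, Set<Long> consumed) {
    return fromNextReplica.stream()
        .filter(seg -> !consumed.contains(seg.blockId()))
        .collect(Collectors.toList());
  }
}
```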

@jerqi
Contributor

jerqi commented Aug 4, 2022

I would be happy to review this PR, but you should avoid fetching redundant blocks from another server (because Spark has already consumed those blocks). RSS provides some skipping mechanisms for localfile and HDFS. But I'm worried about memory data. @jerqi

In my opinion, memory data should also have the data-skipping ability, and our memory-read process should be optimized.

@xianjingfeng
Member Author

Got it.

@frankliee
Contributor

This will change the server's memory storage to add an "index" like HDFS.
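
Roughly, one possible shape for such an index (a hypothetical sketch, not a design proposal):

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical: map each blockId to its slice inside the in-memory buffer
// so a reading client can locate, or skip, individual blocks, much like
// the index files used for HDFS/localfile storage.
public class MemoryIndexSketch {
  record Slice(int offset, int length) {}

  private final Map<Long, Slice> index = new HashMap<>();

  void onBlockAppended(long blockId, int offset, int length) {
    index.put(blockId, new Slice(offset, length));
  }

  // Returns null when the block is absent and can be skipped.
  Slice locate(long blockId) {
    return index.get(blockId);
  }
}
```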

@jerqi
Contributor

jerqi commented Aug 4, 2022

This will change the server's memory storage to add an "index" like HDFS.

This problem should be discussed in another issue, and we should also have a simple design doc.

jerqi pushed a commit that referenced this issue Nov 28, 2022
### What changes were proposed in this pull request?
Add fallback mechanism for blocks read inconsistent

### Why are the changes needed?
When the data on the first server is damaged, the application will fail. #124 #129

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
Tests already added.
@jerqi
Contributor

jerqi commented Nov 28, 2022

closed by #276

@jerqi jerqi closed this as completed Nov 28, 2022