Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix Azure List by Prefix Bug #42713

Conversation

original-brownbear
Copy link
Member

@original-brownbear original-brownbear commented May 30, 2019

  • This check is necessary to make Azure more resilient. Without the type check listings that contain directories as well as blobs will throw ClassCastExceptions because they will show up in the listing as CloudBlobDirectory
  • I would suggest to back port this to 6.x since it may have implications for cloud resiliency

* This check is necessary to make Azure more resilient. Without the type check listings that contain directories as well as blobs will throw `ClassCastException`s
@elasticmachine
Copy link
Collaborator

Pinging @elastic/es-distributed

Copy link
Contributor

@ywelsch ywelsch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm fine with the change, but I want to better understand what types of problems that this is addressing. Is this addressing the case where a user is manually adding different types of blobs (append or page blobs) to existing directories of the ES repo?

final String name = blobPath.substring(keyPath.length());
logger.trace(() -> new ParameterizedMessage("blob url [{}], name [{}], size [{}]", uri, name, properties.getLength()));
blobsBuilder.put(name, new PlainBlobMetaData(name, properties.getLength()));
if (blobItem instanceof CloudBlockBlob) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should we use the more generic CloudBlob here? Do you know under which conditions this is returning CloudBlobDirectory?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should we use the more generic CloudBlob here?

Can't do it, we need to get the size off of the object and the generic blob doesn't have that on its interface.

Do you know under which conditions this is returning CloudBlobDirectory?

Sure, simple:

Whenever we do a list by prefix and it catches a directory we error out. E.g. if someone accidentally created a directory in a shard dir (I know it's a little far fetched for someone to manually created a dir somewhere but: 1. still we shouldn't completely shut down if they did. 2. Maybe it's not that far fetched given that people could be messing around with the nested dir setting of their repos).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can't do it, we need to get the size off of the object and the generic blob doesn't have that on its interface.

The getProperties() method is on CloudBlob, not CloudBlockBlob.

Note that we should also make the listBlobsByPrefix method consistent (and have tests for it) across all implementations. AFAICS, for HDFS it also returns directories...

Perhaps better to keep this in #42653 and add the tests there.

Copy link
Member Author

@original-brownbear original-brownbear May 31, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah sorry my bad, I mixed up the kinds of blobs :D Sure let's keep it in #42653 then (that one already has tests for this scenario :) that's how I found this :P). I'll weaken the cast there then + make the tests a little more explicit for this case -> closing here :)

@original-brownbear
Copy link
Member Author

Is this addressing the case where a user is manually adding different types of blobs (append or page blobs) to existing directories of the ES repo?

Not really, those cases are either ignored currently or cleaned up in the shard data dirs. This handles the case where users would manually add directories named/positioned in a way that would have them get picked up by one of our listing calls.

final String name = blobPath.substring(keyPath.length());
logger.trace(() -> new ParameterizedMessage("blob url [{}], name [{}], size [{}]", uri, name, properties.getLength()));
blobsBuilder.put(name, new PlainBlobMetaData(name, properties.getLength()));
if (blobItem instanceof CloudBlockBlob) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can't do it, we need to get the size off of the object and the generic blob doesn't have that on its interface.

The getProperties() method is on CloudBlob, not CloudBlockBlob.

Note that we should also make the listBlobsByPrefix method consistent (and have tests for it) across all implementations. AFAICS, for HDFS it also returns directories...

Perhaps better to keep this in #42653 and add the tests there.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants