iceberg caching catalog refresh table failed #55828
@eshishki how do you reproduce this case? or how can I reproduce this case? |
@dirtysalt create iceberg connector with caching catalog |
Yeah, I can reproduce that. I almost know what happens.
And I think the reason is (according to gpt, not sure if it's correct): the partitions metadata table scan only supports the latest snapshot id. |
@eshishki I think I have to close this PR. This failure is expected. The reason it happens is that there could be a diff between the cached snapshot id (A) and the latest snapshot id (B).
When you update the iceberg table from outside StarRocks (like Spark), B gets updated, but A does not (until the cache expires). Eventually the snapshot ids will be the same. However, here we want the partitions of (A), not (B), so we should pass snapshot_id=A explicitly. And if we don't pass snapshot_id=A, this function will return the partitions of snapshot_id=B. That's not what we expect. |
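The divergence described above can be sketched in plain Java (no Iceberg dependency). This is a minimal simulation of a caching catalog, not the actual StarRocks implementation; all names here are hypothetical:

```java
import java.util.HashMap;
import java.util.Map;

public class CachingCatalogSketch {
    // remote catalog: table name -> current snapshot id
    static final Map<String, Long> remote = new HashMap<>();
    // local cache: table name -> snapshot id seen when the entry was loaded
    static final Map<String, Long> cache = new HashMap<>();

    static Long cachedSnapshot(String table) {
        // load from the remote catalog only on a cache miss;
        // a stale entry survives until refresh or expiry
        return cache.computeIfAbsent(table, remote::get);
    }

    static void refresh(String table) {
        cache.put(table, remote.get(table));
    }

    public static void main(String[] args) {
        remote.put("db.tbl", 1L);           // snapshot A committed
        long a = cachedSnapshot("db.tbl");  // cache now holds A

        remote.put("db.tbl", 2L);           // Spark commits snapshot B
        long stale = cachedSnapshot("db.tbl");
        System.out.println(a == stale);     // true: cache still at A, remote at B

        refresh("db.tbl");                  // eventually the ids converge
        System.out.println(cachedSnapshot("db.tbl") == 2L); // true
    }
}
```

The point of the sketch: between the external commit and the next refresh, any code path that pairs the cached id (A) with the remote table state (B) sees two different snapshots.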
no, the issue is that it does not support snapshot scan in metascan at all |
yes. Another way is to check whether the snapshot id is the latest, to avoid this exception. But this exception does no harm: we want the partitions of snapshot id A, but the latest snapshot is B, so we should not get partitions. |
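The guard suggested in this comment could look roughly like the following. This is an illustrative sketch, not the StarRocks or Iceberg API; `partitionsFor` and both parameters are invented names:

```java
import java.util.Optional;

public class SnapshotGuard {
    // Only ask the partitions metadata table for the requested snapshot when it
    // is still the latest one; otherwise skip, since (per this thread) the
    // metadata table scan only serves the latest snapshot.
    static Optional<Long> partitionsFor(long requestedSnapshotId, long latestSnapshotId) {
        if (requestedSnapshotId != latestSnapshotId) {
            // stale request: the metadata scan would throw
            // "Cannot find snapshot with ID", so return empty instead
            return Optional.empty();
        }
        return Optional.of(requestedSnapshotId); // safe to proceed with the scan
    }

    public static void main(String[] args) {
        System.out.println(partitionsFor(1L, 1L).isPresent()); // true
        System.out.println(partitionsFor(1L, 2L).isPresent()); // false
    }
}
```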
no, during refresh catalog we resolve the new table snapshot, we request the table partitions of this new snapshot, and fail here, so we don't cache the table and partition names |
let me check with my teammate; I guess it may not be correct here, since this snapshot id is still the old one. Maybe we have to use if (!currentLocation.equals(updateLocation)) {
LOG.info("Refresh iceberg caching catalog table {}.{} from {} to {}",
dbName, tableName, currentLocation, updateLocation);
long baseSnapshotId = currentOps.current().currentSnapshot().snapshotId(); //<---- HERE
refreshTable(updateTable, baseSnapshotId, dbName, tableName, executorService);
LOG.info("Finished to refresh iceberg table {}.{}", dbName, tableName);
} |
no, we call it with that snapshot id. If you wanted just to get the latest partitions and disregard the snapshot, ignore that snapshot id |
guys, if I say "I want snapshot id = A partitions" explicitly, here what I want to say is
|
I believe you should use the old function https://github.com/StarRocks/starrocks/pull/53007/files#diff-14a0f2a038313d660d400d66ff60bf2546a7c61967a7ba6070c6f9706b8bc414L102 |
I discarded it for performance reasons. The only reason I need partitions is when there is a materialized view relying on the iceberg partitions mapping, and that can tolerate some delay in refreshing partitions. So in the normal scan case, partitions are not needed any more. I get what you mean.
I'll review this design stuff. |
The issue seems to be introduced by #53007
on table refresh we get this stacktrace
Cannot find snapshot with ID
is a message from iceberg SnapshotScan.useSnapshot. I verified that the snapshot does exist, but the error is real too
so the issue seems to be that we try to set this snapshot id on the metadata table scan, which is not the same table as the base table
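The mismatch described here can be simulated in a few lines of plain Java. Under the thread's assumption that the metadata table scan does not know the base table's snapshot ids, looking up a base-table snapshot id against it fails with exactly this message. `useSnapshot` here is a stand-in, not the real Iceberg method:

```java
import java.util.Set;

public class MetadataScanSketch {
    // Simulated snapshot lookup: mirrors the error text quoted in this issue.
    static void useSnapshot(Set<Long> knownSnapshots, long snapshotId) {
        if (!knownSnapshots.contains(snapshotId)) {
            throw new IllegalArgumentException("Cannot find snapshot with ID " + snapshotId);
        }
    }

    public static void main(String[] args) {
        Set<Long> baseTableSnapshots = Set.of(1L, 2L);
        Set<Long> metadataScanSnapshots = Set.of();   // assumed: different lineage
        useSnapshot(baseTableSnapshots, 2L);          // fine against the base table
        try {
            useSnapshot(metadataScanSnapshots, 2L);   // fails against the metadata scan
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

So the snapshot genuinely exists on the base table, and the error is genuinely raised, which matches both observations above.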
#53007 (comment)