Fix check_access_control_on_utilized_columns_only for CTE #24647

Merged
merged 1 commit into from
Mar 7, 2025

@rschlussel rschlussel commented Feb 27, 2025

Description

Previously, we weren't associating a WITH query with the columns used from it in the main part of the query, so we wouldn't consider any of its columns as used. When the usage within the CTE or outside of it was just an identifier, the utilized column was collected anyway, because the column collected in the main query kept its source table information. However, if there was an expression in between, that information was lost, and we wouldn't check access control for a column that was used in an expression in the CTE.

Example:

For the following query

with cte as (select  x as c1, z + 1 as c2 from t13) select c1, c2 from (select * from cte)

Previously we would only check access permissions on t13.x, but not t13.z. With this change we will check column access on both t13.x and t13.z.
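
As a rough illustration of the intended behavior (a toy model, not the analyzer code in this PR), the sketch below maps each CTE output column to the base-table columns it is computed from and expands the columns used by the outer query. The class name `CteUtilizedColumns`, the method `utilizedBaseColumns`, and the string-based lineage map are all hypothetical.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

class CteUtilizedColumns {
    // Hypothetical model: each CTE output column maps to the base-table
    // columns it is computed from (an expression such as z + 1 depends on t13.z).
    static Set<String> utilizedBaseColumns(Map<String, Set<String>> cteLineage, List<String> usedCteColumns) {
        Set<String> result = new TreeSet<>();
        for (String column : usedCteColumns) {
            // Follow the CTE column back to its source columns instead of dropping it.
            result.addAll(cteLineage.getOrDefault(column, Set.of()));
        }
        return result;
    }

    public static void main(String[] args) {
        // with cte as (select x as c1, z + 1 as c2 from t13) select c1, c2 from (select * from cte)
        Map<String, Set<String>> lineage = new HashMap<>();
        lineage.put("c1", Set.of("t13.x"));
        lineage.put("c2", Set.of("t13.z"));
        System.out.println(utilizedBaseColumns(lineage, List.of("c1", "c2"))); // prints [t13.x, t13.z]
    }
}
```

In this model the bug corresponds to having no lineage entry for `c2`, so `t13.z` would never reach the access check.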

Motivation and Context

Fix a bug

Impact

Fixes a security hole where we were not checking access for some columns used in a query when the reference is within a with clause.

Test Plan

unit test

Release Notes

Please follow release notes guidelines and fill in the release notes below.

== RELEASE NOTES ==

General Changes
* Fix a security bug when ``check_access_control_on_utilized_columns_only`` is true for queries that use a ``WITH`` clause. Previously we would sometimes not check permissions for certain columns that were used in the query. Now we will always check permissions for all columns used in the query. There are some corner cases for CTEs with the same name where we may check more columns than are used or fall back to checking all columns referenced in the query.

@rschlussel

I'm realizing I didn't test the following cases, and I'm not sure they will work properly:

  1. 2 with clauses and one refers to the other, e.g. with cte1 as (SELECT x as c1, z+1 as c2 from t13), cte2 as (SELECT c1+1 as a, c2 as b from cte1) SELECT * from cte2. Not sure what order the with clauses show up in the list, but they may actually need to be evaluated in reverse order
  2. 2 ctes with the same name (e.g. nested/within different scopes). I think to handle this we'd need a different way to identify the source table that takes into account the scope. If it's too complex IMO it's okay that we don't handle this and just fall back to regular, but failing the checkState is probably not the clearest way to do that. And ideally we'd add a query warning or something so people know that happened.
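
To make the two cases concrete, here is a minimal sketch under stated assumptions; it is not the code in this PR. It assumes a CTE can only reference CTEs declared before it, so walking the WITH list in reverse declaration order resolves references from `cte2` into `cte1` before `cte1` itself is expanded. Duplicate names are treated as ambiguous and signal a fallback to checking all columns. `ChainedCteColumns`, `Cte`, and `resolve` are hypothetical names, and columns are modeled as plain `relation.column` strings.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.Set;
import java.util.TreeSet;

class ChainedCteColumns {
    // Hypothetical model of one WITH entry: output column -> "relation.column" inputs.
    static final class Cte {
        final String name;
        final Map<String, List<String>> inputs;

        Cte(String name, Map<String, List<String>> inputs) {
            this.name = name;
            this.inputs = inputs;
        }
    }

    // Returns the base columns to check, or empty when CTE names collide,
    // in which case the caller would fall back to checking all columns.
    static Optional<Set<String>> resolve(List<Cte> ctes, Set<String> usedInMainQuery) {
        Set<String> seen = new HashSet<>();
        for (Cte cte : ctes) {
            if (!seen.add(cte.name)) {
                return Optional.empty(); // same name twice: a name-only lookup is ambiguous
            }
        }
        Set<String> pending = new TreeSet<>(usedInMainQuery);
        // Reverse declaration order: later CTEs are expanded before the ones they reference.
        for (int i = ctes.size() - 1; i >= 0; i--) {
            Cte cte = ctes.get(i);
            String prefix = cte.name + ".";
            Set<String> next = new TreeSet<>();
            for (String ref : pending) {
                if (ref.startsWith(prefix)) {
                    next.addAll(cte.inputs.getOrDefault(ref.substring(prefix.length()), List.of()));
                }
                else {
                    next.add(ref);
                }
            }
            pending = next;
        }
        return Optional.of(pending);
    }

    public static void main(String[] args) {
        Cte cte1 = new Cte("cte1", Map.of("c1", List.of("t13.x"), "c2", List.of("t13.z")));
        Cte cte2 = new Cte("cte2", Map.of("a", List.of("cte1.c1"), "b", List.of("cte1.c2")));
        System.out.println(resolve(List.of(cte1, cte2), Set.of("cte2.a", "cte2.b"))); // Optional[[t13.x, t13.z]]
        System.out.println(resolve(List.of(cte1, cte1), Set.of("cte1.c1")));           // Optional.empty
    }
}
```

Walking the same list in declaration order would leave `cte1.c1` and `cte1.c2` unresolved, because `cte1` would be visited before anything refers to it.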

@rschlussel rschlussel force-pushed the fix-utilized-columns branch 2 times, most recently from facc558 to 3879ae5 Compare February 28, 2025 00:41
@rschlussel

I've updated this to handle multiple CTEs that reference each other, and added more explicit handling for falling back to checking all columns when multiple CTEs have the same name. I also added a warning for queries that fall back to checking all columns so that it's easier to see when it's happening. Since this is a security vulnerability, I'd rather get the fix in quickly than invest in handling that corner case.

@@ -526,7 +526,10 @@ public void copyFieldIdsToExploreForWithQuery(WithQuery withQuery)
List<RelationId> relationIds = fieldsToExplore.keySet().stream()
.filter(key -> key.getSourceNode() instanceof Table && ((Table) key.getSourceNode()).getName().equals(name))
.collect(toImmutableList());
if (relationIds.size() != 1) {
if (relationIds.isEmpty()) {

In what cases would relationIds be empty? Is that when the CTE doesn't actually get used in the query, or are there other cases as well? Can you add tests?

@facebook-github-bot

@rschlussel has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

kevintang2022
kevintang2022 previously approved these changes Mar 6, 2025
Comment on lines +552 to +553
// TODO(kevintang2022): Fix visitWithQuery so that it looks up relation properly.
// Currently, it just looks the relations using the name (string) of the CTE

Added a TODO for follow-up. The name conflict is caused by using only the name for the lookup. We might need to look up with more information, or add a unique identifier to each column.
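
One possible shape for that follow-up, purely as a sketch: key each CTE by its name plus an identifier for the scope that declares it, so two CTEs named the same in different scopes stay distinct. Nothing here is part of this PR; `ScopedCteLookup`, the `{scopeId, cteName}` pairs, and the `scopeId:name` key format are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

class ScopedCteLookup {
    // Each declaration is {scopeId, cteName}. Keying by name alone lets a
    // nested CTE overwrite an outer CTE with the same name.
    static int entriesByName(String[][] declarations) {
        Map<String, String[]> index = new HashMap<>();
        for (String[] declaration : declarations) {
            index.put(declaration[1], declaration);
        }
        return index.size();
    }

    // Hypothetical alternative: include the declaring scope in the key.
    static int entriesByScopedKey(String[][] declarations) {
        Map<String, String[]> index = new HashMap<>();
        for (String[] declaration : declarations) {
            index.put(declaration[0] + ":" + declaration[1], declaration);
        }
        return index.size();
    }

    public static void main(String[] args) {
        String[][] declarations = {{"0", "cte"}, {"1", "cte"}};
        System.out.println(entriesByName(declarations));      // prints 1
        System.out.println(entriesByScopedKey(declarations)); // prints 2
    }
}
```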

@kevintang2022 kevintang2022 force-pushed the fix-utilized-columns branch from 72b6a4b to bc4566d Compare March 6, 2025 18:14
Previously we weren't associating the with query with the columns used
from it in the main part of the query, so we wouldn't consider any
columns as used.  When the usage within the cte or outside of it was
just an identifier, we would have the utilized column collected anyway
because the column collected in the main query kept the source table
information. However, if there was an expression in between, we would
lose that information, and we wouldn't check access control for the
column that was used in an expression in the cte.

Example:

For the following query
```
with cte as (select  x as c1, z + 1 as c2 from t13) select c1, c2 from (select * from cte)
```

Previously we would only check access permissions on t13.x, but not
t13.z.  With this change we will check column access on both t13.x and
t13.z.

Fix empty relations case

Add error message to utilized columns warning.

handle ctes used multiple times

fixup >1 reference support

Add tests for multiple cte with same name

Co-authored-by: Kevin Tang <tangk@meta.com>

Add comments clarification and test for multiple same name for multiple columns
@kevintang2022 kevintang2022 force-pushed the fix-utilized-columns branch from c4cceec to 30bc51f Compare March 6, 2025 19:26

@rschlussel rschlussel left a comment

looks good, thanks! let's find someone else to review. (i can't approve because it's my PR)


@jaystarshot jaystarshot left a comment

Please add a Release note

@rschlussel rschlussel merged commit 95845fb into prestodb:master Mar 7, 2025
53 checks passed