add column declarations to dataswarm Glean schema
Summary:
This adds support for column declarations to the Glean schema for the dataswarm indexer, so that we can support goto definition on columns.

We distinguish two cases:
[1] Columns declared in a Hive table's toplevel INSERT SELECT statement
[2] Columns declared in a subquery

Example:
```text
my_task = PrestoInsertOperatorWithSchema(
    output_data={"out": output.table("<TABLE:my_table>")},
    select="""
        WITH foo AS (
            SELECT blah    <--- SubqueryColumnDeclaration(di.my_task, foo, blah)
            FROM table1
        )
        SELECT blah        <--- TableColumnDeclaration(my_table:di, blah)
        FROM foo
    """
)
```

We treat these two cases differently because a Hive table name+namespace is globally unique across the warehouse, so table_name+namespace+column_name is sufficient to uniquely identify a column declaration. By contrast, a subquery name is not globally unique: it is scoped to a given SQL query, so in that case we also need the dataswarm task ID to uniquely identify the declaration.

Reviewed By: iamirzhan

Differential Revision: D67097540

fbshipit-source-id: 0b94b0c86adaca5fc126d1396e7de6e139f1de9c
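
Illustration of the keying distinction in the summary, as a minimal Python sketch rather than the actual schema. The real predicates are declared in Glean's schema language and their field layout is not shown in this commit; the class names `TableColumnKey` and `SubqueryColumnKey`, the field names, the reading of `my_table:di` as table name plus namespace, and the second task ID `di.other_task` are all assumptions made for the example.

```python
from dataclasses import dataclass


# Hypothetical stand-in for the key of TableColumnDeclaration:
# table name + namespace is assumed globally unique in the warehouse.
@dataclass(frozen=True)
class TableColumnKey:
    table_name: str
    namespace: str
    column_name: str


# Hypothetical stand-in for the key of SubqueryColumnDeclaration:
# a subquery name is only unique within one query, so the task ID is included.
@dataclass(frozen=True)
class SubqueryColumnKey:
    task_id: str
    subquery_name: str
    column_name: str


# Two different tasks that each define a subquery `foo` with a column `blah`.
a = SubqueryColumnKey("di.my_task", "foo", "blah")
b = SubqueryColumnKey("di.other_task", "foo", "blah")

# Without the task ID, the two declarations would be indistinguishable.
collide = (a.subquery_name, a.column_name) == (b.subquery_name, b.column_name)

# With the task ID, they are distinct keys.
distinct = a != b

# A table column needs no task ID under the uniqueness assumption above.
table_key = TableColumnKey("my_table", "di", "blah")
```

Here `collide` and `distinct` both evaluate to true: the subquery name and column name alone cannot tell the two declarations apart, while the full key can.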