fix(ingestion/lookml): liquid template resolution and view-to-view cll #10542
metadata-ingestion/src/datahub/ingestion/source/looker/lookml_sql_parser.py
…hub-fork into master+ing-510-lookml-cll
metadata-ingestion/src/datahub/ingestion/source/looker/lookml_source.py
metadata-ingestion/src/datahub/ingestion/source/looker/lookml_sql_parser.py
metadata-ingestion/src/datahub/ingestion/source/looker/lookml_sql_parser.py
metadata-ingestion/src/datahub/ingestion/source/looker/lookml_source.py
…hub-fork into master+ing-510-lookml-cll
Actionable comments posted: 13
Outside diff range and nitpick comments (4)
metadata-ingestion/src/datahub/ingestion/source/looker/looker_common.py (1)
Line range hint 1349-1349: Remove use of lru_cache on methods.
Using functools.lru_cache on methods can lead to memory leaks. Consider using an alternative caching mechanism.
- @lru_cache(maxsize=200)
+ # @lru_cache(maxsize=200)
metadata-ingestion/src/datahub/ingestion/source/looker/looker_source.py (2)
Line range hint 385-385: Optimize dictionary key check.
Use key in dict instead of key in dict.keys().
- for field in filters.keys():
+ for field in filters:
Line range hint 1260-1264: Refactor nested if statements.
Use a single if statement instead of nested if statements.
- if dashboard is None and dashboard_element is not None:
-     ownership = self.get_ownership(dashboard_element)
-     if ownership is not None:
-         chart_snapshot.aspects.append(ownership)
+ if dashboard is None and dashboard_element is not None and (ownership := self.get_ownership(dashboard_element)) is not None:
+     chart_snapshot.aspects.append(ownership)
metadata-ingestion/tests/integration/lookml/lookml_mces_badsql_parser.json (1)
Line range hint 1405-1485: Ensure completeness of field definitions.
The field country is mentioned in the view logic but not defined in the schema metadata, which could lead to an incomplete metadata representation. Ensure that all fields used in the view logic are defined in the schema metadata:
{
  "fieldPath": "country",
  "nullable": false,
  "description": "Country",
  "label": "",
  "type": { "type": { "com.linkedin.pegasus2avro.schema.StringType": {} } },
  "nativeDataType": "string",
  "recursive": false,
  "globalTags": { "tags": [] },
  "isPartOfKey": false
}
Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Files selected for processing (41)
- metadata-ingestion/setup.py (2 hunks)
- metadata-ingestion/src/datahub/ingestion/source/looker/looker_common.py (18 hunks)
- metadata-ingestion/src/datahub/ingestion/source/looker/looker_connection.py (1 hunks)
- metadata-ingestion/src/datahub/ingestion/source/looker/looker_dataclasses.py (1 hunks)
- metadata-ingestion/src/datahub/ingestion/source/looker/looker_file_loader.py (1 hunks)
- metadata-ingestion/src/datahub/ingestion/source/looker/looker_liquid_tag.py (1 hunks)
- metadata-ingestion/src/datahub/ingestion/source/looker/looker_source.py (1 hunks)
- metadata-ingestion/src/datahub/ingestion/source/looker/looker_template_language.py (1 hunks)
- metadata-ingestion/src/datahub/ingestion/source/looker/lookml_concept_context.py (1 hunks)
- metadata-ingestion/src/datahub/ingestion/source/looker/lookml_config.py (1 hunks)
- metadata-ingestion/src/datahub/ingestion/source/looker/lookml_refinement.py (1 hunks)
- metadata-ingestion/src/datahub/ingestion/source/looker/lookml_resolver.py (1 hunks)
- metadata-ingestion/src/datahub/ingestion/source/looker/lookml_source.py (21 hunks)
- metadata-ingestion/src/datahub/ingestion/source/looker/str_functions.py (1 hunks)
- metadata-ingestion/src/datahub/ingestion/source/looker/urn_functions.py (1 hunks)
- metadata-ingestion/src/datahub/ingestion/source/looker/view_upstream.py (1 hunks)
- metadata-ingestion/tests/integration/looker/test_looker.py (1 hunks)
- metadata-ingestion/tests/integration/lookml/duplicate_field_ingestion_golden.json (6 hunks)
- metadata-ingestion/tests/integration/lookml/expected_output.json (19 hunks)
- metadata-ingestion/tests/integration/lookml/lkml_samples/liquid.view.lkml (1 hunks)
- metadata-ingestion/tests/integration/lookml/lkml_samples/nested/fragment_derived.view.lkml (1 hunks)
- metadata-ingestion/tests/integration/lookml/lkml_samples_hive/included_view_file.view.lkml (1 hunks)
- metadata-ingestion/tests/integration/lookml/lkml_samples_hive/liquid.view.lkml (1 hunks)
- metadata-ingestion/tests/integration/lookml/lkml_samples_hive/nested/fragment_derived.view.lkml (1 hunks)
- metadata-ingestion/tests/integration/lookml/lookml_mces_api_bigquery.json (9 hunks)
- metadata-ingestion/tests/integration/lookml/lookml_mces_api_hive2.json (9 hunks)
- metadata-ingestion/tests/integration/lookml/lookml_mces_badsql_parser.json (6 hunks)
- metadata-ingestion/tests/integration/lookml/lookml_mces_offline.json (9 hunks)
- metadata-ingestion/tests/integration/lookml/lookml_mces_offline_platform_instance.json (9 hunks)
- metadata-ingestion/tests/integration/lookml/lookml_mces_with_external_urls.json (9 hunks)
- metadata-ingestion/tests/integration/lookml/lookml_reachable_views.json (12 hunks)
- metadata-ingestion/tests/integration/lookml/lookml_same_name_views_different_file_path.json (8 hunks)
- metadata-ingestion/tests/integration/lookml/refinements_ingestion_golden.json (20 hunks)
- metadata-ingestion/tests/integration/lookml/test_lookml.py (4 hunks)
- metadata-ingestion/tests/integration/lookml/vv-lineage-and-liquid-templates/activity_logs.view.lkml (1 hunks)
- metadata-ingestion/tests/integration/lookml/vv-lineage-and-liquid-templates/data.model.lkml (1 hunks)
- metadata-ingestion/tests/integration/lookml/vv-lineage-and-liquid-templates/employee_income_source.view.lkml (1 hunks)
- metadata-ingestion/tests/integration/lookml/vv-lineage-and-liquid-templates/employee_tax_report.view.lkml (1 hunks)
- metadata-ingestion/tests/integration/lookml/vv-lineage-and-liquid-templates/employee_total_income.view.lkml (1 hunks)
- metadata-ingestion/tests/integration/lookml/vv-lineage-and-liquid-templates/top_10_employee_income_source.view.lkml (1 hunks)
- metadata-ingestion/tests/integration/lookml/vv_lineage_liquid_template_golden.json (1 hunks)
Files not summarized due to errors (2)
- metadata-ingestion/src/datahub/ingestion/source/looker/lookml_source.py: Error: Message exceeds token limit
- metadata-ingestion/tests/integration/lookml/vv_lineage_liquid_template_golden.json: Error: Message exceeds token limit
Files not reviewed due to errors (1)
- metadata-ingestion/src/datahub/ingestion/source/looker/lookml_config.py (no review received)
Files skipped from review due to trivial changes (4)
- metadata-ingestion/src/datahub/ingestion/source/looker/str_functions.py
- metadata-ingestion/tests/integration/lookml/lkml_samples/liquid.view.lkml
- metadata-ingestion/tests/integration/lookml/lkml_samples_hive/liquid.view.lkml
- metadata-ingestion/tests/integration/lookml/vv-lineage-and-liquid-templates/data.model.lkml
Additional context used
Ruff
metadata-ingestion/src/datahub/ingestion/source/looker/looker_dataclasses.py
145-152: Use a single if statement instead of nested if statements (SIM102)
metadata-ingestion/tests/integration/lookml/test_lookml.py
719-720: Use a single if statement instead of nested if statements (SIM102)
metadata-ingestion/tests/integration/looker/test_looker.py
490-490: Do not use mutable data structures for argument defaults. Replace with None; initialize within function (B006)
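For illustration, here is a minimal sketch of the B006 fix. collect_fields and collect_fields_buggy are hypothetical helpers, not the function at test_looker.py line 490, which is not shown here. The buggy variant shares one default list across calls because Python evaluates default arguments once, when the def statement runs.

```python
from typing import List, Optional


def collect_fields(extra: Optional[List[str]] = None) -> List[str]:
    # B006 fix: use None as the sentinel and build a fresh list per call.
    if extra is None:
        extra = []
    extra.append("id")
    return extra


def collect_fields_buggy(extra: List[str] = []) -> List[str]:  # noqa: B006
    # Anti-pattern: this default list is created once and reused by every
    # call that omits the argument, so appends accumulate across calls.
    extra.append("id")
    return extra
```

Calling collect_fields() twice returns ["id"] both times, while the second call to collect_fields_buggy() returns ["id", "id"].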
metadata-ingestion/src/datahub/ingestion/source/looker/lookml_source.py
411-414: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling (B904)
632-635: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling (B904)
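The following is a minimal sketch of the B904 pattern. It uses a hypothetical parse_port helper and a local ConfigurationError stand-in, because the code at lines 411-414 and 632-635 is not shown here. Chaining with from err stores the original exception on __cause__, so the traceback reports it as the direct cause rather than as a second failure that happened while the first was being handled.

```python
class ConfigurationError(Exception):
    """Local stand-in for the error type raised by the source (assumption)."""


def parse_port(raw: str) -> int:
    try:
        return int(raw)
    except ValueError as err:
        # `from err` sets __cause__ to the ValueError (B904 fix); use
        # `from None` instead to hide the original exception deliberately.
        raise ConfigurationError(f"Invalid port: {raw!r}") from err
```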
metadata-ingestion/src/datahub/ingestion/source/looker/looker_common.py
1349-1349: Use of functools.lru_cache or functools.cache on methods can lead to memory leaks (B019)
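One common alternative to decorating a method with functools.lru_cache is a per-instance cache, sketched below with a hypothetical FieldResolver class. With a decorated method, the cache is shared at class level and every key includes self, so instances are kept alive for as long as their entries remain cached. Wrapping the bound method in __init__ gives each instance its own cache, which the cycle collector can reclaim together with the instance. This is one option under those assumptions, not necessarily the fix the PR adopted.

```python
import functools


class FieldResolver:
    """Hypothetical example of a per-instance cache (avoids B019)."""

    def __init__(self) -> None:
        self.calls = 0
        # The cache belongs to this instance rather than to the class; the
        # wrapper and the instance form a cycle that Python's GC can collect.
        self.resolve = functools.lru_cache(maxsize=200)(self._resolve)

    def _resolve(self, name: str) -> str:
        self.calls += 1  # counts cache misses
        return name.upper()
```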
metadata-ingestion/src/datahub/ingestion/source/looker/looker_source.py
385-385: Use key in dict instead of key in dict.keys(). Remove .keys() (SIM118)
1260-1264: Use a single if statement instead of nested if statements (SIM102)
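The SIM102 refactor suggested for looker_source.py lines 1260-1264 can be exercised in isolation. The sketch below uses hypothetical stand-ins (ChartSnapshot, get_ownership) for the real chart snapshot and self.get_ownership. It relies on the walrus operator, which requires Python 3.8 or later.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ChartSnapshot:
    aspects: List[str] = field(default_factory=list)


def get_ownership(element: str) -> Optional[str]:
    # Hypothetical stand-in: returns None when no owner can be determined.
    return f"owner-of-{element}" if element.startswith("e") else None


def attach_ownership(
    dashboard: Optional[str], element: Optional[str], snapshot: ChartSnapshot
) -> None:
    # One condition instead of nested ifs. Because `and` short-circuits,
    # get_ownership only runs (and binds `ownership`) when the first two
    # checks pass.
    if (
        dashboard is None
        and element is not None
        and (ownership := get_ownership(element)) is not None
    ):
        snapshot.aspects.append(ownership)
```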
Additional comments not posted (173)
metadata-ingestion/tests/integration/lookml/lkml_samples_hive/included_view_file.view.lkml (1)
2-2: Verify the SQL table name formatting.
Ensure that the SQL table name "looker_schema"."include_able" is correctly formatted and valid in your database.
metadata-ingestion/tests/integration/lookml/lkml_samples/nested/fragment_derived.view.lkml (3)
4-4: Verify the SQL syntax and column name.
Ensure that the column date exists and the alias DATE is correctly used in the SQL query.
5-5: Verify the SQL syntax and column name.
Ensure that the column platform exists and the alias aliased_platform is correctly used in the SQL query.
6-6: Verify the SQL syntax and column name.
Ensure that the column country exists and is correctly used in the SQL query.
metadata-ingestion/tests/integration/lookml/lkml_samples_hive/nested/fragment_derived.view.lkml (3)
4-4: Verify the SQL syntax and column name.
Ensure that the column date exists and the alias DATE is correctly used in the SQL query.
5-5: Verify the SQL syntax and column name.
Ensure that the column platform exists and the alias aliased_platform is correctly used in the SQL query.
6-6: Verify the SQL syntax and column name.
Ensure that the column country exists and is correctly used in the SQL query.
metadata-ingestion/tests/integration/lookml/vv-lineage-and-liquid-templates/employee_tax_report.view.lkml (4)
2-2: Verify the SQL table name formatting.
Ensure that the SQL table name data-warehouse.finance.form-16 is correctly formatted and valid in your database.
4-6: Verify the dimension type and SQL syntax.
Ensure that the dimension id with type number and SQL ${TABLE}.id is correctly defined.
9-11: Verify the dimension type and SQL syntax.
Ensure that the dimension name with type string and SQL ${TABLE}.name is correctly defined.
14-16: Verify the measure type and SQL syntax.
Ensure that the measure taxable_income with type sum and SQL ${TABLE}.tax is correctly defined.
metadata-ingestion/tests/integration/lookml/vv-lineage-and-liquid-templates/employee_total_income.view.lkml (4)
1-3: LGTM! The SQL table name is correctly defined using a liquid template variable.
4-7: LGTM! The dimension id is correctly defined with type number and a SQL expression using a liquid template variable.
9-12: LGTM! The dimension name is correctly defined with type string and a SQL expression using a liquid template variable.
14-17: LGTM! The measure total_income is correctly defined with type sum and a SQL expression using a liquid template variable.
metadata-ingestion/tests/integration/lookml/vv-lineage-and-liquid-templates/top_10_employee_income_source.view.lkml (4)
1-10: LGTM! The derived table is correctly defined using a SQL query with a liquid template variable.
12-15: LGTM! The dimension id is correctly defined with type number and a SQL expression using a liquid template variable.
17-20: LGTM! The dimension name is correctly defined with type string and a SQL expression using a liquid template variable.
22-25: LGTM! The dimension source is correctly defined with type string and a SQL expression using a liquid template variable.
metadata-ingestion/src/datahub/ingestion/source/looker/urn_functions.py (2)
1-11: LGTM! The function get_qualified_table_name correctly handles the URN format and returns the appropriate part of the URN.
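For readers unfamiliar with the URN format, here is a minimal sketch that extracts the qualified table name from a DataHub dataset URN and then takes its last component. The function names are hypothetical. The parsing reflects an assumption about what get_qualified_table_name and get_table_name do, based only on the review descriptions of those functions.

```python
def qualified_name_from_dataset_urn(urn: str) -> str:
    """urn:li:dataset:(urn:li:dataPlatform:<platform>,<name>,<env>) -> <name>"""
    inner = urn[urn.index("(") + 1 : urn.rindex(")")]
    # Drop the platform URN before the first comma and the env after the last.
    return inner.split(",", 1)[1].rsplit(",", 1)[0]


def last_name_part(qualified_name: str) -> str:
    # "db.schema.table" -> "table"
    return qualified_name.split(".")[-1]
```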
13-18: LGTM! The function get_table_name correctly handles the qualified table name and returns the appropriate part of the name.
metadata-ingestion/tests/integration/lookml/vv-lineage-and-liquid-templates/activity_logs.view.lkml (2)
1-10: LGTM! The SQL table name is correctly defined using liquid template variables and conditional logic.
12-17: LGTM! The dimension generated_message_id is correctly defined with a group label, primary key, type, and SQL expression using a liquid template variable.
metadata-ingestion/tests/integration/lookml/vv-lineage-and-liquid-templates/employee_income_source.view.lkml (3)
1-1: Add a description for the view.
It's good practice to add a description for the view to improve readability and maintainability.
+ description: "This view represents employee income source data."
6-12: Ensure proper handling of SQL injection.
Using liquid template tags in SQL queries can introduce SQL injection vulnerabilities. Ensure that the values used in these tags are properly sanitized.
Do you have measures in place to sanitize the values used in these liquid template tags?
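Because the question above concerns values spliced into SQL by liquid tags, here is a minimal sketch of how a Looker-style {% condition filter %} expression {% endcondition %} tag can turn a supplied value into a SQL predicate, with single quotes doubled. It uses a plain regular expression, not python-liquid or the PR's ConditionTag class. render_condition_tags is hypothetical, and both the escaping rule and the 1=1 fallback for a missing value are assumptions of this sketch.

```python
import re
from typing import Dict

# {% condition <filter_name> %} <sql_expression> {% endcondition %}
_CONDITION_RE = re.compile(
    r"\{%\s*condition\s+(\w+)\s*%\}\s*(.*?)\s*\{%\s*endcondition\s*%\}",
    re.DOTALL,
)


def render_condition_tags(text: str, variables: Dict[str, str]) -> str:
    def _render(match: "re.Match[str]") -> str:
        filter_name, expression = match.group(1), match.group(2)
        value = variables.get(filter_name)
        if value is None:
            return "1=1"  # assumption: no value supplied means "no filter"
        escaped = value.replace("'", "''")  # SQL string-literal escaping
        return f"{expression}='{escaped}'"

    return _CONDITION_RE.sub(_render, text)
```

With order_region set to 'ap-south-1', the tag around order.region renders as order.region='ap-south-1'.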
16-16: Verify the custom condition tag implementation.
Ensure that the custom condition tag used here is correctly implemented and tested. Is the custom condition tag implementation tested and verified for correctness?
metadata-ingestion/src/datahub/ingestion/source/looker/looker_liquid_tag.py (2)
14-17: Add a docstring for the CustomTagException class.
Adding a docstring will improve code readability and maintainability.
 class CustomTagException(Exception):
+    """
+    Exception raised for errors in the custom tag processing.
+
+    Attributes:
+        message -- explanation of the error
+    """
45-56: Improve the docstring for the ConditionTag class.
Clarify the usage of the ConditionTag class and provide examples.
 """
 ConditionTag is the equivalent implementation of Looker's custom liquid tag "condition".
 Refer doc: https://cloud.google.com/looker/docs/templated-filters#basic_usage
 Refer doc to see how to write liquid custom tag: https://jg-rp.github.io/liquid/guides/custom-tags
 This class renders the below tag as order.region='ap-south-1' if order_region is provided in config.liquid_variables as order_region: 'ap-south-1'
 {% condition order_region %} order.region {% endcondition %}
+Usage example:
+    {% condition order_region %} order.region {% endcondition %}
 """
metadata-ingestion/src/datahub/ingestion/source/looker/looker_connection.py (2)
42-64: Add a docstring for the LookerConnectionDefinition class.
Adding a docstring will improve code readability and maintainability.
 class LookerConnectionDefinition(ConfigModel):
+    """
+    Represents a Looker connection definition.
+
+    Attributes:
+        platform -- the platform name
+        default_db -- the default database name
+        default_schema -- the default schema name (optional)
+        platform_instance -- the platform instance name (optional)
+        platform_env -- the environment that the platform is located in (optional)
+    """
75-85: Improve error handling in the from_looker_connection method.
Ensure that the method handles missing dialect names gracefully.
if looker_connection.dialect_name is None:
    raise ConfigurationError(
        f"Unable to fetch a fully filled out connection for {looker_connection.name}. Please check your API permissions."
    )
for extractor_pattern, extracting_function in extractors.items():
    if re.match(extractor_pattern, looker_connection.dialect_name):
        (platform, db, schema) = extracting_function(looker_connection)
        return cls(platform=platform, default_db=db, default_schema=schema)
raise ConfigurationError(
    f"Could not find an appropriate platform for looker_connection: {looker_connection.name} with dialect: {looker_connection.dialect_name}"
)
Likely invalid or redundant comment.
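The dispatch in from_looker_connection matches the connection's dialect name against regex keys in order. A self-contained sketch of that pattern follows. The EXTRACTORS mapping, the dict-based connection, and the field names it reads are hypothetical, and ConfigurationError is a local stand-in.

```python
import re
from typing import Callable, Dict, Optional, Tuple

# (platform, default_db, default_schema)
Extracted = Tuple[str, str, Optional[str]]


class ConfigurationError(Exception):
    """Local stand-in for the configuration error raised by the source."""


# Hypothetical extractors keyed by a regex on the dialect name; the real
# mapping and the connection fields it reads may differ.
EXTRACTORS: Dict[str, Callable[[Dict[str, str]], Extracted]] = {
    r"^snowflake": lambda c: ("snowflake", c["database"], c.get("schema")),
    r"^bigquery": lambda c: ("bigquery", c["host"], None),
}


def resolve_platform(connection: Dict[str, str]) -> Extracted:
    dialect = connection.get("dialect_name")
    if dialect is None:
        # Fail early with a clear message instead of a TypeError from re.match.
        raise ConfigurationError(
            f"Unable to fetch a fully filled out connection for {connection['name']}."
        )
    # Dicts preserve insertion order, so the first matching pattern wins.
    for pattern, extract in EXTRACTORS.items():
        if re.match(pattern, dialect):
            return extract(connection)
    raise ConfigurationError(
        f"Could not find an appropriate platform for {connection['name']} with dialect: {dialect}"
    )
```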
metadata-ingestion/src/datahub/ingestion/source/looker/looker_file_loader.py (2)
40-42: Add a docstring for the is_view_seen method.
Adding a docstring will improve code readability and maintainability.
 def is_view_seen(self, path: str) -> bool:
+    """
+    Checks if the view file at the given path has already been loaded.
+
+    Args:
+        path: The path to the view file.
+
+    Returns:
+        True if the view file has been loaded, False otherwise.
+    """
     return path in self.viewfile_cache
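is_view_seen is a membership test on viewfile_cache. The minimal sketch below uses assumed names (ViewFile, ViewFileLoader). It shows how a path-keyed cache can parse each file once and hand out per-connection copies through dataclasses.replace, so the cached object itself is never mutated.

```python
from dataclasses import dataclass, replace
from typing import Dict, Optional


@dataclass(frozen=True)
class ViewFile:
    path: str
    connection: Optional[str] = None


class ViewFileLoader:
    def __init__(self) -> None:
        self.viewfile_cache: Dict[str, ViewFile] = {}
        self.parse_count = 0

    def is_view_seen(self, path: str) -> bool:
        return path in self.viewfile_cache

    def _load(self, path: str) -> ViewFile:
        if not self.is_view_seen(path):
            self.parse_count += 1  # stands in for the expensive file parse
            self.viewfile_cache[path] = ViewFile(path=path)
        return self.viewfile_cache[path]

    def load(self, path: str, connection: Optional[str]) -> ViewFile:
        # replace() returns a new instance; the cached entry keeps connection=None.
        return replace(self._load(path), connection=connection)
```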
98-113: Add a docstring for the load_viewfile method.
Adding a docstring will improve code readability and maintainability.
 def load_viewfile(
     self,
     path: str,
     project_name: str,
     connection: Optional[LookerConnectionDefinition],
     reporter: LookMLSourceReport,
 ) -> Optional[LookerViewFile]:
+    """
+    Loads the Looker view file at the given path, resolves liquid variables, and caches the result.
+
+    Args:
+        path: The path to the view file.
+        project_name: The name of the project.
+        connection: The Looker connection definition.
+        reporter: The source report for logging and error reporting.
+
+    Returns:
+        The loaded LookerViewFile object, or None if loading failed.
+    """
     viewfile = self._load_viewfile(
         project_name=project_name,
         path=path,
         reporter=reporter,
     )
     if viewfile is None:
         return None
     return replace(viewfile, connection=connection)
metadata-ingestion/src/datahub/ingestion/source/looker/looker_template_language.py (6)
19-23: Add type hints to the function.
Type hints improve code readability and help catch type-related errors early.
- def create_nested_dict(keys, value):
+ def create_nested_dict(keys: List[str], value: Any) -> Dict[str, Any]:
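The name and signature of create_nested_dict suggest that it turns a key path into nested dictionaries, for example so that a dotted liquid variable such as _user_attributes.dev_prefix can be looked up as a nested mapping. That behaviour is an assumption. build_nested_dict below is a hypothetical sketch of it, and dev_prefix is an invented attribute name.

```python
from typing import Any, Dict, List


def build_nested_dict(keys: List[str], value: Any) -> Dict[str, Any]:
    """Turn ["a", "b", "c"] and v into {"a": {"b": {"c": v}}}."""
    if not keys:
        raise ValueError("keys must not be empty")
    nested: Dict[str, Any] = {keys[-1]: value}
    # Wrap from the innermost key outward.
    for key in reversed(keys[:-1]):
        nested = {key: nested}
    return nested
```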
26-34: Add type hints to the class methods.
Type hints improve code readability and help catch type-related errors early.
- def __init__(self, liquid_variable):
+ def __init__(self, liquid_variable: Dict[str, Any]):
35-60: Add type hints to the method _create_new_liquid_variables_with_default.
Type hints improve code readability and help catch type-related errors early.
- def _create_new_liquid_variables_with_default(self, variables: Set[str]) -> dict:
+ def _create_new_liquid_variables_with_default(self, variables: Set[str]) -> Dict[str, Any]:
62-74: Add type hints to the method liquid_variable_with_default.
Type hints improve code readability and help catch type-related errors early.
- def liquid_variable_with_default(self, text: str) -> dict:
+ def liquid_variable_with_default(self, text: str) -> Dict[str, Any]:
77-101: Add type hints to the function resolve_liquid_variable.
Type hints improve code readability and help catch type-related errors early.
- def resolve_liquid_variable(text: str, liquid_variable: Dict[Any, Any]) -> str:
+ def resolve_liquid_variable(text: str, liquid_variable: Dict[str, Any]) -> str:
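The sketch below shows one way to fill in defaults for variables that a template references but the user did not set, and then substitute {{ name }} references, without overwriting any supplied value. It uses a regular expression instead of a Liquid engine, ignores filters and tags, and treats "NULL" as a placeholder default. It is not the PR's implementation, which presumably renders through python-liquid.

```python
import copy
import re
from typing import Any, Dict

# {{ name }} or {{ a.b.c }}, optionally followed by "| filter ..." (ignored here)
_VAR_RE = re.compile(r"\{\{\s*([A-Za-z_][\w.]*)\s*(?:\|[^}]*)?\}\}")


def add_defaults(text: str, variables: Dict[str, Any], default: Any = "NULL") -> Dict[str, Any]:
    """Copy `variables`, adding `default` for referenced names that are unset."""
    merged = copy.deepcopy(variables)
    for name in _VAR_RE.findall(text):
        *parents, leaf = name.split(".")
        node: Any = merged
        for part in parents:
            node = node.setdefault(part, {})
            if not isinstance(node, dict):
                break  # a scalar is already set at this level; leave it alone
        else:
            node.setdefault(leaf, default)  # never overwrites a supplied value
    return merged


def _lookup(variables: Dict[str, Any], dotted: str) -> Any:
    node: Any = variables
    for part in dotted.split("."):
        if not isinstance(node, dict) or part not in node:
            return None
        node = node[part]
    return node


def resolve(text: str, variables: Dict[str, Any]) -> str:
    """Replace each {{ ... }} reference with its (possibly defaulted) value."""
    merged = add_defaults(text, variables)
    return _VAR_RE.sub(lambda m: str(_lookup(merged, m.group(1))), text)
```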
104-122: Add type hints to the function resolve_liquid_variable_in_view_dict.
Type hints improve code readability and help catch type-related errors early.
- def resolve_liquid_variable_in_view_dict(raw_view: dict, liquid_variable: Dict[Any, Any]) -> None:
+ def resolve_liquid_variable_in_view_dict(raw_view: Dict[str, Any], liquid_variable: Dict[str, Any]) -> None:
metadata-ingestion/src/datahub/ingestion/source/looker/lookml_resolver.py (7)
25-29: Add type hints to the function is_derived_view.
Type hints improve code readability and help catch type-related errors early.
- def is_derived_view(view_name: str) -> bool:
+ def is_derived_view(view_name: str) -> bool:
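In LookML, one view refers to another view's derived table through the ${view_name.SQL_TABLE_NAME} substitution, and view-to-view lineage has to recognise these references. The sketch below detects them. The helper names are hypothetical, and the exact check that is_derived_view performs is an assumption.

```python
import re
from typing import List

# ${view_name.SQL_TABLE_NAME} is LookML's reference to another view's table
_VIEW_REF_RE = re.compile(r"\$\{\s*(\w+)\.SQL_TABLE_NAME\s*\}", re.IGNORECASE)


def looks_like_derived_view(name: str) -> bool:
    # Hypothetical check on a table name such as "my_view.SQL_TABLE_NAME"
    return name.lower().endswith(".sql_table_name")


def referenced_view_names(sql: str) -> List[str]:
    return _VIEW_REF_RE.findall(sql)
```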
32-52: Add type hints to the function get_derived_looker_view_id.
Type hints improve code readability and help catch type-related errors early.
- def get_derived_looker_view_id(qualified_table_name: str, looker_view_id_cache: "LookerViewIdCache", base_folder_path: str) -> Optional[LookerViewId]:
+ def get_derived_looker_view_id(qualified_table_name: str, looker_view_id_cache: "LookerViewIdCache", base_folder_path: str) -> Optional[LookerViewId]:
55-81: Add type hints to the function resolve_derived_view_urn_of_col_ref.
Type hints improve code readability and help catch type-related errors early.
- def resolve_derived_view_urn_of_col_ref(column_refs: List[ColumnRef], looker_view_id_cache: "LookerViewIdCache", base_folder_path: str, config: LookMLSourceConfig) -> List[ColumnRef]:
+ def resolve_derived_view_urn_of_col_ref(column_refs: List[ColumnRef], looker_view_id_cache: "LookerViewIdCache", base_folder_path: str, config: LookMLSourceConfig) -> List[ColumnRef]:
84-110: Add type hints to the function fix_derived_view_urn.
Type hints improve code readability and help catch type-related errors early.
- def fix_derived_view_urn(urns: List[str], looker_view_id_cache: "LookerViewIdCache", base_folder_path: str, config: LookMLSourceConfig) -> List[str]:
+ def fix_derived_view_urn(urns: List[str], looker_view_id_cache: "LookerViewIdCache", base_folder_path: str, config: LookMLSourceConfig) -> List[str]:
113-127: Add type hints to the function determine_view_file_path.
Type hints improve code readability and help catch type-related errors early.
- def determine_view_file_path(base_folder_path: str, absolute_file_path: str) -> str:
+ def determine_view_file_path(base_folder_path: str, absolute_file_path: str) -> str:
129-173: Add type hints to the class methods.
Type hints improve code readability and help catch type-related errors early.
- def __init__(self, project_name: str, model_name: str, looker_model: LookerModel, looker_viewfile_loader: LookerViewFileLoader, reporter: LookMLSourceReport):
+ def __init__(self, project_name: str, model_name: str, looker_model: LookerModel, looker_viewfile_loader: LookerViewFileLoader, reporter: LookMLSourceReport):
174-215: Add type hints to the method get_looker_view_id.
Type hints improve code readability and help catch type-related errors early.
- def get_looker_view_id(self, view_name: str, base_folder_path: str, connection: Optional[LookerConnectionDefinition] = None) -> Optional[LookerViewId]:
+ def get_looker_view_id(self, view_name: str, base_folder_path: str, connection: Optional[LookerConnectionDefinition] = None) -> Optional[LookerViewId]:
metadata-ingestion/src/datahub/ingestion/source/looker/lookml_refinement.py (5)
18-62: Add type hints to the class methods.
Type hints improve code readability and help catch type-related errors early.
- def __init__(self, looker_model: LookerModel, looker_viewfile_loader: LookerViewFileLoader, connection_definition: LookerConnectionDefinition, source_config: LookMLSourceConfig, reporter: LookMLSourceReport):
+ def __init__(self, looker_model: LookerModel, looker_viewfile_loader: LookerViewFileLoader, connection_definition: LookerConnectionDefinition, source_config: LookMLSourceConfig, reporter: LookMLSourceReport):
63-66: Add type hints to the function is_refinement.
Type hints improve code readability and help catch type-related errors early.
- def is_refinement(view_name: str) -> bool:
+ def is_refinement(view_name: str) -> bool:
68-94: Add type hints to the function merge_column.
Type hints improve code readability and help catch type-related errors early.
- def merge_column(original_dict: dict, refinement_dict: dict, key: str) -> List[dict]:
+ def merge_column(original_dict: Dict[str, Any], refinement_dict: Dict[str, Any], key: str) -> List[Dict[str, Any]]:
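LookML refinements are declared as view: +name and add to or override parameters of the original view. The minimal sketch below assumes that lkml keeps the leading + in the parsed view name and that fields are merged by their name key. looks_like_refinement and merge_by_name are hypothetical, and merge_by_name is simpler than merge_column.

```python
from typing import Any, Dict, List


def looks_like_refinement(view_name: str) -> bool:
    # LookML refinements are written as `view: +original_name`
    return view_name.startswith("+")


def merge_by_name(
    original: List[Dict[str, Any]], refinement: List[Dict[str, Any]]
) -> List[Dict[str, Any]]:
    # Copy the original fields so the input lists are not mutated.
    merged: Dict[str, Dict[str, Any]] = {f["name"]: dict(f) for f in original}
    for fld in refinement:
        # Existing field: refinement parameters override. New field: appended.
        merged.setdefault(fld["name"], {}).update(fld)
    return list(merged.values())
```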
97-105: Add type hints to the function merge_and_set_column.
Type hints improve code readability and help catch type-related errors early.
- def merge_and_set_column(new_raw_view: dict, refinement_view: dict, key: str) -> None:
+ def merge_and_set_column(new_raw_view: Dict[str, Any], refinement_view: Dict[str, Any], key: str) -> None:
107-132: Add type hints to the function merge_refinements.
Type hints improve code readability and help catch type-related errors early.
- def merge_refinements(raw_view: dict, refinement_views: List[dict]) -> dict:
+ def merge_refinements(raw_view: Dict[str, Any], refinement_views: List[Dict[str, Any]]) -> Dict[str, Any]:
metadata-ingestion/src/datahub/ingestion/source/looker/looker_dataclasses.py (4)
18-22: LGTM! The ProjectInclude dataclass looks good and is correctly implemented.
24-30: LGTM! The LookerField dataclass looks good and is correctly implemented.
39-85: LGTM! The from_looker_dict method is well-structured and handles errors appropriately. The logging and reporting mechanisms are in place.
242-278: LGTM! The from_looker_dict method is well-structured and handles errors appropriately. The logging and reporting mechanisms are in place.
metadata-ingestion/src/datahub/ingestion/source/looker/lookml_concept_context.py (7)
30-54: LGTM! The methods in LookerFieldContext are well-structured and handle field context appropriately. The logging and error handling mechanisms are in place.
191-216: LGTM! The resolve_extends_view_name method is well-structured and handles view name resolution appropriately. The logging and error handling mechanisms are in place.
218-248: LGTM! The get_including_extends method is well-structured and handles field resolution appropriately. The logging and error handling mechanisms are in place.
250-277: LGTM! The methods _get_sql_table_name_field, _is_dot_sql_table_name_present, and sql_table_name are well-structured and handle SQL table name resolution appropriately. The logging and error handling mechanisms are in place.
278-321: LGTM! The methods derived_table, explore_source, and sql are well-structured and handle derived table and SQL resolution appropriately. The logging and error handling mechanisms are in place.
323-343: LGTM! The methods name and view_file_name are well-structured and handle view name and file name resolution appropriately. The logging and error handling mechanisms are in place.
344-413: LGTM! The methods _get_list_dict, dimensions, measures, dimension_groups, is_materialized_derived_view, is_regular_case, is_sql_table_name_referring_to_view, is_sql_based_derived_case, is_native_derived_case, and is_sql_based_derived_view_without_fields_case are well-structured and handle view context appropriately. The logging and error handling mechanisms are in place.
metadata-ingestion/tests/integration/lookml/duplicate_field_ingestion_golden.json (3)
Line range hint 1-106: LGTM! The JSON data segments are well-structured and correctly represent the test data for LookML integration tests.
Line range hint 107-233: LGTM! The JSON data segments are well-structured and correctly represent the test data for LookML integration tests.
Line range hint 234-494: LGTM! The JSON data segments are well-structured and correctly represent the test data for LookML integration tests.
metadata-ingestion/src/datahub/ingestion/source/looker/view_upstream.py (6)
42-110: LGTM! The utility functions _platform_names_have_2_parts, _drop_hive_dot, _drop_hive_dot_from_upstream, and _generate_fully_qualified_name are well-structured and handle platform-specific naming and transformations appropriately. The logging and error handling mechanisms are in place.
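To illustrate how a bare sql_table_name can be qualified with connection defaults, the sketch below pads a table name to three parts (db.schema.table), or to two parts for platforms assumed to use two-part names (hive, mysql, and athena here). The platform set, the function name, and the fallback rules are all assumptions, not the logic of _generate_fully_qualified_name.

```python
from typing import Optional

# Assumption for this sketch: platforms that address tables as <db>.<table>.
TWO_PART_PLATFORMS = {"hive", "mysql", "athena"}


def qualify_table_name(
    table: str, platform: str, default_db: str, default_schema: Optional[str]
) -> str:
    parts = table.split(".")
    if platform in TWO_PART_PLATFORMS:
        # <db>.<table>: prepend the default database only to bare names.
        return table if len(parts) >= 2 else f"{default_db}.{table}"
    if len(parts) >= 3:
        return table  # already db.schema.table
    if len(parts) == 2:
        return f"{default_db}.{table}"  # schema.table -> db.schema.table
    if default_schema is None:
        return f"{default_db}.{table}"  # no schema default: db.table (assumption)
    return f"{default_db}.{default_schema}.{table}"
```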
113-148: LGTM! The AbstractViewUpstream class is well-structured and defines abstract methods for extracting upstream column references and dataset URNs.
169-194: LGTM! The __get_spr method is well-structured and handles SQL parsing results appropriately. The logging and error handling mechanisms are in place.
196-214: LGTM! The __get_upstream_dataset_urn method is well-structured and handles upstream dataset URN resolution appropriately. The logging and error handling mechanisms are in place.
216-242: LGTM! The create_fields method is well-structured and handles field creation appropriately. The logging and error handling mechanisms are in place.
244-281: LGTM! The get_upstream_column_ref method is well-structured and handles upstream column references appropriately. The logging and error handling mechanisms are in place.
metadata-ingestion/tests/integration/lookml/lookml_same_name_views_different_file_path.json (2)
Line range hint 386-694: Ensure the correctness of browse paths.
The browse paths appear correct, but it's crucial to verify that the container references and path IDs are accurate and exist in the respective datasets.
Verification successful
Browse paths and container references are correct.
The container references and path IDs (urn:li:container:78f22c19304954b15e8adb1d9809975e, Develop, path1, path2, foo) exist in the dataset definitions, validating the correctness of the browse paths.
urn:li:container:78f22c19304954b15e8adb1d9809975e is found in multiple dataset definitions. id: "Develop", id: "path1", id: "path2", and id: "foo" are all confirmed in the dataset definitions.
Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Description: Verify the existence of container references and path IDs in the respective datasets.
# Test: Search for the container references and path IDs in the dataset definitions. Expect: Container references and path IDs exist in the datasets.
rg --type json 'urn:li:container:78f22c19304954b15e8adb1d9809975e'
rg --type json '"id": "Develop"'
rg --type json '"id": "path1"'
rg --type json '"id": "path2"'
rg --type json '"id": "foo"'
Length of output: 80442
173-229: Ensure the correctness of fine-grained lineage.
The fine-grained lineage entries appear correct, but it's crucial to verify that the upstream and downstream field references are accurate and exist in the respective datasets.
Verification successful
Fine-grained lineage entries are valid.
The upstream and downstream field references exist in the respective datasets, confirming the correctness of the fine-grained lineage entries.
- No issues found with the fine-grained lineage entries.
Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Description: Verify the existence of upstream and downstream field references in the respective datasets.
# Test: Search for the field references in the dataset definitions. Expect: Field references exist in the datasets.
rg --type json 'urn:li:schemaField:(urn:li:dataset:(urn:li:dataPlatform:snowflake,warehouse.default_db.default_schema.my_table,DEV),country|city|is_latest|timestamp|measurement)'
rg --type json 'urn:li:schemaField:(urn:li:dataset:(urn:li:dataPlatform:looker,lkml_samples.path1.foo.view.my_view,PROD),country|city|is_latest|timestamp|average_measurement)'
Length of output: 358
Script:
#!/bin/bash
# Description: Verify the existence of upstream and downstream field references in the respective datasets.
# Test: Search for the specific field references in the dataset definitions. Expect: Field references exist in the datasets.
rg --type json 'urn:li:schemaField:(urn:li:dataset:(urn:li:dataPlatform:snowflake,warehouse.default_db.default_schema.my_table,DEV),country)'
rg --type json 'urn:li:schemaField:(urn:li:dataset:(urn:li:dataPlatform:snowflake,warehouse.default_db.default_schema.my_table,DEV),city)'
rg --type json 'urn:li:schemaField:(urn:li:dataset:(urn:li:dataPlatform:snowflake,warehouse.default_db.default_schema.my_table,DEV),is_latest)'
rg --type json 'urn:li:schemaField:(urn:li:dataset:(urn:li:dataPlatform:snowflake,warehouse.default_db.default_schema.my_table,DEV),timestamp)'
rg --type json 'urn:li:schemaField:(urn:li:dataset:(urn:li:dataPlatform:snowflake,warehouse.default_db.default_schema.my_table,DEV),measurement)'
rg --type json 'urn:li:schemaField:(urn:li:dataset:(urn:li:dataPlatform:looker,lkml_samples.path1.foo.view.my_view,PROD),country)'
rg --type json 'urn:li:schemaField:(urn:li:dataset:(urn:li:dataPlatform:looker,lkml_samples.path1.foo.view.my_view,PROD),city)'
rg --type json 'urn:li:schemaField:(urn:li:dataset:(urn:li:dataPlatform:looker,lkml_samples.path1.foo.view.my_view,PROD),is_latest)'
rg --type json 'urn:li:schemaField:(urn:li:dataset:(urn:li:dataPlatform:looker,lkml_samples.path1.foo.view.my_view,PROD),timestamp)'
rg --type json 'urn:li:schemaField:(urn:li:dataset:(urn:li:dataPlatform:looker,lkml_samples.path1.foo.view.my_view,PROD),average_measurement)'
Length of output: 1402
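One caveat about the scripts above: rg treats ( and ) as regex grouping, so a pattern like 'urn:li:schemaField:(urn:li:dataset:(...),country)' does not match the literal parentheses in a schemaField URN. A fixed-string search (rg -F) or escaped parentheses would be needed. A literal check can also be done in Python by walking the parsed golden JSON. The sketch below assumes that field-level lineage is stored under fineGrainedLineages entries with upstreams and downstreams lists, and its function names are hypothetical.

```python
import json
from typing import Any, Iterator, List


def iter_fine_grained_urns(node: Any) -> Iterator[str]:
    """Yield every upstream/downstream URN found under fineGrainedLineages."""
    if isinstance(node, dict):
        for key, value in node.items():
            if key == "fineGrainedLineages" and isinstance(value, list):
                for entry in value:
                    yield from entry.get("upstreams", [])
                    yield from entry.get("downstreams", [])
            else:
                yield from iter_fine_grained_urns(value)
    elif isinstance(node, list):
        for item in node:
            yield from iter_fine_grained_urns(item)


def missing_field_urns(golden: Any, expected: List[str]) -> List[str]:
    """Return expected URNs that are absent (exact string comparison, no regex)."""
    found = set(iter_fine_grained_urns(golden))
    return [urn for urn in expected if urn not in found]


def load_golden(path: str) -> Any:
    with open(path, encoding="utf-8") as fh:
        return json.load(fh)
```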
metadata-ingestion/setup.py (1)
169-169: Dependency addition looks good!
The addition of python-liquid to the looker_common and mode dependencies is correctly formatted and logically consistent. Also applies to: 374-374
metadata-ingestion/tests/integration/lookml/test_lookml.py (2)
896-928: LGTM!
The function test_view_to_view_lineage_and_liquid_template is well-structured and correctly sets up the pipeline with liquid variables. The use of freeze_time ensures consistent test results. The golden file verification is a good practice to ensure the correctness of the output.
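The test above feeds liquid variables into the pipeline so that templated view SQL can be resolved before lineage is extracted. As a rough, stdlib-only illustration of the substitution step, here is a sketch that replaces plain {{ name }} placeholders from a variables dict. render_simple_liquid is a hypothetical name; it deliberately ignores tags, filters and indexed lookups, which a full Liquid engine such as python-liquid (the dependency added in setup.py) is meant to handle.

```python
import re
from typing import Dict

# Matches "{{ name }}" placeholders only; tags, filters and indexed lookups
# such as _user_attributes['x'] are out of scope for this sketch.
_PLACEHOLDER = re.compile(r"\{\{\s*(\w+)\s*\}\}")


def render_simple_liquid(sql: str, variables: Dict[str, str]) -> str:
    """Replace {{ name }} with variables[name]; leave unknown names untouched."""

    def _sub(match: "re.Match[str]") -> str:
        name = match.group(1)
        return variables.get(name, match.group(0))

    return _PLACEHOLDER.sub(_sub, sql)


sql = "SELECT * FROM {{ db }}.{{ schema }}.orders WHERE env = '{{ env }}'"
print(render_simple_liquid(sql, {"db": "analytics", "schema": "staging"}))
# SELECT * FROM analytics.staging.orders WHERE env = '{{ env }}'
```

Leaving unknown placeholders in place (rather than blanking them) keeps unresolved templates visible, which makes missing variables easier to spot downstream.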
931-1004: LGTM!
The function test_special_liquid_variables is well-structured and correctly checks the handling of special liquid variables. The use of freeze_time ensures consistent test results. The assertions ensure that the default values are correctly added and that the actual values are not overwritten.
metadata-ingestion/tests/integration/looker/test_looker.py (1)
1053-1080: LGTM!
The function test_upstream_cll is well-structured and correctly sets up the mock Looker explore. The use of freeze_time ensures consistent test results. The mock configuration is well-defined. The assertions ensure that the upstream fields are correctly set.
metadata-ingestion/tests/integration/lookml/lookml_reachable_views.json (2)
604-616: Ensure dataset URNs are updated consistently.
The dataset URN for lkml_samples.view.owners should be updated consistently across all aspects.
386-398: Ensure dataset URNs are updated consistently.
The dataset URN for lkml_samples.view.my_view should be updated consistently across all aspects.
metadata-ingestion/src/datahub/ingestion/source/looker/lookml_source.py (8)
4-7: Import statements look good.
The added imports are necessary for the updated functionality.
38-45: New imports from looker_common are appropriate.
The added imports from looker_common are necessary for the updated functionality.
98-98: New import for ColumnRef is appropriate.
The added import for ColumnRef is necessary for fine-grained lineage extraction.
875-875: Ensure consistent usage of LookerRefinementResolver.
The LookerRefinementResolver instance is correctly instantiated and used for explore refinement.
912-918: Ensure proper initialization of LookerViewIdCache.
The LookerViewIdCache instance is correctly instantiated with the necessary parameters.
972-980: Ensure proper initialization of LookerViewContext.
The LookerViewContext instance is correctly instantiated with the necessary parameters.
985-994: Ensure proper initialization of LookerView from Looker dictionary.
The LookerView instance is correctly instantiated with the necessary parameters.
632-635: Improve exception handling by chaining exceptions.
Use raise ... from err to distinguish the exception from errors in exception handling.
- raise ValueError(f"Could not locate a project name for model {model_name}. Consider configuring a static project name in your config file")
+ raise ValueError(f"Could not locate a project name for model {model_name}. Consider configuring a static project name in your config file") from err
Likely invalid or redundant comment.
Tools
Ruff
632-635: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling (B904)
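The B904 fix amounts to attaching the original error as __cause__, so the traceback reads "The above exception was the direct cause of the following exception" instead of "During handling of the above exception, another exception occurred". A minimal sketch; lookup_project and the dict-based project map are hypothetical stand-ins for the code around lines 632-635.

```python
from typing import Dict


def lookup_project(model_name: str, project_map: Dict[str, str]) -> str:
    try:
        return project_map[model_name]
    except KeyError as err:
        # "from err" records the KeyError as __cause__ (explicit chaining).
        raise ValueError(
            f"Could not locate a project name for model {model_name}. "
            "Consider configuring a static project name in your config file"
        ) from err


try:
    lookup_project("orders", {})
except ValueError as exc:
    print(type(exc.__cause__).__name__)  # KeyError
```

Using raise ... from None instead would hide the KeyError from the traceback entirely, which is only appropriate when the original error adds no diagnostic value.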
metadata-ingestion/src/datahub/ingestion/source/looker/looker_common.py (9)
131-138: Improve comment clarity.
The comments explaining the logic can be made clearer for better understanding.
- # Remove duplicates filed from self.fields
+ # Remove duplicate fields from the provided list of fields.
- # Logic is: If more than a field has same ViewField.name then keep only one filed where ViewField.field_type
+ # Logic: If more than one field has the same ViewField.name, keep only the field where ViewField.field_type
- # is DIMENSION_GROUP.
+ # is DIMENSION_GROUP.
- # Looker Constraint:
+ # Looker Constraints:
- # - Any field declared as dimension or measure can be redefined as dimension_group.
+ # - Any field declared as a dimension or measure can be redefined as a dimension_group.
- # - Any field declared in dimension can't be redefined in measure and vice-versa.
+ # - Any field declared as a dimension can't be redefined as a measure and vice-versa.
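The deduplication rule in the rewritten comment (when several fields share a name, keep the DIMENSION_GROUP one) can be sketched as below. FieldType and ViewField here are simplified, hypothetical stand-ins rather than the classes in looker_common.py, and keeping the first occurrence when no DIMENSION_GROUP is present is an assumption of this sketch.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Dict, List


class FieldType(Enum):
    # Hypothetical stand-in for the field-type enum used in looker_common.py.
    DIMENSION = "dimension"
    DIMENSION_GROUP = "dimension_group"
    MEASURE = "measure"


@dataclass
class ViewField:
    # Simplified stand-in: only the attributes the rule needs.
    name: str
    field_type: FieldType


def deduplicate_fields(fields: List[ViewField]) -> List[ViewField]:
    by_name: Dict[str, ViewField] = {}
    for field in fields:
        existing = by_name.get(field.name)
        # Keep the first occurrence unless a later duplicate is a DIMENSION_GROUP.
        if existing is None or field.field_type == FieldType.DIMENSION_GROUP:
            by_name[field.name] = field
    # dict preserves first-insertion order of names.
    return list(by_name.values())


fields = [
    ViewField("created", FieldType.DIMENSION),
    ViewField("created", FieldType.DIMENSION_GROUP),
    ViewField("id", FieldType.DIMENSION),
]
print([(f.name, f.field_type.name) for f in deduplicate_fields(fields)])
# [('created', 'DIMENSION_GROUP'), ('id', 'DIMENSION')]
```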
296-297: Verify the type hint for upstream_fields.
A Union with a single member is redundant (typing collapses Union[X] to X), so Union[List[ColumnRef]] is equivalent to, and less clear than, List[ColumnRef].
- upstream_fields: Union[List[ColumnRef]] = dataclasses_field(default_factory=list)
+ upstream_fields: List[ColumnRef] = dataclasses_field(default_factory=list)
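Both points are easy to check in isolation: typing collapses a single-member Union to that member, so the wrapper adds nothing, and default_factory=list gives every instance its own list. In this sketch str stands in for ColumnRef and ViewFieldSketch is a hypothetical class; the source's dataclasses_field is presumably an alias for dataclasses.field.

```python
from dataclasses import dataclass, field
from typing import List, Union

# A single-member Union is collapsed to the member itself.
print(Union[List[str]] == List[str])  # True


@dataclass
class ViewFieldSketch:
    name: str
    # default_factory=list creates a fresh list per instance.
    upstream_fields: List[str] = field(default_factory=list)


a = ViewFieldSketch("a")
b = ViewFieldSketch("b")
a.upstream_fields.append("orders.id")
print(b.upstream_fields)  # []
```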
299-332: Improve comment clarity.
The comments explaining the logic can be made clearer for better understanding.
- # It is the list of ColumnRef for derived view defined using SQL otherwise simple column name
+ # It is the list of ColumnRef for a derived view defined using SQL, otherwise a simple column name.
340-402: Improve comment clarity.
The comments explaining the logic can be made clearer for better understanding.
- return None # Inconsistent info received
+ return None # Inconsistent information received.
- # remove variant at the end. +1 for "_"
+ # Remove variant at the end. +1 for "_".
- assert view_name # for lint false positive
+ assert view_name # For lint false positive.
403-456: Improve comment clarity.
The comments explaining the logic can be made clearer for better understanding.
- ) # Variant i.e. Month, Day, Year ... is not available
+ ) # Variant (e.g., Month, Day, Year, etc.) is not available.
- ) # for Dimensional Group the type is always start with date_[time|date]
+ ) # For Dimensional Group, the type always starts with date_[time|date].
- ) # if the explore field is generated because of Dimensional Group in View
- # then the field_name should ends with field_group_variant
+ ) # If the explore field is generated because of Dimensional Group in View,
+ # then the field_name should end with field_group_variant.
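The comments above describe how an explore field generated from a dimension group maps back to its base name: the variant must be known, the type must start with date_, and the name must end with the variant, which is then stripped along with its "_" separator. A minimal sketch under those assumptions; base_dimension_group_name and its parameters are hypothetical and simplify whatever the real code does.

```python
from typing import Optional


def base_dimension_group_name(
    field_name: str, field_type: str, variant: Optional[str]
) -> Optional[str]:
    if not variant:
        return None  # Variant (e.g. month, year) is not available.
    if not field_type.startswith("date_"):
        return None  # Dimension-group fields are assumed to have a date_* type.
    if not field_name.endswith("_" + variant):
        return None  # Field name must end with the dimension-group variant.
    # Remove the variant at the end; +1 for the "_" separator.
    return field_name[: -(len(variant) + 1)]


print(base_dimension_group_name("created_month", "date_month", "month"))  # created
```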
Line range hint 459-467: LGTM!
The function create_view_project_map is correct and straightforward.
Line range hint 844-895: Improve comment clarity.
The comments explaining the logic can be made clearer for better understanding.
- # The view name that the explore refers to is resolved in the following order of priority:
- # 1. view_name: https://cloud.google.com/looker/docs/reference/param-explore-view-name
- # 2. from: https://cloud.google.com/looker/docs/reference/param-explore-from
- # 3. default to the name of the explore
+ # The view name that the explore refers to is resolved in the following order of priority:
+ # 1. view_name: https://cloud.google.com/looker/docs/reference/param-explore-view-name
+ # 2. from: https://cloud.google.com/looker/docs/reference/param-explore-from
+ # 3. Default to the name of the explore.
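The priority order above can be expressed as a short fallback chain. This sketch assumes the explore is a plain dict with optional view_name and from keys; resolve_explore_view_name is a hypothetical name, and note that an empty string also falls through to the next option.

```python
from typing import Any, Dict


def resolve_explore_view_name(explore: Dict[str, Any]) -> str:
    # 1. view_name, 2. from, 3. the explore's own name.
    # Falsy values (missing key, None, "") fall through to the next option.
    return explore.get("view_name") or explore.get("from") or explore["name"]


print(resolve_explore_view_name({"name": "orders", "from": "order_items"}))  # order_items
```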
1083-1103: Improve comment clarity.
The comments explaining the logic can be made clearer for better understanding.
- # form upstream of fields as all information is now available
+ # Form upstream of fields as all information is now available.
Line range hint 1217-1267: Improve comment clarity.
The comments explaining the logic can be made clearer for better understanding.
- # if we raise error on file_path equal to None then existing test-cases will fail as mock data
- # doesn't have required attributes.
+ # If we raise an error on file_path equal to None, then existing test cases will fail as mock data
+ # doesn't have the required attributes.
metadata-ingestion/tests/integration/lookml/vv_lineage_liquid_template_golden.json (27)
3-16: Ensure container properties are correctly defined.
The container properties aspect appears to be correctly defined with custom properties, name, and other metadata.
27-32: Ensure status aspect is correctly defined.
The status aspect for the container is correctly defined with the removed field set to false.
43-48: Ensure dataPlatformInstance aspect is correctly defined.
The dataPlatformInstance aspect correctly identifies the platform as Looker.
59-66: Ensure subTypes aspect is correctly defined.
The subTypes aspect correctly identifies the type as "LookML Project".
77-86: Ensure browsePathsV2 aspect is correctly defined.
The browsePathsV2 aspect correctly defines the path for the container.
97-104: Ensure subTypes aspect is correctly defined.
The subTypes aspect correctly identifies the type as "View".
133-138: Ensure container aspect is correctly defined.
The container aspect correctly identifies the container URN.
146-246: Ensure proposedSnapshot is correctly defined.
The proposedSnapshot aspect includes various metadata aspects such as BrowsePaths, Status, UpstreamLineage, SchemaMetadata, and DatasetProperties. Ensure all fields are correctly defined and consistent with the dataset's metadata.
283-290: Ensure subTypes aspect is correctly defined.
The subTypes aspect correctly identifies the type as "View".
319-324: Ensure container aspect is correctly defined.
The container aspect correctly identifies the container URN.
332-526: Ensure proposedSnapshot is correctly defined.
The proposedSnapshot aspect includes various metadata aspects such as BrowsePaths, Status, UpstreamLineage, SchemaMetadata, and DatasetProperties. Ensure all fields are correctly defined and consistent with the dataset's metadata.
563-570: Ensure subTypes aspect is correctly defined.
The subTypes aspect correctly identifies the type as "View".
599-604: Ensure container aspect is correctly defined.
The container aspect correctly identifies the container URN.
612-774: Ensure proposedSnapshot is correctly defined.
The proposedSnapshot aspect includes various metadata aspects such as BrowsePaths, Status, UpstreamLineage, SchemaMetadata, and DatasetProperties. Ensure all fields are correctly defined and consistent with the dataset's metadata.
811-818: Ensure subTypes aspect is correctly defined.
The subTypes aspect correctly identifies the type as "View".
847-852: Ensure container aspect is correctly defined.
The container aspect correctly identifies the container URN.
860-1022: Ensure proposedSnapshot is correctly defined.
The proposedSnapshot aspect includes various metadata aspects such as BrowsePaths, Status, UpstreamLineage, SchemaMetadata, and DatasetProperties. Ensure all fields are correctly defined and consistent with the dataset's metadata.
1059-1066: Ensure subTypes aspect is correctly defined.
The subTypes aspect correctly identifies the type as "View".
1095-1100: Ensure container aspect is correctly defined.
The container aspect correctly identifies the container URN.
1108-1270: Ensure proposedSnapshot is correctly defined.
The proposedSnapshot aspect includes various metadata aspects such as BrowsePaths, Status, UpstreamLineage, SchemaMetadata, and DatasetProperties. Ensure all fields are correctly defined and consistent with the dataset's metadata.
1283-1296: Ensure browsePathsV2 aspect is correctly defined.
The browsePathsV2 aspect correctly defines the path for the dataset.
1307-1312: Ensure tagKey aspect is correctly defined.
The tagKey aspect correctly defines the name of the tag as "Dimension".
1323-1328: Ensure tagKey aspect is correctly defined.
The tagKey aspect correctly defines the name of the tag as "Measure".
301-308: Ensure viewProperties aspect is correctly defined.
The viewProperties aspect defines the view logic and language correctly. However, ensure the SQL in the viewLogic field is validated for correctness.
115-122: Ensure viewProperties aspect is correctly defined.
The viewProperties aspect defines the view logic and language correctly. However, ensure the SQL in the viewLogic field is validated for correctness.
Verification successful
We need to validate the SQL in the viewLogic field within the LookML configuration. The previous attempt using sqlglot failed because the command was not found in the environment. Let's check for other available tools that might help us validate LookML SQL.
The automated SQL validation tools are not available in the current environment. Therefore, we need to manually inspect the SQL in the viewLogic field for correctness.
Ensure viewProperties aspect is correctly defined:
- The SQL logic within the viewLogic field appears to use Liquid templating correctly.
- The sql_table_name is dynamically set based on the _user_attributes['looker_env'] value, which is a common practice in LookML.
- The dimension definition for generated_message_id appears properly formatted with the necessary attributes (group_label, primary_key, type, sql).
Given the context and the manual inspection, the SQL logic appears to be correctly defined.
Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Description: Validate the SQL in the viewLogic field.
# Test: Extract and validate the SQL.
echo 'view: activity_logs {
  sql_table_name:
    {% if _user_attributes['looker_env'] == 'dev' %}
      {{ _user_attributes['dev_database_prefix'] }}analytics.{{ _user_attributes['dev_schema_prefix'] }}staging_app.stg_app__activity_logs
    {% elsif _user_attributes['looker_env'] == 'prod' %}
      analytics.staging_app.stg_app__activity_logs
    {% else %}
      analytics.staging_app.stg_app__activity_logs
    {% endif %}
  ;;
  dimension: generated_message_id {
    group_label: "IDs"
    primary_key: yes
    type: number
    sql: ${TABLE}."GENERATED_MESSAGE_ID" ;;
  }
}' | sqlglot transpile --dialect lookml
Length of output: 720
Script:
#!/bin/bash
# Check for available SQL validation tools in the environment
command -v sqlfluff
command -v sqlparse
command -v lookml
Length of output: 63
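The special-variable test described earlier checks that defaults are added without overwriting user-supplied values, and a view like activity_logs above can only resolve its sql_table_name if _user_attributes is present. A minimal sketch of that merge rule using dict.setdefault; the variable names and empty-dict defaults are illustrative assumptions, not the values the source actually injects.

```python
import copy
from typing import Any, Dict

# Illustrative defaults only; the real defaults injected by the source may differ.
SPECIAL_VARIABLE_DEFAULTS: Dict[str, Any] = {
    "_user_attributes": {},
    "_filters": {},
}


def with_special_defaults(user_vars: Dict[str, Any]) -> Dict[str, Any]:
    # Deep-copy so neither the caller's dict nor the defaults get mutated later.
    merged = copy.deepcopy(user_vars)
    for key, default in SPECIAL_VARIABLE_DEFAULTS.items():
        # setdefault inserts only when the key is absent, so user values win.
        merged.setdefault(key, copy.deepcopy(default))
    return merged


resolved = with_special_defaults({"_user_attributes": {"looker_env": "prod"}})
print(resolved)  # {'_user_attributes': {'looker_env': 'prod'}, '_filters': {}}
```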
829-836: Ensure viewProperties aspect is correctly defined.
The viewProperties aspect defines the view logic and language correctly. However, ensure the SQL in the viewLogic field is validated for correctness.
581-588: Ensure viewProperties aspect is correctly defined.
The viewProperties aspect defines the view logic and language correctly. However, ensure the SQL in the viewLogic field is validated for correctness.
metadata-ingestion/tests/integration/lookml/expected_output.json (12)
170-187: Ensure Consistent Use of Dataset URNs
The dataset URN for the upstream lineage should be consistently formatted. Verify that urn:li:dataset:(urn:li:dataPlatform:postgres,my_table,PROD) is correct and consistent across the file.
Verification successful
Dataset URNs are Consistently Formatted
The dataset URN urn:li:dataset:(urn:li:dataPlatform:postgres,my_table,PROD) is consistently used across the relevant files.
metadata-ingestion/tests/integration/lookml/refinements_ingestion_golden.json
metadata-ingestion/tests/integration/lookml/expected_output.json
Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Description: Verify consistency of dataset URNs in the JSON file.
# Test: Search for all dataset URNs. Expect: Consistent formatting.
rg --type json 'urn:li:dataset:\(urn:li:dataPlatform:postgres,my_table,PROD\)'
Length of output: 2430
894-894: Ensure Consistent Use of Dataset URNs
The dataset URN for the upstream lineage should be consistently formatted. Verify that urn:li:dataset:(urn:li:dataPlatform:postgres,.looker_schema.events,PROD) is correct and consistent across the file.
Verification successful
Consistent Use of Dataset URNs Verified
The dataset URN urn:li:dataset:(urn:li:dataPlatform:postgres,.looker_schema.events,PROD) is consistently formatted across the relevant JSON files.
metadata-ingestion/tests/integration/lookml/refinements_ingestion_golden.json
metadata-ingestion/tests/integration/lookml/expected_output.json
Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Description: Verify consistency of dataset URNs in the JSON file.
# Test: Search for all dataset URNs. Expect: Consistent formatting.
rg --type json 'urn:li:dataset:\(urn:li:dataPlatform:postgres,.looker_schema.events,PROD\)'
Length of output: 1300
1020-1020: Ensure Consistent Use of Dataset URNs
The dataset URN for the upstream lineage should be consistently formatted. Verify that urn:li:dataset:(urn:li:dataPlatform:postgres,.looker_schema.events,PROD) is correct and consistent across the file.
Verification successful
Ensure Consistent Use of Dataset URNs
The dataset URN urn:li:dataset:(urn:li:dataPlatform:postgres,.looker_schema.events,PROD) is consistently formatted and used across the files metadata-ingestion/tests/integration/lookml/expected_output.json and metadata-ingestion/tests/integration/lookml/refinements_ingestion_golden.json.
metadata-ingestion/tests/integration/lookml/expected_output.json
metadata-ingestion/tests/integration/lookml/refinements_ingestion_golden.json
Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Description: Verify consistency of dataset URNs in the JSON file.
# Test: Search for all dataset URNs. Expect: Consistent formatting.
rg --type json 'urn:li:dataset:\(urn:li:dataPlatform:postgres,.looker_schema.events,PROD\)'
Length of output: 1300
1330-1330: Ensure Consistent Use of Dataset URNs
The dataset URN for the upstream lineage should be consistently formatted. Verify that urn:li:dataset:(urn:li:dataPlatform:postgres,.looker_schema.include_able,PROD) is correct and consistent across the file.
Verification successful
Consistent Use of Dataset URNs Verified
The dataset URN urn:li:dataset:(urn:li:dataPlatform:postgres,.looker_schema.include_able,PROD) is consistently formatted across the file and other related files.
metadata-ingestion/tests/integration/lookml/refinements_ingestion_golden.json
metadata-ingestion/tests/integration/lookml/expected_output.json
Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Description: Verify consistency of dataset URNs in the JSON file.
# Test: Search for all dataset URNs. Expect: Consistent formatting.
rg --type json 'urn:li:dataset:\(urn:li:dataPlatform:postgres,.looker_schema.include_able,PROD\)'
Length of output: 885
1456-1471: Ensure Consistent Use of Dataset URNs
The dataset URN for the upstream lineage should be consistently formatted. Verify that urn:li:dataset:(urn:li:dataPlatform:postgres,fragment_derived_view,PROD) is correct and consistent across the file.
Verification successful
Dataset URNs are consistent
The dataset URNs are consistently formatted across the relevant JSON files.
metadata-ingestion/tests/integration/lookml/refinements_ingestion_golden.json
metadata-ingestion/tests/integration/lookml/expected_output.json
Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Description: Verify consistency of dataset URNs in the JSON file.
# Test: Search for all dataset URNs. Expect: Consistent formatting.
rg --type json 'urn:li:dataset:\(urn:li:dataPlatform:postgres,fragment_derived_view,PROD\)'
Length of output: 1739
488-511: Ensure Consistent Use of Derived View URNs
The dataset URN for the upstream lineage should be consistently formatted. Verify that urn:li:dataset:(urn:li:dataPlatform:looker,lkml_samples.view.my_view,PROD) is correct and consistent across the file.
Verification successful
Consistent Use of Derived View URNs Verified
The dataset URN for the upstream lineage urn:li:dataset:(urn:li:dataPlatform:looker,lkml_samples.view.my_view,PROD) is consistently formatted across the file.
metadata-ingestion/tests/integration/lookml/expected_output.json
metadata-ingestion/tests/integration/lookml/lookml_mces_api_hive2.json
metadata-ingestion/tests/integration/lookml/lookml_mces_with_external_urls.json
metadata-ingestion/tests/integration/lookml/lookml_mces_offline.json
metadata-ingestion/tests/integration/lookml/lookml_reachable_views.json
metadata-ingestion/tests/integration/lookml/lookml_mces_offline_platform_instance.json
metadata-ingestion/tests/integration/lookml/lookml_mces_badsql_parser.json
metadata-ingestion/tests/integration/lookml/lookml_mces_api_bigquery.json
metadata-ingestion/tests/integration/looker/golden_test_ingest_unaliased_joins.json
Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Description: Verify consistency of derived view URNs in the JSON file.
# Test: Search for all derived view URNs. Expect: Consistent formatting.
rg --type json 'urn:li:dataset:\(urn:li:dataPlatform:looker,lkml_samples.view.my_view,PROD\)'
Length of output: 23279
Line range hint 1807-1826: Ensure Consistent Use of Dataset URNs
The dataset URN for the upstream lineage should be consistently formatted. Verify that urn:li:dataset:(urn:li:dataPlatform:postgres,.ecommerce.ability,PROD) is correct and consistent across the file.
Verification successful
Ensure Consistent Use of Dataset URNs
The dataset URN for the upstream lineage is consistent across the file and other related files.
metadata-ingestion/tests/integration/lookml/refinements_ingestion_golden.json
metadata-ingestion/tests/integration/lookml/expected_output.json
Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Description: Verify consistency of dataset URNs in the JSON file.
# Test: Search for all dataset URNs. Expect: Consistent formatting.
rg --type json 'urn:li:dataset:\(urn:li:dataPlatform:postgres,.ecommerce.ability,PROD\)'
Length of output: 1285
188-229: Verify Field Lineage Consistency
Ensure that the field lineage information is accurate and consistent with the dataset URNs and field paths. Verify that the field paths and URNs are correct.
Verification successful
Verify Field Lineage Consistency
The field lineage information is accurate and consistent with the dataset URNs and field paths.
- Fields verified: city, is_latest, timestamp, measurement
Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Description: Verify consistency of field lineage information in the JSON file.
# Test: Search for all field URNs. Expect: Consistent formatting and correct field paths.
rg --type json 'urn:li:schemaField:\(urn:li:dataset:\(urn:li:dataPlatform:postgres,my_table,PROD\),country\)'
Length of output: 508
Script:
#!/bin/bash
# Description: Verify consistency of field lineage information in the JSON file.
# Test: Search for all field URNs mentioned in the review comment. Expect: Consistent formatting and correct field paths.
rg --type json 'urn:li:schemaField:\(urn:li:dataset:\(urn:li:dataPlatform:postgres,my_table,PROD\),city\)'
rg --type json 'urn:li:schemaField:\(urn:li:dataset:\(urn:li:dataPlatform:postgres,my_table,PROD\),is_latest\)'
rg --type json 'urn:li:schemaField:\(urn:li:dataset:\(urn:li:dataPlatform:postgres,my_table,PROD\),timestamp\)'
rg --type json 'urn:li:schemaField:\(urn:li:dataset:\(urn:li:dataPlatform:postgres,my_table,PROD\),measurement\)'
Length of output: 2050
512-533: Verify Field Lineage Consistency for Derived Views
Ensure that the field lineage information for derived views is accurate and consistent with the dataset URNs and field paths. Verify that the field paths and URNs are correct.
Verification successful
Field Lineage Consistency Verified
The field lineage information for derived views, including the dataset URNs and field paths for "timestamp" and "measurement," is accurate and consistent across the JSON files.
- File: metadata-ingestion/tests/integration/lookml/expected_output.json
- Lines: 512-533
Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Description: Verify consistency of field lineage information for derived views in the JSON file.
# Test: Search for all field URNs in derived views. Expect: Consistent formatting and correct field paths.
rg --type json 'urn:li:schemaField:\(urn:li:dataset:\(urn:li:dataPlatform:looker,lkml_samples.view.my_view,PROD\),country\)'
Length of output: 3370
Script:
#!/bin/bash
# Description: Verify consistency of field lineage information for derived views in the JSON file.
# Test: Search for all field URNs in derived views for "timestamp" and "measurement". Expect: Consistent formatting and correct field paths.
rg --type json 'urn:li:schemaField:\(urn:li:dataset:\(urn:li:dataPlatform:looker,lkml_samples.view.my_view,PROD\),(timestamp|measurement)\)'
Length of output: 4959
1204-1204: Ensure Consistent Use of Dataset URNs
The dataset URN for the upstream lineage should be consistently formatted. Verify that urn:li:dataset:(urn:li:dataPlatform:postgres,..autodetect_sql_name_based_on_view_name,PROD) is correct and consistent across the file.
1472-1493: Verify Field Lineage Consistency
Ensure that the field lineage information is accurate and consistent with the dataset URNs and field paths. Verify that the field paths and URNs are correct.
Verification successful
Field Lineage Consistency Verified
The field lineage information for 'platform' and 'country' fields is accurate and consistent with the dataset URNs and field paths.
metadata-ingestion/tests/integration/lookml/expected_output.json
metadata-ingestion/tests/integration/lookml/refinements_ingestion_golden.json
Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Description: Verify consistency of field lineage information in the JSON file.
# Test: Search for all field URNs. Expect: Consistent formatting and correct field paths.
rg --type json 'urn:li:schemaField:\(urn:li:dataset:\(urn:li:dataPlatform:postgres,fragment_derived_view,PROD\),date\)'
Length of output: 538
Script:
#!/bin/bash
# Description: Verify consistency of field lineage information in the JSON file for 'platform' and 'country' fields.
# Test: Search for all field URNs related to 'platform'.
rg --type json 'urn:li:schemaField:\(urn:li:dataset:\(urn:li:dataPlatform:postgres,fragment_derived_view,PROD\),platform\)'
# Test: Search for all field URNs related to 'country'.
rg --type json 'urn:li:schemaField:\(urn:li:dataset:\(urn:li:dataPlatform:postgres,fragment_derived_view,PROD\),country\)'
Length of output: 1098
768-768: Ensure Consistent Use of Dataset URNs
The dataset URN for the upstream lineage should be consistently formatted. Verify that urn:li:dataset:(urn:li:dataPlatform:postgres,.looker_schema.include_able,PROD) is correct and consistent across the file.
Verification successful
Ensure Consistent Use of Dataset URNs
The dataset URN urn:li:dataset:(urn:li:dataPlatform:postgres,.looker_schema.include_able,PROD) is correctly and consistently formatted across the relevant files.
metadata-ingestion/tests/integration/lookml/expected_output.json
metadata-ingestion/tests/integration/lookml/refinements_ingestion_golden.json
Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Description: Verify consistency of dataset URNs in the JSON file.
# Test: Search for all dataset URNs. Expect: Consistent formatting.
rg --type json 'urn:li:dataset:\(urn:li:dataPlatform:postgres,.looker_schema.include_able,PROD\)'
Length of output: 885
Script:
#!/bin/bash
# Description: List all unique dataset URNs in the JSON files to verify consistency.
# Extract all dataset URNs from the JSON files and list unique occurrences.
rg --type json '"dataset": "urn:li:dataset:([^"]+)"' -o -r '$1' | sort | uniq -c
Length of output: 106480
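For readers without ripgrep, the unique-URN listing in the last script can be approximated in Python with the same capture pattern and collections.Counter (the equivalent of sort | uniq -c). A sketch only; count_dataset_urns is a hypothetical helper that takes file contents as strings, so reading the golden files (for example with pathlib.Path.read_text) is left to the caller.

```python
import json
import re
from collections import Counter
from typing import Iterable

# Same capture as the rg pattern: "dataset": "urn:li:dataset:<rest>"
_DATASET_URN = re.compile(r'"dataset": "urn:li:dataset:([^"]+)"')


def count_dataset_urns(json_texts: Iterable[str]) -> "Counter[str]":
    counts: "Counter[str]" = Counter()
    for text in json_texts:
        # findall returns the captured group for every match in the text.
        counts.update(_DATASET_URN.findall(text))
    return counts


sample = json.dumps(
    {"dataset": "urn:li:dataset:(urn:li:dataPlatform:postgres,my_table,PROD)"}
)
for urn_suffix, n in count_dataset_urns([sample, sample]).most_common():
    print(n, urn_suffix)  # 2 (urn:li:dataPlatform:postgres,my_table,PROD)
```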
metadata-ingestion/tests/integration/lookml/lookml_mces_api_hive2.json (5)
173-229: Ensure Consistency in Field Names Between Upstream and Downstream.
The fine-grained lineage mappings should be consistent with the field names defined in the dataset schemas. Verify that the field names country, city, is_latest, timestamp, and measurement in the upstream and downstream datasets are correct and consistent.
488-533: Ensure Consistency in Field Names Between Upstream and Downstream.
The fine-grained lineage mappings should be consistent with the field names defined in the dataset schemas. Verify that the field names country, city, timestamp, measurement, and average_measurement in the upstream and downstream datasets are correct and consistent.
Line range hint 1405-1493: Verify View Logic and Field Mapping.
Ensure that the view logic and field mappings are correct and consistent with the dataset schema. The field names date, platform, and country should be verified for correctness.
1644-1644: Verify View Logic and Field Mapping.
Ensure that the view logic and field mappings are correct and consistent with the dataset schema. The field names customer_id, sale_price, and order_region should be verified for correctness.
1459-1493: Ensure Consistency in Field Names Between Upstream and Downstream.
The fine-grained lineage mappings should be consistent with the field names defined in the dataset schemas. Verify that the field names date, platform, and country in the upstream and downstream datasets are correct and consistent.
metadata-ingestion/tests/integration/lookml/lookml_mces_api_bigquery.json (10)
173-229: Ensure the consistency of field names and types in fine-grained lineages.
The fine-grained lineages should ensure that the field names and types are consistent between upstream and downstream datasets.
488-533: Ensure the consistency of field names and types in fine-grained lineages.
The fine-grained lineages should ensure that the field names and types are consistent between upstream and downstream datasets.
1462-1493: Ensure the consistency of field names and types in fine-grained lineages.
The fine-grained lineages should ensure that the field names and types are consistent between upstream and downstream datasets.
Line range hint 1644-1649: Ensure the consistency of field names and types in fine-grained lineages.
The fine-grained lineages should ensure that the field names and types are consistent between upstream and downstream datasets.
Line range hint 1518-1552: Ensure the consistency of field names and types in fine-grained lineages.
The fine-grained lineages should ensure that the field names and types are consistent between upstream and downstream datasets.
1405-1407: Ensure the consistency of field names and types in fine-grained lineages.
The fine-grained lineages should ensure that the field names and types are consistent between upstream and downstream datasets.
1464-1492: Ensure the consistency of field names and types in fine-grained lineages.
The fine-grained lineages should ensure that the field names and types are consistent between upstream and downstream datasets.
Line range hint 1644-1650: Ensure the consistency of field names and types in fine-grained lineages.
The fine-grained lineages should ensure that the field names and types are consistent between upstream and downstream datasets.
Line range hint 1535-1552: Ensure the consistency of field names and types in fine-grained lineages.
The fine-grained lineages should ensure that the field names and types are consistent between upstream and downstream datasets.
Line range hint 1518-1535: Ensure the consistency of field names and types in fine-grained lineages.
The fine-grained lineages should ensure that the field names and types are consistent between upstream and downstream datasets.
metadata-ingestion/tests/integration/lookml/lookml_mces_offline.json (5)
173-229: Ensure consistency in fine-grained lineage.
The structure of the fine-grained lineage looks correct. However, ensure that all upstream and downstream fields are correctly mapped and the confidence score is accurate.
488-533: Ensure consistency in fine-grained lineage.
The structure of the fine-grained lineage looks correct. However, ensure that all upstream and downstream fields are correctly mapped and the confidence score is accurate.
1461-1493: Ensure consistency in fine-grained lineage.
The structure of the fine-grained lineage looks correct. However, ensure that all upstream and downstream fields are correctly mapped and the confidence score is accurate.
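The mapping and confidence-score checks suggested for these fine-grained lineage entries can be partly automated. This sketch assumes each entry is a dict with upstreams and downstreams lists of schemaField URNs and an optional confidenceScore, matching the shape these review comments describe; parent_dataset and lineage_problems are hypothetical helpers, and the rpartition split assumes field paths contain no commas.

```python
from typing import Any, Dict, Iterable, List, Set

_PREFIX = "urn:li:schemaField:("


def parent_dataset(schema_field_urn: str) -> str:
    # urn:li:schemaField:(<dataset urn>,<field path>) -> <dataset urn>
    if not (schema_field_urn.startswith(_PREFIX) and schema_field_urn.endswith(")")):
        raise ValueError(f"not a schemaField URN: {schema_field_urn}")
    inner = schema_field_urn[len(_PREFIX):-1]
    # Assumes the field path itself contains no comma.
    dataset_urn, _, _ = inner.rpartition(",")
    return dataset_urn


def lineage_problems(
    entries: Iterable[Dict[str, Any]],
    downstream_dataset: str,
    upstream_datasets: Set[str],
) -> List[str]:
    problems: List[str] = []
    for i, entry in enumerate(entries):
        for urn in entry.get("downstreams", []):
            if parent_dataset(urn) != downstream_dataset:
                problems.append(f"entry {i}: downstream {urn} is not in {downstream_dataset}")
        for urn in entry.get("upstreams", []):
            if parent_dataset(urn) not in upstream_datasets:
                problems.append(f"entry {i}: unexpected upstream {urn}")
        score = entry.get("confidenceScore", 1.0)
        if not 0.0 <= score <= 1.0:
            problems.append(f"entry {i}: confidenceScore {score} outside [0, 1]")
    return problems


UP = "urn:li:dataset:(urn:li:dataPlatform:postgres,my_table,PROD)"
DOWN = "urn:li:dataset:(urn:li:dataPlatform:looker,lkml_samples.view.my_view,PROD)"
entry = {
    "upstreams": [f"urn:li:schemaField:({UP},country)"],
    "downstreams": [f"urn:li:schemaField:({DOWN},country)"],
    "confidenceScore": 1.0,
}
print(lineage_problems([entry], DOWN, {UP}))  # []
```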
1644-1644: Ensure the correctness of view logic.
The view logic for the customer_facts dataset includes a conditional clause. Verify that the condition syntax and logic are correct.
1405-1405: Ensure the correctness of view logic.
The view logic for the fragment_derived_view dataset includes a SELECT statement. Verify that the SQL syntax and logic are correct.
metadata-ingestion/tests/integration/lookml/lookml_mces_offline_platform_instance.json (4)
Line range hint 1405-1493: Verify the aliasing of platform to aliased_platform. The field platform is aliased to aliased_platform. Ensure this aliasing is correct and intentional.
Verification successful: the aliasing of platform to aliased_platform is correct and intentional. It is applied consistently across multiple files in the codebase, indicating that it is deliberate and part of the intended design.
Files verified:
- metadata-ingestion/tests/integration/lookml/lookml_mces_api_hive2.json
- metadata-ingestion/tests/integration/lookml/refinements_ingestion_golden.json
- metadata-ingestion/tests/integration/lookml/lookml_mces_with_external_urls.json
- metadata-ingestion/tests/integration/lookml/lookml_mces_offline_platform_instance.json
- metadata-ingestion/tests/integration/lookml/lookml_mces_offline.json
- metadata-ingestion/tests/integration/lookml/lookml_mces_badsql_parser.json
- metadata-ingestion/tests/integration/lookml/lookml_mces_api_bigquery.json
- metadata-ingestion/tests/integration/lookml/expected_output.json
Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Description: Verify the aliasing of `platform` to `aliased_platform` in the codebase.
# Test: Search for the usage of `platform` and `aliased_platform` in the codebase. Expect: Only intentional aliasing.
rg --type json $'platform' | rg --type json $'aliased_platform'
Length of output: 4584
1644-1644: Verify the liquid condition tag usage. Ensure the liquid condition tag {% condition order_region %} order.region {% endcondition %} is used correctly and intentionally.
Verification successful
The liquid condition tag {% condition order_region %} order.region {% endcondition %} is used consistently and intentionally across multiple test files:
- metadata-ingestion/tests/integration/lookml/lookml_mces_api_bigquery.json
- metadata-ingestion/tests/integration/lookml/refinements_ingestion_golden.json
- metadata-ingestion/tests/integration/lookml/lookml_mces_offline_platform_instance.json
- metadata-ingestion/tests/integration/lookml/lookml_mces_offline.json
- metadata-ingestion/tests/integration/lookml/lookml_mces_badsql_parser.json
- metadata-ingestion/tests/integration/lookml/expected_output.json
- metadata-ingestion/tests/integration/lookml/lookml_mces_api_hive2.json
- metadata-ingestion/tests/integration/lookml/lookml_mces_with_external_urls.json
Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Description: Verify the usage of the liquid condition tag in the codebase.
# Test: Search for the usage of the liquid condition tag in the codebase. Expect: Only intentional usages.
rg --type json $'{% condition order_region %} order.region {% endcondition %}'
Length of output: 215
Script:
#!/bin/bash
# Description: Verify the usage of the liquid condition tag in the codebase.
# Test: Search for the usage of the liquid condition tag in the codebase. Expect: Only intentional usages.
rg --type json '\{% condition order_region %\} order\.region \{% endcondition %\}'
Length of output: 2731
488-533: Verify the mapping of measurement to average_measurement. The field measurement is mapped to average_measurement downstream. Ensure this mapping is correct and intentional.
Verification successful
The mapping of measurement to average_measurement is intentional and consistent. The field measurement is consistently mapped to average_measurement downstream across multiple files, indicating that this is a deliberate design choice.
- metadata-ingestion/tests/integration/lookml/lookml_mces_offline_platform_instance.json
- metadata-ingestion/tests/integration/lookml/lookml_reachable_views.json
- metadata-ingestion/tests/integration/lookml/lookml_mces_with_external_urls.json
Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Description: Verify the mapping of `measurement` to `average_measurement` in the codebase.
# Test: Search for the usage of `measurement` and `average_measurement` in the codebase. Expect: Only intentional mappings.
rg --type json $'measurement' | rg --type json $'average_measurement'
Length of output: 7650
Script:
#!/bin/bash
# Description: Verify the mapping of `measurement` to `average_measurement` in the codebase by examining context.
# Search for the usage of `measurement` in the codebase.
rg --type json --context 5 'measurement'
# Search for the usage of `average_measurement` in the codebase.
rg --type json --context 5 'average_measurement'
Length of output: 195286
173-229: Verify the mapping of measurement to average_measurement. The field measurement is mapped to average_measurement downstream. Ensure this mapping is correct and intentional.
metadata-ingestion/tests/integration/lookml/lookml_mces_badsql_parser.json (3)
162-229: Ensure consistency in field naming conventions. The downstream field average_measurement does not match the upstream field measurement. This could lead to confusion or errors in lineage tracking. Is this intentional? If not, consider renaming the downstream field to measurement to maintain consistency.
477-533: Ensure consistency in field naming conventions. The downstream field average_measurement does not match the upstream field measurement. This could lead to confusion or errors in lineage tracking. Is this intentional? If not, consider renaming the downstream field to measurement to maintain consistency.
Line range hint 1644-1653: Ensure proper handling of liquid conditions. The view logic includes a liquid condition tag {% condition order_region %} order.region {% endcondition %}. Ensure that the liquid condition is correctly parsed and resolved during execution. Is there a mechanism in place to handle liquid conditions in the view logic?
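To make the question above concrete, here is a minimal, regex-based sketch of resolving a condition tag. It is not the PR's implementation (which registers a custom python-liquid tag); it only mirrors the behavior of the render_to_output method quoted further down in this review: look up the filter value, strip it, emit reference='value', and fail when no value is configured. The names resolve_condition_tags and ConditionResolutionError are hypothetical, and the tag grammar is assumed from the view logic above.

```python
import re
from typing import Mapping

# Assumed grammar: "{% condition <filter_name> %} <reference> {% endcondition %}".
_CONDITION_TAG = re.compile(
    r"\{%\s*condition\s+(?P<filter>\w+)\s*%\}(?P<ref>.*?)\{%\s*endcondition\s*%\}",
    re.DOTALL,
)


class ConditionResolutionError(ValueError):
    """Raised when a condition tag names a filter that has no configured value."""


def resolve_condition_tags(sql: str, filter_values: Mapping[str, str]) -> str:
    """Replace every condition tag in sql with "<reference>='<value>'"."""

    def _render(match: "re.Match[str]") -> str:
        name = match.group("filter")
        if name not in filter_values:
            raise ConditionResolutionError(
                f'filter {name} value is not provided for "condition" tag'
            )
        reference = match.group("ref").strip()
        value = filter_values[name].strip()
        return f"{reference}='{value}'"

    return _CONDITION_TAG.sub(_render, sql)


sql = "SELECT * FROM orders WHERE {% condition order_region %} order.region {% endcondition %}"
print(resolve_condition_tags(sql, {"order_region": " ap-south-1 "}))
# SELECT * FROM orders WHERE order.region='ap-south-1'
```

Because the pattern requires "condition" directly after "{%", it does not accidentally match the closing "{% endcondition %}" tag.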
metadata-ingestion/tests/integration/lookml/lookml_mces_with_external_urls.json (2)
1466-1500: Ensure consistency in field names. The fineGrainedLineages section contains field mappings between upstream and downstream datasets. Ensure that the field names are consistent and correctly mapped.
Verification successful
The extracted fineGrainedLineages section shows consistent field mappings between upstreams and downstreams, with expected transformations such as aliasing. The field names are correctly mapped, maintaining consistency across the datasets.
Example mappings:
- upstreams: country -> downstreams: country
- upstreams: platform -> downstreams: aliased_platform
- upstreams: measurement -> downstreams: average_measurement
Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Description: Verify consistency of field names in fineGrainedLineages.
# Test: Search for field names in the provided JSON file. Expect: Consistent field names across upstream and downstream mappings.
rg --type json 'fineGrainedLineages' metadata-ingestion/tests/integration/lookml/lookml_mces_with_external_urls.json
Length of output: 508
Script:
#!/bin/bash
# Description: Extract and compare field names in fineGrainedLineages for consistency.
# Extract upstream and downstream field names within fineGrainedLineages
jq '.[] | select(has("fineGrainedLineages")) | .fineGrainedLineages[] | {upstreams: .upstreams, downstreams: .downstreams}' metadata-ingestion/tests/integration/lookml/lookml_mces_with_external_urls.json
Length of output: 205
Script:
#!/bin/bash
# Description: Extract and compare field names in fineGrainedLineages for consistency.
# Extract upstream and downstream field names within fineGrainedLineages
jq '.. | .fineGrainedLineages? // empty | .[] | {upstreams: .upstreams, downstreams: .downstreams}' metadata-ingestion/tests/integration/lookml/lookml_mces_with_external_urls.json
Length of output: 6540
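The second jq query above can also be expressed as a small Python sketch, which may be easier to reuse when checking other golden files. It assumes, as in the snippets quoted in this review, that every schemaField URN ends with ",<field path>)"; field paths that themselves contain commas would need a real URN parser. All function names are hypothetical.

```python
import json
from typing import Any, Iterator, List, Tuple


def field_from_schema_field_urn(urn: str) -> str:
    # "urn:li:schemaField:(<dataset urn>,<field path>)": take the text after the
    # last comma and drop the single closing parenthesis.
    tail = urn.rsplit(",", 1)[1]
    return tail[:-1] if tail.endswith(")") else tail


def iter_fine_grained_lineages(node: Any) -> Iterator[dict]:
    """Yield every entry of every "fineGrainedLineages" list, at any depth (like jq's ..)."""
    if isinstance(node, dict):
        for key, value in node.items():
            if key == "fineGrainedLineages" and isinstance(value, list):
                yield from value
            else:
                yield from iter_fine_grained_lineages(value)
    elif isinstance(node, list):
        for item in node:
            yield from iter_fine_grained_lineages(item)


def field_mappings(golden: Any) -> List[Tuple[str, str]]:
    """Return (upstream field, downstream field) pairs from all lineage entries."""
    return [
        (field_from_schema_field_urn(up), field_from_schema_field_urn(down))
        for lineage in iter_fine_grained_lineages(golden)
        for up in lineage.get("upstreams", [])
        for down in lineage.get("downstreams", [])
    ]


def field_mappings_from_file(path: str) -> List[Tuple[str, str]]:
    with open(path) as f:
        return field_mappings(json.load(f))


sample = [{"aspect": {"fineGrainedLineages": [{
    "upstreams": ["urn:li:schemaField:(urn:li:dataset:(urn:li:dataPlatform:snowflake,default_db.default_schema.fragment_derived_view,PROD),platform)"],
    "downstreams": ["urn:li:schemaField:(urn:li:dataset:(urn:li:dataPlatform:looker,lkml_samples.view.fragment_derived_view,PROD),aliased_platform)"],
}]}}]
print(field_mappings(sample))  # [('platform', 'aliased_platform')]
```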
173-229: Ensure consistency in field names. The fineGrainedLineages section contains field mappings between upstream and downstream datasets. Ensure that the field names are consistent and correctly mapped.
metadata-ingestion/tests/integration/lookml/refinements_ingestion_golden.json (10)
170-170: Verify the dataset reference. Ensure that the dataset reference urn:li:dataset:(urn:li:dataPlatform:postgres,my_table,PROD) is correct and consistent with the rest of the data.
173-229: Verify the fine-grained lineage information. Ensure that the fine-grained lineage information for fields like country, city, is_latest, timestamp, and measurement is correct and follows the expected format.
488-488: Verify the upstream dataset reference. Ensure that the upstream dataset reference urn:li:dataset:(urn:li:dataPlatform:looker,lkml_samples.foo.view.my_view,PROD) is correct and consistent with the rest of the data.
491-536: Verify the fine-grained lineage information. Ensure that the fine-grained lineage information for fields like country, city, timestamp, measurement, and average_measurement is correct and follows the expected format.
774-774: Verify the upstream dataset reference. Ensure that the upstream dataset reference urn:li:dataset:(urn:li:dataPlatform:postgres,.looker_schema.include_able,PROD) is correct and consistent with the rest of the data.
1032-1032: Verify the upstream dataset reference. Ensure that the upstream dataset reference urn:li:dataset:(urn:li:dataPlatform:postgres,.looker_schema.events,PROD) is correct and consistent with the rest of the data.
Line range hint 1040-1049: Verify the fine-grained lineage information. Ensure that the fine-grained lineage information for the additional_measure field is correct and follows the expected format.
1219-1219: Verify the upstream dataset reference. Ensure that the upstream dataset reference urn:li:dataset:(urn:li:dataPlatform:postgres,..autodetect_sql_name_based_on_view_name,PROD) is correct and consistent with the rest of the data.
1348-1348: Verify the upstream dataset reference. Ensure that the upstream dataset reference urn:li:dataset:(urn:li:dataPlatform:postgres,.looker_schema.include_able,PROD) is correct and consistent with the rest of the data.
1477-1477: Verify the upstream dataset reference. Ensure that the upstream dataset reference urn:li:dataset:(urn:li:dataPlatform:postgres,fragment_derived_view,PROD) is correct and consistent with the rest of the data.
],
"fineGrainedLineages": [
  {
    "upstreamType": "FIELD_SET",
    "upstreams": [
      "urn:li:schemaField:(urn:li:dataset:(urn:li:dataPlatform:snowflake,warehouse.default_db.default_schema.my_table,DEV),country)"
    ],
    "downstreamType": "FIELD",
    "downstreams": [
      "urn:li:schemaField:(urn:li:dataset:(urn:li:dataPlatform:looker,lkml_samples.view.my_view,PROD),country)"
    ],
    "confidenceScore": 1.0
  },
Ensure consistency in field names.
The downstream field name for measurement should match the upstream field's name: the downstreams entry should use measurement instead of average_measurement, unless this transformation is intentional and documented.
- "urn:li:schemaField:(urn:li:dataset:(urn:li:dataPlatform:looker,lkml_samples.view.my_view,PROD),average_measurement)"
+ "urn:li:schemaField:(urn:li:dataset:(urn:li:dataPlatform:looker,lkml_samples.view.my_view,PROD),measurement)"
Committable suggestion was skipped due to low confidence.
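The two URNs in the suggestion differ only in the trailing field path. The hypothetical helpers below reproduce the URN layout seen in these golden files, which makes it easy to build either variant in a test; DataHub's Python SDK has its own URN builders in datahub.emitter.mce_builder (for example make_dataset_urn), which real code should prefer.

```python
def dataset_urn(platform: str, name: str, env: str = "PROD") -> str:
    # Layout copied from the golden files:
    # urn:li:dataset:(urn:li:dataPlatform:<platform>,<name>,<env>)
    return f"urn:li:dataset:(urn:li:dataPlatform:{platform},{name},{env})"


def schema_field_urn(parent_dataset_urn: str, field_path: str) -> str:
    # urn:li:schemaField:(<dataset urn>,<field path>)
    return f"urn:li:schemaField:({parent_dataset_urn},{field_path})"


view = dataset_urn("looker", "lkml_samples.view.my_view")
print(schema_field_urn(view, "average_measurement"))
# urn:li:schemaField:(urn:li:dataset:(urn:li:dataPlatform:looker,lkml_samples.view.my_view,PROD),average_measurement)
```

Which field path is right depends on the LookML field name; the verifications earlier in this review treat the measurement to average_measurement renaming as intentional.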
{
  "com.linkedin.pegasus2avro.dataset.UpstreamLineage": {
    "upstreams": [
      {
        "auditStamp": {
          "time": 1586847600000,
          "actor": "urn:li:corpuser:datahub"
        },
        "dataset": "urn:li:dataset:(urn:li:dataPlatform:snowflake,default_db.default_schema.fragment_derived_view,PROD)",
        "type": "VIEW"
      }
    ],
    "fineGrainedLineages": [
      {
        "upstreamType": "FIELD_SET",
        "upstreams": [
          "urn:li:schemaField:(urn:li:dataset:(urn:li:dataPlatform:snowflake,default_db.default_schema.fragment_derived_view,PROD),date)"
        ],
        "downstreamType": "FIELD",
        "downstreams": [
          "urn:li:schemaField:(urn:li:dataset:(urn:li:dataPlatform:looker,lkml_samples.view.fragment_derived_view,PROD),date)"
        ],
        "confidenceScore": 1.0
      },
      {
        "upstreamType": "FIELD_SET",
        "upstreams": [
          "urn:li:schemaField:(urn:li:dataset:(urn:li:dataPlatform:snowflake,default_db.default_schema.fragment_derived_view,PROD),platform)"
        ],
        "downstreamType": "FIELD",
        "downstreams": [
          "urn:li:schemaField:(urn:li:dataset:(urn:li:dataPlatform:looker,lkml_samples.view.fragment_derived_view,PROD),aliased_platform)"
        ],
        "confidenceScore": 1.0
      },
      {
        "upstreamType": "FIELD_SET",
        "upstreams": [
          "urn:li:schemaField:(urn:li:dataset:(urn:li:dataPlatform:snowflake,default_db.default_schema.fragment_derived_view,PROD),country)"
        ],
        "downstreamType": "FIELD",
        "downstreams": [
          "urn:li:schemaField:(urn:li:dataset:(urn:li:dataPlatform:looker,lkml_samples.view.fragment_derived_view,PROD),country)"
        ],
        "confidenceScore": 1.0
      }
    ]
  }
Ensure completeness of field definitions.
The fields date, aliased_platform, and country are mentioned in the view logic but not defined in the schema metadata. This could lead to incomplete metadata representation.
Ensure that all fields used in the view logic are defined in the schema metadata.
{
  "fieldPath": "date",
  "nullable": false,
  "description": "Date",
  "label": "",
  "type": {
    "type": {
      "com.linkedin.pegasus2avro.schema.StringType": {}
    }
  },
  "nativeDataType": "string",
  "recursive": false,
  "globalTags": {
    "tags": []
  },
  "isPartOfKey": false
},
{
  "fieldPath": "aliased_platform",
  "nullable": false,
  "description": "Platform",
  "label": "",
  "type": {
    "type": {
      "com.linkedin.pegasus2avro.schema.StringType": {}
    }
  },
  "nativeDataType": "string",
  "recursive": false,
  "globalTags": {
    "tags": []
  },
  "isPartOfKey": false
},
{
  "fieldPath": "country",
  "nullable": false,
  "description": "Country",
  "label": "",
  "type": {
    "type": {
      "com.linkedin.pegasus2avro.schema.StringType": {}
    }
  },
  "nativeDataType": "string",
  "recursive": false,
  "globalTags": {
    "tags": []
  },
  "isPartOfKey": false
}
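A check like the one below could catch the gap described above automatically. It is a sketch with a hypothetical function name, comparing the downstream field paths of a fineGrainedLineages entry against the fieldPath values in the schema metadata; it assumes the URN layout shown in this review.

```python
from typing import Iterable, List, Set


def undefined_lineage_fields(
    downstream_field_urns: Iterable[str], schema_fields: Iterable[dict]
) -> List[str]:
    """Return downstream lineage field paths that have no schema field with that fieldPath."""
    defined: Set[str] = {field["fieldPath"] for field in schema_fields}
    missing: List[str] = []
    for urn in downstream_field_urns:
        # The field path is the last comma-separated component, minus the closing ")".
        tail = urn.rsplit(",", 1)[1]
        field_path = tail[:-1] if tail.endswith(")") else tail
        if field_path not in defined:
            missing.append(field_path)
    return missing


view = "urn:li:dataset:(urn:li:dataPlatform:looker,lkml_samples.view.fragment_derived_view,PROD)"
downstreams = [f"urn:li:schemaField:({view},{name})" for name in ("date", "aliased_platform", "country")]
schema = [{"fieldPath": "date"}, {"fieldPath": "aliased_platform"}]
print(undefined_lineage_fields(downstreams, schema))  # ['country']
```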
def render_to_output(self, context: Context, buffer: TextIO) -> Optional[bool]:
    filter_value: Optional[str] = cast(
        str, context.globals.get(self.filter_name)
    )  # to silence lint

    if filter_value is None:
        raise CustomTagException(
            f'filter {self.filter_name} value is not provided for "condition" tag'
        )

    filter_value = filter_value.strip()

    buffer.write(f"{self.sql_or_lookml_reference}='{filter_value}'")

    return True
Handle potential edge cases in the render_to_output method.
Ensure that filter_value is properly escaped to prevent SQL injection.
  filter_value = filter_value.strip()
+ # Escape single quotes to prevent SQL injection
+ filter_value = filter_value.replace("'", "''")
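A standalone sketch of the suggested escaping, using a hypothetical render_condition function rather than the real tag class. Doubling single quotes is the standard SQL way to embed a quote in a string literal, but it is not a complete defense for every dialect (MySQL, for instance, also treats backslash as an escape character by default). Independently of security, an unescaped quote produces a malformed literal that a SQL parser may reject.

```python
def render_condition(reference: str, raw_value: str) -> str:
    """Render a resolved condition tag as reference='value', doubling single quotes."""
    # Doubling ' is the standard SQL escape for a quote inside a string literal.
    value = raw_value.strip().replace("'", "''")
    return f"{reference}='{value}'"


print(render_condition("order.region", " O'Brien "))     # order.region='O''Brien'
print(render_condition("order.region", "x' OR '1'='1"))  # order.region='x'' OR ''1''=''1'
```

In the second call the injected quotes stay inside one string literal instead of closing it early.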
"aspectName": "subTypes",
"aspect": {
  "json": {
    "materialized": false,
    "viewLogic": "view: owners {\n dimension: id {\n primary_key: yes\n sql: ${TABLE}.id ;;\n }\n dimension: owner_name {\n sql: ${TABLE}.owner_name ;;\n }\n}",
    "viewLanguage": "lookml"
    "typeNames": [
      "View"
    ]
Codebase Verification
Ensure dataset URNs are updated consistently.
The dataset URN urn:li:dataset:(urn:li:dataPlatform:looker,lkml_samples.view.owners,PROD) is still present in multiple JSON files. Please ensure that all instances of this URN are updated consistently across the codebase:
- metadata-ingestion/tests/integration/lookml/lookml_mces_badsql_parser.json
- metadata-ingestion/tests/integration/lookml/lookml_reachable_views.json
- metadata-ingestion/tests/integration/lookml/expected_output.json
- metadata-ingestion/tests/integration/lookml/lookml_mces_offline.json
- metadata-ingestion/tests/integration/lookml/lookml_mces_offline_platform_instance.json
- metadata-ingestion/tests/integration/lookml/lookml_mces_api_hive2.json
- metadata-ingestion/tests/integration/lookml/lookml_mces_with_external_urls.json
- metadata-ingestion/tests/integration/lookml/lookml_mces_api_bigquery.json
Analysis chain
Ensure dataset URNs are updated consistently.
The dataset URN for lkml_samples.view.owners should be updated consistently across all aspects.
Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Description: Verify all instances of dataset URNs are updated consistently.
# Test: Search for all instances of the old and new dataset URNs. Expect: Only new URNs.
rg --type json $'urn:li:dataset:(urn:li:dataPlatform:looker,lkml_samples.view.owners,PROD)'
Length of output: 92
Script:
#!/bin/bash
# Description: Verify all instances of dataset URNs are updated consistently.
# Test: Search for all instances of `lkml_samples.view.owners` URNs in JSON files.
rg --type json 'lkml_samples.view.owners'
Length of output: 10208
def parse(self, stream: TokenStream) -> Node:
    expect(stream, TOKEN_TAG, value=ConditionTag.TAG_START)

    start_token = stream.current

    stream.next_token()
    expect(stream, TOKEN_EXPRESSION)
    filter_name: str = stream.current.value.strip()

    stream.next_token()
    expect(stream, TOKEN_LITERAL)

    sql_or_lookml_reference: str = stream.current.value.strip()

    stream.next_token()
    expect(stream, TOKEN_TAG, value=ConditionTag.TAG_END)

    return ConditionNode(
        tok=start_token,
        sql_or_lookml_reference=sql_or_lookml_reference,
        filter_name=filter_name,
    )
Ensure proper error handling in the parse method.
Add error handling for unexpected tokens to improve robustness.
  def parse(self, stream: TokenStream) -> Node:
      try:
          expect(stream, TOKEN_TAG, value=ConditionTag.TAG_START)
          start_token = stream.current
          stream.next_token()
          expect(stream, TOKEN_EXPRESSION)
          filter_name: str = stream.current.value.strip()
          stream.next_token()
          expect(stream, TOKEN_LITERAL)
          sql_or_lookml_reference: str = stream.current.value.strip()
          stream.next_token()
          expect(stream, TOKEN_TAG, value=ConditionTag.TAG_END)
          return ConditionNode(
              tok=start_token,
              sql_or_lookml_reference=sql_or_lookml_reference,
              filter_name=filter_name,
          )
+     except Exception as e:
+         raise CustomTagException(f"Error parsing condition tag: {e}")
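A self-contained sketch of the same idea, using a hypothetical list of (kind, value) tokens instead of python-liquid's TokenStream. Two choices differ from the suggestion above and may be worth considering: it catches only the error type the token checks raise, so unrelated bugs are not relabeled as tag-parsing errors, and it chains the original exception with "from e" so the traceback keeps the root cause. The token kinds, _expect, and the stand-in classes are all assumptions.

```python
from dataclasses import dataclass
from typing import List, Tuple

Token = Tuple[str, str]  # (kind, value): hypothetical stand-in for liquid tokens


class CustomTagException(Exception):
    """Stand-in for the source's CustomTagException."""


@dataclass
class ConditionNode:
    filter_name: str
    sql_or_lookml_reference: str


def _expect(tokens: List[Token], pos: int, kind: str) -> str:
    """Return the value of tokens[pos], raising ValueError if it is missing or the wrong kind."""
    if pos >= len(tokens):
        raise ValueError(f"expected {kind}, reached end of input")
    actual_kind, value = tokens[pos]
    if actual_kind != kind:
        raise ValueError(f"expected {kind}, found {actual_kind} {value!r}")
    return value


def parse_condition(tokens: List[Token]) -> ConditionNode:
    try:
        _expect(tokens, 0, "TAG_START")
        filter_name = _expect(tokens, 1, "EXPRESSION").strip()
        reference = _expect(tokens, 2, "LITERAL").strip()
        _expect(tokens, 3, "TAG_END")
    except ValueError as e:
        # Chain the original error so the traceback still shows where parsing failed.
        raise CustomTagException(f"Error parsing condition tag: {e}") from e
    return ConditionNode(filter_name=filter_name, sql_or_lookml_reference=reference)
```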
def _get_generic_definition(
    looker_connection: DBConnection, platform: Optional[str] = None
) -> Tuple[str, Optional[str], Optional[str]]:
    if platform is None:
        # We extract the platform from the dialect name
        dialect_name = looker_connection.dialect_name
        assert dialect_name is not None
        # generally the first part of the dialect name before _ is the name of the platform
        # versions are encoded as numbers and can be removed
        # e.g. spark1 or hive2 or druid_18
        platform = re.sub(r"[0-9]+", "", dialect_name.split("_")[0])

    assert (
        platform is not None
    ), f"Failed to extract a valid platform from connection {looker_connection}"
    db = looker_connection.database
    schema = looker_connection.schema  # ok for this to be None
    return (platform, db, schema)
Add error handling for missing Looker connection dialect name.
Ensure that missing dialect names are handled gracefully.
Committable suggestion was skipped due to low confidence.
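For the missing-dialect case, a sketch that raises instead of asserting. Two facts motivate it: assert statements are skipped when Python runs with -O, and the original assert platform is not None can never fire after re.sub, because re.sub always returns a string; an empty result is the failure it presumably meant to catch. platform_from_dialect is a hypothetical name, and ValueError is an arbitrary choice of exception.

```python
import re
from typing import Optional


def platform_from_dialect(dialect_name: Optional[str]) -> str:
    """Derive a platform name from a Looker dialect name, e.g. "hive2" -> "hive"."""
    if not dialect_name:
        raise ValueError("Looker connection has no dialect_name; cannot infer platform")
    # Keep the part before the first "_" and drop version digits.
    platform = re.sub(r"[0-9]+", "", dialect_name.split("_")[0])
    if not platform:
        raise ValueError(f"Could not infer a platform from dialect {dialect_name!r}")
    return platform


for name in ("spark1", "hive2", "druid_18"):
    print(name, "->", platform_from_dialect(name))
# spark1 -> spark
# hive2 -> hive
# druid_18 -> druid
```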
def _get_bigquery_definition(
    looker_connection: DBConnection,
) -> Tuple[str, Optional[str], Optional[str]]:
    platform = "bigquery"
    # bigquery project ids are returned in the host field
    db = looker_connection.host
    schema = looker_connection.database
    return (platform, db, schema)
Add error handling for missing Looker connection fields.
Ensure that missing fields in the Looker connection are handled gracefully.
  def _get_bigquery_definition(
      looker_connection: DBConnection,
  ) -> Tuple[str, Optional[str], Optional[str]]:
      platform = "bigquery"
      # bigquery project ids are returned in the host field
      db = looker_connection.host
+     if db is None:
+         raise ConfigurationError("BigQuery project ID (host) is missing in the Looker connection.")
      schema = looker_connection.database
      return (platform, db, schema)
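A standalone version of the BigQuery suggestion, with hypothetical stand-ins for DBConnection and ConfigurationError so it can run on its own. Per the comment in the original code, the project id comes from host and the dataset from database. Using "if not conn.host" also rejects an empty string, and the return type can then promise a str for the project.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


class ConfigurationError(Exception):
    """Stand-in for the ConfigurationError raised in the suggestion above."""


@dataclass
class LookerConnectionStub:
    """Hypothetical stand-in for Looker's DBConnection, limited to the fields used here."""

    host: Optional[str]
    database: Optional[str]


def bigquery_definition(conn: LookerConnectionStub) -> Tuple[str, str, Optional[str]]:
    # The BigQuery project id is read from host and the dataset from database.
    if not conn.host:
        raise ConfigurationError(
            "BigQuery project ID (host) is missing in the Looker connection."
        )
    return ("bigquery", conn.host, conn.database)


print(bigquery_definition(LookerConnectionStub(host="my-project", database="analytics")))
# ('bigquery', 'my-project', 'analytics')
```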
def _load_viewfile(
    self, project_name: str, path: str, reporter: LookMLSourceReport
) -> Optional[LookerViewFile]:
    # always fully resolve paths to simplify de-dup
    path = str(pathlib.Path(path).resolve())
    allowed_extensions = [_VIEW_FILE_EXTENSION, _EXPLORE_FILE_EXTENSION]
    matched_any_extension = [
        match for match in [path.endswith(x) for x in allowed_extensions] if match
    ]
    if not matched_any_extension:
        # not a view file
        logger.debug(
            f"Skipping file {path} because it doesn't appear to be a view file. Matched extensions {allowed_extensions}"
        )
        return None

    if self.is_view_seen(str(path)):
        return self.viewfile_cache[path]

    try:
        with open(path) as file:
            raw_file_content = file.read()
    except Exception as e:
        logger.debug(f"An error occurred while reading path {path}", exc_info=True)
        self.reporter.report_failure(
            path, f"failed to load view file {path} from disk: {e}"
        )
        return None
    try:
        logger.debug(f"Loading viewfile {path}")

        parsed = load_lkml(path)

        resolve_liquid_variable_in_view_dict(
            raw_view=parsed,
            liquid_variable=self.liquid_variable,
        )

        looker_viewfile = LookerViewFile.from_looker_dict(
            absolute_file_path=path,
            looker_view_file_dict=parsed,
            project_name=project_name,
            root_project_name=self._root_project_name,
            base_projects_folder=self._base_projects_folder,
            raw_file_content=raw_file_content,
            reporter=reporter,
        )
        logger.debug(f"adding viewfile for path {path} to the cache")
        self.viewfile_cache[path] = looker_viewfile
        return looker_viewfile
    except Exception as e:
        logger.debug(f"An error occurred while parsing path {path}", exc_info=True)
        self.reporter.report_failure(path, f"failed to load view file {path}: {e}")
        return None
Improve error handling in the _load_viewfile method.
Ensure that the method handles file reading and parsing errors gracefully.
Committable suggestion was skipped due to low confidence.
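A compact sketch of the same control flow, with the parser and failure reporter passed in as hypothetical callables. It assumes the two allowed extensions are .view.lkml and .explore.lkml, and it replaces the list-comprehension extension check with str.endswith, which accepts a tuple of suffixes. Catching OSError for the read, while keeping a broad except around parsing so one bad file is reported rather than aborting the run, is a judgment call.

```python
import pathlib
from typing import Callable, Dict, Optional, TypeVar

T = TypeVar("T")

# Assumed values of _VIEW_FILE_EXTENSION and _EXPLORE_FILE_EXTENSION.
ALLOWED_EXTENSIONS = (".view.lkml", ".explore.lkml")


def load_viewfile(
    path: str,
    cache: Dict[str, T],
    parse: Callable[[str, str], T],
    on_failure: Callable[[str, str], None],
) -> Optional[T]:
    resolved = str(pathlib.Path(path).resolve())  # canonical key for de-dup
    if not resolved.endswith(ALLOWED_EXTENSIONS):  # str.endswith accepts a tuple
        return None
    if resolved in cache:
        return cache[resolved]
    try:
        raw = pathlib.Path(resolved).read_text()
    except OSError as e:
        on_failure(resolved, f"failed to load view file {resolved} from disk: {e}")
        return None
    try:
        result = parse(resolved, raw)
    except Exception as e:  # report parser errors so one bad file doesn't stop the run
        on_failure(resolved, f"failed to load view file {resolved}: {e}")
        return None
    cache[resolved] = result
    return result
```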
continue | ||
elif inc.startswith("/"): | ||
glob_expr = f"{resolved_project_folder}{inc}" | ||
|
||
# The include path is sometimes '/{project_name}/{path_within_project}' | ||
# instead of '//{project_name}/{path_within_project}' or '/{path_within_project}'. | ||
# | ||
# TODO: I can't seem to find any documentation on this pattern, but we definitely
# have seen it in the wild. Example from Mozilla's public looker-hub repo:
# https://github.com/mozilla/looker-hub/blob/f491ca51ce1add87c338e6723fd49bc6ae4015ca/fenix/explores/activation.explore.lkml#L7
# As such, we try to handle it but are as defensive as possible.

non_base_project_name = project_name
if project_name == _BASE_PROJECT_NAME and root_project_name is not None:
non_base_project_name = root_project_name
if non_base_project_name != _BASE_PROJECT_NAME and inc.startswith(
f"/{non_base_project_name}/"
):
# This might be a local include. Let's make sure that '/{project_name}' doesn't
# exist as normal include in the project.
if not pathlib.Path(
f"{resolved_project_folder}/{non_base_project_name}"
).exists():
path_within_project = pathlib.Path(*pathlib.Path(inc).parts[2:])
glob_expr = f"{resolved_project_folder}/{path_within_project}"
else:
# Need to handle a relative path.
glob_expr = str(pathlib.Path(path).parent / inc)
# "**" matches an arbitrary number of directories in LookML
# we also resolve these paths to absolute paths so we can de-dup effectively later on
included_files = [
str(p.resolve())
for p in [
pathlib.Path(p)
for p in sorted(
glob.glob(glob_expr, recursive=True)
+ glob.glob(f"{glob_expr}.lkml", recursive=True)
)
]
# We don't want to match directories. The '**' glob can be used to
# recurse into directories.
if p.is_file()
]
logger.debug(
f"traversal_path={traversal_path}, included_files = {included_files}, seen_so_far: {seen_so_far}"
)
if "*" not in inc and not included_files:
reporter.report_failure(path, f"cannot resolve include {inc}")
elif not included_files:
reporter.report_failure(
path, f"did not resolve anything for wildcard include {inc}"
)
# only load files that we haven't seen so far
included_files = [x for x in included_files if x not in seen_so_far]
for included_file in included_files:
# Filter out dashboards - we get those through the looker source.
if (
included_file.endswith(".dashboard")
or included_file.endswith(".dashboard.lookml")
or included_file.endswith(".dashboard.lkml")
):
logger.debug(
f"include '{included_file}' is a dashboard, skipping it"
)
continue

logger.debug(
f"Will be loading {included_file}, traversed here via {traversal_path}"
)
try:
parsed = load_lkml(included_file)
seen_so_far.add(included_file)
if "includes" in parsed: # we have more includes to resolve!
resolved.extend(
LookerModel.resolve_includes(
parsed["includes"],
resolved_project_name,
root_project_name,
base_projects_folder,
included_file,
reporter,
seen_so_far,
traversal_path=traversal_path
+ "."
+ pathlib.Path(included_file).stem,
)
)
except Exception as e:
reporter.report_warning(
path, f"Failed to load {included_file} due to {e}"
)
# continue in this case, as it might be better to load and resolve whatever we can

resolved.extend(
[
ProjectInclude(project=resolved_project_name, include=f)
for f in included_files
]
)
return resolved
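The glob step in this hunk can be pulled out into a small, separately testable helper. The sketch below assumes the same inputs as the hunk (a glob_expr string and the seen_so_far set); expand_include is a hypothetical name, not code from the PR. Unlike a plain concatenation of the two glob.glob results, it also drops duplicates, which can occur when a pattern ending in * matches foo.lkml both as written and with .lkml appended.

```python
import glob
import pathlib
from typing import List, Set


def expand_include(glob_expr: str, seen_so_far: Set[str]) -> List[str]:
    """Expand one include pattern into absolute file paths not loaded yet.

    Hypothetical helper mirroring the hunk above; not part of the PR.
    """
    # With recursive=True, "**" matches any number of directories.
    # Like the original, also try the pattern with ".lkml" appended.
    matches = sorted(
        glob.glob(glob_expr, recursive=True)
        + glob.glob(f"{glob_expr}.lkml", recursive=True)
    )
    # Keep regular files only, resolved to absolute paths for de-duplication.
    files = [
        str(pathlib.Path(m).resolve()) for m in matches if pathlib.Path(m).is_file()
    ]
    # dict.fromkeys removes duplicates while preserving the sorted order.
    unique = list(dict.fromkeys(files))
    return [f for f in unique if f not in seen_so_far]
```

Because the helper returns only unseen paths, the caller can still add each file to seen_so_far as it loads it, exactly as the loop in the hunk does.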
LGTM! Consider simplifying nested if statements.
The resolve_includes method is well-structured and handles different include patterns effectively. Consider simplifying the nested if statements for readability.
- if non_base_project_name != _BASE_PROJECT_NAME and inc.startswith(
- f"/{non_base_project_name}/"
- ):
+ if (non_base_project_name != _BASE_PROJECT_NAME and
+ inc.startswith(f"/{non_base_project_name}/")):
Committable suggestion was skipped due to low confidence.
Tools
Ruff
145-152: Use a single if statement instead of nested if statements (SIM102)
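SIM102 asks for the inner Path.exists() check to be folded into the outer condition. A standalone sketch, assuming the else in the hunk belongs to an enclosing branch rather than to either of the two nested ifs; local_include_glob is a hypothetical helper and the _BASE_PROJECT_NAME value below is a placeholder, not taken from the PR.

```python
import pathlib
from typing import Optional

_BASE_PROJECT_NAME = "__BASE"  # placeholder value assumed for this sketch


def local_include_glob(
    inc: str, resolved_project_folder: str, non_base_project_name: str
) -> Optional[str]:
    """Return a project-relative glob for a '/<project>/...' include, else None."""
    # One combined condition (what SIM102 asks for) instead of two nested ifs.
    if (
        non_base_project_name != _BASE_PROJECT_NAME
        and inc.startswith(f"/{non_base_project_name}/")
        and not pathlib.Path(resolved_project_folder, non_base_project_name).exists()
    ):
        # parts[0] is "/" and parts[1] is the project name; keep the rest.
        path_within_project = pathlib.Path(*pathlib.Path(inc).parts[2:])
        return f"{resolved_project_folder}/{path_within_project}"
    return None
```

Since `and` short-circuits, the filesystem check still runs only when the cheaper string checks pass, so the merged form behaves like the nested one.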
"aspectName": "viewProperties",
"aspect": {
"json": {
"materialized": false,
"viewLogic": "view: employee_tax_report {\n sql_table_name: data-warehouse.finance.form-16;;\n\n dimension: id {\n type: number\n sql: ${TABLE}.id;;\n }\n\n dimension: name {\n type: string\n sql: ${TABLE}.name;;\n }\n\n measure: taxable_income {\n type: sum\n sql: ${TABLE}.tax;;\n }\n}\n",
"viewLanguage": "lookml"
}
},
Tip
Codebase Verification
Ensure viewProperties aspect is correctly defined.
The placeholders and structure of the SQL statements in the viewLogic field are incorrect. The placeholders should follow the format ${TABLE}.<column_name>, but they do not match the expected pattern.
- Incorrect structure or placeholder in: ${TABLE}.id
- Incorrect structure or placeholder in: ${TABLE}.name
- Incorrect structure or placeholder in: ${TABLE}.tax
Please correct these placeholders to ensure the SQL statements are valid.
Analysis chain
Ensure viewProperties aspect is correctly defined.
The viewProperties aspect defines the view logic and language correctly. However, ensure the SQL in the viewLogic field is validated for correctness.
Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Description: Validate the SQL in the viewLogic field.
# Test: Extract and validate the SQL.
echo 'view: employee_tax_report {
sql_table_name: data-warehouse.finance.form-16;;
dimension: id {
type: number
sql: ${TABLE}.id;;
}
dimension: name {
type: string
sql: ${TABLE}.name;;
}
measure: taxable_income {
type: sum
sql: ${TABLE}.tax;;
}
}' | sqlglot transpile --dialect lookml
Length of output: 409
Script:
#!/bin/bash
# Description: Validate the SQL in the viewLogic field using sqlite3.
# Extract the SQL statements from the viewLogic field.
sql_statements=$(echo 'view: employee_tax_report {
sql_table_name: data-warehouse.finance.form-16;;
dimension: id {
type: number
sql: ${TABLE}.id;;
}
dimension: name {
type: string
sql: ${TABLE}.name;;
}
measure: taxable_income {
type: sum
sql: ${TABLE}.tax;;
}
}' | grep -oP '(?<=sql: ).*(?=;;)')
# Validate each SQL statement using sqlite3.
for sql in $sql_statements; do
echo "Validating SQL: $sql"
echo "$sql;" | sqlite3 :memory: 2>&1
done
Length of output: 1030
Script:
#!/bin/bash
# Description: Check the placeholders and structure of the SQL statements in the viewLogic field.
# Extract the SQL statements from the viewLogic field.
sql_statements=$(echo 'view: employee_tax_report {
sql_table_name: data-warehouse.finance.form-16;;
dimension: id {
type: number
sql: ${TABLE}.id;;
}
dimension: name {
type: string
sql: ${TABLE}.name;;
}
measure: taxable_income {
type: sum
sql: ${TABLE}.tax;;
}
}' | grep -oP '(?<=sql: ).*(?=;;)')
# Check the placeholders and structure of each SQL statement.
for sql in $sql_statements; do
echo "Checking SQL structure: $sql"
if [[ $sql =~ \${TABLE}\.[a-zA-Z_]+ ]]; then
echo "Placeholder and structure are correct for: $sql"
else
echo "Incorrect structure or placeholder in: $sql"
fi
done
Length of output: 1201
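For comparison, the same placeholder check can be sketched in Python, assuming the goal is to confirm that each sql: value in the view is a plain ${TABLE}.<column> reference. The regular expressions and check_table_placeholders are illustrative names, not part of the PR. Here $, { and } are escaped explicitly so they match literally, and with that pattern all three sql: values in this view pass.

```python
import re
from typing import Dict

# "$", "{" and "}" are escaped so they are matched literally.
TABLE_COLUMN = re.compile(r"\$\{TABLE\}\.[A-Za-z_][A-Za-z0-9_]*")
# Everything between a "sql:" parameter and its terminating ";;".
SQL_VALUE = re.compile(r"\bsql:\s*(.*?)\s*;;", re.DOTALL)


def check_table_placeholders(view_logic: str) -> Dict[str, bool]:
    """Map each sql: value to whether it is exactly ${TABLE}.<column>."""
    return {
        expr: TABLE_COLUMN.fullmatch(expr) is not None
        for expr in SQL_VALUE.findall(view_logic)
    }


VIEW_LOGIC = """view: employee_tax_report {
  sql_table_name: data-warehouse.finance.form-16;;
  dimension: id {
    type: number
    sql: ${TABLE}.id;;
  }
  dimension: name {
    type: string
    sql: ${TABLE}.name;;
  }
  measure: taxable_income {
    type: sum
    sql: ${TABLE}.tax;;
  }
}
"""

print(check_table_placeholders(VIEW_LOGIC))
# {'${TABLE}.id': True, '${TABLE}.name': True, '${TABLE}.tax': True}
```

Extracting the values with a single regex also avoids the word splitting that an unquoted for loop over $sql_statements applies to expressions containing spaces.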
@@ -166,6 +166,7 @@
"sql-metadata==2.2.2",
*sqllineage_lib,
"GitPython>2",
"python-liquid",
- Can we remove the sql-metadata and sqllineage_lib dependencies now that the sqlglot parser is used?
- Does sqlglot_lib need to be added to the looker_common libraries instead of only to lookml (unless the sqlglot imports are strictly confined to lookml-only files)?
view_name: Optional[str] = (
self.explore.name
if self.field.original_view is not None
else self.field.original_view
)
This looks suspicious. Wouldn't the else branch always be None? Am I missing something?
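The reviewer's reading can be checked with a minimal sketch. It assumes the intent is to use the field's original view when one is set and fall back to the explore name otherwise; FieldStub, ExploreStub and resolve_view_name are hypothetical stand-ins, not the PR's classes.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FieldStub:  # hypothetical stand-in for the field object
    original_view: Optional[str]


@dataclass
class ExploreStub:  # hypothetical stand-in for the explore object
    name: str


def resolve_view_name(explore: ExploreStub, field: FieldStub) -> Optional[str]:
    # The expression in the diff is
    #     explore.name if field.original_view is not None else field.original_view
    # Its else branch only runs when original_view is None, so it can only
    # return None. Presumed intent: prefer original_view, else the explore name.
    return field.original_view if field.original_view is not None else explore.name
```

With the operands swapped, the result is None only if both the original view and the explore name are missing, which is what the Optional[str] annotation suggests.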
Had a couple of comments about the golden file changes.
{
"upstreamType": "FIELD_SET",
"upstreams": [
"urn:li:schemaField:(urn:li:dataset:(urn:li:dataPlatform:postgres,order,PROD),customer_id)"
why did this lineage disappear?
same reason as mentioned below.
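When reading these golden-file lineage diffs, it helps to know that the URNs follow a fixed shape. The sketch below only formats strings to reproduce the two URNs quoted in this thread; dataset_urn and schema_field_urn are hypothetical names, and this is not DataHub's own URN builder API.

```python
def dataset_urn(platform: str, name: str, env: str = "PROD") -> str:
    """Format a dataset URN in the shape used by the golden files."""
    return f"urn:li:dataset:(urn:li:dataPlatform:{platform},{name},{env})"


def schema_field_urn(parent_dataset_urn: str, field_path: str) -> str:
    """Format a column-level (schemaField) URN nested around a dataset URN."""
    return f"urn:li:schemaField:({parent_dataset_urn},{field_path})"


print(schema_field_urn(dataset_urn("postgres", "order"), "customer_id"))
# urn:li:schemaField:(urn:li:dataset:(urn:li:dataPlatform:postgres,order,PROD),customer_id)
```

Because the schemaField URN embeds the dataset URN, any change to the platform or table name changes every column-level upstream derived from that dataset as well.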
"time": 1586847600000,
"actor": "urn:li:corpuser:datahub"
},
"dataset": "urn:li:dataset:(urn:li:dataPlatform:bigquery,project-foo.default-db.order,PROD)",
what happened to this lineage?
These disappeared because of the condition tag ({% condition order_region %} order.region {% endcondition %}) in liquid.view.lkml, which the sqlglot parser cannot parse. I added liquid_variable to the existing configuration so that this tag is resolved along with the other liquid templates. The lineage now appears in the golden file.
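A simplified, standard-library sketch of resolving a {% condition %} tag into plain SQL before it reaches the parser. It assumes filter values arrive as a plain dict (similar in spirit to liquid_variable) and that an unset filter collapses to the always-true predicate 1=1. resolve_condition_tags is a hypothetical name. The PR itself adds python-liquid for template resolution, and this regex approach is not its implementation.

```python
import re
from typing import Dict

# Matches {% condition <filter_name> %} <sql expression> {% endcondition %}
CONDITION_TAG = re.compile(
    r"\{%\s*condition\s+(\w+)\s*%\}(.*?)\{%\s*endcondition\s*%\}",
    re.DOTALL,
)


def resolve_condition_tags(sql: str, filter_values: Dict[str, str]) -> str:
    """Rewrite condition tags into SQL predicates (simplified sketch)."""

    def _render(match: "re.Match[str]") -> str:
        filter_name, expression = match.group(1), match.group(2).strip()
        value = filter_values.get(filter_name)
        if value is None:
            # Assumption: an unset filter becomes an always-true predicate.
            return "1=1"
        # Double single quotes so the value stays a valid SQL string literal.
        escaped = value.replace("'", "''")
        return f"{expression} = '{escaped}'"

    return CONDITION_TAG.sub(_render, sql)


sql = "SELECT * FROM orders WHERE {% condition order_region %} order.region {% endcondition %}"
print(resolve_condition_tags(sql, {"order_region": "ap-south-1"}))
# SELECT * FROM orders WHERE order.region = 'ap-south-1'
```

Once the tag is replaced, the remaining SQL is ordinary text that a SQL parser can read, which is what allows the column-level lineage to be extracted again.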
2. Fix the existing golden files
Actionable comments posted: 5
Outside diff range and nitpick comments (3)
metadata-ingestion/src/datahub/ingestion/source/looker/looker_common.py (3)
Line range hint 856-915: Consider breaking down the function into smaller methods. The function is complex and could benefit from being broken down into smaller methods for better readability and maintainability.
Line range hint 916-1156: Consider breaking down the function into smaller methods. The function is complex and could benefit from being broken down into smaller methods for better readability and maintainability.
Line range hint 1157-1306: Consider breaking down the function into smaller methods. The function is complex and could benefit from being broken down into smaller methods for better readability and maintainability.
Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Files selected for processing (23)
- metadata-ingestion/setup.py (2 hunks)
- metadata-ingestion/src/datahub/ingestion/source/looker/looker_common.py (18 hunks)
- metadata-ingestion/src/datahub/ingestion/source/looker/looker_config.py (2 hunks)
- metadata-ingestion/src/datahub/ingestion/source/looker/looker_connection.py (1 hunks)
- metadata-ingestion/src/datahub/ingestion/source/looker/looker_file_loader.py (1 hunks)
- metadata-ingestion/src/datahub/ingestion/source/looker/looker_template_language.py (1 hunks)
- metadata-ingestion/src/datahub/ingestion/source/looker/looker_view_id_cache.py (1 hunks)
- metadata-ingestion/src/datahub/ingestion/source/looker/lookml_concept_context.py (1 hunks)
- metadata-ingestion/src/datahub/ingestion/source/looker/lookml_config.py (1 hunks)
- metadata-ingestion/src/datahub/ingestion/source/looker/lookml_refinement.py (1 hunks)
- metadata-ingestion/src/datahub/ingestion/source/looker/lookml_source.py (22 hunks)
- metadata-ingestion/src/datahub/ingestion/source/looker/view_upstream.py (1 hunks)
- metadata-ingestion/tests/integration/lookml/duplicate_field_ingestion_golden.json (6 hunks)
- metadata-ingestion/tests/integration/lookml/expected_output.json (19 hunks)
- metadata-ingestion/tests/integration/lookml/field_tag_ingestion_golden.json (6 hunks)
- metadata-ingestion/tests/integration/lookml/lookml_mces_api_bigquery.json (9 hunks)
- metadata-ingestion/tests/integration/lookml/lookml_mces_api_hive2.json (9 hunks)
- metadata-ingestion/tests/integration/lookml/lookml_mces_badsql_parser.json (6 hunks)
- metadata-ingestion/tests/integration/lookml/lookml_mces_offline.json (9 hunks)
- metadata-ingestion/tests/integration/lookml/lookml_mces_offline_platform_instance.json (9 hunks)
- metadata-ingestion/tests/integration/lookml/lookml_mces_with_external_urls.json (9 hunks)
- metadata-ingestion/tests/integration/lookml/refinements_ingestion_golden.json (20 hunks)
- metadata-ingestion/tests/integration/lookml/test_lookml.py (5 hunks)
Files not summarized due to errors (1)
- metadata-ingestion/src/datahub/ingestion/source/looker/lookml_source.py: Error: Message exceeds token limit
Files skipped from review as they are similar to previous changes (14)
- metadata-ingestion/setup.py
- metadata-ingestion/src/datahub/ingestion/source/looker/looker_connection.py
- metadata-ingestion/src/datahub/ingestion/source/looker/looker_file_loader.py
- metadata-ingestion/src/datahub/ingestion/source/looker/looker_template_language.py
- metadata-ingestion/src/datahub/ingestion/source/looker/lookml_config.py
- metadata-ingestion/tests/integration/lookml/duplicate_field_ingestion_golden.json
- metadata-ingestion/tests/integration/lookml/expected_output.json
- metadata-ingestion/tests/integration/lookml/lookml_mces_api_bigquery.json
- metadata-ingestion/tests/integration/lookml/lookml_mces_api_hive2.json
- metadata-ingestion/tests/integration/lookml/lookml_mces_badsql_parser.json
- metadata-ingestion/tests/integration/lookml/lookml_mces_offline.json
- metadata-ingestion/tests/integration/lookml/lookml_mces_offline_platform_instance.json
- metadata-ingestion/tests/integration/lookml/lookml_mces_with_external_urls.json
- metadata-ingestion/tests/integration/lookml/refinements_ingestion_golden.json
Additional context used
Ruff
metadata-ingestion/src/datahub/ingestion/source/looker/lookml_source.py
409-412: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling (B904)
592-595: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling (B904)
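For reference, this is the pattern B904 asks for. Re-raising with an explicit cause attaches the original exception as __cause__, so the traceback reports it as the direct cause instead of as a second error raised while handling the first. ConfigError and parse_port are hypothetical names used only for illustration.

```python
class ConfigError(Exception):
    """Hypothetical error type used only for this illustration."""


def parse_port(raw: str) -> int:
    try:
        return int(raw)
    except ValueError as err:
        # "from err" records the ValueError as __cause__ ("The above exception
        # was the direct cause of ..."), rather than the implicit context
        # ("During handling of the above exception, another exception occurred").
        raise ConfigError(f"invalid port: {raw!r}") from err
```

Use raise ... from None instead when the original exception adds nothing useful and should be hidden from the traceback.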
Additional comments not posted (53)
metadata-ingestion/src/datahub/ingestion/source/looker/looker_view_id_cache.py (3)
17-30: LGTM! The function determine_view_file_path correctly determines the file path and includes appropriate logging for debugging.
33-77: LGTM! The class LookerViewIdCache is correctly initialized with the necessary attributes.
78-120: LGTM! The method get_looker_view_id correctly retrieves the Looker view ID with appropriate logging and error handling.
metadata-ingestion/src/datahub/ingestion/source/looker/lookml_refinement.py (11)
18-62: LGTM! The class LookerRefinementResolver is correctly initialized with the necessary attributes.
63-65: LGTM! The method is_refinement correctly checks whether a view name is a refinement.
68-94: LGTM! The method merge_column correctly merges columns from the original and refinement dictionaries.
97-105: LGTM! The method merge_and_set_column correctly merges columns and sets the result in the new raw view.
107-132: LGTM! The method merge_refinements correctly merges refinements into the raw view and handles additive parameters.
134-146: LGTM! The method get_refinements correctly retrieves refinements from the views based on the view name.
148-166: LGTM! The method get_refinement_from_model_includes correctly retrieves refinements from the model includes and handles missing view files.
168-175: LGTM! The method should_skip_processing correctly checks whether processing should be skipped based on the view name and source configuration.
177-202: LGTM! The method apply_view_refinement correctly applies refinements to a view and handles caching.
205-222: LGTM! The method add_extended_explore correctly adds extended explores to the raw explore.
223-251: LGTM! The method apply_explore_refinement correctly applies refinements to an explore and handles caching.
metadata-ingestion/src/datahub/ingestion/source/looker/looker_config.py (3)
147-155: LGTM! The function _get_bigquery_definition correctly retrieves the BigQuery connection definition.
157-175: LGTM! The function _get_generic_definition correctly retrieves the generic connection definition and handles platform extraction from the dialect name.
177-220: LGTM! The class LookerConnectionDefinition is correctly initialized with the necessary attributes, and its methods handle validation and creation of connection definitions.
metadata-ingestion/src/datahub/ingestion/source/looker/lookml_concept_context.py (10)
24-54: LGTM! The class LookerFieldContext is correctly initialized with the necessary attributes, and its methods handle field context operations.
57-164: LGTM! The class LookerViewContext is correctly initialized with the necessary attributes, and its methods handle view context operations.
192-217: LGTM! The method resolve_extends_view_name correctly resolves the extends view name and handles missing views with appropriate logging.
219-249: LGTM! The method get_including_extends correctly retrieves the field from the current view or the extended view.
251-253: LGTM! The method _get_sql_table_name_field correctly retrieves the SQL table name field.
254-263: LGTM! The method _is_dot_sql_table_name_present correctly checks whether the SQL table name contains a dot.
265-277: LGTM! The method sql_table_name correctly retrieves the SQL table name and handles special cases.
279-287: LGTM! The method derived_table correctly retrieves the derived table and handles missing tables with assertions.
289-297: LGTM! The method explore_source correctly retrieves the explore source and handles missing sources with assertions.
299-322: LGTM! The method sql correctly retrieves the SQL query and handles transformations.
metadata-ingestion/tests/integration/lookml/field_tag_ingestion_golden.json (7)
170-170: Update dataset URN to postgres. The dataset URN has been updated from conn to postgres. Ensure this change is consistent with the intended data platform.
178-178: Update schema field URN to postgres. The schema field URN has been updated from conn to postgres. Verify that this change aligns with the data platform schema.
189-189: Update schema field URN to postgres. The schema field URN has been updated from conn to postgres. Ensure this change is consistent with the intended data platform.
200-200: Update schema field URN to postgres. The schema field URN has been updated from conn to postgres. Verify that this change aligns with the data platform schema.
211-211: Update schema field URN to postgres. The schema field URN has been updated from conn to postgres. Ensure this change is consistent with the intended data platform.
222-222: Update schema field URN to postgres. The schema field URN has been updated from conn to postgres. Verify that this change aligns with the data platform schema.
233-233: Update schema field URN to postgres. The schema field URN has been updated from conn to postgres. Ensure this change is consistent with the intended data platform.
metadata-ingestion/src/datahub/ingestion/source/looker/view_upstream.py (8)
39-43: LGTM! The is_derived_view function correctly checks whether a view name contains the DERIVED_VIEW_SUFFIX.
46-66: LGTM! But verify edge cases. The get_derived_looker_view_id function appears correct. Ensure that edge cases in the regex and string manipulation are handled properly.
69-95: LGTM! But verify edge cases. The resolve_derived_view_urn_of_col_ref function appears correct. Ensure that all potential edge cases are handled properly.
98-124: LGTM! But verify edge cases. The fix_derived_view_urn function appears correct. Ensure that all potential edge cases are handled properly.
153-196: LGTM! But verify edge cases. The _generate_fully_qualified_name function appears correct. Ensure that all potential edge cases are handled properly.
235-367: LGTM! But verify edge cases. The SqlBasedDerivedViewUpstream class appears correct. Ensure that all potential edge cases are handled properly.
372-454: LGTM! But verify edge cases. The NativeDerivedViewUpstream class appears correct. Ensure that all potential edge cases are handled properly.
581-633: LGTM! But verify edge cases. The create_view_upstream function appears correct. Ensure that all potential edge cases are handled properly.
metadata-ingestion/tests/integration/lookml/test_lookml.py (6)
71-73: Comments approved. The comments provide useful context for the configuration.
901-927: New test function approved. The new test function test_view_to_view_lineage_and_liquid_template validates the new configuration option liquid_variable.
937-1010: New test function approved. The new test function test_special_liquid_variables validates the behavior of special liquid variables.
47-53: Ensure the new configuration is tested. The new configuration option liquid_variable should be validated in the test cases to ensure it works as expected.
Verification successful
The new configuration option liquid_variable is being tested. The search results show that the liquid_variable configuration is present and being validated within the test file metadata-ingestion/tests/integration/lookml/test_lookml.py.
metadata-ingestion/tests/integration/lookml/test_lookml.py
Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Description: Verify the usage of the new configuration option `liquid_variable`.
# Test: Search for the usage of `liquid_variable` in test cases. Expect: At least one occurrence.
rg --type python $'liquid_variable'
Length of output: 65
Script:
#!/bin/bash
# Description: Verify the usage of the new configuration option `liquid_variable`.
# Test: Search for the usage of `liquid_variable` in test cases. Expect: At least one occurrence.
rg --glob '*.py' 'liquid_variable'
Length of output: 6154
15-18: Ensure the new imports are used. The imports for SpecialVariable and resolve_liquid_variable are new. Ensure these are used in the test cases to avoid unnecessary imports.
Verification successful
The new imports are used.
The imports for SpecialVariable and resolve_liquid_variable are utilized in the test cases, confirming their necessity.
metadata-ingestion/tests/integration/lookml/test_lookml.py:
- SpecialVariable is used on lines 15-18 and 496-498.
- resolve_liquid_variable is used on lines 15-18 and 496-498.
Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Description: Verify the usage of newly imported entities.
# Test: Search for the usage of imported entities. Expect: At least one occurrence of each import.
rg --type python $'SpecialVariable|resolve_liquid_variable'
Length of output: 91
Script:
#!/bin/bash
# Description: Verify the usage of newly imported entities.
# Test: Search for the usage of imported entities. Expect: At least one occurrence of each import.
rg 'SpecialVariable|resolve_liquid_variable' --glob '*.py'
Length of output: 1816
496-498: Ensure the new configuration is tested. The new configuration option liquid_variable should be validated in the test cases to ensure it works as expected.
Verification successful
Let's correct the command and re-run the search for liquid_variable in Python test files to verify whether this new configuration is being tested.
The new configuration is tested.
The new configuration option liquid_variable is validated in the test cases, particularly in the metadata-ingestion/tests/integration/lookml/test_lookml.py file. The test_special_liquid_variables function specifically tests this configuration.
metadata-ingestion/tests/integration/lookml/test_lookml.py
- Function: test_special_liquid_variables
Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Description: Verify the usage of the new configuration option `liquid_variable`.
# Test: Search for the usage of `liquid_variable` in test cases. Expect: At least one occurrence.
rg --type python $'liquid_variable'
Length of output: 65
Script:
#!/bin/bash
# Description: Verify the usage of the new configuration option `liquid_variable`.
# Test: Search for the usage of `liquid_variable` in test cases. Expect: At least one occurrence.
rg -t py 'liquid_variable'
Length of output: 6146
metadata-ingestion/src/datahub/ingestion/source/looker/lookml_source.py (1)
774-774: LGTM! The code changes are approved.
metadata-ingestion/src/datahub/ingestion/source/looker/looker_common.py (4)
131-157: Add detailed comments to explain the deduplication criteria. The function's logic is clear, but adding detailed comments will make it more understandable for future maintainers.
+ # Create a list of field names that are of type DIMENSION_GROUP
301-335: Add type hints for dictionary keys. Adding type hints for the dictionary keys will improve readability and maintainability.
- field_dict: Dict,
- upstream_column_ref: List[ColumnRef],
- type_cls: ViewFieldType,
- populate_sql_logic_in_descriptions: bool,
+ field_dict: Dict[str, Any],
+ upstream_column_ref: List[ColumnRef],
+ type_cls: ViewFieldType,
+ populate_sql_logic_in_descriptions: bool,
343-404: Verify the correctness of the else statement. The else branch might always be None. Verify whether this is the intended behavior.
406-459: Add detailed comments to explain the conditions. The function's logic is clear, but adding detailed comments will make it more understandable for future maintainers.
def find_view_from_resolved_includes(
connection: Optional[LookerConnectionDefinition],
resolved_includes: List["ProjectInclude"],
looker_viewfile_loader: LookerViewFileLoader,
target_view_name: str,
reporter: LookMLSourceReport,
) -> Optional[Tuple["ProjectInclude", dict]]:
# It could live in one of the included files. We do not know which file the base view
# lives in, so we try them all!
for include in resolved_includes:
included_looker_viewfile = looker_viewfile_loader.load_viewfile(
include.include,
include.project,
connection,
reporter,
)
if not included_looker_viewfile:
continue
for raw_view in included_looker_viewfile.views:
raw_view_name = raw_view["name"]
# Make sure to skip loading view we are currently trying to resolve
if raw_view_name == target_view_name:
return include, raw_view

return None
Refactor the loop guard.
The function could check if included_looker_viewfile: and search the views inside that block instead of skipping empty results with an early continue. Note that this trades the early continue for one extra level of nesting.
- for include in resolved_includes:
- included_looker_viewfile = looker_viewfile_loader.load_viewfile(
- include.include,
- include.project,
- connection,
- reporter,
- )
- if not included_looker_viewfile:
- continue
- for raw_view in included_looker_viewfile.views:
- raw_view_name = raw_view["name"]
- if raw_view_name == target_view_name:
- return include, raw_view
+ for include in resolved_includes:
+ included_looker_viewfile = looker_viewfile_loader.load_viewfile(
+ include.include,
+ include.project,
+ connection,
+ reporter,
+ )
+ if included_looker_viewfile:
+ for raw_view in included_looker_viewfile.views:
+ raw_view_name = raw_view["name"]
+ if raw_view_name == target_view_name:
+ return include, raw_view
return None
Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
def find_view_from_resolved_includes(
connection: Optional[LookerConnectionDefinition],
resolved_includes: List["ProjectInclude"],
looker_viewfile_loader: LookerViewFileLoader,
target_view_name: str,
reporter: LookMLSourceReport,
) -> Optional[Tuple["ProjectInclude", dict]]:
# It could live in one of the included files. We do not know which file the base view
# lives in, so we try them all!
for include in resolved_includes:
included_looker_viewfile = looker_viewfile_loader.load_viewfile(
include.include,
include.project,
connection,
reporter,
)
if included_looker_viewfile:
for raw_view in included_looker_viewfile.views:
raw_view_name = raw_view["name"]
# Make sure to skip loading view we are currently trying to resolve
if raw_view_name == target_view_name:
return include, raw_view
return None
"""Returns a fully qualified dataset name, resolved through a connection definition.
Input sql_table_name can be in three forms: table, db.table, db.schema.table"""
# TODO: This function should be extracted out into a Platform specific naming class since name translations
# are required across all connectors
Reminder: Consider refactoring the name resolution logic.
The TODO comment suggests extracting the function into a platform-specific naming class. This can improve modularity and maintainability.
Do you want me to refactor this function into a platform-specific naming class or open a GitHub issue to track this task?
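One possible shape for the platform-specific naming class the TODO describes, sketched under assumptions: two-tier platforms address tables as db.table, three-tier platforms as db.schema.table, and missing parts are filled from the connection's default database and schema. On a three-tier platform, this sketch treats a two-part name as schema.table, which is an assumption. All class names and the NAMING registry are hypothetical.

```python
from abc import ABC, abstractmethod
from typing import Dict


class PlatformNaming(ABC):
    """Qualifies a Looker sql_table_name for one family of platforms."""

    @abstractmethod
    def qualify(self, sql_table_name: str, default_db: str, default_schema: str) -> str:
        ...


class TwoTierNaming(PlatformNaming):
    """Platforms addressed as db.table (Hive, for example)."""

    def qualify(self, sql_table_name: str, default_db: str, default_schema: str) -> str:
        # There is no schema level, so default_schema is ignored.
        if "." not in sql_table_name:
            return f"{default_db}.{sql_table_name}"
        return sql_table_name


class ThreeTierNaming(PlatformNaming):
    """Platforms addressed as db.schema.table (Postgres, for example)."""

    def qualify(self, sql_table_name: str, default_db: str, default_schema: str) -> str:
        parts = sql_table_name.split(".")
        if len(parts) == 1:  # table
            return f"{default_db}.{default_schema}.{sql_table_name}"
        if len(parts) == 2:  # schema.table (assumption, see above)
            return f"{default_db}.{sql_table_name}"
        return sql_table_name  # already db.schema.table


# Hypothetical registry keyed by DataHub platform name.
NAMING: Dict[str, PlatformNaming] = {
    "hive": TwoTierNaming(),
    "postgres": ThreeTierNaming(),
}
```

Keeping one small class per naming scheme would let other connectors reuse the same translation, which is the motivation the TODO gives.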

# Parse SQL to extract dependencies.
if parse_table_names_from_sql:
(
fields,
sql_table_names,
) = cls._extract_metadata_from_derived_table_sql(
reporter,
sql_parser_path,
view_name,
sql_table_name,
view_logic,
fields,
use_external_process=process_isolation_for_sql_parsing,
)
view_logic = view_context.view_file.raw_file_content[:max_file_snippet_length]

elif "explore_source" in derived_table:
# This is called a "native derived table".
# See https://cloud.google.com/looker/docs/creating-ndts.
explore_source = derived_table["explore_source"]

# We want this to render the full lkml block
# e.g. explore_source: source_name { ... }
# As such, we use the full derived_table instead of the explore_source.
view_logic = str(lkml.dump(derived_table))[:max_file_snippet_length]
view_lang = VIEW_LANGUAGE_LOOKML

(
fields,
upstream_explores,
) = cls._extract_metadata_from_derived_table_explore(
reporter, view_name, explore_source, fields
)
if view_context.is_sql_based_derived_case():
view_logic = view_context.sql(transformed=False)
# Parse SQL to extract dependencies.
view_details = ViewProperties(
materialized=False,
viewLogic=view_logic,
viewLanguage=VIEW_LANGUAGE_SQL,
)
elif view_context.is_native_derived_case():
# We want this to render the full lkml block
# e.g. explore_source: source_name { ... }
# As such, we use the full derived_table instead of the explore_source.
view_logic = str(lkml.dump(view_context.derived_table()))[
:max_file_snippet_length
]
view_lang = VIEW_LANGUAGE_LOOKML

materialized = False
for k in derived_table:
if k in ["datagroup_trigger", "sql_trigger_value", "persist_for"]:
materialized = True
if "materialized_view" in derived_table:
materialized = derived_table["materialized_view"] == "yes"
materialized = view_context.is_materialized_derived_view()

view_details = ViewProperties(
materialized=materialized, viewLogic=view_logic, viewLanguage=view_lang
)
else:
# If not a derived table, then this view essentially wraps an existing
# object in the database. If sql_table_name is set, there is a single
# dependency in the view, on the sql_table_name.
# Otherwise, default to the view name as per the docs:
# https://docs.looker.com/reference/view-params/sql_table_name-for-view
sql_table_names = (
[view_name] if sql_table_name is None else [sql_table_name]
)
view_details = ViewProperties(
materialized=False,
viewLogic=view_logic,
viewLanguage=VIEW_LANGUAGE_LOOKML,
)

file_path = LookerView.determine_view_file_path(
base_folder_path, looker_viewfile.absolute_file_path
)

return LookerView(
id=LookerViewId(
project_name=project_name,
model_name=model_name,
view_name=view_name,
file_path=file_path,
),
absolute_file_path=looker_viewfile.absolute_file_path,
connection=connection,
sql_table_names=sql_table_names,
upstream_explores=upstream_explores,
fields=fields,
raw_file_content=looker_viewfile.raw_file_content,
id=looker_view_id,
absolute_file_path=view_context.view_file.absolute_file_path,
connection=view_context.view_connection,
upstream_dataset_urns=view_upstream.get_upstream_dataset_urn(),
fields=view_fields,
raw_file_content=view_context.view_file.raw_file_content,
Consider breaking down the `from_looker_dict` method.
The method is quite large and handles multiple responsibilities. Breaking it down into smaller methods can improve readability and maintainability.
    @classmethod
    def from_looker_dict(
        cls,
        project_name: str,
        model_name: str,
        view_context: LookerViewContext,
        looker_view_id_cache: LookerViewIdCache,
        reporter: LookMLSourceReport,
        max_file_snippet_length: int,
        config: LookMLSourceConfig,
        ctx: PipelineContext,
        extract_col_level_lineage: bool = False,
        populate_sql_logic_in_descriptions: bool = False,
    ) -> Optional["LookerView"]:
        view_name = view_context.name()
        logger.debug(f"Handling view {view_name} in model {model_name}")
        looker_view_id = cls._create_looker_view_id(
            project_name, model_name, view_name, view_context
        )
        view_upstream = cls._create_view_upstream(
            view_context, looker_view_id_cache, config, ctx, reporter
        )
        view_fields = cls._extract_view_fields(
            view_context,
            view_upstream,
            extract_col_level_lineage,
            populate_sql_logic_in_descriptions,
        )
        view_fields = deduplicate_fields(view_fields)
        view_details = cls._determine_view_details(
            view_context, max_file_snippet_length
        )
        return cls._build_looker_view(
            looker_view_id, view_context, view_upstream, view_fields, view_details
        )

    @classmethod
    def _create_looker_view_id(
        cls,
        project_name: str,
        model_name: str,
        view_name: str,
        view_context: LookerViewContext,
    ) -> LookerViewId:
        return LookerViewId(
            project_name=project_name,
            model_name=model_name,
            view_name=view_name,
            file_path=view_context.view_file_name(),
        )

    @classmethod
    def _create_view_upstream(
        cls,
        view_context: LookerViewContext,
        looker_view_id_cache: LookerViewIdCache,
        config: LookMLSourceConfig,
        ctx: PipelineContext,
        reporter: LookMLSourceReport,
    ) -> AbstractViewUpstream:
        return create_view_upstream(
            view_context=view_context,
            looker_view_id_cache=looker_view_id_cache,
            config=config,
            ctx=ctx,
            reporter=reporter,
        )

    @classmethod
    def _extract_view_fields(
        cls,
        view_context: LookerViewContext,
        view_upstream: AbstractViewUpstream,
        extract_col_level_lineage: bool,
        populate_sql_logic_in_descriptions: bool,
    ) -> List[ViewField]:
        # OrderedDict keeps dimensions, dimension groups and measures in a stable order.
        field_type_vs_raw_fields = OrderedDict(
            {
                ViewFieldType.DIMENSION: view_context.dimensions(),
                ViewFieldType.DIMENSION_GROUP: view_context.dimension_groups(),
                ViewFieldType.MEASURE: view_context.measures(),
            }
        )
        view_fields: List[ViewField] = []
        for field_type, fields in field_type_vs_raw_fields.items():
            for field in fields:
                upstream_column_ref = (
                    view_upstream.get_upstream_column_ref(
                        field_context=LookerFieldContext(raw_field=field)
                    )
                    if extract_col_level_lineage
                    else []
                )
                view_fields.append(
                    ViewField.view_fields_from_dict(
                        field_dict=field,
                        upstream_column_ref=upstream_column_ref,
                        type_cls=field_type,
                        populate_sql_logic_in_descriptions=populate_sql_logic_in_descriptions,
                    )
                )
        # A SQL-derived view with no declared fields falls back to fields from the SQL.
        if (
            not view_fields
            and view_context.is_sql_based_derived_view_without_fields_case()
        ):
            view_fields = view_upstream.create_fields()
        return view_fields

    @classmethod
    def _determine_view_details(
        cls, view_context: LookerViewContext, max_file_snippet_length: int
    ) -> ViewProperties:
        view_logic = view_context.view_file.raw_file_content[:max_file_snippet_length]
        if view_context.is_sql_based_derived_case():
            view_logic = view_context.sql(transformed=False)
            return ViewProperties(
                materialized=False, viewLogic=view_logic, viewLanguage=VIEW_LANGUAGE_SQL
            )
        elif view_context.is_native_derived_case():
            view_logic = str(lkml.dump(view_context.derived_table()))[
                :max_file_snippet_length
            ]
            return ViewProperties(
                materialized=view_context.is_materialized_derived_view(),
                viewLogic=view_logic,
                viewLanguage=VIEW_LANGUAGE_LOOKML,
            )
        return ViewProperties(
            materialized=False, viewLogic=view_logic, viewLanguage=VIEW_LANGUAGE_LOOKML
        )

    @classmethod
    def _build_looker_view(
        cls,
        looker_view_id: LookerViewId,
        view_context: LookerViewContext,
        view_upstream: AbstractViewUpstream,
        view_fields: List[ViewField],
        view_details: ViewProperties,
    ) -> "LookerView":
        return LookerView(
            id=looker_view_id,
            absolute_file_path=view_context.view_file.absolute_file_path,
            connection=view_context.view_connection,
            upstream_dataset_urns=view_upstream.get_upstream_dataset_urn(),
            fields=view_fields,
            raw_file_content=view_context.view_file.raw_file_content,
            view_details=view_details,
        )
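Before this PR, the first hunk in this thread computed `materialized` inline from the keys of the `derived_table` dict. The new code delegates to `view_context.is_materialized_derived_view()`, whose body is not shown here. If that method preserves the old rule, the rule can be written as a pure function. This is a minimal sketch: `is_materialized` and `PERSISTENCE_KEYS` are hypothetical names, not DataHub API.

```python
from typing import Any, Dict

# Keys that, per the removed loop, mark a persistent derived table (hypothetical constant name).
PERSISTENCE_KEYS = ("datagroup_trigger", "sql_trigger_value", "persist_for")


def is_materialized(derived_table: Dict[str, Any]) -> bool:
    """Hypothetical standalone version of the removed materialization check."""
    # Any persistence key makes the derived table materialized...
    materialized = any(key in derived_table for key in PERSISTENCE_KEYS)
    # ...unless an explicit materialized_view setting overrides it.
    if "materialized_view" in derived_table:
        materialized = derived_table["materialized_view"] == "yes"
    return materialized
```

The explicit `materialized_view` check runs last, matching the removed loop. So `{"persist_for": "24 hours", "materialized_view": "no"}` is treated as not materialized.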
Actionable comments posted: 0
Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Files selected for processing (1)
- metadata-ingestion/setup.py (2 hunks)
Files skipped from review as they are similar to previous changes (1)
- metadata-ingestion/setup.py
…hub-fork into master+ing-510-lookml-cll
Actionable comments posted: 7
Outside diff range and nitpick comments (1)
metadata-ingestion/src/datahub/ingestion/source/looker/looker_dataclasses.py (1)
39-87: Consider simplifying nested if statements. The `resolve_includes` method has nested `if` statements that can be simplified for readability.
- if non_base_project_name != _BASE_PROJECT_NAME and inc.startswith(
- f"/{non_base_project_name}/"
- ):
+ if (non_base_project_name != _BASE_PROJECT_NAME and
+ inc.startswith(f"/{non_base_project_name}/")):
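The nested-`if` shape that this comment and Ruff SIM102 target can be sketched as follows. `resolve_includes` itself is not shown, so two pieces are hypothetical placeholders: the `BASE_PROJECT_NAME` value and the prefix-stripping body. The sketch shows only that joining the two conditions with `and` keeps the same short-circuit order and result.

```python
# Placeholder value: the real _BASE_PROJECT_NAME is defined in looker_dataclasses.py.
BASE_PROJECT_NAME = "__BASE"


def strip_prefix_nested(inc: str, project_name: str) -> str:
    # Nested form that Ruff reports as SIM102 (hypothetical body).
    if project_name != BASE_PROJECT_NAME:
        if inc.startswith(f"/{project_name}/"):
            return inc[len(project_name) + 2 :]
    return inc


def strip_prefix(inc: str, project_name: str) -> str:
    # Single `if` joined with `and`: same evaluation order, one less indent level.
    if project_name != BASE_PROJECT_NAME and inc.startswith(f"/{project_name}/"):
        return inc[len(project_name) + 2 :]
    return inc
```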
Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Files selected for processing (6)
- metadata-ingestion/setup.py (2 hunks)
- metadata-ingestion/src/datahub/ingestion/source/looker/looker_common.py (18 hunks)
- metadata-ingestion/src/datahub/ingestion/source/looker/looker_dataclasses.py (1 hunks)
- metadata-ingestion/src/datahub/ingestion/source/looker/looker_source.py (1 hunks)
- metadata-ingestion/src/datahub/ingestion/source/looker/lookml_source.py (19 hunks)
- metadata-ingestion/src/datahub/ingestion/source/looker/view_upstream.py (1 hunks)
Files not summarized due to errors (1)
- metadata-ingestion/src/datahub/ingestion/source/looker/lookml_source.py: Error: Message exceeds token limit
Files skipped from review due to trivial changes (1)
- metadata-ingestion/setup.py
Files skipped from review as they are similar to previous changes (2)
- metadata-ingestion/src/datahub/ingestion/source/looker/looker_common.py
- metadata-ingestion/src/datahub/ingestion/source/looker/looker_source.py
Additional context used
Ruff
metadata-ingestion/src/datahub/ingestion/source/looker/looker_dataclasses.py
148-155: Use a single `if` statement instead of nested `if` statements (SIM102)
metadata-ingestion/src/datahub/ingestion/source/looker/view_upstream.py
40-43: Return the condition `DERIVED_VIEW_SUFFIX in view_name.lower()` directly. Replace with `return DERIVED_VIEW_SUFFIX in view_name.lower()` (SIM103)
metadata-ingestion/src/datahub/ingestion/source/looker/lookml_source.py
319-322: Within an `except` clause, raise exceptions with `raise ... from err` or `raise ... from None` to distinguish them from errors in exception handling (B904)
502-505: Within an `except` clause, raise exceptions with `raise ... from err` or `raise ... from None` to distinguish them from errors in exception handling (B904)
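For the SIM103 finding, here is a minimal sketch of the before and after shapes. It assumes `DERIVED_VIEW_SUFFIX` is `".sql_table_name"`; the real constant is defined in `view_upstream.py` and may differ. Both function names are hypothetical.

```python
# Assumed value; the real DERIVED_VIEW_SUFFIX lives in view_upstream.py.
DERIVED_VIEW_SUFFIX = ".sql_table_name"


def is_derived_view_verbose(view_name: str) -> bool:
    # Shape Ruff flags as SIM103: an if/else that only returns True or False.
    if DERIVED_VIEW_SUFFIX in view_name.lower():
        return True
    return False


def is_derived_view(view_name: str) -> bool:
    # SIM103 fix: return the boolean condition directly.
    return DERIVED_VIEW_SUFFIX in view_name.lower()
```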
Additional comments not posted (13)
metadata-ingestion/src/datahub/ingestion/source/looker/looker_dataclasses.py (3)
18-21: LGTM! The `ProjectInclude` dataclass is well-defined and straightforward.
24-29: LGTM! The `LookerField` dataclass is well-defined and straightforward.
244-290: LGTM! The `LookerViewFile` dataclass is well-defined and straightforward.
metadata-ingestion/src/datahub/ingestion/source/looker/view_upstream.py (7)
200-231: LGTM! The `AbstractViewUpstream` class is well-defined and follows the abstract class pattern.
237-372: LGTM! The `SqlBasedDerivedViewUpstream` class is well-defined and follows the class pattern.
374-456: LGTM! The `NativeDerivedViewUpstream` class is well-defined and follows the class pattern.
458-511: LGTM! The `RegularViewUpstream` class is well-defined and follows the class pattern.
513-571: LGTM! The `DotSqlTableNameViewUpstream` class is well-defined and follows the class pattern.
573-580: LGTM! The `EmptyImplementation` class is well-defined and straightforward.
583-636: LGTM! The `create_view_upstream` function is well-defined and follows the factory pattern.
metadata-ingestion/src/datahub/ingestion/source/looker/lookml_source.py (3)
42-50: New imports look necessary and relevant. The new imports are necessary for the changes made in the file and align with the updated functionality.
Also applies to: 54-70, 98-98
109-111: New field `upstream_dataset_urns` looks good. The new field `upstream_dataset_urns` is necessary for tracking upstream dependencies.
307-307: Initialization looks good. The initialization of `ctx` and `reporter` is necessary and relevant to the changes made.
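The review describes `AbstractViewUpstream`, its per-view-kind subclasses and the `create_view_upstream` factory. The pattern can be sketched with simplified stand-ins. Everything here is hypothetical: `ViewInfo` replaces `LookerViewContext`, and the dispatch rules are inferred from the class names and the derived-table comments in this PR, not copied from `create_view_upstream`. The `".sql_table_name"` marker for `${other_view.SQL_TABLE_NAME}` references is also an assumption.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import Any, Dict, List, Optional

# Assumed marker for `${other_view.SQL_TABLE_NAME}` references (compared in lower case).
DOT_SQL_TABLE_NAME = ".sql_table_name"


@dataclass
class ViewInfo:
    """Hypothetical, simplified stand-in for LookerViewContext."""

    name: str
    derived_table: Optional[Dict[str, Any]] = None
    sql_table_name: Optional[str] = None


class ViewUpstream(ABC):
    """Common interface; each subclass handles one kind of view."""

    def __init__(self, view: ViewInfo) -> None:
        self.view = view

    @abstractmethod
    def upstream_names(self) -> List[str]:
        """Names of the tables or views this view reads from."""


class SqlDerivedUpstream(ViewUpstream):
    def upstream_names(self) -> List[str]:
        # Placeholder: a real implementation would parse derived_table["sql"].
        return []


class NativeDerivedUpstream(ViewUpstream):
    def upstream_names(self) -> List[str]:
        # Placeholder: a real implementation would resolve the explore_source.
        return []


class DotSqlTableNameUpstream(ViewUpstream):
    def upstream_names(self) -> List[str]:
        # "${orders.SQL_TABLE_NAME}" -> "orders"
        ref = (self.view.sql_table_name or "").strip().strip("${}")
        return [ref.split(".")[0]]


class RegularUpstream(ViewUpstream):
    def upstream_names(self) -> List[str]:
        # Without sql_table_name, Looker uses the view name as the table name.
        return [self.view.sql_table_name or self.view.name]


def create_upstream(view: ViewInfo) -> ViewUpstream:
    """Factory: pick the strategy from the shape of the view definition."""
    derived = view.derived_table or {}
    if "sql" in derived:
        return SqlDerivedUpstream(view)
    if "explore_source" in derived:
        return NativeDerivedUpstream(view)
    if view.sql_table_name and DOT_SQL_TABLE_NAME in view.sql_table_name.lower():
        return DotSqlTableNameUpstream(view)
    return RegularUpstream(view)
```

Callers depend only on `ViewUpstream`, so `from_looker_dict` does not need to branch on view kind itself. That is the readability gain the refactor suggestions in this thread point at.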
view_context: LookerViewContext,
looker_view_id_cache: LookerViewIdCache,
reporter: LookMLSourceReport,
max_file_snippet_length: int,
parse_table_names_from_sql: bool = False,
sql_parser_path: str = "datahub.utilities.sql_parser.DefaultSQLParser",
config: LookMLSourceConfig,
ctx: PipelineContext,
extract_col_level_lineage: bool = False,
populate_sql_logic_in_descriptions: bool = False,
process_isolation_for_sql_parsing: bool = False,
) -> Optional["LookerView"]:
view_name = looker_view["name"]

view_name = view_context.name()

logger.debug(f"Handling view {view_name} in model {model_name}")
# The sql_table_name might be defined in another view and this view is extending that view,
# so we resolve this field while taking that into account.
sql_table_name: Optional[str] = LookerView.get_including_extends(
project_name=project_name,
model_name=model_name,
view_name=view_name,
looker_view=looker_view,
connection=connection,
looker_viewfile=looker_viewfile,
looker_viewfile_loader=looker_viewfile_loader,
looker_refinement_resolver=looker_refinement_resolver,
field="sql_table_name",
reporter=reporter,
file_path=view_context.view_file_name(),
)

# Some sql_table_name fields contain quotes like: optimizely."group", just remove the quotes
sql_table_name = (
sql_table_name.replace('"', "").replace("`", "")
if sql_table_name is not None
else None
)
derived_table = LookerView.get_including_extends(
view_name=view_name,
looker_view=looker_view,
connection=connection,
looker_viewfile=looker_viewfile,
looker_viewfile_loader=looker_viewfile_loader,
looker_refinement_resolver=looker_refinement_resolver,
field="derived_table",
view_upstream: AbstractViewUpstream = create_view_upstream(
view_context=view_context,
looker_view_id_cache=looker_view_id_cache,
config=config,
ctx=ctx,
Refactor suggestion: Extract `create_view_upstream` to a separate method.
This part can be moved to a separate method to improve readability.
    @classmethod
    def _create_view_upstream(
        cls,
        view_context: LookerViewContext,
        looker_view_id_cache: LookerViewIdCache,
        config: LookMLSourceConfig,
        ctx: PipelineContext,
        reporter: LookMLSourceReport,
    ) -> AbstractViewUpstream:
        return create_view_upstream(
            view_context=view_context,
            looker_view_id_cache=looker_view_id_cache,
            config=config,
            ctx=ctx,
            reporter=reporter,
        )
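The hunk above strips quote characters from `sql_table_name`. The removed `else` branch earlier in this thread falls back to the view name when `sql_table_name` is unset, citing Looker's `sql_table_name` docs. The two rules combine into one small function. This is a sketch: `resolve_table_name` is a hypothetical helper, not part of the PR.

```python
from typing import Optional


def resolve_table_name(view_name: str, sql_table_name: Optional[str]) -> str:
    """Hypothetical helper combining the two rules from this thread."""
    if sql_table_name is None:
        # No sql_table_name: Looker treats the view name as the table name.
        return view_name
    # Drop double quotes and backticks, e.g. optimizely."group" -> optimizely.group
    return sql_table_name.replace('"', "").replace("`", "")
```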
raise ValueError(
"Failed to retrieve connections from looker client. Please check to ensure that you have manage_models permission enabled on this API key."
"Failed to retrieve connections from looker client. Please check to ensure that you have "
"manage_models permission enabled on this API key."
)
Improve exception handling by chaining exceptions.
Use `raise ... from err` to distinguish the exception from errors in exception handling.
- raise ValueError(
- "Failed to retrieve connections from looker client. Please check to ensure that you have "
- "manage_models permission enabled on this API key."
- )
+ raise ValueError(
+ "Failed to retrieve connections from looker client. Please check to ensure that you have "
+ "manage_models permission enabled on this API key."
+ ) from err
Tools
Ruff
319-322: Within an `except` clause, raise exceptions with `raise ... from err` or `raise ... from None` to distinguish them from errors in exception handling (B904)
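The following sketch shows what `from err` changes, using a hypothetical wrapper. `PermissionError` stands in for whatever the Looker client actually raises, which is not shown in this PR. Chaining stores the original failure on `__cause__`, so the traceback reads "The above exception was the direct cause of..." rather than "During handling of the above exception, another exception occurred".

```python
from typing import Callable, List


def load_connections(fetch: Callable[[], List[str]]) -> List[str]:
    """Hypothetical wrapper around a Looker API call that may fail."""
    try:
        return fetch()
    except PermissionError as err:  # stand-in for the real client exception
        # `from err` records the original failure as __cause__ (Ruff B904).
        raise ValueError(
            "Failed to retrieve connections from looker client. Please check to ensure that you have "
            "manage_models permission enabled on this API key."
        ) from err
```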
Actionable comments posted: 5
Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Files selected for processing (1)
- metadata-ingestion/src/datahub/ingestion/source/looker/lookml_source.py (19 hunks)
Files not summarized due to errors (1)
- metadata-ingestion/src/datahub/ingestion/source/looker/lookml_source.py: Error: Message exceeds token limit
Additional context used
Ruff
metadata-ingestion/src/datahub/ingestion/source/looker/lookml_source.py
502-505: Within an `except` clause, raise exceptions with `raise ... from err` or `raise ... from None` to distinguish them from errors in exception handling (B904)
Additional comments not posted (4)
metadata-ingestion/src/datahub/ingestion/source/looker/lookml_source.py (4)
4-4: New imports added. The newly added imports are necessary for the new functionality introduced in this file. Ensure that these imports are used appropriately in the code.
Also applies to: 42-42, 45-47, 49-53, 54-57, 58-66, 68-70, 98-98
109-109: New field `upstream_dataset_urns` added. The field `upstream_dataset_urns` has been added to store the URNs of upstream datasets.
307-307: New field `ctx` added. The field `ctx` has been added to store the pipeline context.
318-322: Improve exception handling by chaining exceptions. Use `raise ... from err` to distinguish the exception from errors in exception handling.
- raise ValueError(
- "Failed to retrieve connections from looker client. Please check to ensure that you have "
- "manage_models permission enabled on this API key."
- )
+ raise ValueError(
+ "Failed to retrieve connections from looker client. Please check to ensure that you have "
+ "manage_models permission enabled on this API key."
+ ) from err
Likely invalid or redundant comment.
view_context: LookerViewContext,
looker_view_id_cache: LookerViewIdCache,
reporter: LookMLSourceReport,
max_file_snippet_length: int,
parse_table_names_from_sql: bool = False,
sql_parser_path: str = "datahub.utilities.sql_parser.DefaultSQLParser",
config: LookMLSourceConfig,
ctx: PipelineContext,
extract_col_level_lineage: bool = False,
populate_sql_logic_in_descriptions: bool = False,
process_isolation_for_sql_parsing: bool = False,
) -> Optional["LookerView"]:
view_name = looker_view["name"]

view_name = view_context.name()

logger.debug(f"Handling view {view_name} in model {model_name}")
# The sql_table_name might be defined in another view and this view is extending that view,
# so we resolve this field while taking that into account.
sql_table_name: Optional[str] = LookerView.get_including_extends(

looker_view_id: LookerViewId = LookerViewId(
project_name=project_name,
model_name=model_name,
view_name=view_name,
looker_view=looker_view,
connection=connection,
looker_viewfile=looker_viewfile,
looker_viewfile_loader=looker_viewfile_loader,
looker_refinement_resolver=looker_refinement_resolver,
field="sql_table_name",
reporter=reporter,
file_path=view_context.view_file_name(),
)

# Some sql_table_name fields contain quotes like: optimizely."group", just remove the quotes
sql_table_name = (
sql_table_name.replace('"', "").replace("`", "")
if sql_table_name is not None
else None
)
derived_table = LookerView.get_including_extends(
view_name=view_name,
looker_view=looker_view,
connection=connection,
looker_viewfile=looker_viewfile,
looker_viewfile_loader=looker_viewfile_loader,
looker_refinement_resolver=looker_refinement_resolver,
field="derived_table",
view_upstream: AbstractViewUpstream = create_view_upstream(
view_context=view_context,
looker_view_id_cache=looker_view_id_cache,
config=config,
ctx=ctx,
field_type_vs_raw_fields = OrderedDict(
{
ViewFieldType.DIMENSION: view_context.dimensions(),
ViewFieldType.DIMENSION_GROUP: view_context.dimension_groups(),
ViewFieldType.MEASURE: view_context.measures(),
}
) # in order to maintain order in golden file

fields = deduplicate_fields(fields)
view_fields: List[ViewField] = []

# Prep "default" values for the view, which will be overridden by the logic below.
view_logic = looker_viewfile.raw_file_content[:max_file_snippet_length]
sql_table_names: List[str] = []
upstream_explores: List[str] = []

if derived_table is not None:
# Derived tables can either be a SQL query or a LookML explore.
# See https://cloud.google.com/looker/docs/derived-tables.

if "sql" in derived_table:
view_logic = derived_table["sql"]
view_lang = VIEW_LANGUAGE_SQL

# Parse SQL to extract dependencies.
if parse_table_names_from_sql:
(
fields,
sql_table_names,
) = cls._extract_metadata_from_derived_table_sql(
reporter,
sql_parser_path,
view_name,
sql_table_name,
view_logic,
fields,
use_external_process=process_isolation_for_sql_parsing,
for field_type, fields in field_type_vs_raw_fields.items():
for field in fields:
upstream_column_ref: List[ColumnRef] = []
if extract_col_level_lineage:
upstream_column_ref = view_upstream.get_upstream_column_ref(
field_context=LookerFieldContext(raw_field=field)
)

elif "explore_source" in derived_table:
# This is called a "native derived table".
# See https://cloud.google.com/looker/docs/creating-ndts.
explore_source = derived_table["explore_source"]

# We want this to render the full lkml block
# e.g. explore_source: source_name { ... }
# As such, we use the full derived_table instead of the explore_source.
view_logic = str(lkml.dump(derived_table))[:max_file_snippet_length]
view_lang = VIEW_LANGUAGE_LOOKML

(
fields,
upstream_explores,
) = cls._extract_metadata_from_derived_table_explore(
reporter, view_name, explore_source, fields
view_fields.append(
ViewField.view_fields_from_dict(
field_dict=field,
upstream_column_ref=upstream_column_ref,
type_cls=field_type,
populate_sql_logic_in_descriptions=populate_sql_logic_in_descriptions,
)
Refactor suggestion: Extract field extraction logic to a separate method.
This part can be moved to a separate method to improve readability.
    @classmethod
    def _extract_view_fields(
        cls,
        view_context: LookerViewContext,
        view_upstream: AbstractViewUpstream,
        extract_col_level_lineage: bool,
        populate_sql_logic_in_descriptions: bool,
    ) -> List[ViewField]:
        field_type_vs_raw_fields = OrderedDict(
            {
                ViewFieldType.DIMENSION: view_context.dimensions(),
                ViewFieldType.DIMENSION_GROUP: view_context.dimension_groups(),
                ViewFieldType.MEASURE: view_context.measures(),
            }
        )
        view_fields: List[ViewField] = []
        for field_type, fields in field_type_vs_raw_fields.items():
            for field in fields:
                upstream_column_ref = (
                    view_upstream.get_upstream_column_ref(
                        field_context=LookerFieldContext(raw_field=field)
                    )
                    if extract_col_level_lineage
                    else []
                )
                view_fields.append(
                    ViewField.view_fields_from_dict(
                        field_dict=field,
                        upstream_column_ref=upstream_column_ref,
                        type_cls=field_type,
                        populate_sql_logic_in_descriptions=populate_sql_logic_in_descriptions,
                    )
                )
        if (
            not view_fields
            and view_context.is_sql_based_derived_view_without_fields_case()
        ):
            view_fields = view_upstream.create_fields()
        return view_fields
if view_context.is_sql_based_derived_case():
view_logic = view_context.sql(transformed=False)
# Parse SQL to extract dependencies.
view_details = ViewProperties(
materialized=False,
viewLogic=view_logic,
viewLanguage=VIEW_LANGUAGE_SQL,
)
elif view_context.is_native_derived_case():
# We want this to render the full lkml block
# e.g. explore_source: source_name { ... }
# As such, we use the full derived_table instead of the explore_source.
view_logic = str(lkml.dump(view_context.derived_table()))[
:max_file_snippet_length
]
view_lang = VIEW_LANGUAGE_LOOKML

materialized = view_context.is_materialized_derived_view()
Refactor suggestion: Extract view details determination logic to a separate method.
This part can be moved to a separate method to improve readability.
    @classmethod
    def _determine_view_details(
        cls, view_context: LookerViewContext, max_file_snippet_length: int
    ) -> ViewProperties:
        view_logic = view_context.view_file.raw_file_content[:max_file_snippet_length]
        if view_context.is_sql_based_derived_case():
            view_logic = view_context.sql(transformed=False)
            return ViewProperties(
                materialized=False, viewLogic=view_logic, viewLanguage=VIEW_LANGUAGE_SQL
            )
        elif view_context.is_native_derived_case():
            view_logic = str(lkml.dump(view_context.derived_table()))[
                :max_file_snippet_length
            ]
            return ViewProperties(
                materialized=view_context.is_materialized_derived_view(),
                viewLogic=view_logic,
                viewLanguage=VIEW_LANGUAGE_LOOKML,
            )
        return ViewProperties(
            materialized=False, viewLogic=view_logic, viewLanguage=VIEW_LANGUAGE_LOOKML
        )
id=looker_view_id,
absolute_file_path=view_context.view_file.absolute_file_path,
connection=view_context.view_connection,
upstream_dataset_urns=view_upstream.get_upstream_dataset_urn(),
fields=view_fields,
raw_file_content=view_context.view_file.raw_file_content,
Refactor suggestion: Extract LookerView construction to a separate method.
This part can be moved to a separate method to improve readability.
    @classmethod
    def _build_looker_view(
        cls,
        looker_view_id: LookerViewId,
        view_context: LookerViewContext,
        view_upstream: AbstractViewUpstream,
        view_fields: List[ViewField],
        view_details: ViewProperties,
    ) -> "LookerView":
        return LookerView(
            id=looker_view_id,
            absolute_file_path=view_context.view_file.absolute_file_path,
            connection=view_context.view_connection,
            upstream_dataset_urns=view_upstream.get_upstream_dataset_urn(),
            fields=view_fields,
            raw_file_content=view_context.view_file.raw_file_content,
            view_details=view_details,
        )
raise ValueError(
f"Could not locate a project name for model {model_name}. Consider configuring a static project name in your config file"
f"Could not locate a project name for model {model_name}. Consider configuring a static project name "
f"in your config file"
)
Improve exception handling by chaining exceptions.
Use `raise ... from err` to distinguish the exception from errors in exception handling.
- raise ValueError(
- f"Could not locate a project name for model {model_name}. Consider configuring a static project name "
- f"in your config file"
- )
+ raise ValueError(
+ f"Could not locate a project name for model {model_name}. Consider configuring a static project name "
+ f"in your config file"
+ ) from err
Tools
Ruff
502-505: Within an `except` clause, raise exceptions with `raise ... from err` or `raise ... from None` to distinguish them from errors in exception handling (B904)
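Ruff's B904 also accepts `raise ... from None`. That form suits lookups like this one, where the original `KeyError` adds nothing for the user. A minimal sketch, assuming a plain dict lookup: `project_for_model` and `model_to_project` are hypothetical names, since the real lookup is not shown. With `from None`, `__cause__` stays `None` and `__suppress_context__` is set, so the traceback shows only the `ValueError`.

```python
from typing import Dict


def project_for_model(model_name: str, model_to_project: Dict[str, str]) -> str:
    """Hypothetical lookup; the real project-name resolution is not shown here."""
    try:
        return model_to_project[model_name]
    except KeyError:
        # `from None` hides the KeyError from the traceback instead of chaining it.
        raise ValueError(
            f"Could not locate a project name for model {model_name}. "
            "Consider configuring a static project name in your config file"
        ) from None
```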
Actionable comments posted: 0
Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Files selected for processing (1)
- metadata-ingestion/setup.py (2 hunks)
Files skipped from review as they are similar to previous changes (1)
- metadata-ingestion/setup.py
* feat(forms) Handle deleting forms references when hard deleting forms (datahub-project#10820)
* refactor(ui): Misc improvements to the setup ingestion flow (ingest uplift 1/2) (datahub-project#10764)
  Co-authored-by: John Joyce <john@Johns-MBP.lan>
  Co-authored-by: John Joyce <john@ip-192-168-1-200.us-west-2.compute.internal>
* fix(ingestion/airflow-plugin): pipeline tasks discoverable in search (datahub-project#10819)
* feat(ingest/transformer): tags to terms transformer (datahub-project#10758)
  Co-authored-by: Aseem Bansal <asmbansal2@gmail.com>
* fix(ingestion/unity-catalog): fixed issue with profiling with GE turned on (datahub-project#10752)
  Co-authored-by: Aseem Bansal <asmbansal2@gmail.com>
* feat(forms) Add java SDK for form entity PATCH + CRUD examples (datahub-project#10822)
* feat(SDK) Add java SDK for structuredProperty entity PATCH + CRUD examples (datahub-project#10823)
* feat(SDK) Add StructuredPropertyPatchBuilder in python sdk and provide sample CRUD files (datahub-project#10824)
* feat(forms) Add CRUD endpoints to GraphQL for Form entities (datahub-project#10825)
* add flag for includeSoftDeleted in scroll entities API (datahub-project#10831)
* feat(deprecation) Return actor entity with deprecation aspect (datahub-project#10832)
* feat(structuredProperties) Add CRUD graphql APIs for structured property entities (datahub-project#10826)
* add scroll parameters to openapi v3 spec (datahub-project#10833)
* fix(ingest): correct profile_day_of_week implementation (datahub-project#10818)
* feat(ingest/glue): allow ingestion of empty databases from Glue (datahub-project#10666)
  Co-authored-by: Harshal Sheth <hsheth2@gmail.com>
* feat(cli): add more details to get cli (datahub-project#10815)
* fix(ingestion/glue): ensure date formatting works on all platforms for aws glue (datahub-project#10836)
* fix(ingestion): fix datajob patcher (datahub-project#10827)
* fix(smoke-test): add suffix in temp file creation (datahub-project#10841)
* feat(ingest/glue): add helper method to permit user or group ownership (datahub-project#10784)
* feat(): Show data platform instances in policy modal if they are set on the policy (datahub-project#10645)
  Co-authored-by: Hendrik Richert <hendrik.richert@swisscom.com>
* docs(patch): add patch documentation for how implementation works (datahub-project#10010)
  Co-authored-by: John Joyce <john@acryl.io>
* fix(jar): add missing custom-plugin-jar task (datahub-project#10847)
* fix(): also check exceptions/stack trace when filtering log messages (datahub-project#10391)
  Co-authored-by: John Joyce <john@acryl.io>
* docs(): Update posts.md (datahub-project#9893)
  Co-authored-by: Hyejin Yoon <0327jane@gmail.com>
  Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
* chore(ingest): update acryl-datahub-classify version (datahub-project#10844)
* refactor(ingest): Refactor structured logging to support infos, warnings, and failures structured reporting to UI (datahub-project#10828)
  Co-authored-by: John Joyce <john@Johns-MBP.lan>
  Co-authored-by: Harshal Sheth <hsheth2@gmail.com>
* fix(restli): log aspect-not-found as a warning rather than as an error (datahub-project#10834)
* fix(ingest/nifi): remove duplicate upstream jobs (datahub-project#10849)
* fix(smoke-test): test access to create/revoke personal access tokens (datahub-project#10848)
* fix(smoke-test): missing test for move domain (datahub-project#10837)
* ci: update usernames to not considered for community (datahub-project#10851)
* env: change defaults for data contract visibility (datahub-project#10854)
* fix(ingest/tableau): quote special characters in external URL (datahub-project#10842)
* fix(smoke-test): fix flakiness of auto complete test
* ci(ingest): pin dask dependency for feast (datahub-project#10865)
* fix(ingestion/lookml): liquid template resolution and view-to-view cll (datahub-project#10542)
* feat(ingest/audit): add client id and version in system metadata props (datahub-project#10829)
* chore(ingest): Mypy 1.10.1 pin (datahub-project#10867)
* docs: use acryl-datahub-actions as expected python package to install (datahub-project#10852)
* docs: add new js snippet (datahub-project#10846)
* refactor(ingestion): remove company domain for security reason (datahub-project#10839)
* fix(ingestion/spark): Platform instance and column level lineage fix (datahub-project#10843)
  Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
* feat(ingestion/tableau): optionally ingest multiple sites and create site containers (datahub-project#10498)
  Co-authored-by: Yanik Häni <Yanik.Haeni1@swisscom.com>
* fix(ingestion/looker): Add sqlglot dependency and remove unused sqlparser (datahub-project#10874)
* fix(manage-tokens): fix manage access token policy (datahub-project#10853)
* Batch get entity endpoints (datahub-project#10880)
* feat(system): support conditional write semantics (datahub-project#10868)
* fix(build): upgrade vercel builds to Node 20.x (datahub-project#10890)
* feat(ingest/lookml): shallow clone repos (datahub-project#10888)
* fix(ingest/looker): add missing dependency (datahub-project#10876)
* fix(ingest): only populate audit stamps where accurate (datahub-project#10604)
* fix(ingest/dbt): always encode tag urns (datahub-project#10799)
* fix(ingest/redshift): handle multiline alter table commands (datahub-project#10727)
* fix(ingestion/looker): column name missing in explore (datahub-project#10892)
* fix(lineage) Fix lineage source/dest filtering with explored per hop limit (datahub-project#10879)
* feat(conditional-writes): misc updates and fixes (datahub-project#10901)
* feat(ci): update outdated action (datahub-project#10899)
* feat(rest-emitter): adding async flag to rest emitter (datahub-project#10902)
  Co-authored-by: Gabe Lyons <gabe.lyons@acryl.io>
* feat(ingest): add snowflake-queries source (datahub-project#10835)
* fix(ingest): improve `auto_materialize_referenced_tags_terms` error handling (datahub-project#10906)
* docs: add new company to adoption list (datahub-project#10909)
* refactor(redshift): Improve redshift error handling with new structured reporting system (datahub-project#10870)
  Co-authored-by: John Joyce <john@Johns-MBP.lan>
  Co-authored-by: Harshal Sheth <hsheth2@gmail.com>
* feat(ui) Finalize support for all entity types on forms (datahub-project#10915)
* Index ExecutionRequestResults status field (datahub-project#10811)
* feat(ingest): grafana connector (datahub-project#10891)
  Co-authored-by: Shirshanka Das <shirshanka@apache.org>
  Co-authored-by: Harshal Sheth <hsheth2@gmail.com>
* fix(gms) Add Form entity type to EntityTypeMapper (datahub-project#10916)
* feat(dataset): add support for external url in Dataset (datahub-project#10877)
* docs(saas-overview) added missing features to observe section (datahub-project#10913)
  Co-authored-by: John Joyce <john@acryl.io>
* fix(ingest/spark): Fixing Micrometer warning (datahub-project#10882)
* fix(structured properties): allow application of structured properties without schema file (datahub-project#10918)
* fix(data-contracts-web) handle other schedule types (datahub-project#10919)
* fix(ingestion/tableau): human-readable message for PERMISSIONS_MODE_SWITCHED error (datahub-project#10866)
  Co-authored-by: Harshal Sheth <hsheth2@gmail.com>
* Add feature flag for view defintions (datahub-project#10914)
  Co-authored-by: Ethan Cartwright <ethan.cartwright@acryl.io>
* feat(ingest/BigQuery): refactor+parallelize dataset metadata extraction (datahub-project#10884)
* fix(airflow): add error handling around render_template() (datahub-project#10907)
* feat(ingestion/sqlglot): add optional `default_dialect` parameter to sqlglot lineage (datahub-project#10830)
* feat(mcp-mutator): new mcp mutator plugin (datahub-project#10904)
* fix(ingest/bigquery): changes helper function to decode unicode scape sequences (datahub-project#10845)
* feat(ingest/postgres): fetch table sizes for profile (datahub-project#10864)
* feat(ingest/abs): Adding azure blob storage ingestion source (datahub-project#10813)
* fix(ingest/redshift): reduce severity of SQL parsing issues (datahub-project#10924)
* fix(build): fix lint fix web react (datahub-project#10896)
* fix(ingest/bigquery): handle quota exceeded for project.list requests (datahub-project#10912)
* feat(ingest): report extractor failures more loudly (datahub-project#10908)
* feat(ingest/snowflake): integrate snowflake-queries into main source (datahub-project#10905)
* fix(ingest): fix docs build (datahub-project#10926)
* fix(ingest/snowflake): fix test connection (datahub-project#10927)
* fix(ingest/lookml): add view load failures to cache (datahub-project#10923)
* docs(slack) overhauled setup instructions and screenshots (datahub-project#10922)
  Co-authored-by: John Joyce <john@acryl.io>
* fix(airflow): Add comma parsing of owners to DataJobs (datahub-project#10903)
* fix(entityservice): fix merging sideeffects (datahub-project#10937)
* feat(ingest): Support System Ingestion Sources, Show and hide system ingestion sources with Command-S (datahub-project#10938)
  Co-authored-by: John Joyce <john@Johns-MBP.lan>
* chore() Set a default lineage filtering end time on backend when a start time is present (datahub-project#10925)
  Co-authored-by: John Joyce <john@ip-192-168-1-200.us-west-2.compute.internal>
  Co-authored-by: John Joyce <john@Johns-MBP.lan>
* Added relationships APIs to V3. Added these generic APIs to V3 swagger doc. (datahub-project#10939)
* docs: add learning center to docs (datahub-project#10921)
* doc: Update hubspot form id (datahub-project#10943)
* chore(airflow): add python 3.11 w/ Airflow 2.9 to CI (datahub-project#10941)
* fix(ingest/Glue): column upstream lineage between S3 and Glue (datahub-project#10895)
* fix(ingest/abs): split abs utils into multiple files (datahub-project#10945)
* doc(ingest/looker): fix doc for sql parsing documentation (datahub-project#10883)
  Co-authored-by: Harshal Sheth <hsheth2@gmail.com>
* fix(ingest/bigquery): Adding missing BigQuery types (datahub-project#10950)
* fix(ingest/setup): feast and abs source setup (datahub-project#10951)
* fix(connections) Harden adding /gms to connections in backend (datahub-project#10942)
* feat(siblings) Add flag to prevent combining siblings in the UI (datahub-project#10952)
* fix(docs): make graphql doc gen more automated (datahub-project#10953)
* feat(ingest/athena): Add option for Athena partitioned profiling (datahub-project#10723)
* fix(spark-lineage): default timeout for future responses (datahub-project#10947)
* feat(datajob/flow): add environment filter using info aspects (datahub-project#10814)
* fix(ui/ingest): correct privilege used to show tab (datahub-project#10483)
  Co-authored-by: Kunal-kankriya <127090035+Kunal-kankriya@users.noreply.github.com>
* feat(ingest/looker): include dashboard urns in browse v2 (datahub-project#10955)
* add a structured type to batchGet in OpenAPI V3 spec (datahub-project#10956)
* fix(ui): scroll on the domain sidebar to show all domains (datahub-project#10966)
* fix(ingest/sagemaker): resolve incorrect variable assignment for SageMaker API call (datahub-project#10965)
* fix(airflow/build): Pinning mypy (datahub-project#10972)
* Fixed a bug where the OpenAPI V3 spec was incorrect. The bug was introduced in datahub-project#10939. (datahub-project#10974)
* fix(ingest/test): Fix for mssql integration tests (datahub-project#10978)
* fix(entity-service) exist check correctly extracts status (datahub-project#10973)
* fix(structuredProps) casing bug in StructuredPropertiesValidator (datahub-project#10982)
* bugfix: use anyOf instead of allOf when creating references in openapi v3 spec (datahub-project#10986)
* fix(ui): Remove ant less imports (datahub-project#10988)
* feat(ingest/graph): Add get_results_by_filter to DataHubGraph (datahub-project#10987)
* feat(ingest/cli): init does not actually support environment variables (datahub-project#10989)
* fix(ingest/graph): Update get_results_by_filter graphql query (datahub-project#10991)
* feat(ingest/spark): Promote beta plugin (datahub-project#10881)
  Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
* feat(ingest): support domains in meta -> "datahub" section (datahub-project#10967)
* feat(ingest): add `check server-config` command (datahub-project#10990)
* feat(cli): Make consistent use of DataHubGraphClientConfig (datahub-project#10466)
  Deprecates get_url_and_token() in favor of a more complete option: load_graph_config() that returns a full DatahubClientConfig. This change was then propagated across previous usages of get_url_and_token so that connections to DataHub server from the client respect the full breadth of configuration specified by DatahubClientConfig. I.e: You can now specify disable_ssl_verification: true in your ~/.datahubenv file so that all cli functions to the server work when ssl certification is disabled.
  Fixes datahub-project#9705
* fix(ingest/s3): Fixing container creation when there is no folder in path (datahub-project#10993)
* fix(ingest/looker): support platform instance for dashboards & charts (datahub-project#10771)
* feat(ingest/bigquery): improve handling of information schema in sql parser (datahub-project#10985)
* feat(ingest): improve `ingest deploy` command (datahub-project#10944)
* fix(backend): allow excluding soft-deleted entities in relationship-queries; exclude soft-deleted members of groups (datahub-project#10920)
  - allow excluding soft-deleted entities in relationship-queries
  - exclude soft-deleted members of groups
* fix(ingest/looker): downgrade missing chart type log level (datahub-project#10996)
* doc(acryl-cloud): release docs for 0.3.4.x (datahub-project#10984)
  Co-authored-by: John Joyce <john@acryl.io>
  Co-authored-by: RyanHolstien <RyanHolstien@users.noreply.github.com>
  Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
  Co-authored-by: Pedro Silva <pedro@acryl.io>
* fix(protobuf/build): Fix protobuf check jar script (datahub-project#11006)
* fix(ui/ingest): Support invalid cron jobs (datahub-project#10998)
* fix(ingest): fix graph config loading (datahub-project#11002)
  Co-authored-by: Pedro Silva <pedro@acryl.io>
* feat(docs): Document __DATAHUB_TO_FILE_ directive (datahub-project#10968)
  Co-authored-by: Harshal Sheth <hsheth2@gmail.com>
* fix(graphql/upsertIngestionSource): Validate cron schedule; parse error in CLI (datahub-project#11011)
* feat(ece): support custom ownership type urns in ECE generation (datahub-project#10999)
* feat(assertion-v2): changed Validation tab to Quality and created new Governance tab (datahub-project#10935)
* fix(ingestion/glue): Add support for missing config options for profiling in Glue (datahub-project#10858)
* feat(propagation): Add models for schema field docs, tags, terms (datahub-project#2959) (datahub-project#11016)
  Co-authored-by: Chris Collins <chriscollins3456@gmail.com>
* docs: standardize terminology to DataHub Cloud (datahub-project#11003)
* fix(ingestion/transformer): replace the externalUrl container (datahub-project#11013)
* docs(slack) troubleshoot docs (datahub-project#11014)
* feat(propagation): Add graphql API (datahub-project#11030)
  Co-authored-by: Chris Collins <chriscollins3456@gmail.com>
* feat(propagation): Add models for Action feature settings (datahub-project#11029)
* docs(custom properties): Remove duplicate from sidebar (datahub-project#11033)
* feat(models): Introducing Dataset Partitions Aspect (datahub-project#10997)
  Co-authored-by: John Joyce <john@Johns-MBP.lan>
  Co-authored-by: John Joyce <john@ip-192-168-1-200.us-west-2.compute.internal>
* feat(propagation): Add Documentation Propagation Settings (datahub-project#11038)
* fix(models): chart schema fields mapping, add dataHubAction entity, t… (datahub-project#11040)
* fix(ci): smoke test lint failures (datahub-project#11044)
* docs: fix learning center color scheme & typo (datahub-project#11043)
* feat: add cloud main page (datahub-project#11017)
  Co-authored-by: Jay <159848059+jayacryl@users.noreply.github.com>
* feat(restore-indices): add additional step to also clear system metadata service (datahub-project#10662)
  Co-authored-by: John Joyce <john@acryl.io>
* docs: fix typo (datahub-project#11046)
* fix(lint): apply spotless (datahub-project#11050)
* docs(airflow): example query to get datajobs for a dataflow (datahub-project#11034)
* feat(cli): Add run-id option to put sub-command (datahub-project#11023)
  Adds an option to assign run-id to a given put command execution. This is useful when transformers do not exist for a given ingestion payload, we can follow up with custom metadata and assign it to an ingestion pipeline.
* fix(ingest): improve sql error reporting calls (datahub-project#11025)
* fix(airflow): fix CI setup (datahub-project#11031)
* feat(ingest/dbt): add experimental `prefer_sql_parser_lineage` flag (datahub-project#11039)
* fix(ingestion/lookml): enable stack-trace in lookml logs (datahub-project#10971)
* (chore): Linting fix (datahub-project#11015)
* chore(ci): update deprecated github actions (datahub-project#10977)
* Fix ALB configuration example (datahub-project#10981)
* chore(ingestion-base): bump base image packages (datahub-project#11053)
* feat(cli): Trim report of dataHubExecutionRequestResult to max GMS size (datahub-project#11051)
* fix(ingestion/lookml): emit dummy sql condition for lookml custom condition tag (datahub-project#11008)
  Co-authored-by: Harshal Sheth <hsheth2@gmail.com>
* fix(ingestion/powerbi): fix issue with broken report lineage (datahub-project#10910)
* feat(ingest/tableau): add retry on timeout (datahub-project#10995)
* change generate kafka connect properties from env (datahub-project#10545)
  Co-authored-by: david-leifker <114954101+david-leifker@users.noreply.github.com>
* fix(ingest): fix oracle cronjob ingestion (datahub-project#11001)
  Co-authored-by: david-leifker <114954101+david-leifker@users.noreply.github.com>
* chore(ci): revert update deprecated github actions (datahub-project#10977) (datahub-project#11062)
* feat(ingest/dbt-cloud): update metadata_endpoint inference (datahub-project#11041)
* build: Reduce size of datahub-frontend-react image by 50-ish% (datahub-project#10878)
  Co-authored-by: david-leifker <114954101+david-leifker@users.noreply.github.com>
* fix(ci): Fix lint issue in datahub_ingestion_run_summary_provider.py (datahub-project#11063)
* docs(ingest): update developing-a-transformer.md (datahub-project#11019)
* feat(search-test): update search tests from datahub-project#10408 (datahub-project#11056)
* feat(cli): add aspects parameter to DataHubGraph.get_entity_semityped (datahub-project#11009)
  Co-authored-by: Harshal Sheth <hsheth2@gmail.com>
* docs(airflow): update min version for plugin v2 (datahub-project#11065)
* doc(ingestion/tableau): doc update for derived permission (datahub-project#11054)
  Co-authored-by: Pedro Silva <pedro.cls93@gmail.com>
  Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
  Co-authored-by: Harshal Sheth <hsheth2@gmail.com>
* fix(py): remove dep on types-pkg_resources (datahub-project#11076)
* feat(ingest/mode): add option to exclude restricted (datahub-project#11081)
* fix(ingest): set lastObserved in sdk when unset (datahub-project#11071)
* doc(ingest): Update capabilities (datahub-project#11072)
* chore(vulnerability): Log Injection (datahub-project#11090)
* chore(vulnerability): Information exposure through a stack trace (datahub-project#11091)
* chore(vulnerability): Comparison of narrow type with wide type in loop condition (datahub-project#11089)
* chore(vulnerability): Insertion of sensitive information into log files (datahub-project#11088)
* chore(vulnerability): Risky Cryptographic Algorithm (datahub-project#11059)
* chore(vulnerability): Overly permissive regex range (datahub-project#11061)
  Co-authored-by: Harshal Sheth <hsheth2@gmail.com>
* fix: update customer data (datahub-project#11075)
* fix(models): fixing the datasetPartition models (datahub-project#11085)
  Co-authored-by: John Joyce <john@ip-192-168-1-200.us-west-2.compute.internal>
* fix(ui): Adding view, forms GraphQL query, remove showing a fallback error message on unhandled GraphQL error (datahub-project#11084)
  Co-authored-by: John Joyce <john@ip-192-168-1-200.us-west-2.compute.internal>
* feat(docs-site): hiding learn more from cloud page (datahub-project#11097)
* fix(docs): Add correct usage of orFilters in search API docs (datahub-project#11082)
  Co-authored-by: Jay <159848059+jayacryl@users.noreply.github.com>
* fix(ingest/mode): Regexp in mode name matcher didn't allow underscore (datahub-project#11098)
* docs: Refactor customer stories section (datahub-project#10869)
  Co-authored-by: Jeff Merrick <jeff@wireform.io>
* fix(release): fix full/slim suffix on tag (datahub-project#11087)
* feat(config): support alternate hashing algorithm for doc id (datahub-project#10423)
  Co-authored-by: david-leifker <114954101+david-leifker@users.noreply.github.com>
  Co-authored-by: John Joyce <john@acryl.io>
* fix(emitter): fix typo in get method of java kafka emitter (datahub-project#11007)
* fix(ingest): use correct native data type in all SQLAlchemy sources by compiling data type using dialect (datahub-project#10898)
  Co-authored-by: Harshal Sheth <hsheth2@gmail.com>
* chore: Update contributors list in PR labeler (datahub-project#11105)
* feat(ingest): tweak stale entity removal messaging (datahub-project#11064)
* fix(ingestion): enforce lastObserved timestamps in SystemMetadata (datahub-project#11104)
* fix(ingest/powerbi): fix broken lineage between chart and dataset (datahub-project#11080)
* feat(ingest/lookml): CLL support for sql set in sql_table_name attribute of lookml view (datahub-project#11069)
* docs: update graphql docs on forms & structured properties (datahub-project#11100)
* test(search): search openAPI v3 test (datahub-project#11049)
* fix(ingest/tableau): prevent empty site content urls (datahub-project#11057)
  Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
* feat(entity-client): implement client batch interface (datahub-project#11106)
* fix(snowflake): avoid reporting warnings/info for sys tables (datahub-project#11114)
* fix(ingest): downgrade column type mapping warning to info (datahub-project#11115)
* feat(api): add AuditStamp to the V3 API entity/aspect response (datahub-project#11118)
* fix(ingest/redshift): replace r'\n' with '\n' to avoid token error redshift serverless… (datahub-project#11111)
* fix(entiy-client): handle null entityUrn case for restli (datahub-project#11122)
* fix(sql-parser): prevent bad urns from alter table lineage (datahub-project#11092)
* fix(ingest/bigquery): use small batch size if use_tables_list_query_v2 is set (datahub-project#11121)
* fix(graphql): add missing entities to EntityTypeMapper and EntityTypeUrnMapper (datahub-project#10366)
* feat(ui): Changes to allow editable dataset name (datahub-project#10608)
  Co-authored-by: Jay Kadambi <jayasimhan_venkatadri@optum.com>
* fix: remove saxo (datahub-project#11127)
* feat(mcl-processor): Update mcl processor hooks (datahub-project#11134)
* fix(openapi): fix openapi v2 endpoints & v3 documentation update
* Revert "fix(openapi): fix openapi v2 endpoints & v3 documentation update"
  This reverts commit 573c1cb.
* docs(policies): updates to policies documentation (datahub-project#11073)
* fix(openapi): fix openapi v2 and v3 docs update (datahub-project#11139)
* feat(auth): grant type and acr values custom oidc parameters support (datahub-project#11116)
* fix(mutator): mutator hook fixes (datahub-project#11140)
* feat(search): support sorting on multiple fields (datahub-project#10775)
* feat(ingest): various logging improvements (datahub-project#11126)
* fix(ingestion/lookml): fix for sql parsing error (datahub-project#11079)
  Co-authored-by: Harshal Sheth <hsheth2@gmail.com>
* feat(docs-site) cloud page spacing and content polishes (datahub-project#11141)
* feat(ui) Enable editing structured props on fields (datahub-project#11042)
* feat(tests): add md5 and last computed to testResult model (datahub-project#11117)
* test(openapi): openapi regression smoke tests (datahub-project#11143)
* fix(airflow): fix tox tests + update docs (datahub-project#11125)
* docs: add chime to adoption stories (datahub-project#11142)
* fix(ingest/databricks): Updating code to work with Databricks sdk 0.30 (datahub-project#11158)
* fix(kafka-setup): add missing script to image (datahub-project#11190)
* fix(config): fix hash algo config (datahub-project#11191)
* test(smoke-test): updates to smoke-tests (datahub-project#11152)
* fix(elasticsearch): refactor idHashAlgo setting (datahub-project#11193)
* chore(kafka): kafka version bump (datahub-project#11211)
* readd UsageStatsWorkUnit
* fix merge problems
* change logo

---------

Co-authored-by: Chris Collins <chriscollins3456@gmail.com>
Co-authored-by: John Joyce <john@acryl.io>
Co-authored-by: John Joyce <john@Johns-MBP.lan>
Co-authored-by: John Joyce <john@ip-192-168-1-200.us-west-2.compute.internal>
Co-authored-by: dushayntAW <158567391+dushayntAW@users.noreply.github.com>
Co-authored-by: sagar-salvi-apptware <159135491+sagar-salvi-apptware@users.noreply.github.com>
Co-authored-by: Aseem Bansal <asmbansal2@gmail.com>
Co-authored-by: Kevin Chun <kevin1chun@gmail.com>
Co-authored-by: jordanjeremy <72943478+jordanjeremy@users.noreply.github.com>
Co-authored-by: skrydal <piotr.skrydalewicz@gmail.com>
Co-authored-by: Harshal Sheth <hsheth2@gmail.com>
Co-authored-by: david-leifker <114954101+david-leifker@users.noreply.github.com>
Co-authored-by: sid-acryl <155424659+sid-acryl@users.noreply.github.com>
Co-authored-by: Julien Jehannet <80408664+aviv-julienjehannet@users.noreply.github.com>
Co-authored-by: Hendrik Richert <github@richert.li>
Co-authored-by: Hendrik Richert <hendrik.richert@swisscom.com>
Co-authored-by: RyanHolstien <RyanHolstien@users.noreply.github.com>
Co-authored-by: Felix Lüdin <13187726+Masterchen09@users.noreply.github.com>
Co-authored-by: Pirry <158024088+chardaway@users.noreply.github.com>
Co-authored-by: Hyejin Yoon <0327jane@gmail.com>
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
Co-authored-by: cburroughs <chris.burroughs@gmail.com>
Co-authored-by: ksrinath <ksrinath@users.noreply.github.com>
Co-authored-by: Mayuri Nehate <33225191+mayurinehate@users.noreply.github.com>
Co-authored-by: Kunal-kankriya <127090035+Kunal-kankriya@users.noreply.github.com>
Co-authored-by: Shirshanka Das <shirshanka@apache.org>
Co-authored-by: ipolding-cais <155455744+ipolding-cais@users.noreply.github.com>
Co-authored-by: Tamas Nemeth <treff7es@gmail.com>
Co-authored-by: Shubham Jagtap <132359390+shubhamjagtap639@users.noreply.github.com>
Co-authored-by: haeniya <yanik.haeni@gmail.com>
Co-authored-by: Yanik Häni <Yanik.Haeni1@swisscom.com>
Co-authored-by: Gabe Lyons <itsgabelyons@gmail.com>
Co-authored-by: Gabe Lyons <gabe.lyons@acryl.io>
Co-authored-by: 808OVADOZE <52988741+shtephlee@users.noreply.github.com>
Co-authored-by: noggi <anton.kuraev@acryl.io>
Co-authored-by: Nicholas Pena <npena@foursquare.com>
Co-authored-by: Jay <159848059+jayacryl@users.noreply.github.com>
Co-authored-by: ethan-cartwright <ethan.cartwright.m@gmail.com>
Co-authored-by: Ethan Cartwright <ethan.cartwright@acryl.io>
Co-authored-by: Nadav Gross <33874964+nadavgross@users.noreply.github.com>
Co-authored-by: Patrick Franco Braz <patrickfbraz@poli.ufrj.br>
Co-authored-by: pie1nthesky <39328908+pie1nthesky@users.noreply.github.com>
Co-authored-by: Joel Pinto Mata (KPN-DSH-DEX team) <130968841+joelmataKPN@users.noreply.github.com>
Co-authored-by: Ellie O'Neil <110510035+eboneil@users.noreply.github.com>
Co-authored-by: Ajoy Majumdar <ajoymajumdar@hotmail.com>
Co-authored-by: deepgarg-visa <149145061+deepgarg-visa@users.noreply.github.com>
Co-authored-by: Tristan Heisler <tristankheisler@gmail.com>
Co-authored-by: Andrew Sikowitz <andrew.sikowitz@acryl.io>
Co-authored-by: Davi Arnaut <davi.arnaut@acryl.io>
Co-authored-by: Pedro Silva <pedro@acryl.io>
Co-authored-by: amit-apptware <132869468+amit-apptware@users.noreply.github.com>
Co-authored-by: Sam Black <sam.black@acryl.io>
Co-authored-by: Raj Tekal <varadaraj_tekal@optum.com>
Co-authored-by: Steffen Grohsschmiedt <gitbhub@steffeng.eu>
Co-authored-by: jaegwon.seo <162448493+wornjs@users.noreply.github.com>
Co-authored-by: Renan F. Lima <51028757+lima-renan@users.noreply.github.com>
Co-authored-by: Matt Exchange <xkollar@users.noreply.github.com>
Co-authored-by: Jonny Dixon <45681293+acrylJonny@users.noreply.github.com>
Co-authored-by: Pedro Silva <pedro.cls93@gmail.com>
Co-authored-by: Pinaki Bhattacharjee <pinakipb2@gmail.com>
Co-authored-by: Jeff Merrick <jeff@wireform.io>
Co-authored-by: skrydal <piotr.skrydalewicz@acryl.io>
Co-authored-by: AndreasHegerNuritas <163423418+AndreasHegerNuritas@users.noreply.github.com>
Co-authored-by: jayasimhankv <145704974+jayasimhankv@users.noreply.github.com>
Co-authored-by: Jay Kadambi <jayasimhan_venkatadri@optum.com>
Co-authored-by: David Leifker <david.leifker@acryl.io>
Summary by CodeRabbit
New Features
Chores
"python-liquid" and sqlglot_lib for LookML support.