Commit
Added Py4j implementation of tables crawler to retrieve a list of HMS tables in the assessment workflow (#2579)

## Changes

Tests, fixes and uses the new FasterTableScanCrawler in the Assessment Job. The feature flag determines whether the assessment uses the old Scala code or the new Python crawler (using py4j), which gives better logging during table scans.

### Linked issues

Fix #2190

### Functionality

- [x] modified existing workflow: `assessment.crawl_tables` now uses the new py4j crawler instead of the Scala one

### Tests

- [x] added integration tests

---------

Co-authored-by: Serge Smertin <259697+nfx@users.noreply.github.com>
1 parent 0b9c4d9 commit 143c637
Showing 10 changed files with 303 additions and 218 deletions.
src/databricks/labs/ucx/queries/assessment/main/01_4_table_crawl_failures.sql
59 changes: 59 additions & 0 deletions
@@ -0,0 +1,59 @@
/*
--title 'Table Crawl Failures'
--height 4
--width 4
*/
WITH latest_job_runs AS (
  SELECT
    timestamp,
    job_id,
    job_run_id
  FROM (
    SELECT
      CAST(timestamp AS TIMESTAMP) AS timestamp,
      job_id,
      job_run_id,
      ROW_NUMBER() OVER (PARTITION BY job_id ORDER BY CAST(timestamp AS TIMESTAMP) DESC) = 1 AS latest_run_of_job
    FROM inventory.logs
  )
  WHERE
    latest_run_of_job
), logs_latest_job_runs AS (
  SELECT
    CAST(logs.timestamp AS TIMESTAMP) AS timestamp,
    message,
    job_run_id,
    job_id,
    workflow_name,
    task_name
  FROM inventory.logs
  JOIN latest_job_runs
    USING (job_id, job_run_id)
  WHERE
    workflow_name IN ('assessment')
), table_crawl_failures AS (
  SELECT
    timestamp,
    REGEXP_EXTRACT(message, '^failed-table-crawl: (.+?) -> (.+?): (.+)$', 1) AS error_reason,
    REGEXP_EXTRACT(message, '^failed-table-crawl: (.+?) -> (.+?): (.+)$', 2) AS error_entity,
    REGEXP_EXTRACT(message, '^failed-table-crawl: (.+?) -> (.+?): (.+)$', 3) AS error_message,
    job_run_id,
    job_id,
    workflow_name,
    task_name
  FROM logs_latest_job_runs
  WHERE
    STARTSWITH(message, 'failed-table-crawl: ')
)
SELECT
  timestamp,
  error_reason,
  error_entity,
  error_message,
  job_run_id,
  job_id,
  workflow_name,
  task_name
FROM table_crawl_failures
ORDER BY
  1
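
The `latest_job_runs` CTE in the query above keeps, for each `job_id`, the run that owns the most recent log row. The following Python sketch mirrors that step on in-memory rows; it is only an illustration. The function name `latest_run_per_job` and the sample rows are hypothetical and not part of the commit.

```python
from datetime import datetime


def latest_run_per_job(rows):
    """Return {job_id: job_run_id} for the most recent log row of each job.

    Mirrors ROW_NUMBER() OVER (PARTITION BY job_id ORDER BY timestamp DESC) = 1.
    On timestamp ties the first row seen wins (SQL would pick arbitrarily).
    """
    latest = {}
    for row in rows:
        # The query casts the string column to TIMESTAMP before ordering.
        ts = datetime.fromisoformat(row["timestamp"])
        best = latest.get(row["job_id"])
        if best is None or ts > best[0]:
            latest[row["job_id"]] = (ts, row["job_run_id"])
    return {job_id: run_id for job_id, (_, run_id) in latest.items()}


# Hypothetical log rows, for illustration only.
rows = [
    {"timestamp": "2024-09-01 10:00:00", "job_id": 1, "job_run_id": 11},
    {"timestamp": "2024-09-02 10:00:00", "job_id": 1, "job_run_id": 12},
    {"timestamp": "2024-09-01 09:00:00", "job_id": 2, "job_run_id": 21},
]
latest_runs = latest_run_per_job(rows)
```

In the query, the result is then joined back to `inventory.logs` on `(job_id, job_run_id)`, so all log rows of the selected run are kept, not only the newest one.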
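
The `table_crawl_failures` CTE splits each log message with the pattern `^failed-table-crawl: (.+?) -> (.+?): (.+)$`. A minimal Python sketch of the same split is shown here, using the standard `re` module. The helper name `parse_failure` and the sample message are assumptions made for illustration; the actual messages emitted by the crawler may differ.

```python
import re

# Same pattern as in the SQL query: reason -> entity: message
FAILURE_PATTERN = re.compile(r"^failed-table-crawl: (.+?) -> (.+?): (.+)$")


def parse_failure(message):
    """Split a failure log line into its three parts, or return None if it does not match."""
    match = FAILURE_PATTERN.match(message)
    if match is None:
        return None
    reason, entity, error = match.groups()
    return {
        "error_reason": reason,   # capture group 1
        "error_entity": entity,   # capture group 2
        "error_message": error,   # capture group 3
    }


# Hypothetical log message, for illustration only.
sample = "failed-table-crawl: listing tables from database -> hive_metastore.sales: permission denied"
parsed = parse_failure(sample)
```

Because the first two groups are lazy (`.+?`), the reason ends at the first ` -> ` and the entity ends at the first `: ` that follows it; everything after that goes into the error message. The query filters with `STARTSWITH` before extracting, so non-matching lines never reach `REGEXP_EXTRACT`; the sketch returns `None` for them instead.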