Adds fixes for new target_cols ingest mechanism #512
Merged
This includes a number of fixes that were required as a result of the new "target_cols" concept tied to the "viz_db_ingest" lambda. It's somewhat annoying to manually create the ingest tables for new files, or, as in this case, for files we've been ingesting all along that now require additional columns (i.e. "target_cols"). We historically created these tables manually because it didn't make sense to create them inside the "viz_db_ingest" map items loop. However, I added an `iteration_index` argument to the map calls to that function, so the function can now check whether `iteration_index == 0` and, if so, set a `create_table` variable to `True`.

If either an `UndefinedTable` error (i.e. the table does not exist) or a `BadCopyFileFormat` error (i.e. a new column is now being ingested that wasn't before) is thrown and `create_table` is `True`, the table schema is recreated based exactly on the DataFrame being ingested, and the ingest is attempted again. If `iteration_index` is not 0 and one of those same errors is encountered, the error is caught at the Step Function level, and the state retries after waiting 5 seconds.

With this new ingest methodology, a few column names changed slightly, because the actual DataFrame column name is now used instead of whatever name was previously assigned manually. As a result, a few product SQL queries had to be updated.
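To make the create-or-retry flow described above more concrete, here is a minimal Python sketch. It is not the actual viz_db_ingest code: `UndefinedTable` and `BadCopyFileFormat` are local stand-ins for the psycopg2 error classes of the same names, while `PG_TYPES`, `create_table_sql`, `ingest`, `copy_rows`, and `execute` are hypothetical names, and the dtype-to-Postgres type mapping is an assumption. The DataFrame is represented only by a dict of column names to pandas dtype names so the sketch runs without pandas or a database.

```python
from typing import Callable, Mapping


# Stand-ins for psycopg2.errors.UndefinedTable / BadCopyFileFormat
# (assumption: the real lambda catches the psycopg2 versions).
class UndefinedTable(Exception):
    """The target ingest table does not exist."""


class BadCopyFileFormat(Exception):
    """The incoming data has a column the table does not have."""


# Hypothetical mapping from pandas dtype names to Postgres column types.
PG_TYPES = {
    "int64": "BIGINT",
    "float64": "DOUBLE PRECISION",
    "bool": "BOOLEAN",
    "datetime64[ns]": "TIMESTAMP",
    "object": "TEXT",
}


def create_table_sql(table: str, dtypes: Mapping[str, str]) -> str:
    """Build DDL whose columns mirror the DataFrame's columns exactly."""
    cols = ", ".join(
        f'"{name}" {PG_TYPES.get(dtype, "TEXT")}' for name, dtype in dtypes.items()
    )
    return f"DROP TABLE IF EXISTS {table}; CREATE TABLE {table} ({cols});"


def ingest(
    table: str,
    dtypes: Mapping[str, str],
    copy_rows: Callable[[], None],
    execute: Callable[[str], None],
    iteration_index: int,
) -> None:
    # Only the first map iteration is allowed to (re)create the schema.
    create_table = iteration_index == 0
    try:
        copy_rows()
    except (UndefinedTable, BadCopyFileFormat):
        if not create_table:
            # Let the Step Function's retry (5 s wait) handle it.
            raise
        execute(create_table_sql(table, dtypes))
        copy_rows()  # retry the ingest against the recreated table
```

With a real psycopg2 connection, a failed COPY leaves the transaction aborted, so the DDL step would need to run after a `conn.rollback()`. Re-raising on every iteration other than the first keeps schema changes in a single place and leaves the waiting to the Step Function.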
There may be a few remaining issues that I haven't caught yet, since less than an hour has passed since I fixed the most frequent offenders (e.g. lambda_rfc, sns_ana, and man_rnr). If it looks good in the morning, we should immediately deploy this to TI. Once we quickly re-confirm that everything is working, we should then deploy it to UAT.