jeremyprime changed the title from "Verify - Spark Connector captures rejected/exception rows while saving the dataframe to vertica." to "Verify the Spark Connector captures rejected/exception rows while saving the DataFrame to Vertica" on Jun 21, 2022
The following is a summary of how rejected rows are currently handled in the Spark Connector:
Each time an operation is run, its status is saved in the job status table (S2V_JOB_STATUS_USER_${USER}) if save_job_status_table=true (defaults to false). This job status table contains metadata about the operation, including whether the operation was successful and the percentage of failed rows. The user can also specify the error tolerance by setting failed_rows_percent_tolerance (defaults to 0.10, or 10%).
Currently the rejected rows themselves are not persisted to a table (see #293). However, a summary of the rejected rows is printed to the logs. Up to 10 of the most common errors are printed, each showing the number of rejected rows, an example, and the rejection reason. For example:
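To make the tolerance setting concrete, here is a minimal Python sketch of how a failed-row percentage could be compared against failed_rows_percent_tolerance. The option names come from the summary above; the helper `within_tolerance` and the exact comparison (less than or equal) are assumptions for illustration, not the connector's actual implementation.

```python
# Hypothetical helper: the connector's internal comparison may differ.
def within_tolerance(failed_rows: int, total_rows: int, tolerance: float = 0.10) -> bool:
    """Return True if the fraction of failed rows does not exceed the tolerance."""
    if total_rows == 0:
        # Nothing was written, so nothing could have been rejected.
        return True
    return failed_rows / total_rows <= tolerance


# Option names taken from the summary above; values are given as strings,
# as is usual for Spark data source options.
write_options = {
    "save_job_status_table": "true",
    "failed_rows_percent_tolerance": "0.10",
}

tolerance = float(write_options["failed_rows_percent_tolerance"])
print(within_tolerance(3, 100, tolerance))   # 3% of rows rejected
print(within_tolerance(25, 100, tolerance))  # 25% of rows rejected
```

With a tolerance of 0.10, the first call is within tolerance and the second is not.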
2022-06-21 17:45:32 ERROR VerticaDistributedFilesystemWritePipe:393 - Found 3 rejected rows, displaying up to 10 of the most common reasons:
2022-06-21 17:45:32 ERROR VerticaDistributedFilesystemWritePipe:394 - count | example_data | rejected_reason
2022-06-21 17:45:32 ERROR VerticaDistributedFilesystemWritePipe:396 - 3 | NULL | In column 1: Cannot set NULL value in NOT NULL column
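Since the rejected rows are only available in the logs, one option is to extract the summary from the log output. The following is a rough Python sketch that parses lines in the format shown above; the function name `parse_rejected_summary` is hypothetical, and the pattern assumes the `count | example_data | rejected_reason` layout stays as in this example.

```python
import re

# Matches summary data lines such as "... - 3 | NULL | In column 1: ...".
# The header line is skipped because its first field is not a number.
ROW_PATTERN = re.compile(r" - (\d+) \| (.*?) \| (.*)$")


def parse_rejected_summary(log_lines):
    """Extract (count, example_data, rejected_reason) entries from log lines."""
    rows = []
    for line in log_lines:
        match = ROW_PATTERN.search(line)
        if match:
            count, example, reason = match.groups()
            rows.append({
                "count": int(count),
                "example_data": example,
                "rejected_reason": reason,
            })
    return rows


sample_log = [
    "2022-06-21 17:45:32 ERROR VerticaDistributedFilesystemWritePipe:393 - Found 3 rejected rows, displaying up to 10 of the most common reasons:",
    "2022-06-21 17:45:32 ERROR VerticaDistributedFilesystemWritePipe:394 - count | example_data | rejected_reason",
    "2022-06-21 17:45:32 ERROR VerticaDistributedFilesystemWritePipe:396 - 3 | NULL | In column 1: Cannot set NULL value in NOT NULL column",
]

parsed = parse_rejected_summary(sample_log)
print(parsed)
```

Note that this simple pattern would split incorrectly if the example data itself contained " | ".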
Note that in some cases the write to Vertica fails before the rows can be evaluated, and no rejected row information is printed. This happens, for example, when there is a schema mismatch between the source data and the Vertica table.
To get rejected row information, there must be an error while the rows are being processed. For example, a column in Vertica has a NOT NULL constraint, the source data has no such constraint, and the source contains null values that violate the Vertica constraint.
Summarize the behaviour of saving or reporting of rejected/exception rows.
For reference, see the following tickets: