The SchemaValidator validates the schema of the input dataframe (view) against the schema provided in the definition.

  • The schema can be provided through either the ddlSchemaString or the ddlSchemaFile property, and must be in DDL format.
  • The type of validation must be either match or adapt:
    • match: the two schemas must match in the number of columns and in the name and data type of each column.
    • adapt: the schema of the dataframe is adapted to the schema from the definition. As a result of the validation, the output dataframe will be "expanded".
      Note: If the schema from the definition has columns that don't exist in the schema of the dataframe, new columns will be added with null values.
  • The mode of validation must be either strict or default:
    • when the validation type is match
      • strict: the columns in the two schemas must be in the same order.
      • default: the order of the columns in both schemas is ignored.
    • when the validation type is adapt
      • strict: all columns from the dataframe must be included in the schema from the definition.
      • default: columns from the dataframe that don't exist in the schema from the definition are ignored.
        Note: this causes data loss.
  • The action after the validation must be either error or ignore:
    • error: if the validation fails, the process exits with the validation error.
    • ignore: the process ignores the validation failure but writes it to the logs.
  • The view is the input dataframe whose schema is validated against the schema provided in the definition.
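The type and mode rules above can be summarized in a small, Spark-free Python sketch. It is not the SchemaValidator implementation and makes several assumptions: a schema is modeled as an ordered list of (column name, data type) pairs, the dataframe as a list of dict rows, data types are compared as plain strings and never cast, and with action ignore the input passes through unchanged. The names validate_match, adapt, run_validation and SchemaValidationError are hypothetical.

```python
import logging
from typing import Dict, List, Optional, Tuple

# Assumed model (not the framework's types): a schema is an ordered list of
# (column name, data type) pairs, and a dataframe is a list of dict rows.
Schema = List[Tuple[str, str]]
Row = Dict[str, object]


class SchemaValidationError(Exception):
    """Hypothetical error raised when validation fails and action is "error"."""


def validate_match(df_schema: Schema, def_schema: Schema, mode: str) -> bool:
    """type=match: same column count, and same name and data type per column."""
    if len(df_schema) != len(def_schema):
        return False
    if mode == "strict":
        # strict: the columns must also appear in the same order
        return df_schema == def_schema
    # default: compare sorted copies so that column order is ignored
    return sorted(df_schema) == sorted(def_schema)


def adapt(rows: List[Row], df_schema: Schema, def_schema: Schema,
          mode: str) -> Optional[List[Row]]:
    """type=adapt: reshape rows to the definition's schema; None means failure."""
    def_names = [name for name, _ in def_schema]
    extra = {name for name, _ in df_schema} - set(def_names)
    if mode == "strict" and extra:
        # strict: every dataframe column must exist in the definition's schema
        return None
    # Columns missing from the dataframe are added as None (null); in default
    # mode, columns not in the definition are dropped. Types are NOT cast here.
    return [{name: row.get(name) for name in def_names} for row in rows]


def run_validation(rows: List[Row], df_schema: Schema, def_schema: Schema,
                   type_: str, mode: str, action: str) -> List[Row]:
    """Apply the chosen type and mode, then handle failure according to action."""
    if type_ == "match":
        result = rows if validate_match(df_schema, def_schema, mode) else None
    elif type_ == "adapt":
        result = adapt(rows, df_schema, def_schema, mode)
    else:
        raise ValueError(f"unknown validation type: {type_}")
    if result is None:
        message = f"schema validation failed (type={type_}, mode={mode})"
        if action == "error":
            raise SchemaValidationError(message)
        # ignore: log the failure and (assumption) pass the input through unchanged
        logging.warning(message)
        return rows
    return result


users: Schema = [("id", "int"), ("name", "string"), ("age", "int"),
                 ("gender", "string"), ("address", "string")]
reordered = users[1:] + users[:1]
print(validate_match(reordered, users, "default"))  # True
print(validate_match(reordered, users, "strict"))   # False
```

Comparing sorted copies in default mode is just one simple way to make the match check order-insensitive while still requiring identical names and data types.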

Actor Class: com.qwshen.etl.validation.SchemaValidator

The definition of the SchemaValidator:

  • in YAML
  actor:
    type: schema-validator
    properties:
      ddlSchemaString: "id int, name string, age int, gender string, address string"
      type: match
      mode: strict
      action: error
      view: users
  • in JSON
  {
    "actor": {
      "type": "schema-validator",
      "properties": {
        "ddlSchemaFile": "${application.users.schema}",
        "type": "adapt",
        "mode": "default",
        "action": "ignore",
        "view": "users"
      }
    }
  }
  • in XML
  <actor type="schema-validator">
    <properties>
      <ddlSchemaString>id int, name string, age int, gender string, address string</ddlSchemaString>
      <type>adapt</type>
      <mode>strict</mode>
      <action>ignore</action>
      <view>users</view>
    </properties>
  </actor>
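As a further sketch, the property rules listed earlier can be checked against a parsed definition before it runs. The hypothetical check_definition helper below is not part of the framework. It reads the JSON form with Python's json module, assumes that exactly one of ddlSchemaString and ddlSchemaFile is set and that view is required, and leaves ${...} variables unresolved.

```python
import json

# Allowed values for each property, as stated in the rules above.
ALLOWED = {
    "type": {"match", "adapt"},
    "mode": {"strict", "default"},
    "action": {"error", "ignore"},
}


def check_definition(definition: dict) -> list:
    """Hypothetical helper: return the problems found in a schema-validator definition."""
    problems = []
    actor = definition.get("actor", {})
    if actor.get("type") != "schema-validator":
        problems.append("actor type must be schema-validator")
    props = actor.get("properties", {})
    # The schema comes from ddlSchemaString or ddlSchemaFile (assumed: exactly one).
    sources = [key for key in ("ddlSchemaString", "ddlSchemaFile") if key in props]
    if len(sources) != 1:
        problems.append("exactly one of ddlSchemaString or ddlSchemaFile is expected")
    for key, allowed in ALLOWED.items():
        if props.get(key) not in allowed:
            problems.append(f"{key} must be one of {sorted(allowed)}")
    if not props.get("view"):
        problems.append("view is required")
    return problems


# The JSON definition from above; ${...} is kept as a literal string here.
definition = json.loads("""
{
  "actor": {
    "type": "schema-validator",
    "properties": {
      "ddlSchemaFile": "${application.users.schema}",
      "type": "adapt",
      "mode": "default",
      "action": "ignore",
      "view": "users"
    }
  }
}
""")
print(check_definition(definition))  # prints []
```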