Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Create Attribute Type Rules for Interval Join, Scatter Plot, Sort Partitions, Type Casting #2005

Merged
merged 43 commits into from
Jun 24, 2023

Conversation

aahei
Copy link
Contributor

@aahei aahei commented Jun 14, 2023

This is the second PR for the attribute type checking feature. The first one is #1924.

Description of Attribute Type Rules

Interval Join

leftAttributeName (and rightAttributeName) must be integer, long, double, or timestamp.

And, leftAttributeName attribute must have the same type as the rightAttributeName.

{
  "attributeTypeRules": {
    "leftAttributeName": {
      "enum": ["integer", "long", "double", "timestamp"]
    },
    "rightAttributeName": {
      "const": {
        "$data": "leftAttributeName"
      }
    }
  }
}

Note: We intentionally put enum test in front of const test, because we want to test whether they have the correct type. Or, if we put the const test first, i.e rightAttributeName rule first, and if leftAttributeName's attribute type is an invalid type like string, then it will prompt the user that rightAttributeName should have the same attribute type as leftAttributeName -- string -- which is incorrect since both should not be a string type.

Scatter Plot

xColumn and yColumn attributes must be of integer or double type.

{
  "attributeTypeRules": {
    "xColumn":{
      "enum": ["integer", "double"]
    },
    "yColumn":{
      "enum": ["integer", "double"]
    }
  }
}

Note: it may support long in the future. See #1954.

Sort Partitions

sortAttributeName attribute type must be integer, long, or double.

{
  "attributeTypeRules": {
    "sortAttributeName":{
      "enum": ["integer", "long", "double"]
    }
  }
}

Note: May support timestamp in the future. See #1954.

Type Casting

For example, if we want to convert an attribute to integer, it must have attribute type of string, long, double, or boolean. A type should not convert to the type itself. See the schema for detail.

{
	"attributeTypeRules": {
		"attribute": {
			"allOf": [{
					"if": {
						"resultType": {
							"valEnum": ["integer"]
						}
					},
					"then": {
						"enum": ["string", "long", "double", "boolean"]
					}
				},
				{
					"if": {
						"resultType": {
							"valEnum": ["double"]
						}
					},
					"then": {
						"enum": ["string", "integer", "long", "boolean"]
					}
				},
				{
					"if": {
						"resultType": {
							"valEnum": ["boolean"]
						}
					},
					"then": {
						"enum": ["string", "integer", "long", "double"]
					}
				},
				{
					"if": {
						"resultType": {
							"valEnum": ["long"]
						}
					},
					"then": {
						"enum": ["string", "integer", "double", "boolean", "timestamp"]
					}
				},
				{
					"if": {
						"resultType": {
							"valEnum": ["timestamp"]
						}
					},
					"then": {
						"enum": ["string", "long"]
					}
				}
			]
		}
	}
}

Note: The type constraint is enforced in core/amber/src/main/scala/edu/uci/ics/texera/workflow/common/tuple/schema/AttributeTypeUtils.scala.

@aahei aahei self-assigned this Jun 14, 2023
@aahei aahei requested a review from Yicong-Huang June 14, 2023 11:35
@aahei aahei marked this pull request as ready for review June 16, 2023 19:43
Base automatically changed from cli-attr-type-checking to master June 17, 2023 01:45
aahei added 2 commits June 19, 2023 03:24
The `sum`, `average`, `min`, and `max` aggregation functions in the
Aggregation operator should support all numeric types (integer, double,
long, and timestamp). The code does support these types, but the
`attributeTypeRules` injected did not include the timestamp type. This
PR adds the timestamp type for these aggregation functions.
Copy link
Collaborator

@Yicong-Huang Yicong-Huang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! the rules are very clear.

@aahei aahei merged commit 73d2ca7 into master Jun 24, 2023
@aahei aahei deleted the cli-attr-type-checking-2 branch June 24, 2023 14:01
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants