Add AttributeType validation in text-related operator executors #1955

Closed
aahei opened this issue Jun 3, 2023 · 2 comments

@aahei
Contributor

aahei commented Jun 3, 2023

While reviewing PR #1924, we found that the following operators require their input attributes to be of specific types, but the backend code currently does not enforce these constraints.

These operators only take string-type input attributes, but do not check the type in the backend:

  • Regular Expression
  • Dictionary Matcher
  • Keyword Search
  • Sentiment Analysis
  • Unnest String
  • HTML Visualizer
  • Word Cloud

Additionally, the Linear Regression operator only takes numeric type input attributes but does not check the type.
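
To make the proposal concrete, here is a minimal sketch of the kind of check an executor could run before processing tuples. It is written in Python for illustration only and is not the project's actual code: the `AttributeType` members, `NUMERIC_TYPES`, `validate_attribute_type`, and the dict-based schema are all hypothetical stand-ins.

```python
from enum import Enum


class AttributeType(Enum):
    # Hypothetical subset of attribute types, for illustration only.
    STRING = "string"
    INTEGER = "integer"
    LONG = "long"
    DOUBLE = "double"
    BOOLEAN = "boolean"
    ANY = "any"


# Assumption: these are the types treated as numeric.
NUMERIC_TYPES = frozenset(
    {AttributeType.INTEGER, AttributeType.LONG, AttributeType.DOUBLE}
)


def validate_attribute_type(schema, attribute, allowed, operator_name):
    """Return the attribute's type, or raise if it is missing or not allowed.

    `schema` is a plain dict mapping attribute names to AttributeType values.
    """
    if attribute not in schema:
        raise KeyError(
            f"{operator_name}: attribute '{attribute}' is not in the input schema"
        )
    actual = schema[attribute]
    if actual not in allowed:
        expected = ", ".join(sorted(t.name for t in allowed))
        raise TypeError(
            f"{operator_name}: attribute '{attribute}' has type {actual.name}, "
            f"expected one of: {expected}"
        )
    return actual
```

With a schema such as `{"text": AttributeType.STRING, "price": AttributeType.DOUBLE}`, calling `validate_attribute_type(schema, "text", {AttributeType.STRING}, "Keyword Search")` passes, while passing `"price"` with the same allowed set raises a `TypeError` that names the operator and the offending attribute.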

For the Hash Join and Interval Join operators, we may want to consider whether to enforce a check that the left and right attribute types are equal.
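
If we do decide to enforce it, the equality check itself could be as small as the following sketch. Again this is illustrative Python rather than the real executor code: `AttributeType`, `check_join_key_types`, and the dict-based schemas are assumptions made for the example.

```python
from enum import Enum


class AttributeType(Enum):
    # Hypothetical subset of attribute types, for illustration only.
    STRING = "string"
    INTEGER = "integer"
    DOUBLE = "double"
    TIMESTAMP = "timestamp"


def check_join_key_types(left_schema, left_key, right_schema, right_key, operator_name):
    """Return the shared key type, or raise if the two join keys differ in type.

    Each schema is a plain dict mapping attribute names to AttributeType values.
    """
    left_type = left_schema[left_key]
    right_type = right_schema[right_key]
    if left_type != right_type:
        raise TypeError(
            f"{operator_name}: left attribute '{left_key}' is {left_type.name} "
            f"but right attribute '{right_key}' is {right_type.name}"
        )
    return left_type
```

Whether a strict equality check is the right rule is exactly the open question: a stricter check catches silent mismatches early, but it would also reject joins between, say, an integer key and a double key that some workflows may rely on.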

@aahei changed the title from "Add Attribute Type Checking in Operators in Backend" to "Add Attribute Type Checking in Some Operators in Backend" on Jun 3, 2023
@Yicong-Huang changed the title from "Add Attribute Type Checking in Some Operators in Backend" to "Add AttributeType validation in text-related operator executors" on Jun 4, 2023
@sadeemsaleh
Collaborator

@aahei I suggest that, before we make the change for the HTML Visualizer, we run a test using some existing use cases. I observed that most cases pass type ANY from a Python UDF to the HTML Visualizer, though I am not sure why users choose ANY instead of String.

@Yicong-Huang
Collaborator

Just a side note regarding ANY: it is a general type that represents all types, and the engine needs it in order to work. In #1957 we are considering removing it from the UI so that users won't select it.
