VTA Operator Classes
VTA implements three high-level classes of operations: selection, transformation, and composition.
Selection operations pick the data points of interest on which subsequent operations in the pipeline are performed. A select operation sub-selects a portion of the data, and the sub-selected data becomes the new dataset for the entire session. A filter operation, by contrast, does not alter the session's dataset; it only changes which data is visible to the user under the given conditions. Both types of selection operations take a condition (predicate) as input. The predicate can be boolean conditional logic or a user-defined function (UDF); sampling is an example of a UDF. An empty predicate selects the entire dataset.
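The difference between select and filter can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not VTA's actual API: the `Session` class, `select`, `filter_rows`, and `sample` are hypothetical names, and the dataset is assumed to be a plain list of strings.

```python
import random
from dataclasses import dataclass, field
from typing import Callable, List, Optional

Predicate = Callable[[str], bool]


@dataclass
class Session:
    """Hypothetical session state: the dataset plus a per-row visibility mask."""
    data: List[str]
    visible: List[bool] = field(default_factory=list)

    def __post_init__(self) -> None:
        # Default to "everything visible" when no matching mask is supplied.
        if len(self.visible) != len(self.data):
            self.visible = [True] * len(self.data)


def select(session: Session, predicate: Optional[Predicate] = None) -> Session:
    """Sub-select rows; the result becomes the session's new dataset."""
    if predicate is None:  # empty predicate: keep the entire data
        return Session(list(session.data))
    return Session([row for row in session.data if predicate(row)])


def filter_rows(session: Session, predicate: Optional[Predicate] = None) -> Session:
    """Leave the dataset untouched and only update which rows are visible."""
    mask = [True if predicate is None else bool(predicate(row)) for row in session.data]
    return Session(session.data, mask)


def sample(fraction: float, seed: int = 0) -> Predicate:
    """UDF predicate that keeps each row with probability `fraction`."""
    rng = random.Random(seed)
    return lambda _row: rng.random() < fraction


session = Session(["free prize now", "see you at noon", "win a free trip"])
has_free = lambda row: "free" in row.split()

selected = select(session, has_free)       # dataset shrinks to matching rows
filtered = filter_rows(session, has_free)  # dataset unchanged, mask updated
```

Here `selected.data` holds only the rows containing "free", while `filtered.data` still holds every row and `filtered.visible` records which of them should be shown. Passing `sample(0.5)` as the predicate would select roughly half of the rows.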
Transformation operations change the type, content, or organization of the selected data.
- Project operation changes the dimensionality or cardinality of the input data. For example, lowercasing, LDA, PCA.
- Mutate operation generates a new representation of the input data. For example, transform text to tokens/words or tf-idf vectors.
- Aggregate operation computes summary statistics of the input data. For example, get the average word length per review in a review corpus.
- Set operations enable set-like operators on the input data. For example, get unique tokens in a corpus.
- Visualize operation creates a new visualization, for example a bar chart or a scatterplot.
- Model operation allows users to interface with data using models. An example is generating a new column with predictions for text data, such as classifying a set of SMS messages as SPAM or not SPAM.
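Several of the transformation operators above can be illustrated on a toy review corpus. The sketch below uses only the Python standard library; the function names (`mutate_tokenize`, `project_lowercase`, `aggregate_avg_word_length`, `set_unique`) and the sample reviews are made up for illustration and are not taken from VTA.

```python
from statistics import mean
from typing import List, Set

# Toy corpus, invented for this example.
reviews = ["Great phone", "Bad battery life", "great PRICE"]


def mutate_tokenize(docs: List[str]) -> List[List[str]]:
    """Mutate: produce a new representation (whitespace tokens) of each document."""
    return [doc.split() for doc in docs]


def project_lowercase(token_docs: List[List[str]]) -> List[List[str]]:
    """Project: lowercasing merges case variants, shrinking the vocabulary."""
    return [[tok.lower() for tok in doc] for doc in token_docs]


def aggregate_avg_word_length(token_docs: List[List[str]]) -> List[float]:
    """Aggregate: average word length per review (0.0 for an empty review)."""
    return [mean(len(tok) for tok in doc) if doc else 0.0 for doc in token_docs]


def set_unique(token_docs: List[List[str]]) -> Set[str]:
    """Set: the unique tokens across the whole corpus."""
    return {tok for doc in token_docs for tok in doc}


tokens = mutate_tokenize(reviews)
lowered = project_lowercase(tokens)
avg_lengths = aggregate_avg_word_length(tokens)
vocab_before = set_unique(tokens)
vocab_after = set_unique(lowered)
```

On this corpus the vocabulary has 7 unique tokens before lowercasing and 6 after, because "Great" and "great" collapse into one entry. That is the sense in which a project operation can change cardinality.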
Composition operations enable complex operations by combining the operators described above. Combine operators let users specify a combination of operations, similar to an operator execution pipeline. Synthesize operators let users create new operations from these combinations, which can be reused later.
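One way to picture combine and synthesize is as function composition plus a named registry. The sketch below is a minimal illustration, assuming every operator is a single-argument Python callable; `combine`, `synthesize`, and `REGISTRY` are hypothetical names rather than VTA's interface.

```python
from functools import reduce
from typing import Any, Callable, Dict

Operator = Callable[[Any], Any]

# Hypothetical store of synthesized (named, reusable) operations.
REGISTRY: Dict[str, Operator] = {}


def combine(*operators: Operator) -> Operator:
    """Combine: chain operators left to right, like an execution pipeline."""
    def pipeline(data: Any) -> Any:
        return reduce(lambda acc, op: op(acc), operators, data)
    return pipeline


def synthesize(name: str, *operators: Operator) -> Operator:
    """Synthesize: store a combination under a name so it can be reused later."""
    REGISTRY[name] = combine(*operators)
    return REGISTRY[name]


# Simple stand-in operators for the example.
tokenize = lambda docs: [doc.split() for doc in docs]
lowercase = lambda token_docs: [[t.lower() for t in doc] for doc in token_docs]
unique = lambda token_docs: sorted({t for doc in token_docs for t in doc})

synthesize("unique_lower_tokens", tokenize, lowercase, unique)

# The synthesized operation can now be looked up and applied to any corpus.
vocab = REGISTRY["unique_lower_tokens"](["Free prize", "free trip"])
```

Keeping `combine` separate from `synthesize` mirrors the distinction in the text: a combination is an ad hoc pipeline, while a synthesized operation is a combination that has been given a name for later reuse.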