-
Notifications
You must be signed in to change notification settings - Fork 3.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: Support for data sourcing and transformation with data
template
#4958
Conversation
Signed-off-by: Simon Behar <simbeh7@gmail.com>
go.mod
Outdated
replace github.com/argoproj/argo/v2 => ./ | ||
replace ( | ||
github.com/argoproj/argo/v2 => ./ | ||
github.com/argoproj/pkg v0.3.0 => ../pkg |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Compilation errors will be due to this change, which I have made to test my argoproj/pkg
changes locally
Signed-off-by: Simon Behar <simbeh7@gmail.com>
workflow/artifacts/gcs/gcs.go
Outdated
@@ -249,3 +253,7 @@ func uploadObject(client *storage.Client, bucket, key, localPath string) error { | |||
} | |||
return nil | |||
} | |||
|
|||
func (g *ArtifactDriver) ListObjects(artifact *wfv1.Artifact) ([]string, error) { | |||
panic("implement me") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I actually think we should return an error here (rather than panic) - and do the implementation later, in a v1.1
config/config.go
Outdated
@@ -361,7 +361,8 @@ type MetricsConfig struct { | |||
} | |||
|
|||
type WorkflowRestrictions struct { | |||
TemplateReferencing TemplateReferencing `json:"templateReferencing"` | |||
TemplateReferencing TemplateReferencing `json:"templateReferencing,omitempty"` | |||
DataTemplatePodPolicy DataTemplatePodPolicy `json:"dataTemplatePodPolicy,omitempty"` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can I please ask you to remove this from this PR and we can follow up with >= v1.1
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I actually think this is important this is shipped with the MVP as it protects from a possible exploit. Terry's comment describes it well: #4958 (review)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
May I ask what v1.1 represents? I've been seeing that a lot in recent PRs.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
v1.1 of this feature (not of Argo)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it's slang:
- v1.0 - the MVP
- v1.1/v1.2/v1.3 etc- an improvement to a feature we probably want follow-up soon after v1.0
- v2.0 - a complex/big improvement we want to hold off on on
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
give (a) this PR will only ship with artifactsPaths
and (b) we only provide an expression transformation - I don't think we have a security risk with this PR?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
regardless, lets narrow the scope v1.0 and follow up with a v1.1 after discussion
I've been thinking about other sources we might have. An obvious one is a normal So I suggest:
I feel like we may end up with the following sources, each in its own PR based on community suggestions:
|
@alexec Done! |
@@ -0,0 +1,100 @@ | |||
# See doc docs/data-sourcing-and-transformation.md | |||
apiVersion: argoproj.io/v1alpha1 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
bump
go.mod
Outdated
@@ -2,6 +2,8 @@ module github.com/argoproj/argo-workflows/v3 | |||
|
|||
go 1.15 | |||
|
|||
replace github.com/argoproj/pkg v0.4.0 => github.com/simster7/pkg v0.2.1-0.20210205175442-61f69ff4a52f |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
still pending?
@@ -0,0 +1,100 @@ | |||
apiVersion: argoproj.io/v1alpha1 | |||
kind: Workflow | |||
metadata: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you could make this a testable example - BUT
a testable example should make a good example, if you need to be more complex, then better to have both examples and test
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if you need to be more complex, then better to have both examples and test
I think in this case we need to be more complex. See comment: #4958 (comment)
wfv1 "github.com/argoproj/argo-workflows/v3/pkg/apis/workflow/v1alpha1" | ||
) | ||
|
||
var inMemoryDataNode = ` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
200 lines of code - anyway we can reduce the maintenance this might need by making it somehow more focused?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's 200 lines of node status, not code :) I have already reduced this as much as possible. (Including removing lines of node status that are not needed, such as Progress
)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
great - some mino comments which could be left to anther day
@@ -0,0 +1,100 @@ | |||
# See doc docs/data-sourcing-and-transformation.md | |||
apiVersion: argoproj.io/v1alpha1 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sure, we can change our mind later
@@ -2630,6 +2650,38 @@ func (woc *wfOperationCtx) executeResource(ctx context.Context, nodeName string, | |||
return node, err | |||
} | |||
|
|||
func (woc *wfOperationCtx) executeData(ctx context.Context, nodeName string, templateScope string, tmpl *wfv1.Template, orgTmpl wfv1.TemplateReferenceHolder, opts *executeTemplateOpts) (*wfv1.NodeStatus, error) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't feel strongly, but it woc is MASSIVE and you might prefer to find your code in a single focussed file
Motivation
Users often source and transform data as part of their workflows. We want to add first-class support for these common operations in a new
data
template.Goals
Provide a varied set of sources and transformations so that users don't have to code and spin up their own Pods.
Be able to "automagically" decide if we are able to do the transformations in-memory in the controller, thereby saving the need to create a new Pod.
Provide an experience similar to common bash file manipulations. For example:
Here,
cat file.txt
acts as a "source", but it could also be acurl
orfind
, etc.grep ... | sed ... | sort | uniq
act as transformations.Maybe Goals
Provide a quick and efficient way to do simple, individual, parameter-level transformations. For example:
However, this will likely be accomplished by an approach such as feat(controller): Expression template tags. Resolves #4548 & #1293 #5115
Non Goals
container
. We want this to be a convenience for common use cases with sourcing and transforming data. If the convenience of thedata
template is not enough for our users, they should at said point define their own containers.Examples
A
data
template could look like:data
templates could also receive inputs from previous steps. In this case, since sourcing is not necessary, we can perform the transformations in-memory in the controller, saving the need to create a new pod and continuing immediately with execution.Open Questions
How to provide extensive transformation without needing to enumerate them ourselves? Libraries such as
expr
andSprig
can be useful, but also limited (expr
) or too domain specific (Sprig
). Provide access tobash
tools (e.g.grep
,sed
, etc.)? This is a balance between providing enough tooling for users to not need to spin up their own pods, but not enough that we're redoing work that users could do in their specific pods.What are the main use cases we should target with this template? What sources are a must-have? What transformations are a must-have?
Could
data:
templates be used for quick in-line transformations? For example