Kubernetes support for Metaflow #644
This looks good. It seems like the next step might be to add some functionality to the step-functions plugin so that it can optionally compile steps to synchronous EKS executions instead of Batch executions? A first pass could be all or nothing, but the ability to mix and match per step could be interesting.
@corleyma Yep, that's the logical next step in our Kubernetes story. This PR still has some open questions. Hopefully, it will be ready to merge next week and we can fast follow with an SFN story. Mixing up Batch & Kubernetes on SFN will be a bit tricky since we rely on the response structure of [...]
"stderr",
job_id=job.id,
)
t = time.time()
nit: can avoid duplicating lines 265-267 (echo) by setting status = None on line 263
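A minimal sketch of the pattern this nit points at, not the PR's actual code: if the tracked status starts as `None`, the first observed status already counts as a change, so a single echo inside the loop covers both the initial report and later transitions. The function name `watch_status`, the `echo` parameter, and the status strings are assumptions made for illustration.

```python
def watch_status(observed_statuses, echo=print):
    """Echo each status change exactly once and return the last status.

    `observed_statuses` is a stand-in for repeatedly polling a job;
    it is hypothetical and not part of the PR.
    """
    # Starting from None means the first real status always differs,
    # so no separate echo is needed before entering the loop.
    status = None
    for new_status in observed_statuses:
        if new_status != status:
            status = new_status
            echo("Task status: %s" % status)
    return status
```

Without the `None` sentinel, the first status has to be echoed once before the loop and the same echo repeated inside it, which is the duplication the comment describes.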
self._kwargs = kwargs

# Kubernetes namespace defaults to `default`
self._kwargs["namespace"] = self._kwargs["namespace"] or "default"
This will fail with a KeyError if "namespace" is not in kwargs. Use this instead:
self._kwargs.get("namespace", "default")
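To make the difference concrete, here is a small sketch comparing the lookup styles on plain dictionaries. It is illustrative only: the helper names are hypothetical and the PR's class is not reproduced. One detail worth noting is that `.get("namespace", "default")` only covers a missing key; if the key is present with the value `None`, combining `.get` with `or` handles both cases.

```python
def namespace_index_or(kwargs):
    # Style under review: raises KeyError when the key is absent.
    return kwargs["namespace"] or "default"


def namespace_get_default(kwargs):
    # Suggested style: safe when the key is absent, but returns None
    # when the key exists with an explicit None value.
    return kwargs.get("namespace", "default")


def namespace_get_or(kwargs):
    # Combined style (an assumption, not from the PR): covers both a
    # missing key and a falsy value such as None or "".
    return kwargs.get("namespace") or "default"


missing = {}
explicit_none = {"namespace": None}
explicit = {"namespace": "team-a"}  # hypothetical namespace name
```

Which variant is right depends on whether callers ever pass `namespace=None` explicitly, which the snippet under review does not show.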
def create(self):
    # Check that job attributes are sensible.

    # CPU value should be greater than 0
Consider a for loop:
```
for arg in ["cpu", "memory", "disk"]:
    if float(self._kwargs.get(arg, 0)) <= 0:
        raise ...
```
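The suggested loop could be fleshed out roughly as follows. This is a sketch under stated assumptions rather than the PR's implementation: the function name `validate_resources`, the use of `ValueError`, and the message text are all hypothetical, since the original suggestion leaves the raised exception elided.

```python
def validate_resources(kwargs, names=("cpu", "memory", "disk")):
    """Raise ValueError unless every named resource is a number above 0.

    Mirrors the reviewer's loop: a missing key is treated as 0 and is
    therefore rejected. The exception type and message are assumptions.
    """
    for name in names:
        value = kwargs.get(name, 0)
        try:
            is_positive = float(value) > 0
        except (TypeError, ValueError):
            # float(None) raises TypeError; float("abc") raises ValueError.
            is_positive = False
        if not is_positive:
            raise ValueError(
                "Invalid %s value (%r); it should be greater than 0"
                % (name, value)
            )
```

A call such as `validate_resources({"cpu": 1, "memory": "4096", "disk": 10240})` passes silently, while omitting `disk` raises an error that names the offending attribute.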
Hmm, maybe I'm doing something wrong. I tried it again on a Mac with the latest #756. A flow always results in this error:
Ahh, finally fixed it with a patch:
The Conda environment should pack a Linux Python binary when run on macOS, to avoid the error metaflow_PlayListFlow_osx-64_179c56284704ca8e53622f848a3df27cdd1f4327/bin/python: cannot execute binary file: Exec format error
Mostly ready for review (except for TODOs marked in the PR)
This PR provides compute integration for Metaflow. Similar to Metaflow's integration with AWS Batch, users can scale up and out from their laptops to their Kubernetes clusters (Amazon EKS, for now) trivially.
This will execute [...] a [...] on your Kubernetes cluster and provide all of Metaflow's goodness.
To easily test this PR, you can [...] --metadata=local, [...] (eksctl) and configure Kubernetes credentials on your laptop, [...] (bar), and assign an IRSA (foo) that allows Kubernetes pods to talk to S3.
"Easy configure" for this feature is in-flight.
Please route all your feedback to our slack workspace - http://slack.outerbounds.co