-
Notifications
You must be signed in to change notification settings - Fork 478
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Abstractions for artifact stores #474
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some small changes / nits in here. Otherwise, really painstaking work you did! Looking forward to getting this merged in as soon as possible as I think it will affect some other branches currently under construction...
src/zenml/integrations/sagemaker/step_operators/sagemaker_step_operator.py
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Awesome, I especially love the nice error messages when defining an incorrect stack component/artifact store! Only got some very minor nits
@@ -66,7 +66,7 @@ def evaluator( | |||
return test_acc | |||
|
|||
|
|||
@pipeline(required_integrations=[SKLEARN]) | |||
@pipeline(required_integrations=[SKLEARN, S3]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Isn't this the general step operator example, in which case S3 could also be any other artifact store? In any case as long as a StackComponent (e.g. the artifact store here) is part of your active stack, it should be recognized automatically and there is no need to specify it manually
@@ -136,19 +123,19 @@ def _activate_integrations(): | |||
# getting the local orchestrator should not require activating integrations | |||
StackComponentClassRegistry.get_class( | |||
component_type=StackComponentType.ORCHESTRATOR, | |||
component_flavor=OrchestratorFlavor.LOCAL, | |||
component_flavor="local", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm wondering if we should at least have string constants somewhere for all the ZenML internal flavors, what do you think?
Co-authored-by: Alex Strick van Linschoten <strickvl@users.noreply.github.com>
Co-authored-by: Alex Strick van Linschoten <strickvl@users.noreply.github.com>
Co-authored-by: Alex Strick van Linschoten <strickvl@users.noreply.github.com>
Co-authored-by: Alex Strick van Linschoten <strickvl@users.noreply.github.com>
Co-authored-by: Alex Strick van Linschoten <strickvl@users.noreply.github.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I love it! Always great to see user experience improved by simplifying the interface. I just left a few small suggestions, and a question or two for my own understanding. Excited to see this merged soon
…com:zenml-io/zenml into feature/ENG-616-artifact-store-abstractions
…_operator.py [ci skip] Co-authored-by: Alex Strick van Linschoten <strickvl@users.noreply.github.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great work redesigning the artifact store ! This was a real pleasure to review.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wow what a giant of a PR - great job Baris !!!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Love what you've done with the ClassVars
Hello everyone!
Even though this PR will be mainly focused on the abstraction of artifact stores, there is a lot to unpack here. That's why I will try to separate it into different sections.
Abstraction for Artifact Stores
First off, let's start with artifact stores. Previously, our artifact stores looked exactly as:
You can see at a first glance, the only thing that our base implementation featured was a
path
attribute and that was the entire interface for our artifact stores. As we are strictly using filesystem-dependant artifact stores, we have decided to merge the concept ofFilesystems
andArtifactStores
together and create a new interface as a combination of both:We still needed to register a
FileSystem
, that's why the base class now has a_register
method, which creates a properFileSystem
and registers it totfx.dsl.io.filesystem_registry.DEFAULT_FILESYSTEM_REGISTRY
upon the__init__
of the instance.Due to the internal functionalities of TFX, there was a need to catch the
FileNotFoundError
s andraise a
from tfx.dsl.io.fileio.NotFoundError
s instead. Due to this issue, the implementations looked like this:This did not only create a lot of clutter in code but also put the responsibility of handling this on the person who is extending it. As a solution, I have created a decorator
_catch_not_found_error
which is now used as a wrapper around these methods when you create the correspondingFilesystem
. (special thanks to @schustmi and @jwwwb for the idea)fileio
changesFor the sake of consistency and reduced complications, our implementations for
filesystem
andfilesystem_registry
have been removed. The remainingzenml.io.fileio
andzenml.io.utils
modules are now using the file systems which are registered in thetfx.dsl.io.filesystem_registry.DEFAULT_FILESYSTEM_REGISTRY
(which also holds theFilesystems
generated by ourArtifactStore
s)Previously Implemented Artifact Stores
We had already built a few
ArtifactStore
implementations either as a part of the core or as a part of an integration. However, due to the recent changes in the base implementation, I needed to update the current implementations of the artifact stores. Since everyArtifactStore
-Filesystem
pair got merged, the plugins were removed and as a direct result, the activation of corresponding integrations got updated.Also, as the concepts are now merged, the
path
attribute is now validated through a pydanticvalidator
within the base implementation using theSUPPORTED_SCHEMES
, thusensure_XX_path
validators have been removed.Changes to the
StackComponent
Working on the previously implemented artifact stores proved to be an excellent opportunity as I was able to experience the process of how a user can extend one of the base implementations of a stack component. With this in mind, I have made a few fundamental changes to the
StackComponent
:Anyone who was extending any
StackComponent
had to define asupports_local_execution
variable and asupports_remote_execution
variable. Since they were not used anywhere, both of these variables have been removed to make it cleaner.The
type
andflavor
of theStackComponent
s have also been reworked as follows:to
Old version of extension:
New version of extension:
In order to have the same effect that the
@abstractmethod
has on the first implementation, theStackComponent
now features a@root_validator
which checks whether theTYPE
andFLAVOR
is defined when creating a newStackComponent
.For the sake of extensibility, the
flavor
enums have been removed and replaced with string values.Open Questions
There are still a few questions:
TODOs
New Changes
Pre-requisites
Please ensure you have done the following:
Types of changes