Quickstart code in docs fixed #387

Merged 3 commits into from Feb 4, 2022

Changes from 1 commit
49 changes: 26 additions & 23 deletions docs/book/introduction/quickstart-guide.md
@@ -13,7 +13,7 @@ or view it on [GitHub](https://github.com/zenml-io/zenml/tree/main/examples/quic

## Install and initialize

```python
```shell
# Install the dependencies for the quickstart
pip install zenml tensorflow
```
@@ -26,18 +26,23 @@ HuggingFace, PyTorch Lightning etc.
Once the installation is completed, you can go ahead and create your first ZenML repository for your project. As
ZenML repositories are built on top of Git repositories, you can create yours in a desired empty directory through:

```python
```shell
# Initialize ZenML
zenml init
```

Now, the setup is completed. For the next steps, just make sure that you are executing the code within your
ZenML repository.

## Define ZenML Steps
## Run your first pipeline

In the code that follows, you can see that we are defining the various steps of our pipeline. Each step is
decorated with `@step`, the main low-level abstraction that is currently available for creating pipeline steps.
decorated with `@step`. The pipeline in turn is decorated with the `@pipeline` decorator.

{% hint style="success" %}
Note that type hints are used for inputs and outputs of each step. The routing of step outputs
to step inputs is handled within the pipeline definition.
{% endhint %}
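The routing described in the hint can be pictured with a plain-Python toy. This is only an illustration of name-based wiring (the `wire` helper and the stand-in step bodies are invented for the sketch), not ZenML's actual machinery:

```python
import inspect

def wire(step, available):
    """Toy routing: call `step` with values matched by parameter name."""
    params = inspect.signature(step).parameters
    return step(**{name: available[name] for name in params})

def importer():
    return {"x_train": [1, 2], "y_train": [0, 1], "x_test": [3], "y_test": [1]}

def trainer(x_train, y_train):
    # Stand-in for fitting a model on the training split.
    return {"weights": len(x_train)}

def evaluator(x_test, y_test, model):
    # Stand-in for scoring the model on the test split.
    return model["weights"] / (len(x_test) + len(y_test))

artifacts = importer()                         # named outputs of the first step
artifacts["model"] = wire(trainer, artifacts)  # routed by parameter name
accuracy = wire(evaluator, artifacts)          # picks x_test, y_test, model
```

The point of the sketch: a step never needs to know which step produced its inputs; matching output names to input parameters is enough to link the graph.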

![Quickstart steps](../assets/quickstart-diagram.png)

@@ -60,8 +65,8 @@ def importer() -> Output(

@step
def trainer(
X_train: np.ndarray,
y_train: np.ndarray,
x_train: np.ndarray,
y_train: np.ndarray,
) -> tf.keras.Model:
"""A simple Keras Model to train on the data."""
model = tf.keras.Sequential()
@@ -74,33 +79,32 @@ def trainer(
metrics=["accuracy"],
)

model.fit(X_train, y_train)
model.fit(x_train, y_train)

# write model
return model


@step
def evaluator(
X_test: np.ndarray,
Contributor: I actually really liked the capitalized X_test for X matrices. I think it's a convention (or at least that's how I learned it).

Contributor (Author): Alright, I can return it to capitalized.
y_test: np.ndarray,
model: tf.keras.Model,
) -> float:
x_test: np.ndarray,
y_test: np.ndarray,
model: tf.keras.Model,
) -> Output(loss=float, acc=float):
"""Calculate the accuracy on the test set"""
test_acc = model.evaluate(X_test, y_test, verbose=2)
return test_acc
loss, acc = model.evaluate(x_test, y_test, verbose=1)
return loss, acc
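The change above turns the single float return into two named outputs: with the `Output(loss=float, acc=float)` annotation, the returned tuple maps positionally onto the declared names. A toy decorator sketching that mapping (assumed behavior for illustration, not ZenML's implementation):

```python
def named_outputs(**declared):
    """Toy stand-in for an Output annotation: zip a tuple return with names."""
    names = list(declared)
    def wrap(step):
        def run(*args, **kwargs):
            # Pair declared names with returned values, in order.
            return dict(zip(names, step(*args, **kwargs)))
        return run
    return wrap

@named_outputs(loss=float, acc=float)
def evaluator():
    # Stand-in values, in the order Keras model.evaluate returns them.
    return 0.35, 0.91

result = evaluator()  # {"loss": 0.35, "acc": 0.91}
```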


@pipeline
def mnist_pipeline(
importer,
trainer,
evaluator,
importer,
Contributor: I think these are extra indents by mistake?
trainer,
evaluator,
):
"""Links all the steps together in a pipeline"""
X_train, y_train, X_test, y_test = importer()
model = trainer(X_train=X_train, y_train=y_train)
evaluator(X_test=X_test, y_test=y_test, model=model)
x_train, y_train, x_test, y_test = importer()
model = trainer(x_train=x_train, y_train=y_train)
evaluator(x_test=x_test, y_test=y_test, model=model)


if __name__ == "__main__":
@@ -123,13 +127,12 @@ If you had a hiccup or you have some suggestions/questions regarding our framewo

## Wait, how is this useful?

The above code looks like its yet another standard pipeline framework that added to your work, but there is a lot
The above code looks like yet another standard pipeline framework adding to your workload, but there is a lot
going on under the hood that is mighty helpful:

- All data is versioned and tracked as it flows through the steps.
- All parameters and return values are tracked by a central metadata store that you can later query.
- Individual step outputs are now cached, so you can swap out the trainer for other implementations and iterate fast.
- Code is versioned with `git`.
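The caching bullet can be pictured as keying a step's output on its inputs, so a step with unchanged inputs is skipped on re-runs. A minimal sketch using the standard library (the `calls` counter and `importer` are invented for the demo; ZenML's cache works on tracked artifacts, not `functools`):

```python
import functools

calls = {"importer": 0}

@functools.lru_cache(maxsize=None)
def importer(seed):
    # The body runs only on a cache miss for this `seed`.
    calls["importer"] += 1
    return (seed, seed + 1)

first = importer(0)    # executes the step
second = importer(0)   # identical input: result served from cache
```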

With just a little more work, one can:

@@ -141,7 +144,7 @@ training loops with automatic deployments.

Best of all: We let you and your infra/ops team decide what the underlying tools are to achieve all this.

Keep reading to learn how all of the above can be achieved.
Keep reading to learn how all the above can be achieved.

## Next Steps?
