Add documentation images #151

Merged · 5 commits · Nov 2, 2021
Binary file added docs/book/assets/localstack.png
Binary file added docs/book/assets/quickstart-diagram.png
36 changes: 17 additions & 19 deletions docs/book/core-concepts.md
@@ -4,11 +4,9 @@ description: A good place to start before diving further into the docs.

# Core Concepts

-## Core Concepts

**ZenML** consists of the following key components:

-![ZenML Architectural Overview](<.gitbook/assets/core_concepts_zenml.png>)
+![ZenML Architectural Overview](assets/2021-11-02-architecture-overview.png)

**Repository**

@@ -83,8 +81,8 @@ def simplest_step_ever(basic_param_1: int, basic_param_2: str) -> int:

There are only a few considerations for the parameters and return types.

-* All parameters passed into the signature must be [typed](https://docs.python.org/3/library/typing.html). Similarly, if you're returning something, it must also be typed with the return operator (`->`).
-* ZenML uses [Pydantic](https://pydantic-docs.helpmanual.io/usage/types/) for type checking and serialization under the hood, so all [Pydantic types](https://pydantic-docs.helpmanual.io/usage/types/) are supported \[full list available soon].
+- All parameters passed into the signature must be [typed](https://docs.python.org/3/library/typing.html). Similarly, if you're returning something, it must also be typed with the return operator (`->`).
+- ZenML uses [Pydantic](https://pydantic-docs.helpmanual.io/usage/types/) for type checking and serialization under the hood, so all [Pydantic types](https://pydantic-docs.helpmanual.io/usage/types/) are supported \[full list available soon].

While this is just a function with a decorator, it is not super useful. ZenML steps really get powerful when you put them together with [data artifacts](broken-reference). Read more about that here!

@@ -104,7 +102,7 @@

```python
def my_step(first_artifact: int, second_artifact: torch.nn.Module) -> int:
    return 1
```

Artifacts can be serialized and deserialized (i.e. written to and read from the Artifact Store) in many different ways, like `TFRecord`s or saved model pickles, depending on what the step produces. The serialization and deserialization logic of artifacts is defined by [materializers.md](reference/zenml/materializers.md "mention").

**Materializers**

@@ -120,7 +118,7 @@ from zenml.steps.base_step_config import BaseStepConfig

class MyStepConfig(BaseStepConfig):
    basic_param_1: int = 1
    basic_param_2: str = "2"

@step
def my_step(params: MyStepConfig):
    # user params here
@@ -143,9 +141,9 @@ An orchestrator is a special kind of backend that manages the running of each step

A stack is made up of the following three core components:

-* An Artifact Store
-* A Metadata Store
-* An Orchestrator (backend)
+- An Artifact Store
+- A Metadata Store
+- An Orchestrator (backend)

A ZenML stack also happens to be a Pydantic `BaseSettings` class, which means that there are multiple ways to use it.

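To make that concrete: a Pydantic `BaseSettings` class can be populated from explicit keyword arguments or from environment variables interchangeably. A generic sketch of that behaviour, with illustrative placeholder fields rather than ZenML's actual stack definition:

```python
from pydantic import BaseSettings


class StackSettings(BaseSettings):
    # Hypothetical fields for illustration; the real ZenML stack class differs.
    artifact_store: str = "./artifacts"
    metadata_store: str = "./metadata.db"
    orchestrator: str = "local"

    class Config:
        env_prefix = "MY_STACK_"


# Configure explicitly in code...
stack = StackSettings(orchestrator="airflow")

# ...or via the environment, e.g. MY_STACK_ORCHESTRATOR=airflow
stack = StackSettings()
```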
@@ -172,9 +170,9 @@ On a high level, when data is read from an **artifact** the results are persisted

A few rules apply:

-* Every **orchestrator** (local, Google Cloud VMs, etc) can run all **pipeline steps**, including training.
-* **Orchestrators** have a selection of compatible **processing backends**.
-* **Pipelines** can be configured to utilize more powerful **processing** (e.g. distributed) and **training** (e.g. Google AI Platform) **executors**.
+- Every **orchestrator** (local, Google Cloud VMs, etc) can run all **pipeline steps**, including training.
> **Contributor:** Are these all automatic from your linter? We should standardize it, otherwise it becomes hard to review markdown. Can you share the tools you're using? Maybe we can add a pre-commit hook?

> **Contributor Author:** Yeah, sorry about that. I think it is Prettier, which is taking out extra spaces at the ends of lines. https://prettier.io/docs/en/precommit.html shows how it'd integrate with pre-commit. I imagine there will be some overlap with some things we already have? For markdown, my settings are:

    {
      "arrowParens": "always",
      "bracketSpacing": true,
      "endOfLine": "lf",
      "htmlWhitespaceSensitivity": "css",
      "insertPragma": false,
      "jsxBracketSameLine": false,
      "jsxSingleQuote": false,
      "printWidth": 80,
      "proseWrap": "preserve",
      "quoteProps": "as-needed",
      "requirePragma": false,
      "semi": true,
      "singleQuote": false,
      "tabWidth": 2,
      "trailingComma": "es5",
      "useTabs": false,
      "vueIndentScriptAndStyle": false,
      "filepath": "/Users/strickvl/coding/zenml/repos/zenml/docs/book/guides/low-level-api/chapter-7.md",
      "parser": "markdown"
    }

> **Contributor:** Would you mind adding it to the whole dev cycle? Including adding a new script and editing our pre-commit hook?

> **Contributor Author:** Added in a separate branch.

+- **Orchestrators** have a selection of compatible **processing backends**.
+- **Pipelines** can be configured to utilize more powerful **processing** (e.g. distributed) and **training** (e.g. Google AI Platform) **executors**.

A quick example for large datasets makes this clearer. By default, your experiments will run locally. Pipelines that load large datasets would be severely bottlenecked, so you can configure [Google Dataflow](https://cloud.google.com/dataflow) as a **processing executor** for distributed computation, and [Google AI Platform](https://cloud.google.com/ai-platform) as a **training executor**.

@@ -184,11 +182,11 @@ The design choices in **ZenML** follow the understanding that production-ready m…

In other words, **ZenML** runs your **ML** code while taking care of the "**Op**eration**s**" for you. It takes care of:

-* Interfacing between the individual processing **steps** (splitting, transform, training).
-* Tracking of intermediate results and metadata.
-* Caching your processing artifacts.
-* Parallelization of computing tasks.
-* Ensuring the immutability of your pipelines from data sourcing to model artifacts.
-* No matter where: cloud, on-prem, or locally.
+- Interfacing between the individual processing **steps** (splitting, transform, training).
+- Tracking of intermediate results and metadata.
+- Caching your processing artifacts.
+- Parallelization of computing tasks.
+- Ensuring the immutability of your pipelines from data sourcing to model artifacts.
+- No matter where: cloud, on-prem, or locally.

Since production scenarios often look complex, **ZenML** is built with integrations in mind. **ZenML** will support a range of integrations for processing, training, and serving, and you can always add custom integrations via our extensible interfaces.
12 changes: 7 additions & 5 deletions docs/book/guides/low-level-api/chapter-1.md
@@ -4,7 +4,7 @@ description: Create your first step.

If you want to see the code for this chapter of the guide, head over to the [GitHub](https://github.com/zenml-io/zenml/tree/main/examples/low_level_guide/chapter_1.py).

-# Chapter 1: Create an importer step to load data
+# Create an importer step to load data
htahir1 marked this conversation as resolved.

The first thing to do is to load our data. We create a step that can load data from an external source (in this case a [Keras Dataset](https://keras.io/api/datasets/)). This can be done by creating a simple function and decorating it with the `@step` decorator.

@@ -30,8 +30,8 @@ def importer_mnist() -> Output(

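The body of the step is collapsed in this diff view. A rough sketch of its shape, where the output names and the `zenml.steps.step_output.Output` import path come from this chapter, while the `step` import location and the exact body are assumptions:

```python
import numpy as np
import tensorflow as tf

from zenml.steps import step
from zenml.steps.step_output import Output


@step
def importer_mnist() -> Output(
    X_train=np.ndarray, y_train=np.ndarray, X_test=np.ndarray, y_test=np.ndarray
):
    """Download the MNIST data and return it as four arrays."""
    (X_train, y_train), (X_test, y_test) = tf.keras.datasets.mnist.load_data()
    return X_train, y_train, X_test, y_test
```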
There are some things to note:

-* As this step has multiple outputs, we need to use the `zenml.steps.step_output.Output` class to indicate the names of each output. If there was only one, we would not need to do this.
-* We could have returned the `tf.keras.datasets.mnist` directly but we wanted to persist the actual data (for caching purposes), rather than the dataset object.
+- As this step has multiple outputs, we need to use the `zenml.steps.step_output.Output` class to indicate the names of each output. If there was only one, we would not need to do this.
+- We could have returned the `tf.keras.datasets.mnist` directly but we wanted to persist the actual data (for caching purposes), rather than the dataset object.

Now we can go ahead and create a pipeline with one step to make sure this step works:

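The pipeline definition itself is collapsed here; a minimal sketch, assuming the `@pipeline` decorator lives in `zenml.pipelines` in this version:

```python
from zenml.pipelines import pipeline


@pipeline
def load_mnist_pipeline(importer):
    """A pipeline with a single importer step."""
    importer()
```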
@@ -51,11 +51,13 @@ load_mnist_pipeline(importer=importer_mnist()).run()

## Run

You can run this as follows:

```bash
python chapter_1.py
```

The output will look as follows (note: this is filtered to highlight the most important logs):

@@ -66,7 +68,7 @@

```bash
Step `importer_mnist` has started.
Step `importer_mnist` has finished in 1.726s.
```

## Inspect

You can add the following code to fetch the pipeline:

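That code is collapsed in this view. A sketch of the idea, assuming the ZenML 0.5.x `Repository` API; the exact method names may differ from the chapter's actual code:

```python
from zenml.core.repo import Repository

# Fetch the pipeline we just ran from the local repository.
repo = Repository()
pipeline = repo.get_pipeline(pipeline_name="load_mnist_pipeline")

# Grab the latest run and read each output artifact back into memory.
run = pipeline.runs[-1]
step = run.get_step(name="importer")
for name, output in step.outputs.items():
    arr = output.read()
    print(f"Output '{name}' is an array with shape: {arr.shape}")
```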
@@ -98,4 +100,4 @@

```bash
Output 'y_train' is an array with shape: (60000,)
Output 'X_train' is an array with shape: (60000, 28, 28)
```

So now we have successfully confirmed that the data is loaded with the right shape and we can fetch it again from the artifact store.
10 changes: 5 additions & 5 deletions docs/book/guides/low-level-api/chapter-2.md
@@ -4,11 +4,10 @@ description: Add some normalization

If you want to see the code for this chapter of the guide, head over to the [GitHub](https://github.com/zenml-io/zenml/tree/main/examples/low_level_guide/chapter_2.py).

-# Chapter 2: Normalize the data.
+# Normalize the data.

Now before writing any trainers, we can normalize our data to make sure we get better results. To do this, let's add another step and make the pipeline a bit more complex.


## Create steps

We can think of this as a `normalizer` step that takes data from the importer and normalizes it:
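The step body is collapsed in the diff; a sketch of what such a step can look like. The output names `X_train_normed`/`X_test_normed` come from this chapter's inspect output, while the exact normalization used in chapter_2.py is an assumption:

```python
import numpy as np

from zenml.steps import step
from zenml.steps.step_output import Output


@step
def normalizer(
    X_train: np.ndarray, X_test: np.ndarray
) -> Output(X_train_normed=np.ndarray, X_test_normed=np.ndarray):
    """Scale the image pixel values into the [0, 1] range."""
    return X_train / 255.0, X_test / 255.0
```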
@@ -38,13 +37,14 @@ def load_and_normalize_pipeline(
    normalizer(X_train=X_train, X_test=X_test)
```


## Run

You can run this as follows:

```bash
python chapter_2.py
```

The output will look as follows (note: this is filtered to highlight the most important logs):

@@ -57,7 +57,7 @@

```bash
Step `normalize_mnist` has started.
Step `normalize_mnist` has finished in 1.848s.
```

## Inspect

You can add the following code to fetch the pipeline:

@@ -87,4 +87,4 @@

```bash
Output 'X_train_normed' is an array with shape: (60000, 28, 28)
Output 'X_test_normed' is an array with shape: (10000, 28, 28)
```

Which confirms again that the data is stored properly! Now we are ready to create some trainers...
19 changes: 11 additions & 8 deletions docs/book/guides/low-level-api/chapter-3.md
@@ -4,9 +4,10 @@ description: Train some models.

If you want to see the code for this chapter of the guide, head over to the [GitHub](https://github.com/zenml-io/zenml/tree/main/examples/low_level_guide/chapter_3.py).

-# Chapter 3: Train and evaluate the model.
+# Train and evaluate the model.

Finally we can train and evaluate our model.

## Create steps

For this we decide to add two steps, a `trainer` and an `evaluator` step. We also keep using TensorFlow to help with these.
@@ -26,10 +27,10 @@ class TrainerConfig(BaseStepConfig):

class TrainerConfig(BaseStepConfig):
    epochs: int = 1
    gamma: float = 0.7
    lr: float = 0.001

@step
def tf_trainer(
    config: TrainerConfig,  # not an artifact; passed in at run-time
    X_train: np.ndarray,
    y_train: np.ndarray,
) -> tf.keras.Model:

@@ -61,10 +62,11 @@ def tf_trainer(

A few things of note:

-* This is our first instance of `parameterizing` a step with a `BaseStepConfig`. This allows us to specify some parameters at run-time rather than via data artifacts between steps.
-* This time the trainer returns a `tf.keras.Model`, which ZenML takes care of storing in the artifact store. We will talk about how to 'take over' this storing via `Materializers` in a later chapter.
+- This is our first instance of `parameterizing` a step with a `BaseStepConfig`. This allows us to specify some parameters at run-time rather than via data artifacts between steps.
+- This time the trainer returns a `tf.keras.Model`, which ZenML takes care of storing in the artifact store. We will talk about how to 'take over' this storing via `Materializers` in a later chapter.

### Evaluator

We also add a simple evaluator:

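The evaluator's body is collapsed below; a sketch under the assumption that it computes test accuracy with Keras' `evaluate` (the step name `tf_evaluator` comes from the run logs later in this chapter):

```python
import numpy as np
import tensorflow as tf

from zenml.steps import step


@step
def tf_evaluator(
    X_test: np.ndarray,
    y_test: np.ndarray,
    model: tf.keras.Model,
) -> float:
    """Calculate the accuracy of the model on the test set."""
    # Assumes the model was compiled with accuracy as its single metric.
    _, test_acc = model.evaluate(X_test, y_test, verbose=2)
    return test_acc
```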
@@ -116,6 +118,7 @@ mnist_pipeline(
Beautiful, now the pipeline is truly doing something. Let's run it!

## Run

You can run this as follows:

```bash
python chapter_3.py
```

The output will look as follows (note: this is filtered to highlight the most important logs):

@@ -138,7 +141,7 @@

```bash
Step `tf_evaluator` has started.
`tf_evaluator` has finished in 0.742s.
```

## Inspect

If you add the following code to fetch the pipeline:

@@ -165,4 +168,4 @@

```bash
The first run has 4 steps.
The `tf_evaluator` step returned an accuracy: 0.9100000262260437
```

Wow, we just trained our first model! But we have not stopped yet. What if we did not want to use TensorFlow? Let's swap out our trainers and evaluators for different libraries.
10 changes: 6 additions & 4 deletions docs/book/guides/low-level-api/chapter-4.md
@@ -4,7 +4,7 @@ description: Leverage caching.

If you want to see the code for this chapter of the guide, head over to the [GitHub](https://github.com/zenml-io/zenml/tree/main/examples/low_level_guide/chapter_4.py).

-# Chapter 4: Swap out implementations of individual steps and see caching in action
+# Swap out implementations of individual steps and see caching in action

What if we don't want to use TensorFlow but rather a [scikit-learn](https://scikit-learn.org/) model? This is easy to do.

@@ -35,6 +35,7 @@ def sklearn_trainer(
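The trainer body is collapsed here; a sketch of the shape, where the specific classifier is an assumption (any sklearn `ClassifierMixin` would work):

```python
import numpy as np
from sklearn.base import ClassifierMixin
from sklearn.linear_model import LogisticRegression

from zenml.steps import step


@step
def sklearn_trainer(
    X_train: np.ndarray,
    y_train: np.ndarray,
) -> ClassifierMixin:
    """Train a simple sklearn classifier on flattened MNIST images."""
    clf = LogisticRegression(max_iter=1000)
    clf.fit(X_train.reshape((X_train.shape[0], -1)), y_train)
    return clf
```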
A simple enough step using a sklearn `ClassifierMixin` model. ZenML also knows how to store all primitive sklearn model types.

### Evaluator

We also add a simple evaluator:

@@ -65,6 +66,7 @@ mnist_pipeline(

## Run

You can run this as follows:

```bash
python chapter_4.py
```

The output will look as follows (note: this is filtered to highlight the most important logs):

@@ -89,7 +91,7 @@

```bash
Step `sklearn_evaluator` has finished in 0.191s.
```

Note that the `importer` and `mnist` steps are now **100x** faster. This is because we have not changed the pipeline at all, and just made another run with different functions. So ZenML caches these steps and skips straight to the new trainer and evaluator.

## Inspect

If you add the following code to fetch the pipeline:

@@ -115,6 +117,6 @@

```bash
For tf_evaluator, the accuracy is: 0.91
For sklearn_evaluator, the accuracy is: 0.92
```

Looks like sklearn narrowly beat TensorFlow in this one. If we want, we can keep extending this and add a PyTorch example (as we have done in the `not_so_quickstart` [example](https://github.com/zenml-io/zenml/tree/main/examples/not_so_quickstart)).

Combining different complex steps with standard pipeline interfaces is a powerful tool in any MLOps setup. You can now organize, track, and manage your codebase as it grows with your use-cases.
10 changes: 6 additions & 4 deletions docs/book/guides/low-level-api/chapter-5.md
@@ -4,12 +4,13 @@ description: Materialize artifacts as you want.

If you want to see the code for this chapter of the guide, head over to the [GitHub](https://github.com/zenml-io/zenml/tree/main/examples/low_level_guide/chapter_5.py).

-# Chapter 5: Materialize artifacts the way you want to consume them.
+# Materialize artifacts the way you want to consume them.

At this point, the precise way that data passes between the steps has been a bit of a mystery to us. There is, of course, a mechanism to serialize and deserialize the data flowing between steps. If we require further control, we can take over this mechanism ourselves.

## Create custom materializer

Data that flows through steps is stored in `Artifact Stores`. The logic that governs the reading and writing of data to and from the `Artifact Stores` lives in the `Materializers`.

Suppose we wanted to write the output of our `evaluator` step and store it in a SQLite table in the Artifact Store, rather than whatever the default mechanism is to store the float. Well, that should be easy. Let's create a custom materializer:

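The materializer code is collapsed in this view. A minimal sketch of the idea, assuming the ZenML 0.5.x materializer interface (`handle_input`/`handle_return` and `self.artifact.uri`); the names here are reconstructions, not necessarily the PR's exact code:

```python
import os
import sqlite3

from zenml.materializers.base_materializer import BaseMaterializer


class SQLiteMaterializer(BaseMaterializer):
    """Stores and retrieves a float via a SQLite table in the artifact store."""

    ASSOCIATED_TYPES = [float]

    def _db_path(self) -> str:
        return os.path.join(self.artifact.uri, "data.db")

    def handle_return(self, data: float) -> None:
        """Write the float into a SQLite table inside the artifact store."""
        with sqlite3.connect(self._db_path()) as conn:
            conn.execute("CREATE TABLE IF NOT EXISTS results (value REAL)")
            conn.execute("INSERT INTO results VALUES (?)", (data,))

    def handle_input(self, data_type) -> float:
        """Read the float back from the SQLite table."""
        with sqlite3.connect(self._db_path()) as conn:
            return conn.execute("SELECT value FROM results").fetchone()[0]
```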
Expand Down Expand Up @@ -96,13 +97,14 @@ scikit_p = mnist_pipeline(
```

## Run

You can run this as follows:

```bash
python chapter_5.py
```

## Inspect

We can also now read data from the SQLite table with our custom materializer:

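That code is collapsed here; a sketch, assuming a 0.5.x-style inspection flow where an output artifact can be read back through an explicit materializer (the `read()` signature and the `SQLiteMaterializer` name from the sketch above are assumptions):

```python
from zenml.core.repo import Repository

repo = Repository()
pipeline = repo.get_pipeline(pipeline_name="mnist_pipeline")
print(f"Pipeline `mnist_pipeline` has {len(pipeline.runs)} run(s)")

# Read the evaluator's float back through the custom SQLite materializer.
evaluator_step = pipeline.runs[-1].get_step(name="evaluator")
value = evaluator_step.output.read(float, SQLiteMaterializer)
print(f"The evaluator stored the value: {value} in a SQLite database!")
```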
Expand All @@ -120,4 +122,4 @@ Which returns:
```bash
Pipeline `mnist_pipeline` has 1 run(s)
The evaluator stored the value: 0.9238 in a SQLite database!
```
13 changes: 7 additions & 6 deletions docs/book/guides/low-level-api/chapter-6.md
@@ -4,15 +4,15 @@ description: Reading from a continuously changing datasource

If you want to see the code for this chapter of the guide, head over to the [GitHub](https://github.com/zenml-io/zenml/tree/main/examples/low_level_guide/chapter_6.py).

-# Chapter 6: Import data from a dynamic data source
+# Import data from a dynamic data source

Until now, we've been reading from a static data importer step because we are at the experimentation phase of the ML workflow. Now, as we head towards production, we want to switch over to a non-static, dynamic data importer step.

This could be anything like:

-* A database/data warehouse that updates regularly (SQL databases, BigQuery, Snowflake)
-* A data lake (S3 Buckets/Azure Blob Storage/GCP Storage)
-* An API which allows you to query the latest data.
+- A database/data warehouse that updates regularly (SQL databases, BigQuery, Snowflake)
+- A data lake (S3 Buckets/Azure Blob Storage/GCP Storage)
+- An API which allows you to query the latest data.

## Read from a dynamic datasource

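The importer code is collapsed in this view. A deliberately hypothetical sketch of what a dynamic importer step can look like; the endpoint, the payload format, and the `enable_cache` flag below are illustrative assumptions:

```python
import numpy as np
import requests

from zenml.steps import step
from zenml.steps.step_output import Output

# Hypothetical endpoint, for illustration only.
DATA_URL = "https://example.com/api/latest-mnist"


# enable_cache=False is an assumption: a dynamic source should re-run
# on every pipeline run instead of being served from the cache.
@step(enable_cache=False)
def dynamic_importer() -> Output(X_train=np.ndarray, y_train=np.ndarray):
    """Fetch the latest snapshot of the dataset from an external API."""
    payload = requests.get(DATA_URL).json()
    X_train = np.array(payload["images"], dtype=np.float32)
    y_train = np.array(payload["labels"], dtype=np.int64)
    return X_train, y_train
```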
@@ -71,13 +71,14 @@ scikit_p = mnist_pipeline(

## Run

You can run this as follows:

```bash
python chapter_6.py
```

## Inspect

Even if our data originally lives in an external API, we have now downloaded and versioned it locally as we ran this pipeline. So we can fetch it and inspect it:

@@ -103,4 +104,4 @@ Now we are loading data dynamically from a continuously changing data source!

{% hint style="info" %}
In the near future, ZenML will help you automatically detect drift and schema changes across pipeline runs, to make your pipelines even more robust! Keep an eye out on this space and future releases!
{% endhint %}