Airflow Setup
- Go to AI Platform and start up your notebook instance.
- When the instance comes up, click on the OPEN JUPYTERLAB link.
- Open a terminal window by going to File -> New -> Terminal.
- At the prompt, enter the following commands in the specified sequence:
$ source venv/bin/activate
$ export AIRFLOW_HOME=/home/jupyter/airflow
$ mkdir -p /home/jupyter/airflow/dags
$ pip install 'apache-airflow[gcp_api]==1.10.10'
$ airflow initdb
$ airflow version
The last command should report Airflow v1.10.10.
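The version check can also be scripted. The sketch below is only an illustration: it assumes that the output of airflow version contains the version as a literal string such as v1.10.10, and the helper name has_airflow_version is hypothetical.

```shell
# Hypothetical helper: succeeds if the text on stdin mentions the expected version.
# Assumes the version appears as a literal substring, e.g. "v1.10.10".
has_airflow_version() {
  expected="$1"
  # -F treats the version as a fixed string, so the dots are not regex wildcards.
  grep -qF "v${expected}"
}

# Only call the real CLI when it is installed.
if command -v airflow >/dev/null 2>&1; then
  if airflow version 2>/dev/null | has_airflow_version "1.10.10"; then
    echo "Airflow 1.10.10 detected"
  else
    echo "Unexpected Airflow version" >&2
  fi
fi
```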
- Activate your virtual environment:
$ source venv/bin/activate
- List current DAGs:
$ airflow list_dags
- List tasks for a DAG:
$ airflow list_tasks <dag_name> --tree
- Test a specific task of a DAG:
$ airflow test <dag_name> <task_name> <yyyy-mm-dd>
- For example:
$ airflow test tutorial print_date 2020-04-27
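A malformed date is an easy mistake when testing a task. The following sketch wraps the airflow test command shown above and rejects anything that is not shaped like yyyy-mm-dd before calling Airflow. The function names is_iso_date and test_task are hypothetical, and the check covers only the shape of the string, not whether the calendar date exists.

```shell
# Hypothetical helper: succeeds only for strings shaped like yyyy-mm-dd.
# It checks the shape, not whether the calendar date exists.
is_iso_date() {
  case "$1" in
    [0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9]) return 0 ;;
    *) return 1 ;;
  esac
}

# Hypothetical wrapper around `airflow test` that rejects malformed dates early.
test_task() {
  dag="$1"; task="$2"; day="$3"
  if ! is_iso_date "$day"; then
    echo "expected yyyy-mm-dd, got: $day" >&2
    return 2
  fi
  airflow test "$dag" "$task" "$day"
}
```

With the virtual environment active, test_task tutorial print_date 2020-04-27 would run the same command as the example above.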
If you were able to run airflow list_dags successfully, you're done with the basic setup.
There are two ways to run a full DAG: 1) from the command line and 2) from the UI.
To run a DAG from the command line:
- Activate your virtual environment:
$ source venv/bin/activate
- Run the DAG between a start date and an end date:
$ airflow backfill <dag_name> -s <yyyy-mm-dd> -e <yyyy-mm-dd>
- Clear the DAG runs between a start date and an end date:
$ airflow clear <dag_name> -s <yyyy-mm-dd> -e <yyyy-mm-dd>
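Both commands take a start date and an end date, so it can help to check that the range is not reversed before running them. This is a minimal sketch, not part of Airflow: date_le and backfill_range are hypothetical names, and the comparison relies on the dates being in yyyy-mm-dd form.

```shell
# Hypothetical helper: ISO dates (yyyy-mm-dd) sort correctly once the dashes
# are removed, so a numeric comparison is enough to order them.
date_le() {
  a=$(printf '%s' "$1" | tr -d '-')
  b=$(printf '%s' "$2" | tr -d '-')
  [ "$a" -le "$b" ]
}

# Hypothetical wrapper: refuse to backfill when the range is reversed.
backfill_range() {
  dag="$1"; start="$2"; end="$3"
  if ! date_le "$start" "$end"; then
    echo "start date $start is after end date $end" >&2
    return 2
  fi
  airflow backfill "$dag" -s "$start" -e "$end"
}
```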
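To run a DAG from the UI:
- Activate your virtual environment and start the webserver in the background:
$ source venv/bin/activate
$ airflow webserver --port=8083 &
- From a second terminal window, start the scheduler:
$ source venv/bin/activate
$ airflow scheduler &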
The scheduler output will contain the message "ERROR - Cannot use more than 1 thread when using sqlite. Setting parallelism to 1". This message is expected and can be safely ignored.
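To confirm that the sqlite message is the only error, the scheduler output can be filtered. The sketch below assumes the output was redirected to a file and that error lines contain the word ERROR. The file name scheduler.log and the helper unexpected_errors are hypothetical.

```shell
# Hypothetical helper: print ERROR lines from stdin, except the expected
# sqlite parallelism message.
unexpected_errors() {
  grep 'ERROR' | grep -vF 'Cannot use more than 1 thread when using sqlite'
}

# Example: scan a log file if one exists. scheduler.log is an assumed name,
# e.g. produced by: airflow scheduler > scheduler.log 2>&1 &
if [ -f scheduler.log ]; then
  unexpected_errors < scheduler.log || echo "no unexpected errors"
fi
```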
- Install the Cloud SDK for your OS. Use the interactive installer from link
- Open a terminal window on your laptop, run the following command, and go through the prompts:
$ gcloud init
- Once your gcloud setup is complete, run:
$ export PROJECT_ID=<YOUR PROJECT>
$ export ZONE=us-west1-b
$ export INSTANCE_NAME=<YOUR JUPYTER NOTEBOOK INSTANCE>
$ gcloud compute ssh --project "$PROJECT_ID" --zone "$ZONE" "$INSTANCE_NAME" -- -L 8083:localhost:8083
- Open a browser and browse to http://localhost:8083/admin/
- From the DAG view of the UI, unpause the DAG you want to run by toggling the On/Off button on the left.
- Click the Trigger DAG button (small arrow icon) in the Links section.
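The SSH tunnel step above fails in confusing ways when one of the three variables is empty. As a rough sketch, the check and the gcloud compute ssh command can be wrapped together. The function names missing_tunnel_vars and open_tunnel are hypothetical, and the default port 8083 simply mirrors the webserver port used in this guide.

```shell
# Hypothetical helper: print the names of required variables that are unset
# or empty, and succeed only when none are missing.
missing_tunnel_vars() {
  missing=""
  [ -n "${PROJECT_ID:-}" ] || missing="$missing PROJECT_ID"
  [ -n "${ZONE:-}" ] || missing="$missing ZONE"
  [ -n "${INSTANCE_NAME:-}" ] || missing="$missing INSTANCE_NAME"
  # Print the names (if any) without the leading space.
  printf '%s\n' "${missing# }"
  [ -z "$missing" ]
}

# Hypothetical wrapper: open the tunnel only when everything is set.
open_tunnel() {
  port="${1:-8083}"
  if ! missing_tunnel_vars >/dev/null; then
    echo "set PROJECT_ID, ZONE and INSTANCE_NAME first" >&2
    return 2
  fi
  gcloud compute ssh --project "$PROJECT_ID" --zone "$ZONE" "$INSTANCE_NAME" \
    -- -L "${port}:localhost:${port}"
}
```

Running missing_tunnel_vars on its own prints whichever variable names still need to be exported.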