Airflow Setup

Shirley Cohen edited this page Apr 28, 2020 · 5 revisions

Basic Setup

  • Go to AI Platform and start up your notebook instance.
  • When the instance comes up, click on the OPEN JUPYTERLAB link.
  • Open a terminal window by going to File -> New -> Terminal.
  • At the prompt, enter the following commands in the specified sequence:

$ source venv/bin/activate
$ export AIRFLOW_HOME=/home/jupyter/airflow
$ mkdir -p /home/jupyter/airflow/dags
$ pip install 'apache-airflow[gcp_api]==1.10.10'
$ airflow initdb
$ airflow version

You should get Airflow v1.10.10 from the last command.
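
The later commands on this page (initdb, list_dags, backfill) belong to the Airflow 1.x CLI, so it can help to check the reported version in a script. The following is a minimal sketch, not part of the original setup: the function name version_matches and the variable EXPECTED_AIRFLOW_VERSION are made up for illustration, and the check is a plain substring match on whatever $ airflow version prints.

```shell
# Version this page was written against (taken from the text above).
EXPECTED_AIRFLOW_VERSION="1.10.10"

# version_matches <expected> <raw output of `airflow version`>
# Hypothetical helper: succeeds if the expected version string appears
# anywhere in the output. This is a substring match, not a strict parse.
version_matches() {
  case "$2" in
    *"$1"*) return 0 ;;
    *) return 1 ;;
  esac
}
```

For example: $ version_matches "$EXPECTED_AIRFLOW_VERSION" "$(airflow version)" || echo "unexpected Airflow version"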

Test Setup

  • Activate your virtual environment: $ source venv/bin/activate
  • List current DAGs: $ airflow list_dags
  • List tasks for a DAG: $ airflow list_tasks <dag_name> --tree
  • Test a specific task of a DAG: $ airflow test <dag_name> <task_name> <yyyy-mm-dd>
  • For example: $ airflow test tutorial print_date 2020-04-27

If you were able to successfully run $ airflow list_dags, you're done with the basic setup.
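
The test command above takes its date in yyyy-mm-dd form. A small guard like the one below can catch a malformed date before Airflow is invoked. It is a sketch, not part of Airflow: is_iso_date is a hypothetical helper, and it checks only the shape of the string and the month and day ranges, not the number of days in a given month.

```shell
# is_iso_date <string>
# Hypothetical helper: succeeds if the argument looks like yyyy-mm-dd
# with month 01-12 and day 01-31. It does not know month lengths.
is_iso_date() {
  case "$1" in
    [0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9]) ;;
    *) return 1 ;;
  esac
  # Extract the month and day, dropping one leading zero so the
  # values are compared as plain decimal numbers.
  _month=${1#*-}
  _month=${_month%-*}
  _month=${_month#0}
  _day=${1##*-}
  _day=${_day#0}
  [ "$_month" -ge 1 ] && [ "$_month" -le 12 ] &&
    [ "$_day" -ge 1 ] && [ "$_day" -le 31 ]
}
```

For example: $ is_iso_date 2020-04-27 && airflow test tutorial print_date 2020-04-27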

How to run a DAG

There are two ways to run a full DAG: 1) from the command line and 2) from the UI.

From the command line:

On your Jupyter notebook instance:

  • Activate your virtual environment: $ source venv/bin/activate
  • Run a DAG between a start date and an end date: $ airflow backfill <dag_name> -s <yyyy-mm-dd> -e <yyyy-mm-dd>
  • Clear DAG runs between a start date and an end date: $ airflow clear <dag_name> -s <yyyy-mm-dd> -e <yyyy-mm-dd>
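
If you often rerun the same date range, the clear and backfill commands can be wrapped in a small shell function. This is a sketch under assumptions: rerun_range and AIRFLOW_BIN are hypothetical names, and it assumes that clearing a range before backfilling it is the behavior you want. Note that airflow clear may still ask for confirmation before it deletes task instances.

```shell
# Command used to invoke Airflow. Hypothetical knob: set AIRFLOW_BIN=echo
# to print the commands instead of running them.
AIRFLOW_BIN="${AIRFLOW_BIN:-airflow}"

# rerun_range <dag_name> <start yyyy-mm-dd> <end yyyy-mm-dd>
# Clears the DAG runs in the range, then backfills the same range.
# The backfill is skipped if the clear step fails.
rerun_range() {
  if [ "$#" -ne 3 ]; then
    echo "usage: rerun_range <dag_name> <start yyyy-mm-dd> <end yyyy-mm-dd>" >&2
    return 2
  fi
  "$AIRFLOW_BIN" clear "$1" -s "$2" -e "$3" &&
    "$AIRFLOW_BIN" backfill "$1" -s "$2" -e "$3"
}
```

For example: $ rerun_range tutorial 2020-04-01 2020-04-03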

From the UI:

On your Jupyter notebook instance:

$ source venv/bin/activate
$ airflow webserver --port=8083 &

From a second terminal window:

$ source venv/bin/activate
$ airflow scheduler &

The scheduler output will contain the message "ERROR - Cannot use more than 1 thread when using sqlite. Setting parallelism to 1." This message is expected with the default SQLite database and can be safely ignored.

On your laptop:
  • Install the Cloud SDK for your OS using its interactive installer.
  • Open a terminal window on your laptop, run $ gcloud init, and go through the prompts.
  • Once your gcloud setup is complete, open an SSH tunnel that forwards port 8083 on your laptop to port 8083 on the instance:
    export PROJECT_ID=<YOUR PROJECT>
    export ZONE=us-west1-b
    export INSTANCE_NAME=<YOUR JUPYTER NOTEBOOK INSTANCE>
    gcloud compute ssh --project "$PROJECT_ID" --zone "$ZONE" "$INSTANCE_NAME" -- -L 8083:localhost:8083
  • Open a browser and browse to http://localhost:8083/admin/
  • From the DAG view of the UI, unpause the DAG you want to run by toggling the left-hand On/Off button
  • Click the Trigger DAG button (small arrow icon) from the Links section
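
The last two UI steps also have command-line counterparts in Airflow 1.10: airflow unpause and airflow trigger_dag. The sketch below chains them on the notebook instance. The function name unpause_and_trigger and the AIRFLOW_BIN variable are hypothetical, and a running scheduler is still assumed for the triggered run to execute.

```shell
# Command used to invoke Airflow. Hypothetical knob: set AIRFLOW_BIN=echo
# to print the commands instead of running them.
AIRFLOW_BIN="${AIRFLOW_BIN:-airflow}"

# unpause_and_trigger <dag_name>
# Unpauses the DAG, then requests one run of it.
# The trigger is skipped if the unpause step fails.
unpause_and_trigger() {
  if [ "$#" -ne 1 ]; then
    echo "usage: unpause_and_trigger <dag_name>" >&2
    return 2
  fi
  "$AIRFLOW_BIN" unpause "$1" &&
    "$AIRFLOW_BIN" trigger_dag "$1"
}
```

For example: $ unpause_and_trigger tutorial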