New Dags are only imported when the webserver restarts! #46

Closed
r39132 opened this issue Jun 19, 2015 · 9 comments

Comments

@r39132
Contributor

r39132 commented Jun 19, 2015

How does one add new DAGs to the system without restarting the webserver? Running "python my_dag.py" is not importing the DAG into the db either. I am running on EC2 with a PostgreSQL database. I can see that the DAGs only get imported when the web app restarts.

Here is the code:

"""
This code will become the EP Data Pipeline.

The flow may eventually look like : 
* 1. SQS Message Written Detect
* 2.a. Email Send Flow Started 
* 2.b. Database Row Written Detect
* 3. SQS Queue Empty Detect
* 4. Email Send Flow Complete 
"""
from airflow import DAG
from airflow.operators import EmailOperator
from datetime import datetime


print ' got here 1'
default_args = {
    'owner': 'airflow',
    'depends_on_past': False,
    'start_date': datetime(2015, 1, 1),
    'email': ['sanand@agari.com'],
    'email_on_failure': True,
    'email_on_retry': True,
}

print ' got here 1'
dag = DAG('ep_demo_1', default_args=default_args)

print ' got here 1'
t1 = EmailOperator(
    task_id='email_pipeline_start',
    to='sanand@agari.com',
    subject='EP Demo Pipeline Started (TAD)',
    html_content='',
    dag=dag)

After restarting the web app, I still don't see any entries in the dag table in the SQLite db for my DAGs. I see the DAGs in the UI -- there seems to be a difference between loading and importing a DAG.

2015-06-18 17:49:21,175 - root - INFO - Loaded DAG <DAG: ep_demo_1>
2015-06-18 17:49:21,177 - root - INFO - Importing /usr/local/lib/python2.7/site-packages/airflow/example_dags/example_bash_operator.py
2015-06-18 17:49:21,178 - root - INFO - Loaded DAG <DAG: example_bash_operator>
2015-06-18 17:49:21,179 - root - INFO - Importing /usr/local/lib/python2.7/site-packages/airflow/example_dags/example_python_operator.py
2015-06-18 17:49:21,180 - root - INFO - Loaded DAG <DAG: example_python_operator>
2015-06-18 17:49:21,181 - root - INFO - Importing /usr/local/lib/python2.7/site-packages/airflow/example_dags/tutorial.py
2015-06-18 17:49:21,182 - root - INFO - Loaded DAG <DAG: tutorial>
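(For anyone debugging a similar setup: a quick way to see what Airflow can actually parse, independent of the webserver, is to build a DagBag by hand. The snippet below is a minimal sketch assuming the standard airflow.models.DagBag API; it is not part of the original report.)

# Build a fresh DagBag against the configured dags folder and report what it found.
from airflow.models import DagBag

dagbag = DagBag()            # parses every .py file in the dags folder
print(list(dagbag.dags))     # DAG ids that were discovered
print(dagbag.import_errors)  # files that failed to import, with their errors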
@r39132 r39132 changed the title New Dags are only loaded when the webserver restarts! New Dags are only imported when the webserver restarts! Jun 19, 2015
@martingrayson

I've also noticed this - it doesn't feel right to keep shutting the webserver down and starting it back up during development. Is there something we've missed?

@mistercrunch
Member

If you have a scheduler running in the background, it will discover new DAGs and add them to the list (a central list of DAGs is maintained in the database). You can refresh an individual DAG by clicking the refresh icon on the DAGs page. So in a production setup new DAGs do get discovered properly, and it works with as many servers as you have running (each one of our workers runs the web server as well).

We used to have an endpoint that would tell the web server to refresh its DagBag, and do it in the scope of a web request, but we moved away from that as we started running multi-threaded / multiple web servers, since hitting the endpoint would only refresh one thread at random. It's a much more complicated problem when you have lots of DagBags to keep in sync. A temporary solution is to expire the DagBag and force it to refresh periodically on the web server. I could also add a hidden endpoint that you'd hit when developing to refresh the DagBag without restarting the web server.

The longer-term solution is to have 100% stateless web servers that load DAGs from pickles (serialized DAGs) stored in the database. The reason this is not 100% working at this time is that Jinja templates aren't serializable, and I haven't found a hack to serialize them yet.
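(The "expire the DagBag and refresh it periodically" workaround mentioned above could look roughly like the sketch below. It assumes the airflow.models.DagBag API and is purely an illustration, not the project's actual webserver code.)

# Periodically re-scan the dags folder so new files show up without a restart.
import threading
from airflow.models import DagBag

dagbag = DagBag()  # parsed once at webserver startup

def refresh_dagbag(interval_seconds=300):
    # Re-collect DAG files from the dags folder, then schedule the next refresh.
    dagbag.collect_dags()
    threading.Timer(interval_seconds, refresh_dagbag, args=(interval_seconds,)).start()

refresh_dagbag()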

@r39132
Contributor Author

r39132 commented Jun 19, 2015

Since we are trying this out for the first time and essentially trying to convince ourselves and our teammates that this is a cool project, we would be running in "developer" or "test" mode before setting it up for production. If developers don't like it here, it won't make it to production. Hence, it would be nice if we did not need to set up all of the bells and whistles just to take the UI features for a spin.

I alluded to this in Issue 51

Why not just let people drop their DAGs into the dag directory on the file system? Then run a background thread that checks the dag directory (pointed to in airflow.cfg), compares it to the DB, and updates the DB? Then have each worker poll the DB and fetch new DAGs from it.
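(A rough sketch of that polling idea, purely illustrative: DAGS_FOLDER and sync_new_dag_file here are hypothetical stand-ins, not Airflow APIs.)

# Background thread that diffs the dags folder against files already seen.
import glob
import os
import threading

DAGS_FOLDER = os.path.expanduser('~/airflow/dags')  # assumed; normally read from airflow.cfg
known_files = set()

def sync_new_dag_file(path):
    # Hypothetical placeholder for parsing the file and upserting its metadata into the DB.
    print('would sync %s to the DB' % path)

def poll_dags_folder(interval_seconds=60):
    current = set(glob.glob(os.path.join(DAGS_FOLDER, '*.py')))
    for path in sorted(current - known_files):
        sync_new_dag_file(path)
    known_files.update(current)
    threading.Timer(interval_seconds, poll_dags_folder, args=(interval_seconds,)).start()

poll_dags_folder()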

@artwr
Contributor

artwr commented Jun 19, 2015

Hey @r39132,
If you want to quickly iterate on DAGs, you can start the webserver with the "-d" flag. The webserver will start in debug mode and will restart when files change in either the app or the dags folder. This might help take some of the UI features for a spin without too much overhead.
This is what we use to test our DAGs before we integrate them into our production environment. I hope this helps.

@r39132
Contributor Author

r39132 commented Jun 20, 2015

Thanks @artwr, that works for us. BTW, if I run the CeleryExecutor, do I need to start another airflow worker? I essentially only want the CeleryExecutor so that I can use the UI features and have people be "blown away" by the product. But I don't want to have to stand up more than one worker right now if I don't need to. I will have monit restart the web app.

@mistercrunch
Member

One worker is all you need (until you need more slots). If you don't have any workers up, messages are just going to build up in the queue.

@r39132 r39132 closed this as completed Jun 22, 2015
@r39132
Contributor Author

r39132 commented Jun 22, 2015

So, why are my DAGs not being imported into the database? I see them in the UI, but not in the DB.

I'm hitting the refresh button and running in debug mode (airflow webserver -d -p 8080). I'm using the LocalExecutor and am at a loss. Also, I'm noticing that new files are not automatically showing up in the UI.

@r39132 r39132 reopened this Jun 22, 2015
@mistercrunch
Member

Are you not running the scheduler?

@r39132
Contributor Author

r39132 commented Jun 23, 2015

Ok, I see the confusion. I've been reading "scheduler" as related to the "celery scheduler". That explains a lot. Thanks. We are having some issues with Celery, so running the LocalExecutor at a small scale should be fine for a while.

@r39132 r39132 closed this as completed Jun 23, 2015
rajatsri28 pushed a commit to rajatsri28/airflow that referenced this issue May 13, 2020
…onfig section from env var (apache#46)


Co-authored-by: Vishesh Jain <visheshj@twitter.com>
mobuchowski pushed a commit to mobuchowski/airflow that referenced this issue Jan 4, 2022
* Upload code coverage to codecov

Signed-off-by: wslulciuc <willy@datakin.com>

* Fix code coverage reporting

Signed-off-by: wslulciuc <willy@datakin.com>

* continued: Fix code coverage reporting

Signed-off-by: wslulciuc <willy@datakin.com>
mobuchowski added a commit to mobuchowski/airflow that referenced this issue Jan 4, 2022
* Standarize custom facet naming

Signed-off-by: Maciej Obuchowski <maciej.obuchowski@getindata.com>
potiuk pushed a commit that referenced this issue Jun 22, 2024
Added support of teradata authorization object for cloud transfer operators to teradata. (#46)

1. Added teradata authorization object for authorization in transfer operators
2. Added security token support in s3toteradata transfer operator
romsharon98 pushed a commit to romsharon98/airflow that referenced this issue Jul 26, 2024
Added support of teradata authorization object for cloud transfer operators to teradata. (apache#46)

1. Added teradata authorization object for authorization in transfer operators
2. Added security token support in s3toteradata transfer operator