New Dags are only imported when the webserver restarts! #46
I've also noticed this - it doesn't feel right to keep shutting the webserver down and restarting it during development. Is there something we've missed?
If you have a scheduler running in the background, it will discover new DAGs, which get added to the list (a central list of DAGs is maintained in the database). You can refresh an individual DAG by clicking the refresh icon on the DAGs page. So in a production setup, new DAGs do get discovered properly, and it works with as many servers as you have running (each one of our workers runs the web server as well).

We used to have an endpoint that would tell the web server to refresh its DagBag, and do it in the scope of a web request, but we moved away from that as we started running multi-threaded / multiple web servers: hitting the endpoint would only refresh one thread, chosen at random. It's a much more complicated problem when you have lots of DagBags to keep in sync.

A temporary solution is to expire the DagBag and force it to refresh periodically on the web server. I could also add a hidden endpoint that you'd hit when developing to refresh the DagBag without restarting the web server.

The longer-term solution is to have 100% stateless web servers that load DAGs from the pickles (serialized DAGs) in the database. The reason this is not 100% working at this time is that Jinja templates aren't serializable, and I haven't found a hack to serialize them yet.
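The "expire the DagBag and refresh it periodically" workaround described above can be sketched as a small TTL cache. This is a stdlib illustration, not Airflow's actual DagBag class; `load_dags` is a hypothetical callable standing in for re-parsing the DAGs folder:

```python
import time

class ExpiringDagBag:
    """Cache of parsed DAGs that reloads once a TTL has elapsed.

    NOTE: illustrative stand-in, not Airflow's DagBag. `load_dags`
    is a hypothetical callable that re-parses the DAGs folder and
    returns a {dag_id: dag} mapping.
    """

    def __init__(self, load_dags, ttl_seconds=300):
        self._load = load_dags
        self._ttl = ttl_seconds
        self._loaded_at = None  # monotonic timestamp of the last load
        self._dags = {}

    def get_dags(self):
        now = time.monotonic()
        # Reload on first access, or once the cached copy is older than the TTL.
        if self._loaded_at is None or now - self._loaded_at > self._ttl:
            self._dags = self._load()
            self._loaded_at = now
        return self._dags
```

The web server would serve pages from `get_dags()`; within the TTL window every request sees the same cached parse, and a new DAG file shows up at most one TTL period after it lands on disk.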
Since we are trying this out for the first time, and essentially trying to convince ourselves and our teammates that this is a cool project, we would be running in "developer" or "test" mode before setting it up for production. If developers don't like it here, it won't make it to production. Hence, it would be nice if we did not need to set up all of the bells and whistles just to take the UI features for a spin. I alluded to this in Issue 51. Why not just let people drop their DAGs in your DAG directory on a file system? Then run a background thread to check the DAG directory (pointed to in airflow.cfg), compare it to the DB, and update the DB? Then have each worker poll the DB and fetch new DAGs from the DB.
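The directory-polling idea above can be sketched in a few lines: scan the DAGs folder, compare file mtimes with what was last recorded (in the real setup, rows in the database; here, a plain dict), and report anything new or changed. A minimal stdlib sketch under those assumptions:

```python
import os

def scan_dag_files(dag_dir):
    """Return {filename: mtime} for every .py file in the DAGs folder."""
    return {
        name: os.path.getmtime(os.path.join(dag_dir, name))
        for name in os.listdir(dag_dir)
        if name.endswith(".py")
    }

def sync_new_or_changed(dag_dir, known):
    """Compare the folder against what is already recorded (e.g. in the DB)
    and return the files that need to be (re)imported. `known` is a
    stand-in for the database table and is updated in place."""
    current = scan_dag_files(dag_dir)
    changed = [
        name for name, mtime in current.items()
        if known.get(name) != mtime
    ]
    known.update(current)
    return changed
```

A background thread would call `sync_new_or_changed` on an interval and re-import only the files it returns, which is essentially what the Airflow scheduler's discovery loop does against the real database.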
Hey @r39132 ,
Thanks @artwr, that works for us. BTW, if I run the CeleryExecutor, do I need to start another airflow worker? I essentially only want the CeleryExecutor so that I can use the UI features and have people be "blown away" by the product. But I don't want to have to stand up more than one worker right now if I don't need to. I will have monit restart the webapp.
One worker is all you need (until you need more slots). If you don't have any workers up, messages will just build up in the queue.
So, why is it that my DAGs are not being imported into the database? I see them in the UI, but not in the DB. I'm hitting the refresh button and running in debug mode (airflow webserver -d -p 8080). I'm using the LocalExecutor and am at a loss. Also, I'm noticing that new files are not automatically showing up in the UI.
Are you not running the scheduler? |
Ok, I see the confusion. I've been reading "scheduler" as related to the "celery scheduler". That explains a lot. Thanks. We are having some issues with Celery, so running Local at small scale should be fine for a while.
How does one add new DAGs to the system without restarting the webserver? Running "python my_dag.py" does not import the DAG into the DB either. I am running on EC2 with a PostgreSQL database. I can see that the data gets imported only on restart of the web app.
Here is the code:
After restarting the webapp, I still don't see any entries in the dag table of the SQLite DB for my DAGs. I do see the DAGs in the UI, so there seems to be a difference between loading and importing a DAG.