Add synchronous CLI option #202
Comments
- This will allow, e.g., support staff to process a critical resource immediately while a large job is occupying the queue
[QOLSVC-4689] add CLI option to process datasets immediately, ckan#202
…n#202 - Add CLI flag for loading immediately instead of queueing - Allow sysadmins to specify an alternate queue to run on
@ThrawnCA Is it possible to run xloader synchronously in the code itself?
The simple answer is yes, but at what cost? On our CKAN instance you may be waiting up to 40 minutes to process a single file. That would mean the publisher making the request would be left hanging, or the CDN in front would cut the connection after 60-300 seconds of waiting. The front end's goal is to be as fast as possible, which is what drives the need to decouple. It may be possible to use notification triggers instead of a queue, but that still puts the load on that single instance instead of spreading it across the front-end farm or a dedicated loading server. When you have the harvester plugin installed, it asks you to create two extra queues. The general idea is that you could run directly via the CLI, or add the job to a less busy queue to get priority loading.
Yes, that is the cost. Unfortunately, we are developing a data validation pipeline that requires the dataset to already be in the datastore before it can continue. So we have to wait for xloader to finish the job. Is there already a function that can wait for xloader to complete the submission? Or at least, a function that can continually poll the task status and terminate when the status is complete?
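For readers with the same need, the polling idea in the question above could be sketched roughly as follows. This is only an illustration under stated assumptions, not something provided by ckanext-xloader: `wait_for_xloader` and `fetch_status` are hypothetical names, and the sketch assumes that the status lookup returns a dict with a `status` key whose terminal values are `complete` and `error`.

```python
import time
from typing import Callable, Dict

# Assumption: these are the values that mean "no more work will happen".
TERMINAL_STATES = {"complete", "error"}


def wait_for_xloader(
    fetch_status: Callable[[str], Dict],
    resource_id: str,
    timeout: float = 3600.0,
    interval: float = 10.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> Dict:
    """Poll fetch_status(resource_id) until a terminal state or timeout.

    fetch_status is a caller-supplied (hypothetical) function that returns
    the current task status as a dict, e.g. by calling the CKAN API.
    """
    deadline = clock() + timeout
    while True:
        status = fetch_status(resource_id)
        if status.get("status") in TERMINAL_STATES:
            # Return the whole dict so the caller can tell success from error.
            return status
        if clock() >= deadline:
            raise TimeoutError(
                "xloader did not finish for resource %s within %s seconds"
                % (resource_id, timeout)
            )
        sleep(interval)
```

In practice `fetch_status` would wrap whatever status lookup your instance exposes (for example an `xloader_status` action call, if that matches your installed version). The caller still has to check whether the returned status is `complete` or `error` before continuing the pipeline, and a long timeout is advisable given the load times mentioned above.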
Based on your requirements, do create a new extension that hooks into xloader finishing and triggers your pipeline via HTTPS. Please see https://github.com/ckan/ckanext-xloader/blob/master/ckanext/xloader/action.py#L270 where there is already a hook system available for you to connect into when the loader is complete, or when the table is loaded but the indexes are still being built.
Looking over https://data.ontario.ca/api/3/action/status_show vs https://www.data.qld.gov.au/api/3/action/status_show, you may want to ensure that datastore tables are not dropped or messed up, by having the ckanext-validation plugin installed. Also look at trying out this piece of code, which is currently being worked on, to hold xloader back until a schema pass occurs: https://github.com/qld-gov-au/ckanext-xloader/pull/71/files To make this simpler for authors who just want to ensure columns are not messed up, I also highly suggest you add the nice validation GUI generator, ckanext-validation-schema-generator.
Thank you @duttonw! Will review
@KatiRG for your reference, www.data.qld.gov.au uses the plugins shown in this file: https://github.com/qld-gov-au/ckan-qld-infrastructure/blob/master/vars/shared-OpenData.var.yml In that repo and many of our plugins, we show a way of loading them into a vanilla Docker container for full end-to-end testing in a browser, TDD style. The qld-gov-au team currently deploys via Chef (OpsWorks). Major work is underway to move this to Systems Manager Chef; you can look under the hood here: https://github.com/qld-gov-au/opswx-ckan-cookbook I'm unsure how your kit is set up, but we (in qld-gov-au, @ThrawnCA and co) have been working on ensuring that CKAN can scale the number of nodes up and down to match the load being received, to be as cost-effective as possible, as well as allowing us to have minimal to no downtime with one-click deploy standards.
add more options for maintainers to expedite XLoader runs, GitHub #202
* commit 'a96ce28c589dfe6b1b850d8eeeb14f1e1dfe9759': (80 commits) Add note about 2.11 support Update images and actions, test 2.11 feat(tests): added tyoe guess on mixed integers; add more ignorable blank lines to test sample, ckan#206 add more options for maintainers to expedite XLoader runs, GitHub ckan#202 strip extra space for column name In plugin.py, there is an fix of resource format key error fix list syntax for combining range and dict skip blank rows in source files, ckan#206 add unit test for handling empty lines, ckan#206 add sample file with extra blank line at end, ckan#206 fix(tests): less complicated; further cleanup fix(tests): finalized test method; fix(tests): subrequest params; fix(tests): module path; feat(tests): added new test; fix(syntax): flake8; fix(helpers): comments and better syntax; fix(templates): set in block; ... # Resolved conflicts: # .github/workflows/test.yml # ckanext/xloader/controllers.py # ckanext/xloader/plugin.py # ckanext/xloader/templates-bs2/package/resource_edit_base.html # ckanext/xloader/templates/package/resource_edit_base.html # ckanext/xloader/utils.py # ckanext/xloader/views.py
It would sometimes be beneficial to have the ability to immediately and synchronously run an XLoader update from the command line, instead of submitting a job to the queue (which may be long). Rather like how ckanext-archiver has both `update` to submit an asynchronous job, or `update-test` to run immediately.