Add synchronous CLI option #202

Closed
ThrawnCA opened this issue Jan 3, 2024 · 6 comments

Comments

@ThrawnCA
Collaborator

ThrawnCA commented Jan 3, 2024

It would sometimes be beneficial to be able to run an XLoader update immediately and synchronously from the command line, instead of submitting a job to the queue (which may be long). This is similar to how ckanext-archiver has both update, which submits an asynchronous job, and update-test, which runs immediately.

ThrawnCA added a commit to qld-gov-au/ckanext-xloader that referenced this issue Jan 31, 2024
- This will allow eg support staff processing a critical resource immediately while a large job is occupying the queue
ThrawnCA added a commit to qld-gov-au/ckanext-xloader that referenced this issue Feb 2, 2024
[QOLSVC-4689] add CLI option to process datasets immediately, ckan#202
ThrawnCA added a commit to qld-gov-au/ckanext-xloader that referenced this issue Mar 13, 2024
…n#202

- Add CLI flag for loading immediately instead of queueing
- Allow sysadmins to specify an alternate queue to run on
@KatiRG

KatiRG commented Mar 20, 2024

@ThrawnCA Is it possible to run xloader synchronously in the code itself?

@duttonw
Collaborator

duttonw commented Mar 20, 2024

The simple answer is yes, but at what cost? On our CKAN instance you may be waiting up to 40 minutes to process a single file. That would mean the publisher making the request would be left hanging, or the CDN in front would cut the connection after 60-300 seconds of waiting.

The front end's goal is to be as fast as possible, which is what drives the need to decouple. It may be possible to notify triggers instead of using a queue, but that still puts the load on that single instance instead of spreading it across the front-end farm or a dedicated loading server.

When you have the harvester plugin installed, it asks you to create two extra queues. The general idea is that you could run directly via the CLI, or add the job to a less busy queue to get priority loading.

@KatiRG

KatiRG commented Mar 20, 2024

Yes, that is the cost. Unfortunately, we are developing a data validation pipeline that requires the dataset to already be in the datastore before it can continue. So we have to wait for xloader to finish the job. Is there already a function that can wait for xloader to complete the submission? Or at least, a function that can continually poll the task status and terminate when the status is complete?
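
For illustration, the kind of polling loop being asked about might look like the sketch below. It is not an existing xloader function. The name `wait_for_xloader` is hypothetical, and `get_status` is assumed to be a caller-supplied callable that returns the current task status as a dict (for example, by wrapping a status lookup for the resource). The status strings `"complete"` and `"error"` are also assumptions and should be checked against the installed xloader version.

```python
import time
from typing import Any, Callable, Dict


def wait_for_xloader(
    get_status: Callable[[], Dict[str, Any]],
    timeout: float = 3600.0,
    interval: float = 10.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> Dict[str, Any]:
    """Poll get_status() until the task completes, fails, or times out.

    get_status is hypothetical: it should return a dict with a "status" key.
    sleep and clock are injectable so the loop can be tested without waiting.
    """
    deadline = clock() + timeout
    while True:
        task = get_status()
        status = task.get("status")
        if status == "complete":  # assumed terminal success value
            return task
        if status == "error":  # assumed terminal failure value
            raise RuntimeError("xloader job failed: %r" % (task,))
        if clock() >= deadline:
            raise TimeoutError("xloader job did not finish within %s seconds" % timeout)
        sleep(interval)
```

A blocking loop like this holds the calling process for as long as the load takes, so it fits a background pipeline better than a web request.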

@duttonw
Collaborator

duttonw commented Mar 20, 2024

Based on your requirements, I'd suggest creating a new extension that hooks into the XLoader finish event and triggers your pipeline via HTTPS.
You may want to build a sysadmin screen to view and alter endpoints. I also feel that building the payload to a popular spec such as AWS SQS would be best, along with perhaps the ability to fire different endpoints based on the resource UUID, group, org, or dataset tags.

Please see https://github.com/ckan/ckanext-xloader/blob/master/ckanext/xloader/action.py#L270 where there is already a hook system available for you to connect into when the loader is complete, or when the table is loaded but the indexes are still being built.

    for plugin in p.PluginImplementations(xloader_interfaces.IXloader):
        plugin.after_upload(context, resource_dict, dataset_dict)
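
A minimal sketch of how an extension might connect to that hook follows. It is an illustration under stated assumptions, not xloader's own code: the class name `PipelineNotifier`, the `build_payload` helper, the payload field names, and the injected `send` callable are all hypothetical. In a real extension the class would presumably be a CKAN plugin that implements the `IXloader` interface; the CKAN imports are left out so the sketch runs on its own.

```python
import json
from typing import Any, Callable, Dict


def build_payload(resource_dict: Dict[str, Any], dataset_dict: Dict[str, Any]) -> str:
    """Serialise the fields a downstream pipeline might need (hypothetical schema)."""
    return json.dumps(
        {
            "event": "xloader.after_upload",
            "resource_id": resource_dict.get("id"),
            "dataset_id": dataset_dict.get("id"),
            "organization": dataset_dict.get("owner_org"),
        },
        sort_keys=True,
    )


class PipelineNotifier:
    """Hypothetical plugin body exposing the after_upload hook signature."""

    def __init__(self, endpoint: str, send: Callable[[str, str], None]) -> None:
        self.endpoint = endpoint
        # send(url, body) is injected; in production it could be an HTTPS POST.
        self.send = send

    def after_upload(self, context, resource_dict, dataset_dict) -> None:
        # Called once per finished load; forward a small JSON message.
        self.send(self.endpoint, build_payload(resource_dict, dataset_dict))


if __name__ == "__main__":
    sent = []
    notifier = PipelineNotifier(
        "https://pipeline.example/hook",  # placeholder endpoint
        lambda url, body: sent.append((url, body)),
    )
    notifier.after_upload({}, {"id": "res-1"}, {"id": "ds-1", "owner_org": "org-1"})
    print(sent[0][1])
```

Injecting `send` keeps the hook itself free of network code, so the endpoint choice (per resource, group, org, or tag) can be decided elsewhere.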

Looking over https://data.ontario.ca/api/3/action/status_show vs https://www.data.qld.gov.au/api/3/action/status_show:

You may want to ensure that datastore tables are not dropped or corrupted by installing the ckanext-validation plugin, and look at trying out this in-progress code, which holds XLoader back until a schema validation pass occurs: https://github.com/qld-gov-au/ckanext-xloader/pull/71/files

To make this simpler for authors who just want to ensure columns are not messed up, I also highly suggest adding the validation schema GUI generator, ckanext-validation-schema-generator.

@KatiRG

KatiRG commented Mar 20, 2024

Thank you @duttonw ! Will review

@duttonw
Collaborator

duttonw commented Mar 20, 2024

@KatiRG for your reference,

www.data.qld.gov.au uses the plugins listed in this file: https://github.com/qld-gov-au/ckan-qld-infrastructure/blob/master/vars/shared-OpenData.var.yml

In that repo and many of our plugins, we show a way of loading them into a vanilla Docker container for full end-to-end testing in a browser, TDD style.

The qld-gov-au team currently deploys via Chef (OpsWorks). Major work is underway to move this to Systems Manager Chef; you can look under the hood here: https://github.com/qld-gov-au/opswx-ckan-cookbook

I'm unsure how your kit is set up, but we (in qld-gov-au, @ThrawnCA and co) have been working to ensure that CKAN can scale the number of nodes up and down to match the load being received, so that it is as cost-effective as possible, while also allowing minimal to no downtime with one-click deploy standards.

ThrawnCA added a commit that referenced this issue Apr 5, 2024
add more options for maintainers to expedite XLoader runs, GitHub #202
@ThrawnCA ThrawnCA closed this as completed Apr 5, 2024
peterVorman added a commit to OpenGov-OpenData/ckanext-xloader that referenced this issue Aug 5, 2024
* commit 'a96ce28c589dfe6b1b850d8eeeb14f1e1dfe9759': (80 commits)
  Add note about 2.11 support
  Update images and actions, test 2.11
  feat(tests): added tyoe guess on mixed integers;
  add more ignorable blank lines to test sample, ckan#206
  add more options for maintainers to expedite XLoader runs, GitHub ckan#202
  strip extra space for column name
  In plugin.py, there is an fix of resource format key error
  fix list syntax for combining range and dict
  skip blank rows in source files, ckan#206
  add unit test for handling empty lines, ckan#206
  add sample file with extra blank line at end, ckan#206
  fix(tests): less complicated;
  further cleanup
  fix(tests): finalized test method;
  fix(tests): subrequest params;
  fix(tests): module path;
  feat(tests): added new test;
  fix(syntax): flake8;
  fix(helpers): comments and better syntax;
  fix(templates): set in block;
  ...

# Resolved conflicts:
#	.github/workflows/test.yml
#	ckanext/xloader/controllers.py
#	ckanext/xloader/plugin.py
#	ckanext/xloader/templates-bs2/package/resource_edit_base.html
#	ckanext/xloader/templates/package/resource_edit_base.html
#	ckanext/xloader/utils.py
#	ckanext/xloader/views.py