zenodo_get: a downloader for Zenodo records

This is a Python3 tool that can mass-download files from Zenodo records.

Source code

The code is hosted on GitHub; the former GitLab hosting has been discontinued.

Install

From PyPI:

pip3 install zenodo_get

Or from GitHub:

pip3 install git+https://github.com/dvolgyes/zenodo_get

Afterwards, you can query the command line options:

zenodo_get -h

but the default settings should work for most use cases:

zenodo_get RECORD_ID_OR_DOI

Documentation

The tool itself is simple, and the built-in help message covers the available options:

zenodo_get -h

but if you need more, open a GitHub issue and explain what is missing.

Basic usage:

zenodo_get RECORD_ID_OR_DOI

Special parameters:

  • -m : generate md5sums.txt for verification. Beware: if the record itself contains a file named md5sums.txt, that file will overwrite the generated one. Verification example: md5sum -c md5sums.txt
  • -g GLOB : A glob expression to select a subset of record files.
  • -w FILE : instead of downloading the record files, write FILE containing direct links to the files on the Zenodo site. These links can be downloaded with any download manager, e.g. with wget (if FILE is urls.txt): wget -i urls.txt
  • -e : continue on error: skip the files that fail, but still try to download the rest of the files.
  • -k : keep files: keep files whose MD5 checksum is invalid. This is mainly intended for debugging.
  • -R N : retry N times on error.
  • -p N : wait N seconds before each retry attempt. Default: 0.5 s.
  • -n : do not continue. By default, only files that have not been downloaded yet, or whose checksum does not match, are downloaded. This flag disables that behaviour: existing files are downloaded again and saved under a new name (e.g. file(1).ext).
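To make some of these options more concrete, here is a minimal standard-library sketch of what they do conceptually: checking an md5sums.txt file the way md5sum -c does (-m), selecting files with a glob (-g), retrying with a pause (-R, -p), and picking a free file(1).ext style name (-n). The helpers verify_md5sums, select_files, with_retries and next_free_name are hypothetical illustrations, not zenodo_get's own code; the glob semantics are assumed to be those of Python's fnmatch, and the tool's exact matching and naming rules may differ.

```python
from __future__ import annotations

import fnmatch
import hashlib
import time
from pathlib import Path
from typing import Callable, TypeVar

T = TypeVar("T")


def md5_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_md5sums(listing: Path) -> dict[str, bool]:
    """Check '<md5>  <name>' lines, similar to `md5sum -c` run in listing's folder."""
    results: dict[str, bool] = {}
    for line in listing.read_text().splitlines():
        if not line.strip():
            continue  # ignore blank lines
        expected, name = line.split(maxsplit=1)
        name = name.lstrip("*")  # md5sum marks binary mode with a leading '*'
        target = listing.parent / name
        results[name] = target.is_file() and md5_of(target) == expected.lower()
    return results


def select_files(names: list[str], pattern: str) -> list[str]:
    """Keep only the names matching a shell-style glob (assumed fnmatch semantics)."""
    return [n for n in names if fnmatch.fnmatch(n, pattern)]


def with_retries(action: Callable[[], T], retries: int, pause: float = 0.5) -> T:
    """Call action(); on an exception, retry up to `retries` more times."""
    attempt = 0
    while True:
        try:
            return action()
        except Exception:
            if attempt >= retries:
                raise  # out of attempts: propagate the last error
            attempt += 1
            time.sleep(pause)  # wait before the next attempt, like -p


def next_free_name(path: Path) -> Path:
    """Return path if unused, else the first free 'stem(N).suffix' variant."""
    if not path.exists():
        return path
    n = 1
    while True:
        candidate = path.with_name(f"{path.stem}({n}){path.suffix}")
        if not candidate.exists():
            return candidate
        n += 1
```

For example, select_files(["a.csv", "b.txt"], "*.csv") keeps only "a.csv", and next_free_name(Path("data.txt")) returns data(1).txt when data.txt already exists.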

Remark for batch processing: the program exits with a non-zero exit code if any error occurred, for instance a checksum mismatch, a download error or a time-out. Only fully successful downloads end with exit code 0.
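Because the exit code reliably signals failure, a batch script only needs to check it. The following is a minimal sketch, assuming zenodo_get is installed and on PATH; download_records is a hypothetical helper, and the command can be swapped out (e.g. for testing).

```python
from __future__ import annotations

import subprocess
from typing import Sequence


def download_records(
    record_ids: Sequence[str],
    command: Sequence[str] = ("zenodo_get",),
    extra_args: Sequence[str] = (),
) -> list[str]:
    """Run the downloader once per record and return the IDs whose run failed."""
    failed: list[str] = []
    for record in record_ids:
        # Any non-zero exit code means something went wrong for this record
        # (checksum mismatch, download error, time-out, ...).
        result = subprocess.run([*command, *extra_args, record])
        if result.returncode != 0:
            failed.append(record)
    return failed
```

For example, download_records(["RECORD_ID_1", "RECORD_ID_2"], extra_args=("-R", "3")) runs zenodo_get -R 3 RECORD_ID_1 and so on (the IDs are placeholders), and returns the records that need another attempt.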

Citation

You don't really need to cite this software unless it plays a role in an academic publication. For example, if you simply download something from Zenodo with zenodo_get, there is no need to cite anything. If you download a lot from Zenodo, publish about Zenodo, and this tool is an integral part of your methodology, then you could cite it. You can always ask the tool to print the most up-to-date reference, in both plain text and BibTeX:

zenodo_get --cite