Cachito is a service to store (and serve) source code for applications. Upon a request, Cachito will fetch a specific revision of a given repository from the Internet and store it permanently in its internal storage. Namely, it stores the source code for a specific Git commit from a given Git repository, which could be from a forge such as GitHub or GitLab. This way, even if that repository (or that revision) is deleted, it is still possible to track the pristine source code for the original sources. In fact, if the sources have already been previously fetched, Cachito will simply serve the stored copy.
Cachito also supports identifying and permanently storing dependencies for certain package managers and making them available for building the application. Like it does for source code, future requests that utilize these same dependencies will be taken from Cachito's internal storage rather than be fetched from the Internet. See the Package Manager Feature Support section for the package managers that Cachito currently supports.
Cachito will produce bundles as the output artifact of a request. The bundle is a tarball that contains the source code of the application and all the sources of its dependencies. For some package managers, these dependencies can be used directly for building the application. Other package managers will provide an alternative mechanism for this (e.g. a custom npm registry with the declared npm dependencies). Regardless of whether the dependencies in the bundle are used for building the application, they are always present so that the source of these dependencies can be published alongside the application for license compliance.
- More Documentation
- Coding Standards
- Quick Start
- Pre-built Container Images
- Prerequisites
- Development
- Database Migrations
- API Documentation
- Configuring Workers
- Configuring the API
- Flags
- Nexus
- Package Managers
Documents that outgrew this README can be found in the docs/ directory.
- docs/
- dependency_confusion.md is a short analysis of a supply chain attack and its impact on Cachito users
- metadata.md describes Cachito request metadata
- pip.md is a guide for using pip with Cachito
- using_requests_locally.md explains how to use Cachito requests to run builds on your PC
- tracing.md documents Cachito's support for OpenTelemetry tracing
The codebase conforms to the style enforced by flake8 with the following exceptions:
- The maximum line length allowed is 100 characters instead of 80 characters
In addition to flake8, docstrings are also enforced by the plugin flake8-docstrings with the following exceptions:
- D100: Missing docstring in public module
- D104: Missing docstring in public package
- D105: Missing docstring in magic method
The format of the docstrings should be in the reStructuredText style such as:
Set the state of the request using the Cachito API.
:param int request_id: the ID of the Cachito request
:param str state: the state to set the Cachito request to
:param str state_reason: the state reason to set the Cachito request to
:return: the updated request
:rtype: dict
:raise CachitoError: if the request to the Cachito API fails
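As an illustration, the docstring above could sit on a function like the one below. This is only a minimal sketch: the function body, the in-memory _REQUESTS store, and the local CachitoError class are hypothetical stand-ins so that the example runs on its own, and they are not Cachito's implementation.

```python
class CachitoError(RuntimeError):
    """Represent a failed Cachito operation (hypothetical stand-in for this sketch)."""


# Hypothetical in-memory store used here instead of a real Cachito API call.
_REQUESTS = {1: {"id": 1, "state": "in_progress", "state_reason": ""}}


def set_request_state(request_id, state, state_reason):
    """
    Set the state of the request using the Cachito API.

    :param int request_id: the ID of the Cachito request
    :param str state: the state to set the Cachito request to
    :param str state_reason: the state reason to set the Cachito request to
    :return: the updated request
    :rtype: dict
    :raise CachitoError: if the request to the Cachito API fails
    """
    if request_id not in _REQUESTS:
        raise CachitoError(f"Request {request_id} does not exist")
    _REQUESTS[request_id].update(state=state, state_reason=state_reason)
    return _REQUESTS[request_id]
```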
Additionally, black is used to enforce other coding standards.
To verify that your code meets these standards, you may run tox -e black,flake8.
Run the application locally (requires docker compose):
make run
Note: while running Cachito locally requires docker compose, that does not mean you have to use Docker! Podman 3.0 or greater can serve as a replacement, see https://www.redhat.com/sysadmin/podman-docker-compose.
Alternatively, you could also run the application with podman-compose by setting the CACHITO_COMPOSE_ENGINE variable to the path of the podman-compose script.
Unfortunately, the latest release of podman-compose contains various bugs making it unusable for running Cachito locally. Use the script from the devel branch instead. To facilitate this, set CACHITO_COMPOSE_ENGINE to the special value podman-compose-auto, which will instruct the Makefile to download and use the correct version of podman-compose. Be sure to pre-install the dependencies required by podman-compose, currently PyYAML. The script is available in ./tmp/podman_compose.py. You may use this script to interact with the local deployment.
make run CACHITO_COMPOSE_ENGINE=podman-compose-auto
Verify in the browser at http://localhost:8080/
Use curl to make requests:
# List all requests
curl http://localhost:8080/api/v1/requests
# Create a new request
curl -X POST -H "Content-Type: application/json" http://localhost:8080/api/v1/requests -d \
'{
"repo": "https://github.com/release-engineering/retrodep.git",
"ref": "e1be527f39ec31323f0454f7d1422c6260b00580",
"pkg_managers": ["gomod"]
}'
# Check the status of a request
curl http://localhost:8080/api/v1/requests/1
# Download the source archive for a completed request
curl http://localhost:8080/api/v1/requests/1/download -o source.tar.gz
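If you prefer Python over curl, the "create a new request" call above could be sketched as follows using only the standard library. The helper names build_request and send are hypothetical and chosen for this example; the URL and payload are the same ones used in the curl commands.

```python
import json
import urllib.request

# Local deployment started with `make run`
API_URL = "http://localhost:8080/api/v1/requests"


def build_request(repo, ref, pkg_managers):
    """Build (but do not send) the POST request that creates a Cachito request."""
    payload = {"repo": repo, "ref": ref, "pkg_managers": pkg_managers}
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request):
    """Send a prepared request and decode the JSON response body."""
    with urllib.request.urlopen(request) as response:
        return json.load(response)


# Usage against a running local deployment:
#   send(build_request(
#       "https://github.com/release-engineering/retrodep.git",
#       "e1be527f39ec31323f0454f7d1422c6260b00580",
#       ["gomod"],
#   ))
```

Building the request separately from sending it makes it possible to inspect the payload without a running deployment.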
Cachito container images are automatically built when changes are merged. There are two images: an httpd based image with the Cachito API and a Celery worker image with the Cachito worker code.
quay.io/containerbuildsystem/cachito-api:latest
quay.io/containerbuildsystem/cachito-workers:latest
This is built to be used with Python 3.
Some Flask dependencies are compiled during installation, so gcc and Python header files need to be present.
For example, on Fedora:
dnf install gcc python3-devel
You may create a virtualenv with Cachito and its dependencies installed with the following command:
make venv
This installs Cachito in develop mode, which allows modifying the source code directly without needing to reinstall Cachito. This is really useful for syntax highlighting in your IDE. However, it's not practical to use as a development environment since Cachito has dependencies on other services.
NOTE: you may need to ensure that you have some packages installed. In Fedora, you will need
yum install python3.11 python3-devel python3-virtualenv gcc krb5-devel
where python3.11 is the version of Python required based on tox.ini.
You may create and run the containerized development environment with docker compose (v2) with the following command:
make run-start
This will automatically create and run the following containers:
- athens - the Athens instance responsible for permanently storing dependencies for the gomod package manager.
- cachito-api - the Cachito REST API. This is accessible at http://localhost:8080.
- cachito-worker - the Cachito Celery worker. This container is also responsible for configuring Nexus at startup.
- db - the PostgreSQL database used by the Cachito REST API.
- nexus - the Sonatype Nexus Repository Manager instance that is responsible for permanently storing dependencies for the npm package manager. The management UI is accessible at http://localhost:8082. The username is admin and the password is admin.
- rabbitmq - the RabbitMQ instance for communicating between the API and the worker. The management UI is accessible at http://localhost:8081. The username is cachito and the password is cachito.
After the development environment is running, you can submit jobs to it with curl requests:
curl -X POST -H "Content-Type: application/json" http://localhost:8080/api/v1/requests -d \
'{
"repo": "https://github.com/athos-ribeiro/cachito-sample-pip-package.git",
"ref": "51ffb9c2412d50953ed9732c67267e5d2ff9aa68",
"pkg_managers": ["pip"],
"packages": {"pip": [{"path": "."}, {"path": "subpackage"}]}
}'
The REST API and the worker will restart if the source code is modified. Please note that the REST API may stop restarting if there is a syntax error.
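After submitting a request you will usually want to wait until it is no longer being processed. The sketch below polls a request through a caller-supplied fetch function, so it can be used with any HTTP client. It assumes that the request JSON returned by the API exposes its state under a "state" key and that in_progress is the only non-final state; both are assumptions of this example, and wait_for_request is a hypothetical helper.

```python
import time


def wait_for_request(fetch, request_id, interval=5.0, max_polls=60, sleep=time.sleep):
    """
    Poll a Cachito request until it leaves the in_progress state.

    :param callable fetch: function that takes a request ID and returns the request as a dict
    :param int request_id: the ID of the Cachito request
    :param float interval: seconds to wait between polls
    :param int max_polls: how many times to poll before giving up
    :param callable sleep: function used to wait between polls
    :return: the request in its last observed state
    :rtype: dict
    :raise TimeoutError: if the request is still in progress after max_polls polls
    """
    for _ in range(max_polls):
        request = fetch(request_id)
        # Assumption: the request JSON carries its state under the "state" key.
        if request.get("state") != "in_progress":
            return request
        sleep(interval)
    raise TimeoutError(f"Request {request_id} is still in progress after {max_polls} polls")
```

The fetch argument could, for example, wrap a GET of http://localhost:8080/api/v1/requests/1 and return the decoded JSON.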
If you suspect that the images used for docker compose are out of date, you can run the containerized development environment while forcing a rebuild of the images with the following command:
make run-build-start
If you just want to force a rebuild of the images without running them, you can use
make run-build
To run the unit tests with tox, you may run the following command:
make test-unit
To run the integration tests with tox, you may run the following command:
make test-integration
By default, some tests will require custom configuration and will run against your local development environment. Read the integration tests readme for more information.
NOTE: The containerized development environment needs to be running before the integration tests can pass.
Instead of running the entire unit/integration test suite, you can also run a specific set of tests.
make test-suite TOX_ARGS=<test-suite-identifier>
The test-suite-identifier can be pulled from the test result in the tox output or constructed from the filepath and test function. For example, if you want to run test_fetch_gomod_source, you would call:
make test-suite TOX_ARGS=tests/test_workers/test_tasks/test_gomod.py::test_fetch_gomod_source
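The identifier is simply the test file path and the test function joined by "::". A small helper like the one below could build the command for you; make_test_suite_command is a hypothetical name used only for this sketch.

```python
import shlex


def make_test_suite_command(test_file, test_function=None):
    """Build the `make test-suite` command line for a test file or a single test function."""
    identifier = test_file if test_function is None else f"{test_file}::{test_function}"
    # shlex.quote only adds quotes when the identifier contains shell-special characters.
    return f"make test-suite TOX_ARGS={shlex.quote(identifier)}"


print(
    make_test_suite_command(
        "tests/test_workers/test_tasks/test_gomod.py", "test_fetch_gomod_source"
    )
)
```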
Omitting TOX_ARGS will run all tests without performing black/flake8 validation.
In addition to running specific tests, parameters can be passed into tox with TOX_ARGS and the environment can be configured with TOX_ENVLIST.
make test-suite TOX_ARGS="-x --no-cov tests/test_workers/test_tasks/test_gomod.py"
By default, TOX_ENVLIST is set to python3.11, indicating that it should run on that version. If adding environment parameters to tox, ensure that you are setting the Python version if needed.
To remove the virtualenv, built distributions, and the local development environment, you may run the following command:
make clean
If you are using podman, do not forget to set the CACHITO_COMPOSE_ENGINE variable:
make clean CACHITO_COMPOSE_ENGINE=podman-compose
To add more Python dependencies, add them to the following files:
If you're wondering why you need to add dependencies to both files (setup.py and one of the requirements files), see install_requires vs requirements files.
Afterwards, pip-compile the dependencies via make pip-compile (you may need to run make venv first, unless the venv already exists).
Additionally, if any of the newly added dependencies in the generated requirements*.txt files need to be compiled from C code, please install any missing C libraries in the corresponding Dockerfile(s): requirements.txt is used in both, requirements-web.txt only in api.
If your Cachito worker needs to access private repositories in your development environment, you may mount a .netrc file by adding the volume mount - /path/to/.netrc:/root/.netrc:ro,z in your docker-compose.yml file under the cachito-worker container.
More details here.
This is how you would use the example request above locally (assuming it is request #1).
bin/cachito-download.sh localhost:8080/api/v1/requests/1 /tmp/cachito-test
cd /tmp/cachito-test/remote-source/
# sed will sometimes be needed for requests from the dev environment
sed 's/nexus:8081/localhost:8082/g' --in-place cachito.env app/requirements.txt
# you don't *have* to use a container but having a clean environment is usually desirable
podman run --net=host --rm -ti -v "$PWD:/remote-source:z" -w "/remote-source" fedora:33
# <inside the container>
dnf -y install python3-pip
source cachito.env
cd app
pip install -r requirements.txt
python3 setup.py install
You need to have jq installed for the script to work.
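The sed step above rewrites the Nexus address used inside the development containers (nexus:8081) to the one reachable from your host (localhost:8082). If sed is not available, the same substitution could be sketched in Python as below; rewrite_nexus_host is a hypothetical helper, not a script shipped with Cachito.

```python
from pathlib import Path


def rewrite_nexus_host(paths, old="nexus:8081", new="localhost:8082"):
    """
    Replace the in-container Nexus address in the given files.

    :param list paths: the files to rewrite in place
    :param str old: the address to look for
    :param str new: the address to use instead
    :return: the files that were modified
    :rtype: list
    """
    changed = []
    for path in map(Path, paths):
        text = path.read_text()
        if old in text:
            path.write_text(text.replace(old, new))
            changed.append(path)
    return changed


# Usage from /tmp/cachito-test/remote-source/:
#   rewrite_nexus_host(["cachito.env", "app/requirements.txt"])
```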
Follow the steps below for database data and/or schema migrations:
- Checkout the master branch and ensure no schema changes are present in cachito/web/models.py
- Set SQLALCHEMY_DATABASE_URI to sqlite:///cachito-migration.db in cachito/web/config.py under the Config class
- Run cachito db upgrade which will create an empty database in the root of your Git repository called cachito-migration.db with the current schema applied
- Checkout a new branch where the changes are to be made
- In case of schema changes,
  - Apply any schema changes to cachito/web/models.py
  - Run cachito db migrate which will autogenerate a migration script in cachito/web/migrations/versions
- In case of no schema changes,
  - Run cachito db revision to create an empty migration script file
- Rename the migration script so that the suffix has a description of the change
- Modify the docstring of the migration script
- For data migrations, define the schema of any tables you will be modifying. This is so that it captures the schema of the time of the migration and not necessarily what is in models.py since that reflects the latest schema.
- Modify the upgrade function to make the adjustments as necessary
- Modify the downgrade function to reverse the changes that were made in the upgrade function
- Make any adjustments to the migration script as necessary
- To test the migration script,
  - Populate the database with some dummy data as per the requirement
  - Run cachito db upgrade (see upgrade optional data below)
  - Also test the downgrade by running cachito db downgrade <previous revision> (where previous revision is the revision ID of the previous migration script)
- Remove the configuration of SQLALCHEMY_DATABASE_URI that you set earlier
- Remove cachito-migration.db
- Commit your changes
- Check "615c19a1cee1_add_npm.py" as an example that does a schema change and a data migration
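For the renaming step, the example file name above suggests the pattern of a revision ID followed by a short description. A hypothetical helper that derives such a name is sketched below; the exact naming rule is an assumption based on that one example.

```python
import re


def migration_filename(revision_id, description):
    """
    Build a migration script file name from a revision ID and a description.

    :param str revision_id: the revision ID of the migration script
    :param str description: a short human-readable description of the change
    :return: the file name, e.g. 615c19a1cee1_add_npm.py
    :rtype: str
    """
    # Lowercase the description and collapse anything that is not a letter or digit.
    slug = re.sub(r"[^a-z0-9]+", "_", description.lower()).strip("_")
    return f"{revision_id}_{slug}.py"


print(migration_filename("615c19a1cee1", "Add npm"))
```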
Optional arguments can be passed when upgrading the Cachito database:
- delete_data=True - an argument to delete unused tables from the database (usage: cachito db upgrade -x delete_data=True).
Run cachito db upgrade --help to get more info about additional arguments consumed by custom env.py scripts.
The documentation is generated from the API specification written in the OpenAPI 3.0 format.
It is available on Cachito's root URL.
To configure a Cachito Celery worker, create a Python file at /etc/cachito/celery.py. Any variables set in this file will be applied to the Celery worker when running in production mode (default).
Custom configuration for the Celery workers is listed below:
- broker_url - the URL of the RabbitMQ instance to connect to. See the broker_url configuration documentation.
- cachito_api_url - the URL to the Cachito API (e.g. https://cachito-api.domain.local/api/v1/).
- cachito_api_timeout - the timeout when making a Cachito API request. The default is 60 seconds.
- cachito_athens_url - the URL to the Athens instance to use for caching gomod dependencies. This is only necessary for workers that process gomod requests.
- cachito_auth_cert - the SSL certificate to be used for authentication. See https://requests.readthedocs.io/en/master/user/advanced/#client-side-certificates for reference on how to provide this certificate.
- cachito_auth_type - the authentication type to use when accessing protected Cachito API endpoints. If this value is None, authentication will not be used. This defaults to kerberos in production. The cert value is also valid and would use an SSL certificate for authentication. This requires cachito_auth_cert to be provided.
- cachito_bundles_dir - the directory for storing bundle archives which include the source archive and dependencies. This configuration is required, and the directory must already exist and be writeable.
- cachito_default_environment_variables - a dictionary where the keys are names of package managers. The values are dictionaries where the keys are default environment variables to set for that package manager and the values are dictionaries with the keys value and kind. The value must be a string which specifies the value of the environment variable. The kind must also be a string which specifies the type of value, either "path" or "literal". Check cachito/workers/config.py::Config for the default value of this configuration.
- cachito_gomod_download_max_tries - how many times to try go mod subprocess calls used for downloading dependencies. Cachito will retry the entire operation for any non-zero return code.
- cachito_gomod_ignore_missing_gomod_file - if True and the request specifies the gomod package manager but there is no go.mod file present in the repository, Cachito will skip the gomod package manager for the request. If False, the request will fail if the go.mod file is missing. This is only supported if a single path is provided to the gomod package manager. This defaults to False.
- cachito_gomod_strict_vendor - the bool to disable/enable the strict vendor mode. This defaults to False. For a repo that has gomod dependencies, if the vendor directory exists and this config option is set to True, Cachito will fail the request.
- cachito_js_concurrency_limit - the maximum number of concurrent download tasks in JavaScript requests. Upon reaching this limit, a task must end for another to be created. This defaults to 5.
- cachito_log_level - the log level to configure the workers with (e.g. DEBUG, INFO, etc.).
- cachito_nexus_ca_cert - the CA certificate that signed the SSL certificate used by the Nexus instance. This defaults to /etc/cachito/nexus_ca.pem. If this file does not exist, Cachito will not provide the CA certificate in the package manager configuration.
- cachito_nexus_hoster_password - the password of the Nexus service account used by Cachito for the Nexus instance that has the hosted repositories. This is used instead of cachito_nexus_password for uploading content if you are using the two Nexus instance approach as described in the "Nexus Common Configuration" section. If this is set, cachito_nexus_hoster_username must also be set.
- cachito_nexus_hoster_url - the URL to the Nexus instance that has the hosted repositories. This is used instead of cachito_nexus_url for uploading content if you are using the two Nexus instance approach as described in the "Nexus Common Configuration" section.
- cachito_nexus_hoster_username - the username of the Nexus service account used by Cachito for the Nexus instance that has the hosted repositories. This is used instead of cachito_nexus_username for uploading content if you are using the two Nexus instance approach as described in the "Nexus Common Configuration" section. If this is set, cachito_nexus_hoster_password must also be set.
- cachito_nexus_js_hosted_repo_name - the name of the Nexus hosted repository for JavaScript package managers. This defaults to cachito-js-hosted.
- cachito_nexus_max_search_attempts - the number of times Cachito will retry searching for non-PyPI assets in the raw pip repositories to retrieve a URL to append to the requirements file.
- cachito_nexus_npm_proxy_url - the URL to the cachito-js repository which is a Nexus group that points to the cachito-js-hosted hosted repository and the cachito-js-proxy proxy repository. This defaults to http://localhost:8081/repository/cachito-js/. This only needs to change if you are using the two Nexus instance approach as described in the "Nexus For Java Script" section or you use a different name for the repository.
- cachito_nexus_password - the password of the Nexus service account used by Cachito.
- cachito_nexus_pip_raw_repo_name - the name of the Nexus raw repository for the pip package manager. This defaults to cachito-pip-raw.
- cachito_nexus_pypi_proxy_url - the URL of the Nexus PyPI proxy repository for the pip package manager. Configured using a full URL rather than just a repo name because we need the additional flexibility.
- cachito_nexus_rubygems_proxy_url - the URL of the Nexus RubyGems proxy repository for the rubygems package manager. Configured using a full URL rather than just a repo name because we need the additional flexibility.
- cachito_nexus_rubygems_raw_repo_name - the name of the Nexus raw repository for the rubygems package manager. This defaults to cachito-rubygems-raw.
- cachito_nexus_proxy_password - the password of the unprivileged user that has read access to the main Cachito repositories (e.g. cachito-js). This is needed if the Nexus instance that hosts the main Cachito repositories has anonymous access disabled. This is the case if Cachito utilizes just a single Nexus instance.
- cachito_nexus_proxy_username - the username of the unprivileged user that has read access to the main Cachito repositories (e.g. cachito-js). This is needed if the Nexus instance that hosts the main Cachito repositories has anonymous access disabled. This is the case if Cachito utilizes just a single Nexus instance.
- cachito_nexus_request_repo_prefix - the prefix of Nexus proxy repositories made for each request for applicable package managers (e.g. cachito-npm-1). This defaults to cachito-.
- cachito_nexus_timeout - the timeout when making a Nexus API request. The default is 60 seconds.
- cachito_nexus_url - the base URL to the Nexus Repository Manager 3 instance used by Cachito.
- cachito_nexus_username - the username of the Nexus service account used by Cachito. The following privileges are required: nx-repository-admin-*-*-*, nx-repository-view-npm-*-*, nx-roles-all, nx-script-*-*, nx-users-all and nx-userschangepw. This defaults to cachito.
- cachito_npm_file_deps_allowlist - the npm "file" dependencies that are allowed in the lock file for the "npm" package manager. This configuration is a dictionary with the keys as package names and the values as lists of dependency names. This defaults to {}.
- cachito_yarn_file_deps_allowlist - the yarn "file" dependencies that are allowed in the lock file for the "yarn" package manager. See cachito_npm_file_deps_allowlist.
- cachito_gomod_file_deps_allowlist - the gomod dependencies that Cachito will allow to be replaced by local paths, e.g. replace github.com/org/some-module => ./staging/src/some-module. This is a dictionary where keys are module names and values are lists of packages that the corresponding module is allowed to replace. The packages may contain wildcards supported by Python's fnmatch, e.g. github.com/org/* (this will allow all packages starting with github.com/org/). A submodule is allowed to be replaced by a local module by default (e.g. <this-module>/submodule => ./local-module), where a submodule is an internal module (placed in a non-root directory) in a multi-module hierarchy (read more about multi-module repositories).
- cachito_workers_rubygems_file_deps_allowlist - for each package, it contains a list of RubyGems PATH dependencies that are allowed to be present in Gemfile.lock. This configuration is a dictionary with the keys as package names and the values as lists of dependency names. This defaults to {}.
- cachito_request_file_logs_dir - the directory to write the request specific log files. If None, per request log files are not created. This defaults to None.
- cachito_request_file_logs_format - the format for the log messages of the request specific log files. This defaults to "[%(asctime)s %(name)s %(levelname)s %(module)s.%(funcName)s] %(message)s".
- cachito_request_file_logs_level - the log level for the request specific log files. This defaults to DEBUG.
- cachito_request_file_logs_perm - the log file permission for the request specific log files. This defaults to 0o660.
- cachito_request_lifetime - the number of days before a request that is in the complete state or that is stuck in the in_progress state will be marked as stale by the cachito-cleanup script. This defaults to 1.
- cachito_request_lifetime_failed - the number of days before a request that is in the failed state will be marked as stale by the cachito-cleanup script. This defaults to 7.
- cachito_sources_dir - the directory for long-term storage of app source archives. This configuration is required, and the directory must already exist and be writeable.
- cachito_task_log_format - the log format that Celery displays when a task is executing. This defaults to "[%(asctime)s #%(request_id)s %(name)s %(levelname)s %(module)s.%(funcName)s] %(message)s".
- cachito_subprocess_timeout - a number (in seconds) to set a timeout for commands executed by the subprocess module. Default is 3600 seconds. A timeout is always required, and there is no way provided by Cachito to disable it. Set a larger number to give the subprocess execution more time.
- cachito_otlp_exporter_endpoint - a valid URL, with a port number as necessary, to an OTLP/http-compatible endpoint to receive OpenTelemetry trace data.
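To make the options above more concrete, here is a sketch of what a small /etc/cachito/celery.py could look like. Every value is a placeholder chosen for this example (host names, paths, credentials, and the EXAMPLE_* variable names are all hypothetical), so adjust them to your deployment. The helper at the end is not part of the configuration file: it only illustrates how fnmatch-style wildcards in cachito_gomod_file_deps_allowlist could be evaluated, and it is not Cachito's actual implementation.

```python
from fnmatch import fnmatchcase

# --- Example worker configuration (placeholder values) ---
broker_url = "amqp://cachito:cachito@rabbitmq.domain.local:5672//"
cachito_api_url = "https://cachito-api.domain.local/api/v1/"
cachito_auth_type = "cert"
cachito_auth_cert = "/etc/cachito/worker-cert.pem"  # required because auth type is "cert"
cachito_bundles_dir = "/var/lib/cachito/bundles"  # must already exist and be writeable
cachito_sources_dir = "/var/lib/cachito/sources"  # must already exist and be writeable
cachito_log_level = "INFO"
cachito_subprocess_timeout = 3600

# package manager -> environment variable -> {"value": str, "kind": "path" | "literal"}
cachito_default_environment_variables = {
    "gomod": {
        "EXAMPLE_PATH_VAR": {"value": "deps/gomod", "kind": "path"},
        "EXAMPLE_LITERAL_VAR": {"value": "1", "kind": "literal"},
    },
}

# module name -> packages that the module may replace with local paths
cachito_gomod_file_deps_allowlist = {
    "github.com/org/app": ["github.com/org/*"],
}


# --- Illustration only, not part of the configuration file ---
def is_local_replacement_allowed(module_name, package_name, allowlist):
    """Return True if any allowlist pattern for the module matches the package."""
    patterns = allowlist.get(module_name, [])
    return any(fnmatchcase(package_name, pattern) for pattern in patterns)
```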
To configure the workers to use a Kerberos keytab for authentication, set the KRB5_CLIENT_KTNAME environment variable to the path of the keytab. Additional Kerberos configuration can be made in /etc/krb5.conf.
Custom configuration for the API:
- CACHITO_BUNDLES_DIR - the root of the bundles directory that is also accessible by the workers. This is used to download the bundle archives created by the workers.
- CACHITO_DEFAULT_PACKAGE_MANAGERS - the default package managers to use when no package managers are specified on a request. This defaults to ["gomod"].
- CACHITO_MAX_PER_PAGE - the maximum number of items in a page for paginated results.
- CACHITO_MUTUALLY_EXCLUSIVE_PACKAGE_MANAGERS - the list of pairs of mutually exclusive package managers (e.g. [("npm", "yarn"), ("gomod", "git-submodule")]). If two package managers are configured as mutually exclusive, then Cachito will validate that they do not process the same package in a request.
- CACHITO_PACKAGE_MANAGERS - the list of enabled package managers. This defaults to ["gomod"].
- CACHITO_REQUEST_FILE_LOGS_DIR - the directory to load the request specific log files. If None, per request log files information will not appear in the API response. This defaults to None.
- CACHITO_USER_REPRESENTATIVES - the list of usernames that are allowed to submit requests on behalf of other users.
- CACHITO_WORKER_USERNAMES - the list of usernames that are allowed to use the /requests/<id> PATCH endpoint.
- LOGIN_DISABLED - disables authentication requirements.
- CACHITO_OTLP_EXPORTER_ENDPOINT - a valid URL, with a port number as necessary, to an OTLP/http-compatible endpoint to receive OpenTelemetry trace data.
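The mutual exclusion rule can be pictured with a short sketch. It assumes the request's packages field has the shape used in the pip example request ({"pip": [{"path": "."}, ...]}) and only looks at explicitly listed paths. The find_conflicts helper is hypothetical and is not how Cachito itself performs this validation.

```python
import os.path


def find_conflicts(packages, mutually_exclusive):
    """
    Find package paths that two mutually exclusive package managers both claim.

    :param dict packages: package manager name -> list of {"path": ...} dictionaries
    :param list mutually_exclusive: pairs of package manager names
    :return: (first, second, path) tuples, one per conflicting path
    :rtype: list
    """

    def paths(pkg_manager):
        # Normalize so that "./web" and "web" are treated as the same package path.
        configs = packages.get(pkg_manager, [])
        return {os.path.normpath(config.get("path", ".")) for config in configs}

    conflicts = []
    for first, second in mutually_exclusive:
        for path in sorted(paths(first) & paths(second)):
            conflicts.append((first, second, path))
    return conflicts


exclusive = [("npm", "yarn"), ("gomod", "git-submodule")]
request_packages = {"npm": [{"path": "."}, {"path": "web"}], "yarn": [{"path": "./web"}]}
print(find_conflicts(request_packages, exclusive))
```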
Additionally, to configure the communication with the Cachito Celery workers, create a Python file at /etc/cachito/celery.py, and set the broker_url configuration to point to your RabbitMQ instance.
If you are planning to deploy Cachito with authentication enabled, you'll need to use a web server that supplies the REMOTE_USER environment variable when the user is properly authenticated. A common deployment option is using httpd (Apache web server) with the mod_auth_gssapi module.
- gomod-vendor - the flag to indicate the vendoring requirement for gomod dependencies. If present in the Cachito request, Cachito will run go mod vendor instead of go mod download to gather dependencies. See gomod vendoring for more details.
- gomod-vendor-check - like gomod-vendor, but if the vendor/ directory is already present, Cachito will refuse to make changes in your repository. Should be preferred over gomod-vendor.
- force-gomod-tidy - when used, Cachito will unconditionally run go mod tidy even when dependency replacements are not present.
- include-git-dir - when used, .git file objects are not removed from the source bundle created by Cachito. This is useful when the git history is important to the build process.
- cgo-disable - use this flag to make Cachito set CGO_ENABLED=0 while processing gomod packages. This environment variable will only be used internally by Cachito; it will not be set in the environment variables for the completed request. Typically, you will only want to use this if your package does use C files, and the Cachito request is failing.
- remove-unsafe-symlinks - this flag forces Cachito to remove all symlinks that point to a location outside of the cloned repository. If the flag isn't set, Cachito will raise a validation error right after cloning when such symlinks are present in the source.
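To check ahead of time whether a repository would be affected by the remove-unsafe-symlinks behavior, you could scan a local clone for symlinks that resolve outside of it. The sketch below is one way to do that with the standard library; find_unsafe_symlinks is a hypothetical helper and may differ from the check Cachito performs.

```python
import os


def find_unsafe_symlinks(repo_root):
    """
    Find symlinks whose targets resolve outside of the repository.

    :param str repo_root: path to the root of the cloned repository
    :return: sorted paths of the symlinks that point outside of repo_root
    :rtype: list
    """
    root = os.path.realpath(repo_root)
    unsafe = []
    # os.walk does not follow directory symlinks by default, but it still lists them.
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            path = os.path.join(dirpath, name)
            if not os.path.islink(path):
                continue
            target = os.path.realpath(path)
            if os.path.commonpath([root, target]) != root:
                unsafe.append(path)
    return sorted(unsafe)
```

An empty result means no symlink in the clone resolves to a location outside of it.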
The JavaScript (JS) package managers (npm, yarn) functionality relies on Nexus Repository Manager 3 to store JS dependencies. The Nexus instance will have a JS group repository (e.g. cachito-js) which points to a JS hosted repository (e.g. cachito-js-hosted) and a JS proxy repository (e.g. cachito-js-proxy) that points to the npm/yarn registry (registry.npmjs.org and registry.yarnpkg.com, which point to the same registry server). The hosted repository will contain all non-registry dependencies and the proxy repository will contain all dependencies from the JS registry. The union of these two repositories gives the set of all the JS dependencies ever encountered by Cachito.
On each request, Cachito will create a proxy repository to the JS group repository (e.g. cachito-js). Cachito will populate this proxy repository to contain the subset of
dependencies declared in the repository's lock file. Once populated, Cachito will block the
repository from getting additional content. This prevents the consumer of the repository from
installing something that was not declared in the lock file. This is further enforced by locking
down the repository to a single user created for the request, which the consumer will use. Please
keep in mind that for this to function properly, anonymous access needs to be disabled on the Nexus
instance or at least not set to have read access on all repositories.
These repositories and users created per request are deleted when the request is marked as stale or the request fails.
The pip package manager functionality relies on Nexus Repository Manager 3 to store pip dependencies. The Nexus instance will have a PyPI proxy repository (e.g. cachito-pip-proxy) that points to pypi.org and a raw repository (e.g. cachito-pip-raw) which will be used to store
external dependencies. The PyPI proxy repository will cache all PyPI packages that Cachito downloads
through it and the raw repository will hold tarballs or zip archives of external dependencies that
Cachito will upload after fetching them from the original locations.
On each request, Cachito will create a PyPI hosted repository and a raw repository, e.g. cachito-pip-hosted-1 and cachito-pip-raw-1. Cachito will upload all dependencies for the request
to these repositories (dependencies from PyPI to the hosted repository, external dependencies to the
raw one). Cachito will provide environment variables and configuration files that, when applied
to the user's environment, will allow them to install their dependencies from the above-mentioned
repositories. When installing dependencies from the Cachito-provided repositories, the user is
inherently blocked from installing anything that they did not declare as a dependency, because the
repositories will only contain content that Cachito has made available.
These repositories are created per request and deleted when the request is marked as stale or the request fails.
The RubyGems package manager functionality relies on Nexus Repository Manager 3 to store RubyGems dependencies. The Nexus instance has two repositories that act as long-term storage: a RubyGems proxy repository (e.g. cachito-rubygems-proxy) that points to rubygems.org and a raw repository (e.g. cachito-rubygems-raw) used for storing Git dependencies. The RubyGems proxy repository caches all RubyGems packages that Cachito downloads through it and the raw repository holds tarballs of Git dependencies that Cachito uploads there after fetching them from the original locations.
On each request, Cachito creates a RubyGems hosted repository (e.g. cachito-rubygems-hosted-1) and uploads all GEM dependencies for the request to it. This repository is created per request and deleted when the request is marked as stale or the request fails. Bundler is redirected to this repository, instead of a default RubyGems server, by providing a configuration file. Note that, unlike for other package managers, there is no request-specific repository for external dependencies; instead, these dependencies are installed from the downloaded bundle (see the Package Managers section for more details).
When installing dependencies from the Cachito-provided repositories, the user is inherently blocked from installing anything that they did not declare as a dependency, because the repositories will only contain content that Cachito has made available.
Refer to the "Configuring Workers" section to see how to configure Cachito to use Nexus. Please
note that you may choose to use two Nexus instances: one for hosting the permanent content and the
other for the ephemeral repositories created per request. This is useful if your organization
already has a shared Nexus instance but doesn't want Cachito to have near-admin-level access on it.
In this case, you will need to configure the following additional settings that point to the
Nexus instance that hosts the permanent content: cachito_nexus_hoster_username,
cachito_nexus_hoster_password, and cachito_nexus_hoster_url.
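As a rough illustration, the three additional settings might look like the fragment below. Only the setting names come from the text above; the URL and credentials are placeholders, and where the fragment lives depends on how your workers are configured (see the "Configuring Workers" section).

```python
# Hypothetical worker configuration fragment. The three setting names are the
# ones listed above; every value is a placeholder, not a real endpoint.
cachito_nexus_hoster_url = "https://nexus-permanent.example.com"
cachito_nexus_hoster_username = "cachito-hoster"
cachito_nexus_hoster_password = "replace-me"
```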
The table below shows the supported package managers and their support level in Cachito.
Feature | gomod | npm | pip | yarn | rubygems |
---|---|---|---|---|---|
Baseline | ✓ | ✓ | ✓ | ✓ | ✓ |
Content Manifest | ✓ | ✓ | ✓ | ✓ | ✓ |
Dependency Replacements | ✓ | x | x | x | x |
Dev Dependencies | ✓ | ✓ | ✓ | ✓ | x |
External Dependencies | N/A | ✓ | ✓ | ✓ | ✓ |
Multiple Paths | ✓ | ✓ | ✓ | ✓ | ✓ |
Nested Dependencies | ✓ | ✓ | x | ✓ | ✓ |
Offline Installations | ✓ | x | x | x | x |
- Baseline - The basic requirements are all met and this is ready for production use. This means that all dependencies from official sources declared in a lock file will be properly identified and shown in the REST API. The dependencies will be permanently stored by Cachito and be reused when a future request declares the same dependency. Additionally, Cachito will provide a mechanism for the application to be built using just the declared dependencies from Cachito. The dependency sources are also included in the bundle generated by Cachito for convenience so that the sources can be published alongside of the application for licensing requirements.
- Content Manifest - The /api/<version>/requests/<id>/content-manifest endpoint returns a Content Manifest JSON document that describes the application's dependencies and sources.
- Dependency Replacements - Dependency replacements can be specified when creating a Cachito request. This is a convenient feature to allow dependencies to be swapped without making changes in the source repository. Dependency replacement is only supported if a single package is referenced in the repository.
- Dev Dependencies - Cachito can distinguish between dependencies used for running the application and those used for building/testing it. For example, for the npm package manager, the application may require webpack to minify its JavaScript and CSS files, but webpack is not used at runtime.
- External Dependencies - External dependencies, i.e. those not from the default registry/package index, are supported. For example, for the npm package manager, the package-lock.json file may have a dependency installed directly from GitHub and not from the npm registry.
- Multiple Paths - Cachito supports a source repository with multiple applications within it. The paths within the source repository are provided by the user when creating the request.
- Nested Dependencies - Dependencies that are stored directly in the source Git repository. For example, npm allows file dependencies with the cachito_npm_file_deps_allowlist configuration. gomod allows this through the go.mod replace directive.
- Offline Installations - The dependencies can be installed solely with the contents of the bundle. This is true for the gomod package manager; however, the npm and pip package managers rely on Nexus to be online and properly configured by Cachito. If users were so inclined, they could find ways to do an offline install for any package manager, but only gomod supports this out of the box (i.e. the user does not need to change their workflow).
Tool | Version |
---|---|
Go* | 1.20.7, 1.23.0 (no workspace vendoring support) |
Npm | 9.5.0 |
Node | 18.16.1 |
Pip | 22.3.1 |
Python | 3.11.4 |
Git | 2.41.0 |
Yarn* | 1.x |
Bundler* | 2.x |
- Cachito does not use the Yarn runtime. The processing of yarn.lock files is handled by PYarn, which is compatible with any 1.x file.
- Cachito does not use the Ruby runtime (no Ruby code from Gemfiles is interpreted). The processing of Gemfile.lock files is handled by gemlock-parser.
- Starting with Go 1.21, Go changed the meaning of the go directive in the go.mod file slightly and made the constraint stricter: the line now denotes the minimum required version of Go instead of a suggested version. If a project recommending an older version of Go is processed with Go >=1.21, its own required version of Go may (depending on other dependencies) be bumped to 1.21+, which dirties the Git repo. To prevent this, Cachito uses two releases of the Go SDK concurrently.
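To make the Go note concrete, here is a minimal sketch of how a tool could read the go directive and pick between two SDK releases. The selection rule (older SDK for modules declaring a version below 1.21) is an assumption made for illustration, not a description of Cachito's internals; the helper names are hypothetical and the two version strings are taken from the table above.

```python
import re

# Assumption: these are the two Go releases from the table above, and the
# cut-over point is the 1.21 semantics change described in the note.
OLD_GO = "1.20.7"
NEW_GO = "1.23.0"


def go_directive(go_mod_text):
    """Return the (major, minor) version from the go directive, or None."""
    match = re.search(r"^go\s+(\d+)\.(\d+)", go_mod_text, flags=re.MULTILINE)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def pick_go_release(go_mod_text):
    """Hypothetical selection rule: use the older SDK for pre-1.21 modules."""
    version = go_directive(go_mod_text)
    if version is None or version < (1, 21):
        return OLD_GO
    return NEW_GO
```

With this rule, a go.mod that declares go 1.19 is processed with the older release, so the directive is never rewritten to 1.21+.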
The gomod package manager works by parsing the go.mod file present in the source repository to
determine which dependencies are required to build the application. By default, the top level module
is discovered, but optional paths can be provided to point Cachito to the module(s) to discover.
Cachito then downloads the dependencies through Athens so that they are permanently stored and, at the same time, creates a Go module cache to be stored in the request's bundle.
Cachito will produce a bundle that is downloadable at /api/v1/requests/<id>/download. This
bundle will contain the application source code in the app directory and the Go module cache of all
the dependencies in the deps/gomod directory.
Cachito will provide environment variables in the REST API to set for the Go tooling to use this cache when building the application.
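The bundle layout described above can be checked locally once the tarball has been downloaded. The following is a minimal sketch using only the Python standard library; the helper names are hypothetical, and the demo tarball is built in memory with placeholder files rather than fetched from a real Cachito instance.

```python
import io
import tarfile


def bundle_top_level_dirs(fileobj):
    """Return the sorted top-level directory names found in a bundle tarball."""
    with tarfile.open(fileobj=fileobj, mode="r:*") as tar:
        names = tar.getnames()
    return sorted({name.split("/", 1)[0] for name in names if name})


def has_gomod_cache(fileobj):
    """True if the tarball has at least one entry below deps/gomod/."""
    with tarfile.open(fileobj=fileobj, mode="r:*") as tar:
        return any(name.startswith("deps/gomod/") for name in tar.getnames())


def make_demo_bundle():
    """Build a tiny in-memory tarball that mimics the described layout."""
    buffer = io.BytesIO()
    with tarfile.open(fileobj=buffer, mode="w:gz") as tar:
        # Placeholder entries; a real bundle holds the sources and module cache.
        for path in ("app/go.mod", "deps/gomod/placeholder"):
            data = b"demo"
            info = tarfile.TarInfo(name=path)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
    buffer.seek(0)
    return buffer
```

In vendoring mode the second helper would be expected to return False, since the deps/gomod directory is then empty.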
When the user enables vendoring mode via the gomod-vendor or gomod-vendor-check flag, Cachito will
not build the module cache. The deps/gomod directory will be empty. Instead, the vendored modules
will be present in the main module's vendor directory. Check the official documentation about
vendoring for more details.
One important thing to note is that only a subset of the module dependency graph will be vendored.
As explained in the docs, only modules containing packages needed for building and testing the main
module will be present. Commands that expect the entire dependency graph to be available may not
work as expected, if at all. Notably, go mod tidy and other go mod commands ignore the vendor
directory and instead try to download the modules or access the module cache (which is empty).
When reporting Go sources, Cachito differentiates between modules and packages. To simplify a bit,
any directory that contains a go.mod file is a module and any directory that contains .go
files is a package. A directory that contains both go.mod and .go files is both a module and
a package. In Cachito, all packages should have parent modules (or be modules themselves).
In the JSON response at the /api/v1/requests/<id> endpoint, Go modules use the gomod type and Go
packages use the go-package type. Packages can be matched to their parent modules based on name; package
names always start with the module name. In the dependencies section of a Go package, Cachito
will list only the packages that were imported by that package (a.k.a. package level deps). In the
dependencies section of a Go module, Cachito will list all the modules specified as dependencies
in go.mod. Submodules are allowed to be replaced by a local module by default; no entry is required
in the cachito_gomod_file_deps_allowlist config variable.
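The name-based matching of packages to parent modules can be sketched as a prefix lookup. This is an illustrative helper, not part of Cachito; the function name is hypothetical, and choosing the longest matching module when modules are nested is an assumption.

```python
def parent_module(package_name, module_names):
    """Return the longest module name that prefixes package_name, or None."""
    candidates = [
        module
        for module in module_names
        # Match on a path boundary so "example.com/app2" is not a child of
        # "example.com/app".
        if package_name == module or package_name.startswith(module + "/")
    ]
    return max(candidates, key=len) if candidates else None
```

For example, with modules github.com/example/app and github.com/example/app/tools, the package github.com/example/app/tools/gen resolves to the second module, while github.com/example/app/pkg/util resolves to the first.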
In the Content Manifests shipped at the /api/v1/requests/<id>/content-manifest API endpoint, all
top-level purls and the purls of all dependencies refer to Go packages. The purls for the parent
Go modules of those dependencies are present in sources.
The npm package manager works by parsing the npm-shrinkwrap.json or package-lock.json file
present in the source repository to determine what dependencies are required to build the
application.
Cachito then creates an npm registry in an instance of Nexus it manages that contains just
the dependencies discovered in the lock file. The registry is locked down so that no other
dependencies can be added. The connection information is stored in an .npmrc file accessible at the
/api/v1/requests/<id>/configuration-files API endpoint.
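Once the configuration files have been fetched, they need to be placed next to the sources. The sketch below assumes the endpoint's JSON body is a list of objects with path, type and content keys, where content is base64-encoded; that shape is an assumption and should be verified against the API documentation. The helper names and the registry URL in the demo are hypothetical.

```python
import base64
import os
import tempfile


def write_configuration_files(config_files, root_dir):
    """Decode each entry and write it below root_dir; return written paths."""
    written = []
    root = os.path.realpath(root_dir)
    for entry in config_files:
        if entry.get("type") != "base64":
            raise ValueError(f"unsupported type: {entry.get('type')!r}")
        target = os.path.realpath(os.path.join(root, entry["path"]))
        # Refuse paths that would escape the destination directory.
        if os.path.commonpath([root, target]) != root:
            raise ValueError(f"path escapes root: {entry['path']!r}")
        os.makedirs(os.path.dirname(target), exist_ok=True)
        with open(target, "wb") as handle:
            handle.write(base64.b64decode(entry["content"]))
        written.append(target)
    return written


def demo():
    """Round-trip a made-up .npmrc through the helper in a temp directory."""
    payload = [{
        "path": "app/.npmrc",
        "type": "base64",
        "content": base64.b64encode(b"registry=https://nexus.example.com/\n").decode(),
    }]
    with tempfile.TemporaryDirectory() as tmp:
        (path,) = write_configuration_files(payload, tmp)
        with open(path, "rb") as handle:
            return os.path.relpath(path, os.path.realpath(tmp)), handle.read()
```

Running the helper against the unpacked bundle directory would place the .npmrc where npm picks it up during the build.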
Cachito will produce a bundle that is downloadable at /api/v1/requests/<id>/download. This
bundle will contain the application source code in the app directory and individual tarballs
of all the dependencies in the deps/npm directory. These tarballs are not meant to be used to
build the application. They are there for convenience so that the dependency sources can be
published alongside your application sources. In addition, they can be used to populate a local npm
registry in the event that the application needs to be built without Cachito and the Nexus instance
it manages.
Cachito can also handle dependencies that are not from the npm registry, such as those directly
from GitHub, a Git repository, or an HTTP(S) URL. Please note that if the dependency is from a
private repository, the .netrc and known_hosts files must be set up for the Cachito workers. If the
dependency location is not supported, Cachito will fail the request. When Cachito encounters a
supported location, it will download the dependency, modify the version in the package.json to
be unique, upload it to Nexus, and modify the top level project's package.json and lock files to
use the dependency from Nexus instead. The modified files will be accessible at the
/api/v1/requests/<id>/configuration-files API endpoint. If Cachito encounters this same dependency
again in a future request, it will use it directly from Nexus rather than downloading it and
uploading it again. This guarantees that any dependency used for a Cachito request can be used again
in a future Cachito request.
The pip package manager works by parsing the requirements.txt and requirements-build.txt files
present in the source repository to determine what dependencies are required to build the
application. It is possible to specify different file path(s) for the requirements files as long
as the files use the expected format.
Cachito then creates two repositories in an instance of Nexus it manages that contain just the
dependencies discovered in the requirements files. PyPI dependencies are uploaded to a PyPI hosted
repository; external dependencies are uploaded to a raw repository. Connection information for the
hosted repository is provided as the PIP_INDEX_URL environment variable accessible at the
/api/v1/requests/<id>/environment-variables endpoint. To make external dependencies available,
Cachito modifies the requirements files for the request by replacing relevant entries with their
corresponding URLs from the raw repository. The modified requirements files are accessible at the
/api/v1/requests/<id>/configuration-files endpoint.
Note that the PIP_INDEX_URL variable exposes the username and password of the temporary user
created for your request. This should not be a security concern: the user only has read access to
the repositories, and the only reason why we do not allow anonymous read access is a technical
limitation in Nexus.
Cachito will produce a bundle that is downloadable at /api/v1/requests/<id>/download. This
bundle will contain the application source code in the app directory and individual source
archives of all the dependencies in the deps/pip directory. These archives are not meant to be
used to build the application. They are there for convenience so that the dependency sources can be
published alongside your application sources. In addition, they can be used to install packages
directly from the filesystem with pip install --no-index --no-deps <path/to/archive> (for each
individual source archive) in the event that the application needs to be built without Cachito and
the Nexus instance it manages.
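The per-archive installation mentioned above can be scripted. The following sketch only builds the command lines; it assumes every file found under the unpacked deps/pip directory is a source archive, and the helper names are hypothetical.

```python
import os
import sys
import tempfile


def pip_install_commands(deps_dir):
    """Return one pip command (as an argv list) per archive in deps_dir."""
    commands = []
    for dirpath, _dirnames, filenames in os.walk(deps_dir):
        for filename in filenames:
            archive = os.path.join(dirpath, filename)
            commands.append(
                [sys.executable, "-m", "pip", "install", "--no-index", "--no-deps", archive]
            )
    # Sort for a deterministic order regardless of directory traversal order.
    return sorted(commands)


def demo():
    """List the archives that would be installed from a throwaway directory."""
    with tempfile.TemporaryDirectory() as tmp:
        for name in ("b-2.0.tar.gz", "a-1.0.tar.gz"):
            open(os.path.join(tmp, name), "wb").close()
        return [os.path.basename(cmd[-1]) for cmd in pip_install_commands(tmp)]
```

Each returned list can be passed to subprocess.run(cmd, check=True) to perform the actual installation.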
As mentioned above, Cachito can also handle dependencies that are not from PyPI, such as those from a Git repository or an HTTP(S) URL. After downloading such a dependency, Cachito will upload it to the Nexus instance used for hosting permanent content. If Cachito encounters this same dependency again in a future request, it will use it directly from Nexus rather than downloading it and uploading it again. This guarantees that any dependency used for a Cachito request can be used again in a future Cachito request.
Compared to gomod and npm, Cachito support for pip has restrictions and limitations that users may not expect. For more details, see the Cachito pip documentation.
With git-submodule as a package manager, Cachito is able to fetch the Git submodules of the requested repo and make them available in the Cachito API request response. The Git submodules are fetched before any other package managers are processed.
Cachito will produce a bundle that is downloadable at /api/v1/requests/<id>/download. This
bundle will contain the application source code in the app directory. When git-submodule
is passed as a pkg_managers argument for any Cachito request, the available Git submodules
within the requested repo will also become available as part of the downloadable bundle. If the
repo contains multiple submodules, Cachito will fetch them all. However, recursion is not supported,
so only one level of submodules will be fetched.
The Git submodules information will be included in the Cachito API request response at the
/api/v1/requests/<id> endpoint as packages with the git-submodule type.
Finally, the packages information will be used to compose Content Manifests shipped at the
/api/v1/requests/<id>/content-manifest API endpoint.
Examples:
curl -X POST -H "Content-Type: application/json" http://localhost:8080/api/v1/requests \
-d '{
"repo": "https://github.com/nirzari/retrodep.git",
"ref": "18002daac67f82f1a0f3b1f41beb3469f23116ea",
"pkg_managers": ["gomod", "git-submodule"]
}'
In the above case, the submodules tour and go-github within the specified retrodep repo are fetched
as part of the downloadable bundle. They are also listed as packages in the Cachito API request
response and become part of the Content Manifest.
If paths to specific Git submodules are provided as part of the packages configuration,
Cachito will fetch the submodules and then process them as regular packages.
curl "localhost:8080/api/v1/requests" \
-X POST \
-H 'content-type: application/json' \
-d '{
"repo": "https://github.com/chmeliik/cachito-sample-pip-package/",
"ref": "1ca07be3001450dbc4f0224e0f763c60353d0f01",
"pkg_managers": ["git-submodule", "pip", "npm"],
"packages": {
"pip": [
{"path": "cachito-pip-with-deps"}
],
"npm": [
{"path": "cachito-npm-test"}
]
}
}'
In the above case, Cachito would fetch the submodules cachito-pip-with-deps and cachito-npm-test and
then process them as regular pip and npm packages, respectively.
Cachito handles the yarn package manager in much the same way as the npm package manager.
The yarn package manager works by parsing the yarn.lock file present in the source repository to determine what dependencies are required to build the application.
All requests for the yarn package manager with package-lock.json or npm-shrinkwrap.json files in
the root directory will fail because those files are dedicated to npm.
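A local pre-check can catch this failure mode before a request is submitted. The sketch below is a hypothetical helper run against a checkout of the repository; it only looks for the two npm-only file names mentioned above in the given directory.

```python
import os
import tempfile

NPM_ONLY_FILES = ("package-lock.json", "npm-shrinkwrap.json")


def yarn_request_blockers(repo_root):
    """List npm-only lock files in repo_root that would fail a yarn request."""
    return [
        name
        for name in NPM_ONLY_FILES
        if os.path.isfile(os.path.join(repo_root, name))
    ]


def demo():
    """Show the check before and after adding an npm lock file."""
    with tempfile.TemporaryDirectory() as tmp:
        open(os.path.join(tmp, "yarn.lock"), "w").close()
        clean = yarn_request_blockers(tmp)
        open(os.path.join(tmp, "package-lock.json"), "w").close()
        return clean, yarn_request_blockers(tmp)
```

An empty list means none of the npm-only lock files were found in that directory.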
After parsing, Cachito creates a yarn registry in an instance of Nexus it manages that contains just
the dependencies discovered in the lock file. The registry is locked down so that no other
dependencies can be added. The connection information is stored in an .npmrc file accessible at the
/api/v1/requests/<id>/configuration-files API endpoint. Cachito also generates a .yarnrc file in the
same directory as the .npmrc file, overwriting any existing .yarnrc file.
Cachito will produce a bundle that is downloadable at /api/v1/requests/<id>/download. This
bundle will contain the application source code in the app directory and individual tarballs
of all the dependencies in the deps/yarn directory. These tarballs are not meant to be used to
build the application. They are there for convenience so that the dependency sources can be
published alongside your application sources. In addition, they can be used to populate a local yarn
registry in the event that the application needs to be built without Cachito and the Nexus instance
it manages.
Cachito can also handle dependencies that are not from the yarn registry, such as those directly
from GitHub, a Git repository, or an HTTP(S) URL. Please note that if the dependency is from a
private repository, the .netrc and known_hosts files must be set up for the Cachito workers. If the
dependency location is not supported, Cachito will fail the request. When Cachito encounters a
supported location, it will download the dependency, modify the version in the package.json to
be unique, upload it to Nexus, and modify the top level project's package.json and yarn.lock to
use the dependency from Nexus instead. The modified files will be accessible at the
/api/v1/requests/<id>/configuration-files API endpoint. If Cachito encounters this same dependency
again in a future request, it will use it directly from Nexus rather than downloading it and
uploading it again. This guarantees that any dependency used for a Cachito request can be used again
in a future Cachito request.
The Bundler package manager works by parsing the Gemfile.lock file present in the source
repository to determine what dependencies are required to build the application.
Cachito then creates a RubyGems repository in an instance of Nexus it manages that contains just the
GEM dependencies discovered in the lock file. Cachito also produces a bundle, downloadable at
/api/v1/requests/<id>/download, that contains an app directory with the application source code
(including PATH dependencies) and a deps/rubygems directory with all GEM and GIT dependencies.
Since multiple packages in a single repo are supported, a configuration file is provided for each
of these packages at the /api/v1/requests/<id>/configuration-files endpoint. This file redirects
Bundler to use the Nexus proxy for downloading GEM dependencies and contains an entry for every Git
dependency so that it is overridden by the corresponding dependency from deps/rubygems instead of
being downloaded from the internet (see local Git repos for more details).
If a GIT dependency is specified with branch: in the Gemfile, this branch is checked out so that
local GIT repo redirection works.
Note that configuration files expose the username and password of the temporary user created for your request. This should not be a security concern: the user only has read access to the repositories, and the only reason why we do not allow anonymous read access is a technical limitation in Nexus.
Cachito enforces several constraints on RubyGems packages; a request that does not meet them will fail with an exception at some point during processing:
- To prevent Cachito from downloading native content (binaries), Gemfile.lock has to contain only one platform in its PLATFORMS section, and it has to be ruby.
- All PATH dependencies listed in Gemfile.lock have to be explicitly allowed in Cachito's config file. For example, a package which is located at the subpath first_pkg/ from the root of a repository at URL github.com/cachito-testing/cachito-rubygems-multiple which has PATH dependency pathgem will be processed properly only if Cachito's config contains the following entry:
cachito_rubygems_file_deps_allowlist = {
"cachito-rubygems-multiple/first_pkg": ["pathgem"]
}
Note that the name of the package (the key in the dictionary) is the last component of its repo URL.
If the package isn't located in the root of the repo, then its /subpath is appended to the name
(/first_pkg in the example above). The value in the dictionary is an array of all PATH dependencies
of the given package, where the names are parsed from their .gemspec files (= names which are
listed in Gemfile.lock).
- Git dependencies must use https:// and specify the exact commit hash in the Gemfile.lock (it's done automatically by Bundler).
- As mentioned above, Cachito provides config files so that the user can simply unpack the bundle and run bundle install from the app directory. This config uses local Git repos redirection, but not all dependencies have a .gemspec file supporting this. To prevent failure during bundle install execution, check the .gemspec files of all GIT dependencies listed in Gemfile.lock and make sure that any require statements work relative to the .gemspec file of that dependency, ideally by using the require_relative keyword as suggested in this RubyGems guide.
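Two of the constraints above lend themselves to a quick local check: the allowlist key naming rule and the single ruby platform requirement. The sketch below is illustrative only; the helper names are hypothetical, the lock file parsing is a simplified reading of the PLATFORMS section rather than a full Gemfile.lock parser, and no normalization of the repo URL beyond trailing slashes is attempted.

```python
def allowlist_key(repo_url, subpath=""):
    """Build the allowlist key: last repo URL component plus optional subpath."""
    name = repo_url.rstrip("/").rsplit("/", 1)[-1]
    subpath = subpath.strip("/")
    return f"{name}/{subpath}" if subpath else name


def lockfile_platforms(gemfile_lock_text):
    """Return the entries listed under the PLATFORMS section of a Gemfile.lock."""
    platforms = []
    in_section = False
    for line in gemfile_lock_text.splitlines():
        if line.strip() == "PLATFORMS" and not line.startswith(" "):
            in_section = True
        elif in_section and line.startswith(" ") and line.strip():
            platforms.append(line.strip())
        elif in_section:
            break  # a blank line or a new section header ends PLATFORMS
    return platforms


def only_ruby_platform(gemfile_lock_text):
    """True when PLATFORMS contains exactly one entry and that entry is ruby."""
    return lockfile_platforms(gemfile_lock_text) == ["ruby"]
```

For the example from the text, allowlist_key("github.com/cachito-testing/cachito-rubygems-multiple", "first_pkg") yields cachito-rubygems-multiple/first_pkg, which matches the key used in the configuration entry.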
Cachito can be used without specifying a package manager in a request. In that case, only the source code present in the specified commit in a repository will be downloaded and cached.
Even if there are package manager definitions in the source code (such as a package.json or a
requirements.txt file), they will be ignored with this approach. Besides not being cached, the
dependencies will also be absent from the content manifest.
This approach can be useful in case there's need to cache and use only the actual source code for that commit, which will then be present in the tarball served by Cachito. Here's how to create a request without package managers:
curl "localhost:8080/api/v1/requests" \
-X POST \
-H 'content-type: application/json' \
-d '{
"repo": "https://github.com/cachito-testing/cachito-pip-with-deps/",
"ref": "56efa5f7eb4ff1b7ea1409dbad76f5bb378291e6",
"pkg_managers": []
}'
It is important to use an empty array in the pkg_managers key, since omitting it will make Cachito fall back
to a default package manager.
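The difference between an empty pkg_managers array and an omitted key is easy to lose when the request body is built programmatically. The following sketch shows one way to keep the two cases apart; the helper name is hypothetical, and only the repo, ref, pkg_managers and packages keys shown in the examples above are used.

```python
import json


def build_request_payload(repo, ref, pkg_managers=None, packages=None):
    """Build the JSON body for POST /api/v1/requests.

    pkg_managers=None omits the key entirely (Cachito then falls back to its
    default package manager); pkg_managers=[] sends an explicit empty array.
    """
    payload = {"repo": repo, "ref": ref}
    if pkg_managers is not None:
        payload["pkg_managers"] = list(pkg_managers)
    if packages:
        payload["packages"] = packages
    return json.dumps(payload)
```

Testing with "is not None" rather than truthiness is the important detail: a plain "if pkg_managers:" check would silently drop the empty list.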
By default, the Git history is omitted from the tarball, but it can be included by using the include-git-dir flag.