Proof-of-concept for linking media content to create a network joining people of similar interests together.
To run the code in here, ensure your system meets the following requirements:
- Unix-like operating system (macOS, Linux, ...);
direnv
installed, including shell hooks;.envrc
allowed/trusted bydirenv
to use the environment variables - see below;- Python 3.8 or above; and
- Poetry installed.
Note there may be some Python IDE-specific requirements around loading environment variables, which are not considered here.
For quickstart set-up of the project, run the below in your shell:
# 1. read project-specific environment variables
direnv allow
# 2. activate virtual environment and install package dependencies
poetry shell
poetry install
# 3. check adherence to good standards on every commit
pre-commit install
Otherwise, read on below to understand each part in more detail.
To read project-specific environment variables, allow/trust the .envrc
run the below in your shell at the top-level of this directory (where this README.md
is located).
direnv allow
Note: If you are using PyCharm, then you will need to apply a few more steps before running
direnv allow
in your shell:
- In your shell, run
poetry add python-dotenv
.- On PyCharm, click
PyCharm
->Preferences
->Plugins
and download theEnvFile
plugin.- On PyCharm, edit your configuration to
Enable EnvFile
by ticking the checkbox.- On PyCharm, click
PyCharm
->Preferences
->Build, Execution, Deployment
->Console
->Python Console
and in theStarting script
section, add the following Python code:
from dotenv import load_dotenv
load_dotenv()
This repository uses Poetry to manage virtual environments and package dependencies. To set this up so you are working within the same Python version and with the same package versions, run the following in your shell:
# activate virtual environment
poetry shell
# install dependencies in poetry.lock
poetry install
More information can be found on the Poetry docs.
Note: If you are using PyCharm, then you will need to apply a few more steps to use this Poetry environment:
- On PyCharm, click
PyCharm
->Preferences
->Plugins
and download thePoetry
by @koxudaxi plugin.- On PyCharm, add the existing Poetry environment to the project.
When adding new packages using Poetry, run the following in your shell:
poetry add <package_name>
This should then update your poetry.lock
file with the new package installed. Ensure this change is also version-controlled.
This repo is configured so pre-commit
is run automatically on every commit. By running on each commit, we ensure that pre-commit
will be able to detect all contraventions and keep our repo in a healthy state.
In order for pre-commit
to run, configure it on your system by running the following in your shell.
pre-commit install
If pre-commit
detects any secrets when you try to create a commit, it will detail what it found and where to go to check the secret.
If the detected secret is a false-positive, you should update the secrets baseline through the following steps:
- Run
detect-secrets scan --update .secrets.baseline
to index the false-positive(s); - Next, audit all indexed secrets via
detect-secrets audit .secrets.baseline
(the same as during initial set-up, if a secrets baseline doesn't exist); and - Finally, ensure that you commit the updated secrets baseline in the same commit as the false-positive(s) it has been updated for.
If the detected secret is actually a secret (or other sensitive information), remove the secret and re-commit. There is no need to update the secrets baseline in this case.
If your commit contains a mixture of false-positives and actual secrets, remove the actual secrets first before updating and auditing the secrets baseline.
It may be necessary or useful to keep certain output cells of a Jupyter notebook, for example charts or graphs visualising some set of data. To do this, add the following comment at the top of the input block:
# [keep_output]
This will tell pre-commit
not to strip the resulting output of this cell, allowing it to be committed.
To hold secrets like API credentials without exposing them in the repo, create a .secrets
file like this example. This should be automatically read by the .envrc
file.
For instance, if you want to download data onto your system directly through the Kaggle API, you will need credentials to connect to the Kaggle API. To do this, go the Account Tab (https://www.kaggle.com//account) and click ‘Create API Token’. This will download a kaggle.json
file. Copy and paste the entries into your .secrets
file, by populating it with the following:
export KAGGLE_USERNAME="<your_user_name>"
export KAGGLE_KEY="<your_api_key>"