Table of Contents
- 1. Getting the code
- 2. Tool for Python environments
- 3. Setting up the environment
- 4. Preparing data and models
- 5. Finally…
To retrieve the code that we will use in the workshop, you have two options:
- Without using Git
  - Go to the GitHub repository page.
  - Download the entire repository from the GitHub web interface as a ZIP file to your computer. You can find the Download ZIP option when you click on the <> Code drop-down menu.
  - Once the archive is downloaded, extract its contents to your desired directory.
- Using Git
  - Open your Terminal or Command Prompt. On Windows, you can use Command Prompt or PowerShell. On macOS and Linux, you can use the Terminal.
  - Navigate to the directory where you want to clone the repository.
  - Clone the repository with the command:
    git clone https://github.com/KISZ-BB/kisz-nlp-embeddings
  - And that’s it! You now have the code for the workshop on your local machine.
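If you prefer to script the Git option, the following is a minimal sketch in Python. The helper names `git_available`, `clone_command` and `fetch_repo` are hypothetical and not part of the workshop code; only the repository URL comes from the instructions above.

```python
import shutil
import subprocess
from typing import List, Optional

# Repository URL taken from the instructions above.
REPO_URL = "https://github.com/KISZ-BB/kisz-nlp-embeddings"


def git_available() -> bool:
    """Return True if a `git` executable can be found on the PATH."""
    return shutil.which("git") is not None


def clone_command(repo_url: str = REPO_URL, target: Optional[str] = None) -> List[str]:
    """Build the `git clone` command, optionally with a target directory."""
    command = ["git", "clone", repo_url]
    if target is not None:
        command.append(target)
    return command


def fetch_repo(target: Optional[str] = None) -> None:
    """Clone the repository, or fail with a hint to use the ZIP download."""
    if not git_available():
        raise RuntimeError("git was not found; download the ZIP file instead")
    # check=True raises CalledProcessError if the clone fails.
    subprocess.run(clone_command(target=target), check=True)
```

Calling `fetch_repo()` from the directory where you want the code is then equivalent to typing the `git clone` command by hand.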
Side Note: If you have problems with any of the following steps, you can open an issue in this repository and tell us about it.
To avoid reproducibility and package dependency problems, you will use a custom Python environment (created with Miniconda) during the workshop. You can find the requirements for the environment in the environment.yml file in the envs folder.
We recommend setting up the environment with Miniconda for several reasons:
- Conda provides a compact and efficient way to create and manage Python environments.
- Miniconda provides a minimal installation, in contrast to Anaconda, which automatically installs the most common packages for data science.
- Even though installation with venv and pip is also possible, some packages like faiss are officially distributed only through conda.
If you don't have Miniconda, you can find the installation instructions here.
Once you have Miniconda installed, you can open the Miniconda shell. You can usually find it under the name Anaconda prompt (miniconda). Alternatively, you can use the Miniconda PowerShell. You should see the tag (base) at the start of your prompt.
Warning: The environment can easily take up 10 or 11 GB on your hard drive, and we will also need space for downloading some additional files, so please make sure you have enough free space.
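If you are unsure how much space is left, a quick check along these lines can help. This is a minimal sketch using only the Python standard library; the 15 GB threshold is an assumption (the environment size mentioned above plus some headroom for the additional downloads), not a figure from the workshop.

```python
import shutil


def free_space_gb(path: str = ".") -> float:
    """Return the free space, in GiB, on the drive that contains `path`."""
    return shutil.disk_usage(path).free / 1024**3


def has_enough_space(path: str = ".", required_gb: float = 15.0) -> bool:
    """Check whether at least `required_gb` GiB are free (15 is an assumed threshold)."""
    return free_space_gb(path) >= required_gb


if __name__ == "__main__":
    print(f"Free space: {free_space_gb():.1f} GiB")
    print("Enough space" if has_enough_space() else "Please free up some space first")
```

Run it from the folder where you put the code, so that the check applies to the drive that will hold the environment.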
We have prepared everything in a Makefile. To create the conda environment and activate it, run the following command in the folder where you put the code:
make
If everything went well, you should now see the tag (.embed_env) in your prompt. If it did not work, we will have to do it manually. Follow these steps:
- Check that there is no subfolder in envs. If you find any subfolder, please delete it.
- Create the environment with the required packages using the command
conda env create -f ./envs/environment.yml -p ./envs/.embed_env
- Activate the environment with the command
conda activate ./envs/.embed_env
- Check that you see the tag (.embed_env) in your prompt.
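Besides looking at the prompt, you can confirm from Python that the right environment is active. The sketch below assumes the environment lives in a folder named `.embed_env`, as in the commands above; the helper `is_env_active` is hypothetical. It relies on the `CONDA_PREFIX` variable that `conda activate` sets, and falls back to the interpreter's own prefix.

```python
import os
import sys
from pathlib import Path
from typing import Optional

# Folder name of the environment created by the commands above.
ENV_NAME = ".embed_env"


def is_env_active(prefix: Optional[str] = None, env_name: str = ENV_NAME) -> bool:
    """Return True if the active environment prefix ends in `env_name`."""
    # Prefer an explicit prefix, then CONDA_PREFIX, then the running interpreter.
    prefix = prefix or os.environ.get("CONDA_PREFIX") or sys.prefix
    return Path(prefix).name == env_name


if __name__ == "__main__":
    print("Environment active" if is_env_active() else "Environment NOT active")
```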
As a sanity check, if you want to be sure that all packages were installed properly and in the right versions, you can run the command
pytest -m env -v -p no:warnings
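The repository's own environment tests are not shown here, so the following is only a sketch of the kind of check such a test could perform, written with the standard `importlib.metadata` module. The function `find_version_problems` and the package name and version in the usage note are hypothetical examples.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Dict


def find_version_problems(expected: Dict[str, str]) -> Dict[str, str]:
    """Compare installed package versions against expected version prefixes.

    Returns a dict mapping each problematic package to a short description;
    an empty dict means every package matched.
    """
    problems = {}
    for name, wanted in expected.items():
        try:
            installed = version(name)
        except PackageNotFoundError:
            problems[name] = "not installed"
            continue
        # Accept an exact match or a more specific release, e.g. "1.26" matches "1.26.4".
        if installed != wanted and not installed.startswith(wanted + "."):
            problems[name] = f"found {installed}, expected {wanted}"
    return problems
```

For example, `find_version_problems({"numpy": "1.26"})` would return an empty dict if a 1.26.x release of NumPy is installed, and a one-entry dict describing the mismatch otherwise.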
We will be using some data, models and tools that need to be downloaded before the workshop. Please be aware that running the following instructions can take several minutes.
We assume you still have the Miniconda prompt open, so just run the following command:
python ./src/setup.py
This will install the basic data and models.
Side note: You might be interested in installing other models, tools or datasets that are mentioned in the workshop. In that case, feel free to edit the setup.py file in the src folder: uncomment the lines corresponding to the models you want to preload, save the file, and run the command above again.
Congratulations! Everything is prepared and you are set up for the workshop.
If something went wrong or did not run as expected, if you found bugs or incomplete documentation, or even if you want to propose new features, topics or ways of improving the workshop or the code, feel free to open an issue in the GitHub repository and we will get back to you as soon as possible.
Thank you for your attention and collaboration, and we hope you enjoy the workshop.