# Evaluation of text embedding techniques
- Create and switch to the virtual environment:

      cd text_evaluation
      make create_environment
      conda activate text_evaluation
      make requirements

- Explore the notebooks in the `notebooks` directory.
Project organization:

- `LICENSE`
- `Makefile` - Top-level Makefile. Type `make` for a list of valid commands.
- `README.md` - This file.
- `data` - Data directory, often symlinked to a filesystem with lots of space.
- `data/raw` - Raw (immutable) hash-verified downloads.
- `data/interim` - Extracted and interim data representations.
- `data/processed` - The final, canonical data sets for modeling.
- `docs` - A default Sphinx project; see sphinx-doc.org for details.
- `models` - Trained and serialized models, model predictions, or model summaries.
- `models/trained` - Trained models.
- `models/output` - Predictions and transformations from the trained models.
- `notebooks` - Jupyter notebooks. Naming convention is a number (for ordering), the creator's initials, and a short `-`-delimited description, e.g. `1.0-jqp-initial-data-exploration`.
- `references` - Data dictionaries, manuals, and all other explanatory materials.
- `reports` - Generated analysis as HTML, PDF, LaTeX, etc.
- `reports/figures` - Generated graphics and figures to be used in reporting.
- `reports/tables` - Generated data tables to be used in reporting.
- `reports/summary` - Generated summary information to be used in reporting.
- `requirements.txt` - (If using pip+virtualenv) The requirements file for reproducing the analysis environment, e.g. generated with `pip freeze > requirements.txt`.
- `environment.yml` - (If using conda) The YAML file for reproducing the analysis environment.
- `setup.py` - Turns the contents of `src` into a pip-installable Python module (`pip install -e .`) so it can be imported in Python code (see the example after this listing).
- `src` - Source code for use in this project.
- `src/__init__.py` - Makes `src` a Python module.
- `src/data` - Scripts to fetch or generate data. In particular:
- `src/data/make_dataset.py` - Run with `python -m src.data.make_dataset fetch` or `python -m src.data.make_dataset process` (a sketch of such an entry point appears after this listing).
- `src/analysis` - Scripts to turn datasets into output products.
- `src/models` - Scripts to train models and then use trained models to make predictions, e.g. `predict_model.py`, `train_model.py`.
- `tox.ini` - tox file with settings for running tox; see tox.testrun.org
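As the listing notes, `setup.py` makes the contents of `src` pip-installable, so project code can be imported from notebooks or scripts without path manipulation. A minimal sketch, assuming the editable install has already been done (e.g. `pip install -e .` from the repository root; whether `make requirements` performs this depends on the Makefile):

```python
# After `pip install -e .`, modules under src/ are importable from anywhere,
# e.g. from a notebook in notebooks/.
from src.data import make_dataset  # module listed in the layout above

# Inspect what the module actually provides; specific helper names are not
# assumed here, since they depend on this repository's implementation.
print(make_dataset.__file__)
print([name for name in dir(make_dataset) if not name.startswith("_")])
```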
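`src/data/make_dataset.py` is run as `python -m src.data.make_dataset fetch` or `python -m src.data.make_dataset process`. The real implementation lives in this repository; the sketch below only illustrates what a module with those two subcommands might look like, and the `fetch_raw_data` and `process_raw_data` helpers are hypothetical names, not functions confirmed to exist here.

```python
"""Illustrative sketch of a fetch/process command-line entry point (not the actual script)."""
import argparse


def fetch_raw_data():
    # Hypothetical: download hash-verified files into data/raw/.
    print("fetching raw data into data/raw/ ...")


def process_raw_data():
    # Hypothetical: turn data/raw/ into data/interim/ and data/processed/.
    print("processing raw data into data/processed/ ...")


def main():
    parser = argparse.ArgumentParser(description="Fetch or process the datasets.")
    parser.add_argument("command", choices=["fetch", "process"],
                        help="fetch raw data, or process it into final datasets")
    args = parser.parse_args()
    if args.command == "fetch":
        fetch_raw_data()
    else:
        process_raw_data()


if __name__ == "__main__":
    main()
```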
This project was built using cookiecutter-easydata, an experimental fork of [cookiecutter-data-science](https://github.com/drivendata/cookiecutter-data-science) aimed at making your data science workflow reproducible.