The Transformer architecture, introduced in the paper "Attention Is All You Need," has become a cornerstone of many natural language processing tasks. This project implements a Transformer model from scratch using PyTorch.
The Transformer model consists of the following components:
- Encoder
- Decoder
- Multi-Head Attention
- Position-wise Feed-Forward Networks
- Positional Encoding
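As an illustration of one of these components, here is a minimal sketch of the sinusoidal positional encoding described in the paper. The class name and details are illustrative; the project's actual implementation lives in `src/model/`:

```python
import math
import torch
import torch.nn as nn

class PositionalEncoding(nn.Module):
    """Sinusoidal positional encoding (illustrative sketch)."""

    def __init__(self, d_model: int, max_seq_length: int = 5000):
        super().__init__()
        pe = torch.zeros(max_seq_length, d_model)
        position = torch.arange(0, max_seq_length, dtype=torch.float).unsqueeze(1)
        # Geometric progression of wavelengths, as in the paper
        div_term = torch.exp(
            torch.arange(0, d_model, 2, dtype=torch.float)
            * (-math.log(10000.0) / d_model)
        )
        pe[:, 0::2] = torch.sin(position * div_term)
        pe[:, 1::2] = torch.cos(position * div_term)
        # Saved as a buffer (not a parameter); shape: (1, max_seq_length, d_model)
        self.register_buffer("pe", pe.unsqueeze(0))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_length, d_model); add the encoding for each position
        return x + self.pe[:, : x.size(1)]
```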
- `src/`: Contains the source code for the Transformer model
  - `model/`: Transformer model components
  - `utils/`: Utility functions for data processing
  - `train.py`: Script for training the Transformer
  - `translate.py`: Script for using the trained model for translation
- `tests/`: Unit tests for model components
- `data/`: Directory to store dataset files
- `vocab/`: Directory to store vocabulary files
1. Clone the repository:

   ```bash
   git clone https://github.com/yourusername/transformer-from-scratch.git
   cd transformer-from-scratch
   ```

2. Create a virtual environment:

   ```bash
   python -m venv transformer
   ```

3. Activate the virtual environment:

   - On Windows:

     ```bash
     transformer\Scripts\activate
     ```

   - On macOS and Linux:

     ```bash
     source transformer/bin/activate
     ```
4. Install dependencies:

   ```bash
   pip install -r requirements.txt
   ```
5. Place your English sentences in `data/english_sentences.txt` and your French sentences in `data/french_sentences.txt`.

6. Create the vocabularies:

   ```bash
   python src/create_vocab.py
   ```
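The details live in `src/create_vocab.py`, but since SentencePiece is a listed dependency and the vocabularies are stored as `.model` files (e.g. `vocab/english.model`), vocabulary creation plausibly looks like this sketch (the `vocab_size` value is illustrative):

```python
import sentencepiece as spm

# Train one SentencePiece model per language; this produces vocab/english.model
# and vocab/french.model (plus .vocab files). vocab_size is illustrative.
spm.SentencePieceTrainer.train(
    input="data/english_sentences.txt",
    model_prefix="vocab/english",
    vocab_size=8000,
)
spm.SentencePieceTrainer.train(
    input="data/french_sentences.txt",
    model_prefix="vocab/french",
    vocab_size=8000,
)
```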
To train the Transformer model, run:

```bash
python src/train.py
```

This will start the training process and save model checkpoints in the `saved_models/` directory.
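The exact training loop is defined in `src/train.py`; as orientation, a checkpointing skeleton for a sequence-to-sequence Transformer typically looks like the following sketch. The helper signature, padding index, and all checkpoint names other than `final_model.pth` are assumptions:

```python
import os
import torch
from torch import nn, optim
from torch.utils.data import DataLoader

def train(model: nn.Module, loader: DataLoader,
          num_epochs: int, learning_rate: float) -> None:
    """Illustrative training/checkpointing skeleton, not the project's exact loop."""
    os.makedirs("saved_models", exist_ok=True)
    optimizer = optim.Adam(model.parameters(), lr=learning_rate)
    criterion = nn.CrossEntropyLoss(ignore_index=0)  # assumes pad token id 0
    for epoch in range(num_epochs):
        for src, tgt in loader:
            # Teacher forcing: feed tgt[:, :-1] and predict tgt[:, 1:]
            logits = model(src, tgt[:, :-1])
            loss = criterion(logits.reshape(-1, logits.size(-1)),
                             tgt[:, 1:].reshape(-1))
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
        torch.save(model.state_dict(), f"saved_models/epoch_{epoch + 1}.pth")
    torch.save(model.state_dict(), "saved_models/final_model.pth")
```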
After training the model, you can use it for translation:

```bash
python src/translate.py
```
Example usage in your code:

```python
from src.translate import Translator

translator = Translator(
    model_path="saved_models/final_model.pth",
    src_vocab_path="vocab/english.model",
    tgt_vocab_path="vocab/french.model",
    device="cuda",  # or "cpu" if you don't have a GPU
)

english_sentence = "Hello, how are you?"
french_translation = translator.translate(english_sentence)
print(f"English: {english_sentence}")
print(f"French: {french_translation}")
```
- To build the Docker images:

  ```bash
  docker-compose build
  ```

- To run the training service:

  ```bash
  docker-compose up transformer
  ```

- To run the translation service:

  ```bash
  docker-compose up translator
  ```
To run the unit tests, execute:

```bash
python -m unittest discover tests
```
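Tests in this style typically check tensor shapes and invariants; a sketch of one such test might look like this (the import path and class signature are hypothetical and should be adjusted to the project's actual layout):

```python
import unittest
import torch

class TestPositionalEncoding(unittest.TestCase):
    def test_preserves_shape(self):
        # Hypothetical import path; adjust to the actual module layout.
        from src.model.positional_encoding import PositionalEncoding

        pe = PositionalEncoding(d_model=512, max_seq_length=100)
        x = torch.zeros(2, 10, 512)
        # Adding positional encodings must not change the tensor shape
        self.assertEqual(pe(x).shape, (2, 10, 512))

if __name__ == "__main__":
    unittest.main()
```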
You can customize the model by modifying the hyperparameters in `src/train.py`. The main hyperparameters are:

- `src_vocab_size`: Size of the source vocabulary
- `tgt_vocab_size`: Size of the target vocabulary
- `d_model`: Dimensionality of the model
- `num_heads`: Number of attention heads
- `num_layers`: Number of encoder and decoder layers
- `d_ff`: Dimensionality of the feed-forward network
- `dropout`: Dropout rate
- `max_seq_length`: Maximum sequence length
- `batch_size`: Batch size for training
- `num_epochs`: Number of training epochs
- `learning_rate`: Learning rate for the optimizer
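For orientation, a configuration using the base settings from "Attention Is All You Need" might look like the following; all values are illustrative, and the actual defaults are whatever `src/train.py` sets:

```python
# Illustrative hyperparameters only; edit the corresponding values in src/train.py.
config = {
    "src_vocab_size": 8000,   # must match the trained source vocabulary
    "tgt_vocab_size": 8000,   # must match the trained target vocabulary
    "d_model": 512,           # base-model size from the paper
    "num_heads": 8,
    "num_layers": 6,
    "d_ff": 2048,
    "dropout": 0.1,
    "max_seq_length": 100,
    "batch_size": 32,
    "num_epochs": 10,
    "learning_rate": 1e-4,
}
```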
This project requires the following main dependencies:
- PyTorch
- NumPy
- tqdm
- matplotlib
- sentencepiece
For a complete list of dependencies, please refer to the `requirements.txt` file.
Contributions to this project are welcome. Please feel free to submit a Pull Request.
This project is open source and available under the MIT License.
- The Transformer architecture is based on the paper "Attention Is All You Need" by Vaswani et al.
- This implementation was inspired by various open-source Transformer implementations and tutorials available in the community.