This is an experimental weekend project for testing purposes.
Wikibert is a project that demonstrates text generation using Wikipedia data. It consists of two main components:
- A Python script (`get_data.py`) for fetching and saving Wikipedia pages.
- A Jupyter notebook (`wikibert.ipynb`) for training a text generation model using TensorFlow.
The `get_data.py` script performs the following tasks:
- Fetches a Wikipedia page and its linked pages recursively up to a specified depth.
- Saves the content of each page into an individual text file in a `data` folder.
- Sanitizes filenames to ensure compatibility with the file system.
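The steps above can be sketched roughly as follows. This is a minimal sketch, not the actual script: the function names, the depth-limited recursion, and the user agent string are assumptions, though the `wikipediaapi` calls themselves follow the wikipedia-api library's documented interface:

```python
import os
import re


def sanitize_filename(title: str) -> str:
    """Replace characters that are invalid in common file systems with underscores."""
    return re.sub(r'[\\/:*?"<>|]', "_", title)


def fetch_recursive(title: str, depth: int, out_dir: str = "data", seen=None):
    """Fetch a page and its linked pages up to `depth`, saving each as a .txt file."""
    import wikipediaapi  # pip install wikipedia-api

    if seen is None:
        seen = set()
    if depth < 0 or title in seen:
        return
    seen.add(title)

    # The library requires a descriptive user agent; the value here is illustrative.
    wiki = wikipediaapi.Wikipedia(user_agent="wikibert-demo", language="en")
    page = wiki.page(title)
    if not page.exists():
        return

    os.makedirs(out_dir, exist_ok=True)
    path = os.path.join(out_dir, sanitize_filename(title) + ".txt")
    with open(path, "w", encoding="utf-8") as f:
        f.write(page.text)

    # page.links maps linked page titles to WikipediaPage objects
    for linked_title in page.links:
        fetch_recursive(linked_title, depth - 1, out_dir, seen)
```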
The `wikibert.ipynb` notebook includes the following steps:
- Loads and preprocesses text data from the saved Wikipedia pages.
- Builds and trains a GRU-based RNN model for text generation using TensorFlow.
- Saves model checkpoints during training.
- Generates text using the trained model.
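The model-building step might look something like this. This is a minimal sketch assuming a token-ID-in, next-token-logits-out setup similar to TensorFlow's text generation tutorial; the layer sizes and the `build_model` name are illustrative, not taken from the notebook:

```python
import tensorflow as tf


def build_model(vocab_size: int, embedding_dim: int = 256, rnn_units: int = 1024):
    """A small GRU-based model mapping token IDs to next-token logits."""
    return tf.keras.Sequential([
        tf.keras.layers.Embedding(vocab_size, embedding_dim),
        tf.keras.layers.GRU(rnn_units, return_sequences=True),
        tf.keras.layers.Dense(vocab_size),  # unnormalized logits over the vocabulary
    ])


model = build_model(vocab_size=100)
model.compile(
    optimizer="adam",
    loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
)
```

During training, checkpoints can be saved with a `tf.keras.callbacks.ModelCheckpoint` callback passed to `model.fit`.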
- Run `get_data.py` to fetch and save Wikipedia pages.
- Open `wikibert.ipynb` in Jupyter Notebook or Google Colab.
- Follow the cells in the notebook to train the text generation model and generate text.
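In shell form, the workflow above amounts to something like the following (assuming `get_data.py` runs without required arguments; check the script itself for its actual options):

```shell
# 1. Fetch Wikipedia pages into the data/ folder
python get_data.py

# 2. Open the notebook locally (or upload it to Google Colab instead)
jupyter notebook wikibert.ipynb
```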
- Python 3.x
- `wikipedia-api` library
- TensorFlow
- Jupyter Notebook (for `wikibert.ipynb`)
Install the required libraries using pip:
```shell
pip install wikipedia-api tensorflow
```
This project is licensed under the MIT License. See the LICENSE file for details.