Added README.rst
batzner committed Aug 30, 2017
1 parent b35b76e commit ba6aa48
Showing 4 changed files with 125 additions and 7 deletions.
1 change: 1 addition & 0 deletions MANIFEST.in
@@ -0,0 +1 @@
include LICENSE.txt
6 changes: 4 additions & 2 deletions README.md
@@ -6,8 +6,10 @@ Generate Shakespeare poems with 4 lines of code.

## Installation

-    pip install tensorflow>=1.1
-    pip install tensorlm
+`tensorlm` is written in / for Python 3.4+
+
+    pip3 install "tensorflow>=1.1"
+    pip3 install tensorlm

## Basic Usage

99 changes: 99 additions & 0 deletions README.rst
@@ -0,0 +1,99 @@
tensorlm
========

Generate Shakespeare poems with 4 lines of code.

<a href="http://www.mlowl.com/post/character-language-model-lstm-tensorflow/" target="_blank">[![showcase
of the package]]</a>

Installation
------------

`tensorlm` is written in / for Python 3.

pip3 install "tensorflow>=1.1"
pip3 install tensorlm

Basic Usage
-----------

Use the `CharLM` or `WordLM` class:

``` {.python}
import tensorflow as tf
from tensorlm import CharLM

with tf.Session() as session:
    # Create a new model. You can also use WordLM
    model = CharLM(session, "datasets/sherlock/train.txt", max_vocab_size=96,
                   neurons_per_layer=100, num_layers=3, num_timesteps=15)

    # Train it
    model.train(session, max_epochs=5, max_steps=500, print_logs=True)

    # Let it generate a text
    generated = model.sample(session, "The ", num_steps=100)
    print("The " + generated)
```

This should output something like:

The eee ee ee ee e e ee ee e e e e e e e e e e

Command Line Usage
------------------

**Train:**\
`python3 -m tensorlm.cli --train=True --level=char --train_text_path=datasets/sherlock/train.txt --max_vocab_size=96 --neurons_per_layer=100 --num_layers=3 --batch_size=10 --num_timesteps=160 --save_dir=out/model --max_epochs=300 --save_interval_hours=0.5`

**Sample:**\
`python3 -m tensorlm.cli --sample=True --level=char --neurons_per_layer=400 --num_layers=3 --num_timesteps=160 --save_dir=out/model`

**Evaluate:**\
`python3 -m tensorlm.cli --evaluate=True --level=char --evaluate_text_path=datasets/sherlock/valid.txt --neurons_per_layer=400 --num_layers=3 --batch_size=10 --num_timesteps=160 --save_dir=out/model`

See `python3 -m tensorlm.cli --help` for all options.

Advanced Usage
--------------

### Custom Input Data

The inputs and targets don’t have to be text. `GeneratingLSTM` only
expects token ids, so you can use any data type for the sequences, as
long as you can encode the data to integer ids.

``` {.python}
import numpy as np
import tensorflow as tf
from tensorlm import GeneratingLSTM

# We use integer ids from 0 to 19, so the vocab size is 20. The range of ids must always start
# at zero.
batch_inputs = np.array([[1, 2, 3, 4], [15, 16, 17, 18]])  # 1 batch of 2 sequences, 4 time steps each
batch_targets = np.array([[2, 3, 4, 5], [16, 17, 18, 19]])  # the inputs shifted by one step

with tf.Session() as session:
    # Create the model in a TensorFlow graph
    model = GeneratingLSTM(vocab_size=20, neurons_per_layer=10, num_layers=2, max_batch_size=2)

    # Initialize all defined TF Variables
    session.run(tf.global_variables_initializer())

    for _ in range(5000):
        model.train_step(session, batch_inputs, batch_targets)

    sampled = model.sample_ids(session, [15], num_steps=3)
    print("Sampled: " + str(sampled))
```

This should output something like:

Sampled: [16, 18, 19]
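
The ids in the example above are written by hand. As a minimal sketch of how arbitrary items could be mapped to such ids, assuming nothing about tensorlm beyond the integer inputs and targets shown above (the helpers `build_id_map` and `make_inputs_and_targets` and the note data are hypothetical and not part of the package):

```python
def build_id_map(sequences):
    """Assign each distinct item an integer id, starting at zero, in order of first appearance."""
    item_to_id = {}
    for sequence in sequences:
        for item in sequence:
            if item not in item_to_id:
                item_to_id[item] = len(item_to_id)
    return item_to_id


def make_inputs_and_targets(sequences, item_to_id):
    """Encode each sequence and pair every input step with the item that follows it."""
    inputs, targets = [], []
    for sequence in sequences:
        ids = [item_to_id[item] for item in sequence]
        inputs.append(ids[:-1])   # all steps except the last
        targets.append(ids[1:])   # the same steps shifted by one
    return inputs, targets


# Hypothetical non-text data: two equally long sequences of musical notes
melodies = [["C", "E", "G", "E", "C"], ["D", "F", "A", "F", "D"]]
note_to_id = build_id_map(melodies)
batch_inputs, batch_targets = make_inputs_and_targets(melodies, note_to_id)
vocab_size = len(note_to_id)
```

The nested lists can then be wrapped in `np.array(...)` and used in place of the hand-written arrays, with `vocab_size` taken from the size of the mapping.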

### Custom Training, Dropout etc.

Use the `GeneratingLSTM` class directly. This class is agnostic to the
dataset type. It expects integer ids and returns integer ids.
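
Because the class returns integer ids, turning its output back into readable data is left to the caller. A minimal sketch of that decoding step, assuming the caller keeps its own item-to-id dictionary (the names `invert_id_map`, `decode_ids` and `char_to_id` are hypothetical and do not come from tensorlm, and the id list stands in for real sampled output):

```python
def invert_id_map(item_to_id):
    """Build the reverse lookup, checking that ids are unique and run from 0 to n - 1."""
    id_to_item = {item_id: item for item, item_id in item_to_id.items()}
    if sorted(id_to_item) != list(range(len(item_to_id))):
        raise ValueError("ids must be unique and cover 0..n-1 without gaps")
    return id_to_item


def decode_ids(ids, id_to_item):
    """Translate a list of integer ids back into the original items."""
    return [id_to_item[int(item_id)] for item_id in ids]


# Hypothetical character vocabulary and a made-up list of ids to decode
char_to_id = {"T": 0, "h": 1, "e": 2, " ": 3}
id_to_char = invert_id_map(char_to_id)
decoded = "".join(decode_ids([0, 1, 2, 3, 1, 2], id_to_char))
```

The check in `invert_id_map` mirrors the requirement stated earlier that the range of ids must start at zero.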

\`\`\`python\
import tensorflow

[showcase of the package]: http://i.cubeupload.com/8Cm5RQ.gif
[![showcase of the package]]: http://www.mlowl.com/post/character-language-model-lstm-tensorflow/
26 changes: 21 additions & 5 deletions setup.py
@@ -1,16 +1,32 @@
-from distutils.core import setup
+from setuptools import setup, find_packages  # Always prefer setuptools over distutils
+from codecs import open  # To use a consistent encoding
+from os import path
+
+here = path.abspath(path.dirname(__file__))
+
+# Get the long description from the relevant file
+with open(path.join(here, "README.rst"), encoding="utf-8") as f:
+    long_description = f.read()
 
 setup(
     name="tensorlm",
-    packages=["tensorlm"],
-    version="0.1",
+    packages=find_packages(exclude=["examples"]),
+    version="0.2",
     description="TensorFlow wrapper for deep neural text generation on character or word level "
                 "with RNNs / LSTMs",
+    long_description=long_description,
     author="Kilian Batzner",
     author_email="info@kilians.net",
+    license="MIT",
     url="https://github.com/batzner/tensorlm",
-    download_url="https://github.com/batzner/tensorlm/archive/v0.1.tar.gz",
+    download_url="https://github.com/batzner/tensorlm/archive/v0.2.tar.gz",
     keywords=["tensorflow", "text", "generation", "language", "model", "rnn", "lstm", "deep",
               "neural", "char", "word"],
-    classifiers=[],
+    classifiers=[
+        "Development Status :: 3 - Alpha",
+        "Intended Audience :: Developers",
+        "Topic :: Scientific/Engineering :: Artificial Intelligence",
+        "License :: OSI Approved :: MIT License",
+        "Programming Language :: Python :: 3 :: Only",
+    ],
 )
