OCR recognizes the text in an image, but there are other interesting things one can do with images that contain text. This repo generates a large number of realistic-looking images along with their text masks.
You can use it to:
- Identify character boundaries
- Identify the font used, its weight, italicization, etc.
- Identify text color
- Train a GAN to remove text
- Still do OCR if you wish to
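As one illustration of how an image and its mask work together, the sketch below estimates the text color by averaging only the pixels the mask marks as text. It assumes bright mask pixels mean text and dark pixels mean background; the function name `mean_text_color` is hypothetical and not part of this repo.

```python
from PIL import Image, ImageStat


def mean_text_color(image: Image.Image, mask: Image.Image) -> str:
    """Return the average color of the masked (text) pixels as a hex string."""
    rgb = image.convert("RGB")
    # Assumption: bright mask pixels mark text, dark pixels mark background.
    binary = mask.convert("L").point(lambda v: 255 if v > 127 else 0)
    # ImageStat only counts pixels where the mask is non-zero.
    stat = ImageStat.Stat(rgb, mask=binary)
    r, g, b = (round(channel) for channel in stat.mean)
    return f"#{r:02X}{g:02X}{b:02X}"


# Tiny synthetic example: a red 2x2 "glyph" on a blue 4x4 background.
demo_image = Image.new("RGB", (4, 4), (0, 0, 255))
demo_image.paste((255, 0, 0), (0, 0, 2, 2))
demo_mask = Image.new("L", (4, 4), 0)
demo_mask.paste(255, (0, 0, 2, 2))
print(mean_text_color(demo_image, demo_mask))  # "#FF0000"
```

An entirely black mask leaves no pixels to average, so a real pipeline would want to check for that case first.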
At ImageTranslate, we often need such datasets, and in many different languages and scripts. Gathering such a dataset isn't easy, but we can create one.
Each of these was generated.
The dataset should have diverse:
- Backgrounds: Gathered from Unsplash
- Foreground/Text color: Gathered from user-generated palettes
- Scripts: Words gathered from wordlists
- Fonts: Gathered from Google Fonts
You will need a sample assets.tar like this. Extract it in your project folder. The assets folder has subdirectories to which you can add your own backgrounds, palettes and wordlists.
Google Fonts is excluded from the tar because it is a huge repo in itself. Clone the Google Fonts repo and copy the folder so that the final folder structure looks like:
- assets
  - fonts
    - google-fonts
      - apache
      - ufl
      - ofl
And of course, install the dependencies: pip install -r requirements.txt
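Before generating anything, it can help to confirm that the font folders ended up where expected. The following is a minimal sketch, assuming the layout is assets/fonts/google-fonts/ with apache, ufl and ofl inside it; the helper name `missing_font_dirs` is hypothetical and not part of this repo.

```python
from pathlib import Path

# Assumption: these are the license folders copied from the Google Fonts repo.
EXPECTED_FONT_DIRS = ("apache", "ufl", "ofl")


def missing_font_dirs(project_root: str = ".") -> list[str]:
    """Return the expected font folders that are absent under assets/."""
    fonts_root = Path(project_root) / "assets" / "fonts" / "google-fonts"
    return [
        str(fonts_root / name)
        for name in EXPECTED_FONT_DIRS
        if not (fonts_root / name).is_dir()
    ]


if __name__ == "__main__":
    missing = missing_font_dirs()
    if missing:
        print("Missing font folders:")
        for path in missing:
            print(" ", path)
    else:
        print("Font folders look complete.")
```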
API response for the /generate endpoint:
{
  "image": "base64-encoded-png",
  "mask": "base64-encoded-png",
  "text_value": "hello",
  "text_color": "#13EEF0",
  "font_face": "Helvetica Neue",
  "category": "SANS-SERIF",
  "italicization": false,
  "weight": 400,
  "script": "Latin",
  "language": "English"
}
Details of each attribute:
- image holds the base64 string of the image PNG.
- mask holds the base64 string of the mask PNG.
- text holds the rendered text (text_value in the sample response above).
- text_color is the color as a hex string.
- font_face holds the font face as Google Fonts names it.
- category can be one of SERIF, SANS-SERIF, HANDWRITING, DISPLAY, MONOSPACE.
- italicization is a boolean: true when the font is italic.
- weight is as reported by Google Fonts. We need to quantize it later.
- script should be one of the scripts of the languages we support.
- language should be one of the languages we support.
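The note that weight needs to be quantized later could look something like the sketch below. The three buckets and their thresholds are purely illustrative assumptions, and `quantize_weight` and `summarize` are hypothetical helpers, not functions from this repo. The category check uses the five values listed above.

```python
CATEGORIES = {"SERIF", "SANS-SERIF", "HANDWRITING", "DISPLAY", "MONOSPACE"}


def quantize_weight(weight: int) -> str:
    """Collapse a numeric font weight into a coarse, hypothetical class label."""
    # Assumption: three buckets are enough; the thresholds are illustrative.
    if weight < 400:
        return "light"
    if weight < 600:
        return "regular"
    return "bold"


def summarize(sample: dict) -> dict:
    """Validate the category and attach the quantized weight."""
    if sample["category"] not in CATEGORIES:
        raise ValueError(f"unexpected category: {sample['category']!r}")
    return {
        "font_face": sample["font_face"],
        "category": sample["category"],
        "italic": bool(sample["italicization"]),
        "weight_class": quantize_weight(int(sample["weight"])),
    }


example = {
    "font_face": "Helvetica Neue",
    "category": "SANS-SERIF",
    "italicization": False,
    "weight": 400,
}
print(summarize(example))
```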
If you just want to create a lot of such images:
python work.py
If you want to serve it as an API:
python api.py
import base64
from io import BytesIO

import requests
from PIL import Image

# Fetch one generated sample from the running API (started with python api.py)
response = requests.get("http://localhost:8000/generate")
if response.status_code == 200:
    data = response.json()
    # Decode the base64-encoded PNGs into PIL images
    image = Image.open(BytesIO(base64.b64decode(data["image"])))
    mask = Image.open(BytesIO(base64.b64decode(data["mask"])))
- Clone this repo and copy work.py to your codebase.
import base64
from io import BytesIO

from PIL import Image

from work import load_assets, shuffle_assets, generate_data

# Initialize assets
load_assets()
shuffle_assets()

# Generate some data and decode as PIL images
output = generate_data()
image = Image.open(BytesIO(base64.b64decode(output["image"])))
mask = Image.open(BytesIO(base64.b64decode(output["mask"])))
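To turn individual samples into a dataset on disk, something like the sketch below may be useful. It assumes each sample is a dict shaped like the /generate response, with base64-encoded "image" and "mask" entries plus label fields; `save_sample` and the file naming scheme are hypothetical, not part of this repo.

```python
import base64
import json
from pathlib import Path


def save_sample(sample: dict, out_dir: str, index: int) -> Path:
    """Write one sample as <index>_image.png, <index>_mask.png and <index>.json."""
    target = Path(out_dir)
    target.mkdir(parents=True, exist_ok=True)
    stem = f"{index:06d}"
    # The image and mask arrive as base64-encoded PNG bytes.
    (target / f"{stem}_image.png").write_bytes(base64.b64decode(sample["image"]))
    (target / f"{stem}_mask.png").write_bytes(base64.b64decode(sample["mask"]))
    # Everything else is label metadata; keep it next to the images.
    labels = {k: v for k, v in sample.items() if k not in ("image", "mask")}
    meta_path = target / f"{stem}.json"
    meta_path.write_text(json.dumps(labels, indent=2), encoding="utf-8")
    return meta_path
```

Calling `save_sample(output, "dataset", i)` in a loop would then build a folder of image, mask and label triples.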
- Lots of Python's random module
- Transformations with the elegant PIL library
- Euclidean space calculations for colors
The code is rather simple to understand and annotated with ample comments if you're interested.
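To give a flavor of the Euclidean color calculations mentioned above, here is a minimal sketch that picks the palette color farthest from a background color in RGB space. This is an assumption about how such a calculation might be used, not a copy of the repo's logic, and all function names here are hypothetical.

```python
import math


def hex_to_rgb(color: str) -> tuple[int, int, int]:
    """Convert '#RRGGBB' to an (r, g, b) tuple."""
    value = color.lstrip("#")
    return tuple(int(value[i:i + 2], 16) for i in (0, 2, 4))


def color_distance(a: str, b: str) -> float:
    """Euclidean distance between two hex colors in RGB space."""
    return math.dist(hex_to_rgb(a), hex_to_rgb(b))


def most_contrasting(background: str, palette: list[str]) -> str:
    """Pick the palette color farthest from the background color."""
    return max(palette, key=lambda color: color_distance(background, color))


palette = ["#13EEF0", "#101010", "#FAFAFA"]
print(most_contrasting("#FFFFFF", palette))  # "#101010"
```

Plain RGB distance is only a rough proxy for perceived contrast, but it is cheap and needs nothing beyond the standard library.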