Overworked Graphic Designer

What purpose will this serve

Recognizing the text in an image (which is what OCR does) is only one of several interesting things you can do with it. This repo generates a large number of realistic-looking images containing text, along with their text masks.

You can use it to:

Identify character boundaries
Identify the font used, its weight, italicization, etc.
Identify text color
Train a GAN to remove text
Still do OCR if you wish to

At ImageTranslate, we often need such datasets, and in different languages and scripts too. Gathering such a dataset isn't easy, but we can create one.

Something like this

Each of these was generated.

Dataset creation

The dataset should have diverse:

Backgrounds: Gathered from Unsplash
Foreground/Text color: Gathered from user-generated palettes
Scripts: Gathered words from wordlists
Fonts: Gathered from Google Fonts

Initial setup

You will need a sample assets.tar like this. Extract it in your project folder. The assets folder has subdirectories to which you can add your own backgrounds, palettes and wordlists.

Google Fonts is excluded from the tar because it is a huge repo in itself. Clone the Google Fonts repo and copy its folder so that the final folder structure looks like:

assets
- fonts
  - google-fonts
    - apache
    - ufl
    - ofl

And of course, install the dependencies with pip install -r requirements.txt
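
Before generating anything, it can help to confirm that the fonts landed where the folder structure above expects them. The following is a minimal sketch, not part of this repo: the helper name missing_font_dirs is hypothetical, and it only checks the three license folders listed above.

```python
from pathlib import Path

# License folders listed in the folder structure above.
EXPECTED_FONT_DIRS = ("apache", "ufl", "ofl")


def missing_font_dirs(assets_root="assets"):
    """Return the expected Google Fonts folders that are absent.

    Hypothetical helper: it only looks for assets/fonts/google-fonts/<name>.
    """
    base = Path(assets_root) / "fonts" / "google-fonts"
    return [name for name in EXPECTED_FONT_DIRS if not (base / name).is_dir()]


if __name__ == "__main__":
    missing = missing_font_dirs()
    if missing:
        print("Missing font folders:", ", ".join(missing))
    else:
        print("Font folder layout looks complete.")
```

An empty list means all three folders were found under assets/fonts/google-fonts.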

Schema

API response for /generate endpoint

{
    "image": "base64-encoded-png",
    "mask": "base64-encoded-png",
    "text_value": "hello",
    "text_color": "#13EEF0",
    "font_face": "Helvetica Neue",
    "category": "SANS-SERIF",
    "italicization": false,
    "weight": 400,
    "script": "Latin",
    "language": "English"
}

Details of each attribute:

image holds the base64 string of the image PNG.
mask holds the base64 string of the mask PNG.
text_value holds the text rendered.
text_color is the color as a hex string.
font_face holds the font face as Google Fonts names it.
category can be one of SERIF, SANS-SERIF, HANDWRITING, DISPLAY, MONOSPACE.
italicization is true if the text is rendered in italics, false otherwise.
weight is as reported by Google Fonts. We need to quantize it later.
script should be one of the scripts of the languages we support.
language should be one of the languages we support.
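
A client may want to sanity-check a record against the schema above before training on it. The sketch below is illustrative only: schema_problems and quantize_weight are hypothetical helpers, the field types are inferred from the sample response, and the quantization rule (nearest multiple of 100, clamped to 100..900) is an assumption, since this README does not say how the weight is quantized.

```python
import re

CATEGORIES = {"SERIF", "SANS-SERIF", "HANDWRITING", "DISPLAY", "MONOSPACE"}

# Field names and types inferred from the sample response above.
EXPECTED_TYPES = {
    "image": str, "mask": str, "text_value": str, "text_color": str,
    "font_face": str, "category": str, "italicization": bool,
    "weight": int, "script": str, "language": str,
}

# Assumption: colors are always six-digit hex, as in the sample.
HEX_COLOR = re.compile(r"#[0-9A-Fa-f]{6}")


def schema_problems(record):
    """Return a list of problems found in a record; empty means it looks fine."""
    problems = []
    for key, expected in EXPECTED_TYPES.items():
        if key not in record:
            problems.append(f"missing field: {key}")
            continue
        value = record[key]
        # bool is a subclass of int in Python, so reject it for int fields.
        wrong_type = not isinstance(value, expected) or (
            expected is int and isinstance(value, bool)
        )
        if wrong_type:
            problems.append(f"{key} should be of type {expected.__name__}")
    category = record.get("category")
    if isinstance(category, str) and category not in CATEGORIES:
        problems.append(f"unknown category: {category}")
    color = record.get("text_color")
    if isinstance(color, str) and not HEX_COLOR.fullmatch(color):
        problems.append(f"text_color is not a #RRGGBB hex string: {color}")
    return problems


def quantize_weight(weight, step=100, lowest=100, highest=900):
    """Snap a weight to the nearest multiple of step (halves round up), then clamp."""
    nearest = (weight + step // 2) // step * step
    return max(lowest, min(highest, nearest))
```

Returning a list of problems rather than raising on the first one makes it easier to log every issue with a bad record in a single pass.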

How to run

If you just want to create a lot of such images

python work.py

If you want to serve it as an API

python api.py

How to decode API response on client side

import requests
import base64
from io import BytesIO
from PIL import Image

response = requests.get("http://localhost:8000/generate")

if response.status_code == 200:
    response = response.json()
    image = Image.open(BytesIO(base64.b64decode(response["image"])))
    mask = Image.open(BytesIO(base64.b64decode(response["mask"])))
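
If the server might return something unexpected, the payload can be checked before it is handed to PIL. This is a small sketch using only the standard library; decode_png_field is a hypothetical helper name, and the check relies on the fixed eight-byte signature that every PNG file starts with.

```python
import base64

# Every PNG file begins with these eight bytes.
PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def decode_png_field(encoded):
    """Decode a base64 field and confirm the result starts like a PNG.

    Raises ValueError if the base64 is malformed or the bytes are not a PNG.
    """
    raw = base64.b64decode(encoded, validate=True)
    if not raw.startswith(PNG_SIGNATURE):
        raise ValueError("decoded payload does not start with the PNG signature")
    return raw
```

The returned bytes can then be wrapped in BytesIO and passed to Image.open exactly as in the client code above.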

Using it directly in your program without the API

Clone this repo and copy work.py to your codebase.

import base64
from io import BytesIO

from PIL import Image

from work import load_assets, shuffle_assets, generate_data

# Initialize assets
load_assets()
shuffle_assets()

# Generate some data and decode as PIL images
output = generate_data()
image = Image.open(BytesIO(base64.b64decode(output["image"])))
mask = Image.open(BytesIO(base64.b64decode(output["mask"])))
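
To build a dataset on disk from inside your own program, the generator can be called in a loop. The sketch below is an assumption-laden illustration: save_samples and the labels.jsonl layout are hypothetical, and it assumes generate_data() returns a dict shaped like the API response in the Schema section, with base64 strings under "image" and "mask".

```python
import base64
import json
from pathlib import Path


def save_samples(generate, count, out_dir):
    """Call generate() count times, writing PNG files plus one JSON line per sample.

    generate is any zero-argument callable returning a dict with base64
    "image" and "mask" entries (assumed shape, see the Schema section).
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    labels_path = out / "labels.jsonl"
    with open(labels_path, "w", encoding="utf-8") as labels:
        for index in range(count):
            sample = dict(generate())
            image_name = f"{index:06d}_image.png"
            mask_name = f"{index:06d}_mask.png"
            (out / image_name).write_bytes(base64.b64decode(sample.pop("image")))
            (out / mask_name).write_bytes(base64.b64decode(sample.pop("mask")))
            # Keep the remaining attributes and point at the files on disk.
            sample["image_file"] = image_name
            sample["mask_file"] = mask_name
            labels.write(json.dumps(sample, ensure_ascii=False) + "\n")
    return labels_path
```

After load_assets() and shuffle_assets(), a call such as save_samples(generate_data, 1000, "dataset") would write 1000 image and mask pairs, provided the assumption about the return value holds.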

What is happening behind the scenes

Lots of Python random
Transformations with the elegant PIL library
Euclidean space calculations for colors
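
As an illustration of the last point, here is one way Euclidean distance in RGB space could be used to pick a text color that stands out from a background. This is a sketch under assumptions, not the code in work.py: the function names are hypothetical, and the README does not say which color comparison the repo actually performs.

```python
import math


def hex_to_rgb(value):
    """Convert a '#RRGGBB' string to an (r, g, b) tuple of ints."""
    value = value.lstrip("#")
    return tuple(int(value[i:i + 2], 16) for i in (0, 2, 4))


def color_distance(a, b):
    """Euclidean distance between two RGB triples."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def most_contrasting(background_rgb, palette_hex):
    """Return the palette entry farthest from the background color."""
    return max(
        palette_hex,
        key=lambda color: color_distance(hex_to_rgb(color), background_rgb),
    )
```

Plain RGB distance is only a rough proxy for perceived contrast, but it is cheap and often good enough to avoid text that blends into the background.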

The code is rather simple to understand and annotated with ample comments if you're interested.
