Suddo Bangla-hunspell

To Do

Generate word frequency lists from a corpus of old books on Wikisource (in progress).
Understand the .dic/.aff format: Chromium developers | Ubuntu manpages | source documentation.
Find a way to test word coverage, preferably in Firefox or LibreOffice Writer.
Use Wikisource to classify words into parts of speech (helps with suffixes).

Progress

Generate word frequency lists from the books proofread by bn.wikisource:
1. Download the EPUB files by hand from Wikisource to here (machine downloads are not permitted).
2. Convert them to plain text using epub_to_txt.sh.
3. Generate the most frequent words using word_frequency.py.
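The counting step above can be sketched roughly as follows. This is not the repository's word_frequency.py; it is a minimal illustration that assumes a "word" is any run of characters from the Bengali Unicode block, and the names `count_words` and `top_words` are hypothetical.

```python
import re
from collections import Counter

# Assumption: a word is a maximal run of Bengali-block characters
# (U+0980..U+09FF), possibly containing ZWNJ/ZWJ joiners. The danda
# (U+0964) lies outside this range, so it acts as a separator.
BENGALI_WORD = re.compile(r"[\u0980-\u09FF\u200C\u200D]+")


def is_number(token):
    """True if the token consists only of Bengali digits (U+09E6..U+09EF)."""
    return all("\u09e6" <= ch <= "\u09ef" for ch in token)


def count_words(text):
    """Count Bengali word tokens in a string, skipping pure numbers."""
    counts = Counter()
    for token in BENGALI_WORD.findall(text):
        token = token.strip("\u200c\u200d")  # drop stray joiners at the edges
        if token and not is_number(token):
            counts[token] += 1
    return counts


def top_words(paths, n=10):
    """Merge counts over several UTF-8 text files and return the n most common."""
    total = Counter()
    for path in paths:
        with open(path, encoding="utf-8") as handle:
            total.update(count_words(handle.read()))
    return total.most_common(n)
```

Calling `top_words` with the paths of the converted text files would give a list of (word, count) pairs; how old spellings and punctuation inside words should be treated is a design choice this sketch does not settle.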
Test word coverage using analyze like this.
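As a rough illustration of what "coverage" can mean here, the sketch below computes two numbers from a frequency list: the share of distinct words the dictionary accepts, and the share of running text those words account for. It is not the repository's own script; `coverage` and the `is_known` callback are hypothetical names, and in practice `is_known` would have to wrap a real spell checker (for example by collecting the words that `hunspell -l` reports as misspelled).

```python
def coverage(frequencies, is_known):
    """Return (type_coverage, token_coverage) as fractions between 0 and 1.

    frequencies: mapping of word -> number of occurrences in the corpus.
    is_known:    callable returning True if the dictionary accepts the word
                 (an assumption; supply your own spell-checker wrapper).
    """
    total_types = len(frequencies)
    total_tokens = sum(frequencies.values())
    known_types = 0
    known_tokens = 0
    for word, count in frequencies.items():
        if is_known(word):
            known_types += 1
            known_tokens += count
    type_cov = known_types / total_types if total_types else 0.0
    token_cov = known_tokens / total_tokens if total_tokens else 0.0
    return type_cov, token_cov
```

Token coverage is usually the more telling figure for a spell checker, since a handful of very frequent words make up a large part of any text.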
A post was made at Wikisource requesting help to transcribe dictionaries.
To view Bangla with joint glyphs (jukthakhor) in a terminal, use Konsole. Use a suitable font (I use MesloLGS NF) and enable Brahmic script characters as follows: Menu > Settings > Configure Konsole > Profiles > New Profile > Edit > Appearance > Complex Text Layout, then check Brahmic Script Characters.

Resources

Online Resources

Description

Most of the .dic and .aff files have been extracted and placed in the resources folder. To open such plugins for Firefox, Thunderbird or LibreOffice, use any archive manager. The Bangla Akademi word list published by SNLTR is in .doc format; it has been converted to .csv for easier use. Their dictionaries rely mainly on the .dic file alone, so they do not take advantage of the .aff file for compression and therefore have very low coverage. However, I am not well versed enough in Java to understand what they are doing with that plugin. The most important resource of all is the .dic and .aff files from Bangla Type Foundry. They have done a tremendous job of embedding the grammar rules of the Bangla language into the dic-aff format. The idea is to create a bn-in dictionary following those methods, taking the old words (suddo) into account.
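To make the point about .aff compression concrete: a Hunspell suffix rule has the form `SFX flag strip add condition`, and a .dic entry such as `stem/flag` then stands for the stem plus every form the flagged rules produce. The sketch below is a deliberately simplified model of that expansion, not Hunspell's implementation and not Bangla Type Foundry's rule set. The function names and the three example suffixes are assumptions chosen for illustration.

```python
import re


def apply_suffix(stem, strip, add, condition):
    """Apply one simplified SFX rule; return None if it does not apply.

    strip:     characters removed from the end of the stem ("0" for none).
    add:       suffix appended afterwards ("0" for none).
    condition: "." for always, otherwise a pattern such as "[^aeiou]y"
               that must match the end of the stem.
    """
    if condition != "." and not re.search(f"(?:{condition})$", stem):
        return None
    if strip != "0":
        if not stem.endswith(strip):
            return None
        stem = stem[: -len(strip)]
    return stem + ("" if add == "0" else add)


def expand(stem, rules):
    """Return the stem followed by every form the rules generate from it."""
    forms = [stem]
    for strip, add, condition in rules:
        form = apply_suffix(stem, strip, add, condition)
        if form is not None:
            forms.append(form)
    return forms


# Illustrative rules only: three common Bangla suffixes with no stripping
# and no condition, as they might appear under a single flag.
RULES = [("0", "টি", "."), ("0", "কে", "."), ("0", "তে", ".")]

if __name__ == "__main__":
    for form in expand("বই", RULES):
        print(form)
```

With rules like these, one .dic line covers four surface forms, which is why a word list without affix flags needs far more entries to reach the same coverage.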


mm-crj/bangla-hunspell

Folders and files

Latest commit

History

Repository files navigation

Suddo Bangla-hunspell

To Do

Progess

Resources

Online Resourses

Description

About

Resources

Stars

Watchers

Forks

Releases

Languages