Merge pull request #17 from neocl/dev
[Version 0.1a7] - Added Japanese Proper Names Dictionary (JMnedict) support - Included built-in KRADFILE/RADKFile support - Improved command line tools (json, compact mode, etc.)
Showing 20 changed files with 1,390 additions and 130 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,5 +1,11 @@ | ||
2020-05-31 | ||
- [Version 0.1a7] | ||
- Added Japanese Proper Names Dictionary (JMnedict) support | ||
- Included built-in KRADFILE/RADKFile support | ||
- Improved command line tools (json, compact mode, etc.) | ||
|
||
2017-08-18 | ||
- Support for KanjiDic2 (XML/SQLite formats) | ||
- Support KanjiDic2 (XML/SQLite formats) | ||
|
||
2016-11-09 | ||
- Release first demo to Github | ||
- Release first version to Github |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,101 +1,213 @@ | ||
Python library for manipulating Jim Breen's JMdict & KanjiDic2 | ||
|
||
# Main features | ||
* Query JMDict and KanjiDic2 in XML format directly (but slow) | ||
* Convert JMDict and KanjiDic2 into SQLite format for faster access | ||
* Basic console lookup tool | ||
* jamdol (jamdict-online) - REST API using Python/Flask (jamdol-flask) | ||
|
||
# Installation | ||
* Support querying different Japanese language resources | ||
- Japanese-English dictionary JMDict | ||
- Kanji dictionary KanjiDic2 | ||
- Kanji-radical and radical-kanji maps KRADFILE/RADKFILE | ||
- Japanese Proper Names Dictionary (JMnedict) | ||
* Data are stored using SQLite database | ||
* Console lookup tool | ||
* jamdol (jamdol-flask) - a Python/Flask server that provides Jamdict lookup via REST API (experimental state) | ||
|
||
Homepage: [https://github.com/neocl/jamdict](https://github.com/neocl/jamdict) | ||
|
||
Contributors are welcome! 🙇 | ||
|
||
# Installation | ||
|
||
Jamdict is available on PyPI at [https://pypi.org/project/jamdict/](https://pypi.org/project/jamdict/) and can be installed using pip command | ||
|
||
```bash | ||
pip install jamdict | ||
# pip script sometimes doesn't work properly, so you may want to try this instead | ||
python3 -m pip install jamdict | ||
``` | ||
|
||
# initial setup (this command will create ~/.jamdict for you | ||
# it will also tell you where to copy the data files | ||
python3 -m jamdict.tools info | ||
## Install data file | ||
|
||
1. Download the official, pre-compiled jamdict database (`jamdict-0.1a7.tar.xz`) from Google Drive [https://drive.google.com/drive/u/1/folders/1z4zF9ImZlNeTZZplflvvnpZfJp3WVLPk](https://drive.google.com/drive/u/1/folders/1z4zF9ImZlNeTZZplflvvnpZfJp3WVLPk) | ||
2. Extract it and copy `jamdict.db` to the jamdict data folder (by default `~/.jamdict/data/jamdict.db`) | ||
3. To find out where to copy the data files, run: | ||
|
||
```bash | ||
# initial setup (this command will create ~/.jamdict for you; | ||
# it will also tell you where to copy the data files) | ||
python3 -m jamdict info | ||
# Jamdict 0.1a7 | ||
# Python library for manipulating Jim Breen's JMdict, KanjiDic2, KRADFILE and JMnedict | ||
# | ||
# Basic configuration | ||
# ------------------------------------------------------------ | ||
# JAMDICT_HOME : /home/tuananh/.jamdict | ||
# Config file location: /home/tuananh/.jamdict/config.json | ||
# | ||
# Data files | ||
# ------------------------------------------------------------ | ||
# Jamdict DB location: /home/tuananh/.jamdict/data/jamdict.db - [OK] | ||
# JMDict XML file : /home/tuananh/.jamdict/data/JMdict_e.gz - [OK] | ||
# KanjiDic2 XML file : /home/tuananh/.jamdict/data/kanjidic2.xml.gz - [OK] | ||
# JMnedict XML file : /home/tuananh/.jamdict/data/JMnedict.xml.gz - [OK] | ||
``` | ||
|
||
## Command line tools | ||
|
||
To make sure that jamdict is configured properly, try looking up a word from the command line | ||
|
||
# to look up a word using command line | ||
python3 -m jamdict.tools lookup たべる | ||
```bash | ||
python3 -m jamdict.tools lookup 言語学 | ||
======================================== | ||
Found entries | ||
======================================== | ||
Entry: 1358280 | Kj: 食べる, 喰べる | Kn: たべる | ||
Entry: 1264430 | Kj: 言語学 | Kn: げんごがく | ||
-------------------- | ||
1. to eat ((Ichidan verb|transitive verb)) | ||
2. to live on (e.g. a salary)/to live off/to subsist on | ||
1. linguistics ((noun (common) (futsuumeishi))) | ||
|
||
======================================== | ||
Found characters | ||
======================================== | ||
Char: 食 | Strokes: 9 | ||
Char: 言 | Strokes: 7 | ||
-------------------- | ||
Readings: shi2, si4, sig, sa, 식, 사, Thực, Tự, ショク, ジキ, く.う, く.らう, た.べる, は.む | ||
Meanings: eat, food | ||
Char: 喰 | Strokes: 12 | ||
Readings: yan2, eon, 언, Ngôn, Ngân, ゲン, ゴン, い.う, こと | ||
Meanings: say, word | ||
Char: 語 | Strokes: 14 | ||
-------------------- | ||
Readings: shi2, si4, sig, 식, Thặc, Thực, Tự, く.う, く.らう | ||
Meanings: eat, drink, receive (a blow), (kokuji) | ||
Readings: yu3, yu4, eo, 어, Ngữ, Ngứ, ゴ, かた.る, かた.らう | ||
Meanings: word, speech, language | ||
Char: 学 | Strokes: 8 | ||
-------------------- | ||
Readings: xue2, hag, 학, Học, ガク, まな.ぶ | ||
Meanings: study, learning, science | ||
|
||
No name was found. | ||
``` | ||
|
||
## Data | ||
XML files (JMdict_e.xml, kanjidic2.xml) must be downloaded and copy into `~/.jamdict/data` | ||
# Sample jamdict Python code | ||
|
||
I have mirrored these files to Google Drive so you can download there too: | ||
[https://drive.google.com/drive/folders/1z4zF9ImZlNeTZZplflvvnpZfJp3WVLPk](https://drive.google.com/drive/folders/1z4zF9ImZlNeTZZplflvvnpZfJp3WVLPk) | ||
```python | ||
from jamdict import Jamdict | ||
jmd = Jamdict() | ||
|
||
Official website | ||
# use wildcard matching to find anything starts with 食べ and ends with る | ||
result = jmd.lookup('食べ%る') | ||
|
||
# print all word entries | ||
for entry in result.entries: | ||
print(entry) | ||
|
||
# [id#1358280] たべる (食べる) : 1. to eat ((Ichidan verb|transitive verb)) 2. to live on (e.g. a salary)/to live off/to subsist on | ||
# [id#1358300] たべすぎる (食べ過ぎる) : to overeat ((Ichidan verb|transitive verb)) | ||
# [id#1852290] たべつける (食べ付ける) : to be used to eating ((Ichidan verb|transitive verb)) | ||
# [id#2145280] たべはじめる (食べ始める) : to start eating ((Ichidan verb)) | ||
# [id#2449430] たべかける (食べ掛ける) : to start eating ((Ichidan verb)) | ||
# [id#2671010] たべなれる (食べ慣れる) : to be used to eating/to become used to eating/to be accustomed to eating/to acquire a taste for ((Ichidan verb)) | ||
# [id#2765050] たべられる (食べられる) : 1. to be able to eat ((Ichidan verb|intransitive verb)) 2. to be edible/to be good to eat ((pre-noun adjectival (rentaishi))) | ||
# [id#2795790] たべくらべる (食べ比べる) : to taste and compare several dishes (or foods) of the same type ((Ichidan verb|transitive verb)) | ||
# [id#2807470] たべあわせる (食べ合わせる) : to eat together (various foods) ((Ichidan verb)) | ||
|
||
* JMdict: [http://edrdg.org/jmdict/edict_doc.html](http://edrdg.org/jmdict/edict_doc.html) | ||
* kanjidic2: [http://www.edrdg.org/kanjidic/kanjd2index.html](http://www.edrdg.org/kanjidic/kanjd2index.html) | ||
* KRADFILE: [http://www.edrdg.org/krad/kradinf.html](http://www.edrdg.org/krad/kradinf.html) | ||
# print all related characters | ||
for c in result.chars: | ||
print(repr(c)) | ||
|
||
# 食:9:eat,food | ||
# 喰:12:eat,drink,receive (a blow),(kokuji) | ||
# 過:12:overdo,exceed,go beyond,error | ||
# 付:5:adhere,attach,refer to,append | ||
# 始:8:commence,begin | ||
# 掛:11:hang,suspend,depend,arrive at,tax,pour | ||
# 慣:14:accustomed,get used to,become experienced | ||
# 比:4:compare,race,ratio,Philippines | ||
# 合:6:fit,suit,join,0.1 | ||
``` | ||
|
||
## Using KRAD/RADK mapping | ||
|
||
Jamdict has built-in support for KRAD/RADK (i.e. kanji-radical and radical-kanji mapping). | ||
The terminology of radicals/components used by Jamdict may differ from that used elsewhere. | ||
|
||
- A radical in Jamdict is a principal component, each character has only one radical. | ||
- A character may be decomposed into several writing components. | ||
|
||
By default jamdict provides two maps: | ||
|
||
# Sample codes | ||
- jmd.krad is a Python dict that maps each character to a list of its components. | ||
- jmd.radk is a Python dict that maps each available component to a list of characters. | ||
|
||
```python | ||
>>> from jamdict import Jamdict | ||
>>> jmd = Jamdict() | ||
# use wildcard matching to find anything starts with 食べ and ends with る | ||
>>> result = jmd.lookup('食べ%る') | ||
# print all found word entries | ||
>>> for entry in result.entries: | ||
... print(entry) | ||
... | ||
[id#1358280] たべる (食べる) : 1. to eat ((Ichidan verb|transitive verb)) 2. to live on (e.g. a salary)/to live off/to subsist on | ||
[id#1358300] たべすぎる (食べ過ぎる) : to overeat ((Ichidan verb|transitive verb)) | ||
[id#1852290] たべつける (食べ付ける) : to be used to eating ((Ichidan verb|transitive verb)) | ||
[id#2145280] たべはじめる (食べ始める) : to start eating ((Ichidan verb)) | ||
[id#2449430] たべかける (食べ掛ける) : to start eating ((Ichidan verb)) | ||
[id#2671010] たべなれる (食べ慣れる) : to be used to eating/to become used to eating/to be accustomed to eating/to acquire a taste for ((Ichidan verb)) | ||
[id#2765050] たべられる (食べられる) : 1. to be able to eat ((Ichidan verb|intransitive verb)) 2. to be edible/to be good to eat ((pre-noun adjectival (rentaishi))) | ||
[id#2795790] たべくらべる (食べ比べる) : to taste and compare several dishes (or foods) of the same type ((Ichidan verb|transitive verb)) | ||
[id#2807470] たべあわせる (食べ合わせる) : to eat together (various foods) ((Ichidan verb)) | ||
# print all related characters | ||
>>> for c in result.chars: | ||
... print(repr(c)) | ||
... | ||
食:9:eat,food | ||
喰:12:eat,drink,receive (a blow),(kokuji) | ||
過:12:overdo,exceed,go beyond,error | ||
付:5:adhere,attach,refer to,append | ||
始:8:commence,begin | ||
掛:11:hang,suspend,depend,arrive at,tax,pour | ||
慣:14:accustomed,get used to,become experienced | ||
比:4:compare,race,ratio,Philippines | ||
合:6:fit,suit,join,0.1 | ||
# Find all writing components (often called "radicals") of the character 雲 | ||
print(jmd.krad['雲']) | ||
# ['一', '雨', '二', '厶'] | ||
|
||
# Find all characters with the component 鼎 | ||
chars = jmd.radk['鼎'] | ||
print(chars) | ||
# {'鼏', '鼒', '鼐', '鼎', '鼑'} | ||
|
||
# look up the characters info | ||
result = jmd.lookup(''.join(chars)) | ||
for c in result.chars: | ||
print(c, c.meanings()) | ||
# 鼏 ['cover of tripod cauldron'] | ||
# 鼒 ['large tripod cauldron with small'] | ||
# 鼐 ['incense tripod'] | ||
# 鼎 ['three legged kettle'] | ||
# 鼑 [] | ||
``` | ||
|
||
## Finding name entities | ||
|
||
```python | ||
# Find all names with 鈴木 inside | ||
result = jmd.lookup('%鈴木%') | ||
for name in result.names: | ||
print(name) | ||
|
||
# [id#5025685] キューティーすずき (キューティー鈴木) : Kyu-ti- Suzuki (1969.10-) (full name of a particular person) | ||
# [id#5064867] パパイヤすずき (パパイヤ鈴木) : Papaiya Suzuki (full name of a particular person) | ||
# [id#5089076] ラジカルすずき (ラジカル鈴木) : Rajikaru Suzuki (full name of a particular person) | ||
# [id#5259356] きつねざきすずきひなた (狐崎鈴木日向) : Kitsunezakisuzukihinata (place name) | ||
# [id#5379158] こすずき (小鈴木) : Kosuzuki (family or surname) | ||
# [id#5398812] かみすずき (上鈴木) : Kamisuzuki (family or surname) | ||
# [id#5465787] かわすずき (川鈴木) : Kawasuzuki (family or surname) | ||
# [id#5499409] おおすずき (大鈴木) : Oosuzuki (family or surname) | ||
# [id#5711308] すすき (鈴木) : Susuki (family or surname) | ||
# ... | ||
``` | ||
## Exact matching | ||
Use exact matching for faster search | ||
```python | ||
# Find an entry (word, name entity) by idseq | ||
result = jmd.lookup('id#5711308') | ||
print(result.names[0]) | ||
# [id#5711308] すすき (鈴木) : Susuki (family or surname) | ||
result = jmd.lookup('id#1467640') | ||
print(result.entries[0]) | ||
# ねこ (猫) : 1. cat 2. shamisen 3. geisha 4. wheelbarrow 5. clay bed-warmer 6. bottom/submissive partner of a homosexual relationship | ||
|
||
# use exact matching to increase searching speed (thanks to @reem-codes) | ||
result = jmd.lookup('食べる') | ||
result = jmd.lookup('猫') | ||
|
||
for entry in result.entries: | ||
print(entry) | ||
|
||
>>> for entry in result.entries: | ||
... print(entry) | ||
... | ||
[id#1358280] たべる (食べる) : 1. to eat ((Ichidan verb|transitive verb)) 2. to live on (e.g. a salary)/to live off/to subsist on | ||
# [id#1467640] ねこ (猫) : 1. cat ((noun (common) (futsuumeishi))) 2. shamisen 3. geisha 4. wheelbarrow 5. clay bed-warmer 6. bottom/submissive partner of a homosexual relationship | ||
# [id#2698030] ねこま (猫) : cat ((noun (common) (futsuumeishi))) | ||
``` | ||
See `jamdict_demo.py` and `jamdict/tools.py` for more information. | ||
# Official website | ||
* JMdict: [http://edrdg.org/jmdict/edict_doc.html](http://edrdg.org/jmdict/edict_doc.html) | ||
* kanjidic2: [https://www.edrdg.org/wiki/index.php/KANJIDIC_Project](https://www.edrdg.org/wiki/index.php/KANJIDIC_Project) | ||
* JMnedict: [https://www.edrdg.org/enamdict/enamdict_doc.html](https://www.edrdg.org/enamdict/enamdict_doc.html) | ||
* KRADFILE: [http://www.edrdg.org/krad/kradinf.html](http://www.edrdg.org/krad/kradinf.html) | ||
# Contributors | ||
- [Matteo Fumagalli](https://github.com/matteofumagalli1275) | ||
- [Reem Alghamdi](https://github.com/reem-codes) |
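
The README changes above describe `%` wildcard lookups, state that data are stored in SQLite, and note that exact matching is faster. The sketch below shows how that distinction could look at the SQL level. It assumes, without confirmation from this diff, that `%` is passed through to a `LIKE` query. The `kanji_form` table and the `find` helper are hypothetical and are not jamdict's actual schema or API.

```python
import sqlite3

# In-memory database with a hypothetical table (not jamdict's real schema)
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE kanji_form (idseq INTEGER, text TEXT)")
conn.executemany(
    "INSERT INTO kanji_form VALUES (?, ?)",
    [(1358280, "食べる"), (1358300, "食べ過ぎる"), (1467640, "猫")],
)


def find(query):
    """Return matching idseq values, using LIKE only when a wildcard is present."""
    # Assumption: '%' and '_' are treated as SQL wildcards; anything else is exact
    op = "LIKE" if ("%" in query or "_" in query) else "="
    rows = conn.execute(
        f"SELECT idseq FROM kanji_form WHERE text {op} ? ORDER BY idseq",
        (query,),
    )
    return [row[0] for row in rows]


wildcard_hits = find("食べ%る")  # pattern match, scans candidate rows
exact_hits = find("猫")          # equality match, can use an index if one exists
print(wildcard_hits, exact_hits)
```

An equality comparison can be served by an ordinary index, while a `LIKE` pattern generally cannot be unless it meets SQLite's prefix-optimization conditions, which may be why the README recommends exact matching for speed.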
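
The README diff describes `jmd.krad` (character to components) and `jmd.radk` (component to characters) as two views of the same data. The sketch below shows how one map can be derived from the other using plain Python dicts. It does not call jamdict. The `invert_krad` helper is hypothetical, the 雲 entry mirrors the README output, and the 雪 entry is illustrative only.

```python
from collections import defaultdict

# Character -> components; 雲 is taken from the README, 雪 is illustrative
krad = {
    "雲": ["一", "雨", "二", "厶"],
    "雪": ["雨", "ヨ"],
}


def invert_krad(krad_map):
    """Build a component -> set-of-characters map from a character -> components map."""
    radk = defaultdict(set)
    for char, components in krad_map.items():
        for component in components:
            radk[component].add(char)
    return dict(radk)


radk = invert_krad(krad)
print(radk["雨"])  # every sample character that contains the component 雨
```

A set is used for the values here because the README's sample `jmd.radk['鼎']` output is printed with braces. Whether jamdict stores a set or a list internally is not established by this diff.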
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,8 +1,9 @@ | ||
{ | ||
"JAMDICT_HOME": "~/.jamdict", | ||
"JAMDICT_DATA": "{JAMDICT_HOME}/data", | ||
"JAMDICT_DB": "{JAMDICT_DATA}/jamdict.db", | ||
"JMDICT_XML": "{JAMDICT_DATA}/JMdict_e.gz", | ||
"KD2_XML": "{JAMDICT_DATA}/kanjidic2.xml.gz", | ||
"KRADFILE": "{JAMDICT_DATA}/kradfile-u.gz" | ||
"JAMDICT_HOME": "~/.jamdict", | ||
"JAMDICT_DATA": "{JAMDICT_HOME}/data", | ||
"JAMDICT_DB": "{JAMDICT_DATA}/jamdict.db", | ||
"JMDICT_XML": "{JAMDICT_DATA}/JMdict_e.gz", | ||
"JMNEDICT_XML": "{JAMDICT_DATA}/JMnedict.xml.gz", | ||
"KD2_XML": "{JAMDICT_DATA}/kanjidic2.xml.gz", | ||
"KRADFILE": "{JAMDICT_DATA}/kradfile-u.gz" | ||
} |
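
The config values above reference earlier keys through `{NAME}` placeholders (for example `{JAMDICT_DATA}/jamdict.db`). The sketch below shows one way such a file could be resolved into concrete paths. It assumes that placeholders only refer to keys defined earlier in the file and that `~` is expanded to the user's home directory. The `resolve_paths` helper is hypothetical and is not taken from jamdict's code.

```python
import os

# A subset of the keys shown in the config diff above
config = {
    "JAMDICT_HOME": "~/.jamdict",
    "JAMDICT_DATA": "{JAMDICT_HOME}/data",
    "JAMDICT_DB": "{JAMDICT_DATA}/jamdict.db",
    "JMNEDICT_XML": "{JAMDICT_DATA}/JMnedict.xml.gz",
}


def resolve_paths(raw):
    """Substitute {KEY} placeholders in order, then expand '~'."""
    resolved = {}
    for key, template in raw.items():
        # Earlier keys are already resolved, so they can fill later templates
        resolved[key] = os.path.expanduser(template.format(**resolved))
    return resolved


paths = resolve_paths(config)
for key, value in paths.items():
    print(f"{key:14}: {value}")
```

Because resolution runs in a single pass in file order, a key that referenced a later key would raise `KeyError` in this sketch.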