## Introduction:
### Gender guess for Chinese names in English (Pinyin) form

- language: Python
- method: Naive Bayes, with different weights assigned to characters at different positions in the name
- dataset: 20 million Chinese names
- accuracy: 81% on 1,500 randomly selected samples
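
To make the "Naive Bayes with position weights" idea concrete, here is a minimal sketch. It is not the chgender implementation: the class name, the weight values, the smoothing scheme, and the six-name toy dataset are all assumptions made up for illustration. Each pinyin syllable contributes its log-likelihood multiplied by a weight that depends on its position in the given name.

```python
import math
from collections import defaultdict

# Hypothetical toy training data: (given-name syllables, gender label).
TOY_SAMPLES = [
    (["de", "hua"], "male"),
    (["jian", "guo"], "male"),
    (["zhi", "qiang"], "male"),
    (["xiao", "hua"], "female"),
    (["mei", "li"], "female"),
    (["li", "na"], "female"),
]


class WeightedNaiveBayes:
    """Illustrative position-weighted Naive Bayes over pinyin syllables."""

    def __init__(self, position_weights=(1.0, 1.5), smoothing=1.0):
        # Assumed weights: the second syllable counts more than the first.
        self.position_weights = position_weights
        self.smoothing = smoothing
        self.class_counts = defaultdict(int)
        self.syllable_counts = defaultdict(int)  # (label, position, syllable)
        self.position_totals = defaultdict(int)  # (label, position)
        self.vocabulary = defaultdict(set)       # position -> syllables seen

    def fit(self, samples):
        for syllables, label in samples:
            self.class_counts[label] += 1
            for position, syllable in enumerate(syllables):
                self.syllable_counts[(label, position, syllable)] += 1
                self.position_totals[(label, position)] += 1
                self.vocabulary[position].add(syllable)
        return self

    def _weight(self, position):
        # Positions beyond the configured weights reuse the last weight.
        index = min(position, len(self.position_weights) - 1)
        return self.position_weights[index]

    def guess(self, syllables):
        total = sum(self.class_counts.values())
        log_scores = {}
        for label, count in self.class_counts.items():
            score = math.log(count / total)  # log prior
            for position, syllable in enumerate(syllables):
                seen = self.syllable_counts.get((label, position, syllable), 0)
                # +1 reserves probability mass for unseen syllables.
                vocab_size = len(self.vocabulary.get(position, ())) + 1
                denominator = (
                    self.position_totals.get((label, position), 0)
                    + self.smoothing * vocab_size
                )
                likelihood = (seen + self.smoothing) / denominator
                score += self._weight(position) * math.log(likelihood)
            log_scores[label] = score
        # Normalise the weighted log scores into a probability-like value.
        peak = max(log_scores.values())
        exp_scores = {k: math.exp(v - peak) for k, v in log_scores.items()}
        best = max(exp_scores, key=exp_scores.get)
        return best, exp_scores[best] / sum(exp_scores.values())


model = WeightedNaiveBayes().fit(TOY_SAMPLES)
print(model.guess(["zhi", "guo"]))
```

Because the weights rescale the log-likelihoods, the returned value is a normalised score rather than a strict posterior probability; a real model would also need far more data than this toy set.
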
### Possible usage fields:
- User registration on websites or new contact creation: the gender option can be pre-selected based on the name.
- Gender analysis of Chinese authors who publish papers in English-language journals.
## How to use:
### Install:
1. Use pip:

        $ pip install chgender

2. Clone from git:

        $ git clone git@github.com:jiajianzhou/chgender.git
        $ cd chgender
        $ sudo python setup.py install

### Usage:
1. Use as a module:

        >>> import chgender
        >>> chgender.guess('dehua liu')
        ('male', 0.966248721556)

2. Use from the shell:

        $ chg -n dehua xueyou fucheng ming
        name: dehua => gender: male, probability: 0.966248721556
        name: xueyou => gender: male, probability: 0.985020743536
        name: fucheng => gender: male, probability: 0.999357367222
        name: ming => gender: male, probability: 0.851123622896

3. Use in batch mode, with names read from a file:

        $ chg -f samples.txt
        name: dehua => gender: male, probability: 0.966248721556
        name: xueyou => gender: male, probability: 0.985020743536
        name: fucheng => gender: male, probability: 0.999357367222
        name: ming => gender: male, probability: 0.851123622896
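
For gender analysis over many names, the `chg` output can be post-processed in Python. The sketch below parses lines in the `name: ... => gender: ..., probability: ...` format shown above and tallies the genders. The function names and the regular expression are assumptions based only on that sample output, not part of chgender.

```python
import re
from collections import Counter

# Pattern inferred from the sample output in this README (an assumption).
LINE_PATTERN = re.compile(
    r"^name:\s*(?P<name>.+?)\s*=>\s*gender:\s*(?P<gender>\w+),\s*"
    r"probability:\s*(?P<probability>\d+(?:\.\d+)?(?:[eE][-+]?\d+)?)\s*$"
)

SAMPLE_OUTPUT = """\
$ chg -f samples.txt
name: dehua => gender: male, probability: 0.966248721556
name: xueyou => gender: male, probability: 0.985020743536
name: fucheng => gender: male, probability: 0.999357367222
name: ming => gender: male, probability: 0.851123622896
"""


def parse_chg_output(text):
    """Return a list of (name, gender, probability) tuples."""
    records = []
    for line in text.splitlines():
        match = LINE_PATTERN.match(line.strip())
        if match is None:
            continue  # skip the command line and anything unrecognised
        records.append(
            (
                match.group("name"),
                match.group("gender"),
                float(match.group("probability")),
            )
        )
    return records


def tally_genders(records, threshold=0.0):
    """Count guessed genders, ignoring guesses below `threshold`."""
    return Counter(gender for _, gender, prob in records if prob >= threshold)


records = parse_chg_output(SAMPLE_OUTPUT)
print(tally_genders(records))
```

Raising `threshold` drops low-confidence guesses from the tally, which may matter given the reported 81% accuracy.
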