Fixed the POS tags of the "每xx" entries in the core dictionary #524

Closed · wants to merge 2 commits
86 changes: 43 additions & 43 deletions data/dictionary/CoreNatureDictionary.txt
@@ -81710,68 +81710,68 @@
母龟 n 1
毎 nz 10
每 rz 4131
-每一年 nz 31
-每一方 nz 1
+每一年 t 31
+每一方 rz 1
每一次 d 188
每一步 d 44
-每七年 nz 1
-每个 r 2135
-每五年 nz 4
-每亩 r 117
+每七年 t 1
+每个 m 2135
+每五年 t 4
+每亩 m 117
每人 r 904
-每件 d 61
-每份 r 17
+每件 m 61
+每份 m 17
每位 r 201
-每况愈下 vl 22
-每包 nz 16
-每匹 nz 2
-每十年 nz 5
-每半年 nz 24
-每吨 q 123
-每周 r 603
-每周三 nz 21
-每四年 nz 10
+每况愈下 i 22
+每包 m 16
+每匹 m 2
+每十年 t 5
+每半年 t 24
+每吨 m 123
+每周 t 603
+每周三 t 21
+每四年 t 10
每回 nz 8
-每场 r 45
-每块 r 16
-每夜 nz 14
-每天 r 5072
-每头 nz 15
-每套 r 23
+每场 m 45
+每块 m 16
+每夜 t 14
+每天 t 5072
+每头 m 15
+每套 t 23
每家 r 117
每局 r 2
-每层 d 27
-每年 r 3400
-每张 r 116
-每当 p 185
+每层 m 27
+每年 t 3400
+每张 m 116
+每当 d 185
每户 r 120
每批 nz 7
每排 nz 17
-每支 r 16
+每支 m 16
每斤 nz 141
-每日 r 1636
+每日 t 1636
每旬 nz 1
每时每刻 bl 15
-每星期 d 17
-每晚 r 248
-每月 r 1329
+每星期 t 17
+每晚 t 248
+每月 t 1329
每期 r 36
-每条 d 54
+每条 m 54
每样 nz 5
-每桶 r 14
-每次 r 1501
+每桶 m 14
+每次 d 1501
每段路 nz 1
每每 d 59
-每片 d 17
-每瓶 nz 35
+每片 m 17
+每瓶 m 35
每种 r 52
-每秒 d 35
-每秒钟 nz 6
-每立方 nz 2
-每笔 nz 37
-每篇 r 11
+每秒 t 35
+每秒钟 t 6
+每立方 m 2
+每笔 m 37
+每篇 m 11
每类 nz 4
-每组 r 40
+每组 m 40
每股 r 70
每节 r 24
每英寸 nz 1
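Each line of CoreNatureDictionary.txt follows the format "word nature frequency [nature frequency ...]", so the diff above only corrects the nature column (e.g. nz/r to t/m) and leaves the observed frequencies untouched. A minimal, hypothetical parsing sketch of that line format (not part of this PR; the class name and sample line are illustrative):

import java.util.Arrays;

public class DictLineDemo
{
    public static void main(String[] args)
    {
        // After this PR; the same entry previously read "每年 r 3400".
        String line = "每年 t 3400";
        String[] parts = line.split("\\s+");
        String word = parts[0];
        // Natures and frequencies come in pairs after the headword.
        for (int i = 1; i + 1 < parts.length; i += 2)
        {
            System.out.println(word + " -> nature=" + parts[i] + ", frequency=" + parts[i + 1]);
        }
    }
}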
1 change: 1 addition & 0 deletions src/test/java/com/hankcs/demo/DemoSegment.java
@@ -27,6 +27,7 @@ public class DemoSegment
public static void main(String[] args)
{
String[] testCase = new String[]{
"每年我都会去一次大理",
"商品和服务",
"结婚的和尚未结婚的确实在干扰分词啊",
"买水果然后来世博园最后去世博会",
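The new test sentence exercises exactly this dictionary change: once the cached binaries are rebuilt, "每年" should come out of the segmenter tagged as a time word (t) rather than a pronoun (r). A minimal verification sketch using the public HanLP.segment API (assuming the rebuilt caches and the hanlp.properties below are on the classpath; the class name is illustrative):

import com.hankcs.hanlp.HanLP;
import com.hankcs.hanlp.seg.common.Term;

import java.util.List;

public class SegmentNatureDemo
{
    public static void main(String[] args)
    {
        // With ShowTermNature=true each Term prints as "word/nature".
        List<Term> termList = HanLP.segment("每年我都会去一次大理");
        System.out.println(termList);
        // Print word and nature separately; "每年" is expected to carry t after this change,
        // though the exact output depends on the rebuilt caches.
        for (Term term : termList)
        {
            System.out.println(term.word + "\t" + term.nature);
        }
    }
}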
83 changes: 83 additions & 0 deletions src/test/java/com/hankcs/test/CleanFile.java
@@ -0,0 +1,83 @@
package com.hankcs.test;



import org.junit.Test;

import java.io.File;

/**
 * Created by linming on 2017/3/7.
 * Deletes the cached .bin/.dat dictionary files so that HanLP rebuilds them
 * from the edited .txt sources on the next run.
 */
public class CleanFile {

    // Local checkout root of the HanLP repository; adjust to your environment.
    String PATH = "C:/code/github/HanLP-hankcs";

@Test
public void cleanBinUnderDicts() {
String path = PATH + "/dicts/";
deleteFile(path, ".txt.bin");
}

@Test
public void cleanBinUnderHanlpCustom() {
String path = PATH + "/data/dictionary/custom";
deleteFile(path, ".txt.bin");
}

@Test
public void cleanBinUnderHanlpCore() {
String path = PATH;

String file1 = path + "/data/dictionary/stopwords.txt.bin";
deleteFile(file1, ".bin");

String file2 = path + "/data/dictionary/CoreNatureDictionary.txt.bin";
deleteFile(file2, ".bin");

String file3 = path + "/data/dictionary/CoreNatureDictionary.tr.txt.bin";
deleteFile(file3, ".bin");

String file4 = path + "/data/dictionary/CoreNatureDictionary.ngram.txt.table.bin";
deleteFile(file4, ".bin");

String file5 = path + "/data/dictionary/CoreNatureDictionary.ngram.mini.txt.table.bin";
deleteFile(file5, ".bin");

String file6 = path + "/data/dictionary/CoreNatureDictionary.mini.txt.bin";
deleteFile(file6, ".bin");
}

@Test
public void cleanBinUnderHanlpDat() {
String path = PATH + "/data/dictionary/";
deleteFile(path, ".dat");
}


@Test
public void cleanAll() {
cleanBinUnderDicts();
cleanBinUnderHanlpCustom();
cleanBinUnderHanlpCore();
cleanBinUnderHanlpDat();
}

    public static void deleteFile(String strPath, String suffix) {
        File dir = new File(strPath);
        // If the path is a single file with the given suffix, delete it and stop.
        if (dir.isFile() && dir.getName().endsWith(suffix)) {
            String strFileName = dir.getAbsolutePath();
            dir.delete();
            System.out.println("---" + strFileName);
            return;
        }

        // Otherwise recurse into the directory (listFiles() returns null for plain files).
        File[] files = dir.listFiles();
        if (files != null) {
            for (int i = 0; i < files.length; i++) {
                deleteFile(files[i].getAbsolutePath(), suffix);
            }
        }
    }

}
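HanLP compiles each dictionary into a .bin (or .dat) cache next to the source .txt and loads the cache in preference to the text file, so an edit like the one above stays invisible until the stale caches are removed; that is what CleanFile automates. A minimal alternative sketch that removes just the core dictionary cache (the root path mirrors the test hanlp.properties and is purely illustrative):

import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;

public class RebuildCoreCache
{
    public static void main(String[] args) throws IOException
    {
        // Illustrative root; mirrors the root= entry in the test hanlp.properties.
        String root = "C:/code/github/HanLP-hankcs";
        // Deleting the cache forces HanLP to re-parse the edited .txt
        // and regenerate the .bin on the next run.
        Files.deleteIfExists(Paths.get(root, "data/dictionary/CoreNatureDictionary.txt.bin"));
    }
}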
31 changes: 31 additions & 0 deletions src/test/resources/hanlp.properties
@@ -0,0 +1,31 @@
#Root directory for the paths in this configuration file; root + relative path = absolute path
#Windows users note: use / as the path separator throughout
root=C:/code/github/HanLP-hankcs
#Core dictionary path
CoreDictionaryPath=data/dictionary/CoreNatureDictionary.txt
#Bigram dictionary path
BiGramDictionaryPath=data/dictionary/CoreNatureDictionary.ngram.txt
#Core dictionary POS transition matrix path
CoreDictionaryTransformMatrixDictionaryPath=data/dictionary/CoreNatureDictionary.tr.txt
#Stop word dictionary path
CoreStopWordDictionaryPath=data/dictionary/stopwords.txt
#Synonym dictionary path
CoreSynonymDictionaryDictionaryPath=data/dictionary/synonym/CoreSynonym.txt
#Person name dictionary path
PersonDictionaryPath=data/dictionary/person/nr.txt
#Person name dictionary transition matrix path
PersonDictionaryTrPath=data/dictionary/person/nr.tr.txt
#Traditional/Simplified Chinese dictionary root directory
tcDictionaryRoot=data/dictionary/tc
#Custom dictionary paths; separate multiple dictionaries with ";". A leading space means the file is in the same directory as the previous one; the form "filename nature" sets that dictionary's default POS tag. Priority decreases from left to right.
#Also, data/dictionary/custom/CustomDictionary.txt is a high-quality lexicon; please do not delete it. All dictionaries must be UTF-8 encoded.
CustomDictionaryPath=data/dictionary/custom/CustomDictionary.txt; 现代汉语补充词库.txt; 全国地名大全.txt ns; 人名词典.txt; 机构名词典.txt; 上海地名.txt ns;data/dictionary/person/nrf.txt nrf;
#CRF segmentation model path
CRFSegmentModelPath=data/model/segment/CRFSegmentModel.txt
#HMM segmentation model
HMMSegmentModelPath=data/model/segment/HMMSegmentModel.bin
#Whether segmentation results show POS tags
ShowTermNature=true
#IO adapter: implement the com.hankcs.hanlp.corpus.io.IIOAdapter interface to run HanLP on different platforms (Hadoop, Redis, etc.)
#The default IO adapter is shown below; it is based on the ordinary file system.
#IOAdapter=com.hankcs.hanlp.corpus.io.FileIOAdapter
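The last commented lines point at the IIOAdapter extension point. A minimal sketch of a custom adapter equivalent to the default FileIOAdapter, assuming the open/create signatures of the 1.x interface (verify against the HanLP version actually in use); it would be enabled by setting IOAdapter to the implementing class name in this file:

import com.hankcs.hanlp.corpus.io.IIOAdapter;

import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

// Sketch only: logs every dictionary read/write, then delegates to the local file system.
public class LoggingFileIOAdapter implements IIOAdapter
{
    @Override
    public InputStream open(String path) throws IOException
    {
        System.out.println("HanLP reading: " + path);
        return new FileInputStream(path);
    }

    @Override
    public OutputStream create(String path) throws IOException
    {
        System.out.println("HanLP writing: " + path);
        return new FileOutputStream(path);
    }
}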