Stopword dictionary loading problem #530
Comments
After going through the code, I found that the real problem is in MDAG.java. Changing the original IOAdapter.open(dataFile.getAbsolutePath()) to IOAdapter.open(dataFile.getPath()) fixes it.
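To illustrate why this one-call change can matter, here is a minimal sketch that does not depend on HanLP. For a relative path, `File.getAbsolutePath()` prefixes the current working directory, so a classpath-based adapter would presumably no longer be able to resolve the result as a resource name, while `File.getPath()` keeps the path relative. The class name `PathDemo` and the sample path are illustrative assumptions, not taken from the project.

```java
import java.io.File;

final class PathDemo {
    /** Returns the path as stored by File, without resolving it against any directory. */
    static String asGiven(String path) {
        return new File(path).getPath();
    }

    /** Returns the path resolved against the current working directory. */
    static String resolved(String path) {
        return new File(path).getAbsolutePath();
    }

    public static void main(String[] args) {
        // Hypothetical relative dictionary path, used only for illustration.
        String relative = "data/dictionary/stopwords.txt";
        System.out.println("getPath():         " + asGiven(relative));
        System.out.println("getAbsolutePath(): " + resolved(relative));
    }
}
```

The second printed value depends on the directory the program is started from, which is exactly what makes it unsuitable as a resource name inside a jar.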
Thanks for the suggestion.
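I am on version 1.3.2; I wrote 1.3.0 above by mistake.
Thanks for the suggestion. Constructing MDAG from a File parameter is indeed incompatible with InputStream-based loading. It has now been changed to read directly from the InputStream opened by IOAdapter. Testing is welcome.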
I am currently using HanLP 1.3.0. While reading CoreStopWordDictionary.java I found the following dictionary loading statement:
dictionary = new StopWordDictionary(new File(HanLP.Config.CoreStopWordDictionaryPath));
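The core dictionary, the user-defined dictionary, and the others all load their data as follows. Taking the core dictionary as an example, in CoreDictionary.java: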
br = new BufferedReader(new InputStreamReader(IOUtil.newInputStream(path), "UTF-8"));
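This goes through the unified IOUtil interface.
StopWordDictionary, however, uses File directly, which is inconsistent with the rest. Would you consider bringing CoreStopWordDictionary in line with the other dictionaries?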
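As a rough sketch of what a stream-based load could look like, the following reads a stop word list from whatever `InputStream` a pluggable opener returns, so the same code works for files, jar resources, or in-memory data. The `StreamOpener` interface, the `StopWordLoaderSketch` class, and the one-word-per-line format are assumptions made for illustration; this is not HanLP's actual implementation.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashSet;
import java.util.Set;

final class StopWordLoaderSketch {
    /** Hypothetical stand-in for a pluggable I/O adapter; not HanLP's interface. */
    interface StreamOpener {
        InputStream open(String path) throws IOException;
    }

    /** Reads one stop word per line (UTF-8), skipping empty lines. */
    static Set<String> load(StreamOpener opener, String path) throws IOException {
        InputStream in = opener.open(path);
        if (in == null) {
            // getResourceAsStream-style openers signal "missing" with null.
            throw new IOException("Resource not found: " + path);
        }
        Set<String> words = new LinkedHashSet<>();
        try (BufferedReader br = new BufferedReader(
                new InputStreamReader(in, StandardCharsets.UTF_8))) {
            String line;
            while ((line = br.readLine()) != null) {
                if (!line.isEmpty()) {
                    words.add(line);
                }
            }
        }
        return words;
    }

    public static void main(String[] args) throws IOException {
        // An in-memory opener stands in for a file system or jar source.
        StreamOpener inMemory = p ->
                new ByteArrayInputStream("the\nof\n".getBytes(StandardCharsets.UTF_8));
        System.out.println(load(inMemory, "ignored-by-this-opener"));
    }
}
```

Because the loader only ever sees an `InputStream`, swapping the storage backend means swapping the opener and nothing else.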
The reason I ask is that I have defined my own JarIOAdapter.java:
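import com.hankcs.hanlp.corpus.io.IIOAdapter;
import java.io.FileNotFoundException;
import java.io.InputStream;

public class JarIOAdapter implements IIOAdapter
{
    @Override
    public InputStream open(String path) throws FileNotFoundException
    {
        /*
         Loading resources the first way (commented out) fails in a distributed environment,
         so the second way is used instead.
         */
        //return ClassLoader.getSystemClassLoader().getResourceAsStream(path);
        return JarIOAdapter.class.getClassLoader().getResourceAsStream(path);
    }
}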
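The goal is to separate the code from the dictionary data by packaging hanlp.properties and the data directory into a jar of their own. However, because CoreStopWordDictionary.java does not read files through the unified interface, the stopword dictionary file cannot be found.
Does the author intend to split the code and the dictionary data into two jars? I have nearly finished this on my side and can submit the code.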
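One caveat with the adapter above: `ClassLoader.getResourceAsStream` returns `null` for a missing resource rather than throwing, so a wrong path only surfaces later as a `NullPointerException`. A minimal sketch of a stricter variant follows, assuming it is acceptable to fail immediately; the class name `ClasspathOpenerSketch` is hypothetical and the code is independent of HanLP.

```java
import java.io.FileNotFoundException;
import java.io.InputStream;

final class ClasspathOpenerSketch {
    /**
     * Opens a classpath resource through the loader that loaded this class
     * and throws instead of returning null when the resource is missing.
     */
    static InputStream open(String path) throws FileNotFoundException {
        ClassLoader loader = ClasspathOpenerSketch.class.getClassLoader();
        // A null loader means the bootstrap loader; fall back to the system loader.
        InputStream in = (loader != null)
                ? loader.getResourceAsStream(path)
                : ClassLoader.getSystemResourceAsStream(path);
        if (in == null) {
            throw new FileNotFoundException("Not found on classpath: " + path);
        }
        return in;
    }

    public static void main(String[] args) {
        try {
            // Hypothetical resource name, used only for illustration.
            open("data/dictionary/stopwords.txt").close();
            System.out.println("resource found");
        } catch (java.io.IOException e) {
            System.out.println("lookup failed: " + e.getMessage());
        }
    }
}
```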