I think the tag decoding method in the CRF is wrong. Please take a look at my revised version. #703
My problem is that the score is lower than what I get when testing with CRF++. I am not sure whether the algorithm is the cause.
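I ran into a similar problem, which is why I modified the CRF decoding method. In the cases I have tested so far (sentences that were segmented differently from CRF++), the modified decoding method now gives the same results as CRF++. That is why I posted it here for everyone to look at.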
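Thanks for pointing this out. When I wrote the Viterbi code I did indeed forget to record the maximum scores of the 16 paths across the last two time steps. It is now implemented with a rolling array, which also saves memory. Testing is welcome; if any sentence's result differs from yours or from CRF++, please report it.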
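For readers following along, the rolling-array idea mentioned above can be sketched roughly as follows. This is a minimal first-order Viterbi under stated assumptions, not the code from the referenced commit: the class name `RollingViterbi`, the method `decode`, and the `transition[pre][now]` matrix are hypothetical names introduced for illustration, while `net[i][tag]` follows the per-position score table from the issue's snippet.

```java
import java.util.Arrays;

/** Hypothetical helper for illustration only; not part of HanLP. */
final class RollingViterbi
{
    /**
     * @param net        net[i][tag] is the score of tag at position i
     * @param transition transition[pre][now] is the (assumed) score of moving from tag pre to tag now
     * @return the best-scoring tag id for each position
     */
    static int[] decode(double[][] net, double[][] transition)
    {
        int size = net.length;
        if (size == 0) return new int[0];
        int tagSize = net[0].length;
        // Back-pointers are still needed for every position to recover the path.
        int[][] from = new int[size][tagSize];
        // Only two score rows are kept alive: the previous and the current one.
        double[] prev = Arrays.copyOf(net[0], tagSize); // start from net[0], not from zeros
        double[] curr = new double[tagSize];
        for (int i = 1; i < size; ++i)
        {
            for (int now = 0; now < tagSize; ++now)
            {
                double best = Double.NEGATIVE_INFINITY;
                int bestPre = 0;
                for (int pre = 0; pre < tagSize; ++pre)
                {
                    double score = prev[pre] + transition[pre][now] + net[i][now];
                    if (score > best)
                    {
                        best = score;
                        bestPre = pre;
                    }
                }
                curr[now] = best;
                from[i][now] = bestPre;
            }
            // Roll the arrays: the current row becomes the previous row.
            double[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        // Pick the best final tag, then walk the back-pointers.
        int maxTag = 0;
        for (int tag = 1; tag < tagSize; ++tag)
        {
            if (prev[tag] > prev[maxTag]) maxTag = tag;
        }
        int[] path = new int[size];
        for (int i = size - 1; i >= 0; --i)
        {
            path[i] = maxTag;
            maxTag = from[i][maxTag];
        }
        return path;
    }
}
```

The score table shrinks from `size * tagSize` doubles to two rows of `tagSize`, while the integer back-pointer table stays full size because the path is only known after the last position has been scored.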
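I tested on SIGHAN2005 pku today. The code before the change matched CRF++ 0.58 Viterbi decoding, whereas the new code's F-score actually dropped by 0.1%. On inspection, the cause is that the values of maxScoreAt[0] at the start of the sentence are all 0. So after this change the score went down rather than up?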
Interesting, I also found that some sentences got worse after the change.
What I mean is that when the first two tags are predicted, the score of net[0] is not included, which makes the results worse. I changed it like this:
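The commenter's actual change is not preserved in this copy of the thread. As a rough illustration of the effect being described, the toy sketch below enumerates every (first tag, second tag) pair and compares the winner with and without the position-0 score. The class `FirstPairDemo`, the method `bestFirstPair`, and the `transition` matrix are hypothetical and only assume a simple first-order scoring; they are not HanLP's code.

```java
/** Hypothetical demo for illustration only; not part of HanLP. */
final class FirstPairDemo
{
    /**
     * Enumerates all tagSize * tagSize (pre, now) pairs for positions 0 and 1.
     * Assumes net has at least two rows.
     *
     * @param includeFirst whether net[0][pre] contributes to the pair score
     * @return {best tag at position 0, best tag at position 1}
     */
    static int[] bestFirstPair(double[][] net, double[][] transition, boolean includeFirst)
    {
        int tagSize = net[0].length;
        int[] best = {0, 0};
        double bestScore = Double.NEGATIVE_INFINITY;
        for (int pre = 0; pre < tagSize; ++pre)
        {
            for (int now = 0; now < tagSize; ++now)
            {
                double score = transition[pre][now] + net[1][now];
                if (includeFirst) score += net[0][pre]; // the term reported as missing
                if (score > bestScore)
                {
                    bestScore = score;
                    best = new int[]{pre, now};
                }
            }
        }
        return best;
    }
}
```

With `net = {{0, 3}, {1, 0}}` and `transition = {{0.5, 0}, {0, 0.5}}`, leaving out `net[0]` selects the pair (0, 0), while including it selects (1, 0). A strong score at position 0 is simply ignored when the initial scores are all zero.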
Great, you found a new bug. Would you mind opening a PR?
hankcs added a commit that referenced this issue on Jan 13, 2018
/**
 * Tagging with the Viterbi backward algorithm
 *
 * @param table
 */
public void tag(Table table)
{
    int size = table.size();
    if (size == 0) return;
    int tagSize = id2tag.length;
    double[][] net = new double[size][tagSize];
    for (int i = 0; i < size; ++i)
    {
        // Scores of the B/M/S/E states for one char; one double[] per feature
        LinkedList<double[]> scoreList = computeScoreList(table, i);
        for (int tag = 0; tag < tagSize; ++tag) // 4 states
        {
            net[i][tag] = computeScore(scoreList, tag);
        }
    }
//    double maxScore = -1e10;
//    int maxTag = 0;
//    for (int tag = 0; tag < net[size - 1].length; ++tag)
//    {
//        if (net[size - 1][tag] > maxScore)
//        {
//            maxScore = net[size - 1][tag];
//            maxTag = tag;
//        }
//    }
//
//    table.setLast(size - 1, id2tag[maxTag]);
//    maxTag = from[size - 1][maxTag];
//    for (int i = size - 2; i > 0; --i)
//    {
//        table.setLast(i, id2tag[maxTag]);
//        maxTag = from[i][maxTag];
//    }
//    table.setLast(0, id2tag[maxTag]);
}