As titled: the original paper reports sentence-wise detection and correction scores of 73.5 and 66.4 on SIGHAN15, and describes the training data as the three SIGHAN13-15 training sets plus a self-constructed corpus of up to 5 million news titles. Here the sentence-level score jumps to 79.4 while fine-tuning only on SIGHAN data. That gap seems far too large.
Could it be that your evaluation metric differs from theirs? I suspect you tested on the full SIGHAN15 test set, i.e. counting the error-free negative samples as well, whereas the subsequent line of CSC papers basically evaluates on positive samples only.
I'm curious how precision and F1 could even be computed if the evaluation data contained no negative samples. Both the data and the evaluation script are open-sourced, so you can run the evaluation yourself, or re-evaluate using the relevant functions in the pycorrector repo. pycorrector macbert4csc
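To make the disagreement concrete, here is a minimal sketch of sentence-level correction metrics with a switch for including error-free negatives. This is an illustrative reconstruction, not the repo's actual evaluation script; the function name and tuple format are assumptions.

```python
def sentence_metrics(examples, include_negatives=True):
    """Sentence-level correction precision/recall/F1.

    Each example is a (source, gold, prediction) triple. A sentence counts
    as "changed" if the prediction differs from the source, and as a true
    positive if the prediction exactly matches the gold sentence and the
    source actually contained an error.
    """
    tp = fp = fn = 0
    for src, gold, pred in examples:
        has_error = src != gold
        if not include_negatives and not has_error:
            continue  # positive-only evaluation drops clean sentences
        changed = pred != src
        if changed:
            if has_error and pred == gold:
                tp += 1
            else:
                fp += 1  # wrong fix, or a clean sentence was altered
        elif has_error:
            fn += 1  # error left uncorrected
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1
```

Note that dropping negatives removes exactly the false positives a model commits on clean sentences, so positive-only precision can only be higher or equal. Whether a given score was computed one way or the other has to be checked against the script itself.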
One more note on the metric improvement: before this implementation, very few implementations kept the pretrained MLMHead weights when fine-tuning, whereas both this repo and the BBCM repo retain those pretrained parameters during training. If you're interested, you could run an ablation that does not load that layer's pretrained parameters and see whether the metrics drop into the range you expect.
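The suggested ablation can be sketched in a few lines of PyTorch. The two linear layers below merely stand in for the pretrained encoder and the MLMHead; the real models and layer names differ, so treat this purely as an illustration of "keep vs. reinitialize the head before fine-tuning".

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
encoder = nn.Linear(8, 8)      # stands in for the pretrained BERT encoder
mlm_head = nn.Linear(8, 100)   # stands in for the pretrained MLM head

# Baseline (this repo / BBCM): fine-tune with the head weights as loaded.
pretrained_head = {k: v.clone() for k, v in mlm_head.state_dict().items()}

# Ablation: discard the pretrained head weights so the head is trained
# from scratch during fine-tuning.
mlm_head.reset_parameters()

head_was_reinitialized = not torch.equal(
    mlm_head.weight, pretrained_head["weight"]
)
```

With a real checkpoint the same idea amounts to loading the full pretrained state dict for the baseline, and skipping (or re-randomizing) the prediction-head entries for the ablation run.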