chrF not compatible with chrF++, Moses and NLTK for sentence-level smoothing #144

ozancaglayan · 2021-03-03T09:49:11Z

Created a new issue for this problem, to detach it from #121
See the comment: #121 (comment)

To summarize, sacreBLEU applies effective order smoothing, whereas the others smooth with a small epsilon value. The fix is already in my branch for sacreBLEU 2.0.0. The open question is whether we accept changing the behavior of sentence-level chrF for >=2.0.0.
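To make the difference concrete, here is a minimal, self-contained sketch of the two smoothing strategies. It is not the code of sacreBLEU, chrF++.py, Moses or NLTK. The function names, the `EPS` constant, and the whitespace handling are assumptions made for illustration. It assumes that effective order smoothing averages only over the n-gram orders that occur in both hypothesis and reference, while epsilon smoothing substitutes a tiny value for undefined ratios and still divides by the full number of orders.

```python
from collections import Counter

EPS = 1e-16  # assumed tiny smoothing constant (hypothetical value for this sketch)


def char_ngrams(text, n):
    """Count character n-grams of order n, ignoring spaces."""
    chars = text.replace(" ", "")
    return Counter(chars[i:i + n] for i in range(len(chars) - n + 1))


def ngram_stats(hyp, ref, max_order=6):
    """Return (hyp_count, ref_count, match_count) for each order 1..max_order."""
    stats = []
    for n in range(1, max_order + 1):
        h, r = char_ngrams(hyp, n), char_ngrams(ref, n)
        matches = sum((h & r).values())  # clipped n-gram overlap
        stats.append((sum(h.values()), sum(r.values()), matches))
    return stats


def chrf_effective_order(hyp, ref, max_order=6, beta=2.0):
    """Average precision/recall only over orders present on both sides."""
    usable = [s for s in ngram_stats(hyp, ref, max_order) if s[0] > 0 and s[1] > 0]
    if not usable:
        return 0.0
    prec = sum(m / h for h, _, m in usable) / len(usable)
    rec = sum(m / r for _, r, m in usable) / len(usable)
    factor = beta ** 2
    denom = factor * prec + rec
    return 100 * (1 + factor) * prec * rec / denom if denom > 0 else 0.0


def chrf_eps(hyp, ref, max_order=6, beta=2.0):
    """Compute a per-order F-score with EPS fallbacks, then average over all orders."""
    factor = beta ** 2
    total = 0.0
    for h, r, m in ngram_stats(hyp, ref, max_order):
        prec = m / h if h > 0 else EPS
        rec = m / r if r > 0 else EPS
        denom = factor * prec + rec
        total += (1 + factor) * prec * rec / denom if denom > 0 else EPS
    return 100 * total / max_order


if __name__ == "__main__":
    # A two-character sentence has no n-grams of order 3 and above,
    # so the two strategies disagree even for a perfect match.
    print(chrf_effective_order("ab", "ab"), chrf_eps("ab", "ab"))
    # A longer sentence has n-grams of every order, so they agree.
    print(chrf_effective_order("hello world", "hello world"),
          chrf_eps("hello world", "hello world"))
```

In this sketch the two functions differ only when some n-gram order is missing from the hypothesis or the reference, which happens with very short sentences.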


martinpopel · 2021-03-03T09:57:05Z

Yes, I vote for being compatible with the chrF++ implementation of chrF by default. We can keep the effective-order smoothing as a non-default option, or drop it (for chrF).


- Build: Add Windows and OS X testing to the GitHub workflow.
- Improve documentation and type annotations.
- Drop `Python < 3.6` support and migrate to f-strings.
- Drop input type manipulation through `isinstance` checks. If the user does not obey the expected annotations, exceptions will be raised. Robustness attempts led to confusion and obfuscated score errors in the past (fixes #121)
- Use colored strings in tabular outputs (multi-system evaluation mode) with the help of the `colorama` package.
- tokenizers: Add caching to tokenizers, which seems to speed things up a bit.
- `intl` tokenizer: Use `regex` module. Speed goes from ~4 seconds to ~0.6 seconds for a particular test set evaluation. (fixes #46)
- Signature: Formatting changed (mostly to remove '+' separator as it was interfering with chrF++). The field separator is now '|' and key values are separated with ':' rather than '.'.
- Metrics: Scale all metrics into the [0, 100] range (fixes #140)
- BLEU: In case of no n-gram matches at all, skip smoothing and return 0.0 BLEU (fixes #141).
- BLEU: allow modifying max_ngram_order (fixes #156)
- CHRF: Added multi-reference support, verified the scores against chrF++.py, added test case.
- CHRF: Added chrF+ support through `word_order` argument. Added test cases against chrF++.py. Exposed it through the CLI (--chrf-word-order) (fixes #124)
- CHRF: Add possibility to disable effective order smoothing (pass --chrf-eps-smoothing). This way, the scores obtained are exactly the same as chrF++, Moses and NLTK implementations. We keep the effective ordering as the default for compatibility, since this only affects sentence-level scoring with very short sentences. (fixes #144)
- CLI: Allow modifying TER arguments through CLI. We still keep the TERCOM defaults.
- CLI: Prefix metric-specific arguments with --chrf and --ter. To maintain compatibility, BLEU argument names are kept the same.
- CLI: Added `--format/-f` flag. The single-system output mode is now `json` by default. If you want to keep the old text format persistently, you can export `SACREBLEU_FORMAT=text` into your shell.
- CLI: sacreBLEU now supports evaluating multiple systems for a given test set in an efficient way. Through the use of the `tabulate` package, the results are nicely rendered into a plain text table, LaTeX, HTML or RST (cf. --format/-f argument). The systems can be given either as a list of plain text files to `-i/--input` or as a tab-separated single stream redirected into `STDIN`. In the former case, the basenames of the files will be automatically used as system names.
- Statistical tests: sacreBLEU now supports confidence interval estimation through bootstrap resampling for single-system evaluation (`--confidence` flag) as well as paired bootstrap resampling (`--paired-bs`) and paired approximate randomization tests (`--paired-ar`) when evaluating multiple systems (fixes #40 and fixes #78).

ozancaglayan added this to the 2.0.0 milestone Mar 3, 2021

ozancaglayan added a commit that referenced this issue Mar 26, 2021

CHRF: Adapt to new metric API

459bf06

- Allow using epsilon smoothing (#144)
- Add multi-reference support
- Add chrF++ support through the word_order argument (#124)

ozancaglayan mentioned this issue Mar 26, 2021

Changes for 2.0.0 #152

Merged

ozancaglayan linked a pull request Mar 26, 2021 that will close this issue

Changes for 2.0.0 #152

Merged

ozancaglayan closed this as completed in #152 Jul 18, 2021
