A Drink and a Bite to Eat

It is interesting to look at the names given to methods for evaluating machine translation quality and post-editing effort (evaluation metrics) — and, of course, at what they actually analyze.

 

Beer

BEER is a metric trained for high correlation with human ranking by using learning-to-rank training methods. For evaluation of lexical accuracy it uses sub-word units (character n-grams), while for measuring word order it uses hierarchical representations based on PETs (permutation trees). During the last WMT metrics tasks, BEER has shown high correlation with human judgments both on the sentence and the corpus levels. In this paper we will show how BEER can be used for (i) full evaluation of MT output, (ii) isolated evaluation of word order and (iii) tuning MT systems.
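BEER itself is a trained linear model over many features, but the character n-gram idea it uses for lexical accuracy can be sketched as a plain character n-gram F1 between hypothesis and reference. This is a hypothetical illustration only — the function names and the fixed n are my own, not BEER's:

```python
from collections import Counter

def char_ngrams(text, n):
    """Multiset of all overlapping character n-grams of a string."""
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))

def char_ngram_f1(hypothesis, reference, n=4):
    """F1 over the character n-gram multisets of hypothesis vs. reference.

    Sub-word units give partial credit for near-matches (inflections,
    compounds) that word-level exact match would miss entirely.
    """
    hyp = char_ngrams(hypothesis, n)
    ref = char_ngrams(reference, n)
    overlap = sum((hyp & ref).values())  # multiset intersection size
    if overlap == 0:
        return 0.0
    precision = overlap / sum(hyp.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)
```

For example, `char_ngram_f1("the cat sat", "the cat sits")` is well above zero even though the word "sat" never matches "sits" exactly — which is precisely why character n-grams help with morphologically rich languages.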

Read more >>>

 

Ratatouille

Ratatouille combines MeteorWSD with nine other metrics for evaluation and outperforms the best metric (BEER) involved in its computation.
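The combination idea can be sketched as a weighted aggregation of the component metrics' scores. Note that the actual Ratatouille combination is learned from data; the hand-set weights below are purely illustrative:

```python
def combine_metrics(scores, weights):
    """Weighted average of component metric scores for one sentence.

    scores  -- per-metric scores (e.g. from BEER, MeteorWSD, ...)
    weights -- relative importance of each component; in a real system
               these would be fit against human judgments, not hand-set.
    """
    assert len(scores) == len(weights) and weights, "one weight per metric"
    return sum(s * w for s, w in zip(scores, weights)) / sum(weights)
```

With equal weights this reduces to a plain average; learning the weights is what lets the combination outperform its best single component.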

Read more >>>

 

CIDEr

Our protocol enables an objective comparison of machine generation approaches based on their “human-likeness”, without having to make arbitrary calls on weighing content, grammar, saliency, etc. with respect to each other.

Read more >>>

 

Parmesan

This paper describes Parmesan, our submission to the 2014 Workshop on Statistical Machine Translation (WMT) metrics task for evaluating English-to-Czech translation. Parmesan first performs targeted paraphrasing of reference sentences, then it computes the Meteor score using only the exact match on these new reference sentences.
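A minimal sketch of the two-step idea, assuming a toy paraphrase table and a simple token-overlap score as a stand-in for Meteor's exact-match module (names and logic are illustrative, not the actual Parmesan implementation):

```python
from collections import Counter

def paraphrase_reference(reference_tokens, hypothesis_tokens, paraphrases):
    """Step 1: targeted paraphrasing. Replace a reference token with one of
    its paraphrases only when that paraphrase (and not the original token)
    occurs in the hypothesis, pulling the reference toward the hypothesis
    wording without changing its meaning."""
    hyp_set = set(hypothesis_tokens)
    result = []
    for tok in reference_tokens:
        for alt in paraphrases.get(tok, []):
            if alt in hyp_set and tok not in hyp_set:
                tok = alt
                break
        result.append(tok)
    return result

def exact_match_precision(hypothesis_tokens, reference_tokens):
    """Step 2: exact matching (a crude stand-in for Meteor's exact-match
    module): fraction of hypothesis tokens found in the reference."""
    ref_counts = Counter(reference_tokens)
    hits = 0
    for tok in hypothesis_tokens:
        if ref_counts[tok] > 0:
            ref_counts[tok] -= 1
            hits += 1
    return hits / len(hypothesis_tokens)
```

For instance, with reference "the quick car", hypothesis "the fast car", and a paraphrase table mapping "quick" to "fast", the paraphrased reference becomes "the fast car" and the exact-match score rises from 2/3 to 1.0.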

Read more >>>