An excellent, highly recommended article: Mathieu Gautier, "Takeaways from the 2019 Machine Translation Summit in Dublin."
Just a few quotes.
“Has machine translation (MT) become so good that we are leaving massive productivity gains on the table when we translate from scratch? In other words, are the non-MT users of today the non-CAT tool users of the early 2000s (a.k.a. dinosaurs)?”
If you don’t understand how neural machine translation (NMT) actually works, you’re in good company.
NMT is still based on statistical methods similar to SMT. The big shift is that instead of stitching building blocks together, it translates through a sophisticated sequence of encoding and decoding based on neural networks, a notion early computer scientists saw as “mystical”, and that some language researchers hope might reveal a universal grammar.
NMT engines are essentially a black box.
Speakers noted throughout the conference that the dust has yet to settle in the industry since the advent of NMT.
Successful use cases in a wide range of contexts were presented at the summit, including at the European Commission, a major Swiss bank, an Italian agency specializing in patents and e-commerce companies like eBay and Alibaba. The productivity gains range from marginal (<10%) to massive (>50%).
There was no talk at the summit of any emerging successor to NMT, which replaced the statistics model as the state-of-the-art around 2016, which itself overtook the rule-based model in the early 2000s. Nor does there seem to be an expectation of any upcoming massive breakthroughs. Because machine translation performance is highly dependent on training data, domain adaptation as well as customized upstream and downstream processes, the biggest gains to be made may be context-specific, rather than from improvements to the general model.
One downstream technique is called automatic post-editing, where the translation produced by the machine is corrected by a machine that is trained differently. For example, the post-editing machine can learn from the corrections made by humans and automatically apply those corrections to the machine translation.
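To make the idea of automatic post-editing a bit more concrete, here is a toy sketch in Python. It is not how production systems work (those are typically neural models in their own right) and it is not taken from the article. It only illustrates the loop described above: collect pairs of raw MT output and human-corrected text, extract the token replacements the humans made repeatedly, and apply them to new MT output. The function names `learn_rules` and `apply_rules`, the `min_count` threshold and the sample sentences are all assumptions made up for this illustration.

```python
from collections import Counter
from difflib import SequenceMatcher


def learn_rules(pairs, min_count=2):
    """Collect token replacements that human post-editors made at least min_count times."""
    counts = Counter()
    for mt, post_edited in pairs:
        src, tgt = mt.split(), post_edited.split()
        matcher = SequenceMatcher(a=src, b=tgt, autojunk=False)
        for tag, i1, i2, j1, j2 in matcher.get_opcodes():
            if tag == "replace":  # ignore pure insertions and deletions in this toy version
                counts[(tuple(src[i1:i2]), tuple(tgt[j1:j2]))] += 1
    rules = {}
    # most_common() goes from most to least frequent, so the most frequent correction wins
    for (old, new), n in counts.most_common():
        if n >= min_count:
            rules.setdefault(old, new)
    return rules


def apply_rules(mt, rules):
    """Rewrite new MT output with the learned replacements, scanning left to right."""
    tokens = mt.split()
    out = []
    i = 0
    while i < len(tokens):
        for old, new in rules.items():
            if tuple(tokens[i:i + len(old)]) == old:
                out.extend(new)
                i += len(old)
                break
        else:
            # no rule matched at this position, keep the token unchanged
            out.append(tokens[i])
            i += 1
    return " ".join(out)


# Hypothetical training data: (raw MT output, human post-edited version)
history = [
    ("the contract is finished", "the contract is terminated"),
    ("the lease is finished", "the lease is terminated"),
]
rules = learn_rules(history)
corrected = apply_rules("the agreement is finished", rules)
```

With this made-up history, the only learned rule replaces "finished" with "terminated", so the new sentence is corrected the same way the human editors corrected the earlier ones. A real system would need context sensitivity, which is exactly what a separately trained model provides.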
Harvesting, aligning and cleaning data have become major activities in the translation industry, and programs like ParaCrawl and Sketch Engine are joining PDF converters and alignment software in an increasing number of translators’ toolkits.
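For a sense of what "cleaning" an aligned corpus can involve, here is a minimal Python sketch. It is not the pipeline of ParaCrawl, Sketch Engine or any other tool mentioned above. The function name `clean_parallel`, the specific filters and the `max_ratio` threshold are assumptions chosen for illustration: drop empty segments, untranslated copies, pairs whose lengths are wildly different, and duplicates.

```python
def clean_parallel(pairs, max_ratio=3.0):
    """Filter (source, target) segment pairs with a few simple heuristics."""
    seen = set()
    cleaned = []
    for src, tgt in pairs:
        # normalize whitespace
        src, tgt = " ".join(src.split()), " ".join(tgt.split())
        if not src or not tgt:
            continue  # one side is empty
        if src == tgt:
            continue  # probably left untranslated
        ratio = max(len(src), len(tgt)) / min(len(src), len(tgt))
        if ratio > max_ratio:
            continue  # lengths too different, likely a misalignment
        key = (src.lower(), tgt.lower())
        if key in seen:
            continue  # case-insensitive duplicate
        seen.add(key)
        cleaned.append((src, tgt))
    return cleaned


# Hypothetical raw segments harvested from aligned documents
raw = [
    ("Hello  world", "Bonjour le monde"),
    ("hello world", "bonjour le monde"),
    ("", "Vide"),
    ("OK", "OK"),
    ("Yes", "Oui, bien sûr, absolument, sans aucun doute"),
]
cleaned = clean_parallel(raw)
```

Only the first pair survives here. The right threshold for the length ratio depends heavily on the language pair, so treat the value of 3.0 as a placeholder.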