PEMT

The funny damage of ignorant enthusiasm, or where is the sweet spot? (By Serge Gladkoff, Svetlana Svetova)

We live in times of change, change brought to us by technology. Technology disrupts many industries, and as it does so it promises change, benefits, speed, low cost, etc. However, as technology becomes more sophisticated and advanced, so too should we become smarter and more thoughtful.

I recently downloaded and reviewed a Memsource article titled "How to Unlock the Potential of Machine Translation." It had a nice layout and was presented by a respected firm. As I read it, though, a question came to mind: "Is it just me who is startled and baffled, or is it the entire industry?" Overwhelmed by a sense of uncertainty, I contacted Svetlana Svetova, a prominent expert in SDL Trados, translation-memory technology, machine translation and post-editing, to find out whether her opinion would match or deepen my thinking.

We are both pro-technology. Each of us conducts a lot of research on MT and its real-world applications, including in our service business and R&D.

Here I publish an excerpt from our conversation, in which, as in a literary piece, Svetlana is unequivocally the co-author.

Serge: Svetlana, please give me your assessment of the following excerpt:

For organizations breaking into and establishing themselves in new markets, developing an effective localization strategy that meets market needs requires considerable planning. Human-only translation, either in-house, or outsourced, has been the norm in many global companies, but it isn't scalable, is very costly, and has a long turnaround time. Also, it isn't necessary for many use cases. Machine translation (MT), combined with human translators who edit the MT output – post-editors – can fill this gap.

It is understandable that MT developers may overhype machine translation. They love their product and are very happy with their baby. They feel that they've produced something meaningful. However, this is a very strange statement from a company that is associated with a TM-based CAT tool.

Svetlana: Sergey, I completely agree. Where is TM in this picture? This paragraph implies that one can simply take MT output and edit it. Judging by the tone of the text, you don't seem to need a TM for that, or a glossary: it makes no mention of CAT tool functionality or process.

Serge: Well, what about this:

There are two main advantages to using MT. The first is that it can increase the efficiency of "traditional" human translation. A professional linguist edits the target text to "near-human level", correcting any grammatical or linguistic errors, and the estimated productivity rate is approximately double, compared to normal translation.

Svetlana: The portion about a "double" increase in productivity is a gross overstatement. In reality, the productivity increase depends on the language pair and on the degree to which one needs to edit the MT output. If the only goal is to achieve the minimum acceptable degree of factual accuracy, the post-editor needs to verify the terminology and the meaning transfer. However, the language will be poor. It will be too literal, and consequently it won't be idiomatic in the target language. It will also be necessary to verify grammar, not to mention the fact that if you really want to achieve near-human quality, you'll need to engage a second pair of eyes.

Serge: Yes, that's right, because translation per se is only part of the process. We need to prepare the files, organize the workflow, confirm the availability of participants--who have their own busy schedules--and then, after all the process steps are completed, we need to prepare for a proper handback. Thus, the resulting productivity increase is much less than the net translation productivity increase. That's true even for language pairs that take MT well, not to mention the entire "loaded productivity" of the overall project.

Svetlana: In the majority of cases the client isn't even interested in underlying technology. He just needs a very good translation. Moreover, raw "post-editing only" productivity has little to do with real job effort or turnaround time. The phrase "a 50% increase" is a very, very misleading, ultimately harmful stretch.
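
To make the "loaded productivity" point concrete, here is a minimal sketch in Python, offered as an illustration rather than a measurement. It assumes that translation accounts for a fixed share of the total project effort and that MT speeds up only that share. The function name loaded_speedup and the figures used (a 60% translation share, a 1.3x post-editing gain) are hypothetical and are not taken from the Memsource article or from any real project data.

```python
def loaded_speedup(translation_share: float, translation_speedup: float) -> float:
    """Overall project speedup when only the translation step gets faster.

    translation_share: fraction of total project effort spent on translation (0..1).
    translation_speedup: how many times faster the translation step becomes.
    """
    if not 0.0 <= translation_share <= 1.0:
        raise ValueError("translation_share must be between 0 and 1")
    if translation_speedup <= 0:
        raise ValueError("translation_speedup must be positive")
    # Effort that is left after the speedup: the untouched steps (file prep,
    # workflow, scheduling, handback) plus the shortened translation step.
    remaining = (1.0 - translation_share) + translation_share / translation_speedup
    return 1.0 / remaining


if __name__ == "__main__":
    # Hypothetical figures for illustration only: translation is 60% of the
    # effort and post-editing makes that step 1.3x faster.
    overall = loaded_speedup(translation_share=0.6, translation_speedup=1.3)
    print(f"Overall speedup: {overall:.2f}x")
```

With these made-up inputs, a 30% gain on the translation step alone works out to roughly a 16% gain for the whole project, and even a true doubling of translation speed would yield only about 1.43x overall.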

The second is that it makes it possible to translate content which otherwise would not be translated, typically because opting for human-only translation would be too costly or time-consuming. The end result will still be a target text where any grammatical, syntactical, and semantic errors are eliminated.

Svetlana: You can only achieve "any grammatical, syntactical, and semantic errors eliminated" if you have very good PEMT resources and a proper process. That's a very big "if." Again, the cost is far from zero. The cost of a good PEMT process is comparable to the cost of a human-only process, simply because market pressure pushes "human only" down to "good PEMT." In other words, pure "human-only" is nearly nonexistent these days. Too many providers are using MT without telling their customers, hoping to capture all the gain from the technology. The question is how well they edit and how carefully they check their work.

Serge: That's precisely why this statement is completely false and misleading. Nowadays you cannot achieve significant cost savings by using MT. Everyone who can use it to achieve an acceptable result according to client specifications is using it already!

Over the past few years, we have seen the use of MT increase and based on recent market estimations, MT usage will continue to grow, especially given the number of tech giants entering the space – Microsoft, Google, Amazon, and now Apple. The MT market has been valued between USD 130 million to USD 400 million (Nimdzi, 2019) and is estimated to exceed USD 1.5 billion by 2024 (Marketwatch, 2019).

Svetlana: What exactly is the "MT market"?

Will "the use of MT increase" among non-professional users or professional practitioners?

Serge: I don't even understand why this statement is here in an article about the use of MT by professionals.

One key reason why MT usage is high is that general MT quality has improved significantly, thanks to increasing investment and research into neural machine translation (NMT) and deep learning. The quality improvements are reflected in Memsource data from the past few years.

Serge: For what languages, directions and fields has the MT quality improved significantly? Is this written by an MT developer who won't hesitate to make claims that everything they do is constantly improving and expanding?

Svetlana: Congratulations to whoever wrote the opening statement, which implies that Memsource is peeking into client data for commercial purposes such as marketing Memsource.

Serge: Potentially, this opens the door to a class action against Memsource for unauthorized use of client data. It also hands Memsource's competition a good argument against it. For example, XTM says it keeps customer data completely secure and will not access it. That claim may or may not be true, but in any case they don't publish analysis of or comments about customer data.

Despite this overall improvement in quality, high-quality output is not guaranteed.

Svetlana: This is simply wonderful. I think it's the best statement in the whole article. This phrase should be tattooed on the foreheads of all MT providers and evangelists.

MT post-editing may be the most effective way to start out with machine translation. Why? It helps eliminate one of the main risks associated with machine translation – its volatile quality. By providing machine translation as an additional resource to your translators, they will be able to produce translations faster without the risk of compromising translation quality. In fact, producing high-quality human translation and using machine translation as an aid has become very common.

Serge: Frankly, this phrase is a jewel of absurdity. It might be even better than the previous one.

I once heard a nasty joke: A superior says to his subordinate who constantly makes mistakes, "You're so stupid that you'd win second place in a worldwide contest of stupidity." The poor chap asks, "Why only second?" Then the infuriated manager shouts his answer: "BECAUSE YOU'RE SO STUPID THAT YOU'D FAIL TO WIN!"

(Why did this joke surface in my memory in relation to this article?)

Machine translation is the starting point for approximately 35% of content translated by professional human translators according to Memsource data. One major advantage is that almost any content is suitable for post-editing - perhaps with the exception of highly creative marketing or literary content that needs to be transcreated rather than translated. Otherwise, machine translation post-editing is suitable as every single word is checked - and if needed - corrected by human translators.

Serge: Here goes "Memsource data" again. It isn't "Memsource data," it's client data! Thank you for yet another confirmation that Memsource is "looking into" its client data!

Again, here goes the statement that "almost any content is suitable for post-editing." It's only 35% according to the previous sentence, but 35% is "almost" 100%, I agree. It is an exemplary contradiction in this material, and not the only one.

Svetlana: Well, what on Earth does the last sentence mean?

Equally, if you define yourself as a global company or are looking to break into the international market, it's important to have a localization strategy in place which meets your global clients' expectations. When implemented correctly, along with other translation resources, MT post-editing can reduce your costs and time to market, allowing you to localize a larger volume of content and reach more clients or prospects worldwide.

Svetlana: I get it: Dial 911! The MT manufacturer has hijacked Memsource, taken all of its employees hostage in the basement and is writing the usual insane MT hype from Memsource computers!

To calculate the potential savings from MT post-editing, we compared the cost of translating 1000 words using MT post-editing with what it would have cost to translate the same 1000 words from scratch. (The 1000 words is taken from aggregated data gathered over 1 year which has been normalized to 1000 words. Also, the example assumes no other translation resources, such as translation memory, were used).

Serge: "The 1000 words is taken from aggregated data gathered over 1 year which has been normalized to 1000 words." An industry professional or a mathematician would not understand this sentence, but a legal counsel and a judge will take it as useful proof #3 that Memsource is accessing its clients' data without authorization.

Svetlana: Holy cow! Can a TM-based CAT tool provider actually state, "No TM was used"!? Why use Memsource as a CAT tool at all? There are much simpler--and less expensive--ways to edit MT output without TM functionality. Definitely, Memsource is under attack. Call the police or a doctor. Better yet, call a priest who can drive out the demon!

By applying the net rate scheme and identifying the different match types, the number of words to pay decreases.

Svetlana: This will increase the quality! And all the savings need to be given to the MT developer. (Pun intended.)

One study on MT post-editing conducted by Intertranslations, reported a 40% average increase in translation productivity per hour (Intertranslations, 2019).

Serge: The industry consensus is that MT offers a productivity increase in the range of 10% to 30% at most, with strong dependence on many factors. Of course, the MT producer--which keeps Memsource hostage--picks the most optimistic outliar . . . . Oh, excuse me, I should have said outlier.

Data from Memsource reflects this too. The main aim of this data analysis was to find trends in productivity, rather than carry out a study into precise productivity values. The translation productivity data shown in the graph is higher than the industry average of 2000-2500 words/day. We suspect this is partly due to a certain level of noise in the data which is difficult to remove and increases the productivity values. But, as this noise is common to all language pairs and workflows, the overall productivity trends are not affected. As Memsource is a productivity tool, it is likely that this has also driven up productivity values. The data is based on content translated from English-German, but in every language combination, there is a similar trend.

Serge: Proof number 4 that Memsource is looking into the data its clients have entrusted to it.

Svetlana: Not only is Memsource looking into it, but it's also abusing it by making incorrect statements inspired by some marketing goal. Such statements definitely aren't based on serious research on that data, because the logic in this paragraph is completely lost: "There is a productivity increase, but perhaps it's just because Memsource is a productivity tool. Nevertheless, we confirm that the MT-related productivity increase is 40%." Really?

There is often an assumption that only high-quality MT output improves translation productivity. But, this data shows that even for lower quality MT output, with a score between 50-60, translation productivity is greater compared with translating from scratch with no MT. And for "perfect" MT output (with a score of 100), translation productivity triples for short sentences and quadruples for long sentences, compared to translating from scratch.

Svetlana: What score? "Perfect" MT output is . . . what, human-quality translation? Why just a triple or quadruple productivity increase if it's already an HQ translation with a score of 100? The paragraph is so interesting with its puzzle of absurdity. Alice in Wonderland would be proud of it. The only problem is that unlike Alice in her adventure, this scene doesn't make any sense at all.

Implementing MT post-editing sections:

A. Select appropriate content for machine translation

B. Check the personal data policy of your MT provider/s carefully

C. Create a team of post-editors

D. Run (large) samples before deployment

E. Agree on a pricing model

Serge: Hey, what does "A" mean? You just said everything is suitable for post-editing. Why should we be concerned about "B" if we know that Memsource is looking into our data, anyway? The "C" is, of course, the easiest step of all [laughter]. "D" seems like a waste of time. After all, we know there's a productivity improvement of 50% in any content. Supposedly, the speed can even quadruple, so, why bother?

Svetlana: Great advice on the deployment process! There's only one small problem with "E": After this article some naïve clients may decide they don't need to pay for translation at all. I'm sure that Memsource sales will suffer a bit as well because . . . without TM, why would one need a CAT tool?

In general, when you adopt an MT post-editing strategy, you can use MT for any other types of content.

Svetlana: The closing statement is simply perfect.

CONCLUSION:

At the end of this discussion we both agreed that it's difficult to think of any material that would be more harmful to the industry, as well as to Memsource as the producer of a CAT tool.

Perhaps we see the infiltration of a double agent who's ostensibly working for Memsource but is also benefiting the competition. In that case, the competition would suffer as well. So, apparently we have a case of hostage-taking, marijuana or demonic possession.

Neither of us was able to understand what, for God's sake, the purpose of the article might be. It doesn't present a logical argument, so any intention it might have is lost on us. What were they trying to say? Why and how would this help a CAT tool provider?

We live in times when "pro-MT" must not mean defending MT blindly or overhyping its use without responsible application and the engagement of language-service professionals. "Pro-MT" must not mean denying obvious things and confusing content providers and clients about the best practices of MT deployment.

Being pro-MT and professional also means using ALL the best-of-breed tools and technologies for productive work, to deliver translation of the client-required quality at the best possible cost and turnaround time.

This is what the tools of the Logrus Global Localization cloud ensure. They put simplicity, TM management and the advanced noise-cancellation, triple quality boost of multi-engine NMT at the fingertips of the translator and editor.

Check out our cloud-based Memose, Termlode, Prospector and other tools to see how you can get the fusion of technologies without overhyping and shooting too low or too high in this very competitive and complex technological landscape!
