Many companies use machine translation to meet a variety of translation needs. This is one reason why SAP provides machine translation tailored to SAP and enterprise content through the Document Translation Service, one of the services available in SAP Business Technology Platform.
The service is already used in many business scenarios across SAP, and you may have come across one of them. However, machine translation technology evolves continuously and remains an active field of research. To maintain a high-quality offering for businesses that want to provide multilingual content, the machine translation system needs to keep up with current technology. Over the past years, we have seen increasing demand for translation of conversational content in a business context, for example support chat dialogues, as highlighted in this blog by Janos Nagy.
The challenge in handling conversational content is to tolerate imperfect input while maintaining translation quality. Conversational content often includes lower-quality source input with typos, missing or inconsistent casing, missing punctuation, informal words, etc.
Here is some example input:
|User|why my gdm is not working|
|User|it says .service file is not there|
|Support|what are you doing exactly|
|Support|where is it exactly|
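The kinds of noise described above (missing casing, missing punctuation, typos) can be imitated on clean text, for instance to check how a translation system copes with chat-like input. The following is only a minimal illustrative sketch, not the method used by our team: the function name `add_conversational_noise` and the `typo_rate` parameter are hypothetical and chosen purely for illustration.

```python
import random
import string


def add_conversational_noise(text: str, rng: random.Random, typo_rate: float = 0.05) -> str:
    """Make clean text resemble chat input (hypothetical helper, for illustration only)."""
    # Missing casing: lowercase everything.
    text = text.lower()
    # Missing punctuation: strip all ASCII punctuation characters.
    text = text.translate(str.maketrans("", "", string.punctuation))
    # Typos: swap two adjacent letters with probability `typo_rate`.
    chars = list(text)
    i = 0
    while i < len(chars) - 1:
        if chars[i].isalpha() and chars[i + 1].isalpha() and rng.random() < typo_rate:
            chars[i], chars[i + 1] = chars[i + 1], chars[i]
            i += 2  # skip past the swapped pair so it is not swapped back
        else:
            i += 1
    return "".join(chars)


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so runs are repeatable
    print(add_conversational_noise("Where is it, exactly?", rng, typo_rate=0.1))
```

With `typo_rate=0.0` the function only lowercases the text and strips punctuation, which makes the individual effects easy to inspect separately.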
In the business context, conversational content is far removed from the often curated content such as learning or marketing materials. Simply adding more previously translated data is not the answer to improving quality for this type of content. This is partly because the amount of translated conversational content needed would exceed available resources, and partly because the different nature of conversational content could influence the translations of formal content, which should remain formal. The machine translation research community is actively investigating how to handle such input and improve translation output for it.
The machine translation team at Language Experience investigated several methods from current research over several months. Following a research-driven approach, we were able to judge the impact of specific methods on our systems and incorporate the most promising techniques to improve quality for conversational content. At the same time, quality did not degrade for other content types, which is of paramount importance for our offering. We are happy to report that a publication detailing our findings and experiences was accepted at the EAMT 2022 conference, an established conference in the machine translation area (https://eamt.org/). The paper was peer-reviewed before acceptance, and a member of our team was able to attend the conference, present the paper, and mingle with the research community.
Our paper, “Hi, how can I help you?” Improving Machine Translation of Conversational Content in a Business Context, available in the ACL Anthology, provides insight into the research and work we put into this task. As with any research topic, the current state is only an intermediate one; there are still more improvements to make and more topics for our machine translation team to tackle.
Actively following and practicing research, even in an industry setting, is therefore a key pillar of our team’s work and allows us to improve our machine translation continuously.