This is the second post in a two-part guest blog series by ProZ.com professional trainer and conference speaker Federico Gaspari. The first post in this series can be found here: “Machine translation: Cause or solution of all evils?”
One is unlikely to make many friends among translators talking about machine translation (MT) – unless the conversation is restricted to deriding its stupid mistakes and emphasizing its uselessness. A related topic that is possibly even less popular than MT among translators is post-editing (PE), partly because it is harder to come up with funny stories of hopeless mistakes. Let’s face it: while pretty much everybody with at least a modest knowledge of two languages can be amused by the sarcastic appreciation of what is lost in (machine) translation, deriving pleasure from blunders occurring when post-editing MT output is a rather more subtle activity, whose enjoyment requires much more effort. This post discusses some issues concerning MT, translation quality and PE, focusing on some current trends in the translation industry of interest to professional translators.
Translators, MT and PE
Surprising though it may seem, there are dozens of threads on MT in ProZ.com’s technical forums, and one finds a mixture of (mildly) positive and (extremely) negative opinions, depending on the experiences of the community members who have posted their views. One of these forum threads, entitled “What’s your opinion on machine translation and quality?” has attracted one of the largest numbers of replies (more than 130) and views (over 16,000) of all the threads in ProZ.com’s technical forums. This incredibly popular thread is particularly close to my heart, because Daniela Zambrini initiated the discussion to announce an invited talk on MT and PE that I was due to give a few weeks later at the ProZ.com 2014 International Conference which she organised in Pisa, Italy.
I’m under no illusion that I was responsible for the amazing popularity of the thread: in fact, Daniela’s well-intentioned post attracted replies which mostly ranged from outraged to exasperated, so much so that I was having second thoughts about whether I should actually go to the conference and give my presentation on MT and PE. Making many new translator friends had not been a consideration when I agreed to give a talk at the conference a few months before (I already have quite a few of them, and we normally avoid discussing MT and PE…); but as the event was getting closer, I didn’t fancy the prospect of facing a particularly hostile and aggressive audience of angry professionals. As it turned out, my 45-minute talk at the conference in Pisa was rather well-received (in fairness, I smoothed over some of the contentious points that were likely to get on my listeners’ nerves…), and it was followed by a very civilised and interesting Q&A session at the end.
I even enjoyed some one-to-one conversations with translators who had listened to my talk and approached me during the rest of the conference: on the whole, they were genuinely curious about MT and PE, and I appreciated their honest questions and comments on these inevitably sensitive topics. In addition to a general curiosity to understand how MT works, several delegates at the well-attended ProZ.com 2014 International Conference in Pisa showed a keen interest in learning more about PE. As part of these conversations, some translators reported that they had been approached by LSPs and agencies as well as by direct end clients with requests for quotes for PE. As a result, these professionals were considering whether they should start offering PE services in addition to “standard” translation jobs, but they had no idea of the skills required and of the rates that they should charge. This blog post gives me the opportunity to discuss some issues related to PE that can be of interest to a wider audience of professional translators who are at least open to the prospect of securing PE jobs.
Post-editing MT output is different from translating and revising
At the risk of stating the obvious, it should be made clear that PE is very different from translating and revising translations done by (junior) human translators. The main reason for this is that MT systems make mistakes that are very different from those made by professionals, including relatively inexperienced ones. In addition, MT systems come in many shapes and forms: alongside the traditional rule-based approaches, statistical architectures are now particularly popular; these two basic types can be combined to obtain hybrid systems, and some researchers are now experimenting with neural MT, a new paradigm that seems to hold great potential for substantial improvements in output quality. Each of these types of MT systems is more likely to make certain kinds of mistakes rather than others, calling for different PE interventions.
In addition, different resources are required to develop MT systems with these approaches, and their output varies dramatically depending on the amount and quality of the available resources. A related crucial variable is the language pair involved: in principle, some approaches to MT system design are more promising for certain language pairs than others. However, the technological expertise and resources available for MT system development are unevenly distributed: while abundant human and technical resources can be tapped into for some languages (e.g. English and other widely used European languages as well as, increasingly, a few additional major world languages such as Chinese and Arabic), most languages are not well served at all by MT due to the lack of appropriate resources. There are techniques to deal with these shortcomings, but they are not always very effective.
One case in point is the huge sentence-aligned parallel corpora required for the development of statistical MT systems, whether they belong to the phrase-based or to the syntax-based category; while LSPs and freelance translators possess vast translation memory databases containing high-quality translated texts for certain language pairs, the data sets available for many others are far too small to offer the critical mass needed to kick-start the development of effective statistical MT systems. In practice, this means that the quality offered by MT systems (whatever their design) for several language pairs is not yet acceptable. This in turn determines whether PE is a reasonable proposition for the language pair under consideration or not. A closely related variable has to do with the text type in question: for some particularly challenging text types (even within the technical and specialised fields, say medical reports and legally-binding rental contracts) it may still be impossible to develop decent MT systems, e.g. due to the lack of relevant training data such as in-domain sentence-aligned parallel corpora in digital format, which can be very difficult to come by for certain language pairs in highly specialised and sensitive technical domains.
Many forms of post-editing
One common assumption is that there exists only one type of PE; however, this is far from the truth. In fact, various PE levels can be appropriate for different purposes, given specific circumstances. At one extreme, light or minimum PE involves fixing only major errors, e.g. those that make the MT output incomprehensible or misleading (vis-à-vis the input in the source language), whereas stylistic nuances or relatively minor imperfections can be tolerated and do not require any correction. In other words, one is prepared to accept a less-than-perfect final target text, which can be good enough, for instance, for ‘gisting’ or information-gathering purposes. At the opposite extreme, there is complete or maximum PE: in this scenario, every inaccuracy in the raw MT output must be corrected, polishing up all minor details, i.e. the aim of complete PE is to obtain a final target text whose quality is equivalent to that of a professionally translated text. Note that, while professional translation invariably aims at delivering top-quality target texts, (light/minimum) PE can be carried out with the much more modest ambition of providing a final text that is usable in certain circumstances, accepting that it may be (very) far from perfect.
While this division may sound intuitive in theory, applying it in practice is quite complex. First of all, there are many intermediate cases between these two extremes of light/minimum and complete/maximum PE, and one has to determine which level of PE is most appropriate to a specific scenario, depending on the needs and expectations of the translation’s end users. This is a function not only of the time available for the PE job, but also of the initial quality of the raw output that is offered by the available MT system: even obtaining a final post-edited target text of average quality may require extensive PE interventions, if the initial raw MT output is particularly poor – in the end, the effort involved may not be worthwhile, compared to translating everything from scratch. Conversely, there may be cases where the raw output of a particularly effective MT system for a specific language pair in a well-defined textual domain requires only minor PE interventions to be brought to excellent final quality.
In short, the language pair and the text type in question, the design and quality of the MT system, the characteristics of the raw MT output and the intended use(r)s of the final revised target text interact in complex ways to dictate the actual level and effort of PE that are required. But this equation still leaves room for uncertainty from the post-editor’s perspective, as it is quite common for machine-translated texts to display uneven quality: for example, in a 10,000-word translation project, 10% of the raw MT output may be (nearly) perfect with little or no need for improvement, 30% may be impossible to salvage even with extensive PE (i.e. one would be better off re-translating those entire passages from scratch), and the remaining 60% may require different forms of intermediate PE (say, within the same paragraph one preposition must be changed in a sentence, a final ending agreement in another, but a whole dependent clause turns out to be wrongly translated and completely incomprehensible elsewhere). It is easy to see that PE can become a demanding activity, and the effort it requires in terms of skills and time is often difficult to predict and convert into clear rates that can be charged to clients with a transparent pricing scheme.
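To make the arithmetic behind this example a little more concrete, here is a minimal sketch in Python. Only the 10,000-word volume and the 10%/30%/60% split come from the example above; the throughput figures (words per hour) are purely hypothetical assumptions chosen for illustration, and in a real project they would have to be measured on a sample of the actual MT output.

```python
# Hypothetical throughput figures in words per hour (assumptions, not data).
SCRATCH_WPH = 400        # translating (or re-translating) from scratch
REVIEW_WPH = 2000        # checking (nearly) perfect raw MT output
INTERMEDIATE_WPH = 800   # intermediate post-editing


def estimate_hours(total_words, shares, speeds):
    """Sum the hours needed for each class of MT output.

    shares: fraction of the text falling into each class (must sum to 1)
    speeds: assumed words per hour for each class
    """
    if abs(sum(shares.values()) - 1.0) > 1e-9:
        raise ValueError("shares must sum to 1")
    return sum(total_words * share / speeds[name]
               for name, share in shares.items())


# The split used in the example: 10% near-perfect, 30% unsalvageable, 60% in between.
shares = {"near_perfect": 0.10, "unsalvageable": 0.30, "intermediate": 0.60}
speeds = {
    "near_perfect": REVIEW_WPH,
    "unsalvageable": SCRATCH_WPH,   # these passages are re-translated
    "intermediate": INTERMEDIATE_WPH,
}

pe_hours = estimate_hours(10_000, shares, speeds)
scratch_hours = 10_000 / SCRATCH_WPH

print(f"MT + PE: {pe_hours:.1f} h, from scratch: {scratch_hours:.1f} h")
```

With these made-up speeds, the project would take about 15.5 hours with MT and PE against 25 hours from scratch. The numbers themselves mean little; what the sketch shows is how sensitive the outcome is to the share of unsalvageable output and to the assumed intermediate PE speed, which is exactly what makes PE effort hard to quote in advance.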
Factors to be considered when offering post-editing services
Still, with the increasing adoption of MT in professional translation workflows, the demand for PE is rising, so much so that many translators are considering whether they should offer PE services in addition to standard translation jobs. This is more likely, at least in the short term, for in-house translators of large LSPs that have the resources and expertise to develop their own customised MT systems for domains with constant demand from major clients, thus requiring some of their staff to take on PE roles in dedicated projects incorporating MT. But interestingly, some companies specialising in translation technology offer cloud-based “do-it-yourself” or self-service MT solutions that are accessible to freelance translators who are willing to invest in this area: this approach does not require extensive technical skills, because the training and set-up of the MT systems are guided in a step-by-step fashion for users with fee-paying accounts and managed at the back-end by the companies themselves. There are anecdotes of naïve clients looking for easy discounts who generated garbled output with free online MT systems, asking translators to fix the inevitable errors at cheap rates; however, since free web-based MT services are one-size-fits-all systems that are not customised to specific domains, this approach is unlikely to be successful: it is rather pointless, if not counter-productive, to carry out PE if the initial quality of the raw MT output is very poor.
Hence, even before considering the possibility of offering professional PE, one must be sure to have at least a decent-quality MT system available. Although it is very difficult to generalise, all else being equal (e.g. the domain and level of technicality of the source text, the quantity of language resources available for system training and development, etc.), MT into English (from, say, German, Russian or Chinese) tends to give better results than the opposite translation directions, i.e. from English into these target languages. As a result, in principle technical and specialised translation projects into English should be good candidates to explore the potential benefits of combining MT and PE. Although techniques for MT quality estimation are improving, it is still very difficult to accurately predict in advance the quality of raw MT output that will be obtained for a specific source text, and especially whether this will be viable for subsequent PE. One must try and see whether PE (at the level required to obtain the expected final quality) is faster and more efficient than translating from scratch, e.g. with translation memories in a standard CAT environment. If they are open to this possibility, translators are well placed (more so than their clients) to gauge whether incorporating MT followed by PE in the translation workflow for specific projects can result in time gains and, potentially, in more competitive rates.
Open issues with PE
Some LSPs and freelance translators (including ProZ.com members!) have started to offer PE services, admittedly of the complete/maximum type, where the explicit goal is to deliver a final revised target text of excellent quality. Their pricing schemes vary depending on the language pairs and technical domains involved, and one open issue is whether PE should be charged pro-rata based on the regular translation fee, or by the hour: a quick survey of the online profiles of professionals offering PE services and of relevant discussion forums on ProZ.com shows huge variation in this regard, and there does not seem to be an industry-wide agreed approach yet. One crucial attraction of PE is that, given substantial volumes of MT-friendly technical material, one can in principle speed up turnaround times without sacrificing quality. With CAT tools and translation memory software increasingly integrating optional MT engines to process null matches, the practice of PE as part of technical translation projects is spreading quickly, and it may not always be easy to distinguish it from the editing of low fuzzy matches retrieved from translation memory databases: this in itself suggests that an honest discussion of the potential benefits of PE is timely and may prove in the interest of professional translators, so that they can offer clear and fair rates for their services, without relinquishing their negotiating power to budget-oriented clients.
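As a rough illustration of the two pricing options mentioned above (pro-rata on the regular translation fee versus by the hour), the following minimal Python sketch compares them for a single job. All rates and the pro-rata share are hypothetical figures invented for the example; they are not recommendations and do not reflect any industry standard.

```python
def pro_rata_fee(words, translation_rate_per_word, pe_share):
    """PE fee charged as a fixed share of the regular per-word translation fee."""
    return words * translation_rate_per_word * pe_share


def hourly_fee(hours, hourly_rate):
    """PE fee charged by the hour."""
    return hours * hourly_rate


def break_even_hours(words, translation_rate_per_word, pe_share, hourly_rate):
    """Hours of PE work at which both schemes yield the same fee."""
    return pro_rata_fee(words, translation_rate_per_word, pe_share) / hourly_rate


# Hypothetical job: 10,000 words, regular rate 0.10 per word,
# PE quoted at 60% of that rate, or alternatively at 40 per hour.
words, rate, share, per_hour = 10_000, 0.10, 0.60, 40

flat = pro_rata_fee(words, rate, share)
limit = break_even_hours(words, rate, share, per_hour)

print(f"Pro-rata fee: {flat:.2f}")
print(f"Hourly billing pays more if the job takes over {limit:.1f} hours")
```

Under these assumed figures the pro-rata fee is 600, so hourly billing only works out better for the post-editor if the job takes more than 15 hours. This is one way of seeing why the choice of scheme depends so heavily on the quality of the raw MT output, which is rarely known before the work starts.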
Translators of today, post-editors of tomorrow?
Many translators are worried about being forced to become post-editors, falling victim to the seemingly unstoppable process that drives down quality and worsens working conditions to save on increasingly casualised professional services while reducing turnaround times. Now that nobody in professional translation would dream of working in technical and specialised domains without CAT tools, MT and PE are arguably the greatest source of anxiety among professionals. But it is important to recognise that a good translator does not necessarily make a good MT post-editor: PE requires quick thinking, the fast adoption of effective error fixes, and a constant monitoring of the trade-off between effort (i.e. time spent on PE interventions) and benefits (i.e. real, noticeable improvements in the final target text). In addition, with the exception of complete/maximum PE (where a perfect final target text must be delivered), post-editors must often settle for less-than-perfect translations, e.g. if quality is not paramount but must be sufficient for information-gathering purposes – this is something that can turn out to be particularly difficult and uncomfortable for translators, who tend to be perfectionists.
Quite understandably, not all translators are inclined to work as post-editors, e.g. because they feel that their professionalism would not be recognised or that they would not perform optimally having to revise MT output of variable quality; just as some translators are more familiar with certain technical domains but struggle in others, or enjoy working on their own on large projects but hate revising and editing the work of junior colleagues. Whatever your own strengths and weaknesses, opportunities for PE services seem set to grow in the coming years, especially because one can expect an overall improvement of MT quality in an ever expanding range of language pairs and technical domains. If you are looking forward to continuing your happy career as a language professional, it seems wise to at least consider whether you might benefit from also adding PE to your portfolio of translation services. At the end of the day, investigating this area before your clients come asking for PE services might stand you in good stead to discuss the pros and cons of this activity with them, without having to accept unfair rates imposed on you for a job that you hate or, possibly even worse, losing your clients to less scrupulous competitors.
Learn more about the advantages of using machine translation and performing post-editing as a service by attending one of Federico’s live or on-demand ProZ.com training sessions on the subject. The full course list is available here: http://www.proz.com/translator-training/trainers/1315/courses
Federico’s next live session, “Maximize Your Productivity with Effective Machine Translation Post-Editing,” will take place on February 8th at 14:00 GMT. You can reserve your seat in the course by visiting the session page and clicking the “Purchase” button in the top right corner under “Course registration”.
Did you know?
It is now possible to declare post-editing as a service you provide in your ProZ.com profile. This also means that outsourcers can search the directory for language professionals who offer this service. See the announcement: http://www.proz.com/topic/294136