Category Archives: Uncategorized

2010 Translations - New Year Resolution

Filed under Uncategorized

Hot deal welcomes 2010 - 10% Discount on any translation order.
Use the coupon code "Tomedes10" !


We, the professional translators, transcreators, transcribers, interpreters, voice talents, linguists and language fanatics at Tomedes, would like to ask you all to take the last day of 2009 to make your New Year’s resolution for 2010.

Promise yourself to make a change in the upcoming year - an internal change and a change in your surroundings. Let’s all make positive changes and together make the world a bit better.

We will do our best to eliminate any language barrier in 2010, and we will also do our best to make the world a bit better by keeping it clean - clean relationships, clean energy & clean business!

We would like to take this opportunity to thank all of our translators for an amazing 2009 and to invite them to keep building a different and better professional translation service with us. We would also like to thank our great customers for believing in us and trusting us to deliver the best translations. We promise to keep delivering the best translations in 2010 as well.

Nexus One Apps

Filed under Uncategorized



The buzz around the new, or should I say the first, Google phone, the “Nexus One”, is increasing rapidly. Will Google revolutionize the mobile industry as it did the online industry, or is this another Google product that will stay in the lab? Time will tell…

As you probably know, we are in the translation business here at Tomedes, and as a professional translator I believe that the translation tools recently developed by Google have made a tremendous contribution to eliminating language barriers. They have been implemented quite effectively all around the web, including in Google Wave and in the option to translate any web page into any language.

But think about the potential mobile translation apps, applications that may be, and probably will be, implemented on the Nexus One.
1. A voice translation application which will make it possible to speak with anyone in any language, because translation middleware will eliminate the language barrier.
2. An SMS translation application which will make it possible to send an SMS in any language while making sure the recipient gets it in his own language.

We at Tomedes believe in a strong and healthy relationship between machine translation, like the one provided by Google, and human translation, like the one provided on http://www.Tomedes.com .

Providing a human translation service on the Google phone might be a real killer application.
Think about a tourist carrying a Google phone who has the option to find the nearest interpreter and contact him for instant service. Providing real-time translation, by professional translators worldwide, of any text sent via the Nexus One might also succeed.

This was not a Nexus One review, and we do not have any idea of the exact Nexus One release date. These were just some thoughts on the inevitable mixture of the Google phone and translation.

Automatic language translator for Windows Mobile

Filed under Translation Tools, Uncategorized



I have news for all you Windows Mobile users who thought that I was focusing a lot on iPhone applications. It’s about an excellent new translation application that has been released for Windows Mobile 6.x touch screen phones.

The WCI Translator can produce translations for all language pairs that can be formed between English, French, German and Spanish. Users can enter free-form sentences and expect good-quality translation, as the application analyzes the sentence structure and then translates according to the rules of each language. One can translate up to 250 characters in one go.

The application has other nifty features, like the ability to save and recall translated text, copy/paste functions, and options for setting preferred languages and font sizes. The WCI Translator supports a vast and comprehensive vocabulary and is probably the only mobile translator that does not rely upon an Internet connection for translating complete sentences. It requires about 25 MB of storage space, which is hardly a lot for today’s phones that come with several GB of memory.

Measuring machine translation quality

Filed under Uncategorized



The Importance of Measuring Translation Quality – BLEU

Mr. Kirti Vashee, VP of Enterprise Translation Sales at Asia Online, writes in this fascinating guest post about the assessment of translation quality.

Anybody who has tried to measure translation quality will understand the difficulty of doing this in a way that has any general credibility. Developers of statistical machine translation systems, in particular, have to grapple with this issue on a constant basis to understand how to evolve the state of the technology. MT developers are constantly trying new techniques to improve the technology, and need quick feedback on whether a particular strategy is working or not.

The question of quality is difficult to answer because there is no widely accepted, entirely objective way to measure the quality or accuracy of automated translation software, or of any translation for that matter. The localization industry has struggled for years to establish some kind of objective measure for human translation quality and has yet to really succeed. Competent and objective humans are usually the surest measure of quality but, as we all know, objectivity and real rigor are hard to define.

In Statistical Machine Translation (SMT), it is necessary to use some form of standardized, objective and relatively rapid means of assessing quality as part of the system development process. The oddly named BLEU (BiLingual Evaluation Understudy) is an approach developed by IBM that is widely used in the general MT arena, and especially actively by developers in the SMT community. http://domino.watson.ibm.com/library/cyberdig.nsf/1e4115aea78b6e7c85256b360066f0d4/5c651a88cb24938185256acb0055e548?OpenDocument&Highlight=0,BLEU

Why Quality is Difficult to Measure: What is a BLEU score?

Measuring translation quality is difficult because there is not an absolute way to measure how “correct” a translation is. Many “correct” answers are possible, and there can be as many “correct” answers as there are translators. The most common way to measure quality is to compare the output of automated translation to a human translation of the same document. The problem is that one human translator will translate the document significantly differently than another human translator. This inconsistency in the human reference translations leads to problems when using these human references to measure the quality of an automated translation solution. A document translated by an automated software solution may have 60% of the words overlap with one translator’s translation, and only 40% with the other translator’s translation; even though both human reference translations can be technically correct, the one with the 60% overlap with machine translation provides a higher “quality” score for the automated translation than the other translator’s translation did. Therefore, although humans are the true test of correctness, they do not provide an entirely objective and consistent measurement for quality.
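To make the 60% versus 40% scenario concrete, here is a minimal sketch in Python. The sentences and the `word_overlap` helper are invented for illustration, and plain word overlap is a deliberate simplification, not the BLEU calculation itself.

```python
def word_overlap(candidate: str, reference: str) -> float:
    """Fraction of candidate words that also appear in the reference."""
    cand = candidate.lower().split()
    ref = set(reference.lower().split())
    if not cand:
        return 0.0
    return sum(1 for word in cand if word in ref) / len(cand)


# Hypothetical machine output and two human references, both acceptable.
machine = "the man walked home quickly"
reference_a = "the man went home fast"
reference_b = "a man returned home rapidly"

score_a = word_overlap(machine, reference_a)  # 3 of 5 words match
score_b = word_overlap(machine, reference_b)  # 2 of 5 words match
print(f"vs. translator A: {score_a:.0%}, vs. translator B: {score_b:.0%}")
```

The machine output has not changed between the two measurements; only the choice of human reference has.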

The BLEU metric scores a translation on a scale of 0 to 1. The closer to 1, the more overlap there is with a human reference translation and thus the better the system is. In a nutshell, the BLEU metric measures how many words overlap, giving higher scores to sequential words. For example, a string of four words in the translation that match the human reference translation (in the same order) will have a positive impact on the BLEU score and is weighted more heavily (and scored higher) than a one or two word match. It is very unlikely that you would ever score 1, as that would mean that the compared output is exactly the same as the reference output.
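The reason a four-word run counts for more than isolated matches is that it is counted at every sequence length from one to four. The sketch below is only an illustration: the sentences and the `ngram_matches` helper are made up, and it counts raw matches rather than producing a BLEU score.

```python
from collections import Counter


def ngrams(tokens, n):
    """All runs of n consecutive tokens."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def ngram_matches(candidate, reference, max_n=4):
    """Number of candidate n-grams found in the reference, per n-gram length."""
    cand = candidate.lower().split()
    ref = reference.lower().split()
    matches = {}
    for n in range(1, max_n + 1):
        ref_counts = Counter(ngrams(ref, n))
        cand_counts = Counter(ngrams(cand, n))
        matches[n] = sum(min(count, ref_counts[gram])
                         for gram, count in cand_counts.items())
    return matches


reference = "the quick brown fox jumps"
in_order_matches = ngram_matches("the quick brown fox sleeps", reference)
scrambled_matches = ngram_matches("fox the brown quick sleeps", reference)

print(in_order_matches)   # {1: 4, 2: 3, 3: 2, 4: 1}
print(scrambled_matches)  # {1: 4, 2: 0, 3: 0, 4: 0}
```

Both candidates contain the same four correct words, but only the one that keeps them in order earns matches at lengths two, three and four.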

BLEU

  • The scoring algorithm clips word counts, so unnecessarily repeating high frequency words like “the” earns no extra credit, and it applies a brevity penalty to output that is shorter than the reference.
  • Studies have shown that there is a high correlation between BLEU and human judgments of quality when properly used.
  • BLEU scores are often stated on a scale of 0 to 100 to simplify communication but should not be confused with percentage of accuracy.
  • Even two competent human translations of the exact same material may only score 0.6 or 0.7 if they use different vocabulary and phrasing.
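The points above can be sketched in a few lines of Python. This is a simplified sentence-level version written for illustration, not the code of any particular measurement tool: the function names are invented, tokenization is a plain whitespace split, and a sentence with no match at some n-gram length simply scores 0 (real tools usually score a whole test set at once and may apply smoothing).

```python
import math
from collections import Counter


def ngram_counts(tokens, n):
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def modified_precision(cand, refs, n):
    """Clipped n-gram matches and total candidate n-grams.

    Each candidate n-gram is credited at most as often as it occurs
    in any single reference.
    """
    cand_counts = ngram_counts(cand, n)
    max_ref = Counter()
    for ref in refs:
        for gram, count in ngram_counts(ref, n).items():
            max_ref[gram] = max(max_ref[gram], count)
    clipped = sum(min(count, max_ref[gram]) for gram, count in cand_counts.items())
    return clipped, sum(cand_counts.values())


def bleu(candidate, references, max_n=4):
    """Sentence-level BLEU sketch: geometric mean of 1..max_n precisions times brevity penalty."""
    cand = candidate.lower().split()
    refs = [ref.lower().split() for ref in references]
    if not cand:
        return 0.0
    log_precisions = []
    for n in range(1, max_n + 1):
        clipped, total = modified_precision(cand, refs, n)
        if clipped == 0:
            return 0.0  # assumption: no smoothing in this sketch
        log_precisions.append(math.log(clipped / total))
    # Brevity penalty uses the reference length closest to the candidate length.
    ref_len = min((len(ref) for ref in refs),
                  key=lambda length: (abs(length - len(cand)), length))
    penalty = 1.0 if len(cand) > ref_len else math.exp(1 - ref_len / len(cand))
    return penalty * math.exp(sum(log_precisions) / max_n)


reference = "the cat is on the mat"
# Repeating "the" seven times is credited only twice, because the reference has two.
print(modified_precision("the the the the the the the".split(), [reference.split()], 1))
# A correct but truncated output is penalized for brevity.
print(round(bleu("the cat sat on", ["the cat sat on the mat"]), 4))
```

Multiplying the result by 100 gives the 0 to 100 form in which scores are often quoted.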

To conduct a BLEU measurement the following data is necessary:

  1. One or more human reference translations. (In the case of SMT, this should be data that has NOT been used as training data in building the system and ideally should be unknown to the SMT system developer. It is generally recommended that 1,000 or more sentences be used to get a meaningful measurement.)
  2. Automated translation output of the exact same source data set.
  3. A measurement utility like Language Studio Lite™ that performs the comparison and calculation for you. http://www.asiaonline.net/ToolsAndDownloads.aspx#SoftwareDownloads
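As a rough illustration of the first requirement, the sketch below sets aside a blind test set before any training happens. The function name, the (source, translation) tuple layout and the toy data are all assumptions made for this example; a real project would hold out 1,000 or more sentences.

```python
import random


def split_blind_test_set(sentence_pairs, test_size, seed=0):
    """Hold out `test_size` (source, translation) pairs that training never sees."""
    unique_pairs = list(dict.fromkeys(sentence_pairs))  # drop exact duplicates, keep order
    if test_size >= len(unique_pairs):
        raise ValueError("test set must be smaller than the corpus")
    rng = random.Random(seed)
    rng.shuffle(unique_pairs)
    test = unique_pairs[:test_size]
    test_sources = {source for source, _ in test}
    # Also drop training pairs whose source sentence appears in the test set.
    train = [pair for pair in unique_pairs[test_size:] if pair[0] not in test_sources]
    return train, test


# Toy corpus standing in for real translation data.
pairs = [(f"source sentence {i}", f"target sentence {i}") for i in range(20)]
train_pairs, test_pairs = split_blind_test_set(pairs, test_size=5)
print(len(train_pairs), len(test_pairs))
```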

As would be expected, using multiple human reference translations will generally result in higher scores, as the SMT output has more human variations to match against. The NIST (National Institute of Standards & Technology) uses BLEU as an approximate measure of quality in its annual MT competitions, with four human reference sets to ensure that some variance in human translation is captured, and thus allow more accurate quality evaluations of the MT solutions being evaluated. Thus, when companies claim they have the “best” MT system, all they are really saying is that they got the highest BLEU score on a single reference set comparison. The same system could do quite poorly with a different test set, so this information should be used with some care.
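The effect of extra references is easy to see with a toy example. The sketch below uses clipped single-word precision only (one ingredient of BLEU, not the full metric), and the sentences and function name are invented for illustration.

```python
from collections import Counter


def clipped_unigram_precision(candidate, references):
    """Share of candidate words matched in at least one reference, with clipped counts."""
    cand_counts = Counter(candidate.lower().split())
    if not cand_counts:
        return 0.0
    max_ref = Counter()
    for ref in references:
        for word, count in Counter(ref.lower().split()).items():
            max_ref[word] = max(max_ref[word], count)
    matched = sum(min(count, max_ref[word]) for word, count in cand_counts.items())
    return matched / sum(cand_counts.values())


machine = "the meeting was postponed until next week"
references = [
    "the meeting was delayed until the following week",
    "the session has been postponed to next week",
]

one_reference = clipped_unigram_precision(machine, references[:1])
two_references = clipped_unigram_precision(machine, references)
print(round(one_reference, 3), round(two_references, 3))
```

The same machine output scores higher simply because a second translator happened to choose “postponed” and “next”.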


What is BLEU useful for?

SMT systems are built by “training” a computer with examples of human translations. As more human translation data is added, systems should generally get better in quality. Asia Online provides a development environment that allows users to build an SMT translation system and make many adjustments along the way. Often, new data can be added with beneficial results, but sometimes this new data can have a negative effect. Thus, system developers need to be able to measure quality quickly and regularly to make sure they are in fact improving the system.

Competent and dispassionate human judgment is always the best gauge of a system’s translation quality. However, users and developers need immediate and rapid feedback on development strategies, so using human translators for every test is not an efficient solution. The SMT system developers will experiment with many different approaches and data combinations to find one that will produce the best results.

During the development process, an automatic test is necessary to quickly see the impact of a development strategy. This utility will help to measure BLEU and, in time, other measures that will provide quick feedback on development strategies and the current quality of an SMT system. BLEU allows developers a way “to monitor the effect of daily changes to their systems in order to weed out bad ideas from good ideas.”

When used to evaluate the relative merit of different system building strategies, BLEU can be quite effective because it provides very quick feedback. This enables SMT developers to quickly refine the translation systems they are building and to continue improving quality on a long-term basis.

Asia Online provides a table that is periodically updated showing the BLEU scores of 506 different language combinations, http://www.asiaonline.net/translation.aspx . The table is shown below, where the first column is the Source Language code and the first row is the Target Language code. This is useful, since for the most part the same amount/quality/type of core data has been used to build all the SMT systems shown in the table and the test sets used to measure the quality are basically comparable. As you can see, the darker green combinations produce the best systems (given the same amount of data). The table also shows that English to Romance Language and Romance to Romance Language combinations produce the best quality systems, other things being equal.

[Image: table of BLEU scores by source and target language]

What is BLEU not useful for?

BLEU scores are always very directly related to a specific “test set” and a specific language pair. Thus, BLEU should not be used as an absolute measure of translation quality because the BLEU score can vary even for one language depending on the test and subject domain. In most cases comparing BLEU scores across different languages is meaningless unless very strict protocols have been followed.

Because of this, Asia Online always uses human translators to measure fluency and verify the accuracy of the systems. Also, most industry leaders will always vet the BLEU score readings with human assessments before production use.

In competitive comparisons it is important to carry out the comparison tests in an unbiased, scientific manner to get a true view of where you stand against competitive alternatives. Thus it is important to use the exact same test set AND the same BLEU measurement tool. The Test Set should be unknown to all the systems that are involved in the measurement. As the basic calculations used in determining the final BLEU can also vary, it is important to use the same tool when measuring several different systems.

BLEU score comparisons between two systems presented by some companies can be misleading because:

  1. companies may use different test sets and one may be simpler than the other
  2. different BLEU measurement tools are used
  3. if more human references are used to calculate the BLEU score, the scores will be higher (i.e., scoring one system with 4 human reference translations will increase the number of overlapping words versus a score calculated with 1 human reference translation)

Because of this, Asia Online recommends that you:

  1. use blind (that is, previously unseen by the system developers) test sets to generate the BLEU scores
  2. use the same BLEU measurement tool
  3. adjust and normalize the scores so that a translation scored using 4 human reference translations is not compared to a translation scored with only one human reference translation.
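One way to enforce the first two recommendations is to route every system through a single scoring function and a single test set. The sketch below is a hypothetical harness: `compare_systems`, the system names and the sentences are invented, and the simple word-precision metric stands in for whatever BLEU tool you actually use (it averages per-sentence scores, which is not how BLEU aggregates over a test set).

```python
def compare_systems(system_outputs, references, metric):
    """Score every system on the same test set with the same metric."""
    scores = {}
    for name, outputs in system_outputs.items():
        if len(outputs) != len(references):
            raise ValueError(
                f"{name}: expected {len(references)} outputs, got {len(outputs)}")
        per_sentence = [metric(out, ref) for out, ref in zip(outputs, references)]
        scores[name] = sum(per_sentence) / len(per_sentence)
    return scores


def unigram_precision(candidate, reference):
    """Stand-in metric: fraction of candidate words found in the reference."""
    cand = candidate.lower().split()
    ref = set(reference.lower().split())
    return sum(word in ref for word in cand) / len(cand) if cand else 0.0


references = [
    "the printer is out of paper",
    "restart the computer to finish the update",
]
system_outputs = {
    "system_a": ["the printer has no paper",
                 "restart the computer to complete the update"],
    "system_b": ["printer paper is empty",
                 "reboot machine for finishing update"],
}

scores = compare_systems(system_outputs, references, unigram_precision)
for name, score in sorted(scores.items()):
    print(name, round(score, 3))
```

The length check rejects any system that was not run on the full test set, which guards against comparing scores from different sentences.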

If you are looking at BLEU scores that compare two different translation systems, you should always understand how the results were generated. Comparing systems that were tested on different test sets will be somewhat meaningless and could lead to very misleading and erroneous conclusions.

Problems with BLEU

There are several criticisms of BLEU that should also be understood if you are to use the metric effectively. BLEU only measures direct word-by-word similarity, looking to match and measure the extent to which word clusters in two documents are identical. Accurate translations that use different words may score poorly since there is no match in the human reference. There is no understanding of paraphrases and synonyms, so scores can be somewhat misleading in terms of overall accuracy. Also, nonsensical language that contains the right phrases in the wrong order can score high. For example:

“Wander” doesn’t get partial credit for “stroll,” nor “sofa” for “couch.”

“Appeared calm when he was taken to the American plane, which will to Miami, Florida” would get the very same score as: “was being led to the calm as he was would take carry him seemed quite when taken”
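Both weaknesses can be reproduced with a small counting sketch. The helper and the short sentences below are invented for illustration (they are not the examples quoted above), and the code counts raw n-gram matches rather than computing a full BLEU score.

```python
from collections import Counter


def ngrams(tokens, n):
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def ngram_matches(candidate, reference, max_n=4):
    """Number of candidate n-grams found in the reference, per n-gram length."""
    cand = candidate.lower().split()
    ref = reference.lower().split()
    matches = {}
    for n in range(1, max_n + 1):
        ref_counts = Counter(ngrams(ref, n))
        cand_counts = Counter(ngrams(cand, n))
        matches[n] = sum(min(count, ref_counts[gram])
                         for gram, count in cand_counts.items())
    return matches


# A synonym earns nothing: the strings simply do not match.
synonym_matches = ngram_matches("stroll", "wander", max_n=1)

# Swapping matched phrases around leaves every count unchanged.
reference = "he went quickly to the airport"
sensible = ngram_matches("he went to the airport", reference)
reordered = ngram_matches("to the airport he went", reference)

print(synonym_matches)      # {1: 0}
print(sensible, reordered)  # identical counts for both word orders
```

Because the two phrases “he went” and “to the airport” each match the reference on their own, the counts cannot tell the sensible order from the reversed one.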

These problems are further discussed in this article: http://www.theregister.co.uk/2007/05/15/google_translation/page2.html

This link is an academic critique of BLEU that clearly points out many of the shortcomings of the metric: http://www.iccs.inf.ed.ac.uk/~miles/papers/eacl06.pdf

Despite all these shortcomings, the BLEU metric is still a very useful tool for practitioners engaged in the difficult task of creating automated translation systems that are continually improving. Careful and informed use of BLEU can drive the development and evolution of systems and allow researchers to test many different hypotheses to determine whether they favorably affect the performance of a translation engine’s output.

What are best-practices in using BLEU?

  • BLEU is best used as a way to evaluate development strategies and most useful to developers engaged in the SMT system building process.
  • Take care to develop a comprehensive and “blind” set of test data (500 to 1,000+ sentences) that covers the domain of interest, and use it to measure your systems.
  • Remember that a system developed to translate software knowledge base material is unlikely to do well on a test set with sentences that are common in general political news. So keep your test set focused on your business purpose.
  • Use BLEU measurements frequently when adding new data to your system to understand if it is beneficial or not.
  • When measuring competitive systems ensure that you are using:
    The same test set
    The same measurement tool
  • Remember that BLEU is not useful as an absolute measure for quality as it only focuses on matching word clusters in two similar documents.

What other measures are available in addition to BLEU?

In addition to the BLEU score, there are other measures that may be useful to developers of SMT engines. These include the METEOR metric, F-Scores and edit measures (how many changes must be made to a set of data to bring it to human quality), such as word error rate (WER). The Language Studio Lite™ tool provides F-Scores and WER at this time and will add more quality metrics in future.
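For readers who have not met these measures, here is a minimal sketch of a word error rate and a word-level F-Score. These are common textbook definitions written out for illustration; how Language Studio Lite or any other tool computes them may differ, and the function names and sentences are made up.

```python
from collections import Counter


def word_error_rate(hypothesis, reference):
    """Word-level edit distance (insertions, deletions, substitutions) / reference length."""
    hyp = hypothesis.lower().split()
    ref = reference.lower().split()
    if not ref:
        raise ValueError("reference must not be empty")
    previous = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        current = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            current.append(min(previous[j] + 1,          # word missing from the hypothesis
                               current[j - 1] + 1,       # extra word in the hypothesis
                               previous[j - 1] + cost))  # match or substitution
        previous = current
    return previous[-1] / len(ref)


def word_f_score(hypothesis, reference):
    """Harmonic mean of word precision and recall, ignoring word order."""
    hyp = Counter(hypothesis.lower().split())
    ref = Counter(reference.lower().split())
    overlap = sum((hyp & ref).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(hyp.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)


machine = "the cat sat on mat"
human = "the cat sat on the mat"
wer = word_error_rate(machine, human)  # one missing word out of six
f_score = word_f_score(machine, human)
print(round(wer, 3), round(f_score, 3))
```

Note that a lower WER is better, while a higher F-Score is better.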

Other approaches, largely based on human judgment, are also used to measure translation quality. Some that are worth mentioning include:

  • SAE-J2450, a primarily human judgment based approach that was originally used in the automobile industry.
  • ASTM F2575, a new translation quality assurance standard published by ASTM International (an ANSI-accredited standards body), which defines quality as the degree to which the characteristics of a translation fulfill the requirements of the agreed-upon specifications.
  • EN 15038, a new European (CEN) standard that defines translation services, outlines the competencies a translator and reviser must have, and describes quality controls.

While these different approaches are very useful for different purposes, they are not useful to a developer of an SMT engine as they cannot provide the rapid feedback necessary to guide new experimentation and continued evolution of the translation system.

Is translation quality the most important measure?

While translation quality is a key driver of the use of automated translation technology, it is important to ask this fundamental question. We understand that it is unlikely that we are going to see automated translation that is completely equivalent to competent human translation in the near or even distant future. The core objective of the Asia Online platform is to enable large amounts of high-value content to be converted to other languages in as close to human quality as possible, as rapidly as possible. We are already seeing that our technology dramatically enhances the productivity of enterprise translation efforts, even in its imperfect state.

Thus, we believe that a much more important question is: Is the system good enough to boost the overall productivity of the translation process? Or: Is the automatically translated output good enough to be useful to a potential user? The focus then shifts away from linguistic quality exclusively to the impact the technology has on the work process for completing very large translation projects, or to end-user usability. There is now evidence that suggests that customized Asia Online translation systems can boost the productivity of a translation operation anywhere from 25% to 350%. Companies like Microsoft are servicing the technical support needs of hundreds of millions of customers with raw SMT output. This productivity boost can come in terms of throughput and speed and/or in terms of reduced overall costs to complete projects.

These translation systems will also enable organizations to start translating material that would never get translated if the technology were not available. The use of this technology extends the reach of translated material to new areas. More information can be shared with global customers, and the dialog with the global customer can be expanded and intensified. It is reasonable to presume that this can enhance all international business initiatives in general. I believe that human-driven automated translation technology is a pivotal technology that will increasingly be seen as a key requirement to facilitate business initiatives in international markets.

As we are still at a phase where SMT technology is sometimes seen as a threat by human translators, it is important to take care in measuring productivity and defusing the negative mentality that some translators may bring to an evaluation. Several LSP studies now validate that MT can improve productivity, reduce costs and accelerate throughput. Asia Online is committed to man-machine collaboration, and we see that our best systems are clearly a product of an intensive and structured human feedback loop that enables these systems to continually improve. We expect that this will drive the expansion of the market rather than a reduction of rates and revenues for all the current players in the professional translation market.
