Machine And Text Translation Computer Science Essay



At its most basic level, machine translation performs simple substitution of words in one natural language for words in another, but that alone usually cannot produce a good translation of a text, because recognition of whole phrases and their closest counterparts in the target language is needed.

Solving this problem with corpus and statistical techniques is a rapidly growing field that is leading to better translations, handling differences in linguistic typology, translation of idioms, and the isolation of anomalies.

The idea of machine translation can be traced back to the 17th century. In 1629, René Descartes proposed a universal language, with equivalent ideas in different tongues sharing one symbol. In the 1950s, the Georgetown experiment (1954) involved the fully automatic translation of more than sixty Russian sentences into English. The experiment was a great success and ushered in an era of substantial funding for machine-translation research. The authors claimed that within three to five years, machine translation would be a solved problem.

Real progress was much slower, however, and after the ALPAC report (1966) found that ten years of research had failed to fulfill expectations, funding was greatly reduced. Beginning in the late 1980s, as computational power increased and became less expensive, more interest was shown in statistical models for machine translation.

The idea of using digital computers in machine translation of natural languages was proposed as early as 1946 by A. D. Booth and possibly others.


The translation process may be stated as:

Decoding the meaning of the source text; and

Re-encoding this meaning in the target language.


Rule-based: A rule-based machine translation system consists of a collection of grammar rules, a lexicon, and software programs to process the rules. The rules are extensible and maintainable. The rule-based approach was the first strategy developed in the field of machine translation. Rules are written using linguistic knowledge gathered from linguists, and they play a major role in the various stages of translation: syntactic processing, semantic interpretation, and contextual processing of language.


A tree structure can be used to represent the structure of a sentence. A typical English sentence can be divided into two major parts: a noun phrase (NP) and a verb phrase (VP). These parts can in turn be subdivided according to the structure of the sentence. 'Rewrite rules' specify which tree structures are allowable for a given sentence, so only correctly structured sentences can be parsed and passed on for translation. The following rules represent a simple grammar.

S -> NP VP

VP -> V NP

NP -> Name


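Here S stands for sentence, NP for noun phrase, VP for verb phrase, and V for verb. Fuller grammars also use N for noun and ART for article, for example in a rule such as NP -> ART N. A grammar can derive a sentence if there is a sequence of rules that rewrites the start symbol, S, into that sentence.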
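To make the idea of derivation concrete, here is a minimal Python sketch of a recognizer for the three rules listed above. The lexicon (the names and verbs) and the function names `derive` and `is_sentence` are assumptions made for illustration; they are not part of the original grammar.

```python
# Toy grammar from the text: S -> NP VP, VP -> V NP, NP -> Name.
GRAMMAR = {
    "S": [["NP", "VP"]],
    "VP": [["V", "NP"]],
    "NP": [["Name"]],
}

# Hypothetical lexicon, chosen only for illustration.
LEXICON = {
    "Name": {"Joe", "Mary"},
    "V": {"saw", "likes"},
}


def derive(symbol, words, start):
    """Return every position where `symbol` can end when it starts at `start`."""
    if symbol in LEXICON:
        # Pre-terminal: consume exactly one word if it belongs to the category.
        if start < len(words) and words[start] in LEXICON[symbol]:
            return {start + 1}
        return set()
    ends = set()
    for rhs in GRAMMAR.get(symbol, []):
        positions = {start}
        for part in rhs:
            # Extend every partial derivation by the next symbol on the right-hand side.
            positions = {e for p in positions for e in derive(part, words, p)}
        ends |= positions
    return ends


def is_sentence(text):
    """True if the start symbol S can be rewritten into exactly these words."""
    words = text.split()
    return len(words) in derive("S", words, 0)
```

With this lexicon, `is_sentence("Joe saw Mary")` holds, while `is_sentence("saw Joe Mary")` does not, because no sequence of rewrites of S produces that word order.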

Logical form is commonly used in semantic interpretation. For example the sentence, Joe was happy, can be written in logical form as:

(<PAST HAPPY> (NAME j1 "Joe"))

where PAST stands for the past tense. Semantic interpretation should be a compositional process, in which interpretations are built incrementally from the interpretations of subphrases. The lexicon plays a major role in semantic interpretation. Grammar rules are used to compute the logical form of a given sentence. Consider the grammar rule given below.

(S SEM (?semvp ?semnp)) -> (NP SEM ?semnp) (VP SEM ?semvp)

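where SEM is a semantic feature. This rule says that a sentence consists of a noun phrase and a verb phrase, and that the semantics of the sentence is obtained by applying the semantics of the verb phrase (?semvp) to the semantics of the noun phrase (?semnp).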
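The compositional rule can be illustrated with a small Python sketch that builds the logical form of "Joe was happy" from the semantics of its two subphrases. The helper names `np_sem`, `vp_sem` and `s_sem` are hypothetical, and representing logical forms as strings is a simplifying assumption.

```python
def np_sem(name, const):
    """Semantics of a proper-name NP, e.g. (NAME j1 "Joe")."""
    return f'(NAME {const} "{name}")'


def vp_sem(tense, predicate):
    """Semantics of a VP as a tensed predicate, e.g. <PAST HAPPY>."""
    return f"<{tense} {predicate}>"


def s_sem(semnp, semvp):
    """Rule (S SEM (?semvp ?semnp)) -> (NP SEM ?semnp) (VP SEM ?semvp)."""
    return f"({semvp} {semnp})"


# Build the sentence meaning from the meanings of its parts.
logical_form = s_sem(np_sem("Joe", "j1"), vp_sem("PAST", "HAPPY"))
print(logical_form)  # (<PAST HAPPY> (NAME j1 "Joe"))
```

The sentence-level function only combines what the subphrases supply, which is the sense in which the interpretation is compositional.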


Translation within a rule-based machine translation system can be done by pattern matching against the rules. Better results can be obtained by avoiding the matching of unnecessary rules. Knowledge and reasoning are used for language understanding: common-sense or general world knowledge is required to solve interpretation problems such as disambiguation, while context-specific knowledge is used to determine the referents of noun phrases and to disambiguate word senses based on what makes sense in the current situation. A knowledge-based system consists of a knowledge base and inference techniques. The inference techniques apply inference rules to derive new sentences from the knowledge base.


Anaphora is the linguistic phenomenon of pointing back to a previously mentioned item in the text. The word or phrase that points back is called the anaphor, and the entity to which it refers or for which it stands is its antecedent. For example, in "Ratna is not yet here but she is expected to arrive in the next one hour", 'she' is the anaphor and 'Ratna' is the antecedent. When the anaphor and its antecedent have the same referent in the real world, they are termed coreferential. Coreference is the act of referring to the same referent in the real world. The procedure of determining the antecedent of an anaphor is called anaphora resolution, and the rules applied for resolution are called resolution rules. These rules depend on several different sources of knowledge. The interpretation of anaphora is a key factor in the successful operation of a machine translation system.
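As a rough illustration of a resolution rule, the Python sketch below links each pronoun to the most recently mentioned name that agrees with it in gender. The two lexicons and the function name `resolve_pronouns` are hypothetical, and this single heuristic is far simpler than the knowledge-dependent rules a real system would need.

```python
# Hypothetical lexicons; a real system would draw on far richer knowledge.
PRONOUN_GENDER = {"she": "f", "her": "f", "he": "m", "him": "m"}
NAME_GENDER = {"Ratna": "f", "Joe": "m"}


def resolve_pronouns(text):
    """Pair each pronoun with the nearest preceding name of the same gender."""
    tokens = text.replace(",", " ").replace(".", " ").split()
    seen = []   # (name, gender) pairs in order of mention
    links = []  # (anaphor, antecedent) pairs
    for tok in tokens:
        if tok in NAME_GENDER:
            seen.append((tok, NAME_GENDER[tok]))
        elif tok.lower() in PRONOUN_GENDER:
            gender = PRONOUN_GENDER[tok.lower()]
            # Resolution rule: most recently mentioned name that agrees in gender.
            antecedent = next((n for n, g in reversed(seen) if g == gender), None)
            links.append((tok, antecedent))
    return links
```

For the example sentence above, the function pairs the anaphor 'she' with the antecedent 'Ratna'. A pronoun with no agreeing name before it is paired with `None`.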


The benefit of a rule-based machine translation system is that it can analyze language deeply at the syntactic and semantic levels. Its drawbacks are that it requires extensive linguistic knowledge and a very large number of rules to cover all the features of a language.

Statistical: Statistical machine translation is grounded in information theory. A document is translated according to the probability distribution p(e | f) that a string e in the target language (for example, English) is the translation of a string f in the source language (for example, French). The problem of modelling the probability distribution p(e | f) has been approached in a number of ways. One approach applies Bayes' theorem, under which p(e | f) is proportional to p(f | e) p(e), where the translation model p(f | e) is the probability that the source string is the translation of the target string, and the language model p(e) is the probability of seeing that target language string. This decomposition is attractive as it splits the problem into two sub-problems. The best translation is found by picking the one that gives the highest probability:

$$\hat{e} = \arg\max_{e} p(e \mid f) = \arg\max_{e} p(f \mid e)\, p(e)$$

A rigorous implementation would have to perform an exhaustive search through all strings e in the target language. Performing the search efficiently is the work of a machine translation decoder, which uses the foreign string, heuristics and other methods to limit the search space while keeping acceptable quality. This trade-off between quality and time usage can also be found in speech recognition.
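The selection step can be sketched in Python as follows. The candidate strings and all probability values are invented for illustration, and the short candidate list stands in for the reduced search space that a real decoder would construct; this is not how any particular decoder is implemented.

```python
import math

# Hypothetical toy probabilities for one fixed foreign string f.
# translation_model[e] stands in for p(f | e); language_model[e] for p(e).
translation_model = {
    "the house blue": 0.40,
    "the blue house": 0.35,
    "blue the house": 0.25,
}
language_model = {
    "the house blue": 0.01,
    "the blue house": 0.20,
    "blue the house": 0.001,
}


def decode(candidates):
    """Pick the candidate e that maximizes log p(f | e) + log p(e)."""
    def score(e):
        # Summing logs is equivalent to multiplying the two probabilities.
        return math.log(translation_model[e]) + math.log(language_model[e])
    return max(candidates, key=score)


best = decode(list(translation_model))
```

Here the translation model alone slightly prefers "the house blue", but the language model strongly favours "the blue house", so the product selects the fluent word order.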

As the translation systems are not able to store all native strings and their translations, a document is typically translated sentence by sentence, but even this is not enough. Language models are typically approximated by smoothed n-gram models, and similar approaches have been applied to translation models, but there is additional complexity due to different sentence lengths and word orders in the languages.
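A smoothed n-gram language model can be sketched as below for the bigram case. Add-one (Laplace) smoothing is used here only because it is the simplest scheme to show; the training corpus and function names are assumptions, and practical systems typically use more refined smoothing.

```python
from collections import Counter


def train_bigram_model(sentences):
    """Count unigrams and bigrams over sentences padded with <s> and </s>."""
    unigrams, bigrams = Counter(), Counter()
    for sentence in sentences:
        tokens = ["<s>"] + sentence.split() + ["</s>"]
        unigrams.update(tokens)
        bigrams.update(zip(tokens, tokens[1:]))
    return unigrams, bigrams


def bigram_prob(prev, word, unigrams, bigrams):
    """Add-one (Laplace) estimate of p(word | prev)."""
    vocab_size = len(unigrams)
    return (bigrams[(prev, word)] + 1) / (unigrams[prev] + vocab_size)


def sentence_prob(sentence, unigrams, bigrams):
    """Probability of a sentence as a product of smoothed bigram probabilities."""
    tokens = ["<s>"] + sentence.split() + ["</s>"]
    prob = 1.0
    for prev, word in zip(tokens, tokens[1:]):
        prob *= bigram_prob(prev, word, unigrams, bigrams)
    return prob


# Hypothetical three-sentence training corpus.
corpus = ["the blue house", "the red house", "the house is blue"]
uni, bi = train_bigram_model(corpus)
```

Because of the smoothing, a word order never seen in training, such as "house blue the", still receives a small non-zero probability, but a lower one than "the blue house".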

Compared with the rule-based approach, statistical machine translation makes better use of resources:

There is a great deal of natural language in machine-readable format.

Generally, SMT systems are not tailored to any specific pair of languages.

Rule-based translation systems require the manual development of linguistic rules, which can be costly, and which often do not generalize to other languages.

Example-based: The basic idea of Example-Based Machine Translation (EBMT) is to reuse examples of already existing translations as the basis for new translations. The process of EBMT is broken down into three stages: matching, alignment, and recombination.





The matching stage finds examples that will contribute to the translation on the basis of their similarity to the input. How the matching stage is implemented depends on how the examples are stored. In early systems, examples were stored as annotated tree structures, and the constituents in the two languages were connected by explicit links. The input to be matched is parsed using the grammar that was used to build the example database, and the resulting tree is compared with the trees in the example database.

The input and the examples can also be matched by comparing them character by character. This process is called sequence comparison. Alignment and recombination are more difficult if this approach is used. Examples may be annotated with part-of-speech tags. Several simple examples may be combined into a single, more general example containing variables. The examples should be analyzed to see whether they are suitable for further processing, and overlapping or contradictory examples should be properly dealt with.
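Character-level matching can be sketched in Python with the standard library's `difflib.SequenceMatcher`, used here as a stand-in similarity measure rather than as the method of any particular EBMT system. The example base and the function name `best_match` are hypothetical.

```python
from difflib import SequenceMatcher

# Hypothetical example base of (source, target) pairs.
EXAMPLES = [
    ("I like green tea", "J'aime le thé vert"),
    ("I like black coffee", "J'aime le café noir"),
    ("Where is the station", "Où est la gare"),
]


def best_match(sentence, examples):
    """Return the stored example whose source side is most similar to the input."""
    def similarity(example):
        source, _target = example
        # Character-level comparison of the input with the example's source side.
        return SequenceMatcher(None, sentence, source).ratio()
    return max(examples, key=similarity)
```

For the input "Where is the hotel", the closest stored example is the one about the station, whose translation would then be passed on to the later stages.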


Alignment identifies which parts of the corresponding translation are to be reused. It is done by using a bilingual dictionary or by comparing with other examples. The alignment process in example-based machine translation must be automated.
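Dictionary-based alignment can be sketched as follows. The bilingual dictionary, the sentence pair and the function name `align` are assumptions for illustration; real alignment must also handle multi-word units and words missing from the dictionary.

```python
# Hypothetical bilingual dictionary (English -> French), for illustration only.
DICTIONARY = {
    "i": {"je"},
    "drink": {"bois"},
    "green": {"vert"},
    "tea": {"thé"},
}


def align(source, target, dictionary):
    """Link each source word index to the target word indices the dictionary licenses."""
    target_words = target.lower().split()
    links = []
    for i, word in enumerate(source.lower().split()):
        for j, candidate in enumerate(target_words):
            if candidate in dictionary.get(word, set()):
                links.append((i, j))
    return links


links = align("I drink green tea", "Je bois du thé vert", DICTIONARY)
```

The resulting links record, for instance, that "green" (position 2) corresponds to "vert" (position 4), reflecting the different word order in the two languages. The word "du" stays unaligned because no dictionary entry covers it.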


Recombination is the final phase of the example-based machine translation approach. It makes sure that the reusable parts of the examples identified during alignment are put together in a legitimate way. It takes source language sentences and a set of translation patterns as inputs and produces target language sentences as outputs. The design of the recombination strategy depends on the preceding matching and alignment phases.
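One simple way to picture recombination is as filling a variable slot in a translation pattern with an already aligned fragment. In the Python sketch below, the patterns, the fragments, the slot marker X and the function name `recombine` are all hypothetical.

```python
import re

# Hypothetical translation patterns with one variable slot X,
# and hypothetical aligned fragments recovered from earlier examples.
PATTERNS = [("I drink X", "Je bois X"), ("Where is X", "Où est X")]
FRAGMENTS = {"green tea": "du thé vert", "the station": "la gare"}


def recombine(sentence, patterns, fragments):
    """Fill the target side of the first matching pattern with a translated fragment."""
    for source_pattern, target_pattern in patterns:
        # Turn the source pattern into a regex whose slot X captures the fragment.
        regex = "^" + re.escape(source_pattern).replace("X", "(.+)") + "$"
        match = re.match(regex, sentence)
        if match and match.group(1) in fragments:
            return target_pattern.replace("X", fragments[match.group(1)])
    return None  # no pattern or fragment covers the input
```

With these tables, "I drink green tea" is recombined into "Je bois du thé vert", while an input such as "I drink milk" returns `None` because no stored fragment covers "milk".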

Hybrid MT: Hybrid machine translation (HMT) leverages the strengths of statistical and rule-based translation methodologies. Several MT companies (Asia Online, LinguaSys, Systran, UPV) are claiming to have a hybrid approach using both rules and statistics. The approaches differ in a number of ways:

Rules post-processed by statistics: Translations are performed using a rules based engine. Statistics are then used in an attempt to adjust/correct the output from the rules engine.

Statistics guided by rules: Rules are used to pre-process data in an attempt to better guide the statistical engine. Rules are also used to post-process the statistical output to perform functions such as normalization. This approach offers considerably more power, flexibility and control when translating.
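The "statistics guided by rules" arrangement can be pictured as a three-step pipeline. In the Python sketch below, the statistical engine is replaced by a hypothetical one-entry lookup table, and the pre- and post-processing rules (tokenizing and re-attaching punctuation) are simple assumptions chosen only to show where the rules sit.

```python
import re


def preprocess(text):
    """Rule-based pre-processing: split off punctuation and normalize whitespace."""
    text = re.sub(r"([?.!,])", r" \1", text)
    return re.sub(r"\s+", " ", text).strip()


def statistical_engine(text):
    """Stand-in for a statistical engine: a hypothetical lookup table, not a real model."""
    table = {"Where is the station ?": "Wo ist der Bahnhof ?"}
    return table.get(text, text)


def postprocess(text):
    """Rule-based post-processing: re-attach punctuation to the preceding word."""
    return re.sub(r"\s+([?.!,])", r"\1", text)


def translate(text):
    """Rules guide the engine on the way in and normalize its output on the way out."""
    return postprocess(statistical_engine(preprocess(text)))
```

Because the pre-processing normalizes spacing before the lookup, an input with irregular whitespace such as "Where  is the station?" still reaches the engine in the form it expects.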


MT technology will improve over the next few years as new MT engines are developed and released, and the number of languages offered will increase. This trend is already under way, as seen in the recent developments from Google and the emergence of "open source" technologies such as Moses. Because we have designed Translation to be "agnostic" to the machine translation engine used, Translation will be at the forefront of these developments. As mentioned in the approaches section above, Translation's current technology relies on a "rules based" approach to translation (RBMT), whereas the newer technologies use an example-based approach, broadly known as statistical machine translation (SMT). Whilst SMT generally provides a better "gist" translation than RBMT, SMT engines can only be improved by training them with large quantities of human-created "translation memory" specific to the "domain" of the translation. Most individuals and businesses simply don't have access to these linguistic resources.

However, RBMT can be trained easily, since all that is required is access to monolingual data in order to build a dictionary. Once a dictionary has been created for an organization, RBMT will usually be much more accurate than "gist" or untrained SMT.

Predictions on the Future of the Translation Industry

The near-future predictions for machine translation are as follows:

a) SMT or hybrid systems will continue to emerge from companies like Google and Asia Online. Over time these will be the systems most widely used by individuals to understand the vast amounts of content in other languages on the internet.

b) RBMT systems (with dictionaries and translation memory) will continue to be used by most businesses. Individuals will also continue to use RBMT systems (with dictionaries) for day to day communication.

c) Global enterprises with access to large amounts of translation memory will adopt and customize SMT systems for specific purposes such as the translation of "knowledge bases."

Advanced future:

Machine translation software could also make today's Internet search engines seem like relics from the distant past. "We're only a few years away from Internet search engines that can return high-quality results translated from nearly every language around the globe," says Daniel Marcu, founder of Language Weaver. Eventually, software will be able not only to understand spoken language but also to act upon it.

Companies consider their translation memories valuable intellectual property and would be unlikely to share them.

"If Cisco has to go to the trouble of translating the gigabit router instructions to Mandarin Chinese, that's not going to be easy," agrees analyst Eric Schmitt of Forrester Research. "It's going to be expensive. Cisco doesn't want to go to the trouble and then have Alcatel and Juniper come along and get the same benefit."

Still, while these challenges remain great, they may not be computer translation's largest stumbling block, says David Parmenter of Basis Technology in Cambridge, MA, a firm that assists companies in moving their business worldwide.

2009 CorconText introduces FinalCopy, a Japanese-to-English documentation translation program which uses AI-based semantic networks to reduce the need for human editing of output.

2012 Saruzuno is an organization that embeds its Lexical Disambiguation System (LDS) into smartcards which are equipped with membrane microphones so travelers can converse with store clerks in dozens of languages.

2017 The Russian-made Durok II is a language tutor which is used to train customs-and-immigrations bots (DNA-based servant-devices) employed at US points of entry.

2020 "Teaching a child reading and writing is a waste of time," declares Yeo Kiah Wei, Singapore's minister of education, who cancels the subjects in schools. "Children needn't be burdened with such an onerous task as deciphering tiny markings on a page or screen. Leave it to the machines."

2045 Telepathy system developed by Europeans. Users wear adhesive patches containing thought recognition and MT technology, plus a high-speed wireless transceiver.


When most people describe their experience of using machine translation (MT) to translate text from one language into another, they mention either Google, for translating a foreign-language web page, or a service like Babel Fish, for translating a block of text. Using MT is becoming an increasingly common experience: today, over three billion pages are translated by machine from one language to another on the Internet every month. This is likely to increase as the proportion of Internet users with English as their first language continues to drop. But if you have used one of these services, you'll know that the results are often far from perfect. These services generally convey the meaning of the message, but are rarely fluent, and you certainly wouldn't want to use them to translate an important document from your language into another! So if so much of it is being done, why is the translation quality poor?

The reality is that neither Google nor Babel Fish is a good example of what MT can really achieve. There is nothing wrong with the underlying translation technology, but these implementations simply translate sentence by sentence without any additional processing.

There are a number of ways in which the quality of an automatic machine translation can be dramatically improved today. These include:

• Use of Dictionaries

• Identification of words and phrases that should not be translated.

• Improving the translatability of the source language text.

• Not translating already translated text.

• Handling of unrecognized words.
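Four of the measures listed above (a dictionary, protected terms, reuse of existing translations, and flagging of unrecognized words) can be sketched as a thin wrapper around a translation step. Everything in the Python sketch below is hypothetical, including the data and the word-by-word lookup, which merely stands in for a real MT engine.

```python
# All names and data below are hypothetical, for illustration only.
TRANSLATION_MEMORY = {"Thank you for your order.": "Vielen Dank für Ihre Bestellung."}
DO_NOT_TRANSLATE = {"Cisco", "Moses"}
DICTIONARY = {"router": "Router", "fast": "schnell"}


def translate_word(word):
    """Translate one word, returning (translation, recognized)."""
    if word in DO_NOT_TRANSLATE:
        return word, True   # protected term: pass through unchanged
    if word.lower() in DICTIONARY:
        return DICTIONARY[word.lower()], True
    return word, False      # unrecognized: keep the source word and flag it


def translate(segment):
    """Return (output, unrecognized_words) for one segment."""
    if segment in TRANSLATION_MEMORY:
        # Already translated text is reused instead of being translated again.
        return TRANSLATION_MEMORY[segment], []
    output, unknown = [], []
    for word in segment.split():
        translated, recognized = translate_word(word)
        output.append(translated)
        if not recognized:
            unknown.append(word)
    return " ".join(output), unknown
```

Returning the list of unrecognized words alongside the output lets a human reviewer or a dictionary maintainer see exactly which terms the system could not handle.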

Will Human Translators Become Redundant?

With all of this technological development, it has been suggested that one day we won't need human translators. We are confident that there will always be a role for the human translator, because humans are extremely adept at understanding context and extracting meaning from sentences. Machines will continue to struggle with this aspect of translation for some time yet.

While machine translations may achieve much improved translation accuracy, this may still never be good enough for some purposes such as translating legal documents and marketing literature. They will certainly never be good enough for works of literature!

Translation will continue to require human intervention, although the development of improved machine translation, such as that provided by "trained" RBMT systems and customised SMT systems, will remove much of the "grunt" work.

This will mean that skilled translators will focus on post-editing and fine-tuning the final translated work, rather than translating from scratch. This doesn't devalue the work of translators. What it means is that the amount of human intervention needed to achieve a perfect translation will be reduced, so it becomes feasible, because of both reduced time and reduced cost, to translate many more items than are translated today. This actually increases the amount of opportunity for translators.


As machine translation becomes more accurate and closer in quality to human translation, both the time needed to perform translations and the costs of doing so will fall significantly. This doesn't mean that the days of the human translator are numbered. What it means is that things that were never considered as candidates for translation in the past, such as emails and instant messages, can and will be automatically translated.

What it also means is that the types of work that translators do will change.

Today, email, and many previously untranslated documents, while not perfectly translated, can be delivered in the reader's native language and are immediately comprehensible and largely context-accurate. In many instances this will be good enough and will deliver significant business and cost benefits. So the improvement in quality of automated translations can add substantially to the quality and value of both business and personal communications. Translation is at the forefront of this revolution in language communications.