
I translated "How to use Web?" into another language using machine translation, but when I translated the result back into English, it was not the same as the text I originally put into the translator.

Why did that happen?

new Q Open Wid

3 Answers


There is no one-to-one correspondence between the vocabularies of different languages. This means a computer translator cannot, in general, be invertible: translating from language A to language B is a fundamentally different task from translating from B back to A.

To understand this, consider the French word "allumette", which in English is "match", that is, the short piece of wood that can be used to start a fire. A computer translator could easily translate "allumette" into the English word "match". There really is no other way to translate it (though I may be unaware of some idiomatic uses that should be translated another way). But it's not so easy in the other direction.

That's because the English word "match" has many meanings. It can also mean a sports contest, as in a soccer match ("le match" in French), or it can be a verb, which may be translated as "associer", "assortir", or other verbs. So whereas translating French "allumette" into English gives the translator only one choice, "match", translating English "match" into French gives it many choices, and it may not pick the right one.

For example, "I lit the match" was translated into French as "J'ai allumé le match". Here the translator chose the wrong word, "match" (the sports contest), instead of "allumette" (the correct sentence would be "J'ai allumé l'allumette").
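The asymmetry can be sketched with a pair of toy dictionaries. This is a minimal illustration in Python under obvious assumptions: the word lists are hypothetical and drastically simplified, and real systems do not translate word by word. Inverting a French-to-English table in which several French words map to "match" yields a one-to-many English-to-French table, so the way back has to guess.

```python
# Hypothetical, drastically simplified French -> English dictionary.
FR_TO_EN = {
    "allumette": {"match"},              # the stick used to start a fire
    "match": {"match", "game"},          # "le match": a sports contest
    "associer": {"match", "associate"},
    "assortir": {"match"},
}

# Build English -> French by inverting the table: every French word that
# can produce an English word becomes a candidate translation of it.
EN_TO_FR: dict[str, set[str]] = {}
for fr_word, en_words in FR_TO_EN.items():
    for en_word in en_words:
        EN_TO_FR.setdefault(en_word, set()).add(fr_word)


def round_trip(fr_word: str) -> set[str]:
    """Every French word reachable by translating fr_word to English and back."""
    return {back for en in FR_TO_EN[fr_word] for back in EN_TO_FR[en]}


print(sorted(EN_TO_FR["match"]))        # ['allumette', 'associer', 'assortir', 'match']
print(sorted(round_trip("allumette")))  # the same four words
```

Even though "allumette" has a single English translation in this toy table, the round trip can land on any of four French words, so a translator guessing blindly would return "allumette" only one time in four.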

Even a perfect translator could not always resolve this. In the sentence "I saw the match", does "match" refer to a sports contest or to something used to start a fire? Both readings are possible, and without context even a good human translator could only guess.
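Context can sometimes settle the choice. The sketch below is a deliberately naive, hypothetical bag-of-words disambiguator in Python; the cue-word lists are invented for illustration and are not how Google Translate works. It scores each French sense of "match" by how many of its cue words appear in the sentence and reports a tie when the context gives no evidence either way.

```python
# Hypothetical cue words for two French senses of English "match".
SENSE_CUES = {
    "allumette": {"lit", "light", "strike", "fire", "burn", "stove"},
    "match": {"won", "lost", "watched", "score", "team", "soccer"},
}


def choose_sense(sentence: str) -> list[str]:
    """Return the best-scoring sense(s); more than one means the context is ambiguous."""
    words = set(sentence.lower().strip(".?!").split())
    # Count how many cue words of each sense occur in the sentence.
    scores = {sense: len(words & cues) for sense, cues in SENSE_CUES.items()}
    best = max(scores.values())
    return sorted(sense for sense, score in scores.items() if score == best)


print(choose_sense("I lit the match"))   # ['allumette']
print(choose_sense("I saw the match"))   # ['allumette', 'match']: a tie, so it can only guess
```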

user298438
gaeguri
  • While that's accurate, the question is specifically about Google Translate. I think a description of the neural networks behind it would be more relevant... – WavesWashSands Jan 09 '18 at 08:48
  • @WavesWashSands Why would that be more relevant? OP saw something they didn't understand, and wants to know why it happens. This would happen regardless of which technologies Google Translate used, because it's a fundamental issue in translation. It's not something that happens because they use neural nets. – Chris H Jan 09 '18 at 13:03
  • Google Translate can get the right translation of some words from context: "Ela comeu uma manga" translates correctly to "She ate a mango" and "A camisa tem mangas vermelhas." to "The shirt has red sleeves". It also seems (I don't know French) that "I will lit a match to start the fire." translates correctly. Maybe their neural nets try to find the right translation of words with multiple meanings by checking context (like using "allumette" if there's "fire", "burn" or "stove" near "match"). – Gustavo Rodrigues Jan 09 '18 at 14:25
  • @GustavoRodrigues as far as I know there is the option to correct the translation, which is how it 'learns', whether by Markov chains, neural nets or otherwise. At some point it translated "A camisa tem mangas vermelhas." to "The shirt has mango sleeves", and someone corrected it enough that it stuck. – AncientSwordRage Jan 09 '18 at 17:48
  • It seems to be a common belief that people correcting bad results in Google Translate is a major source of improvement, but (apart from maybe a small subset) it's not likely. (1) Anonymous people on the net are really unreliable and make mistakes all the time. (2) How are you going to generalize? Google obviously doesn't have a gigantic table of every sentence and how people corrected it. If the underlying system can tell the contexts for "mangoes" and "red" apart, it's already pretty close to being correct without users' help. – jick Jan 09 '18 at 19:05
  • Expanding on (1), the internet is also full of trolls. If the correction system "works" with any kind of reliability, it's only a matter of time before people start to "correct" $(Country of their choice) to "land of pigs". – jick Jan 09 '18 at 19:07
  • @zixuan, there appears to be a total and complete linguistic breakdown in your comment. Which is ironic. I have no idea what you meant and no one else can do more than guess, either. – Wildcard Jan 10 '18 at 06:14
  • There could also be a component of structural ambiguity in the sentence you want to translate. For instance, if you take the sentence "I saw a woman on the beach using binoculars" out of context, you've got four different meanings in English already: either I or the woman can have the binoculars, and either I or she can be on the beach. You can probably get a pretty decent automated translation of this sentence, and back, across all Indo-European languages, but as you move towards more distant languages it'll be harder and harder for the translator to put all the pieces back into the right place. – betelgeuse Jun 01 '18 at 01:15
  • Sometimes the two versions mean the same thing. For example, if I translate "The odds of getting 100% in the test will be close to 1" into another language and then back into English, I get "The probabilities of getting 100% in the test will be close to 1". That's because the words "odds" and "probabilities" mean the same thing. In fact, if you translate both sentences into another language, the translations will be exactly the same. Whatever translation Google Translate gives us and whichever language it's in, the result should be the same. – new Q Open Wid Jan 15 '19 at 23:38
  • @jick For your "(2) How are you going to generalize": that's exactly what deep learning is for: generalizing from examples. While such learning systems aren't perfect, or identical to human generalization, they are definitely capable of generalizing in many cases where it would be very difficult to express as a rule or algorithm. The gigantic table that you're doubting is possible really is there in the network, implicitly and very lossily, but there nonetheless. – abarnert Jan 17 '19 at 09:19
  • @jick Even for (1), Google does gather lots of other data from anonymous people on the internet; working out how to make that data useful enough despite all the obvious problems is, to a first approximation, how Google makes money. That's why they've spent huge sums buying companies like reCAPTCHA and Waze, to improve Google Books (and, later, help train self-driving car engines) and Google Maps by feeding in massive amounts of anonymous crowd-sourced input. – abarnert Jan 17 '19 at 09:24
  • @jick And as for trolls: trolls are constantly trying to google-bomb the search engine that way. Every few months they manage to come up with one funny result (like "Did you mean cheese-eating surrender monkeys?" showing up for "French" for half a day), but that hasn't come close to breaking search. – abarnert Jan 17 '19 at 09:26

There is no one-to-one correspondence between the vocabularies of different languages. In any machine translation software, translating from A to B is fundamentally different from translating from B to A.

For example, when I translate "What is the nearby station?" into Chinese with Google Translate, Google Translate translates this sentence to "什么是附近的车站?" and when I translate back, Google Translate translates "什么是附近的车站?" to "What is a nearby station?" instead of "What is the nearby station?".

That's because Chinese has no articles. The phrase "附近的车站" ("nearby station") does not mark whether it means "a nearby station" or "the nearby station". Chinese can make this explicit with words such as "一个" ("a", literally "one") or "该" ("the", in the sense of "that" or "the aforementioned"), but it usually leaves it out, so both English readings are acceptable translations of the Chinese.

When translating back into English, Google Translate has to supply an article, and it chose "a" where the original had "the".

The correct translation should be "什么是该附近的车站?", which would translate back to "What is the nearby station?". However, Google Translate translates "What is the nearby station?" to "什么是附近的车站?" instead of "什么是该附近的车站?": it does not add "该" in front of "附近的车站" to make the definite reading explicit. Because that information is lost on the way into Chinese, it cannot be recovered reliably on the way back, and this is why the result is not the same as the original sentence. If Google Translate added "该", the round trip should return the original sentence.
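This information loss can be modelled with two hypothetical toy functions in Python. They handle only the phrases from this example and do not reflect Google Translate's actual behaviour: the forward step drops the English article unless it is told to mark definiteness with "该", and the backward step must then offer both articles whenever no marker is present.

```python
def en_to_zh(phrase: str, mark_definite: bool = False) -> str:
    """Toy English -> Chinese for "a/the nearby station" (hypothetical)."""
    if phrase not in ("a nearby station", "the nearby station"):
        raise ValueError(f"toy translator does not know {phrase!r}")
    if phrase.startswith("the ") and mark_definite:
        return "该附近的车站"  # 该 makes the definite reading explicit
    return "附近的车站"  # no article: definiteness is left unmarked


def zh_to_en_candidates(phrase: str) -> list[str]:
    """Toy Chinese -> English: every English phrase that fits (hypothetical)."""
    if phrase == "该附近的车站":
        return ["the nearby station"]
    if phrase == "附近的车站":
        return ["a nearby station", "the nearby station"]
    raise ValueError(f"toy translator does not know {phrase!r}")


original = "the nearby station"
print(zh_to_en_candidates(en_to_zh(original)))                      # ['a nearby station', 'the nearby station']
print(zh_to_en_candidates(en_to_zh(original, mark_definite=True)))  # ['the nearby station']
```

Without the marker the backward step has to pick between two grammatical readings; with it, the round trip in this toy model is unambiguous.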

curiousdannii
user298438

If we restrict ourselves to vocabulary, the discrepancy between the source text and the source translated into a target language and then back comes from two things:

  • Words have many meanings.
  • Even when context could select the correct meaning, the source doesn't always supply enough of it.

From the first problem, it's easy to see that translating from source to target yields many possibilities, and translating each of those back into the source language multiplies them further, increasing the likelihood of missing the original. The second problem prevents narrowing the possibilities down to one at each step.
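How the possibilities compound can be made concrete with a toy word-by-word model in Python. Both lexicons below are hypothetical and tiny, and real translators work on whole sentences rather than word by word, so this only illustrates how the counts multiply: each word's round-trip candidates are collected, and the candidate sentences are their Cartesian product.

```python
from itertools import product

# Hypothetical English -> French and French -> English lexicons.
EN_TO_FR = {
    "i": ["je"],
    "saw": ["vu", "scie"],            # past tense of "see" / the cutting tool
    "the": ["le", "la"],
    "match": ["allumette", "match"],
}
FR_TO_EN = {
    "je": ["i"],
    "vu": ["saw", "seen"],
    "scie": ["saw"],
    "le": ["the", "him", "it"],
    "la": ["the", "her", "it"],
    "allumette": ["match"],
    "match": ["match", "game"],
}


def round_trip_sentences(sentence: str) -> list[str]:
    """All sentences a word-by-word round trip could produce (toy model)."""
    per_word = []
    for word in sentence.lower().split():
        # Translate forward, then collect every way back for this one word.
        per_word.append(sorted({b for mid in EN_TO_FR[word] for b in FR_TO_EN[mid]}))
    return [" ".join(words) for words in product(*per_word)]


candidates = round_trip_sentences("I saw the match")
print(len(candidates))                  # 16 = 1 * 2 * 4 * 2
print("i saw the match" in candidates)  # True
```

A single ambiguous word only doubles or quadruples the options, but across a four-word sentence the original is already one candidate out of 16, so a blind guess recovers it only 1 time in 16.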

If we now add syntax to the context, it may reduce the possibilities. But a syntactic pattern (such as an implication or tense/aspect) in one language doesn't always translate exactly into a pattern in another, so in a sense it behaves almost like a polysemous vocabulary item itself.

Mitch