In how small chunks can I chop up a text for piece by piece translation?

Question

I'm creating a piece of language-learning software that will include a component that lets the user read texts with a translation on the side. The texts will be chopped up into segments, and when you hold your mouse over a segment, the corresponding segment in the translation is highlighted, so that you can easily keep track of where in the text you are.

The question is, how small can I make these chunks, before translation between different languages would be unnatural, stilted or impossible?

The safest thing to do would be to simply cut the text into pieces at full stops. This, however, would leave some really long sentences in which the reader could easily lose track of where she is, making the program a lot less useful.
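
As a rough illustration of that baseline, here is a minimal sketch that cuts a text after sentence-final punctuation. The function name `split_at_sentence_ends` is hypothetical, and the rule is deliberately naive: it assumes every '.', '!' or '?' followed by whitespace ends a sentence, so abbreviations such as "Mr." would be split incorrectly.

```python
import re

def split_at_sentence_ends(text):
    """Naively split text after '.', '!' or '?' when followed by whitespace."""
    # The lookbehind keeps the punctuation attached to the sentence it ends.
    pieces = re.split(r"(?<=[.!?])\s+", text.strip())
    # Drop the empty string produced when the input itself is empty.
    return [p for p in pieces if p]

segments = split_at_sentence_ends(
    "He was a poor man. He certainly could not afford this car! Could he?"
)
for segment in segments:
    print(segment)
```

Each element of `segments` would then be one hoverable unit; anything finer than this needs more than punctuation to decide where to cut.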

The question then is, could I cut the sentences into smaller portions than complete sentences, and still expect every part to be translatable on its own into various languages? If so, where would I make the cut?

For example: "He was a poor man who certainly would not be able to afford this car"

Is it reasonable to expect that I could cut this sentence in two here, between man and who? Can I expect all languages to have the structure necessary to cut this sentence into two logical pieces like this?

I'm thinking that since, for example, the subject of a clause is usually omitted in Spanish (as it is expressed by the conjugation of the verb), I could not segment my sentences into phrase constituents without running into trouble if my program were translated into Spanish.

As there is a possibility that my texts will be translated into languages whose structure I'm completely unaware of, I need to chop the sentences up in such a way that I'm sure not to get any unpleasant surprises during translation.

So what I'm wondering is how I can chop up my sentences without running into trouble when trying to translate the individual parts into different languages.

I hope I have phrased my issue clearly. Any insight on this matter is much appreciated!

Are you going to match up a text with a translation text? Or will the translation be generated on the fly? — jlawler, Sep 19 '14 at 23:46
Note that sentence boundaries won't always match up between languages. What's most naturally written with one sentence in Japanese might be better written as two in English, etc. — , Sep 20 '14 at 03:24

score 2 · Accepted Answer · answered Sep 20 '14 at 02:09

The question is clear, and I think I can provide a simple answer: Sentences! The smallest chunks the application can acknowledge will be sentences. If the application tries to go smaller than sentences, say to clauses, it will crash when confronted with languages that have syntactic structures that are quite different from European languages.

Let's take the example provided in the question.

 He was a poor man who certainly would not be able to afford this car.

In many languages that have a lot of head-final structures, e.g. Chinese and Japanese, relative clauses precede the nouns that they modify. Thus when this sentence is translated into these languages, the translation should produce a much different order of elements in the clause. Using the English words, an ordering of elements closer to the following would result:

 He was a poor [certainly would not be able to afford this car] man.

The relative clause appears in brackets, and this clause precedes man, the noun that it modifies. Thus one could not split the translation of the original sentence between man and who and have the translation be coherent for Chinese or Japanese.

Now if the translation chooses to remain within a given language family, e.g. Germanic languages, then the chunks could be smaller, probably at the clause level. But going smaller than clauses would be difficult. Consider the following two examples, an English sentence and its German equivalent:

 I think that she will try to help.

 Ich denke, dass sie zu helfen versuchen wird.
 I   think  that she to help   try       will

The order of the verbs in the German sentence (help-try-will) is the opposite of what it is in the English counterpart (will-try-help). What this means is that an attempt to go smaller than clauses down to phrases, in this case to verb phrases, is going to encounter significant difficulties due to the quite different ordering of elements across these two closely related languages.

To be frank, I think the endeavor is facing an impossible challenge. The goal is unattainable. What would be possible, however, is to acknowledge just certain chunks of information that do translate easily from one language to the next. In other words, if the endeavor jettisoned the goal of providing comprehensive chunk by chunk translations and accepted the fact that there are only certain chunks that translate neatly, one could provide the translations for these neatly packaged chunks without having to deal with all the other chunks that do not translate neatly.

Your last suggestion seems reasonable, but I think it will face a challenge of identification, because few chunks translate neatly in all possible contexts. So you not only have to identify the easy-to-translate chunks but also the appropriate contexts, and if you could do that, you could probably do better translation in the first place. For example, 'She asked me a question', 'She asked me to explain.' and 'She asked me about my health' would require three different translations in Czech, which Google Translate with all its resources completely botches. — Dominik Lukes, Sep 20 '14 at 09:42
Yes, I agree. My answer was trying to end on a positive note, and in so doing, it was unrealistic. — Tim Osborne, Sep 20 '14 at 13:09
I think the key to making such software work well would be allowing pieces of text in one sentence to be mapped to pieces of text in the other which are not in the same order and might not even be contiguous. For example, if "I did not see him" were translated into French as "Je ne l'ai pas vu.", pointing at not in English should highlight both ne and pas in French, and pointing at him in English should highlight the l' in French. — supercat, May 05 '23 at 18:08
Sometimes portions of text in one language would simply have no counterpart in the other. For example, in "Où va-t-il?" (Where is he going?), the -t- is a phonetic insertion to separate the vowels in "va" and "il", but serves no grammatical or semantic function. — supercat, May 05 '23 at 18:12
Incidentally, another thing that may be useful would be to use three or four columns, showing a "natural" way of writing something in the original language, a way that could be translated literally, a literal translation, and a natural way of writing the same thing in the other language. So "He must go" might be shown as "It is necessary that he go", and "Il faut qu'il aille". — supercat, May 05 '23 at 18:17
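
The non-contiguous mapping suggested in the comments above could be represented roughly as follows. This is only a sketch under stated assumptions: the `Link` class, `build_lookup`, and `highlight` are hypothetical names, the tokenization is done by hand, and the link from "see" to "ai vu" is my own illustrative choice rather than something stated in the thread. "did" is deliberately left unlinked to show a token with no counterpart.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Link:
    source: tuple  # token indices in the source sentence
    target: tuple  # token indices in the translation; need not be contiguous

def build_lookup(links):
    """Map each source token index to the set of target indices to highlight."""
    lookup = {}
    for link in links:
        for s in link.source:
            lookup.setdefault(s, set()).update(link.target)
    return lookup

english = ["I", "did", "not", "see", "him"]
french = ["Je", "ne", "l'", "ai", "pas", "vu"]

links = [
    Link(source=(0,), target=(0,)),    # I   -> Je
    Link(source=(2,), target=(1, 4)),  # not -> ne ... pas (non-contiguous)
    Link(source=(3,), target=(3, 5)),  # see -> ai ... vu (illustrative assumption)
    Link(source=(4,), target=(2,)),    # him -> l'
]
lookup = build_lookup(links)

def highlight(source_index):
    """Return the French tokens to highlight for a hovered English token."""
    return [french[t] for t in sorted(lookup.get(source_index, ()))]

print(highlight(2))  # tokens linked to "not"
print(highlight(4))  # tokens linked to "him"
print(highlight(1))  # "did" has no link in this sketch
```

Storing index sets instead of contiguous ranges is what lets one hovered word light up several separated words, and an empty result covers material that simply has no counterpart on the other side.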
