Starting with thoughts of how it would be nice to have a back and forth translator when writing an email to someone who speaks a different language…
You would normally write the email in your language, then crank it through a web translator to get the text for the other person.
Well, shouldn’t the translator show the text translated back from the target language to yours so you can check what the other guy will be reading?
Oddly enough, I didn’t find anything on the web to do that as you type. Weird.
But, then it gets interesting.
The guys developing translation software could use the starting and final text from people who run their writing through a back-and-forth-erizer. Figure a person starts by saying what he wants to say in his language. Then, as he modifies what he writes so the translation comes back cleaner, he’s effectively spelling out a way to translate from his language to his own language. He’s showing you a meaning thesaurus, not just simple word substitutions.
(Gunning) Fog index: Wouldn’t the difflib comparison score between the original and the back-translated text be in some way consistent with fog indices? The translation software builders probably use something like this to evaluate their software. I would.
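A minimal sketch of that comparison score, using Python's difflib as mentioned above. The two sentences are placeholders, not real machine-translation output:

```python
import difflib

def round_trip_score(original: str, back_translated: str) -> float:
    """Similarity between the original text and its back-translation, 0.0 to 1.0."""
    return difflib.SequenceMatcher(None, original, back_translated).ratio()

original = "The quick brown fox jumps over the lazy dog."
back_translated = "The fast brown fox leaps over the lazy dog."
print(round_trip_score(original, back_translated))
```

A score near 1.0 says the text survived the round trip nearly intact; how that correlates with fog indices is exactly the open question.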
Certain pairs of languages will translate back and forth better than other pairs. What does that mean?
- The translation software is better for those two lingoes?
- The cultures/people are closer?
- What else?
Over time, what happens? Can the changes in the pair distances be used as a metric of how the world is becoming a global village? Can such changes be used in any way to understand cultural differences? Can translation software improvements be normalized out of the pair distances over time?
Presumably, the translation software guys are monitoring the pair distances between languages so that there are no cases where translating through an intermediate language beats translating directly from one language to another. If that were ever the case, the thing to do would be to train the direct translator using the longer-route translations. Doing that iteratively sounds like a pretty good way to bring a new language into the system: all the new language needs is a corpus of translations between it and one other language. Of course, this wouldn’t be a binary thing. The more effective pair corpora would be able to bootstrap the less effective links, generally.
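A sketch of that monitoring check. The scores and language codes here are made up for illustration; real values would come from measuring an actual translation system:

```python
# Hypothetical round-trip scores (0.0-1.0, higher is better).
direct_scores = {
    ("eo", "fi"): 0.61,            # direct Esperanto <-> Finnish
}
pivot_scores = {
    ("eo", "en", "fi"): 0.74,      # Esperanto <-> English <-> Finnish
}

def pairs_to_retrain(direct, pivot):
    """Flag language pairs where some intermediate route beats the direct one."""
    flagged = []
    for (src, mid, tgt), via_score in pivot.items():
        if via_score > direct.get((src, tgt), 0.0):
            flagged.append((src, tgt, mid))
    return flagged

print(pairs_to_retrain(direct_scores, pivot_scores))
# Here the eo<->fi pair gets flagged: the route through English scores higher,
# so its translations could be used to retrain the direct eo<->fi model.
```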
What are the implications of a world where people write using a language back-and-forth de-fogger? Does the writing end up bureaucratic? No personality. No sharp meaning. Vanilla.
Should textbooks be run through such a de-fogger? Should speeches? Especially in the education field, it seems important to get things across clearly.
Could these back-and-forth techniques be used to build a new language? A better language? Could they be used to build a creole language?
If a language translation system built a creole language that’s close to an existing one, does that imply that the translation system understands the ingredient languages like a human?
Given net-available text, how much CPU does it take to build an effective language translation system?
Could back-and-forth translations be used to help translate old text into modern language? That is, keep modifying the old text until you get the best back-and-forth score for your modified text. It would be interesting to automate this whole process: a proofreader, editor, re-writer system.
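A sketch of that automated re-writer loop. The word-substitution table stands in for a real translate-and-back round trip, which would call a translation service twice; everything here is a toy for illustration:

```python
import difflib

# Toy stand-in for a real MT round trip: archaic words come back modernized.
ARCHAIC = {"thou": "you", "art": "are", "hither": "here"}

def back_translate(text: str) -> str:
    return " ".join(ARCHAIC.get(word, word) for word in text.split())

def round_trip_score(text: str) -> float:
    return difflib.SequenceMatcher(None, text, back_translate(text)).ratio()

def best_rewrite(candidates):
    """Keep the candidate rewrite whose back-translation matches it best."""
    return max(candidates, key=round_trip_score)

old = "thou art welcome hither"
candidates = [old, "you are welcome hither", "you are welcome here"]
print(best_rewrite(candidates))  # the fully modernized version survives intact
```

A real system would generate the candidate rewrites too, not just score them, but the selection step is the same.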
Good URL, but probably going away in December (returns JSON translation of the ‘q’ string):