tonybaldwin | blog

non compos mentis

Posts Tagged ‘language’

This technology can make the language barrier is gone

with 4 comments

Just for grins…

Engrish Mastars

English Mastary made simple....

First, let me state, for the millionth time, that I ❤ GOOGLE!

I use tons o’ Google stuff…Google Search, Gmail, Google Calendar (lifesaver!), Google Reader, Google Code, Google Groups, Google plumbing, you name it…if Google’s got it, I’m using it. So, I’m not doing this to pick on Google. Even so, a guy has to protect his own interests, no? So, in the interest of demonstrating precisely why even the great Google will not supplant professional, human translators, I took yesterday’s NYTimes article on Google Translate, and ran it through Google Translate. First, I translated it to French, then to Spanish, then back to English.
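For the code-curious, a round trip like this one (English, French, Spanish, English) can be sketched in a few lines of Python. This is only a rough sketch under stated assumptions: `round_trip`, `fake_translate` and the `translate(text, source, target)` signature are made-up names for illustration, not any real Google Translate API, and the stand-in translator merely tags the text so the example runs without a translation service.

```python
from typing import Callable, List, Tuple

# Hypothetical signature: translate(text, source_lang, target_lang) -> str
Translator = Callable[[str, str, str], str]


def round_trip(text: str, chain: List[str], translate: Translator) -> List[Tuple[str, str]]:
    """Pass text through each consecutive language pair in chain.

    Returns the (language, text) pair after every hop, so you can see
    where the meaning starts to drift.
    """
    if len(chain) < 2:
        raise ValueError("chain needs at least a source and a target language")
    steps = [(chain[0], text)]
    for source, target in zip(chain, chain[1:]):
        text = translate(text, source, target)
        steps.append((target, text))
    return steps


def fake_translate(text: str, source: str, target: str) -> str:
    # Stand-in for a real MT service: it only records which hop was taken.
    return f"[{source}->{target}] {text}"


steps = round_trip("Human Translators Rule!!", ["en", "fr", "es", "en"], fake_translate)
for lang, version in steps:
    print(lang, version)
```

Swap `fake_translate` for a function that calls whatever machine translation service you have access to, and compare the first and last entries of `steps`.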

Now, I have to confess, the result is not unintelligible. Most readers will be able to make some coherent sense of most of the resulting text. Nonetheless, there will be confusion (and laughter). Now, imagine, if you will, the potential confusion, and quite possibly rather dire consequences, were this method of translation used for, say, the instructions on your medication, international treaties, safety regulations, medical device instruction manuals, and a whole smattering of other complex and important texts.

There’s going to be confusion

That, folks, is why I still have a job.

And now, for your reading pleasure, the resultant text:

MOUNTAIN VIEW, Calif. – In a meeting with Google in 2004, the discussion focused on an e-mail the company had received from a fan in South Korea. Sergey Brin, one of the founders of Google, ran the message through an automatic translation service that the company had a license.

The message says that Google is a search engine of your choice, but the result is as follows: “The footwear of sliced raw fish you want. Google the green onion!”

Mr. Brin said Google should be able to do better. Six years later, its free Google Translate supports 52 languages, more than any other similar system, and use hundreds of millions of times a week to translate web pages and other texts.

“What you see on Google Translate is the state of the art in computer translation is not limited to a particular area,” said Alon Lavie, research associate professor in the Language Technologies Institute at Carnegie Mellon University.

Google’s efforts to expand beyond Web search has been uneven. Your digital book project, was hanged in the courtyard, and the introduction of its social network, Buzz, has raised fears of intimacy. The model suggests that this can sometimes stumble when it comes to challenge the traditions and conventions of cultural enterprise.

However, Google’s rapid growth to higher levels of translation is a reminder of what can happen when Google releases its power of brute force calculation of complex problems.

The network of data centers built to search the web, now, when united, the biggest team in the world. Google uses this machine to push the limits of translation technology. Last month, for example, said he was working to combine your translation tool with image analysis, allowing a person, for example, taking a photo of a German phone menu and get the machine translation into English.

“Machine translation is one of the best examples that demonstrates the vision of Google, said Tim O’Reilly, founder and CEO of tech publisher O’Reilly Media.” This is not something that someone no one takes seriously. However, Google understands something about the data that nobody understands and is willing to make the investments needed to address these types of complex problems ahead of the market. “

Creating a machine translation has been considered one of the toughest challenges in artificial intelligence. For decades, scientists tried using a team approach standards – teaching language regime of both languages and dictionaries give necessary.

But in half of the 1990s, researchers began to promote a statistical approach. They found that if they feed thousands or millions of computers and their human translations generated parts, you can learn to make assumptions about the exact form to translate new texts.

It turns out that this technique, which requires huge amounts of data and lots of computing power, Google has increased.

“Our infrastructure is well suited to this” Vic Gundotra, Google engineering vice president, said. “We can not adopt approaches that others can only dream.

Machine translation systems are far from perfect, and even Google’s human translators will not work soon. Experts say it is extremely difficult for a team to break a sentence into two parts, and then bring them back.

But the Google service is good enough to convey the essence of a news article, and became a source for quick translations for millions of people. “If you need a rough-and-ready translation is the place to go,” said Philip Resnik, an expert in machine translation and associate professor of linguistics at the University of Maryland, College Park.

Like its competitors in the field, including Microsoft and IBM, Google has promoted its translation engine transcripts of the United Nations, which are translated by the man in six languages, and the European Parliament, which resulted in 23 . This material is used to form systems most commonly used languages.

However, Google has traveled the Web text, and data from their project to digitize books and other sources to go beyond these languages. For more obscure languages, published a guide to help users with translations, then add the text in its database.

Offer Google could make a big hole in the translation business sale software companies like IBM, but machine translation is not likely to be a great Moneymaker, at least not by the standards of advertising google. But Google’s efforts could bear fruit in several ways.

Because the ads are online everywhere, while making it easier for people to use the Web to benefit society. And the system could have interesting applications. Last week, the company said that using speech recognition to generate English language subtitles for videos from YouTube, which could then be translated into 50 languages.

This technology can make the language barrier is gone,” said Franz Och, Google’s chief scientist who heads the team of the automatic translation company. This would allow anyone to communicate with anyone else. “

Mr. Och, a German researcher who previously worked at the University of Southern California, said he was reluctant to join Google, fearing that it would be the translation as a side project. Larry Page, Google’s other founder, called to reassure him.

“I just said is something that is very important to Google,” he recalled recently by Mr. Och. Mr. Och signed in 2004 and quickly was able to bring the promise of Mr. Page in the test.

While many translation systems such as using Google for one billion words of text to create a model of a language, Google has gone much more: hundreds of billions of few words in English. “The models are getting better the process rather than text,” said Och.

The effort was worth it. A year later, Google has won a competition run by the government that proof of sophisticated translation systems.

Google has used a similar approach – computing power, mounds of data and statistics – to address other complex issues. In 2007, for example, began offering 800-GOOG-411, directory assistance calls free interpretation of spoken. It has allowed Google to get the votes of millions of people who do better in the English speech recognition.

A year later, Google launched a search for the voice system that was as good as the other companies that have taken years to build.

And last year, Google launched a service called glasses, which analyzes the image of the phone, which is an online database of more than one billion images, including pictures of her taken to the streets Street View service.

Mr. Och has acknowledged that the Google translation still needs improvement, but he said he feels better quickly. “The curve of the current quality improvement is still very strong,” he said.

This article was translated by Google, the English, then French, Spanish, then back to English.

TRANSLATORS domain of man!*



*(this phrase was “Human Translators Rule!!” prior to the above treatment)

Just for fun, I ran that article through Simplified Chinese, then Czech, then back to English, again.

here is that result

Written by tonybaldwin

March 10, 2010 at 10:42 am

The Neural Advantage of Speaking 2 Languages: Scientific American


The Neural Advantage of Speaking 2 Languages: Scientific American.

The ability to speak a second language isn’t the only thing that distinguishes bilingual people from their monolingual counterparts—their brains work differently, too. Research has shown, for instance, that children who know two languages more easily solve problems that involve misleading cues. A new study published in Psychological Science reveals that knowledge of a second language—even one learned in adolescence—affects how people read in their native tongue. The findings suggest that after learning a second language, people never look at words the same way again.

Written by tonybaldwin

February 19, 2010 at 9:37 am

Posted in language, news


FREE software alternatives for translators

with one comment

I’ve been using FREE Software exclusively for about 10 years now, including Debian Gnu/Linux as my operating system, and all FREE/Open Source tools, for all of my work.
In addition to OmegaT, I have also, now, been using Anaphraseus.

Both are FREE/Open Source Software, CAT (computer aided translation) tools, and fully cross-platform. OmegaT handles a far broader range of file formats (all MSOffice files, converted to ODF formats; the new docx, xlsx, pptx files, without conversion; Trados ttx files, html, xml, xliff, po, other text formats and software localization formats, etc., etc.), and, to date, just feels more efficient for me. There are advantages in how it handles translation memories, as well (it can use an entire directory full of tmx files for reference, and still generates a project-specific TM for each project, whereas Anaphraseus only works with one TM). Anaphraseus, however, functions similarly to the popular CAT tool, Wordfast®, only as an extension to OpenOffice.org (which is also Free/OSS), as opposed to M$Office®. As such, it will generate “unclean” .doc files, which various clients want, for use with both Trados and Wordfast.
OmegaT and Anaphraseus, thus, serve different purposes for me. I use OmegaT far more frequently, but when I need Anaphraseus’ functions, I am quite happy to use it, and regard both programs as fantastic tools.
I served as the localization coordinator for the OmegaT project for about a year and a half. It felt really good to be part of the project and contribute, even in this small way, to its continued success.
So, I decided that I’d like to contribute to Anaphraseus, as well. Now, I don’t write Java, which is why my contribution to OmegaT was in the capacity of localization coordinator. Anaphraseus is written as an extension to OpenOffice.org, in StarBasic. I don’t think I’d even heard of StarBasic before this project, so, clearly, I can’t contribute to the code. But the project had no updated documentation, so I took the rather outdated, and now quite erroneous, user and installation manual, and updated it: Anaphraseus Manual.

For years I used FOSS and GNU/Linux without making any more of a contribution than some advocacy for FOSS in schools (while teaching in public schools, I formerly administered a site for said advocacy, but no longer own the domain), so it feels good to be able to contribute in more hands-on ways. Some day I’m going to be a real hacker, and actually contribute code, but, until then, helping with localization (part of my industry, anyway) and writing documentation are good ways to get my feet wet.

In any case, if you are a translator, I highly recommend both OmegaT and Anaphraseus. They’re great tools, run on Mac, Windows or GNU/Linux, and, above all, you sure can’t beat the price of both the software and the support ($0.00)!

For more information on FREE/Open Source Software in the translation industry, see: foss
Linux for Translators article: foss for translators.

(originally posted at nabble news)

Written by tonybaldwin

February 18, 2010 at 12:22 pm



I completed two translation jobs today.
One was rather brief, some Spanish documents referring to the cancellation of a lease.
The other was an article from Brazil regarding an assessment instrument for social skills in preschool children, to be published in a European academic journal on psychology.

Now, I have a 37 page project due on Friday, Spanish materials, legal in nature.

Business is picking up, it seems, but I’m afraid to have faith that we’re back in business as before.
It’s been so sporadic for the last 5 months.

Written by tonybaldwin

January 10, 2009 at 11:58 pm

Posted in language


> The customer had complained about the quality. (MY RESPONSE)


(working around the clock for a large agency, doubling up on projects from their London and California offices.
The London PM writes me, attaching “corrected” documents, with the client’s feedback)


Project manager (Big Company, London Office) wrote:
> The customer had complained about the quality.

They’re also morons.
I don’t have much time to play this game right now, because I’m busy.
First they wanted 17,000 words in 4 days, and now they want an additional 8000 in less than two days…All the while I am also cranking out similar volume for Fulana PM at same BIGCOMPANY in CA, USA.
I’m translating the documents as quickly as I can, and firing them off to you, admittedly, without a solid proofreading.
That’s what happens with rush jobs.
All the same, after a quick perusal of the changes on the first few pages of the first document you returned to me, I must say, they are simply wrong.
For instance, look at this:
That is how they “corrected” the word “established”, which I had spelled correctly, in the first place.
In another instance, they changed “bank cards” to “band cards” in reference to payment methods.
What, in God’s name, is a “band card”?
Here, we call them “bank cards”, which is what I had written in the first place. I have several in my wallet.
“Deposited to an account” or “in an account”…same thing, and, I would venture to state that “to an account” is more correct.
I deposit money, in the bank, TO my account.
No need for correction there.
In another instance, they replaced the €/Euro sign with a $/dollar sign.
The original quotes all financial figures in €/Euros. Hello.

Some of their corrections aren’t necessarily “incorrect”, but are unnecessary, as the original is not incorrect, either, and their stylistic alterations, if anything, make the text less efficient and needlessly verbose.

I haven’t seen a single “correction” here that is valid.

I’ve seen this game, before. They’re making arbitrary, and mostly INCORRECT “corrections” and whining, in some hope of getting a discount.
Well…I’ve been working non-stop on these documents.
I worked until 2am last night, and I’m up at 7am, and back in the office working, and will continue through the July 4th holiday weekend, as I have another large project, also from BigCompany, due Monday,
and I’m not granting any discounts.

I am a highly skilled, highly educated and trained linguist, and, above all, an expert in the English language.
Your BIGCompany knows that. You’ve been sending me work for over 3 years now, and never once has there been a complaint.
So have many, many other clients, including such well-reputed agencies as TransPerfect, and BTS/Bowne Global Desktop, both of whom are known to test potential providers without mercy.
I work efficiently. I provide top quality translations at unheard of rates (Another client called yesterday, simply dumbfounded, and wanted to know how I could afford to give them such excellent work at such low rates). I go beyond the call of duty, and even take screenshots and cut out the images from these miserably copied PDF files, and manipulate them if need be [with the GIMP, of course], and insert them to the target file, so that the translation appears PRECISELY like the original (how many of your providers do that? I don’t know anybody who does that…)
Apparently, this client knows not with whom it is they wish to play.
I can support my original text for every supposed “correction” I’ve seen here, so far.


Incidentally, if I hear another complaint, I will stop work on this project immediately, and they can go jump in a boiling lake of fetid dog excrement.
I have better things to do with my time than give up the holiday for whining morons attempting to fish out discounts. I can’t afford to work for less, and I refuse as a matter of principle, anyway, especially when working around the clock over the holiday, which I should be spending eating grilled hotdogs and drinking beer and creating a general nuisance with small, colorful explosives with the rest of my family.
(I’d say, “and friends”, but I’ve lost all my friends from working around the clock like this for three years…they all think I’ve become a vampire, or something).
If the client wants cheap translations, tell them to hire a sweat shop in China for $0.02/word and see what kind of quality they receive.

I understand, Ms. Project Mgr, that it is the client, not you, making a complaint. I enjoy working with you.

Now, if you’ll excuse me, I have work to do.


Written by tonybaldwin

July 3, 2008 at 8:28 am

TransProCalc 0.8


TransProCalc 0.8

New release of TransProCalc, free translation project management software…

I had not intended to release at this juncture, but, a user wrote to me and pointed out a minor error in my script (was cd-ing to /usr/loca/bin instead of /usr/local/bin to chmod to a+x for all the scripts…doh).
So, I fixed that, but also released the new tabbed/notebook arrangement.
The code has been cleaned up a bit, and has better commenting, too, if for no other reason than to make it easier for myself to find stuff while hacking away at it.
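A mistyped install path like that one is easy to catch if the script checks that the directory exists before touching permissions. Here is a minimal sketch of the idea in Python; it is not the actual TransProCalc install script, and `make_executable` and the file name `transprocalc.tcl` are hypothetical names used only for illustration.

```python
import os
import stat
import tempfile


def make_executable(directory, script_names):
    """Add execute permission for user, group and others (like chmod a+x)."""
    if not os.path.isdir(directory):
        # A mistyped path such as /usr/loca/bin is reported here
        # instead of the permissions step quietly doing nothing.
        raise FileNotFoundError(f"install directory not found: {directory}")
    for name in script_names:
        path = os.path.join(directory, name)
        mode = os.stat(path).st_mode
        os.chmod(path, mode | stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH)


# Demonstration in a throwaway directory rather than /usr/local/bin.
with tempfile.TemporaryDirectory() as bin_dir:
    script = os.path.join(bin_dir, "transprocalc.tcl")  # hypothetical file name
    open(script, "w").close()
    os.chmod(script, 0o644)
    make_executable(bin_dir, ["transprocalc.tcl"])
    print(oct(os.stat(script).st_mode & 0o777))
```

The execute bits behave this way on Unix-like systems such as GNU/Linux and Mac; on Windows, `os.chmod` only toggles the read-only flag.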

I’ve been working on writing in the math to round the figures, as per a suggestion I received, but have not completed that, so rounding is not part of the new release.
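For what it’s worth, the rounding itself is simple once the amounts are kept as decimal values rather than binary floats. The following is a rough Python sketch, not TransProCalc’s code; `round_money`, `job_total`, the word count and the per-word rate are all assumptions made up for the example.

```python
from decimal import Decimal, ROUND_HALF_UP


def round_money(amount, places=2):
    """Round a monetary amount to the given number of decimal places, half up."""
    quantum = Decimal(10) ** -places  # Decimal('0.01') when places is 2
    return Decimal(str(amount)).quantize(quantum, rounding=ROUND_HALF_UP)


def job_total(word_count, rate_per_word):
    """Price of a job: word count times per-word rate, rounded to cents."""
    return round_money(Decimal(word_count) * Decimal(str(rate_per_word)))


# Hypothetical job: 1,234 words at 0.0875 per word.
print(job_total(1234, "0.0875"))
```

Passing the rate as a string keeps it exact; a binary float such as 2.675 is stored slightly below its written value, so the built-in `round(2.675, 2)` gives 2.67 rather than the 2.68 an invoice would show.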

I’m hoping to have that done by the end of this month, and will be releasing again.
I’m also going to add a fourth report, a report that includes information from all three windows.
As it is, I usually make one by copy/pasting parts of the three existing reports together, so I figure
it would be useful to others, too, to have this one report.
I’m also thinking about adding an invoice generator, although, I imagine that most folks
already have their own method for generating invoices.
I could even add export of the invoice to pdf format.
What do you think of that idea?
I would like to write in some user configuration…you know, saving the user’s pertinent information for the purposes
of generating said invoices, and, also, the ability to save information for clients, and, thus, a fourth tab in the notebook for
a client info input dialog, where one can enter, then save information for each client, such as address info, etc.,
to sort of build a client database for transprocalc.

My intention was to work on some of these items this week, in fact, and not to post a new release until these
items were completed, but, since the install script needed attention, I figured I’d at least amend that little issue,
and, since the tabbed/notebook feature was already completed and in use here, I figured I’d release it as such.
So, there you have it: TransProCalc 0.8 is up.
When I get these other features written in, I’ll be releasing 1.0.

Anyone want to come on board and help code some of this stuff in?

Any ideas or feedback, too, are always welcome and appreciated.

Written by tonybaldwin

May 11, 2008 at 9:31 pm

Here kitty, kitty…tcl, tcl!


Tickle TMX
Could Tony actually be crAzy enough, this very early in his programming career, to try and create a CAT (computer aided translation) program?

Well, not exactly, because I haven’t the knowledge to talk to a db full of translation memories, but,
I am trying my hand at developing a means of joining legacy translations to build a tmx or xliff file from them.

We’ll see what happens after I get that far….

Here are the beginnings of that project (img above).
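To give an idea of what “building a tmx from legacy translations” involves, here is a small sketch in Python (not the Tcl of the project itself). It assumes the hard part is already done, namely that the source and target segments have been paired up, and only writes them out as TMX 1.4 translation units. The function name `build_tmx`, the `creationtool` value and the sample Spanish/English segments are invented for illustration.

```python
import xml.etree.ElementTree as ET

# ElementTree writes this namespaced key out as the attribute xml:lang.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def build_tmx(pairs, source_lang, target_lang):
    """Build a TMX 1.4 document from already-aligned (source, target) segments."""
    tmx = ET.Element("tmx", version="1.4")
    ET.SubElement(tmx, "header", {
        "creationtool": "legacy-aligner-sketch",  # hypothetical tool name
        "creationtoolversion": "0.1",
        "segtype": "sentence",
        "o-tmf": "plaintext",
        "adminlang": "en",
        "srclang": source_lang,
        "datatype": "plaintext",
    })
    body = ET.SubElement(tmx, "body")
    for source_text, target_text in pairs:
        tu = ET.SubElement(body, "tu")  # one translation unit per aligned pair
        for lang, text in ((source_lang, source_text), (target_lang, target_text)):
            tuv = ET.SubElement(tu, "tuv", {XML_LANG: lang})
            ET.SubElement(tuv, "seg").text = text
    return ET.tostring(tmx, encoding="unicode")


legacy_pairs = [
    ("El contrato queda cancelado.", "The lease is cancelled."),
    ("Firmado en Madrid.", "Signed in Madrid."),
]
tmx_text = build_tmx(legacy_pairs, "es", "en")
print(tmx_text)
```

Actually aligning an old source document with its translation, sentence by sentence, is the real work; this only covers the final step of writing the pairs to a file a CAT tool can load.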

Written by tonybaldwin

April 19, 2008 at 2:15 pm