tonybaldwin | blog

non compos mentis

search google, wikipedia, reverso from the bash terminal

searching in bash

Okay, so, I like to use my bash terminal. Call me a geek all you like; it matters not to me. I wear that badge with pride.

The bash terminal is quick and efficient for doing a lot of stuff that one might otherwise use some bloated, cpu sucking, eye-candied, gui monstrosity to do. So, when I find ways to use it for more stuff, more stuff I do with it.

Now, for my work (recall, I am professionally a translator) I must often do research. Some of it entails heavy lifting, but much of it is simply searching for word definitions and translations. I frequently use TclDict, which I wrote, but I also use a lot of online resources that I never programmed TclDict to access, and I would generally use a browser for that stuff. Unless, of course, I can do it in my terminal!

For precisely such purposes, here are a couple of handy scripts I use while working.

First, let’s look up terms at Dict.org:

#!/bin/bash
# Look up a term at dict.org with curl's dict:// support.
# A blank database means "first database with a match".

if [[ -n "$*" ]]; then
    searchterm="$*"
else
    read -p "Enter your search term: " searchterm
fi
read -p "Choose database (enter 'list' to list all options, leave blank for first match): " db

if [[ $db == list ]]; then
    curl dict://dict.org/show:db
    read -p "Choose database, again: " db
fi

curl "dict://dict.org/d:$searchterm:$db" | less
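The database field at the end of that d: URL follows the DICT protocol (RFC 2229): "!" means "stop at the first database with a match" and "*" means "search every database". Here is a minimal sketch of building the lookup URL explicitly. The helper name dict_url is my own invention, and it defaults to "!" when no database is given.

```shell
#!/bin/bash
# Hypothetical helper: build a curl dict:// lookup URL for dict.org.
# The second argument is the database. It defaults to "!" (first match);
# "*" would return definitions from all databases (RFC 2229).
dict_url() {
    local term=$1 db=${2:-!}
    printf 'dict://dict.org/d:%s:%s\n' "$term" "$db"
}

# Usage: curl "$(dict_url cow)" | less
dict_url cow
dict_url cow '*'
```

Quote the result when handing it to curl, so the shell does not try to expand "*" as a glob.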

Now, let’s search google from the command line:

#!/bin/bash
if [[ -n "$*" ]]; then
    searchterm="$*"
else
    read -p "Enter your search term: " searchterm
fi
# Quote the URL so the shell leaves "?" alone, and turn spaces into "+"
# so a multi-word search stays a single query.
lynx -accept_all_cookies "http://www.google.com/search?q=${searchterm// /+}"
# I accept all cookies to go direct to search results without having to approve each cookie.
# You can disable that, of course.

I saved that in ~/bin/goose (for GOOgle SEarch),
and just do
goose $searchterm
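Swapping spaces for "+" covers most searches. Characters such as &, # or + itself would still break the Google query string, though. A minimal sketch of percent-encoding the whole term in pure bash instead follows. The function name urlencode is hypothetical, and it encodes byte by byte, so multibyte (e.g. accented) characters come out as several %XX pairs.

```shell
#!/bin/bash
# Hypothetical helper: percent-encode a string for a URL query.
# Unreserved characters (letters, digits, . ~ _ -) pass through;
# every other byte becomes %XX.
urlencode() {
    local LC_ALL=C   # walk the string byte by byte, not by multibyte character
    local s=$1 out= c i
    for (( i = 0; i < ${#s}; i++ )); do
        c=${s:i:1}
        case $c in
            [a-zA-Z0-9.~_-]) out+=$c ;;
            *) printf -v c '%%%02X' "'$c"; out+=$c ;;
        esac
    done
    printf '%s\n' "$out"
}

# Usage in goose:
#   lynx -accept_all_cookies "http://www.google.com/search?q=$(urlencode "$searchterm")"
urlencode "rock & roll"
```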
Or, search the google dictionary to translate a term:

#!/bin/bash
echo -e "Search google dictionary.\n"
read -p "Source language (two letters): " slang
read -p "Target language (two letters): " tlang
read -p "Search term: " sterm
lynx -dump "http://www.google.com/dictionary?langpair=$slang|$tlang&q=$sterm" | less

Note: For a monolingual search, just use the same language for source and target. Don’t leave either blank.
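Because that URL needs both codes, a small guard can catch a blank or mistyped language before lynx even starts. This is a sketch that assumes the codes are always two lowercase letters, as described above. The name is_lang_code is hypothetical.

```shell
#!/bin/bash
# Hypothetical guard: succeed only for a two-letter, lowercase language code.
is_lang_code() {
    [[ $1 =~ ^[[:lower:]]{2}$ ]]
}

# Usage: refuse to search with a blank or malformed code.
for code in en ES "" fra; do
    if is_lang_code "$code"; then
        echo "ok: '$code'"
    else
        echo "bad: '$code'"
    fi
done
```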

Or:

#!/bin/bash
# usage: SOURCE_LANG TARGET_LANG TERM (two-letter language codes)
if [ $# -lt 3 ]; then
    cat <<EOF
Usage requires 3 parameters: source language, target language, search term.
For example:
    ${0##*/} en es cows
translates "cows" to Spanish.
For a monolingual search, enter the language twice.
As indicated, use the two-letter code: "en" for English, "fr" for French, etc.
EOF
    exit 1
fi
lynx -dump "http://www.google.com/dictionary?langpair=$1|$2&q=$3" | less

I have the above in ~/bin/gd, so usage is simply “gd $sourcelanguage $targetlanguage $searchterm”.
Example:
me@machine:~$ gd en es cow
searches the English to Spanish dictionary for “cow”.
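Running goose, gd or wikit by bare name only works if ~/bin is on your $PATH. On Debian and Ubuntu, the default ~/.profile typically adds it when the directory exists. If your setup does not, here is a sketch of a check you could drop into ~/.bashrc. The function name path_has is just one I made up.

```shell
#!/bin/bash
# Hypothetical helper: succeed if the given directory is already in $PATH.
path_has() {
    case ":$PATH:" in
        *":$1:"*) return 0 ;;
        *) return 1 ;;
    esac
}

# Add ~/bin once, so scripts saved there (goose, gd, wikit...) run by name.
if ! path_has "$HOME/bin"; then
    PATH="$HOME/bin:$PATH"
fi
```

Wrapping $PATH in colons before matching means the first and last entries are checked the same way as the ones in the middle.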

We can use similar principles to search reverso:

#!/bin/bash
# search reverso; languages are spelled out, e.g. "english", "french"
if [ $# -lt 3 ]; then
    read -p "Enter the source language: " slang
    read -p "Enter target language: " tlang
    read -p "Enter your search term: " searchterm
else
    slang=$1 tlang=$2 searchterm=$3
fi
lynx -dump "http://dictionary.reverso.net/$slang-$tlang/$searchterm" | less

With the google dictionary, you use the two-letter language code (e.g., “en” for English, “fr” for French, etc.). With reverso, you have to spell out the language (“english” for English, etc.).
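If you would rather type the same two-letter codes for both services, a lookup table can turn them into the spelled-out names reverso expects. This is a sketch with a deliberately short, hypothetical table. It assumes reverso's URLs use lowercase English language names like “english”, and it needs bash 4 or later for the associative array. Add whichever pairs you actually work in.

```shell
#!/bin/bash
# Hypothetical table: two-letter code -> language name as used in reverso URLs.
# Extend it with the languages you actually need.
declare -A REVERSO_NAMES=(
    [en]=english [fr]=french [es]=spanish [pt]=portuguese
    [de]=german [it]=italian
)

# Print the reverso name for a code, or complain and fail if it is unknown.
reverso_lang() {
    local code=$1
    if [[ -z $code ]] || [[ -z ${REVERSO_NAMES[$code]-} ]]; then
        echo "unknown language code: $code" >&2
        return 1
    fi
    printf '%s\n' "${REVERSO_NAMES[$code]}"
}

# Usage:
#   lynx -dump "http://dictionary.reverso.net/$(reverso_lang en)-$(reverso_lang fr)/cow" | less
reverso_lang fr
```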

With all of the above, I’ve used the program less to display the results, rather than spitting everything out to the terminal at once. If needed, run man less to learn how to use it.

Additionally, most of the above require the Lynx browser, which is generally available for any gnu/linux distribution via your favorite package manager (apt, synaptic, aptitude, yum, portage, pacman, etc.). For the dict.org script, I used cURL (also part of most gnu/linux distributions and installable with your favorite package manager).
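If a script fails silently because lynx or curl is missing, a short check at the top makes the problem obvious. Here is a sketch using bash's built-in command -v. The helper name require_cmds is hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: report every missing command, and fail if any is absent.
require_cmds() {
    local cmd missing=0
    for cmd in "$@"; do
        if ! command -v "$cmd" >/dev/null 2>&1; then
            echo "missing: $cmd (install it with your package manager)" >&2
            missing=1
        fi
    done
    return "$missing"
}

# Usage at the top of a script, e.g. the google dictionary one:
#   require_cmds lynx less || exit 1
require_cmds sh || echo "sh not found"
```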

Google Translate can also be accessed, but for this, we’ll use a bit of python magic (I know, I pick on google translate, a lot, but it can be useful):

#!/usr/bin/env python3
# usage: <script> SOURCE_LANG TARGET_LANG TEXT...
import json
import sys
from urllib.parse import urlencode
from urllib.request import urlopen

# The google translate API can be found here:
# http://code.google.com/apis/ajaxlanguage/documentation/#Examples

lang1, lang2 = sys.argv[1], sys.argv[2]
text = ' '.join(sys.argv[3:])
base_url = 'http://ajax.googleapis.com/ajax/services/language/translate?'
params = urlencode({'v': '1.0', 'q': text, 'langpair': '%s|%s' % (lang1, lang2)})

with urlopen(base_url + params) as response:
    # Parse the JSON reply rather than slicing the raw string, so escaped
    # characters in the translation come out decoded.
    data = json.load(response)

print(data['responseData']['translatedText'])

I originally found that here, on the Ubuntu Forums.

And now for Wikipedia we have a couple of options.
First, we have this awesome little handy script, tucked into my $PATH as “define”:

#!/bin/bash
# Ask the wp.dg.cx DNS service for a short Wikipedia summary, served as a TXT record.
dig +short txt "$1.wp.dg.cx"

I use it simply with “define $searchterm”, and it gives a short definition from wikipedia.  I originally found it here.
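dig +short prints a TXT record as one or more double-quoted strings, and long records arrive split into several chunks ("..." "..."). Here is a sketch of a filter that strips the quotes and glues the chunks back together. The name join_txt is hypothetical, and dig's backslash escapes (such as \" or \226) are left untouched.

```shell
#!/bin/bash
# Hypothetical filter: remove the outer quotes from dig's TXT output
# and join the quoted chunks into one line of text.
join_txt() {
    sed -e 's/^"//' -e 's/"$//' -e 's/" "//g'
}

# Usage in define: dig +short txt "$1.wp.dg.cx" | join_txt
echo '"first part, " "second part"' | join_txt
```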

Another extremely handy tool is Wikipedia2Text, which I simply installed from the Debian repos via aptitude. When I use it, I also pipe it to less:

#!/bin/bash
if [[ -n "$*" ]]; then
    searchterm="$*"
else
    read -p "Enter your search term: " searchterm
fi

wikipedia2text $searchterm | less

I have that tucked into ~/bin/wikit, so I simply do

wikit $searchterm
to get my results.
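The dict, goose and wikit scripts all open with the same "use the arguments, or else prompt" block. If you keep several scripts like these, that logic could live in one shared function. Here is a sketch. get_searchterm is a hypothetical name, and the prompt text matches the scripts above.

```shell
#!/bin/bash
# Hypothetical shared helper: use the arguments as the search term,
# or prompt for one if none were given. Sets the variable $searchterm.
get_searchterm() {
    if [[ $# -gt 0 ]]; then
        searchterm="$*"
    else
        read -r -p "Enter your search term: " searchterm
    fi
}

# Usage inside a script: get_searchterm "$@"
get_searchterm cows in the field
echo "searching for: $searchterm"
```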

Enjoy!

All code here that I have written is free and released according to the GPL v. 3. Check the links for code I borrowed for licensing information (pretty sure it’s all GPL-ed, too).

./tony

Written by tonybaldwin

April 25, 2011 at 7:44 pm

exorcising bad translations

This, my friends, is why Professional Translators are still a necessity.

Il Foglio, an Italian newspaper, has come out criticizing the NY Times, which (OMGSTFUBBQ…can’t believe they did this!) used a computer-generated translation of an article regarding the Vatican’s response to sexual abuse complaints.

The failure to translate led the American newspaper to argue that Cardinal Joseph Ratzinger was protecting a sexually abusive priest from Milwaukee.

The article, titled “New York Times does not translate,” starts by saying, “New York Times columnist Maureen Dowd returned to attack the Pope. Commenting on the words of exorcist Gabriele Amorth, who said that behind pedophile priests is the devil, Dowd suggested a way for the Catholic church to solve the problem: hire a ‘sexorcist.'” 1

Learn from this, kiddies.
When the text is important, neither Google Translate nor Yahoo! BabelFish is truly your friend.

Go to Proz.com and find a real, professional translator.
Of course, if your text requires translation from any of French, Portuguese or Spanish to American English, I’ve got you covered, right here.

tony


posted with Xpostulate

Written by tonybaldwin

April 13, 2010 at 8:20 pm

Linux Inside! 50 places you didn't know Linux was running.

I found this article interesting.

The gnu/linux users listed include:

  • various US and foreign government agencies, including the French Parliament, Cuba, Spain, the US Postal Service, US Dept. of Defense and Navy, etc.
  • Many large companies (you knew about IBM, Dell and Google, of course, but how about Burlington Coat Factory, Amazon.com, Omaha Steaks, and Virgin Airlines?)
  • a myriad of school systems

Likely, you are using services running on gnu/linux, somewhere, whether you knew it or not!


Written by tonybaldwin

April 12, 2010 at 7:26 am

This technology can make the language barrier is gone

Just for grins…

Engrish Mastars

English Mastary made simple....

First, let me state, for the millionth time, that I ❤ GOOGLE!

I use tons of google stuff…google search, gmail, google calendar (lifesaver!), google reader, google code, google groups, google plumbing, you name it…Google’s got it, I’m using it. So, I’m not doing this to pick on Google. Even so, a guy has to protect his own interests, no? So, in the interest of demonstrating precisely why even the great Google will not supplant professional, human translators, I took yesterday’s NYTimes article on Google Translate and ran it through Google Translate. First, I translated it to French, then to Spanish, then back to English.

Now, I have to confess, the result is not unintelligible. Most readers will be able to make some coherent sense of most of the resulting text. Nonetheless, there will be confusion (and laughter). Now, imagine, if you will, the potential confusion, and quite possibly rather dire consequences, were this method of translation used for, say, the instructions on your medication, international treaties, safety regulations, medical device instruction manuals, and a whole smattering of other complex texts of real importance.

There’s going to be confusion

That, folks, is why I still have a job.

And now, for your reading pleasure, the resultant text:


MOUNTAIN VIEW, Calif. – In a meeting with Google in 2004, the discussion focused on an e-mail the company had received from a fan in South Korea. Sergey Brin, one of the founders of Google, ran the message through an automatic translation service that the company had a license.

The message says that Google is a search engine of your choice, but the result is as follows: “The footwear of sliced raw fish you want. Google the green onion!”

Mr. Brin said Google should be able to do better. Six years later, its free Google Translate supports 52 languages, more than any other similar system, and use hundreds of millions of times a week to translate web pages and other texts.

“What you see on Google Translate is the state of the art in computer translation is not limited to a particular area,” said Alon Lavie, research associate professor in the Language Technologies Institute at Carnegie Mellon University.

Google’s efforts to expand beyond Web search has been uneven. Your digital book project, was hanged in the courtyard, and the introduction of its social network, Buzz, has raised fears of intimacy. The model suggests that this can sometimes stumble when it comes to challenge the traditions and conventions of cultural enterprise.

However, Google’s rapid growth to higher levels of translation is a reminder of what can happen when Google releases its power of brute force calculation of complex problems.

The network of data centers built to search the web, now, when united, the biggest team in the world. Google uses this machine to push the limits of translation technology. Last month, for example, said he was working to combine your translation tool with image analysis, allowing a person, for example, taking a photo of a German phone menu and get the machine translation into English.

“Machine translation is one of the best examples that demonstrates the vision of Google, said Tim O’Reilly, founder and CEO of tech publisher O’Reilly Media.” This is not something that someone no one takes seriously. However, Google understands something about the data that nobody understands and is willing to make the investments needed to address these types of complex problems ahead of the market. “

Creating a machine translation has been considered one of the toughest challenges in artificial intelligence. For decades, scientists tried using a team approach standards – teaching language regime of both languages and dictionaries give necessary.

But in half of the 1990s, researchers began to promote a statistical approach. They found that if they feed thousands or millions of computers and their human translations generated parts, you can learn to make assumptions about the exact form to translate new texts.

It turns out that this technique, which requires huge amounts of data and lots of computing power, Google has increased.

“Our infrastructure is well suited to this” Vic Gundotra, Google engineering vice president, said. “We can not adopt approaches that others can only dream.

Machine translation systems are far from perfect, and even Google’s human translators will not work soon. Experts say it is extremely difficult for a team to break a sentence into two parts, and then bring them back.

But the Google service is good enough to convey the essence of a news article, and became a source for quick translations for millions of people. “If you need a rough-and-ready translation is the place to go,” said Philip Resnik, an expert in machine translation and associate professor of linguistics at the University of Maryland, College Park.

Like its competitors in the field, including Microsoft and IBM, Google has promoted its translation engine transcripts of the United Nations, which are translated by the man in six languages, and the European Parliament, which resulted in 23 . This material is used to form systems most commonly used languages.

However, Google has traveled the Web text, and data from their project to digitize books and other sources to go beyond these languages. For more obscure languages, published a guide to help users with translations, then add the text in its database.

Offer Google could make a big hole in the translation business sale software companies like IBM, but machine translation is not likely to be a great Moneymaker, at least not by the standards of advertising google. But Google’s efforts could bear fruit in several ways.

Because the ads are online everywhere, while making it easier for people to use the Web to benefit society. And the system could have interesting applications. Last week, the company said that using speech recognition to generate English language subtitles for videos from YouTube, which could then be translated into 50 languages.

This technology can make the language barrier is gone,” said Franz Och, Google’s chief scientist who heads the team of the automatic translation company. This would allow anyone to communicate with anyone else. “

Mr. Och, a German researcher who previously worked at the University of Southern California, said he was reluctant to join Google, fearing that it would be the translation as a side project. Larry Page, Google’s other founder, called to reassure him.

“I just said is something that is very important to Google,” he recalled recently by Mr. Och. Mr. Och signed in 2004 and quickly was able to bring the promise of Mr. Page in the test.

While many translation systems such as using Google for one billion words of text to create a model of a language, Google has gone much more: hundreds of billions of few words in English. “The models are getting better the process rather than text,” said Och.

The effort was worth it. A year later, Google has won a competition run by the government that proof of sophisticated translation systems.

Google has used a similar approach – computing power, mounds of data and statistics – to address other complex issues. In 2007, for example, began offering 800-GOOG-411, directory assistance calls free interpretation of spoken. It has allowed Google to get the votes of millions of people who do better in the English speech recognition.

A year later, Google launched a search for the voice system that was as good as the other companies that have taken years to build.

And last year, Google launched a service called glasses, which analyzes the image of the phone, which is an online database of more than one billion images, including pictures of her taken to the streets Street View service.

Mr. Och has acknowledged that the Google translation still needs improvement, but he said he feels better quickly. “The curve of the current quality improvement is still very strong,” he said.

http://www.nytimes.com/2010/03/09/technology/09translate.html

This article was translated by Google, the English, then French, Spanish, then back to English.

TRANSLATORS domain of man!*

🙂


Tony

*This phrase was “Human Translators Rule!!” prior to the above treatment.

Just for fun, I ran that article through Simplified Chinese, then Czech, then back to English, again.

Here is that result.

Written by tonybaldwin

March 10, 2010 at 10:42 am

Machine Translations, Google, and my job…

I just thought I’d share this, quickly: Google’s Computing Power Refines Translation Tool

Google’s efforts to expand beyond searching the Web have met with mixed success. Its digital books project has been hung up in court, and the introduction of its social network, Buzz, raised privacy fears. The pattern suggests that it can sometimes misstep when it tries to challenge business traditions and cultural conventions.

But Google’s quick rise to the top echelons of the translation business is a reminder of what can happen when Google unleashes its brute-force computing power on complex problems.


Being both a computer technology geek and a professional HUMAN translator, I, of course, have mixed feelings about MT, or Machine Translation. Personally, I don’t think MT will ever replace humans. Ever. Language is just too complex.
The internet is riddled with humorous examples of bad machine translation. Just take a look at Engrish.com, or, here’s a lovely example right here: Lost in Translation, Seriously.
Funny stuff.

Computers are, of course, a very powerful and useful tool in translation. I would never deny that. Computer technology has brought about a great many changes in the translation industry over the past several years. Many translators feel threatened by that technology. I prefer to embrace it, frankly. I see it as a tool, not a threat.

I confess, I use Google Translate sometimes. You already know I ❤ Google. Moreover, OmegaT, my preferred CAT (computer-aided translation) tool, now has an optional integrated Google Translate feature, so that, while I am translating a document, OmegaT shows me the Google Translate result for each segment. I have to say that the instances in which I can simply insert that result without editing it are few: perhaps 15 to 20%. I suppose that’s not too bad, really, considering the success of earlier attempts at MT, but it is also a clear indication that, without MY intervention, the translation would come out terribly.

Sometimes this Google Translate feature is helpful: it speeds things up and makes my work more efficient. I have found, however, that if I use Google Translate to translate an entire document, the revision process afterward often becomes so cumbersome that the job turns into more work than it would have been had I simply translated the document on my own, or with OmegaT, with Google Translate at my side. Using OmegaT with Google Translate, I have access to the utility of Google’s tool while only using its results when appropriate, and thus my work does become more efficient. This becomes a sort of ménage à trois of Computer Aided Translation, Machine Aided Human Translation, and, of course, Human Translation. Or, we could just call it “Human Aided Machine Translation” (not a new term). No matter what you call it, machine translations will always, in my opinion, require human intervention. So, as I see it, Machine Translation is a useful tool. But it will never, ever take the place of professional, human translators.
Language, and the human brain, are simply too complex.


Written by tonybaldwin

March 9, 2010 at 11:05 am

March becomes "Red Cross Month", Google sets you FREE, M$ must offer browser choice in EU

Good morning, friends.

Yesterday was a looong day, and I was unable to post here. I was out interpreting in Bridgeport, CT.
On a positive note, I had a delicious lunch of picanha (Brazilian top sirloin), fried fish, okra, squash, and beans at Terra Brasilis. Yummmm…good stuff!
Today, I have a lot of catching up to do here in the office: one project in progress, two more on hold. Busy, busy, busy.

Nonetheless, I wanted to stop in and bring your attention to a few matters:

First, President Obama has declared March to be “Red Cross Month”.
In his declaration, President Obama states:

The Red Cross has continued to serve those suffering from large- and small-scale disasters. The organization is best known for its work helping communities deal with major disasters such as hurricanes, floods, and wildfires. These large-scale disasters represent a major part of the work of the American Red Cross. Just as important are the tens of thousands of small-scale disasters that occur every day in communities nationwide, and the volunteers who respond to them. These efforts include supporting our military and their families, collecting and distributing blood, helping the needy, delivering health and safety education, and providing aid abroad. [1]

see: chiledisasterrelief.com/ and redcross.org article


This is interesting: the Chile earthquake may have shifted Earth’s axis, apparently enough to have altered the length of a day by about a microsecond. Kind of creepy, really.


Next, and this is pretty cool, Google wants to make removing your data from their services as easy as falling off a log.

How many other internet companies do that? I understand removing your data from Facebook would require nothing short of a nuclear holocaust.


And last, for now, I found this amusing:
The European Commission has ruled, in antitrust proceedings, that MicroSoft® must offer users a choice for their default browser, rather than automatically include Internet Explorer®.


That’s all I have for now, me droogies…Back to work!

Written by tonybaldwin

March 2, 2010 at 6:52 am