tonybaldwin | blog

non compos mentis

Archive for the ‘gnu/linux’ Category

Web Word Count


Web Word Count: Get the word count for a list of webpages on a website.

A colleague asked what the easiest way was to get the word count for a list of pages on a website (for estimation purposes for a translation project).

This is what I came up with:

#!/bin/bash
# add up wordcounts for website

total=0 # initialize variable for total

# scan through a list of pages
# strip the html elements and count the words
# append the count to wordcount.txt

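# start from an empty wordcount.txt, so a second run doesn't pile up old counts
: > wordcount.txt

while read -r i; do
     curl -s "$i" | sed -ne '{s/<[^>]*>//g;s/^[ \t]*//;p}' | wc -w >> wordcount.txt
done < pagelist.txt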

# this is for purely aesthetic purposes, 
# but we're merging the list of pages with the wordcount file:
paste pagelist.txt wordcount.txt > pagewordcount.list

# for each number in the wordcount.txt file, add it to the previous number (get a total)
for t in $(cat wordcount.txt); do 
	total=$((total + $t))
done

# append the total to the end of the merged pagelist+wordcount file:
echo "Total word count = $total" >> pagewordcount.list

# read back the file:
cat pagewordcount.list

# ciao
exit

I ssh-ed to my server and did
ls -1 *.html > pagelist.txt
which (with the domain prepended to each file name) allowed me to feed the script this list:

baldwinlinguas.com/index.html
baldwinlinguas.com/esp.html
baldwinlinguas.com/fran.html
baldwinlinguas.com/port.html
baldwinlinguas.com/empregos.html
baldwinlinguas.com/transquote.html

So, then I ran the script on this list of the pages, and this is the output:

baldwinlinguas.com/index.html 535
baldwinlinguas.com/esp.html 342
baldwinlinguas.com/fran.html 295
baldwinlinguas.com/port.html 337
baldwinlinguas.com/empregos.html 662
baldwinlinguas.com/transquote.html 244
Total word count = 2415

So, it works. Someone with better bash fu could likely find a shorter path to this result.
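
For what it's worth, here is one possible shorter path, offered as a sketch rather than the definitive version. It wraps the same curl, sed and wc steps in functions and keeps the running total inside the loop, so no intermediate wordcount.txt is needed. The function names (strip_tags, count_words, count_pages) are made up for this example. The tag stripping is still the same crude regex, so tags that span several lines are not removed, and script or style contents get counted as words.

```bash
#!/bin/bash
# strip_tags: remove anything that looks like an HTML tag from stdin
strip_tags() {
    sed -e 's/<[^>]*>//g'
}

# count_words: print the number of whitespace-separated words on stdin
count_words() {
    strip_tags | wc -w | tr -d ' '
}

# count_pages: read URLs (one per line) from the file named in $1,
# print "url <tab> count" for each, then a total
count_pages() {
    local total=0 url n
    while IFS= read -r url; do
        [ -z "$url" ] && continue
        n=$(curl -s "$url" | count_words)
        printf '%s\t%s\n' "$url" "$n"
        total=$((total + n))
    done < "$1"
    printf 'Total word count = %s\n' "$total"
}
```

Usage would be something like: count_pages pagelist.txt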

Now, this is simple, of course, for a simple website, like baldwinlinguas.com.
On the other hand, if you have some huge wordpress installation, like this blog, and have tons of public php pages, rather than html, and even more php files in the backend, you have to do a bit of sorting, I imagine.

Were I to attempt that with the baldwinsoftware wiki, I would probably just go to the Sitemap and grab that list of pages, using their URLs, of course.
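
As a rough sketch of the sitemap idea: if the site publishes an XML sitemap in the usual sitemaps.org layout, the page list can be pulled out of it directly. This assumes each page URL sits in a <loc> element on a single line. A wiki's "Sitemap" page may well be plain HTML instead, in which case the pattern would need adjusting. The function name sitemap_urls and the example.com URL are placeholders, not anything from the original post.

```bash
#!/bin/bash
# sitemap_urls: read sitemap XML on stdin, print one page URL per line.
# Assumes each <loc>...</loc> element sits on a single line.
sitemap_urls() {
    grep -o '<loc>[^<]*</loc>' | sed -e 's/<loc>//' -e 's/<\/loc>//'
}

# Example (the URL is a placeholder, not a real sitemap):
# curl -s http://example.com/sitemap.xml | sitemap_urls > pagelist.txt
```

The resulting pagelist.txt can then be fed to the word count script as before.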

./tony

Written by tonybaldwin

September 21, 2011 at 5:25 am

search google, wikipedia, reverso from the bash terminal


 

searching in bash


Okay, so, I like to use my bash terminal. Call me a geek all you like; it matters not to me. I wear that badge with pride.

The bash terminal is quick and efficient for doing a lot of stuff that one might otherwise use some bloated, cpu sucking, eye-candied, gui monstrosity to do. So, when I find ways to use it for more stuff, more stuff I do with it.

Now, for my work (recall, I am professionally a translator) I must often do research, some of which entails heavy lifting, and, otherwise, often simply searching for word definitions and translations. I use TclDict, which I wrote, frequently, but, I also use a lot of online resources that I never programmed TclDict to access, and would generally use a browser for that stuff. Unless, of course, I can do it in my terminal!

For precisely such purposes, here are a couple of handy scripts I use while working.

First, let’s look up terms at Dict.org:

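#!/bin/bash
# look up a term at dict.org

# use the command line arguments as the search term, or ask for one
if [[ -n "$*" ]]; then
searchterm="$*"
else
read -p "Enter your search term: " searchterm
fi
read -p "choose database (enter 'list' to list all): " db

# show the available databases on request, then ask again
if [ "$db" = list ] ; then
curl dict://dict.org/show:db
read -p "choose database, again: " db
fi

curl "dict://dict.org/d:$searchterm:$db" | less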
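
If the prompts get tiresome, the URL can be built from arguments instead. The following is only a sketch: dict_url is a made-up helper name, and it assumes that leaving the database part off the URL is acceptable to curl and the server (as far as I know, the lookup then falls back to a default database).

```bash
#!/bin/bash
# dict_url: build a dict:// URL for a term and an optional database.
# With no database given, the database part is simply left off.
dict_url() {
    local term="$1" db="${2:-}"
    if [ -n "$db" ]; then
        printf 'dict://dict.org/d:%s:%s\n' "$term" "$db"
    else
        printf 'dict://dict.org/d:%s\n' "$term"
    fi
}

# Example: look up a term in one database without any prompts
# (use a database name that "curl dict://dict.org/show:db" actually lists)
# curl -s "$(dict_url bash jargon)" | less
```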

 

 

Now, let’s search google from the command line:

#!/bin/bash
if [[ -n "$*" ]]; then
searchterm="$*"
else
read -p "Enter your search term: " searchterm
fi
# quote the URL, and turn spaces into + so a multi-word search survives the trip
lynx -accept_all_cookies "http://www.google.com/search?q=${searchterm// /+}"
# I accept all cookies to go direct to search results without having to approve each cookie.
# you can disable that, of course.

 

I saved that in ~/bin/goose # for GOOgle SEarch
and just do
goose $searchterm
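
Search terms with characters like & or = in them can confuse the query string. A minimal sketch of percent-encoding the term first, in plain shell, follows. The function name urlencode is my own, and I have only considered plain ASCII input here; accented characters would need more care.

```bash
#!/bin/bash
# urlencode: percent-encode a string for use in a URL query.
# Letters, digits and . ~ _ - pass through; every other character becomes %XX.
# Only considered for plain ASCII input.
urlencode() {
    local s="$1" out="" c
    while [ -n "$s" ]; do
        c=${s%"${s#?}"}   # first character of what is left
        s=${s#?}          # drop that character
        case $c in
            [a-zA-Z0-9.~_-]) out=$out$c ;;
            *) out=$out$(printf '%%%02X' "'$c") ;;
        esac
    done
    printf '%s\n' "$out"
}

# Example:
# lynx -accept_all_cookies "http://www.google.com/search?q=$(urlencode "$searchterm")"
```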

Or, search the google dictionary to translate a term:

#!/bin/bash
echo -e "Search google dictionary.\n"
read -p "Source language (two letters): " slang
read -p "Target language (two letters): " tlang
read -p "Search term: " sterm
lynx -dump "http://www.google.com/dictionary?langpair=$slang|$tlang&q=$sterm" | less

Note: For a monolingual search, just use the same language for source and target. Don’t leave either blank.

Or:

#!/bin/bash
if [ $# -lt 3 ];
then
echo -e "usage requires 3 parameters: source language, target language, search term. \n
Thus, I have this as ~/bin/gd, and do \n
gd en es cows \n
to translate \"cows\" to Spanish. \n
For monolingual search, enter the language twice. \n
As indicated, use the two letter code: \n
\"en\" for English, \"fr\" for French, etc."
exit
fi

lynx -dump "http://www.google.com/dictionary?langpair=$1|$2&q=$3" | less

For the above, I have it in ~/bin/gd, usage being simply “gd $sourcelanguage $targetlanguage $searchterm”.
Example:
me@machine:~$ gd en es cow
Searches the English to Spanish dictionary for “cow”.

We can use similar principles to search reverso:

#!/bin/bash
#search reverso
read -p "Enter the source language: " slang
read -p "Enter target language: " tlang
read -p "Enter your search term: " searchterm
lynx -dump "dictionary.reverso.net/$slang-$tlang/$searchterm" | less

With the google dictionary, you use the two-letter language code (i.e., “en” for English, “fr” for French, etc.). With reverso, you have to spell out the language (“english” for English, etc.).
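
To use the same two-letter habit with both scripts, a small lookup could translate the code into the spelled-out name reverso wants. This is a sketch with a hypothetical helper, reverso_lang, and only a handful of languages filled in as examples; anything it does not recognize is passed through unchanged.

```bash
#!/bin/bash
# reverso_lang: map a two-letter code to the spelled-out language name.
# The list is illustrative, not complete.
reverso_lang() {
    case "$1" in
        en) echo english ;;
        fr) echo french ;;
        es) echo spanish ;;
        pt) echo portuguese ;;
        *)  echo "$1" ;;   # pass anything else through unchanged
    esac
}

# Example:
# lynx -dump "dictionary.reverso.net/$(reverso_lang en)-$(reverso_lang fr)/cow" | less
```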

With all of the above, I’ve used the program, less, to display the results, rather than spitting it all out to the terminal at once. See man less to learn how to use less, if needed.

Additionally, most of the above require Lynx Browser, which is generally available for any gnu/linux distribution via your favorite package manager (apt, synaptic, aptitude, yum, portage, pacman, etc.). For the dict.org script, I used cURL (also part of most gnu/linux distributions and installable with your favorite package manager).

Google Translate can also be accessed, but for this, we’ll use a bit of python magic (I know, I pick on google translate, a lot, but it can be useful):

#!/usr/bin/env python3
import sys
from urllib.parse import urlencode
from urllib.request import urlopen

# The google translate API can be found here:
# http://code.google.com/apis/ajaxlanguage/documentation/#Examples

# arguments: source language, target language, then the text to translate
lang1 = sys.argv[1]
lang2 = sys.argv[2]
langpair = '%s|%s' % (lang1, lang2)
text = ' '.join(sys.argv[3:])
base_url = 'http://ajax.googleapis.com/ajax/services/language/translate?'
params = urlencode((('v', 1.0),
                    ('q', text),
                    ('langpair', langpair),))
url = base_url + params
content = urlopen(url).read().decode('utf-8')
# cut the translated text out of the reply by hand
start_idx = content.find('"translatedText":"') + 18
translation = content[start_idx:]
end_idx = translation.find('"}, "')
translation = translation[:end_idx]
print(translation)

Originally found that here, on the ubuntuforums.

And now for Wikipedia we have a couple of options.
First, we have this awesome little handy script, tucked into my $PATH as “define”:

#!/bin/bash
dig +short txt $1.wp.dg.cx
exit

I use it simply with “define $searchterm”, and it gives a short definition from wikipedia.  I originally found it here.

Another extremely handy tool is Wikipedia2Text, which I simply installed from the debian repos via aptitude. When I use this, I also pipe it to less:

#!/bin/bash

# use the command line arguments as the search term, or ask for one
if [[ -n "$*" ]]; then
searchterm="$*"
else
read -p "Enter your search term: " searchterm
fi

wikipedia2text $searchterm | less

I have that tucked into ~/bin/wikit, thus, do simply wikit $searchterm to get my results.

Enjoy!

All code here that I have written is free and released according to the GPL v. 3. Check the links for code I borrowed for licensing information (pretty sure it’s all GPL-ed, too).

./tony

Written by tonybaldwin

May 3, 2011 at 12:52 am

Oggify – convert all your wav and mp3 to ogg


A script to convert .mp3 files to .ogg

Requires mpg123 and oggenc, uses perl rename, but I can make one with the old rename (rename.ul now in ubuntu and debian).

Why should we use ogg?

cd into the dir full of tunes, and:


#!/bin/bash

# convert mp3 and wav to ogg
# tony baldwin http://www.BaldwinSoftware.com
# cleaning up file names

echo cleaning file names...

rename 's/ /_/g' *
rename y/A-Z/a-z/ *
rename 's/_-_/-/g' *
rename 's/\,//g' *

# converting all mp3 files to wav,
#so there will be nothing but wavs

echo Converting mp3 to wav...

for i in *.mp3
do
# skip the loop body if there were no mp3 files to match
[ -e "$i" ] || continue
mpg123 -w "$i.wav" "$i"
done

# and, now, converting those wav files to ogg

echo Converting .wav to .ogg

for i in *.wav
do
[ -e "$i" ] || continue
oggenc "$i"
done

# Clean up, clean up, everybody everywhere
# Clean up, clean up, everybody do your share...

# cleaning some file names
# removing ".mp3" from $filename.mp3.ogg
# for result of $filename.ogg

rename 's/\.mp3//g' *.ogg

# removing all those big, fat wav files.

echo "Cleaning up after ourselves..."

rm -f *.wav
rm -f *.mp3

echo -e "Your files are ready, friend.\nHappy listening!"

exit

# This program was written by tony baldwin - tony @ baldwinsoftware.com
# This program is free software; you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
# the Free Software Foundation; either version 2 of the License, or
# (at your option) any later version.
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
# GNU General Public License for more details.
# You should have received a copy of the GNU General Public License
# along with this program; if not, write to the Free Software
# Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA.

wiki page for this script.

I have a version with a little gui-ness with zenity, if anyone wants it. (i.e. with a graphical user interface)
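
On the perl rename versus old rename question: the file name tidying can also be done without either flavour, using tr and sed. What follows is a sketch under that assumption, with a made-up function name (clean_name). It applies the same four changes as the rename lines in the script: spaces to underscores, lower case, "_-_" to "-", and commas removed.

```bash
#!/bin/bash
# clean_name: print a tidied version of the file name given in $1
clean_name() {
    printf '%s\n' "$1" |
        tr 'A-Z ' 'a-z_' |                  # lower case, spaces to underscores
        sed -e 's/_-_/-/g' -e 's/,//g'      # "_-_" to "-", drop commas
}

# Example: rename every file in the current directory
# for f in *; do
#     new=$(clean_name "$f")
#     [ "$f" = "$new" ] || mv -- "$f" "$new"
# done
```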

./tony

Written by tonybaldwin

February 22, 2011 at 11:25 am

Find IPs – a script to find all nodes on the LAN


This will find all computers on the local area network:

#!/bin/bash

# find all nodes on this intranet

# create report file with date
fd=$(date +%m%d%y%H%M%S)
echo -e "IP$fd\nReport of IP addresses on this intranet, test started at \n$(date)\n\nThe following IP addresses were found:" > IP$fd.txt
echo -e " Okay. Mind you, this could take a couple of minutes.\nI'll be scanning all 254 possibilities between 192.168.1.1 and 192.168.1.254\nI will ring the system bell when I am done.\nHere we go..."
for i in 192.168.1.{1..254}
do
echo "scanning ... ... ..."
if ping -c1 -w1 $i &>/dev/null
then
echo -e "AHA! Got one! ---- $i is up!"
echo -e $i >> IP$fd.txt
fi
done
echo -e "That's all I got.\nTest completed at\n$(date)\n" >> IP$fd.txt
echo -e \\a
echo -e "Your report is IP$fd.txt, and this is what it says:\n"
cat IP$fd.txt
exit

Starting looks like this:

finding IP address on the LAN

In the end, it gives you a report that looks like this:

IP022211114605
Report of IP addresses on this intranet, test started at
Tue Feb 22 11:46:05 EST 2011
The following IP addresses were found:
192.168.1.1
192.168.1.100
192.168.1.101

That’s all I got.
Test completed at
Tue Feb 22 11:49:35 EST 2011

Like this:

FindIPS output

IP addresses found!

chmod that baby, pop her in ~/bin.
done.
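
The scan takes a while because each address is pinged in turn. A sketch of a quicker variant follows, which fires the pings off in the background and waits for them all. The names prefix_of and sweep are invented for this example, and it assumes the same kind of /24 network (254 hosts under one prefix) and the same Linux ping options as the script above.

```bash
#!/bin/bash
# prefix_of: strip the last octet from a dotted IPv4 address
prefix_of() {
    printf '%s\n' "${1%.*}"
}

# sweep: ping prefix.1 through prefix.254 in parallel,
# printing each address that answers
sweep() {
    local prefix="$1" i=1
    while [ "$i" -le 254 ]; do
        ( ping -c1 -w1 "$prefix.$i" >/dev/null 2>&1 && echo "$prefix.$i" ) &
        i=$((i + 1))
    done
    wait   # let every background ping finish before returning
}

# Example (answers arrive in no particular order, hence the sort):
# sweep "$(prefix_of 192.168.1.100)" | sort -t . -k 4 -n
```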

public wiki page for this script

./tony

Written by tonybaldwin

February 22, 2011 at 10:00 am

Adventures with Debian Lenny on AMD64


LibreOffice in AMD64 Debian

LibreOffice in AMD64 Debian/Lenny, in OpenBox

Well, folks, I figured it was time for a computer upgrade, since, I was, until yesterday, still working on a 3.2ghz Celeron CPU with 1.5gb of ram that I purchased almost 4 years ago, now, from TigerDirect.com.  Frankly, that machine was still doing a great job on 95% of stuff I do. I can’t lie. But, being the geek that I am, I felt the need for more speed, and a whole lot of putrid green envy over newer, shinier things.  So, I went to my favorite source, again, TigerDirect, and ordered a 2.5ghz dual core, AMD64 barebones kit with 4gb of ram.

d00d…I have not been able to ramp up the CPU usage past like 18%, nor use more than about 20% of the ram. I know there are bigger, faster machines out there, but this is clearly plenty of machine for my needs.

Anyway, this machine is an AMD 64 bit CPU.  I won’t pretend, for even a second, to comprehend what the difference is between 64 bit and 32 bit computer, beyond that 64 is twice 32, and, in some cases, it means I need different software.

So, I got the machine in two boxes, with all these separate parts, a motherboard, a cpu, 2 ram sticks, a hard drive, and I had to put them together.  I yanked the video card from my old machine and used it, too, since it has dvi, and this mobo in this kit did not (works best for my monitor).  And, then, being completely ignorant, as I am, I installed from the same Debian/Lenny XFCE installation disk that I had used for every other machine in my office.  It installed okay, but, for the life of me, I could not get DHCP to work so I could connect to the internet.  So I got on #debian at freenode with my old computer (now using an older monitor, and the onboard mobo video) and raised my hand and after several rounds of the real hackers in there massaging me with the Socratic method, it became clear that I had simply installed the wrong system, and that I needed the (duh!) AMD64 version of Lenny.  So, I downloaded that ISO file, burnt up a CDROM, and installed it.  And that worked nicely, and the DHCP works fine, and I’m actually writing to you from this new machine.  I like it.  But I had a few other adventures between then and now, and thought I’d share them.

One of the first things we all do is tweak up our browsers and import our bookmarks and all the good stuff.  Well, I prefer google chrome, which makes a lot of that really, really easy, since you can synch all that stuff right online, so, of course, I installed google-chrome, which I’m using now, and, surprise, I like it.  But, flash was not working…which made me sad.  I did a lot of googling around and digging around and trying to figure out why, and, along the way even tried iceweasel (debian for “firefox”) and found that flash wasn’t working there, either.   Eventually I found Adobe Flash Square, the new 64 bit, beta, prerelease, Flash plugin.  Now, getting that to work with iceweasel/firefox was simple enough; Simply a matter of downloading the tarball, unpacking that, and copying the libflashplayer.so to /usr/lib/mozilla/plugins.  Did that, and I could watch a youtube video in iceweasel, no problem.  But there is no plugins directory for google-chrome.  More googling about revealed that the browser, supposedly, has Flash support built right in.  Sure.  But it wasn’t working… A bit more googling about and I learned that I could see more information about my plugins in google chrome by pointing my browser to about:plugins, and, so, I did, and I learned that flash was, in fact, not really built into chrome, but that chrome was looking in /usr/lib/swfdec-mozilla/ for nswrapper_32_64.libflashplayer.so, which was, indeed, there, but not working.  I learned that this nswrapper nonsense was an older trick, which, apparently, is not compatible with the NEW Flash 10, or something, because, well, I figured that out because it wasn’t working…But the Flash Square libflashplayer.so WAS working…So, this is what I did:  I removed the nswrapper_32_64.libflashplayer.so, just yanked it right out of /usr/lib/swfdec-mozilla/.  
Then, I made a link in that directory to the libflashplayer.so in /usr/lib/mozilla/plugins, and renamed it to nswrapper_32_64.libflashplayer.so (since that’s what chrome would be looking for), and restarted chrome, and, guess what.  IT WORKED!

cd /usr/lib/swfdec-mozilla/
su
********
rm -f nswrapper_32_64.libflashplayer.so
ln -s /usr/lib/mozilla/plugins/libflashplayer.so /usr/lib/swfdec-mozilla/
mv libflashplayer.so nswrapper_32_64.libflashplayer.so

That did it.
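
The remove, link and rename steps can be folded into one, since ln can create the link under its final name and replace whatever file is already there. A small sketch, with link_as being a hypothetical helper name:

```bash
#!/bin/bash
# link_as: make LINK a symlink to TARGET, replacing any file already at LINK
link_as() {
    local target="$1" link="$2"
    ln -sf "$target" "$link"
}

# Example, using the paths from above (run as root):
# link_as /usr/lib/mozilla/plugins/libflashplayer.so \
#         /usr/lib/swfdec-mozilla/nswrapper_32_64.libflashplayer.so
```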

Now, that’s not the only issue I’ve had.  I installed LibreOffice, for which there are 64 bit .deb files.  But when I tried to run it, it kept puking and giving up.  I looked at the error it was giving.  I neglected to write it down, but, it comes down to the fact that it could not find libcairo.so.2.  I dug around for that, and found it right in (big surprise) /usr/lib… so, what was the problem?  I looked again at the error.  For some reason unknown to me or anyone with whom I’ve communicated, libreoffice was looking for libcairo.so.2 in /opt/libreoffice/basis3.3/program, rather than in /usr/lib, where any normal program would expect to find such a lib.  That was easy.  I just copied the lib right in there (I don’t know, I probably could have symlinked that, too…didn’t even try that…).

cp /usr/lib/libcairo.so.2 /opt/libreoffice/basis3.3/program/

And that was that. LibreOffice is now working superbly.

There have been a few other tweaks and adjustments along the way, but, at this juncture, I’ve got a blazing fast system (debian with openbox is light and efficient) on which to work.  Good stuff.  I feel like I got a great deal from TigerDirect, and, as always the best deal ever with Debian gnu/linux.  The guys on #debian at freenode were extremely patient and graciously helpful, and I owe them a great debt of gratitude.  Debian ROCKS! both the software and the community.


On a side note, stealing my old machine’s video card, and the 24 inch 1680×1050 monitor, seemed to have pissed it off, because, with the onboard video and 15 inch 1280×960 monitor (both of which it HAD used before I purchased the big screen) all text in gui windows, menus, panels, etc. was

HUGE

too huge to read, even, where windows were expanded off screen and even alt-click grabbing them and moving them around was useless, and menus were unreadable, and, basically, all graphical elements were rendered useless, regardless of whether I used openbox, lxde, xfce, or wmii…I wrestled with that for hours…running Xorg -configure, dpkg-reconfigure xserver-xorg, xrandr, changing screen resolution, refresh rates…sacrificing chickens, and my best goat…blah blah blah..all to no avail.

Eventually, I figured out that the problem had nothing to do with Xorg, screen resolution, or what type of sacrifice I offered…The culprit was gdm (still don’t know why).  I figured that out because, if I logged into single user as root and started X without the assistance of gdm, everything was perfect. So that seemed, correctly, to identify gdm as the culprit.  Still, I could not figure out how to fix gdm, so, I just yanked gdm right off the machine (aptitude remove gdm).  Using startx from the command line kept starting LXDE on that machine, however, which displeased me since, well, I prefer plain old openbox, without LXDE, and, besides, I also have XFCE and WMII on that machine, and would prefer to have a choice when logging in.  So, for that, I just had to copy a .xinitrc into my /home. I removed the entry for LXDE (never use it), and added entries for openbox, xfce and wmii.  Now, no DM…I log in via the console, then startx, which tosses up a dialog and asks which of the 3 window managers I want to use, and, good to go.  It works nicely.
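
The .xinitrc itself is not shown, so the following is only a guess at the general shape of the choosing part, not the actual file. It maps a short name to the command that starts that session. The helper name wm_command and the choice of openbox as the fallback are assumptions made for the example.

```bash
#!/bin/bash
# wm_command: map a short name to the command that starts that session
wm_command() {
    case "$1" in
        xfce) echo startxfce4 ;;
        wmii) echo wmii ;;
        *)    echo openbox-session ;;   # openbox is the fallback
    esac
}

# In ~/.xinitrc this might end with something like:
# exec $(wm_command "$choice")
# where $choice comes from whatever dialog or prompt you prefer.
```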


One more really cool thing. I back up my /home regularly onto an external usb hdd (at least monthly, and I did it before building the new machine and stealing the video card from the old one; I do :~# rsync -rvu /home/tony /media/disk/home), so, I was able to simply copy my whole home directory onto the new machine, and, this being the very same system (well, except being amd64 instead of 32-bit Intel, or whatever), all the programs have the same configs, saved passwords, blah blah blah that they had before. That makes moving to a new computer so much easier. Since I run rsync as root, I do have to chown everything (chown -R tony:tony) again after moving it back, but that’s no big deal.

Written by tonybaldwin

January 27, 2011 at 2:08 pm

DjVu: Free alternative to PDF


djview4

this article, as .djvu in djview4

First, a bit of ranting about open standards and free file formats:
Okay, you know I’m always harping about using Open Document Formats.
So, on the LibreOffice user list today there was discussion of a viable Free/Open alternative to .pdf files. After all, PDF is, indeed, a proprietary format, owned by Adobe, and it is ubiquitous, and there really should (must, perhaps), be a free, open alternative. As such, someone on the list mentioned DjVu, which, frankly, I’d never looked at before (I had heard of it, but knew not what it was). It’s a free/open file format that was initially created for scanned documents, from what I gather, and has been around since the late 90s, still maintained by the original authors, and is now used for all kinds of gro0vy stuff.
I did a bit of research, googling, apt-cache searching, and poking around. Eventually, I aptitude installed djview4 and djvulibre and experimented a little. I have drawn the conclusion that, yes, in my opinion, DjVu would be an excellent candidate to be used as, in fact, a better option for many reasons, for the purposes .pdf currently serves (a portable document format that preserves formatting, essentially). Works great.

But there IS a rather glaring drawback…
The one big drawback is, conversion tools are lacking.
One can not, for instance, simply write a DjVu file in any kind of document editor, as you can write a pdf with many different editors, web browsers, most office software, LaTeX editors, and basic text editors, such as tcltext, and, frankly, even in a command line interface.
But to create DjVu, you can only convert other files to DjVu.
Then, in general, and this is what most irritates me, it seems you have to convert from non-free formats. There are no tools, for instance, to convert directly from plain text, LaTeX (.tex), .odf (.odt), .png, or even html files to a .djvu file. What’s worse, is that all of your Free and/or open source browsers, document editors, etc., will export or print a file to .pdf, but not to .djvu. OpenOffice.org will write a .pdf. LibreOffice, and Abiword will write a .pdf. LaTeX editors will write a .pdf….Everybody will write a .pdf, but nobody has written code to write a file directly to .djvu. In my opinion, that needs changing. We need to use open standards and free/open file formats (all kinds of reasons for that discussed in this entry to this blog).

That said, today I wrote a script to convert a plain text file to DjVu (but, yes, I had to round-trip it through .pdf, darn it).
This script was written on a Debian/Stable (lenny at the time of this writing) system, on AMD64 arch, using all tools available in the lenny repos.
It requires (obvious when you read the script) enscript, ps2pdf, and pdf2djvu (which builds on djvulibre).
The script first converts your text file to postscript with enscript, then from postscript to pdf, with, surprise, ps2pdf, and, then, the final step of converting to .djvu.

The script looks like this:
#!/bin/bash

# insist on exactly one file name
if [[ $# -eq 1 ]]; then
text="$1"
else
echo "try again, and include a file name, and ONLY 1 file name at a time. Thank you." && exit
fi

echo converting "$text" to "$text.ps"

enscript "$text" -q -B -p "$text.ps"

echo converting "$text.ps" to "$text.pdf"

ps2pdf "$text.ps" "$text.pdf"

echo converting "$text.pdf" to "$text.djvu"

pdf2djvu "$text.pdf" -o "$text.djvu"

echo renaming ...

rename.ul .txt.djvu .djvu "$text.djvu"

echo cleaning up ...

rm "$text.ps" "$text.pdf"

echo done

exit

I actually turned the script on itself, and created a DjVu file of this text, available here.
With this, I may very well add the capacity to export a .djvu file to tcltext. Why not? It’s just a shame, imho, that such an export is not direct, without having to cross into proprietary territory via .pdf, in order to be accomplished.
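
As a sketch of how the same chain might look as a reusable function: the variant below works out the final name up front, so no rename step is needed, and keeps the intermediate files in a temporary directory. The names djvu_name and txt2djvu are invented here, and it assumes the same three tools (enscript, ps2pdf, pdf2djvu) are installed.

```bash
#!/bin/bash
# djvu_name: turn foo.txt into foo.djvu; other names just get .djvu appended
djvu_name() {
    case "$1" in
        *.txt) printf '%s\n' "${1%.txt}.djvu" ;;
        *)     printf '%s\n' "$1.djvu" ;;
    esac
}

# txt2djvu: text -> postscript -> pdf -> djvu, intermediates in a temp dir
txt2djvu() {
    local src="$1" out tmp rc
    out=$(djvu_name "$src")
    tmp=$(mktemp -d) || return 1
    enscript "$src" -q -B -p "$tmp/doc.ps" &&
        ps2pdf "$tmp/doc.ps" "$tmp/doc.pdf" &&
        pdf2djvu "$tmp/doc.pdf" -o "$out"
    rc=$?               # remember whether the chain succeeded
    rm -rf "$tmp"       # clean up either way
    return $rc
}

# Example: txt2djvu article.txt    (would write article.djvu)
```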

Also, as a gift to my fellow freedom fighters, foss hackers, and open standards supporters, I have created a DjVu of my poetry here which contains all the poems published in my recent book (but not the paintings and photographs).

And, this full article in djvu format here.  This last was fun, because I ended up having to change the text encoding first.  Apparently enscript doesn’t like utf8. I had copy/pasted the article into tcltext, which generates utf8 here (system default).  I made a .djvu that had all these weird character substitutions (like /200a#blahblah for a quotation mark?).  Here’s how to handle the conversion.

iconv -f utf8 --to-code=ascii//TRANSLIT yourfile > newfile

Now, if you use firefox or some other mozilla derivative, there’s actually a plugin for viewing such files in your browser, included in the djvulibre packages.  Otherwise, you’ll need a djvu viewer, such as djview or evince.

Anyway,
Enjoy.

./tony


Este artículo en español: http://www.gnewbook.org/pg/blog/tonybaldwin/read/83736/djvu-excelente-substituto-al-formato-nolibre-de-pdf-pero
Esse artigo em português: http://softwarelivre.org/tonybaldwin/blog/djvu-otimo-substituto-ao-formato-nao-livre-de-pdf-mas…

Written by tonybaldwin

January 27, 2011 at 2:06 pm

mv screenshots


Screenshot pictures tend to pile up in my /home, so I wrote this:

#!/bin/bash

# move pix or die, damn it

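# collect the jpg file names in an array; nullglob leaves it empty if there are none
shopt -s nullglob
pix=(*.jpg)

if [ ${#pix[@]} -eq 0 ]
then
echo no pix d00d
exit
else
echo THESE
printf '%s\n' "${pix[@]}"
echo are ALL pix, d00d.
for i in "${pix[@]}"
do
mv "$i" pix/screenshots
echo I just moved "$i" to the screenshots dir
done
echo all done d00d
fi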
exit
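
A slightly more general sketch of the same idea, for anyone whose screenshots land somewhere else: the source and destination directories become arguments, and the destination is created if it does not exist yet. The function name move_shots is made up for this example, and it sticks to .jpg files like the script above.

```bash
#!/bin/bash
# move_shots: move every .jpg in directory $1 into directory $2,
# then print how many files were moved
move_shots() {
    local src="$1" dest="$2" f moved=0
    mkdir -p "$dest"
    for f in "$src"/*.jpg; do
        [ -e "$f" ] || continue          # no matches: the pattern stays literal
        mv -- "$f" "$dest"/ && moved=$((moved + 1))
    done
    echo "$moved"
}

# Example: move_shots "$HOME" "$HOME/pix/screenshots"
```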

Written by tonybaldwin

January 27, 2011 at 1:54 pm