tonybaldwin | blog

non compos mentis

Archive for the ‘hacking’ Category

Web Word Count – count the words on a website with bash, lynx, curl, wget, sed, and wc

Web Word Count: Get the word count for a list of webpages on a website.

A colleague asked what the easiest way was to get the word count for a list of pages on a website (for estimation purposes for a translation project).

This is what I came up with:

#!/bin/bash

# get word counts and generate estimated price for localization of a website
# by tony baldwin / baldwinsoftware.com
# with help from the linuxfortranslators group on yahoo!
# released according to the terms of the GNU General Public License, v. 3 or later

# collecting necessary data:
read -p "Please enter the per word rate (only numbers, like 0.12): " rate
read -p "Enter currency (letters only, EU, USD, etc.): " cur
read -p "Enter domain (do not include http://www, just, for example, somedomain.com): " url

# if we've run this script in this dir, old files will mess us up
for i in pagelist.txt wordcount.txt plist-wcount.txt; do
	if [[ -f $i ]]; then
		echo removing old $i
		rm $i
	fi
done

echo "getting pages ...  this could take a bit ... "

wget -m -q -E -R jpg,tar,gz,png,gif,mpg,mp3,iso,wav,ogg,ogv,css,zip,djvu,js,rar,mov,3gp,tiff,mng $url
find . -type f | grep html > pagelist.txt

echo "okay, counting words...yeah...we're counting words..."

# read the list line by line so file names containing spaces survive
while IFS= read -r file; do
	lynx -dump -nolist "$file" < /dev/null | wc -w >> wordcount.txt
done < pagelist.txt
paste pagelist.txt wordcount.txt > plist-wcount.txt

echo "adding up totals...almost there..."
total=0
for t in $(cat wordcount.txt); do
	total=$((total + t))
done

echo "calculating price ... "
price=`echo "$total * $rate" | bc`

echo -e "\n-------------------------------\nTOTAL WORD COUNT = $total" >> plist-wcount.txt
echo -e "at $rate, the estimated price is $cur $price
------------------------------" >> plist-wcount.txt

echo "Okay, that should just about do it!"
echo  -------------------------------
sed 's/\.\///g' plist-wcount.txt > $url.estimate.txt
rm plist-wcount.txt
cat $url.estimate.txt
echo This information is saved in $url.estimate.txt
exit

So, then I ran the script on my site, tonybaldwin.net, with a rate of US$0.12/word, and this is the final output:

-------------------------------
tonybaldwin.net/log/archives/environment/index.html 38
tonybaldwin.net/log/archives/cuisine/index.html 38
tonybaldwin.net/log/archives/music/index.html 52
tonybaldwin.net/log/archives/philosophy/index.html 38
tonybaldwin.net/log/archives/nanoblogger-help/index.html 52
tonybaldwin.net/log/archives/2011/09/11/911/index.html 322
tonybaldwin.net/log/archives/2011/09/index.html 774
tonybaldwin.net/log/archives/2011/09/01/mit_intro_to_cs_and_programming_assignment_1/index.html 494
tonybaldwin.net/log/archives/2011/08/26/come_on_irene/index.html 382
tonybaldwin.net/log/archives/2011/08/26/welcome_to_nanoblogger_3_4_2/index.html 289
tonybaldwin.net/log/archives/2011/08/26/here_we_roll_again/index.html 618
tonybaldwin.net/log/archives/2011/08/27/couldnt_stand_the_weather/index.html 93
tonybaldwin.net/log/archives/2011/08/index.html 1205
tonybaldwin.net/log/archives/2011/index.html 133
tonybaldwin.net/log/archives/technology/index.html 56
tonybaldwin.net/log/archives/politic/index.html 38
tonybaldwin.net/log/archives/religion/index.html 38
tonybaldwin.net/log/archives/art/index.html 38
tonybaldwin.net/log/archives/index.html 85
tonybaldwin.net/log/archives/personal/index.html 65
tonybaldwin.net/log/archives/health/index.html 38
tonybaldwin.net/log/articles/about/index.html 671
tonybaldwin.net/log/index.html 2027
tonybaldwin.net/log.1.html 2027
tonybaldwin.net/index.html 96
tonybaldwin.net/social.html 82

-------------------------------
TOTAL WORD COUNT = 9789
at 0.12, the estimated price is USD 1174.68
------------------------------
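
As a cross-check on a listing like the one above, the total and the price can be recomputed from the saved estimate file. This is only a minimal sketch: it assumes the two-column "path count" layout the script writes, and estimate_total is a made-up helper name, not part of the script.

```shell
# estimate_total RATE [FILE...]
# Sums the word counts from "path count" lines and prints total and price.
# Lines that are not exactly two fields (separators, the TOTAL line, paths
# containing spaces) are ignored.
estimate_total() {
    rate=$1
    shift
    awk -v rate="$rate" '
        NF == 2 && $2 ~ /^[0-9]+$/ { total += $2 }
        END { printf "TOTAL WORD COUNT = %d\nprice = %.2f\n", total, total * rate }
    ' "$@"
}
```

Called as `estimate_total 0.12 tonybaldwin.net.estimate.txt`, it should come out at 9789 words and 1174.68 for the listing above.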

Now, this works well for a simple website like tonybaldwin.net, which is almost entirely static html pages. Sites with dynamic content are going to be an entirely different story.

The comments explain what’s going on here, but I explain in greater detail here on the baldwinsoftware wiki.

Now, if you just want the wordcount for one page, try this:

#!/bin/bash

# add up wordcounts for one webpage

if [[ ! $* ]]; then
    read -p "Please enter a webpage url: " url
else
    url=$*
fi
read -p "How much do you charge per word? " rate
count=$(lynx -dump -nolist "$url" | wc -w)
price=$(echo "$count * $rate" | bc)
echo -e "$url has $count words. At $rate, the price would be US\$$price."
exit

Special thanks go out to the Linux 4 Translators list for some assistance with this script.
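
Both scripts hand whatever is typed at the rate prompt straight to bc, so a typo like "0,12" or "US$0.12" gives an error or a nonsense price. A small sketch of a check that could run before the calculation; is_valid_rate is a hypothetical helper and the accepted format (digits, optionally a dot and more digits) is my assumption of what "only numbers, like 0.12" means.

```shell
# is_valid_rate VALUE
# Succeeds only for plain decimal numbers such as 0.12 or 3.
is_valid_rate() {
    printf '%s\n' "$1" | grep -Eq '^[0-9]+(\.[0-9]+)?$'
}

# Possible use in the scripts above (bash):
#   until is_valid_rate "$rate"; do
#       read -p "Please enter the per word rate (only numbers, like 0.12): " rate
#   done
```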

Enjoy!

./tony

Written by tonybaldwin

September 20, 2011 at 10:31 pm

New Xpostulate release in the works

Okay, I just pushed new code for Xpostulate to github with the following changes:

  • removed iziblog, scribbld, inksome (spam SEO havens anyway)
  • removed twitter until I can get oauth working
  • added support for custom wordpress installations
  • added support for posting to friendika with bbcode insertions
  • changed identi.ca feature to support any status.net installation.
  • also, various pertinent alterations to gui, of course

all in ONE DAY! because I F–KING ROCK!

I have NOT updated the win/lin installers on the main Xpostulate page yet. I have to play with installjammer and get those worked up again, and will probably allow a day or two for this new code to be tested, since I now have a contributor on the project who seems willing to test and prod this code.

WELCOME ABOARD, Charles Roth!

Still to do:

  • I really, really want a button to click to automagically translate bbcode to html or vice-versa. That I can do, but need time.
  • Get oauth working for twitter…maybe
  • add support for blogger
  • change the LJ, IJ, DJ, DW support to be simple Movable Type, with multiple options, rather than hardwired for 4 different sites. That way, if you only use LJ and DW, you don’t have DJ and IJ cluttering your interface, and if you have multiple LJ accts (I do, one for my art, the other for hackery), you can use them all.
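
To give an idea of what the bbcode-to-html button would have to do, here is a rough sketch with sed. It is not Xpostulate code (Xpostulate is tcl/tk); bbcode_to_html is a made-up name, and it only covers [b], [i], [u] and [url=...] tags, which is an assumption about the most common cases.

```shell
# bbcode_to_html: filter that rewrites a few bbcode tags on stdin to html
bbcode_to_html() {
    sed -E \
        -e 's#\[(/?)(b|i|u)\]#<\1\2>#g' \
        -e 's#\[url=([^]]+)\]([^[]*)\[/url\]#<a href="\1">\2</a>#g'
}
```

For example, `echo '[b]bold[/b]' | bbcode_to_html` prints `<b>bold</b>`. Going the other way (html to bbcode) would need its own set of expressions.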

Now, I really must get back to translating these Brazilian pharma regulations.

Written by tonybaldwin

September 18, 2011 at 1:47 pm

Fren.Tcl and Frendi.Sh

Friendika

So, those who know me know that I’ve been playing on Friendika, a decentralized, federated, free/open source, privacy protecting, and, well, pretty amazing Social Networking application.

Friendika is pretty awesome in various ways, including, first, you have complete control over who can or cannot see your content.  You own your content and your privacy is completely yours to control.  Also, you can follow contacts from many other networks, including twitter, any status.net installation, Diaspora and Facebook, plus rss feeds, even, so, it becomes sort of a social networking aggregator.  Not only that, but it has friend groups similar to Diaspora Aspects or Google+ Circles.  These groups are very handy.  I follow my Diaspora and Facebook contacts, plus my identi.ca contacts, plus a large number of twitter accounts on my friendika, and have them grouped into local friends, family, haxors (fellow foss hackers, tech blogs, etc.), friends (not local, people I met online), tradus (translation colleagues, work related, polyglots), and one more group for news which includes mostly twitter feeds from a number of news outlets (Al Jazeera, BBC, NPR, Alternet, etc.).  So, it has really helped me to organize my social networking.

So, these past couple of days I, being the geek that I am, have been playing with means of posting to Friendika remotely, first from the bash cli.  Now, I had posted earlier a quick-n-dirty update type script, but I have one now that will toggle cross-posting to various other services (statusnet, twitter, facebook), and will open an editor (vim) to allow you to write longer posts.  I posted it on the wiki here, but will also include the code in this post:

#!/bin/bash

# update friendika from bash with curl
# I put this in my path as "frendi"

# here you enter your username and password
# and other relevant variables, such as whether or not
# you'd like to cross post to statusnet, twitter, or farcebork

read -p "Please enter your username: " uname
read -p "Please enter your password: " pwrd
read -p "Cross post to statusnet? (1=yes, 0=no): " snet
read -p "Cross post to twitter? (1=yes, 0=no): " twit
read -p "Cross post to Farcebork? (1=yes, 0=no): " fb
read -p "Enter the domain of your Friendika site (i.e. http://friendika.somesite.net): " url

# if you did not enter text for the update on the command line,
# the script opens vim so you can write a longer post

if [[ $* ]]; then
	ud="$*"
else
	# the date-stamped file name is an assumption; any unique name will do
	filedate=$(date +%m%d%y%H%M%S)
	vim "$filedate.fpost"
	ud=$(cat "$filedate.fpost")
fi

# and this is the curl command that sends the update to the server

if [[ $(curl -s -u "$uname:$pwrd" -d "status=$ud&statusnet_enable=$snet&twitter_enable=$twit&facebook_enable=$fb" "$url/api/statuses/update.xml" | grep error) ]]; then

# what does the server say?

	echo "Error"
else
	echo "Success!"
	echo "$ud"
fi

# this next is optional, but I made a dir in ~/Documents to keep the posts.
# You can comment it out, you can change where it is storing them (the dir path)
# or, even, if you don't want to save the posts (they will pile up), you could
# change this to simply
# rm $filedate.fpost or rm -rf *.fpost, or some such thing.

if [[ -f $filedate.fpost ]]; then
	mv "$filedate.fpost" ~/Documents/fposts
fi
exit
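
One thing to watch: the update text goes to the server as form data, so an & or = inside a post would be read as a field separator. curl can take care of that itself with --data-urlencode. The sketch below only shows what that encoding amounts to; urlencode is a hypothetical helper, not part of the script, and I have only thought it through for plain ASCII text.

```shell
# urlencode STRING
# Prints STRING with everything except letters, digits and . ~ _ -
# replaced by %XX escapes. The body runs in a subshell, so the
# variables and the locale change do not leak out.
urlencode() (
    LC_ALL=C
    s=$1
    out=
    while [ -n "$s" ]; do
        rest=${s#?}        # everything after the first character
        c=${s%"$rest"}     # the first character itself
        s=$rest
        case $c in
            [a-zA-Z0-9.~_-]) out=$out$c ;;
            *) out=$out$(printf '%%%02X' "'$c") ;;
        esac
    done
    printf '%s\n' "$out"
)
```

With that, the status field could be sent as `-d "status=$(urlencode "$ud")&..."`.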

But I have also now written a graphical application in tcl/tk to write posts to Friendika, Fren.Tcl

Fren.Tcl - tcl/tk Friendika posting application

Find me on Friendika here.

./tony

Written by tonybaldwin

September 14, 2011 at 8:27 am

FireSSH – Firefox ssh addon = AWESOME!

This, me droogies, is one of the coolest browser extensions EVER!

Editing a webpage with vim over ssh with FireSSH in Iceweasel 5.0

FireSSH is a Firefox plugin (also compatible with iceweasel, mis compañeros debianistas), written entirely in javascript, that runs an SSH terminal IN your browser, allowing you remote access to your webserver or other machine.

After using …

./tony

Written by tonybaldwin

September 6, 2011 at 7:21 pm

Image UP

Image Up

a quick-n-dirty script to copy an image (or other file) to your server. (wiki page for this script)

I basically use this to upload screenshots for display here on this wiki and my blog, etc., so have the images directory “hardwired” in the script, but this could easily be customized to choose a different directory and use with any manner of files.

#!/bin/bash

# script to upload images to my server
# by tony baldwin

if [ -z "$1" ]; then
	# If you didn't tell it which file, it asks here
	read -p "Which image, pal? " img
else
	img=$1
fi

# using scp to copy the file to the server
scp "$img" username@server_url_or_IP_address:/path/to/directory/html/images/
# you will be asked for your password, of course.  This is a function of scp, so not written into the script.

echo "Your image is now at http://www.yoursite.com/images/$img."
read -p "Would you like to open it in a browser now? (y/n): " op

if [ "$op" = "y" ]; then
	# you can replace xdg-open with your favorite browser, but this should choose your default browser, anyway.
	xdg-open "http://www.yoursite.com/images/$img"
	# if you chose yes, the browser will open the image.
	# Otherwise, it won't, but you have the url, so you can copy/paste to a browser or html document, blog entry, tweet, etc., at will.
fi

exit

This image was uploaded with the above script:

(editing website with Tcltext)

This script, of course, assumes you are in the same directory as your image file, too.
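
The reason for that: if you hand the script a path like ~/shots/desktop.png, scp still copies the file fine, but the url it prints would contain the whole local path. A small sketch of one way around it with basename; image_url is a hypothetical helper, and the address is the same placeholder used in the script.

```shell
# image_url PATH
# Prints the public url for an uploaded file, using only the file name
# part of PATH.
image_url() {
    printf 'http://www.yoursite.com/images/%s\n' "$(basename "$1")"
}
```

The echo and xdg-open lines could then use `$(image_url "$img")` instead of building the url from `$img` directly.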

Enjoy!

./tony


EDIT: What would be cool is if I could make your filemanager allow this in a right-click action. Like, I use PCManFM. If I could just right-click an image and choose this, then pop-up the url with zenity, or, perhaps, even just automatically run the xdg-open…Hmmmm…One can probably work this out with some filemanagers more easily than others.

With some work, I could rewrite the script so that it takes a clicked image and auto-opens it in the browser, and then just choose the script with “right-click > open with …”, perhaps…

Of course, I can just F4 (open dir in terminal), then bang off the script.

Written by tonybaldwin

September 4, 2011 at 1:31 pm

Get me some learnin' (MIT Open Courseware)

So, I decided, “Enough with the tinkering with the hackery… Time to start learning for real,” and started to take this course (Introduction to Computer Science and Programming) through MIT’s Open Courseware.

So far, I haven’t even “attended” the first lecture (watch a video), but skipped ahead to the first assignment, being like that, and jumped right to solving it. The assignment was to create a program, in any language, that asks the user for their last name, then first name, then prints back the first name, then last name. I did it in 6 different languages, just for fun. 😀

First python:

#!/usr/bin/env python3

# first assignment for MIT intro comp sci class.
# Problem Set 0
# Name: Tony Baldwin
# Collaborators: none
# Time: 0:30

# input() is the Python 3 name for what Python 2 called raw_input()
last = input("Please enter your last name:  ")
first = input("Please enter your first name: ")
print("Hello, " + first + " " +  last + "!")

Then bash:

#!/bin/bash

# MIT Intro to CS & Programming, assignment 1
# by tony baldwin

read -p "Please enter your last name: " last
read -p "Please enter your first name: " first
echo Hello, $first $last!

Then lisp:

#!/usr/bin/clisp

; MIT Intro to CS & Programming, assignment 1
; tony baldwin

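; read-line returns the typed text as a string; plain (read) would parse it
; as a lisp symbol and print it back in upper case
(format t "Please enter your last name: ")
(finish-output)
(let ((last (read-line)))
  (format t "Please enter your first name: ")
  (finish-output)
  (let ((first (read-line)))
    (format t "~%Hello, ~A ~A!" first last)))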

Then tcl:

#!/usr/bin/env tclsh8.5

# MIT Intro to CS & Programming, assignment 1
# by tony baldwin

puts "Please enter your last name: "
gets stdin last
puts "Please enter your first name: "
gets stdin first
puts "Hello, $first $last!"

Then perl:

#!/usr/bin/perl -w
use strict;
# MIT Intro to CS & Prog
# assignment PS0

print "Please enter your last name: ";
chomp(my $last = <STDIN>);
print "Please enter your first name: ";
chomp(my $first = <STDIN>);
print "Hello, $first $last!\n";

And now, some ruby:

#!/usr/bin/ruby

# MIT Intro to CS & Programming
# assignment ps0

puts "Please enter your last name: "
last = gets
puts "Please enter your first name: "
first = gets
puts "Hello, " + first.chomp + " " +  last.chomp + "!"

I do know that eventually I will have to attend the lectures, but this first assignment seemed rather straightforward.

I was motivated to start really learning my hackery when, a couple of nights ago, I started to look at lisp.  Something in the sparse efficiency of lisp struck me as, well, striking. Beautiful, even. Weird…

./tony

Written by tonybaldwin

September 1, 2011 at 6:18 am

Thou unmuzzled, malmsey-nosed scullian! (randomness in php)

Some time ago, I wrote fo0l and Shakes, being random Shakespearean insult generators in python, fo0l being a basic script, and Shakes being the same, dressed up with a tkinter gui.

Today, I translated fo0l to php, creating a webinterface for this lovely linguistic tool.

Try it out HERE, if thou hast the heart, thou frothy, shard-borne haggard!

What did I do?
Let’s look at fo0l, first:

#!/usr/bin/python
# Shakespearean insult generator

from random import randint

a = ("artless", "bawdy", "beslubbering", "bootless", "churlish", "cockered", "clouted", "craven", "currish", "dankish", "dissembling", "droning", "errant", "fawning", "fobbing", "froward", "frothy", "gleeking", "goatish", "gorbellied", "impertinent", "infectious", "jarring", "loggerheaded", "lumpish", "mammering", "mangled", "mewling", "paunchy", "pribbling", "puking", "puny", "qualling", "rank", "reeky", "roguish", "ruttish", "saucy", "spleeny", "spongy", "surly", "tottering", "unmuzzled", "vain", "venomed", "villainous", "warped", "wayward", "weedy", "yeasty", "cullionly", "fusty", "caluminous", "wimpled", "burly-boned", "misbegotten", "odiferous", "poisonous", "fishified", "Wart-necked") # 60 items

a1 = randint(0,59)
a2 = a[a1]

b = ("base-court", "bat-fowling", "beef-witted", "beetle-headed", "boil-brained", "clapper-clawed", "clay-brained", "common-kissing", "crook-pated", "dismal-dreaming", "dizzy-eyed", "doghearted", "dread-bolted", "earth-vexing", "elf-skinned", "fat-kidneyed", "fen-sucked", "flap-mouthed", "fly-bitten", "folly-fallen", "fool-born", "full-gorged", "guts-griping", "half-faced", "hasty-witted", "hedge-born", "hell-hated", "idle-headed", "ill-breeding", "ill-nurtured", "knotty-pated", "milk-livered", "motley-minded", "onion-eyed", "plume-plucked", "pottle-deep", "pox-marked", "reeling-ripe", "rough-hewn", "rude-growing", "rump-fed", "shard-borne", "sheep-biting", "spur-galled", "swag-bellied", "tardy-gaited", "tickle-brained", "toad-spotted", "urchin-snouted", "weather-bitten", "whoreson", "malmsey-nosed", "rampallian", "lily-livered", "scurvy-valiant", "brazen-faced", "unwash'd", "bunch-back'd", "leaden-footed", "muddy-mettled", "pigeon-liver'd", "scale-sided") # 62 items

b1 = randint(0,61)
b2 = b[b1]

c = ("apple-john", "baggage", "barnacle", "bladder", "boar-pig", "bugbear", "bum-bailey", "canker-blossom", "clack-dish", "clotpole", "coxcomb", "codpiece", "death-token", "dewberry", "flap-dragon", "flax-wench", "flirt-gill", "foot-licker", "fustilarian", "giglet", "gudgeon", "haggard", "harpy", "hedge-pig", "horn-beast", "hugger-mugger", "joithead", "lewdster", "lout", "maggot-pie", "malt-worm", "mammet", "measle", "minnow", "miscreant", "moldwarp", "mumble-news", "nut-hook", "pigeon-egg", "pignut", "puttock", "pumpion", "ratsbane", "scut", "skainsmate", "strumpet", "varlot", "vassal", "whey-face", "wagtail", "knave", "blind-worm", "popinjay", "scullian", "jolt-head", "malcontent", "devil-monk", "toad", "rascal", "Basket-Cockle") # 60 items

c1 = randint(0,59)
c2 = c[c1]

print("Thou " + a2 + ", " + b2 + " " + c2 + "!")
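
One weak spot in fo0l is that the upper bounds passed to randint have to be kept in step with the list lengths by hand. Just to illustrate the alternative, here is a minimal sketch in bash where the index comes from the number of words given. This is not the php translation; pick and insult are made-up names, the word lists are cut down to three entries each, and $RANDOM is a bash feature.

```shell
# pick WORD...
# Stores one of its arguments, chosen at random, in the variable "picked".
pick() {
    [ "$#" -gt 0 ] || return 1
    shift $(( RANDOM % $# ))    # drop a random number of leading words
    picked=$1
}

# insult: print one insult built from three short sample lists
insult() {
    pick artless bawdy churlish;            adj1=$picked
    pick beef-witted fly-bitten onion-eyed; adj2=$picked
    pick barnacle coxcomb pignut;           noun=$picked
    printf 'Thou %s, %s %s!\n' "$adj1" "$adj2" "$noun"
}
```

Adding or removing words then needs no other change, since the range follows the argument count.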

Now, how did I translate that to php?


Enjoy!

./tony

Written by tonybaldwin

May 17, 2011 at 6:44 pm