tonybaldwin | blog

non compos mentis

DjVu: Free alternative to PDF (and a script to convert plain text to DjVu)

this article, as .djvu in djview4

First, a bit of ranting about open standards and free file formats:
Okay, you know I’m always harping about using Open Document Formats.
So, on the LibreOffice user list today there was discussion of a viable Free/Open alternative to .pdf files. After all, PDF is, indeed, a proprietary format, owned by Adobe, and it is ubiquitous, and there really should (must, perhaps) be a free, open alternative. As such, someone on the list mentioned DjVu, which, frankly, I’d never looked at before (I had heard of it, but knew not what it was). It’s a free/open file format that was initially created for scanned documents, from what I gather, and has been around since the late 1990s, still maintained by the original authors, and is now used for all kinds of groovy stuff.
I did a bit of research: googling, apt-cache searching, and poking around. Eventually, I aptitude installed djview4 and djvulibre and experimented a little. I have drawn the conclusion that, yes, in my opinion, DjVu would be an excellent candidate, in fact a better option for many reasons, for the purposes .pdf currently serves (a portable document format that preserves formatting, essentially). Works great.

But there IS a rather glaring drawback…
The one big drawback is, conversion tools are lacking.
One cannot, for instance, simply write a DjVu file in any kind of document editor, the way you can write a pdf with many different editors, web browsers, most office software, LaTeX editors, basic text editors such as tcltext, and, frankly, even from the command line.
But to create DjVu, you can only convert other files to DjVu.
Then, in general, and this is what most irritates me, it seems you have to convert from non-free formats. There are no tools, for instance, to convert directly from plain text, LaTeX (.tex), ODF (.odt), .png, or even html files to a .djvu file. What’s worse is that all of your Free and/or open source browsers, document editors, etc., will export or print a file to .pdf, but not to .djvu. OpenOffice.org will write a .pdf. LibreOffice and Abiword will write a .pdf. LaTeX editors will write a .pdf… Everybody will write a .pdf, but nobody has written code to write a file directly to .djvu. In my opinion, that needs changing. We need to use open standards and free/open file formats (all kinds of reasons for that discussed in this entry to this blog).

That said, today I wrote a script to convert a plain text file to DjVu (but, yes, I had to round-trip it through .pdf, darn it).
This script was written on a Debian/Stable (lenny at the time of this writing) system, on AMD64 arch, using all tools available in the lenny repos.
It requires (obvious when you read the script) enscript, ps2pdf, and pdf2djvu (packaged separately, but built on djvulibre), plus file and iconv for the encoding check.
The script first converts your text file to postscript with enscript, then from postscript to pdf, with, surprise, ps2pdf, and, then, the final step of converting to .djvu.
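
Since the script assumes all of these tools are present, a quick check before running it can save a confusing failure halfway through. What follows is only a minimal sketch: the function name missing_tools is mine, not part of any package, and the tool list is simply what the script below calls.

```shell
#!/bin/bash
# Print the name of every tool in the argument list that is not on PATH.
# (missing_tools is a made-up helper name, used here for illustration only.)
missing_tools() {
    local tool
    for tool in "$@"; do
        # command -v succeeds only if the shell can find the tool
        command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
    done
    return 0
}

# the external commands used by the conversion script
missing_tools file iconv enscript ps2pdf pdf2djvu
```

If this prints nothing, everything the script needs is on your PATH.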

The script looks like this:
#!/bin/bash

# Converting a text file to a DjVu file
# copyright © tony baldwin / tony@baldwinsoftware.com
# released under the terms of the GNU General Public License, v. 3 or later

# first, make sure you named a file. duh.

if [[ $# -gt 0 ]]; then
    text="$1"
else
    # complain on stderr and exit with a non-zero status
    echo "try again, and include the file name..hello!" >&2
    exit 1
fi

# okay, enscript likes ASCII best, so let's test our file encoding
# if we have anything other than ASCII, we will convert with iconv
# (careful: the converted text replaces the original file)

enc="$(file --brief --mime-encoding "$text")"
echo "This file is encoded as $enc"

if [ "$enc" != us-ascii ] ; then
    echo We need to convert to ascii first.
    echo Converting text encoding now ...
    tmp="$text.ascii.tmp"
    # only overwrite the original if iconv actually succeeded
    if ! iconv -f "$enc" --to-code=ascii//TRANSLIT "$text" > "$tmp" ; then
        echo "iconv could not convert $text, giving up." >&2
        rm -f "$tmp"
        exit 1
    fi
    mv "$tmp" "$text"
    newenc="$(file --brief --mime-encoding "$text")"
    echo "Ok, now we have $newenc encoding and can proceed with conversion to djvu ..."
fi

# from here, things are fairly self-explanatory

# here, we are using the variable $text, which is $filename.txt, and changing it to $filename
# so the result is written straight to $filename.djvu and no rename step is needed

ntx=${text%.*}

echo converting "$text" to "$text.ps"

enscript -q -B -p "$text.ps" "$text" || exit 1

echo converting "$text.ps" to "$text.pdf"

# name the output explicitly, so the pdf lands next to the source file

ps2pdf "$text.ps" "$text.pdf" || exit 1

echo converting "$text.pdf" to "$ntx.djvu"

pdf2djvu -o "$ntx.djvu" "$text.pdf" || exit 1

echo cleaning up ...

rm "$text.ps" "$text.pdf"

echo all done

# open the resulting file in djview4

djview4 "$ntx.djvu" &

exit

# This program was written by anthony baldwin - tony@baldwinsoftware.com
# This program is free software; you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
# the Free Software Foundation; either version 2 of the License, or
# (at your option) any later version.
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
# GNU General Public License for more details.
# You should have received a copy of the GNU General Public License
# along with this program; if not, write to the Free Software
# Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA.

What’s it doing? The first thing the script does is check the encoding of the file in question. Enscript seems to play nice with ASCII, but not utf8 or some other encodings, so we’re converting to ASCII before doing anything else. Then, the script converts your text file to postscript with enscript, then from postscript to pdf, with, surprise, ps2pdf, and, then, the final step of converting to .djvu. At the end, the script cleans up the directory, removing the .ps and .pdf files. Then, it opens your file in djview4. I have commented the script accordingly.
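
One detail worth spelling out is how the name of the output file is worked out: the script strips the last extension from the input name and appends .djvu. A plain ${text%.*} can trip over dots in directory names, so here is a slightly more careful sketch. The function name djvu_name is hypothetical, just for illustration.

```shell
#!/bin/bash
# Turn an input path into the matching .djvu path.
# Only the file name part is examined, so dots in directory names are left alone.
djvu_name() {
    local path base dir
    path=$1
    base=${path##*/}        # file name without the directory
    dir=${path%"$base"}     # directory part, including the trailing slash (may be empty)
    case $base in
        ?*.*) base=${base%.*} ;;   # drop the last extension, but keep dotfiles intact
    esac
    printf '%s%s.djvu\n' "$dir" "$base"
}

djvu_name notes.txt          # prints notes.djvu
djvu_name drafts.v2/README   # prints drafts.v2/README.djvu
```

The second call is the case where a bare ${text%.*} would cut the path back to "drafts".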

I actually turned the script on itself, and created a DjVu file of this text, available here.
With this, I may very well add the capacity to export a .djvu file to tcltext. Why not? It’s just a shame, imho, that such an export is not direct, and has to cross into proprietary territory via .pdf in order to be accomplished.
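
For what it is worth, Jeremy's comment below describes a route that skips .pdf altogether: render the postscript to bitmaps and feed those to the DjVu encoders. Here is a rough sketch of that idea for plain text. Assumptions to note: I call Ghostscript (gs) directly instead of pstopnm, which is itself a front end to Ghostscript; cjb2 and djvm come with djvulibre; and the function name txt2djvu_nopdf is made up for this example. Because every page becomes an image, the result has no text layer, so you cannot copy text out of it the way you can with the pdf2djvu version.

```shell
#!/bin/bash
# Sketch: plain text -> PostScript -> one PBM bitmap per page -> DjVu, no PDF step.
# Assumes enscript, gs (Ghostscript), cjb2 and djvm are installed.
# txt2djvu_nopdf is a hypothetical name used only for this example.
txt2djvu_nopdf() {
    local text out work page status
    text=${1:-}
    if [ ! -r "$text" ]; then
        echo "cannot read input file: $text" >&2
        return 1
    fi
    out=${2:-${text%.*}.djvu}
    work=$(mktemp -d) || return 1

    # text -> PostScript, then one 300 dpi black-and-white bitmap per page
    enscript -q -B -p "$work/doc.ps" "$text" &&
    gs -q -sDEVICE=pbmraw -r300 -o "$work/page-%04d.pbm" "$work/doc.ps" || {
        rm -rf "$work"
        return 1
    }

    # encode each page with cjb2, collecting the page files in order
    set --
    for page in "$work"/page-*.pbm; do
        cjb2 "$page" "${page%.pbm}.djvu" || { rm -rf "$work"; return 1; }
        set -- "$@" "${page%.pbm}.djvu"
    done

    # bundle the single-page files into one multi-page document
    djvm -c "$out" "$@"
    status=$?
    rm -rf "$work"
    return $status
}
```

Calling txt2djvu_nopdf notes.txt should leave notes.djvu next to the input; a second argument overrides the output name.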

Also, as a gift to my fellow freedom fighters, foss hackers, and open standards supporters, I have created a DjVu of my poetry here which contains all the poems published in my recent book (but not the paintings and photographs).

And, this full article in djvu format here.  This last was fun, because I ended up having to change the text encoding first.  Apparently enscript doesn’t like utf8. I had copy/pasted the article into tcltext, which generates utf8 here (system default).  I made a .djvu that had all these weird character substitutions (like /200a#blahblah for a quotation mark?).  This is why I updated the script with the iconv text encoding conversion step.
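
To see what the iconv step does in isolation, here is a tiny sketch. It assumes the input is UTF-8 (the script instead asks file for the encoding), and the function name to_ascii is just for illustration. What a given character turns into depends on your iconv implementation and locale, so treat the exact replacements as something to check on your own system.

```shell
#!/bin/bash
# Read UTF-8 text on stdin, write an ASCII approximation to stdout.
# //TRANSLIT asks iconv to substitute a similar-looking ASCII character
# where it knows one, instead of giving up on the first non-ASCII character.
# (to_ascii is a made-up helper name; the input encoding is assumed to be UTF-8.)
to_ascii() {
    iconv -f UTF-8 -t ASCII//TRANSLIT
}
```

For example, printf '\342\200\234quoted\342\200\235\n' | to_ascii feeds it a pair of curly quotes, the same sort of character that got mangled in my first attempt.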

Now, if you use firefox or some other mozilla derivative, there’s actually a plugin for viewing such files in your browser, included in the djvulibre packages.  Otherwise, you’ll need a djvu viewer, such as djview or evince.

Anyway,
Enjoy.

./tony


This article in Spanish: http://www.gnewbook.org/pg/blog/tonybaldwin/read/83736/djvu-excelente-substituto-al-formato-nolibre-de-pdf-pero
This article in Portuguese: http://softwarelivre.org/tonybaldwin/blog/djvu-otimo-substituto-ao-formato-nao-livre-de-pdf-mas…

Written by tonybaldwin

January 19, 2011 at 3:43 pm

14 Responses

  1. Can DjVu handle pictures and html-style links as pdf can?

    Excellent work getting a script written so fast! We were just grumbling about not having one and as if by magic one arrived in mid-grumble 🙂
    Regards from
    Tom 🙂

    Tom

    January 19, 2011 at 5:53 pm

    • Yes, .djvu can preserve links from a .pdf file. From the pdf2djvu man page:

      --hyperlinks=options
      Specifies hyperlink display options. options must be a comma-separated list of:

      border-avis
      Make hyperlinks' borders always visible. (Otherwise, the border will be visible only when the mouse is over the hyperlink.)

      #RRGGBB
      Set hyperlinks' borders color.

      --no-hyperlinks
      Don't extract hyperlinks.

      By default, it should preserve links, but if you don't give it other parameters for highlighting them, they just look like the remainder of the text.
      You can also copy text from a .djvu, as long as it was generated from text or a pdf rather than from images (DjVu doesn't have some kind of built-in OCR or anything).

      tonybaldwin

      January 20, 2011 at 1:53 pm

    • The thing is, we need a script or utility to convert odt to djvu, imho, but I do not have the skill to write something of that complexity, dealing with all that xml stuff in an odt file.

      tonybaldwin

      January 20, 2011 at 8:49 pm

  2. Even EMACS will write a pdf file, but not a djvu file….Hmmmmm….
    Seems incongruous, what with EMACS having been originally the brainchild of the founder of the Free Software movement…
    I'm going to go bother him about that.

    tonybaldwin

    January 20, 2011 at 8:50 pm

  3. tonybaldwin

    January 23, 2011 at 10:21 am

  4. The GNU General Public License, version 3, in DjVu: http://www.baldwinsoftware.com/downloads/gpl-3.0….
    converted with the above script.

    tonybaldwin

    February 7, 2011 at 9:55 pm

  5. Tony, thanks for the post. I have also done some searching on the subject. It seems that PDF is an open format now (ISO 32000), but DjVu still has some advantages. I agree: support for writing a LibreOffice document in DjVu format directly from the suite, as is the case with the PDF format, is very important and would be gladly welcomed by some, including you, including me :-).

    Pijar

    February 10, 2011 at 4:06 am

    • Okay, I understand that PDF is an open standard, but that doesn't make it Free (as in speech), no?
      To my knowledge, Adobe holds patents on PDF.

      If it IS, indeed, free, then simply plugging in a script similar to mine, to export to pdf and then run pdf2djvu, with a menu item or button in any editor, is likely simple enough for those who want a djvu file (or running it from the cli).
      But, if I'm correct, and it's NOT free, then we have some work to do.
      I shouldn't think it would be too hard to convert directly from .ps or .dvi to .djvu, but my fu is not sufficient to divine such a solution, yet.

      tonybaldwin

      February 12, 2011 at 10:00 am

  6. If you wish to avoid PDF, you can always first print to postscript (.ps)
    Then use pstopnm (from netpbm) to convert to pnm, e.g.
    B/W "pstopnm -dpi=300 -pbm -stdout file.ps >file.pbm"
    GRAY "pstopnm -dpi=300 -pgm -stdout file.ps >file.pgm"
    COLOR "pstopnm -dpi=300 -ppm -stdout file.ps >file.pnm"

    Then convert to djvu with either cjb2 for B/W or c44 for grayscale and color images.
    B/W "cjb2 -lossless file.pbm file.djv"
    GRAY & COLOR "c44 -dpi=300 file.pnm file.djv"

    And you can use djvused for editing the djvu files.

    I normally use latex for creating pdf via pdflatex
    but if I need various restricted javascript functions or Extended Reader Rights,
    I use the dvips engine first, then use Adobe Acrobat to distill the ps file to pdf.

    The pdf can then be converted to djvu via "pdf2djvu -o file.djv file.pdf"
    which preserves hyperlinks and hyperreferences (but not javascript functionality)

    As an example of the space savings possible, I scanned a letterpage of B/W text to pbm
    Then converted to djv "cjb2 -lossless letter.pbm letter.djv"
    Then converted to pdf "ddjvu -format=pdf letter.djv letter.pdf"
    The size of the original pbm was 1.1M, the djv was 40K , and the pdf 72K

    Jeremy

    March 2, 2011 at 4:51 pm

    • Wow, thank you very much for this very comprehensive comment.

      tonybaldwin

      March 28, 2011 at 8:14 pm

  7. Jeremy’s comment is a good example why we need to create more djvu tools — to enable commenting in a pdf, for instance, one needs (non-free) Acrobat. djvu seems like an excellent alternative, just needing more development.

    peter

    March 30, 2011 at 10:15 am

  8. Please do not use DJVU. The company/creator has abandoned it and as you've found it has horrible support. What will happen is you will convert your docs from PDFs to DJVU and this will force everyone that uses your doc to have to convert it back to a PDF so they can actually use it.

    Mike Bethany

    May 12, 2011 at 8:53 am

    • I have not found it to have "horrible support", and, the project is alive and well.
      I don't know where you get your information, but it is incorrect, sir.

      tonybaldwin

      May 17, 2011 at 8:12 pm

