Victus Spiritus

HTML to ePub and back, partial success

21 May 2011

What I'd like to accomplish with epublishing software:

  1. Convert an intriguing site (Python for Fun) from HTML to ePub for easy offline reading on a phone or tablet. Automate that process for future sites that I enjoy and that have far more content than I can consume in one sitting.

  2. Convert an existing PDF document (Children of the Ark) into a living, editable wiki or web site. There are plenty of static PDFs that beg to be opened for editing by the public. Our game beta is just one of them.

  3. Convert select topics and posts from this blog into a portable, ePub-friendly format. I often refer to earlier blog posts. It's fun to review what and how I thought before, and how my understanding has changed with time. Some readers prefer focused batches of content in pamphlet and book form. I should be able to deliver that easily by organizing a few dozen posts into bundled ebook content.

This morning's post discusses a first crack at the first of these goals. I'm stuck at the PDF to ePub stage, where I can't yet get reasonable quality output. Another path, going directly from HTML to ePub, may prove more fruitful.
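For that direct HTML to ePub path, it helps to know that an ePub file is a zip archive with a fixed layout: a mimetype entry stored first and uncompressed, a META-INF/container.xml that points at a package file, and then the content documents. Here is a minimal sketch in Python, assuming the pages have already been cleaned up into well-formed XHTML. The function names, the OEBPS/ folder and the ePub 2 style package are my own choices for illustration, not the output of any tool mentioned in this post, and the result is not guaranteed to pass an ePub validator.

```python
import zipfile
from xml.sax.saxutils import escape

CONTAINER_XML = """<?xml version="1.0" encoding="UTF-8"?>
<container version="1.0" xmlns="urn:oasis:names:tc:opendocument:xmlns:container">
  <rootfiles>
    <rootfile full-path="OEBPS/content.opf" media-type="application/oebps-package+xml"/>
  </rootfiles>
</container>
"""


def _attr(value):
    """Escape text for use inside a double-quoted XML attribute."""
    return escape(value, {'"': "&quot;"})


def build_opf(title, book_id, names):
    """Package file: metadata, a manifest of every file, and the reading order."""
    manifest = "\n".join(
        f'    <item id="ch{i}" href="{_attr(n)}" media-type="application/xhtml+xml"/>'
        for i, n in enumerate(names)
    )
    spine = "\n".join(f'    <itemref idref="ch{i}"/>' for i in range(len(names)))
    return f"""<?xml version="1.0" encoding="UTF-8"?>
<package xmlns="http://www.idpf.org/2007/opf" version="2.0" unique-identifier="bookid">
  <metadata xmlns:dc="http://purl.org/dc/elements/1.1/">
    <dc:title>{escape(title)}</dc:title>
    <dc:language>en</dc:language>
    <dc:identifier id="bookid">{escape(book_id)}</dc:identifier>
  </metadata>
  <manifest>
    <item id="ncx" href="toc.ncx" media-type="application/x-dtbncx+xml"/>
{manifest}
  </manifest>
  <spine toc="ncx">
{spine}
  </spine>
</package>
"""


def build_ncx(title, book_id, names):
    """Table of contents with one entry per chapter, labelled by file name."""
    points = "\n".join(
        f'    <navPoint id="nav{i}" playOrder="{i + 1}">'
        f"<navLabel><text>{escape(n)}</text></navLabel>"
        f'<content src="{_attr(n)}"/></navPoint>'
        for i, n in enumerate(names)
    )
    return f"""<?xml version="1.0" encoding="UTF-8"?>
<ncx xmlns="http://www.daisy.org/z3986/2005/ncx/" version="2005-1">
  <head><meta name="dtb:uid" content="{_attr(book_id)}"/></head>
  <docTitle><text>{escape(title)}</text></docTitle>
  <navMap>
{points}
  </navMap>
</ncx>
"""


def write_epub(target, title, book_id, chapters):
    """Write chapters, a list of (file_name, xhtml_text) pairs, to target.

    target may be a path or a writable binary file object.
    """
    names = [name for name, _ in chapters]
    with zipfile.ZipFile(target, "w", zipfile.ZIP_DEFLATED) as zf:
        # The mimetype entry has to come first and must not be compressed.
        zf.writestr("mimetype", "application/epub+zip",
                    compress_type=zipfile.ZIP_STORED)
        zf.writestr("META-INF/container.xml", CONTAINER_XML)
        zf.writestr("OEBPS/content.opf", build_opf(title, book_id, names))
        zf.writestr("OEBPS/toc.ncx", build_ncx(title, book_id, names))
        for name, xhtml in chapters:
            zf.writestr("OEBPS/" + name, xhtml)
```

Calling write_epub("py4fun.epub", "Python for Fun", "urn:uuid:...", chapters) with one (name, text) pair per page would be the idea; turning the site's HTML into valid XHTML and carrying over images and stylesheets is the part this sketch leaves out.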

Scanning the web for tools

After reading a number of questionable-looking sites, each providing enormous amounts of information about how to use its $40-50 tool to convert from one format to another, I happened upon a Google Code tool called wkhtmltopdf. It provided the foundation for the conversion from HTML to PDF, one step in converting to a bundled format. I documented the transformation of Python for Fun into PDF form on GitHub, and it was straightforward.

What is wkhtmltopdf?

Description

Simple shell utility to convert html to pdf using the webkit rendering engine, and qt.

Introduction

Searching the web, I have found several command line tools that allow you to convert a HTML-document to a PDF-document, however they all seem to use their own, and rather incomplete rendering engine, resulting in poor quality. Recently QT 4.4 was released with a WebKit widget (WebKit is the engine of Apples Safari, which is a fork of the KDE KHtml), and making a good tool became very easy. (from wkhtmltopdf site)

Here's how I've proceeded so far:

Converted local HTML to PDF with wkhtmltopdf

Script:

/Applications/wkhtmltopdf cover index.html collection.html toc lode/lode.html buckets/buckets.html tower/tower.html animal/animal.html gui/tkPhone.html gui/sqlPhone.html gui/wxPhone.html erlang/erlang.html erlang/erlang2.html forth/forth.html lisp/lisp.html prolog/intro.html prolog/prolog1.html prolog/prolog2.html prolog/prolog3.html huffman/huffman.html rtn/rtn.html sir/sir.html unicode/unicode.html logic/logic.html logic2/logic2.html mm/simulator.html mm/assembler.html mm/compiler.html sql/sql.html wave/wave.html py4fun.pdf

Output PDF:

py4fun.pdf
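Since I want to automate this for future sites, the long command above could be generated from a list of pages rather than typed by hand. A small sketch, assuming the same layout as my command (a cover page, any front matter, a generated table of contents, then the chapters, with the output file last). The function name and the default binary name are hypothetical; my own copy lives at /Applications/wkhtmltopdf.

```python
def build_wkhtmltopdf_command(cover, front_matter, chapters, output,
                              binary="wkhtmltopdf"):
    """Return the argument list for one wkhtmltopdf run.

    Mirrors the hand-written command: "cover <file>", then front-matter
    pages, then "toc", then every chapter page, and finally the output PDF.
    """
    return [binary, "cover", cover, *front_matter, "toc", *chapters, output]


# A shortened version of the Python for Fun page list.
command = build_wkhtmltopdf_command(
    cover="index.html",
    front_matter=["collection.html"],
    chapters=["lode/lode.html", "buckets/buckets.html", "tower/tower.html"],
    output="py4fun.pdf",
    binary="/Applications/wkhtmltopdf",
)
print(" ".join(command))
```

Passing the resulting list to subprocess.run(command, check=True) from the site's root folder would run the conversion, provided the binary exists at that path.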

  1. First attempt at PDF to ePub:

    Uploaded the PDF to 2epub.com and converted it to ePub:

    1. Browse to the PDF file
    2. Upload the file
    3. Download the ePub

    First ePub:
    orig_py4fun.epub (360 KB)

    But the links weren't working in that file on my iOS devices...

  2. Second try:

    I used PDF Converter (by Shenzhen Wondershare Software Co. Ltd) from the Mac App Store to create py4fun.epub (11 MB). The quality of the output ePub was much lower than that of the free 2epub.com conversion.
    Output file: deleted.

  3. Third try:

    I snagged Lexcycle Stanza for the desktop. It imported the PDF kinda funky and requested access to a PDF viewer. Begrudgingly, I reinstalled Adobe's PDF reader (~420 MB) and it opened the file fine. But the option to export to ePub wasn't available. Bollocks!

Unfortunately, transforming the PDF into an ePub file proved far more frustrating than producing the PDF. The first site I tried (2epub.com) broke all the internal hyperlinks, but at least it created a mostly readable ePub file (some code sample formatting was lost).
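One way to size up link damage without clicking through a file on a device is to open the ePub as a zip archive and look for links that point at files which aren't in it. This is a rough sketch using only the Python standard library. The function name is hypothetical, it only checks that the target file exists (not that a #fragment anchor is present), and a missing file is just one possible reason for links failing, so it may not explain what I saw on iOS.

```python
import posixpath
import zipfile
from html.parser import HTMLParser
from urllib.parse import unquote, urldefrag, urlparse


class _HrefCollector(HTMLParser):
    """Collect the href value of every <a> tag fed to the parser."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for key, value in attrs:
                if key == "href" and value:
                    self.hrefs.append(value)


def broken_internal_links(epub):
    """Return (document, href) pairs whose target file is not in the archive."""
    broken = []
    with zipfile.ZipFile(epub) as zf:
        names = set(zf.namelist())
        for name in sorted(names):
            if not name.lower().endswith((".html", ".xhtml", ".htm")):
                continue
            parser = _HrefCollector()
            parser.feed(zf.read(name).decode("utf-8", errors="replace"))
            for href in parser.hrefs:
                if urlparse(href).scheme:
                    continue  # http:, mailto: and friends are external
                path, _fragment = urldefrag(href)
                if not path:
                    continue  # "#anchor" link within the same document
                # Resolve the link relative to the document that contains it.
                target = posixpath.normpath(
                    posixpath.join(posixpath.dirname(name), unquote(path))
                )
                if target not in names:
                    broken.append((name, href))
    return broken
```

Running broken_internal_links("orig_py4fun.epub") would list each document alongside the link that goes nowhere; an empty list means every internal link at least points at a file that exists.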

The second tool I tried was a $60 download from the Mac App Store, so I expected a high quality product. Instead, what I purchased was a steaming pile of crap. It generated a huge output file (~30 times the size of the free ePub file) full of broken hyperlinks and random blank pages. Definitely do not buy that terrible software: PDF Converter (by Shenzhen Wondershare Software Co. Ltd).