iPad Tip: How to Convert HTML to PDFs
Since I bought my iPad a few weeks ago, you can’t seem to pull me away from it. I have a number of computers, and while I still use my netbook quite a bit, I’ve found that the iPad really is a fantastic consumption device. It’s great for downloading books from Amazon (via Kindle), reading the latest news sites, and viewing my book collection (mainly PDFs). I had originally hoped that I could leverage the Document Browser project I created a few months ago. However, its Flash-based PDF viewer is obviously a deal-breaker on the iPad. I do feel slightly better now that I’ve found a comparable solution.
Of all the apps I use, I have to say that GoodReader is at the top of my list! GoodReader is a really comprehensive PDF viewer with a large number of features. Although I’ve been a fan of AirSharing, GoodReader offers its main feature, sharing files over Wi-Fi, plus some features not present in other apps: excellent support for large PDF files (>75 MB) and the ability to connect over the network to fetch documents (WebDAV, FTP, Google Mail, etc.). At this point I’ve copied most of my My Documents folder onto the iPad. To access the rest of my docs, I simply VPN into my home server, connect over WebDAV, and download what I need.
Now, GoodReader supports a number of file formats including .txt, .html, and various video and audio formats. But I’ve been trying to standardize my doc collection on a common format: PDF. I have a handful of websites that I’ve been wanting to convert to PDF for offline viewing, and I have found the PERFECT open source command-line solution: wkhtmltopdf to the rescue. What makes wkhtmltopdf unique is that it uses WebKit to render pages (WebKit is the engine behind Apple’s Safari and is itself a fork of KDE’s KHTML), and it works on Windows, Mac, and Linux. Because it’s based on WebKit, the resulting PDF looks pretty much like the page would in Safari, and the formatting remains intact (including HTML links, which is a big plus).
To test this out, I thought it would be good to take the Squashed Philosophers site with me. For anybody with an interest in philosophy, Squashed Philosophers from Glyn Hughes is an excellent CliffsNotes-like site that takes the ideas and works of history’s greatest philosophers and compresses each into a summary that can be read in ~30 minutes (condensed but very high quality). I figured this would be a great thing to take on the go. I wrote a quick little Perl script that grabbed all of the sub-page links from the main page and executed a simple command for each URL. Example:
# wkhtmltopdf http://www.btinternet.com/~glynhughes/squashed/ancientgreeks.htm ancientgreeks.pdf
Loading pages (1/5)
Resolving links (2/5)
Counting pages (3/5)
Printing pages (5/5)
The resulting PDF looks really good (embedded in my Document Browser for preview).
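For anyone who wants to reproduce the batch step, here is a rough sketch of the same idea in Python (not the original Perl script, which isn't shown here). It assumes the index URL ends with a trailing slash, that the sub-pages are plain .htm/.html links under that same URL, and that wkhtmltopdf is on the PATH. The helper names (extract_subpages, pdf_name, build_command) are made up for illustration.

```python
"""Sketch: convert every page linked from an index page to PDF with wkhtmltopdf."""
import posixpath
import subprocess
import sys
import urllib.request
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class LinkCollector(HTMLParser):
    """Collects the href value of every <a> tag it sees."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def extract_subpages(html, base_url):
    """Return absolute, de-duplicated .htm/.html URLs located under base_url."""
    parser = LinkCollector()
    parser.feed(html)
    seen, pages = set(), []
    for href in parser.hrefs:
        url = urljoin(base_url, href).split("#", 1)[0]  # drop #fragments
        if (url.startswith(base_url)
                and url.lower().endswith((".htm", ".html"))
                and url not in seen):
            seen.add(url)
            pages.append(url)
    return pages


def pdf_name(url):
    """Map .../ancientgreeks.htm to ancientgreeks.pdf."""
    stem = posixpath.splitext(posixpath.basename(urlparse(url).path))[0]
    return stem + ".pdf"


def build_command(url):
    """Same invocation as the manual example: wkhtmltopdf <url> <output.pdf>."""
    return ["wkhtmltopdf", url, pdf_name(url)]


def main(index_url):
    # Assumption: the index page decodes acceptably as UTF-8.
    with urllib.request.urlopen(index_url) as response:
        html = response.read().decode("utf-8", errors="replace")
    for url in extract_subpages(html, index_url):
        subprocess.run(build_command(url), check=True)


if __name__ == "__main__" and len(sys.argv) > 1:
    main(sys.argv[1])
```

Saved under a hypothetical name such as html2pdf.py, it would be run with the index URL as its only argument and would write one PDF per sub-page into the current directory. Keeping the link filtering and command building in separate functions makes it easy to check which pages will be converted before actually running wkhtmltopdf.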