How To Create Your Own Personal Document Viewer (Like Scribd or Google Books)
Like most people, I have a large number of personal documents in a variety of formats (PDF, Excel, Word, RTF, PowerPoint, etc.). For the typical user, organizing these documents in a ‘My Documents’ folder and having MS Office/Open Office/Adobe Acrobat installed simply gets the job done. However, I’ve been looking for some sort of “Web 2.0” solution to view my documents while I’m on the go. And, since my knowledge manager is web-based, I’d like a way to browse and embed personal documents directly in my wiki without needing any special software.
I’ve been impressed with services like Scribd (think YouTube for documents). Most people have probably already used Scribd, but in case you haven’t, this service allows you to upload your documents (a variety of formats is supported) and view them online in Flash format. The beauty of this service is that you can also share documents and embed them directly inside your website/blog/wiki. While this works great for sharing certain types of documents, it’s not really appropriate for uploading my entire collection of documents (especially since many contain personal information). So, I decided to figure out how to create my own hosted document/book viewer like Scribd or Google Books.
The following embedded document browser was actually fairly straightforward to make. In a nutshell, the viewer takes a PDF file that is converted to Flash (using SWFTools – pdf2swf), and then uses an open source Flash viewer called FlexPaper to help with navigation.
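To make the PDF-to-Flash step a bit more concrete, here is a minimal Python sketch of how it could be scripted. It is not the code behind my viewer: the function names, the cache directory, and the "only reconvert when the PDF is newer" rule are all assumptions for illustration. The only thing taken from SWFTools is the basic pdf2swf invocation (input file, then -o and the output file).

```python
import subprocess
from pathlib import Path
from typing import List


def swf_path_for(pdf_path: Path, cache_dir: Path) -> Path:
    """Return where the converted SWF for a given PDF would be cached."""
    return cache_dir / (pdf_path.stem + ".swf")


def build_pdf2swf_command(pdf_path: Path, swf_path: Path) -> List[str]:
    """Build the pdf2swf argument list: input file, then -o and the output file."""
    return ["pdf2swf", str(pdf_path), "-o", str(swf_path)]


def needs_conversion(pdf_path: Path, swf_path: Path) -> bool:
    """Convert only when the SWF is missing or older than the PDF (assumed policy)."""
    if not swf_path.exists():
        return True
    return swf_path.stat().st_mtime < pdf_path.stat().st_mtime


def convert_pdf_to_swf(pdf_path: Path, cache_dir: Path) -> Path:
    """Run pdf2swf if needed and return the SWF path for the Flash viewer to load."""
    swf_path = swf_path_for(pdf_path, cache_dir)
    if needs_conversion(pdf_path, swf_path):
        cache_dir.mkdir(parents=True, exist_ok=True)
        # check=True raises CalledProcessError if pdf2swf exits with an error.
        subprocess.run(build_pdf2swf_command(pdf_path, swf_path), check=True)
    return swf_path
```

The resulting .swf file is what the Flash viewer would then be pointed at. Caching the output means a large PDF only pays the conversion cost the first time it is opened.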
The navigation bar is fairly straightforward. You can page up/down, go directly to a given page, zoom, print, and even select a thumbnail mode. It does currently lack the ability to view full screen, search (search was JUST added in version 1.1), or select text, so I created additional options to view the document in HTML (using wvHtml) and to view the frame full screen.
Open Source To The Rescue
When I first started exploring ways to view all my docs in a web interface, I didn’t initially focus on Flash. I figured it would be too difficult to have the end product look like Scribd (I was way wrong). So, I evaluated a number of Linux command-line utilities to convert documents on the fly. The following is a decent list of applications that can help with any of your conversion needs:
- wvWare – A library for converting Word docs. The utility I used most was wvHtml to convert from .doc directly to .html.
- xlHtml – Converts Excel spreadsheets to HTML.
- PDFtoHtml – Converts PDF documents to HTML
- UnRTF – Converts RTF to text or HTML
- SWFTools – A collection of utilities to generate and work with SWF (Flash) files
There are apparently some ways to convert between various formats using Open Office on the command line (e.g. JODConverter, PyODConverter, Unoconv, etc.). However, I haven’t yet spent time evaluating these approaches since my current setup seems to be working pretty well.
I put up a very preliminary Document Browser prototype at http://eric-blue.com/projects/docbrowser/. The interface uses jQuery and jQueryFileTree to make an entire document folder available for browsing, just like Windows Explorer.
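For anyone curious how the folder browsing side might work, a file tree plugin of this kind typically asks a small server-side script for one directory at a time and expects an HTML list back. The Python sketch below is my assumption of what such a connector could look like, not the prototype's actual code: the function name is made up, and the markup (a ul with class jqueryFileTree, li items marked "directory collapsed" or "file ext_xxx", paths in the rel attribute) follows the convention used by the plugin's sample connectors as I understand it.

```python
import html
import os


def render_file_tree(root: str, rel_dir: str) -> str:
    """Render one directory level as an HTML list for the tree plugin (sketch)."""
    # Resolve the requested directory and refuse anything outside the root.
    root_abs = os.path.realpath(root)
    target = os.path.realpath(os.path.join(root_abs, rel_dir.lstrip("/")))
    if os.path.commonpath([root_abs, target]) != root_abs:
        raise ValueError("requested directory is outside the document root")

    # Paths sent back to the browser are relative to the root, with "/" separators.
    rel = os.path.relpath(target, root_abs)
    base = "/" if rel == "." else "/" + rel.replace(os.sep, "/") + "/"

    names = sorted(os.listdir(target), key=str.lower)
    dirs = [n for n in names if os.path.isdir(os.path.join(target, n))]
    files = [n for n in names if not os.path.isdir(os.path.join(target, n))]

    lines = ['<ul class="jqueryFileTree" style="display: none;">']
    # Directories first, so folders group at the top like a desktop file browser.
    for name in dirs:
        path = html.escape(base + name + "/", quote=True)
        lines.append('<li class="directory collapsed"><a href="#" rel="%s">%s</a></li>'
                     % (path, html.escape(name)))
    for name in files:
        ext = html.escape(os.path.splitext(name)[1][1:].lower(), quote=True)
        path = html.escape(base + name, quote=True)
        lines.append('<li class="file ext_%s"><a href="#" rel="%s">%s</a></li>'
                     % (ext, path, html.escape(name)))
    lines.append("</ul>")
    return "\n".join(lines)
```

The path check matters here because the whole point is to expose a personal documents folder on a web server, and a request for "../" should never be able to wander outside it.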
The doc viewer pane uses the Flash-based interface, like the iFrame above, for all .PDF docs. And, the conversion script will render the output in HTML according to the doc type (.doc, .xls, .rtf) using the tools listed above. I’ve even added support for MindManager mindmaps, using my web-based mindmap viewer to do conversions into FreeMind Flash on the fly.
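The "render according to doc type" logic boils down to a lookup on the file extension. Here is a small Python sketch of that idea, assuming a simple table that maps each extension to an output type and one of the tools listed earlier. The names Renderer, RENDERERS, and pick_renderer are hypothetical, and the actual command-line arguments for each tool are deliberately left out.

```python
import os
from typing import NamedTuple, Optional


class Renderer(NamedTuple):
    output: str  # "flash" for the Flash viewer, "html" for plain HTML output
    tool: str    # name of the converter from the list of tools above


# Extension -> renderer table (assumed layout; extend it as more formats are added).
RENDERERS = {
    ".pdf": Renderer("flash", "pdf2swf"),
    ".doc": Renderer("html", "wvHtml"),
    ".xls": Renderer("html", "xlHtml"),
    ".rtf": Renderer("html", "UnRTF"),
}


def pick_renderer(filename: str) -> Optional[Renderer]:
    """Return the renderer for a file based on its extension, or None if unsupported."""
    extension = os.path.splitext(filename)[1].lower()
    return RENDERERS.get(extension)


if __name__ == "__main__":
    for name in ["Budget.XLS", "resume.doc", "manual.pdf", "photo.jpg"]:
        print(name, pick_renderer(name))
```

Keeping the mapping in one table means supporting a new format is a one-line change, and anything not in the table can fall back to a plain download link.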
Overall, I’m happy with the end result. I’ve set up a customized version of the document browser to run on my personal web server at home. I can now successfully view my documents from my laptop while I’m on the road, and I’ve been able to embed documents directly in my wiki so I don’t have to spend time hunting for the right doc.
Other Interesting Links
- Open source flash viewers -
- PSView (Online viewer for PDF, Postscript, Word) – http://view.samurajdata.se/
- Vuzit (Online document viewer) and API – http://vuzit.com/
Update: Sample code has been posted here http://eric-blue.com/2010/02/12/example-document-browser-code/