How To Create Your Own Personal Document Viewer (Like Scribd or Google Books)
Like most people, I have a large number of personal documents in a variety of formats (PDF, Excel, Word, RTF, PowerPoint, etc.). For the typical user, organizing these documents in a ‘My Documents’ folder and having MS Office/Open Office/Adobe Acrobat installed simply gets the job done. However, I’ve been looking for some sort of “Web 2.0” solution to view my documents while I’m on the go. And, since my knowledge manager is web-based, I’d like a way to browse and embed personal documents directly in my wiki without needing any special software.
I’ve been impressed with services like Scribd (think YouTube for Documents). Most people have probably already used Scribd, but in case you haven’t, this service allows you to upload your documents (a variety of formats are supported) and view them online in Flash format. The beauty of this service is that you can also share documents and embed them directly inside your website/blog/wiki. While this works great for sharing certain types of documents, it’s not really appropriate for uploading my entire collection of documents (especially since many contain personal information). So, I decided to figure out how to create my own hosted document/book viewer like Scribd or Google Books.
The following embedded document browser was actually fairly straightforward to make. In a nutshell, the viewer takes a PDF file that is converted to Flash (using SWFTools – pdf2swf), and then uses an open source Flash viewer called FlexPaper to help with navigation.
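For reference, the PDF-to-SWF step is easy to script. The sketch below is a minimal Python wrapper, assuming pdf2swf from SWFTools is installed and on the PATH. The function names and the choice of Flash version 9 (`-T 9`) are illustrative assumptions, not the exact commands used for this viewer.

```python
import subprocess
from pathlib import Path


def build_pdf2swf_command(pdf_path, swf_path):
    """Return the argument list that converts one PDF into one SWF file.

    Only pdf2swf's -o (output file) and -T (target Flash version) options are
    used; Flash version 9 is an assumption, not taken from the post.
    """
    return ["pdf2swf", str(pdf_path), "-o", str(swf_path), "-T", "9"]


def convert_pdf_to_swf(pdf_path, out_dir):
    """Convert pdf_path into <out_dir>/<name>.swf and return the SWF path."""
    pdf_path = Path(pdf_path)
    swf_path = Path(out_dir) / (pdf_path.stem + ".swf")
    # check=True raises CalledProcessError if pdf2swf exits with an error.
    subprocess.run(build_pdf2swf_command(pdf_path, swf_path), check=True)
    return swf_path
```

The resulting .swf is the file FlexPaper loads. Converting once and keeping the SWF around avoids re-running pdf2swf every time a document is opened.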
The navigation bar is fairly straightforward. You can page up/down, go directly to a given page, zoom, print, and even select a thumbnail mode. It does currently lack the ability to view full screen.
Open Source To The Rescue
When I first started exploring ways to view all my docs in a web interface, I didn’t initially focus on Flash. I figured it would be too difficult to make the end product look like Scribd (I was way wrong). So, I evaluated a number of Linux command-line utilities to convert documents on the fly. The following is a decent list of applications that can help with any of your conversion needs:
- wvWare – A library for converting Word docs. The utility I used most was wvHtml to convert from .doc directly to .html.
- xlHtml – Converts Excel spreadsheets to HTML.
- PDFtoHtml – Converts PDF documents to HTML.
- UnRTF – Converts RTF to text or HTML.
- SWFTools – A collection of utilities to generate and work with SWF (Flash) files.
There are apparently some ways to convert between various formats using Open Office on the command line (e.g. JODConverter, PyODConverter, Unoconv, etc.). However, I haven’t yet spent time evaluating these approaches since my current setup seems to be working pretty well.
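Since every one of these tools has to be installed separately, a quick check that they are all on the PATH can save some head-scratching. This is a minimal Python sketch, assuming the usual executable names (wvHtml, xlhtml, pdftohtml, unrtf, pdf2swf); package names and binaries may differ on your distribution.

```python
import shutil

# Executables for the tools listed above (assumed standard binary names).
CONVERTERS = {
    "wvHtml": "Word (.doc) to HTML",
    "xlhtml": "Excel (.xls) to HTML",
    "pdftohtml": "PDF to HTML",
    "unrtf": "RTF to text or HTML",
    "pdf2swf": "PDF to SWF (part of SWFTools)",
}


def missing_converters(converters=CONVERTERS, which=shutil.which):
    """Return the sorted names of converter executables not found on PATH."""
    return sorted(name for name in converters if which(name) is None)


if __name__ == "__main__":
    for name in missing_converters():
        print(f"not installed: {name} ({CONVERTERS[name]})")
```

The `which` parameter only exists so the lookup can be swapped out (for example in a test); normally you just call `missing_converters()`.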
I put up a very preliminary Document Browser prototype at http://eric-blue.com/projects/docbrowser/. The interface uses jQuery and jQuery File Tree to make the entire document folder browsable, just like Windows Explorer.
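jQuery File Tree expands one folder at a time by asking a small server-side connector for that folder's contents as an HTML list. The Python function below is a hypothetical sketch modeled on the markup the plugin's sample connectors emit (`<ul class="jqueryFileTree">`, `directory collapsed` and `file ext_…` list items, the path in each link's `rel` attribute); it is not the prototype's actual script, and wiring it to an HTTP handler is left out.

```python
import html
import os


def file_tree_html(root, rel_dir):
    """Render one folder level as jQuery File Tree-style <ul> markup.

    root is the documents folder on disk; rel_dir is the folder the browser
    asked for (e.g. "/" or "/Taxes/") and should end with "/".
    """
    base = os.path.realpath(root)
    full = os.path.realpath(os.path.join(base, rel_dir.lstrip("/")))
    # Refuse requests such as "/../" that would escape the documents folder.
    if os.path.commonpath([base, full]) != base:
        raise ValueError("requested folder is outside the documents root")

    names = sorted(os.listdir(full), key=str.lower)
    dirs = [n for n in names if os.path.isdir(os.path.join(full, n))]
    files = [n for n in names if not os.path.isdir(os.path.join(full, n))]

    parts = ['<ul class="jqueryFileTree" style="display: none;">']
    for name in dirs:  # folders first, rendered collapsed
        rel = html.escape(rel_dir + name + "/")
        parts.append(f'<li class="directory collapsed"><a href="#" rel="{rel}">'
                     f'{html.escape(name)}</a></li>')
    for name in files:  # then files, tagged with their extension for icons
        ext = os.path.splitext(name)[1].lstrip(".").lower()
        rel = html.escape(rel_dir + name)
        parts.append(f'<li class="file ext_{ext}"><a href="#" rel="{rel}">'
                     f'{html.escape(name)}</a></li>')
    parts.append("</ul>")
    return "".join(parts)
```

The path in `rel` is what the browser hands back when a file is clicked, so the viewer pane can use it to decide which converter to run.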
The doc viewer pane uses the same Flash-based interface as the embedded viewer above for all .pdf docs. For other doc types (.doc, .xls, .rtf), the conversion script renders the output as HTML using the tools listed above. I’ve even added support for MindManager mind maps, using my web-based mindmap viewer to convert them into FreeMind Flash on the fly.
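The "convert according to doc type" step boils down to a lookup table keyed on file extension. Here is a minimal Python sketch of that idea; the table, function names, and output naming are hypothetical, and the command lines follow each tool's common usage (`wvHtml in.doc out.html`, `xlhtml in.xls` and `unrtf --html in.rtf` printing HTML to stdout), so check them against the versions you have installed.

```python
import subprocess
from pathlib import Path

# Hypothetical routing table: extension -> (argv template, output suffix,
# whether the tool writes its result to stdout instead of a named file).
ROUTES = {
    ".pdf": (["pdf2swf", "{src}", "-o", "{dst}", "-T", "9"], ".swf", False),
    ".doc": (["wvHtml", "{src}", "{dst}"], ".html", False),
    ".xls": (["xlhtml", "{src}"], ".html", True),
    ".rtf": (["unrtf", "--html", "{src}"], ".html", True),
}


def plan_conversion(src, out_dir):
    """Return (argv, output_path, writes_to_stdout) for one document."""
    src = Path(src)
    try:
        template, suffix, to_stdout = ROUTES[src.suffix.lower()]
    except KeyError:
        raise ValueError(f"no converter configured for {src.suffix!r}") from None
    dst = Path(out_dir) / (src.stem + suffix)
    argv = [arg.format(src=src, dst=dst) for arg in template]
    return argv, dst, to_stdout


def convert(src, out_dir):
    """Run the matching converter and return the path of the generated file."""
    argv, dst, to_stdout = plan_conversion(src, out_dir)
    if to_stdout:
        # Tools that print HTML get their stdout redirected into the output file.
        with open(dst, "wb") as fh:
            subprocess.run(argv, stdout=fh, check=True)
    else:
        subprocess.run(argv, check=True)
    return dst
```

Keeping the planning step separate from the subprocess call makes it easy to add a new format (say, another converter for mind maps) by adding one row to the table.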
Overall, I’m happy with the end result. I’ve set up a customized version of the document browser to run on my personal web server at home. I can now successfully view my documents from my laptop while I’m on the road, and I’ve been able to embed documents directly in my wiki so I don’t have to spend time hunting for the right doc.
Other Interesting Links
- Open source Flash viewers – FlexPaper and SWF Viewer/zViewer
- PSView (Online viewer for PDF, Postscript, Word) – http://view.samurajdata.se/
- Vuzit (Online document viewer) and API – http://vuzit.com/
Update: Sample code has been posted here: http://eric-blue.com/2010/02/12/example-document-browser-code/
Good to see FlexPaper being put to use. I actually just released version 1.1, which _does_ contain a search function! 🙂
Erik, thanks and perfect timing! Nothing like getting in touch in real-time AND having a new version of FlexPaper with just the feature I need. 🙂
I also added support for directly embedded Google Books. I added a custom handler for .url extensions that will embed the book url (e.g. http://books.google.com/books?id=88U6hdUi6D0C) inside an iframe.
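One way such a .url handler could work is sketched below in Python. It assumes the .url files use the Windows Internet Shortcut layout (an `[InternetShortcut]` section with a `URL=` line); the post doesn't say which layout is used, and the function names and default iframe size are hypothetical.

```python
import configparser
import html


def read_url_file(text):
    """Return the target URL from the text of an Internet Shortcut (.url) file."""
    # Interpolation is disabled because URLs often contain '%' escapes.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(text)
    return parser["InternetShortcut"]["URL"]


def url_to_iframe(url, width=800, height=600):
    """Wrap a URL in an iframe tag, escaping it for use in an HTML attribute."""
    src = html.escape(url, quote=True)
    return f'<iframe src="{src}" width="{width}" height="{height}"></iframe>'
```

For example, a file containing `URL=http://books.google.com/books?id=88U6hdUi6D0C` produces an iframe pointing at that book page; escaping matters because Google Books URLs usually carry several `&`-separated parameters.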
I also wrote a quick little script that parses your Google Books library XML file, grabs the title and URL, and outputs them to the URL file. So now the document browser can combine your physical and ‘virtual’ docs hosted by Google. Pretty cool.
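A script along those lines might look like the Python sketch below. The XML tag names (`book`, `title`, `url`) are a HYPOTHETICAL layout, since the actual library export format isn't shown here, and writing one Internet Shortcut (.url) file per book is an assumption about how the output is arranged; adjust both to match your export and your .url handler.

```python
import re
import xml.etree.ElementTree as ET
from pathlib import Path


def books_from_xml(xml_text):
    """Return (title, url) pairs; tag names here are assumptions, not Google's schema."""
    root = ET.fromstring(xml_text)
    books = []
    for book in root.iter("book"):
        title = (book.findtext("title") or "").strip()
        url = (book.findtext("url") or "").strip()
        if title and url:  # skip entries missing either field
            books.append((title, url))
    return books


def safe_filename(title):
    """Replace characters that are invalid in Windows/Unix file names."""
    return re.sub(r'[\\/:*?"<>|]+', "_", title).strip() or "untitled"


def write_url_files(books, out_dir):
    """Write one Internet Shortcut (.url) file per book and return the paths."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    written = []
    for title, url in books:
        path = out / (safe_filename(title) + ".url")
        path.write_text(f"[InternetShortcut]\nURL={url}\n", encoding="utf-8")
        written.append(path)
    return written
```

Dropping the generated files into the document folder is what lets the file tree list Google-hosted books right next to local files.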
This is an excellent contribution, and thank you for mentioning Vuzit (my company) in your links. I remember years ago doing surveys like this, and would have loved to stumble on something like this at that time. It’s amazing how far different combinations of the open source tools can go these days!
I’ll just mention a couple of things about Vuzit that might be interesting to others. Vuzit is a development platform we sell to other online businesses that provides document sharing, control, and analytics features that are presently unavailable in any other products. As such, it’s not free! However, we do open source and host all of our client libraries on GitHub (PHP, Ruby, .NET, Java, etc.). Another major difference is that it’s AJAX and web service based (i.e., no Flash/SWF), so you can build your own front-end if you wish.
Thanks! -Chris (Vuzit co-founder and CTO)
Glad to see xlhtml being put to use. I put quite a bit of effort into the program. 🙂
I discovered recently that any PDF can be viewed via Google Docs by prepending the PDF’s URL with
Hey Eric, that’s really a great contribution.
Nowadays I use the viewer made by Macromedia, i.e. FlashPaper. I need some help: I want to show the pages in two-up view, like Adobe Reader, i.e. so we can see two pages at a time. Could you help me with it, or suggest another option that includes a two-up page view and excludes download, save, and print?
Hoping for a reply soon.
Thanks and Regards,
Anyone know why when I set PrintEnabled : false it doesn’t work?
The document still has the print button on the toolbar. A way to remove that would be good.
Anyone know anything about watermarks too?
Can you tell me how to run pdftohtml.exe from PHP? Thanks.
Nice job Eric! I think the next challenge is to make read/write access on the way…
I also want to create this type of application. I converted my document with the pdf2swf tool, but I’m unable to show it in a popup on my HTML page. Can you suggest a way? It would be great to hear from you.
Have you posted your work online to download (the source). I’d love to try this on my personal server as well.
I created a follow-up post with a link to the download. Bear in mind it’s nothing fancy (e.g. some weekend hacking) but gets the job done.
It is an excellent article. However, I would like to use it on my local web server (http://localhost/myLocalweb/). How can I change the path to the Docs folder to get the tree view?
I recently discovered a tool to convert .chm files to .pdf. http://code.google.com/p/chm2pdf/
I was wondering if you could let us know how storing the document in your wiki works?
Do you have to duplicate things? A copy in the wiki, and a copy in a file system so that the document browser displays how it does?
I’m looking for a way to store, tag & retrieve/view .pdf/.docx files.
I’ve been looking into docmgr.org but I’m not sure how I could integrate it with MediaWiki 🙁
I want to integrate a web document viewer that can display files such as pdf, ppt, pptx, doc, docx, xls, xlsx, swf, and flv. Will the FlexPaper viewer display most of these formats? Please advise.
Your response would be highly appreciated.
I am new to programming. I want to add an online document viewer to my website, but I need to customize it to meet some requirements. Can I customize this? If not, do you have any suggestions for other software?
It was nice work, dude, good job.