<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Eric Blue's Blog &#187; Knowledge Management</title>
	<atom:link href="http://eric-blue.com/category/knowledge-management/feed/" rel="self" type="application/rss+xml" />
	<link>http://eric-blue.com</link>
	<description>Technology, Philosophy, and Personal Development</description>
	<lastBuildDate>Mon, 21 May 2012 02:29:57 +0000</lastBuildDate>
	<generator>http://wordpress.org/?v=2.9.2</generator>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
			<item>
		<title>Firefox Scrapbook Hacks &#8211; Viewing and Saving Webpages from Anywhere!</title>
		<link>http://eric-blue.com/2011/04/03/firefox-scrapbook-hacks-viewing-and-saving-webpages-from-anywhere/</link>
		<comments>http://eric-blue.com/2011/04/03/firefox-scrapbook-hacks-viewing-and-saving-webpages-from-anywhere/#comments</comments>
		<pubDate>Sun, 03 Apr 2011 20:12:00 +0000</pubDate>
		<dc:creator>ericblue76</dc:creator>
				<category><![CDATA[Headline]]></category>
		<category><![CDATA[IPhone]]></category>
		<category><![CDATA[JavaScript]]></category>
		<category><![CDATA[Knowledge]]></category>
		<category><![CDATA[Knowledge Management]]></category>
		<category><![CDATA[Learning]]></category>
		<category><![CDATA[Memex]]></category>
		<category><![CDATA[Open Source]]></category>
		<category><![CDATA[Perl]]></category>
		<category><![CDATA[Research]]></category>
		<category><![CDATA[Technology]]></category>

		<guid isPermaLink="false">http://eric-blue.com/?p=1444</guid>
		<description><![CDATA[This weekend I decided to wrap up a couple cool knowledge management &#8220;hacks&#8221; and share some code on GitHub.  I primarily use the Firefox Scrapbook plugin to save all web pages of interest and use ...]]></description>
			<content:encoded><![CDATA[<p>This weekend I decided to wrap up a couple cool knowledge management &#8220;hacks&#8221; and share some code on GitHub.  I primarily use the <a href="http://amb.vis.ne.jp/mozilla/scrapbook/">Firefox Scrapbook plugin </a>to save all web pages of interest and use it as a general &#8220;digital snippet&#8221; repository. Since I started using Scrapbook in 2006 there have been a number of online services that have come along to offer this functionality (namely Evernote, Zotero, and countless others).  Some of these services make it very easy to universally access and save webpages between multiple devices.  As part of my usual DIY philosophy, I&#8217;ve made an effort to stick with Scrapbook and build the missing features myself.  This is in large part due to data ownership (it&#8217;s my data and I don&#8217;t want to be tied to a single service/company), plus it&#8217;s fun to tinker and make these useful &#8220;hacks&#8221;.</p>
<p>In Dec &#8216;09 I shared a blog post about how to <a href="http://eric-blue.com/2009/12/07/how-to-synchronize-your-digital-scrapbook/">synchronize the scrapbook data between multiple computers</a>. This was the first major step to sharing data between multiple devices, but still lacked some of the ubiquity that I desired.  In a nutshell I&#8217;ve made 2 <span style="text-decoration: underline;">major</span> enhancements to Scrapbook:</p>
<ol>
<li>An email &#8216;bridge&#8217; to Scrapbook so I can email links from any device (PC, iPhone, iPad) and have them saved by Scrapbook</li>
<li>A centralized web-interface to browse/search/filter my scrapbook data.</li>
</ol>
<p>I&#8217;ll start off with the less visually-stunning hack (email bridge), but by far the craftier of the two.</p>
<p><strong>Hack #1 &#8211; Scrapbook Email Interface</strong></p>
<p>When I began synchronizing my Scrapbook data between 2-3 computers, this solved a huge problem with being able to save webpages from anywhere.  Since 2009 a lot has changed, and devices like iPhone and iPad (yes, Apple fan boy to a degree) have changed the way we consume news.  Recently I&#8217;ve been using apps on the iPad like Zite and Flipboard to consolidate my Twitter, Facebook, and Google Reader feeds into a single personalized newspaper.  This means that now &gt; 50% of my reading time is spent on a device that has no visibility into my Scrapbook data.  I simply wanted a way to email a link (a feature built natively into these apps) and have it automagically saved into my Scrapbook folder.  I could have simply cut corners and written a script to hand-edit the Scrapbook RDF files and save the web page using something like wget or curl.  But, it just wouldn&#8217;t be the same&#8230; I want the webpage saved EXACTLY as Firefox would normally render and save it.</p>
<p>This poses a bit of a technical challenge, since Scrapbook runs inside Firefox and there&#8217;s no native way to interface with a plugin running inside a browser.  After researching a number of approaches, I came across 2 Firefox plugins that let you build interfaces inside Firefox (HTTP, telnet, etc.) that actually let you control the browser and execute JavaScript.  Of the 2 plugins, <a href="https://addons.mozilla.org/en-us/firefox/addon/pow-plain-old-webserver/">POW </a>and <a href="https://addons.mozilla.org/en-US/firefox/addon/mozrepl/">MozRepl</a>, I decided to go with POW (Plain Old Webserver).  Both plugins are wicked cool in the sense that they&#8217;re non-traditional and very powerful.  POW runs a webserver inside Firefox and lets you run your &#8216;server-side&#8217; scripts as JavaScript.  I&#8217;ve basically written a server process that runs INSIDE the client and executes XPCOM/JavaScript to control the web browser windows and invoke the Scrapbook plugin API directly.</p>
<p>The setup process is simple:</p>
<ol>
<li>Set up and install the POW and Scrapbook plugins in your browser</li>
<li>Configure POW to run on a desired port and create a new directory /scrapbook/</li>
<li>Copy the index.sjs (server-side javascript) to this new /scrapbook/ directory</li>
<li>Set up a new email box or alias (e.g. yourusername+scrapbook@gmail.com)</li>
<li>Either run scrapbook2email.pl manually or run it as a cron job every couple of minutes</li>
<li>Simply send emails to your new Scrapbook email, run the email script, and watch your pages be saved automatically</li>
</ol>
<p>At a high-level this is accomplished with 2 scripts:</p>
<p><strong>Email Interface script (Perl)</strong></p>
<p>This script uses IMAP to retrieve scrapbook email requests from a designated folder. Along with doing basic sender/recipient validation, the script also handles both plain text and multipart messages.  Once the email request is parsed, the link to the web page to be saved is extracted.  The script then contacts the POW server and passes along the requested URL (e.g. http://127.0.0.1:6670/scrapbook/?url=http://yourwebpagetobesaved.com/?articleID=3q4e3332).  Note that this version of the script requires that Firefox/POW be running and makes no attempt to launch them for you.</p>
<p>For a copy of the script click <a href="https://github.com/ericblue/Scrapbook-Email-Interface/blob/master/email2scapbook.pl" target="_blank">here</a> (GitHub).</p>
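<p>To make the request flow concrete, here is a minimal Java sketch of the two steps described above: pulling the first link out of an email body and turning it into a POW request.  This is not the Perl script itself.  The class and method names are hypothetical, the link pattern is deliberately naive, and percent-encoding the target URL is an assumption on my part (the example URL above passes it unencoded).</p>

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class PowRequestSketch {
    // Endpoint copied from the example URL in the post; adjust host/port to your POW setup.
    static final String POW_ENDPOINT = "http://127.0.0.1:6670/scrapbook/";

    // Naive pattern: the first run of non-whitespace characters starting with http:// or https://.
    private static final Pattern LINK = Pattern.compile("https?://\\S+");

    // Returns the first link found in the plain-text body, if any.
    static Optional<String> extractFirstLink(String body) {
        Matcher m = LINK.matcher(body);
        return m.find() ? Optional.of(m.group()) : Optional.empty();
    }

    // Builds the request URL; the target is percent-encoded (an assumption, see lead-in).
    static String buildPowRequest(String targetUrl) {
        return POW_ENDPOINT + "?url=" + URLEncoder.encode(targetUrl, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String body = "Saw this on Zite:\nhttp://example.com/article?id=42\nSent from my iPad";
        extractFirstLink(body)
                .map(PowRequestSketch::buildPowRequest)
                .ifPresent(System.out::println);
    }
}
```

<p>A real implementation would still need the IMAP retrieval and sender/recipient validation mentioned above, which this sketch leaves out.</p>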
<p><strong>Scrapbook/POW Bridge (Server-Side Javascript)</strong></p>
<p>This script does the heavy lifting, and is essentially running at the other end of the POW server URL (http://127.0.0.1:6670/scrapbook/). Once the requested URL is detected the browser will spawn a new tab, automatically execute the Scrapbook Capture request, and save the webpage to a new top-level folder (e.g. Unfiled/MM-DD-YYYY). This script was tested with Scrapbook v.1.3.7.</p>
<p>For a copy of the script click <a href="https://github.com/ericblue/Scrapbook-Email-Interface/blob/master/index.sjs" target="_blank">here</a> (GitHub).</p>
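<p>As a small illustration of the folder naming mentioned above, this Java sketch builds a name in the Unfiled/MM-DD-YYYY form from a date.  The real bridge is server-side JavaScript, so the class and method here are hypothetical and only show the naming pattern.</p>

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;

class CaptureFolderSketch {
    // Two-digit month and day, four-digit year, separated by dashes.
    private static final DateTimeFormatter MM_DD_YYYY = DateTimeFormatter.ofPattern("MM-dd-yyyy");

    // Returns the folder a capture made on the given day would land in.
    static String folderFor(LocalDate day) {
        return "Unfiled/" + day.format(MM_DD_YYYY);
    }

    public static void main(String[] args) {
        System.out.println(folderFor(LocalDate.of(2011, 4, 3))); // prints Unfiled/04-03-2011
    }
}
```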
<p>It&#8217;s nifty now to email a link to my Scrapbook Bot and within a couple of minutes a little notify popup shows in Firefox indicating my page was saved.</p>
<p><strong>Hack #2 &#8211; Scrapbook Browser</strong></p>
<p>This code was actually written back in Dec &#8216;09 after I wrote the synchronize blog post (and around the time I wrote the Document Viewer), however I haven&#8217;t shared it until now.  What I&#8217;ve done is write a simple Perl/jQuery web app that uses Simile&#8217;s Exhibit to view Scrapbook data in a tile, table, or timeline.  This interface also has a file/folder view so you can browse snippets just like you can through the native Scrapbook plugin interface within Firefox.</p>
<p>Here are some screenshots:</p>
<p><strong>Tile View</strong></p>
<p style="text-align: center;"><a title="scrapbook-tile by ericblue76, on Flickr" href="http://www.flickr.com/photos/56683314@N00/5585601153/"><img class="aligncenter" src="http://farm6.static.flickr.com/5190/5585601153_605e15c3fd.jpg" alt="scrapbook-tile" width="500" height="310" /></a></p>
<p><strong>Timeline View</strong></p>
<p style="text-align: center;"><a title="scrapbook-timeline by ericblue76, on Flickr" href="http://www.flickr.com/photos/56683314@N00/5585600779/"><img class="aligncenter" src="http://farm6.static.flickr.com/5067/5585600779_e8533b8361.jpg" alt="scrapbook-timeline" width="500" height="315" /></a></p>
<p><strong>Table View</strong></p>
<p style="text-align: center;"><a title="scrapbook-table by ericblue76, on Flickr" href="http://www.flickr.com/photos/56683314@N00/5585601333/"><img class="aligncenter" src="http://farm6.static.flickr.com/5228/5585601333_7d561d97d9.jpg" alt="scrapbook-table" width="500" height="310" /></a></p>
<p><strong>Folder View</strong></p>
<p style="text-align: center;"><a title="scrapbook-folder by ericblue76, on Flickr" href="http://www.flickr.com/photos/56683314@N00/5585601017/"><img class="aligncenter" src="http://farm6.static.flickr.com/5308/5585601017_c970c25570.jpg" alt="scrapbook-folder" width="500" height="289" /></a></p>
<p style="text-align: left;">To download the code click <a href="https://github.com/ericblue/Scrapbook-Browser" target="_blank">here</a> (GitHub).</p>
]]></content:encoded>
			<wfw:commentRss>http://eric-blue.com/2011/04/03/firefox-scrapbook-hacks-viewing-and-saving-webpages-from-anywhere/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
		</item>
		<item>
		<title>Learning Faster &#8211; Automatically Extract Highlighted Text from PDF Documents</title>
		<link>http://eric-blue.com/2010/12/17/learning-faster-automatically-extract-highlighted-text-from-pdf-documents/</link>
		<comments>http://eric-blue.com/2010/12/17/learning-faster-automatically-extract-highlighted-text-from-pdf-documents/#comments</comments>
		<pubDate>Fri, 17 Dec 2010 07:46:07 +0000</pubDate>
		<dc:creator>ericblue76</dc:creator>
				<category><![CDATA[Java]]></category>
		<category><![CDATA[Knowledge Management]]></category>
		<category><![CDATA[Learning]]></category>
		<category><![CDATA[Open Source]]></category>
		<category><![CDATA[Research]]></category>
		<category><![CDATA[Technology]]></category>
		<category><![CDATA[Tips]]></category>

		<guid isPermaLink="false">http://eric-blue.com/?p=1305</guid>
		<description><![CDATA[Overview
 I never really considered myself a “highlighter” until a couple years ago.  Back in school I would, on occasion, highlight some interesting passages while doing homework or reading books and jot them down later. ...]]></description>
			<content:encoded><![CDATA[<p><strong>Overview</strong></p>
<p><a href="http://eric-blue.com/wp-content/uploads/2010/12/highlight.jpg"><img class="size-medium wp-image-1308 alignright" title="Image courtesy http://www.flickr.com/photos/liveandrock/" src="http://eric-blue.com/wp-content/uploads/2010/12/highlight-300x225.jpg" alt="Image courtesy http://www.flickr.com/photos/liveandrock/" width="300" height="225" /></a> I never really considered myself a “highlighter” until a couple years ago.  Back in school I would, on occasion, highlight some interesting passages while doing homework or reading books and jot them down later.  More often than not though many of those highlights would go to waste.  After all, what good is highlighting interesting bits of text if you don’t use them later?  My highlight compulsion increased about 6 years ago when I dove head first into mindmapping and started experimenting with a technique called MMOST (Mind Map Organic Study Technique).  In a nutshell, MMOST is a strategy for quickly digesting books and summarizing what you’ve learned into a mindmap so you can recall or reference it at a later date.  For a great intro to the MMOST technique, check out the post on <a href="http://kentblumberg.typepad.com/kent_blumberg/2006/12/how_to_study_a_.html">How to Understand a Business Book in Four Hours</a>.  What does highlighting have to do with MMOST?  While I’m reading a book I’ll highlight the passages that stick out to me and use those as the basis for creating the mindmap summary.  It can take a lot of time, but the process of highlighting, reviewing, and creating the mindmap can significantly improve your recall and what you get out of a book (or any research project).</p>
<p>Another big change happened earlier this year when I started using an iPad.  I’ve been gradually accumulating more digital books (using PDFs and purchasing books through Amazon using Kindle).  After using Kindle for a short time I was blown away by the feature that lets you highlight book passages and get summaries of the highlighted text and page number (the direct URL is <a href="http://kindle.amazon.com/your_highlights">http://kindle.amazon.com/your_highlights</a>).  This is REALLY useful for accelerating the summarizing process and the beauty of it is that it’s automatic &#8211; the extraction just works!  Around the time I started using Kindle for iPad I discovered a fantastic PDF Document reader called <a href="http://www.goodiware.com/goodreader.html">GoodReader</a>.</p>
<p><a href="http://eric-blue.com/wp-content/uploads/2010/12/goodreader.jpg"><img class="aligncenter size-full wp-image-1319" title="goodreader" src="http://eric-blue.com/wp-content/uploads/2010/12/goodreader.jpg" alt="" width="553" height="415" /></a></p>
<p>GoodReader is a full-featured document reader with some powerful features.  Not only can you take all of your documents on the go, you can access them remotely using WebDAV, Google Docs, DropBox, Email, and other online services.  Starting a couple months ago it got even better by supporting PDF highlighting and annotations.  I thought to myself, “Hey, it would be great if I could somehow extract all my highlighted text just like Kindle.  I could TRIPLE the number of books I read and create summaries for almost all of them!”.  It turns out this IS possible, but it is nowhere near as simple as I initially hoped.  I dove down the deep rabbit hole of reviewing the ~ 1,000 page Adobe PDF specification, hacked and tinkered with Perl and Java code, reviewed numerous open source and commercial offerings, and have emerged (slightly scathed but wiser) with some good solutions.</p>
<p><strong>The Challenge</strong></p>
<p>I won’t get into the nitty-gritty details here, but what would seem a simple operation of extracting highlighted text from a PDF turns out to be exceedingly difficult depending on what strategy you use.  In fact, as near as I can tell, there is no existing open source or commercial solution that can reliably extract the text with 100% accuracy from all documents.  The main challenge with PDF is that it isn’t a markup language like HTML that will explicitly tell you how text should be rendered.  For example:</p>
<blockquote>
<p style="text-align: left;">This is an &lt;b&gt;example&lt;/b&gt; &lt;highlight&gt;sentence that I would like to highlight&lt;/highlight&gt;.</p>
</blockquote>
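<p>If highlights really were marked up this way, extraction would take only a few lines.  The Java sketch below assumes the made-up &lt;highlight&gt; tag from the example above (it is not a real HTML or PDF element) and the class name is hypothetical.</p>

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class MarkupHighlightSketch {
    // Non-greedy match of everything between an opening and closing highlight tag.
    private static final Pattern HIGHLIGHT =
            Pattern.compile("<highlight>(.*?)</highlight>", Pattern.DOTALL);

    // Returns the text of every highlighted span, in document order.
    static List<String> highlights(String markup) {
        List<String> found = new ArrayList<>();
        Matcher m = HIGHLIGHT.matcher(markup);
        while (m.find()) {
            // Drop inline tags such as <b> that may sit inside the highlighted span.
            found.add(m.group(1).replaceAll("<[^>]+>", "").trim());
        }
        return found;
    }

    public static void main(String[] args) {
        String line = "This is an <b>example</b> "
                + "<highlight>sentence that I would like to highlight</highlight>.";
        System.out.println(highlights(line)); // prints [sentence that I would like to highlight]
    }
}
```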
<p>The PDF format, while parsable, uses concepts like dictionaries, objects, streams and coordinate systems that tell PDF readers how to correctly render the doc. What this means is that things like annotations (notes) and highlights are rendered separately from the text itself.  The best way to visualize this is to think of the highlighted PDF as having 2 distinct layers: the top layer is the highlight itself and the bottom layer is the text.  The straightforward strategy is to simply say: “Find the X,Y coordinates of the region of highlight, then find the X,Y coordinates of all text in that same region and simply copy it”.  Well, the unfortunate complexity is that in order to find the coordinates of the text you also have to take into consideration the font type and size of the font.  After many hours of hacking with only minimal success, I’ve concluded that this method is not currently possible without a lot of additional coding.  And, unless somebody can point me in the right direction, I haven’t found any open source or commercial offerings that do this.  OK, so you’re probably wondering why I’ve made you read this much of the post only to tell you it’s not technically possible.  It is possible, just using a slightly different method.</p>
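<p>The “two layers” idea can be sketched in Java as a simple rectangle intersection test.  This is only the easy half of the problem: the Word class is a hypothetical stand-in, and its bounding boxes are given by hand, whereas in a real PDF you would have to compute them from the font, font size, and spacing, which is exactly the hard part described above.</p>

```java
import java.awt.geom.Rectangle2D;
import java.util.List;
import java.util.stream.Collectors;

class RegionMatchSketch {
    // Hypothetical positioned word; a real PDF library would have to derive the box.
    static final class Word {
        final String text;
        final Rectangle2D.Float box;

        Word(String text, Rectangle2D.Float box) {
            this.text = text;
            this.box = box;
        }
    }

    // Joins, in list order, every word whose box overlaps the highlight rectangle.
    static String textUnder(Rectangle2D.Float highlight, List<Word> words) {
        return words.stream()
                .filter(w -> highlight.intersects(w.box))
                .map(w -> w.text)
                .collect(Collectors.joining(" "));
    }

    public static void main(String[] args) {
        List<Word> words = List.of(
                new Word("This", new Rectangle2D.Float(10, 100, 30, 12)),
                new Word("sentence", new Rectangle2D.Float(45, 100, 60, 12)),
                new Word("matters", new Rectangle2D.Float(110, 100, 50, 12)),
                new Word("Next", new Rectangle2D.Float(10, 120, 30, 12)));
        Rectangle2D.Float highlight = new Rectangle2D.Float(44, 99, 120, 14);
        System.out.println(textUnder(highlight, words)); // prints sentence matters
    }
}
```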
<p><strong>The Solutions</strong></p>
<p>It turns out that you can automatically extract the highlight with 100% accuracy, but there is a caveat that requires a little more manual work.  It sounds much more painful than it really is.  The trick is to not only highlight the passage of text, but also copy the text and paste it as an annotation (note) on top of the highlight.  For GoodReader it’s simply a matter of a couple extra clicks.  And for people who use Adobe Acrobat or Acrobat Reader, there is an option in most versions to automatically copy/paste text into a note whenever you select text to highlight (Go to Settings -&gt; Commenting Preferences -&gt; &#8220;Copy selected text into Highlight, Cross-Out, and Underline comment pop-ups.&#8221;).  Here’s how you accomplish this using GoodReader as of v3.2.0:</p>
<ol>
<li>Select the text you would like to highlight and select Copy.  As soon as you click Copy, the menu option above the text will remain.</li>
<li>Next select the Highlight option.  At this point the text will now be highlighted.</li>
<li>Tap the highlighted text and select the Open option.  A note dialogue will appear.</li>
<li>Hold down for 2 seconds on the note until the Paste option appears and select it.  Click Save.</li>
</ol>
<p>Basically 6 quick clicks/taps and you’re done.  It’s not ideal, but certainly a good trade-off if it means you get to extract automatically and have 100% reliability.  Now, there are a couple options for easily extracting your highlights.</p>
<p><strong>Option 1 &#8211; Use a PDF Reader to create highlight summaries</strong></p>
<p>If you have the money, Adobe Acrobat has many features that let you view and print all of your annotations (notes, highlights, etc.).  Although it is not significantly cost prohibitive, most people (myself included) don&#8217;t really want to spend money if they can find a comparable free or open source solution.  <a href="http://get.adobe.com/reader/">Adobe Acrobat Reader </a>(the free version most people use) does allow you to view the highlights in a summary pane, but doesn&#8217;t allow you to extract and print them.  (You&#8217;ll notice that if you don&#8217;t create the annotated note with your highlight, the entry will show blank.)  The best free PDF viewer that I experimented with is <a href="http://www.foxitsoftware.com/pdf/reader/">Foxit Reader </a>and it allows you to easily create a PDF summary of your highlights.  Simply go to Comments -&gt; Summary Comments and you&#8217;ll be prompted to save a new PDF file that only contains the highlighted text along with the page number.</p>
<p><a href="http://eric-blue.com/wp-content/uploads/2010/12/foxit.jpg"><img class="aligncenter size-large wp-image-1324" title="foxit" src="http://eric-blue.com/wp-content/uploads/2010/12/foxit-1024x669.jpg" alt="" width="535" height="348" /></a></p>
<p><strong>Option 2 &#8211; Programmatically extract highlights</strong></p>
<p>For those inclined to hack, there are a couple open source options for parsing PDF files.  I first started experimenting with a great Perl module called <a href="http://search.cpan.org/dist/CAM-PDF/">CAM::PDF</a>.  After a few weekends of tinkering around and subsequently needing to dig into the official Adobe PDF specification I realized how complicated PDF parsing, rendering, and text extraction can be.  CAM::PDF does make it easy to parse the overall structure of the document and extract text for an entire page, but it is very difficult to extract text at exact coordinates (for a number of technical reasons).  At this point I was still trying to solve the problem with the original strategy of extracting text by x,y coordinates, and after researching for countless hours I realized my open source options were limited.  My next step was to experiment with <a href="http://pdfbox.apache.org/">PDFBox</a>, an Apache open source Java PDF library.  After some searching I was very excited to at least scratch the surface and get preliminary results of text extraction based on the highlight x,y coordinates.  I soon discovered that needing to take the font style, orientation, and spacing into consideration to grab the exact text would prove to be time consuming.  I haven&#8217;t yet found other examples, or reached out on the mailing list, but I&#8217;m sure with sufficient determination and time this could be done.  Not wanting to devote this amount of time right now to solve this problem, I opted to go for the pragmatic solution of saving the note and extracting that.  For those interested, I&#8217;ve attached some very simple test code that will extract the annotated comment and I&#8217;ve included commented out code for doing very basic (and not yet accurate) extraction based on region/coordinates.  When I have more time I may make this a standalone executable so you can run it from the command-line and bulk extract highlights from multiple documents:</p>
<div id="wpshdo_1" class="wp-synhighlighter-outer"><div id="wpshdi_1" class="wp-synhighlighter-inner" style="display: block;"><pre class="java" style="font-family:monospace;"><span class="kw1">import</span> <span class="co2">java.awt.geom.Rectangle2D</span><span class="sy0">;</span>
<span class="kw1">import</span> <span class="co2">java.io.File</span><span class="sy0">;</span>
<span class="kw1">import</span> <span class="co2">java.util.List</span><span class="sy0">;</span>
<span class="kw1">import</span> <span class="co2">org.apache.pdfbox.pdmodel.PDDocument</span><span class="sy0">;</span>
<span class="kw1">import</span> <span class="co2">org.apache.pdfbox.pdmodel.PDPage</span><span class="sy0">;</span>
<span class="kw1">import</span> <span class="co2">org.apache.pdfbox.pdmodel.common.PDRectangle</span><span class="sy0">;</span>
<span class="kw1">import</span> <span class="co2">org.apache.pdfbox.pdmodel.interactive.annotation.PDAnnotation</span><span class="sy0">;</span>
<span class="kw1">import</span> <span class="co2">org.apache.pdfbox.util.PDFTextStripperByArea</span><span class="sy0">;</span>
<span class="kw1">public</span> <span class="kw1">class</span> ExtractHighlights <span class="br0">&#123;</span>
<span class="kw1">public</span> <span class="kw1">static</span> <span class="kw4">void</span> main<span class="br0">&#40;</span><a href="http://www.google.com/search?hl=en&amp;q=allinurl%3Astring+java.sun.com&amp;btnI=I%27m%20Feeling%20Lucky"><span class="kw3">String</span></a> args<span class="br0">&#91;</span><span class="br0">&#93;</span><span class="br0">&#41;</span> <span class="br0">&#123;</span>
<span class="kw1">try</span> <span class="br0">&#123;</span>
PDDocument pddDocument <span class="sy0">=</span> PDDocument.<span class="me1">load</span><span class="br0">&#40;</span><span class="kw1">new</span> <a href="http://www.google.com/search?hl=en&amp;q=allinurl%3Afile+java.sun.com&amp;btnI=I%27m%20Feeling%20Lucky"><span class="kw3">File</span></a><span class="br0">&#40;</span><span class="st0">&quot;sample.pdf&quot;</span><span class="br0">&#41;</span><span class="br0">&#41;</span><span class="sy0">;</span>
<a href="http://www.google.com/search?hl=en&amp;q=allinurl%3Alist+java.sun.com&amp;btnI=I%27m%20Feeling%20Lucky"><span class="kw3">List</span></a> allPages <span class="sy0">=</span> pddDocument.<span class="me1">getDocumentCatalog</span><span class="br0">&#40;</span><span class="br0">&#41;</span>.<span class="me1">getAllPages</span><span class="br0">&#40;</span><span class="br0">&#41;</span><span class="sy0">;</span>
<span class="kw1">for</span> <span class="br0">&#40;</span><span class="kw4">int</span> i <span class="sy0">=</span> <span class="nu0">0</span><span class="sy0">;</span> i <span class="sy0">&lt;</span> allPages.<span class="me1">size</span><span class="br0">&#40;</span><span class="br0">&#41;</span><span class="sy0">;</span> i<span class="sy0">++</span><span class="br0">&#41;</span> <span class="br0">&#123;</span>
<span class="kw4">int</span> pageNum <span class="sy0">=</span> i <span class="sy0">+</span> <span class="nu0">1</span><span class="sy0">;</span>
PDPage page <span class="sy0">=</span> <span class="br0">&#40;</span>PDPage<span class="br0">&#41;</span> allPages.<span class="me1">get</span><span class="br0">&#40;</span>i<span class="br0">&#41;</span><span class="sy0">;</span>
List<span class="sy0">&lt;</span>PDAnnotation<span class="sy0">&gt;</span> la <span class="sy0">=</span> page.<span class="me1">getAnnotations</span><span class="br0">&#40;</span><span class="br0">&#41;</span><span class="sy0">;</span>
<span class="kw1">if</span> <span class="br0">&#40;</span>la.<span class="me1">size</span><span class="br0">&#40;</span><span class="br0">&#41;</span> <span class="sy0">&lt;</span> 1<span class="br0">&#41;</span> <span class="br0">&#123;</span>
<span class="kw1">continue</span><span class="sy0">;</span>
<span class="br0">&#125;</span>
<a href="http://www.google.com/search?hl=en&amp;q=allinurl%3Asystem+java.sun.com&amp;btnI=I%27m%20Feeling%20Lucky"><span class="kw3">System</span></a>.<span class="me1">out</span>.<span class="me1">println</span><span class="br0">&#40;</span><span class="st0">&quot;Total annotations = &quot;</span> <span class="sy0">+</span> la.<span class="me1">size</span><span class="br0">&#40;</span><span class="br0">&#41;</span><span class="br0">&#41;</span><span class="sy0">;</span>
<a href="http://www.google.com/search?hl=en&amp;q=allinurl%3Asystem+java.sun.com&amp;btnI=I%27m%20Feeling%20Lucky"><span class="kw3">System</span></a>.<span class="me1">out</span>.<span class="me1">println</span><span class="br0">&#40;</span><span class="st0">&quot;<span class="es0">\n</span>Process Page &quot;</span> <span class="sy0">+</span> pageNum <span class="sy0">+</span> <span class="st0">&quot;...&quot;</span><span class="br0">&#41;</span><span class="sy0">;</span>
<span class="co1">// Just get the first annotation for testing</span>
PDAnnotation pdfAnnot <span class="sy0">=</span> la.<span class="me1">get</span><span class="br0">&#40;</span>0<span class="br0">&#41;</span><span class="sy0">;</span>
<a href="http://www.google.com/search?hl=en&amp;q=allinurl%3Asystem+java.sun.com&amp;btnI=I%27m%20Feeling%20Lucky"><span class="kw3">System</span></a>.<span class="me1">out</span>.<span class="me1">println</span><span class="br0">&#40;</span><span class="st0">&quot;Annot type = &quot;</span> <span class="sy0">+</span> pdfAnnot.<span class="me1">getSubtype</span><span class="br0">&#40;</span><span class="br0">&#41;</span><span class="br0">&#41;</span><span class="sy0">;</span>
<a href="http://www.google.com/search?hl=en&amp;q=allinurl%3Asystem+java.sun.com&amp;btnI=I%27m%20Feeling%20Lucky"><span class="kw3">System</span></a>.<span class="me1">out</span>.<span class="me1">println</span><span class="br0">&#40;</span><span class="st0">&quot;Modified date = &quot;</span> <span class="sy0">+</span> pdfAnnot.<span class="me1">getModifiedDate</span><span class="br0">&#40;</span><span class="br0">&#41;</span><span class="br0">&#41;</span><span class="sy0">;</span>
<a href="http://www.google.com/search?hl=en&amp;q=allinurl%3Asystem+java.sun.com&amp;btnI=I%27m%20Feeling%20Lucky"><span class="kw3">System</span></a>.<span class="me1">out</span>.<span class="me1">println</span><span class="br0">&#40;</span><span class="st0">&quot;Rectangle = &quot;</span> <span class="sy0">+</span> pdfAnnot.<span class="me1">getRectangle</span><span class="br0">&#40;</span><span class="br0">&#41;</span><span class="br0">&#41;</span><span class="sy0">;</span>
<span class="co1">// Sample code taken from Canoo unit test - extractAnnotations</span>
<span class="co1">// See https://svn.canoo.com/trunk/webtest/src/main/java/com/canoo/webtest/plugins/pdftest/htmlunit/pdfbox/PdfBoxPDFPage.java</span>
<span class="co1">// Experimental - Not completely working since rectangle doesn't take font size/spacing into account</span>
<span class="co1">// PDFTextStripperByArea stripper = new PDFTextStripperByArea();</span>
<span class="co1">// stripper.setSortByPosition(true);</span>
<span class="co1">//</span>
<span class="co1">// PDRectangle rect = pdfAnnot.getRectangle();</span>
<span class="co1">// float x = rect.getLowerLeftX() - 1;</span>
<span class="co1">// float y = rect.getUpperRightY() - 1;</span>
<span class="co1">// float width = rect.getWidth() + 2;</span>
<span class="co1">// float height = rect.getHeight() + rect.getHeight() / 4;</span>
<span class="co1">// int rotation = page.findRotation();</span>
<span class="co1">// if (rotation == 0) {</span>
<span class="co1">//     PDRectangle pageSize = page.findMediaBox();</span>
<span class="co1">//       y = pageSize.getHeight() - y;</span>
<span class="co1">//}</span>
<span class="co1">//</span>
<span class="co1">// Rectangle2D.Float awtRect = new Rectangle2D.Float(x, y, width, height);</span>
<span class="co1">// stripper.addRegion(Integer.toString(0), awtRect);</span>
<span class="co1">// stripper.extractRegions(page);</span>
<span class="co1">//</span>
<span class="co1">// System.out.println(&quot;Getting text from region = &quot; + awtRect + &quot;\n&quot;);</span>
<span class="co1">// System.out.println(stripper.getTextForRegion(Integer.toString(0)));</span>
<a href="http://www.google.com/search?hl=en&amp;q=allinurl%3Asystem+java.sun.com&amp;btnI=I%27m%20Feeling%20Lucky"><span class="kw3">System</span></a>.<span class="me1">out</span>.<span class="me1">println</span><span class="br0">&#40;</span><span class="st0">&quot;Getting text from comment = &quot;</span> <span class="sy0">+</span> pdfAnnot.<span class="me1">getContents</span><span class="br0">&#40;</span><span class="br0">&#41;</span><span class="br0">&#41;</span><span class="sy0">;</span>
<span class="br0">&#125;</span>
pddDocument.<span class="me1">close</span><span class="br0">&#40;</span><span class="br0">&#41;</span><span class="sy0">;</span>
<span class="br0">&#125;</span> <span class="kw1">catch</span> <span class="br0">&#40;</span><a href="http://www.google.com/search?hl=en&amp;q=allinurl%3Aexception+java.sun.com&amp;btnI=I%27m%20Feeling%20Lucky"><span class="kw3">Exception</span></a> ex<span class="br0">&#41;</span> <span class="br0">&#123;</span>
ex.<span class="me1">printStackTrace</span><span class="br0">&#40;</span><span class="br0">&#41;</span><span class="sy0">;</span>
<span class="br0">&#125;</span>
<span class="br0">&#125;</span>
<span class="br0">&#125;</span></pre></div></div>
<p>Of all the APIs I reviewed, PDFBox appears to be one of the best: enumerating the annotations is easy, extracting the note is just as simple, and the basic API is there to extract highlights with no need for the note (just be prepared to dig in and do some work).  I also spent some time researching Adobe&#8217;s JavaScript API and saw some forum posts where a person mentioned writing a JavaScript plugin for Adobe Acrobat Reader that extracted the highlight without the need for the notes.  However, I could not find a working example.  With further research I&#8217;m sure this could be another option.</p>
<p>For the short term, my practical solution is to use Foxit Reader to create the highlight summaries.  Foxit works under Wine (Linux), and I&#8217;ve been able to share my GoodReader docs over WiFi and mount that GoodReader share as a WebDAV folder.  This means that once I&#8217;m done reading and highlighting a PDF I can easily open it in Foxit Reader without needing to copy anything, generate the highlight summary, and save it back to my Documents folder.  Longer term, I&#8217;ll probably elaborate on the PDFBox code and write a program to automatically extract the highlights and save them as text, XML, or HTML.</p>
<p><strong>Other Links of Interest</strong></p>
<ul>
<li><a href="http://www.delicious.com/ericblue76/pdf">My PDF Bookmarks from Del.icio.us</a> (TONS of good links found during research)</li>
<li><a href="http://pypi.python.org/pypi/scrape-highlighted/0.1.0">Python &#8211; Scrape Highlighted</a> (Not portable, but uses a combo of Python, AppleScript and SkimPDF for Mac)</li>
<li><a href="http://www.unixuser.org/~euske/python/pdfminer/index.html">Python &#8211; PDF Miner</a></li>
<li><a href="http://www.windjack.com/product/pdfcanopener/">PDF Can Opener</a> (Inspects PDF docs)</li>
<li><a href="http://www.michaeltracylaw.com/attorney-tools.html">Acrobat Exhibit Highlighter</a> (Some highlight tools using JavaScript to enhance Acrobat)</li>
<li><a href="http://www.topicscape.com/Topicgrazer/help.php">Topic Grazer</a> (Windows &#8211; helps with text extraction)</li>
</ul>
<p>Happy Highlighting!</p>
]]></content:encoded>
			<wfw:commentRss>http://eric-blue.com/2010/12/17/learning-faster-automatically-extract-highlighted-text-from-pdf-documents/feed/</wfw:commentRss>
		<slash:comments>16</slash:comments>
		</item>
		<item>
		<title>Example Document Browser Code</title>
		<link>http://eric-blue.com/2010/02/12/example-document-browser-code/</link>
		<comments>http://eric-blue.com/2010/02/12/example-document-browser-code/#comments</comments>
		<pubDate>Fri, 12 Feb 2010 18:51:56 +0000</pubDate>
		<dc:creator>ericblue76</dc:creator>
				<category><![CDATA[Knowledge Management]]></category>
		<category><![CDATA[Open Source]]></category>
		<category><![CDATA[Software Development]]></category>
		<category><![CDATA[Wiki]]></category>

		<guid isPermaLink="false">http://eric-blue.com/?p=1039</guid>
		<description><![CDATA[Since I posted my article last month on How To Create Your Own Personal Document Viewer, I&#8217;ve had a few inquiries on how people could have a similar setup themselves.  I thought it might be ...]]></description>
			<content:encoded><![CDATA[<p>Since I posted my article last month on <a href="http://eric-blue.com/2010/01/03/how-to-create-your-own-personal-document-viewer-like-scribd-or-google-books/">How To Create Your Own Personal Document Viewer</a>, I&#8217;ve had a few inquiries about how people could build a similar setup themselves.  I thought it might be helpful to zip up the docbrowser project and show some of the code that does the conversions using the utilities I described in the article.  Disclaimer: This code is by <span style="text-decoration: underline;">no means</span> my finest work (it was hacked together on a Saturday afternoon), but it gets the job done.  At a high level the code is very simple:</p>
<ul>
<li>Determine the doc extension and perform the appropriate conversion (.doc, .pdf, .xls) or redirect to an external app (mindmapviewer or Google Books)</li>
<li>Assign the conversion command to be executed for each doc type</li>
<li>Before displaying a doc, look up the converted doc in the cache to speed up render time (the cache key is an MD5 hash of the title)</li>
</ul>
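<p>The three steps above can be sketched roughly as follows.  This is a minimal Python illustration, not the CGI script from the .zip file: the <code>CONVERTERS</code> table, the function names, and the cache layout are all assumptions made for the example.  Only the idea of keying the cache on an MD5 hash of the title comes from the list above.</p>

```python
import hashlib
import os
import tempfile

# Tool name and output extension per document type.
# Illustrative mapping only; the real script may assign commands differently.
CONVERTERS = {
    ".pdf": ("pdf2swf", ".swf"),
    ".doc": ("wvHtml", ".html"),
    ".xls": ("xlhtml", ".html"),
    ".rtf": ("unrtf", ".html"),
}


def cache_path(title, cache_dir):
    """Return (tool, path) for the converted copy of the document `title`."""
    ext = os.path.splitext(title)[1].lower()
    if ext not in CONVERTERS:
        raise ValueError("no converter registered for " + repr(ext))
    tool, out_ext = CONVERTERS[ext]
    # The cache file name is the MD5 hash of the document title.
    key = hashlib.md5(title.encode("utf-8")).hexdigest()
    return tool, os.path.join(cache_dir, key + out_ext)


def needs_conversion(title, cache_dir):
    """True when no converted copy of `title` is cached yet."""
    _, path = cache_path(title, cache_dir)
    return not os.path.exists(path)


cache_dir = tempfile.mkdtemp()
tool, path = cache_path("Resume.doc", cache_dir)
print(tool, os.path.basename(path))
print(needs_conversion("Resume.doc", cache_dir))  # nothing has been cached yet
open(path, "w").close()                           # pretend the converter ran
print(needs_conversion("Resume.doc", cache_dir))  # the cached copy is found
```

<p>Keying the cache on a hash of the title keeps the cached file names short and free of awkward characters.  A fuller version would run the chosen tool whenever <code>needs_conversion</code> returns true and then serve the cached file.</p>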
<p>I&#8217;ve created a .zip file (4.1MB) of the entire <a href="http://eric-blue.com/projects/docbrowser/doc_browser_1_0.zip">Doc Browser sample code</a>.  It contains the simple CGI conversion script, along with jQueryFileTree for rendering the doc tree, FlexPaper, and some sample documents.</p>
]]></content:encoded>
			<wfw:commentRss>http://eric-blue.com/2010/02/12/example-document-browser-code/feed/</wfw:commentRss>
		<slash:comments>11</slash:comments>
		</item>
		<item>
		<title>How To Create Your Own Personal Document Viewer (Like Scribd or Google Books)</title>
		<link>http://eric-blue.com/2010/01/03/how-to-create-your-own-personal-document-viewer-like-scribd-or-google-books/</link>
		<comments>http://eric-blue.com/2010/01/03/how-to-create-your-own-personal-document-viewer-like-scribd-or-google-books/#comments</comments>
		<pubDate>Sun, 03 Jan 2010 07:35:21 +0000</pubDate>
		<dc:creator>ericblue76</dc:creator>
				<category><![CDATA[Featured]]></category>
		<category><![CDATA[Knowledge Management]]></category>
		<category><![CDATA[Open Source]]></category>
		<category><![CDATA[Software Development]]></category>
		<category><![CDATA[Wiki]]></category>

		<guid isPermaLink="false">http://eric-blue.com/?p=977</guid>
		<description><![CDATA[Like most people, I have a large number of personal documents in a variety of formats (PDF, Excel, Word, RTF, PowerPoint, etc.).  For the typical user, organizing these documents in a  'My Documents' folder and having MS Office/Open Office/Adobe Acrobat installed simply gets the job done.  However, I've been looking for some sort of "Web 2.0" solution to view my documents while I'm on the go. And, since my <a href="http://eric-blue.com/my-projects/personal-memex/">knowledge manager is web-based</a>, I'd like a way to browse and embed personal documents directly in my wiki without needing any special software.]]></description>
			<content:encoded><![CDATA[<p><strong>Overview</strong></p>
<p>Like most people, I have a large number of personal documents in a variety of formats (PDF, Excel, Word, RTF, PowerPoint, etc.).  For the typical user, organizing these documents in a &#8216;My Documents&#8217; folder and having MS Office/Open Office/Adobe Acrobat installed simply gets the job done.  However, I&#8217;ve been looking for some sort of &#8220;Web 2.0&#8221; solution to view my documents while I&#8217;m on the go. And, since my <a href="http://eric-blue.com/my-projects/personal-memex/">knowledge manager is web-based</a>, I&#8217;d like a way to browse and embed personal documents directly in my wiki without needing any special software.</p>
<p>I&#8217;ve been impressed with services like <a href="http://www.scribd.com">Scribd</a> (think YouTube for Documents).  Most people have probably already used Scribd, but in case you haven&#8217;t, this service allows you to upload your documents (a variety of formats is supported) and view them online in Flash format.  The beauty of this service is that you can also share documents and embed them directly inside your website/blog/wiki.  While this works great for sharing certain types of documents, it&#8217;s not really appropriate for uploading my entire collection of documents (especially since many contain personal information).  So, I decided to figure out how to create my own hosted document/book viewer like <a href="http://www.scribd.com">Scribd</a> or <a href="http://books.google.com">Google Books</a>.</p>
<p><strong>Example</strong></p>
<p>The following embedded document browser was actually fairly straightforward to make.  In a nutshell, the viewer takes a PDF file that has been converted to Flash (using <a href="http://swftools.org/">SWFTools</a> &#8211; pdf2swf), and then uses an open source Flash viewer called <a href="http://www.devaldi.com/?page_id=260">FlexPaper</a> to help with navigation.</p>
<p><center> <iframe id="docviewer" name="docviewer" height="500" width="500" src="http://eric-blue.com/projects/docbrowser/example.html"></iframe> </center> </p>
<p>The navigation bar is fairly straightforward.  You can page up/down, go directly to a given page, zoom, print, and even select a thumbnail mode.  It currently lacks the ability to view full screen, <del datetime="2010-01-03T08:40:01+00:00">search</del> (Search was JUST added to <a href="http://code.google.com/p/flexpaper/">version 1.1</a>) or select text, so I created an additional option to view in HTML (using wvHtml) and to view the frame full screen.</p>
<p><strong>Open Source To The Rescue</strong></p>
<p>When I first started exploring ways to view all my docs in a web interface, I didn&#8217;t initially focus on Flash.  I figured it would be too difficult to make the end product look like Scribd (I was way wrong).  So, I evaluated a number of Linux command-line utilities to convert documents on the fly.  The following is a decent list of applications that can help with your conversion needs:</p>
<ul>
<li><a href="http://wvware.sourceforge.net/">wvWare</a> &#8211; A library for converting Word docs.  The utility I used most was wvHtml to convert from .doc directly to .html.</li>
<li><a href="http://freshmeat.net/projects/xlhtml/">xlHtml</a> &#8211; Converts Excel spreadsheets to HTML.</li>
<li><a href="http://pdftohtml.sourceforge.net/">PDFtoHtml</a> &#8211; Converts PDF documents to HTML.</li>
<li><a href="http://www.gnu.org/software/unrtf/unrtf.html">UnRTF</a> &#8211; Converts RTF to text or HTML.</li>
<li><a href="http://swftools.org/">SWFTools</a> &#8211; A collection of utilities to generate and work with SWF (Flash) files.</li>
</ul>
<p>There are apparently some ways to convert between various formats using <a href="http://delicious.com/ericblue76/openoffice+converter">Open Office on the command line</a> (e.g. JODConverter, PyODConverter, Unoconv, etc.). However, I haven&#8217;t yet spent time evaluating these approaches since my current setup seems to be working pretty well.</p>
<p><strong>DocBrowser Project</strong><br />
<a href="http://eric-blue.com/projects/docbrowser/"><img class="aligncenter size-full wp-image-1003" title="doc_browser_scaled" src="http://eric-blue.com/wp-content/uploads/2010/01/doc_browser_scaled.jpg" alt="" width="500" height="348" /></a><br />
I put up a very preliminary Document Browser prototype at <a href="http://eric-blue.com/projects/docbrowser/">http://eric-blue.com/projects/docbrowser/</a>.  The interface uses <a href="http://jquery.com/">jQuery</a> and <a href="http://abeautifulsite.net/2008/03/jquery-file-tree/">jQueryFileTree</a> to make the entire document folder available for browsing, just like Windows Explorer.</p>
<p>The doc viewer pane uses the same Flash-based interface as the iframe above for all .PDF docs.  The conversion script renders the other doc types (.doc, .xls, .rtf) as HTML using the tools listed above.  I&#8217;ve even added support for <a href="http://mindjet.com/">MindManager mindmaps</a> using my web-based <a href="http://eric-blue.com/projects/mindmapviewer/">mindmap viewer</a> to do conversions into FreeMind Flash on the fly.</p>
<p>Overall, I&#8217;m happy with the end result.  I&#8217;ve set up a customized version of the document browser to run on my personal web server at home.  I can now view my documents from my laptop while I&#8217;m on the road, and I&#8217;ve been able to embed documents directly in my wiki so I don&#8217;t have to spend time hunting for the right doc.</p>
<p><strong>Other Interesting Links</strong></p>
<ul>
<li>Open source Flash viewers &#8211; <a href="http://www.devaldi.com/?page_id=260">FlexPaper</a> and <a href="http://swfviewer.blogspot.com/">SWF Viewer/zViewer</a></li>
<li>PSView (Online viewer for PDF, Postscript, Word) &#8211; <a href="http://view.samurajdata.se/">http://view.samurajdata.se/</a></li>
<li>Vuzit (Online document viewer) and API &#8211; <a href="http://vuzit.com/">http://vuzit.com/</a></li>
</ul>
<p><b>Update</b>: Sample code has been posted here <a href="http://eric-blue.com/2010/02/12/example-document-browser-code/">http://eric-blue.com/2010/02/12/example-document-browser-code/</a></p>
]]></content:encoded>
			<wfw:commentRss>http://eric-blue.com/2010/01/03/how-to-create-your-own-personal-document-viewer-like-scribd-or-google-books/feed/</wfw:commentRss>
		<slash:comments>22</slash:comments>
		</item>
		<item>
		<title>Knowledge To Go: Put Your Wiki On Your IPhone</title>
		<link>http://eric-blue.com/2009/12/13/knowledge-to-go-put-your-wiki-on-your-iphone/</link>
		<comments>http://eric-blue.com/2009/12/13/knowledge-to-go-put-your-wiki-on-your-iphone/#comments</comments>
		<pubDate>Mon, 14 Dec 2009 05:39:14 +0000</pubDate>
		<dc:creator>ericblue76</dc:creator>
				<category><![CDATA[Featured]]></category>
		<category><![CDATA[Knowledge]]></category>
		<category><![CDATA[Knowledge Management]]></category>
		<category><![CDATA[Learning]]></category>
		<category><![CDATA[Technology]]></category>
		<category><![CDATA[Wiki]]></category>

		<guid isPermaLink="false">http://eric-blue.com/?p=932</guid>
		<description><![CDATA[Building my own personal knowledge manager has been quite a journey.  Over the last couple of years I've taken a "piecemeal" approach and slowly built up the features of my system one component at a time.  One major feature that has always been on my mind is data portability.  Last week I wrote an article on how to sync your digital scrapbook between multiple computers and even sync to your wiki.  This feature had me thinking about how I could take portability to the next level.]]></description>
			<content:encoded><![CDATA[<p>Building my own <a href="http://eric-blue.com/my-projects/personal-memex/">personal knowledge manager</a> has been quite a journey.  Over the last couple of years I&#8217;ve taken a &#8220;piecemeal&#8221; approach and slowly built up the features of my system one component at a time.  One major feature that has always been on my mind is data portability.  Last week I wrote an article on how to <a href="http://eric-blue.com/2009/12/07/how-to-synchronize-your-digital-scrapbook/">sync your digital scrapbook</a> between multiple computers and even sync to your wiki.  This feature had me thinking about how I could take portability to the next level.</p>
<p>Being able to access your personal information/knowledge from multiple places is the ultimate realization of total information ubiquity.  Being able to access all of your personal bookmarks, notes, contact information, journal entries, and research data from any computer is obviously useful.  Being able to access all of your personal knowledge from a handheld device like an iPhone is absolutely exciting!  Without sounding totally nostalgic, this type of portability is in large part a modern-day realization of what <a href="http://en.wikipedia.org/wiki/Vannevar_Bush">Vannevar Bush</a> envisioned in his article on the Memex (&#8220;<a href="http://www.theatlantic.com/doc/194507/bush/2">As We May Think</a>&#8221;).</p>
<blockquote><p>“Consider a future device for individual use, which is a sort of mechanized private file and library. It needs a name, and, to coin one at random, “memex” will do. A memex is a device in which an individual stores all his books, records, and communications, and which is mechanized so that it may be consulted with exceeding speed and flexibility. It is an enlarged intimate supplement to his memory.”</p></blockquote>
<p style="text-align: center;"><img class="aligncenter size-full wp-image-938" title="iphone-large-memex" src="http://eric-blue.com/wp-content/uploads/2009/12/iphone-large-memex.png" border="0" alt="" width="395" height="716" /></p>
<p>Currently, my personal wiki and other research data (bookmarks, PDFs, mindmaps, etc.) are stored on my private server (accessible only behind my firewall).  Unless I enable SSH access, my wiki content is not available from the Internet and I have no easy way to access it remotely.  I&#8217;ve been thinking for a while about the best approach for making this data completely portable.  After some experimentation, I&#8217;ve found an easy method for making my personal wiki completely accessible in offline mode right on my iPhone.  At a high level, you really only need to do two things:</p>
<blockquote><p>1) Find software that can take a snapshot of your wiki content and make it available for offline viewing</p>
<p>2) Find software that lets you save a copy of your snapshot wiki, store on the iPhone, and view in a web browser (actually both on the phone itself and another PC)</p></blockquote>
<p><strong>Creating a backup of your wiki</strong></p>
<p>There are a lot of applications out there that act as &#8216;spiders&#8217; that crawl your website and save local copies of your pages so you can view them in offline mode (no need for an Internet connection).  After trying a handful, one of the better applications I tested was <a href="http://www.httrack.com/">HTTrack</a> (available for Windows and Linux). I should note that I really did try to make this work with Scrapbook.  To date I&#8217;ve used Scrapbook to capture copies of pages with no problems.  However, it turns out that backing up a wiki pushes it to its limit&#8230; Scrapbook only makes one HTTP connection at a time, doesn&#8217;t have a configurable delay between requests (the default is 1 sec and this takes too long), its filtering options are not extensive enough, and the process of dynamically updating the HTML to support relative links took way too long.  In the end, HTTrack was the best solution for a complete wiki backup.</p>
<p>HTTrack is a highly configurable crawler that allows you to create a complete snapshot of your wiki (<a href="http://www.mediawiki.org/wiki/MediaWiki">MediaWiki</a> in my case).  Crawling a wiki turns out to be a little more complicated than crawling your typical website.  Because wikis offer a number of functions (editing pages, viewing history, printing and exporting in other formats), there are certain links that should not be included in the backup.  After some trial and error, I discovered that since I use <a href="http://semantic-mediawiki.org/wiki/Semantic_MediaWiki">Semantic MediaWiki</a> I needed to be even more careful with the links I wanted to include (many of the Special and Property pages took FOREVER to index).</p>
<p>I tried the Windows version of HTTrack (even under Wine on Linux) and the web client version as well.  However, I was not completely impressed with how either worked.  What I wanted was a command-line script to run the backup.  Luckily, I found a couple of <a href="http://www-public.it-sudparis.eu/~berger_o/weblog/2008/05/30/offline-backup-mediawiki-with-httrack/">websites that have used HTTrack</a> for this purpose and adapted their approach for my own needs.  Here is a copy of the script I used to create the offline snapshot of my wiki:</p>
<div id="wpshdo_2" class="wp-synhighlighter-outer"><div id="wpshdt_2" class="wp-synhighlighter-expanded"><table border="0" width="100%"><tr><td align="left" width="80%"><a name="#codesyntax_2"></a><a id="wpshat_2" class="wp-synhighlighter-title" href="#codesyntax_2"  onClick="javascript:wpsh_toggleBlock(2)" title="Click to show/hide code block">Code block</a></td><td align="right"><a href="#codesyntax_2" onClick="javascript:wpsh_code(2)" title="Show code only"><img border="0" style="border: 0 none" src="http://eric-blue.com/wp-content/plugins/wp-synhighlight/themes/default/images/code.png" /></a>&nbsp;<a href="#codesyntax_2" onClick="javascript:wpsh_print(2)" title="Print code"><img border="0" style="border: 0 none" src="http://eric-blue.com/wp-content/plugins/wp-synhighlight/themes/default/images/printer.png" /></a>&nbsp;<a href="http://eric-blue.com/wp-content/plugins/wp-synhighlight/About.html" target="_blank" title="Show plugin information"><img border="0" style="border: 0 none" src="http://eric-blue.com/wp-content/plugins/wp-synhighlight/themes/default/images/info.gif" /></a>&nbsp;</td></tr></table></div><div id="wpshdi_2" class="wp-synhighlighter-inner" style="display: block;"><pre class="text" style="font-family:monospace;">#! /bin/sh
# Inspired by blogpost from http://www-public.it-sudparis.eu/~berger_o/weblog/2008/05/30/offline-backup-mediawiki-with-httrack/
# -c4 number of multiple connections (--sockets[=N])
# -w mirror web sites (--mirror)
# -O backup directory
# -%P extended parsing, attempt to parse all links, even in unknown tags or Javascript (%P0 don't use) (--extended-parsing[=N])
# -N0 Saves files like in site Site-structure (default)
# -s0 follow robots.txt and meta robots tags (0=never,1=sometimes,* 2=always) (--robots[=N])
# -p7 Expert options, priority mode: 7 &gt; get html files before, then treat other files
# -S Expert option, stay on the same directory
# -a Expert option, stay on the same address
# -K0 keep original links (e.g. http://www.adr/link) (K0 *relative link, K absolute links, K3 absolute URI links) (--keep-links[=N])
# -A25000 maximum transfer rate in bytes/seconds (1000=1kb/s max) (--max-rate[=N])
# -F user-agent field (-F &quot;user-agent name&quot;) (--user-agent )
# -%s update hacks: various hacks to limit re-transfers when updating (identical size, bogus response..) (--updatehack)
# -x Build option, replace external html links by error pages
# -%x Build option, do not include any password for external password protected websites (%x0 include) (--no-passwords)
site=wiki:8080/memex
topurl=http://$site
backupdir=~/websites/memex
httrack -c4 -w $topurl/Special:Allpages \
-O &quot;$backupdir&quot; -%P -N0 -s0 -p7 -S -a -K0 \
-F &quot;Mozilla/4.5 (compatible; HTTrack 3.0x; Windows 98)&quot; \
-%s -x -%x  \
&quot;+*$site/index.php?*&quot; \
&quot;+*$site/mindmap*&quot; \
&quot;-*Special*&quot; \
&quot;-*Property*&quot; \
&quot;-$site/index.php?title=Property:*&quot; \
&quot;-$site/index.php?title=Special:*&quot; \
&quot;-*$site/Discussion:*&quot; \
&quot;-*$site/Help*&quot; \
&quot;-*/docs/*&quot; \
&quot;-*/wikifiles/*&quot; \
&quot;-*month=*&amp;year=*&quot; \
&quot;-*action=edit&quot; \
&quot;-*action=formedit&quot; \
&quot;-*action=history&quot; \
&quot;-*printable=yes&quot; \
&quot;-*oldid=*&quot; \
&quot;+*$site/images/*&quot; \
&quot;+*.css&quot; \
&quot;+*.js&quot;</pre></div></div>
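<p>To get a feel for how a list of <code>+</code>/<code>-</code> filters like the one above decides which pages are kept, here is a small Python simulation.  It is only a rough sketch and not HTTrack&#8217;s actual matching code: it assumes that rules are scanned in order, that the last matching rule wins, and that <code>*</code> is the only wildcard.  The helper names and the shortened rule list are made up for the example.</p>

```python
import re


def wildcard_to_regex(pattern):
    """Compile a filter pattern where "*" matches any run of characters."""
    parts = [re.escape(part) for part in pattern.split("*")]
    return re.compile("^" + ".*".join(parts) + "$")


def is_allowed(url, rules, default=True):
    """Scan every rule in order; the last rule that matches decides."""
    verdict = default
    for rule in rules:
        sign, pattern = rule[0], rule[1:]
        if wildcard_to_regex(pattern).match(url):
            verdict = (sign == "+")
    return verdict


# Shortened, illustrative subset of the filters in the script above.
RULES = [
    "+*wiki:8080/memex/index.php?*",
    "-*Special*",
    "-*action=edit",
    "-*action=history",
    "-*printable=yes",
    "+*.css",
]

base = "http://wiki:8080/memex/"
for page in ("index.php?title=Main_Page",
             "index.php?action=edit",
             "index.php?title=Special:Allpages",
             "skins/main.css"):
    print(page, is_allowed(base + page, RULES))
```

<p>Under these assumptions, regular article pages and stylesheets are kept, while edit links and Special pages are dropped by the later exclusion rules.  That ordering is why the broad <code>+</code> rule for <code>index.php</code> comes first in the list.</p>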
<p>The best feature of HTTrack is that it will download all content, including JavaScript and Flash, and rewrite all links to make them relative.  This way the entire website can be viewed offline and made portable.  Overall, my entire wiki backup was ~ 30MB of markup and content (excluding all audio and video, of course).  In the future I need to come up with a solution for exporting my mindmaps: since the iPhone does not yet support Flash, I&#8217;ll need some other way to allow for embedded viewing of my mindmap content.  Anyway, at this point all of my content is ready for copying to my iPhone.</p>
<p><strong>Storing my Wiki on my iPhone</strong></p>
<p style="text-align: left;">OK, this is the really nifty part.  The one thing I really missed about my old 60GB iPod Video was the ability to mount it over USB and use it just like an external hard drive.  I used to haul around TONS of my data and could easily share it between Windows, Linux and Mac.  Unfortunately, when the iPhone and iPod Touch came out, you could no longer mount the device and copy files (without hackery, of course).  Luckily there are a number of apps that let you use your iPhone as a storage device.  One of the BEST applications out there is an app called <a href="http://avatron.com/apps/air-sharing/">Air Sharing</a>.</p>
<p style="text-align: center;"><a href="http://avatron.com/apps/air-sharing/"><img class="size-medium wp-image-948 aligncenter" title="air-sharing-icon" src="http://eric-blue.com/wp-content/uploads/2009/12/air-sharing-icon.png" alt="" width="146" height="146" /></a></p>
<p><strong>With Air Sharing, you can:</strong></p>
<ul>
<li>Mount your iPhone or iPod touch as a wireless drive on a Mac, Windows, or Linux computer, over Wi-Fi, or connect from your computer’s web browser.</li>
<li>Drag-drop files between your iPhone or iPod touch and your computers.</li>
<li>View documents in many common formats.</li>
</ul>
<p>What&#8217;s really useful is that you can mount your iPhone using WebDAV and transfer files just like a regular drive.  The incredibly cool bonus is that you can also access your content from another computer.  If you&#8217;re connected to the same Wi-Fi network, you can use any PC to browse (e.g. http://iphone-local:8080/wiki/) and access your content just as if it were on the original server.  For an added layer of security while you&#8217;re on the go, you can set up an ad hoc wireless network and connect privately between your computer and the iPhone to access your personal knowledge base.</p>
<p>Of course, accessing the content on your iPhone from another PC is an added bonus.  The real power in this solution is the ability to browse your wiki on the iPhone without needing any Internet access (3G or Wi-Fi).  Simply open up your Air Sharing app, browse directly to your wiki folder, and click on index.html.  Voil&#224;! You&#8217;re browsing your personal wiki just like usual.</p>
<p>I exported the majority of the text content from my wiki (preserving the original formatting, with JavaScript support).  In fact, I even shared my <a href="http://eric-blue.com/2009/12/07/how-to-synchronize-your-digital-scrapbook/">digital scrapbook</a> that I blogged about last week.  You can also choose to export your entire document collection and multimedia files (video, MP3s, etc).  This is incredibly useful for taking your knowledge on the go and having all of your data RIGHT at your fingertips.  Here are some screenshots of my personal knowledge manager wiki right on my iPhone:</p>
<p><strong>All Articles</strong></p>
<p><a href="http://eric-blue.com/wp-content/uploads/2009/12/wiki_iphone_all.png"><img class="aligncenter size-full wp-image-951" style="border: 1px solid #c0c0c0;" title="wiki_iphone_all" src="http://eric-blue.com/wp-content/uploads/2009/12/wiki_iphone_all.png" alt="" width="480" height="320" /></a></p>
<p><strong>Workout Journal</strong></p>
<p><a href="http://eric-blue.com/wp-content/uploads/2009/12/wiki_iphone_workout.png"><img class="aligncenter size-full wp-image-952" style="border: 1px solid #c0c0c0;" title="wiki_iphone_workout" src="http://eric-blue.com/wp-content/uploads/2009/12/wiki_iphone_workout.png" alt="" width="480" height="320" /></a></p>
<p><strong>Learning</strong></p>
<p><a href="http://eric-blue.com/wp-content/uploads/2009/12/wiki_iphone_learning.png"><img class="aligncenter size-full wp-image-954" style="border: 1px solid #c0c0c0;" title="wiki_iphone_learning" src="http://eric-blue.com/wp-content/uploads/2009/12/wiki_iphone_learning.png" alt="" width="480" height="320" /></a></p>
<p><strong>Documents</strong></p>
<p><a href="http://eric-blue.com/wp-content/uploads/2009/12/wiki_iphone_documents.png"><img class="aligncenter size-full wp-image-955" style="border: 1px solid #c0c0c0;" title="wiki_iphone_documents" src="http://eric-blue.com/wp-content/uploads/2009/12/wiki_iphone_documents.png" alt="" width="480" height="320" /></a></p>
]]></content:encoded>
			<wfw:commentRss>http://eric-blue.com/2009/12/13/knowledge-to-go-put-your-wiki-on-your-iphone/feed/</wfw:commentRss>
		<slash:comments>8</slash:comments>
		</item>
		<item>
		<title>How to Synchronize Your Digital Scrapbook</title>
		<link>http://eric-blue.com/2009/12/07/how-to-synchronize-your-digital-scrapbook/</link>
		<comments>http://eric-blue.com/2009/12/07/how-to-synchronize-your-digital-scrapbook/#comments</comments>
		<pubDate>Mon, 07 Dec 2009 07:18:23 +0000</pubDate>
		<dc:creator>ericblue76</dc:creator>
				<category><![CDATA[Featured]]></category>
		<category><![CDATA[Knowledge Management]]></category>
		<category><![CDATA[Learning]]></category>
		<category><![CDATA[Productivity]]></category>
		<category><![CDATA[Research]]></category>
		<category><![CDATA[Wiki]]></category>

		<guid isPermaLink="false">http://eric-blue.com/?p=904</guid>
		<description><![CDATA[I had originally planned on calling this article 'How to Use Cloud Computing to Synchronize Your Digital Scrapbook For Research and Integrate Into Your Personal Knowledge Management Wiki for Extra Credit']]></description>
			<content:encoded><![CDATA[<p>I had originally planned on calling this article &#8216;How to Use Cloud Computing to Synchronize Your Digital Scrapbook For Research and Integrate Into Your Personal Knowledge Management Wiki for Extra Credit&#8217;, but I figured that would be a bit too much.  Luckily I am going to give info on how to do both of these things so stay with me!</p>
<p><strong>Background</strong></p>
<p>For my own <a href="http://eric-blue.com/projects/personal-memex/">personal knowledge management setup</a>, I&#8217;m very interested in tracking a number of different &#8216;things&#8217;:</p>
<p>* Documents &#8211; PDFs, word documents, mindmaps, etc.</p>
<p>* Notes &#8211; Journal entries, book summaries, personal notes (think wiki text)</p>
<p>* Links &#8211; Bookmarks (personal or social sites like del.icio.us)</p>
<p>* Multimedia &#8211; Audio / Video</p>
<p>* Snippets &#8211; Captured web pages (full or partially snipped content)</p>
<p>When I first mentioned my &#8216;Digital Scrapbook&#8217;, I wasn&#8217;t dropping hints about having any <a href="http://en.wikipedia.org/wiki/Scrapbooking">crafty hobbies</a>; I generally refer to my system for storing Snippets as my Scrapbook.  The name no doubt comes in large part from the fact that I&#8217;ve been using the popular Firefox plugin <a href="http://amb.vis.ne.jp/mozilla/scrapbook/">ScrapBook</a> to manage my digital snippets for a few years now.</p>
<p><a href="http://eric-blue.com/wp-content/uploads/2009/12/scrapbook_screen1.png"><img class="aligncenter size-full wp-image-911" title="scrapbook_screen1" src="http://eric-blue.com/wp-content/uploads/2009/12/scrapbook_screen1.png" alt="" width="570" height="331" /></a></p>
<p>ScrapBook is a fantastic solution for storing local copies of web pages for research (with highlighting, editing, and annotation), saving snips of important sections of sites, recording purchase confirmations or receipts, and saving your travel itineraries.  One major thing it has been lacking, though, is the ability to synchronize or share the Scrapbook with other computers. I use multiple computers (a couple of laptops: Mac &amp; Windows, and a central desktop: Linux), so my goal is to have consistent and up-to-date data between all systems.  And, up until now, I&#8217;ve had no way to integrate this saved data into my <a href="http://eric-blue.com/my-projects/personal-memex/">wiki-based knowledge management system</a>.</p>
<p>I started investigating a solution for this a number of months ago and stumbled across a related (and powerful) research tool called <a href="http://www.zotero.org/">Zotero</a>.  I haven&#8217;t had a chance to use Zotero in depth, but one new feature in the beta version that stuck out to me was the ability to synchronize your data with a remote server.  On the surface this feature looks good (and probably is for most people &#8211; data sync to Zotero server and webdav support for documents), but I was looking for a solution where I have more control over where the data is hosted.  Although I&#8217;m usually not concerned with hosting my data with most providers, I often save private financial information in my Scrapbook (credit reports, financial statements, account numbers, etc.) so I&#8217;d like to have control over where the data is saved and how it&#8217;s encrypted.  Further research eventually sparked a few ideas for a solution.</p>
<p><strong>Synchronizing and Sharing ScrapBook Data</strong></p>
<p>I decided to explore a setup using file sharing/sync services after reading an article on <a href="http://www.makeuseof.com/tag/how-to-share-synchronize-research-data-to-other-computers/">syncing Scrapbook using Dropbox</a>.  I had never used <a href="https://www.dropbox.com/">Dropbox</a> before, and after giving it a brief test drive it looked very promising.  Hey, you get a 2GB account for free so that&#8217;s definitely an added bonus!  Although Dropbox has some <a href="https://www.dropbox.com/features">killer features</a> (a big one being an iPhone app to access your files), I opted to experiment with another sync service.  I&#8217;ve been using <a href="http://www.jungledisk.com/">JungleDisk</a> for a couple of years as my Amazon S3-backed offsite backup solution, and was curious if this could be used.  After downloading the latest version (3.0.2 for Linux), I discovered that it now supports file/directory synchronization between computers.  After about 10-15 minutes of setup and file syncing I had a working solution between my laptop and desktop computers.  Here&#8217;s what you&#8217;ll need to do:</p>
<blockquote><p><strong>Step 1:</strong> Download and install the latest version of the <a href="http://amb.vis.ne.jp/mozilla/scrapbook/">Scrapbook plugin</a> for Firefox on your 1st computer.  For a good quick intro/tutorial to Scrapbook, check out this <a href="http://assets.lifehacker.com/software/uploaded/2006-04-21/scrapbook_sm/scrapbook_sm.html">video from Lifehacker</a>.</p>
<p><strong>Step 2: </strong>Set up an alternate Scrapbook location that resides outside of your Firefox profile directory (Preferences -&gt; Organize -&gt; Save data to)</p>
<p><strong>Step 3: </strong>Set up your preferred sync solution and use the directory provided in Step 2.  I preferred JungleDisk for my setup, but there are other services like Dropbox, Box.net, SugarSync, etc.  Check out the <a href="http://wiki.activityowner.com/index.php?title=Synchronization">Activity Owner wiki</a> for a detailed list of sync services.  And, although I haven&#8217;t personally tried them yet, I&#8217;m sure there are some other non-hosted <a href="http://en.wikipedia.org/wiki/File_synchronization">open source sync solutions</a> like <a href="http://www.cis.upenn.edu/~bcpierce/unison/">Unison</a> (cross-platform) that could be used.</p>
<p><strong>Step 4:</strong> For your 2nd (or subsequent) computers, repeat steps 1 through 3.</p></blockquote>
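<p>For the non-hosted route mentioned in Step 3, the core idea can be sketched in a few lines of Python. This is a minimal one-way mirror, not how JungleDisk, Dropbox, or Unison work internally: it copies any file that is missing or different in the destination, never deletes anything, and does not detect conflicts. The function name <code>mirror_changed</code> and both directory arguments are hypothetical.</p>

```python
import filecmp
import shutil
from pathlib import Path


def mirror_changed(src_root, dst_root):
    """Copy files from src_root into dst_root when missing or different.

    One-way only: nothing is ever deleted from dst_root and conflicting
    edits are not detected. Returns the relative paths that were copied.
    """
    src_root, dst_root = Path(src_root), Path(dst_root)
    copied = []
    for src in sorted(src_root.rglob("*")):
        if not src.is_file():
            continue
        rel = src.relative_to(src_root)
        dst = dst_root / rel
        # Byte-for-byte comparison: slower than checking timestamps, but simple.
        if dst.exists() and filecmp.cmp(src, dst, shallow=False):
            continue
        dst.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(src, dst)  # copy2 also preserves modification times
        copied.append(rel.as_posix())
    return copied
```

<p>Pointing it from the ScrapBook directory to a folder on a shared drive gives a crude backup; a real tool such as Unison also handles deletions and edits made on both sides.</p>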
<p><strong>Wiki Integration (Extra Credit)</strong></p>
<p>OK, for me this was the icing on the cake.  Since my Scrapbook data is now on the same computer as my wiki, I thought it would be nifty to integrate it directly into some of my wiki pages.  I found out that Scrapbook can export your Scrapbook hierarchy as a tree in HTML (from Scrapbook Sidebar: Tools -&gt; Output Tree as HTML).  Although this isn&#8217;t completely automatic (yet), it gave me the content I needed to add to my wiki.  Now, since wikis by their very nature don&#8217;t typically allow you to embed other HTML pages, I needed to find a way to make this work.</p>
<blockquote><p><strong>Step 1: </strong>Set up a directory on your webserver to serve content from your Scrapbook directory (set up in Step 2 above) (e.g. http://yourwebsite/scrapbook).  This can be on the same server as your wiki or on another; it doesn&#8217;t really matter.</p>
<p><strong>Step 2: </strong>Verify the output of the directory tree looks good.  If you enabled frames, the URL should be something like http://yourwebsite/scrapbook/tree/frame.html.</p>
<p><strong>Step 3: </strong>For MediaWiki users there are various ways to directly embed pages in your wiki content.  I found that the <a href="http://www.mediawiki.org/wiki/Extension:Anysite">AnySite extension</a> did the trick for me.  Enable the extension, pick a wiki page where you want to display your ScrapBook data and you are set!  For example, here is my content:</p>
<p><em>* Link to [http://wiki:8080/wikifiles/scrapbook/tree/frame.html ScrapBook Tree]<br />
&lt;anyweb mywidth=&quot;1024&quot; myheight=&quot;768&quot;&gt;http://wiki:8080/wikifiles/scrapbook/tree/frame.html&lt;/anyweb&gt;<br />
[[Category:Documents]]</em></p>
<p><a href="http://eric-blue.com/wp-content/uploads/2009/12/scrapbook_screen2.png"><img class="aligncenter size-full wp-image-917" title="scrapbook_screen2" src="http://eric-blue.com/wp-content/uploads/2009/12/scrapbook_screen2.png" alt="" width="570" height="470" /></a></p></blockquote>
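<p>If you just want to try Step 1 before configuring a full web server, Python&#8217;s built-in <code>http.server</code> module can serve the ScrapBook directory. This is a rough sketch under a few assumptions: the function name <code>make_scrapbook_server</code>, port 8080, and the example path are all placeholders, and the server binds to 127.0.0.1 so private snippets are not exposed to the network. It needs Python 3.7 or later.</p>

```python
import functools
from http.server import SimpleHTTPRequestHandler, ThreadingHTTPServer


def make_scrapbook_server(directory, host="127.0.0.1", port=8080):
    """Build an HTTP server that serves the files under `directory`.

    Binding to 127.0.0.1 keeps the content reachable only from this machine.
    """
    handler = functools.partial(SimpleHTTPRequestHandler, directory=str(directory))
    return ThreadingHTTPServer((host, port), handler)


# Usage (blocks until interrupted with Ctrl-C); the path is hypothetical:
#   server = make_scrapbook_server("/path/to/scrapbook")
#   server.serve_forever()
```

<p>With this running, the exported tree would be at http://127.0.0.1:8080/tree/frame.html rather than under /scrapbook, so adjust the URL you embed in the wiki page accordingly.</p>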
]]></content:encoded>
			<wfw:commentRss>http://eric-blue.com/2009/12/07/how-to-synchronize-your-digital-scrapbook/feed/</wfw:commentRss>
		<slash:comments>10</slash:comments>
		</item>
		<item>
		<title>Total Recall, Personal Informatics and Life Logging</title>
		<link>http://eric-blue.com/2009/10/18/total-recall/</link>
		<comments>http://eric-blue.com/2009/10/18/total-recall/#comments</comments>
		<pubDate>Mon, 19 Oct 2009 05:38:51 +0000</pubDate>
		<dc:creator>ericblue76</dc:creator>
				<category><![CDATA[Interesting]]></category>
		<category><![CDATA[Knowledge Management]]></category>
		<category><![CDATA[Quantified Self]]></category>
		<category><![CDATA[Technology]]></category>

		<guid isPermaLink="false">http://eric-blue.com/?p=877</guid>
		<description><![CDATA[“Consider a future device for individual use, which is a sort of mechanized private file and library. It needs a name, and, to coin one at random, “memex” will do. A memex is a device in which an individual stores all his books, records, and communications, and which is mechanized so that it may be consulted with exceeding speed and flexibility. It is an enlarged intimate supplement to his memory.”  -- Vannevar Bush (1945)

Fast forward sixty-four years, and we finally have the technology (hardware and software) to make the Memex a reality.  My project has been primarily focused on fulfilling a small portion of the original idea, but has only touched the surface.  The Memex fully realized would be a system that completely (and automatically) digitizes experiences, memories, and interactions with the environment.  The capability for <a href="http://www.telegraph.co.uk/technology/3352900/Total-recall-becomes-a-reality.html">Total Recall</a>, offloading human memory to a digital space, is not too far away (think Cyborgs).]]></description>
			<content:encoded><![CDATA[<p>It&#8217;s been a little while since my last post and I figured it was time to get back into my blogging groove.  I recently came across a few interesting links that I thought I would share.   The two topics I want to discuss are Personal Informatics and Life Logging.  I found it fascinating that both of these topics, complex and mysterious sounding on their own, are very much related to my primary research project: <a href="http://eric-blue.com/my-projects/personal-memex/">My Personal Memex</a>.</p>
<p>For those not familiar with the concept of the <a href="http://en.wikipedia.org/wiki/Memex">Memex</a>:</p>
<blockquote><p>“Consider a future device for individual use, which is a sort of mechanized private file and library. It needs a name, and, to coin one at random, “memex” will do. A memex is a device in which an individual stores all his books, records, and communications, and which is mechanized so that it may be consulted with exceeding speed and flexibility. It is an enlarged intimate supplement to his memory.”  &#8212; Vannevar Bush (1945)</p></blockquote>
<p><strong>Total Recall</strong></p>
<p>Fast forward sixty-four years, and we finally have the technology (hardware and software) to make the Memex a reality.  My project has been primarily focused on fulfilling a small portion of the original idea, but has only touched the surface.  The Memex fully realized would be a system that completely (and automatically) digitizes experiences, memories, and interactions with the environment.  The capability for <a href="http://www.telegraph.co.uk/technology/3352900/Total-recall-becomes-a-reality.html">Total Recall</a>, offloading human memory to a digital space, is not too far away (think Cyborgs).  Now for the fun!</p>
<div id="attachment_882" class="wp-caption aligncenter" style="width: 510px"><a href="http://eric-blue.com/wp-content/uploads/2009/10/dante_cyborg.jpg"><img class="size-full wp-image-882" title="Dante Cyborg" src="http://eric-blue.com/wp-content/uploads/2009/10/dante_cyborg.jpg" alt="Dante Cyborg from Flickr" width="500" height="450" /></a><p class="wp-caption-text">Dante Cyborg from Flickr</p></div>
<p><strong>Personal Informatics</strong></p>
<p>According to <a href="http://johnnyholland.org/2009/04/19/the-power-of-personal-informatics/">Johnny Holland</a>, Personal Informatics is:</p>
<blockquote><p>&#8220;&#8230; characterized as the monitoring and displaying of information about our daily activities through intelligent devices, services and systems. This information allows us to see trends and opportunities for change that we would otherwise miss.  With the rise in network and RFID technology we are pointing to a time where personal informatics can play an important role in our lives. If people can access this information about their daily routines, and interact with their own personal data currently invisible to them: would they make more informed decisions?&#8221;</p></blockquote>
<p>The example in this category that I wanted to share is a new product called <a href="http://www.fitbit.com/">FitBit</a>.  The Fitbit accurately tracks your calories burned, steps taken, distance traveled and sleep quality. The Fitbit contains a 3D motion sensor like the one found in the Nintendo Wii. The Fitbit tracks your motion in three dimensions and converts this into useful information about your daily activities.</p>
<p><img src="http://farm4.static.flickr.com/3492/3756822634_65dc62c5e0.jpg" alt="" /></p>
<div style="margin-top: 15px;">You can wear the Fitbit on your waist, in your pocket or on undergarments. At night, you can wear the Fitbit clipped to the included wristband in order to track your sleep. Anytime you walk by the included wireless base station, data from your Fitbit is silently uploaded in the background to Fitbit.com.</div>
<div style="margin-top: 15px;"><strong>Life Logging</strong></div>
<div style="margin-top: 15px;">A new camera promises to <a href="http://www.newscientist.com/article/dn17992-new-camera-promises-to-capture-your-whole-life.html">capture your whole life in digital form</a>!  For consumers, the gadget will provide an easy way to become a &#8220;lifelogger&#8221; – someone who attempts to electronically record as much of their life as possible. Microsoft researcher <a href="http://research.microsoft.com/en-us/um/people/gbell/" target="ns">Gordon Bell</a> has <a href="http://research.microsoft.com/en-us/projects/mylifebits/" target="ns">made his life an experiment in lifelogging</a>, recording everything from phone calls to TV viewing, and uses a SenseCam wherever he goes.</div>
<div style="margin-top: 15px;">
<p class="infuse">A camera you can wear as a pendant to record every moment of your life will soon be launched by a UK-based firm.</p>
<p class="infuse">Originally invented to help jog the memories of people with Alzheimer&#8217;s disease, it might one day be used by consumers to create &#8220;lifelogs&#8221; that archive their entire lives.</p>
<p class="infuse">Worn on a cord around the neck, the camera takes pictures automatically as often as once every 30 seconds. It also uses an accelerometer and light sensors to snap an image when a person enters a new environment, and an infrared sensor to take one when it detects the body heat of a person in front of the wearer. It can fit 30,000 images onto its 1-gigabyte memory.</p>
<p class="infuse">The ViconRevue was originally developed as the <a href="http://research.microsoft.com/en-us/um/cambridge/projects/sensecam/" target="ns">SenseCam</a> by Microsoft Research Cambridge, UK, for researchers studying Alzheimer&#8217;s and other dementias. Studies showed that reviewing the events of the day using SenseCam photos could <a href="http://www.newscientist.com/article/mg18625066.700-wearable-camera-restores-lost-memories.html">help some people improve long-term recall</a>.</p>
</div>
<div style="margin-top: 15px;">For an intriguing, in-depth article on Gordon Bell, check out the article on <a href="http://www.fastcompany.com/magazine/110/head-for-detail.html">Fast Company from 2007</a>.</div>
]]></content:encoded>
			<wfw:commentRss>http://eric-blue.com/2009/10/18/total-recall/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
		</item>
		<item>
		<title>Information Visualization Toolkits for Mind Mapping</title>
		<link>http://eric-blue.com/2009/06/04/information-visualization-toolkits-for-mind-mapping/</link>
		<comments>http://eric-blue.com/2009/06/04/information-visualization-toolkits-for-mind-mapping/#comments</comments>
		<pubDate>Fri, 05 Jun 2009 05:25:22 +0000</pubDate>
		<dc:creator>ericblue76</dc:creator>
				<category><![CDATA[InfoViz]]></category>
		<category><![CDATA[Knowledge Management]]></category>
		<category><![CDATA[Mind Mapping]]></category>
		<category><![CDATA[Open Source]]></category>

		<guid isPermaLink="false">http://eric-blue.com/?p=865</guid>
		<description><![CDATA[The other week, I wrote a blog post The Visual Wiki: A New Metaphor For Knowledge Access and Management.  At the time, until I read the paper in depth, I hadn&#8217;t realized that this ...]]></description>
			<content:encoded><![CDATA[<p>The other week, I wrote a blog post <a href="http://eric-blue.com/2009/05/12/the-visual-wiki-a-new-metaphor-for-knowledge-access-and-management/">The Visual Wiki: A New Metaphor For Knowledge Access and Management</a>.  At the time, until I read the paper in depth, I hadn&#8217;t realized that this was about a project that I had blogged about last year: <a href="http://eric-blue.com/2008/06/05/thinkbase-visual-semantic-wiki/">Thinkbase &#8211; A Visual Semantic Wiki</a>.  In a nutshell:</p>
<blockquote><p>
“Thinkbase is a new way to navigate and explore information on the web. It is what we call a ‘Visual Wiki’. It is based on Freebase, an open, shared database of the world’s knowledge &#8211; in other words a Semantic Wiki. Thinkbase uses a visualization tool (Thinkmap) to create an interactive visual representation of the semantic relationships in Freebase.”
</p></blockquote>
<p>The other similar project that was mentioned in the research paper was <a href="http://thinkpedia.cs.auckland.ac.nz/">ThinkPedia</a>.  While ThinkBase offers visual navigation for FreeBase, ThinkPedia does the same for Wikipedia content.  One of the sub-projects of my <a href="http://eric-blue.com/my-projects/personal-memex/">Personal Memex</a> project intends to offer visual navigation in very much the same way as both of these applications.  The engine that these projects use for visual navigation, <a href="http://thinkmap.com/">ThinkMap</a>, is VERY impressive.  Unfortunately, it requires a commercial license (~5k) and, in keeping with the spirit of my open source model, I need to find something that is free.</p>
<p>With that said, I&#8217;ve started to research various visualization toolkits/APIs that offer some type of visual navigation.  This navigation is very mindmap or concept map like in nature.  There are some variations: some are force-directed graphs while others are hyperbolic.  My research is still very much underway, but I&#8217;ve been collecting my links and have assembled them into a mindmap.  I&#8217;ve broken down the categories based on open source vs. commercial (for illustrative purposes), and platform (Java, Flash, or JavaScript).</p>
<p>Stay tuned for my progress in this area over the coming months, since this will more than likely be my primary &#8220;pet technology project&#8221; for the summer.</p>
<p><center><br />
<iframe id="mindmap" name="mindmap" height="400" width="500" src="http://eric-blue.com/projects/mindmapviewer/display.cgi?mmap_url=http://eric-blue.com/research/mindmap/Visualization%20Toolkits%2Emmap&#038;format=flash"></iframe><br />
</center></p>
]]></content:encoded>
			<wfw:commentRss>http://eric-blue.com/2009/06/04/information-visualization-toolkits-for-mind-mapping/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>Freebase Parallax: Set-based Browsing Interface</title>
		<link>http://eric-blue.com/2009/05/23/freebase-parallax/</link>
		<comments>http://eric-blue.com/2009/05/23/freebase-parallax/#comments</comments>
		<pubDate>Sat, 23 May 2009 16:02:36 +0000</pubDate>
		<dc:creator>ericblue76</dc:creator>
				<category><![CDATA[Knowledge Management]]></category>
		<category><![CDATA[Learning]]></category>
		<category><![CDATA[Semantic Web]]></category>

		<guid isPermaLink="false">http://eric-blue.com/?p=852</guid>
		<description><![CDATA[I found a very interesting project from David François Huynh, developer of some impressive projects over at Simile. Parallax offers a new way to browse and explore data on Freebase, one of the largest open ...]]></description>
			<content:encoded><![CDATA[<p>I found a very interesting project from <a href="http://davidhuynh.net/">David François Huynh</a>, developer of some impressive projects over at <a href="http://simile.mit.edu/">Simile</a>. <a href="http://mqlx.com/~david/parallax/">Parallax</a> offers a new way to browse and explore data on <a href="http://en.wikipedia.org/wiki/Freebase_(database)">Freebase</a>, one of the largest open and shared (structured) databases of knowledge on the web. </p>
<p><center><br />
<object width="400" height="300"><param name="allowfullscreen" value="true" /><param name="allowscriptaccess" value="always" /><param name="movie" value="http://vimeo.com/moogaloop.swf?clip_id=1513562&amp;server=vimeo.com&amp;show_title=1&amp;show_byline=1&amp;show_portrait=0&amp;color=&amp;fullscreen=1" /><embed src="http://vimeo.com/moogaloop.swf?clip_id=1513562&amp;server=vimeo.com&amp;show_title=1&amp;show_byline=1&amp;show_portrait=0&amp;color=&amp;fullscreen=1" type="application/x-shockwave-flash" allowfullscreen="true" allowscriptaccess="always" width="400" height="300"></embed></object>
<p><a href="http://vimeo.com/1513562">Freebase Parallax: A new way to browse and explore data</a> from <a href="http://vimeo.com/user392740">David Huynh</a> on <a href="http://vimeo.com">Vimeo</a>.</p>
<p></center></p>
<p>I also discovered a somewhat related research project at Stanford called <a href="http://graphics.stanford.edu/projects/vispedia/">Vispedia</a>.</p>
]]></content:encoded>
			<wfw:commentRss>http://eric-blue.com/2009/05/23/freebase-parallax/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>The Visual Wiki: A New Metaphor For Knowledge Access and Management</title>
		<link>http://eric-blue.com/2009/05/12/the-visual-wiki-a-new-metaphor-for-knowledge-access-and-management/</link>
		<comments>http://eric-blue.com/2009/05/12/the-visual-wiki-a-new-metaphor-for-knowledge-access-and-management/#comments</comments>
		<pubDate>Wed, 13 May 2009 05:32:13 +0000</pubDate>
		<dc:creator>ericblue76</dc:creator>
				<category><![CDATA[Knowledge Management]]></category>
		<category><![CDATA[Visualization]]></category>
		<category><![CDATA[Wiki]]></category>

		<guid isPermaLink="false">http://eric-blue.com/?p=831</guid>
		<description><![CDATA[Truly fascinating work!  This is next on my list for my Personal Memex.  And, the original paper on Scribd: The Visual Wiki: A New Metaphor for Knowledge Access and Management]]></description>
			<content:encoded><![CDATA[<p>Truly fascinating work!  This is next on my list for my <a href="http://eric-blue.com/projects/personal-memex/">Personal Memex</a>.</p>
<p><center><br />
<object width="425" height="344"><param name="movie" value="http://www.youtube.com/v/yZ52ORG89Yg&#038;hl=en&#038;fs=1"></param><param name="allowFullScreen" value="true"></param><param name="allowscriptaccess" value="always"></param><embed src="http://www.youtube.com/v/yZ52ORG89Yg&#038;hl=en&#038;fs=1" type="application/x-shockwave-flash" allowscriptaccess="always" allowfullscreen="true" width="425" height="344"></embed></object><br />
</center><br />
</p>
<p>And, the original paper on Scribd:</p>
<p><center><br />
<a title="View The Visual Wiki: A New Metaphor for Knowledge Access and Management on Scribd" href="http://www.scribd.com/doc/15304419/The-Visual-Wiki-A-New-Metaphor-for-Knowledge-Access-and-Management" style="margin: 12px auto 6px auto; font-family: Helvetica,Arial,Sans-serif; font-style: normal; font-variant: normal; font-weight: normal; font-size: 14px; line-height: normal; font-size-adjust: none; font-stretch: normal; -x-system-font: none; display: block; text-decoration: underline;">The Visual Wiki: A New Metaphor for Knowledge Access and Management</a> <object codebase="http://download.macromedia.com/pub/shockwave/cabs/flash/swflash.cab#version=9,0,0,0" id="doc_903037423934002" name="doc_903037423934002" classid="clsid:d27cdb6e-ae6d-11cf-96b8-444553540000" align="middle"	height="500" width="450" ><param name="movie"	value="http://d.scribd.com/ScribdViewer.swf?document_id=15304419&#038;access_key=key-ov7njr7xbbwmz55ozbq&#038;page=1&#038;version=1&#038;viewMode=list"><param name="quality" value="high"><param name="play" value="true"><param name="loop" value="true"><param name="scale" value="showall"><param name="wmode" value="opaque"><param name="devicefont" value="false"><param name="bgcolor" value="#ffffff"><param name="menu" value="true"><param name="allowFullScreen" value="true"><param name="allowScriptAccess" value="always"><param name="salign" value=""><param name="mode" value="list"><embed src="http://d.scribd.com/ScribdViewer.swf?document_id=15304419&#038;access_key=key-ov7njr7xbbwmz55ozbq&#038;page=1&#038;version=1&#038;viewMode=list" quality="high" pluginspage="http://www.macromedia.com/go/getflashplayer" play="true" loop="true" scale="showall" wmode="opaque" devicefont="false" bgcolor="#ffffff" name="doc_903037423934002_object" menu="true" allowfullscreen="true" allowscriptaccess="always" salign="" type="application/x-shockwave-flash" align="middle" mode="list" height="500" width="450"></embed></object>
<p></center></p>
]]></content:encoded>
			<wfw:commentRss>http://eric-blue.com/2009/05/12/the-visual-wiki-a-new-metaphor-for-knowledge-access-and-management/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
	</channel>
</rss>

