

May 18, 2008

Document Management Systems: SCAN (Smart Content Aggregation and Navigation)

Back in March, I began evaluating some open source Document Management Systems (DMS) to help complement my wiki-based Personal Knowledge Manager (PKM). That's a bit of acronym overload. In simple terms, I'm really looking for a way to easily store, categorize, and retrieve the documents related to my research and learning (PDFs, Word docs, etc.).

I quickly discovered that although my wiki can handle simple document attachments, it offers no easy way to store metadata for those documents or to search them from within the wiki itself. As I alluded to in my original post, I narrowed my search down to three main DMS choices: SCAN, Alfresco, and Knowledge Tree. Of these three, SCAN (Smart Content Aggregation and Navigation) turned out to be the most feature-rich and the least complicated. Alfresco and Knowledge Tree are both fantastic products, but they were too complex for my needs. I believe this ultimately comes down to the fact that my DMS is for one person (me), not an entire team or company. Features related to roles, access restrictions, and document workflow aren't a concern for me right now.

At a high-level, SCAN supports the following features:

  • Java-based UI with a multitude of browsing, searching, and tagging functions (runs on a variety of platforms: Linux, Mac, and Windows)
  • Support (with plugins) for PDF, Word, Excel, XML/XHTML, and plain text
  • Tag cluster browsing for both Documents and Del.icio.us links
  • Sophisticated tagging and text analysis

A full list of features can be found here. I received an anonymous tip the other day (well, it was actually from a guest Google chatter) that SCAN version 1.3 was just released. This release has a number of UI enhancements for browsing the document collections, adding document annotations, and better management of metadata through document properties.

For the most part, SCAN turned out to be my "Killer App" for document management. One slight drawback is that SCAN has no web interface. Most of my searching, browsing, and tagging will more than likely happen on my primary desktop, where SCAN is installed, so the SCAN GUI is fine 90% of the time. However, when I'm remote, I would still like to have access to my documents.

For the short term, I've simply configured my wiki to display my document repository so I can download documents as needed. Conveniently, I can organize the document directory hierarchy in the file system however I like. And with SCAN, I can create multiple document repositories and organize and aggregate my document collections with tags. Sure, my web view only shows the file system hierarchy, but this is OK for now.

Long-term, I would like to experiment with adding the following features for my own needs (I am a fan of Eating My Own Dog Food or Sipping My Own Champagne):

  • Add support for password-protected PDFs (if this is possible with Lucene)
  • Add support for indexing and searching MindManager mindmaps. This is a *huge* must have for me given the number of mindmaps I've created for my own research.
  • Create a basic SOAP-based service layer on top of SCAN so I can access metadata, create tag clouds, and search from a web interface. This web interface will more than likely be a bare-bones MediaWiki plugin.
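To make that last item a bit more concrete, here is a minimal Java sketch of the kind of contract such a service layer might expose, with an in-memory stand-in that computes tag cloud sizes. Everything here is an assumption for illustration: `DocumentSearchService`, `InMemoryDocumentSearchService`, and the linear bucket scaling are hypothetical names and choices, not part of SCAN's API, and the SOAP binding itself is left out.

```java
import java.util.*;

// Hypothetical contract a SOAP (or other) endpoint could expose on top of SCAN.
// None of these names come from SCAN itself.
interface DocumentSearchService {
    List<String> search(String query);
    Map<String, Integer> tagCloud(int buckets);
}

// In-memory stand-in, used only to illustrate search and tag cloud sizing.
class InMemoryDocumentSearchService implements DocumentSearchService {
    private final Map<String, Set<String>> tagsByDocument = new LinkedHashMap<>();

    void addDocument(String name, String... tags) {
        tagsByDocument.put(name, new LinkedHashSet<>(Arrays.asList(tags)));
    }

    @Override
    public List<String> search(String query) {
        // Naive case-insensitive match on document name or any of its tags.
        String q = query.toLowerCase();
        List<String> hits = new ArrayList<>();
        for (Map.Entry<String, Set<String>> e : tagsByDocument.entrySet()) {
            boolean tagHit = e.getValue().stream().anyMatch(t -> t.toLowerCase().contains(q));
            if (e.getKey().toLowerCase().contains(q) || tagHit) hits.add(e.getKey());
        }
        return hits;
    }

    @Override
    public Map<String, Integer> tagCloud(int buckets) {
        // Count how many documents carry each tag.
        Map<String, Integer> counts = new TreeMap<>();
        for (Set<String> tags : tagsByDocument.values())
            for (String t : tags) counts.merge(t, 1, Integer::sum);
        if (counts.isEmpty()) return counts;
        int min = Collections.min(counts.values());
        int max = Collections.max(counts.values());
        // Linearly map each count onto a display size from 1 to `buckets`.
        Map<String, Integer> sizes = new TreeMap<>();
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            int size = (max == min) ? 1
                : 1 + (int) Math.round((double) (e.getValue() - min) * (buckets - 1) / (max - min));
            sizes.put(e.getKey(), size);
        }
        return sizes;
    }
}
```

A web front end (such as a MediaWiki plugin) would only need to render the returned tag-to-size map; keeping the sizing logic behind the service keeps the plugin thin.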

This, of course, will need to be added to my ever-growing Personal Pet Project Queue.

I'd highly recommend giving SCAN a whirl, especially if you're interested in wrangling a large number of documents. SCAN is simple, effective, & powerful. And, best of all, it's free!

November 17, 2007

gnizr: Open Source Semantic Del.icio.us With Mashup Capability

Wow, now that's a tall order to fill! And it appears that gnizr has delivered. Gnizr (short for "organizer") is one of the latest projects hosted on Google Code, and its code base was donated by Image Matters LLC.

I haven't had a chance to install it yet, but from looking over the website and screenshots, it looks pretty amazing!

gnizr™ (gə-nīzər) is an open source application for social bookmarking and web mashup. It is easy to use gnizr to create a personalized del.icio.us-like portal for a group of friends and colleagues to store, classify, and share information, and mash-it-up with information about location.


* Archive saved bookmarks and organize bookmarks using tags and folders.
* Edit notes using a WYSIWYG bookmark editor.
* Assign geographical location values to bookmarks and view bookmarks on a map.
* Define relationships between bookmark tags -- broader, narrower and member-of.
* Tag bookmarks using Machine Tags.
* View bookmarks in Clustermap and Timeline.
* Import new bookmarks from user-defined RSS subscriptions -- RSS, Atom and GeoRSS.
* Create new application behaviors using the gnizr API. For example:
    o Add modules to support custom Machine Tags;
    o Add listeners to handle bookmark change events;
    o Develop custom RSS crawlers to perform automated bookmark imports; and
    o Create third-party mashups from data published by gnizr (RDF, RSS and JSON).


Thanks to James at Semantic Wave for bringing this to my attention via del.icio.us!

August 13, 2005

Persistence Strategies: DAO (Data Access Objects) and ORM (Object-Relational Mapping) - Part 2

Article 2: DAO - Data Access Objects

Some time has passed since my first post. My investigation into persistence originally focused on DAOs: I was looking for a relatively simple way to persist data without a full-blown persistence solution (like Hibernate, OJB, or iBatis). Before I get too far ahead, a definition of DAOs is in order.

Data Access Objects (DAOs) are a design pattern, popularized in the Java world, that separates your business code from your persistence layer, that is, the code responsible for reading and writing your objects to and from disk, a database, or memory. DAOs provide a layer of abstraction over the basic CRUD (Create, Read/Retrieve, Update, and Delete) operations.
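As a rough illustration of that abstraction, here is a minimal sketch of a generic CRUD DAO in Java. The `Dao` interface, the `Customer` class, and the in-memory implementation are hypothetical examples; a real DAO would typically talk to a database through JDBC, but callers would use the same interface either way.

```java
import java.util.*;

// Generic CRUD contract. Business code depends on this interface,
// not on SQL, JDBC, or any particular storage mechanism.
interface Dao<T, ID> {
    void create(T entity);        // Create
    Optional<T> load(ID id);      // Read/Retrieve one
    List<T> loadAll();            // Read/Retrieve all
    void update(T entity);        // Update
    void delete(ID id);           // Delete
}

// Simple domain object for the example.
class Customer {
    final long id;
    String name;
    Customer(long id, String name) { this.id = id; this.name = name; }
}

// One possible implementation, backed by memory instead of a database.
// A JDBC-backed class could implement the same interface without callers changing.
class InMemoryCustomerDao implements Dao<Customer, Long> {
    private final Map<Long, Customer> rows = new LinkedHashMap<>();

    public void create(Customer c) {
        if (rows.putIfAbsent(c.id, c) != null)
            throw new IllegalStateException("duplicate id " + c.id);
    }

    public Optional<Customer> load(Long id) { return Optional.ofNullable(rows.get(id)); }

    public List<Customer> loadAll() { return new ArrayList<>(rows.values()); }

    public void update(Customer c) {
        if (!rows.containsKey(c.id)) throw new NoSuchElementException("no id " + c.id);
        rows.put(c.id, c);
    }

    public void delete(Long id) { rows.remove(id); }
}
```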

After some experimenting, I quickly discovered 2 key points:


1) DAOs work great by themselves for simple requirements: 1 object = 1 table

2) DAOs and other persistence solutions are by no means mutually exclusive. In fact, they really should be used in conjunction wherever possible.


On the first point, let's use the example of Customer and Order objects. In this scenario, the relationship between Customer and Order is one-to-many (1:n): a given customer can have multiple orders. To make things more complex, this association can be made bidirectional, meaning that the Order object may also need a reference back to its Customer (possibly via a getCustomer() method).
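A minimal sketch of that bidirectional one-to-many association in plain Java objects might look like the following. The class layout and the `addOrder()` helper are assumptions for illustration; the point is that both sides of the relationship have to be kept consistent.

```java
import java.util.*;

// One customer has many orders (1:n); each order points back to its customer,
// which makes the association bidirectional.
class Customer {
    private final long id;
    private final List<Order> orders = new ArrayList<>();

    Customer(long id) { this.id = id; }

    long getId() { return id; }

    List<Order> getOrders() { return Collections.unmodifiableList(orders); }

    // Keep both sides of the association in sync in one place.
    Order addOrder(long orderId) {
        Order o = new Order(orderId, this);
        orders.add(o);
        return o;
    }
}

class Order {
    private final long id;
    private final Customer customer;   // back-reference to the "one" side

    Order(long id, Customer customer) { this.id = id; this.customer = customer; }

    long getId() { return id; }

    Customer getCustomer() { return customer; }
}
```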

While experimenting, I discovered a great (and free) online DAO generator (http://titaniclinux.net/daogen/). DAOs are a good candidate for automatic generation since there is a lot of repetition from one implementation to the next. DAOgen essentially prompts you for a table name and its columns, along with a class name and its properties. When it is done, it generates both the class and its DAO. In the case of Customer, the following methods were generated for the DAO:

delete(), deleteAll(), load(), loadAll(), searchMatching(), create()

All of the CRUD methods are here, and the SQL queries are generated and placed inside the DAO implementation. This works great when dealing with single objects. But for the Customer->Order relationship, some manual coding is needed in the Customer DAO to instantiate an Order DAO and load all Orders for the given Customer. This is certainly doable, but it probably does not scale well to a more complex system, especially one with a deep, hierarchical object graph.
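The manual wiring described above might look something like this. It is a hedged sketch using in-memory lists instead of SQL, and the method signatures are simplified, hypothetical versions rather than DAOgen's actual output; `loadByCustomer()` in particular is an illustrative name for the hand-written finder.

```java
import java.util.*;

class Customer {
    final long id;
    final String name;
    final List<Order> orders = new ArrayList<>();
    Customer(long id, String name) { this.id = id; this.name = name; }
}

class Order {
    final long id;
    final long customerId;   // foreign key back to Customer
    final double total;
    Order(long id, long customerId, double total) {
        this.id = id; this.customerId = customerId; this.total = total;
    }
}

// Stand-in for a generated Order DAO; a real one would issue SQL.
class OrderDao {
    private final List<Order> table = new ArrayList<>();

    void create(Order o) { table.add(o); }

    // Hand-written finder: the "manual coding" the relationship requires.
    List<Order> loadByCustomer(long customerId) {
        List<Order> result = new ArrayList<>();
        for (Order o : table) if (o.customerId == customerId) result.add(o);
        return result;
    }
}

class CustomerDao {
    private final Map<Long, Customer> table = new HashMap<>();
    private final OrderDao orderDao;   // the Customer DAO now depends on the Order DAO

    CustomerDao(OrderDao orderDao) { this.orderDao = orderDao; }

    void create(Customer c) { table.put(c.id, c); }

    // Loads the customer row, then fills in its orders through the Order DAO.
    // Returns a fresh copy so repeated loads do not duplicate orders.
    Customer load(long id) {
        Customer stored = table.get(id);
        if (stored == null) return null;
        Customer c = new Customer(stored.id, stored.name);
        c.orders.addAll(orderDao.loadByCustomer(id));
        return c;
    }
}
```

Each additional relationship (and each extra level in a hierarchy) adds another hand-written finder and another DAO-to-DAO dependency, which is where this approach starts to strain.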

On the second point, the key thing to keep in mind is that DAOs solve a different problem than persistence, or ORM (object-relational mapping), itself. DAOs are one strategy for implementing a generic data access layer (DAL). ORM, by contrast, is strictly responsible for mapping your objects to your database (either directly to tables or via SQL mapping). Together, the DAL and ORM form a complete persistence framework.
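One way to picture the split is to give the two concerns separate types: a DAO that decides which rows to fetch, and a mapper that turns a row into an object. The sketch below is an illustration under stated assumptions, not any framework's API: `RowMapper`, `CustomerRowMapper`, the `CUST_ID`/`CUST_NAME` column names, and the use of a `Map` in place of a JDBC `ResultSet` are all hypothetical.

```java
import java.util.*;
import java.util.function.Function;

// Mapping concern (the "ORM" half): turn a raw row into a domain object.
// A row here is a column-name -> value map, standing in for a JDBC ResultSet.
interface RowMapper<T> extends Function<Map<String, Object>, T> {}

class Customer {
    final long id;
    final String name;
    Customer(long id, String name) { this.id = id; this.name = name; }
}

// Knows only how columns map to fields; knows nothing about where rows come from.
class CustomerRowMapper implements RowMapper<Customer> {
    public Customer apply(Map<String, Object> row) {
        return new Customer(((Number) row.get("CUST_ID")).longValue(),
                            (String) row.get("CUST_NAME"));
    }
}

// Access concern (the DAL half): where rows come from and which ones to fetch.
class CustomerDao {
    private final List<Map<String, Object>> customerTable;   // pretend database table
    private final RowMapper<Customer> mapper;

    CustomerDao(List<Map<String, Object>> customerTable, RowMapper<Customer> mapper) {
        this.customerTable = customerTable;
        this.mapper = mapper;
    }

    List<Customer> loadAll() {
        List<Customer> result = new ArrayList<>();
        for (Map<String, Object> row : customerTable) result.add(mapper.apply(row));
        return result;
    }
}
```

Because the mapping lives in its own class, a mapping framework could take over that half without changing the DAO's public methods, and the DAO's data source could change without touching the mapping.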

While researching DAOs, I put together a list of good sites on the topic, along with some links that focus on general design patterns, best practices, business objects (BOs), and data transfer objects (DTOs). I posted this info on the iBatis mailing list the other week, and it has since been incorporated into their wiki under "Where can I get more information about Data Access Objects". Here is the original list, along with some additional links.

DAO Resources

iBatis DAO
http://www.onjava.com/pub/a/onjava/2005/08/10/ibatisdao.html

Data Access Object (DAO) J2EE Pattern
http://java.sun.com/blueprints/corej2eepatterns/Patterns/DataAccessObject.html

Transfer Object (DTO) J2EE Pattern
http://java.sun.com/blueprints/corej2eepatterns/Patterns/TransferObject.html

Protecting the Domain Model
http://www.theserverside.com/news/thread.tss?thread_id=34278#172661

Pattern Problem
http://forum.java.sun.com/thread.jspa?threadID=569418&tstart=225

General Design with BO's, DTO's and DAO's
http://forum.java.sun.com/thread.jspa?threadID=582832&tstart=134

Using BeanUtils to avoid duplication between DTOs and BOs
http://www.javaranch.com/newsletter/July2003/TouringTheCommonsPart1.html

DAO Generator
http://titaniclinux.net/daogen/

FireStorm DAO Code Generator
http://www.codefutures.com/

DAO Examples
http://daoexamples.sourceforge.net/

One big Service class or several small classes?
http://www.theserverside.com/news/thread.tss?thread_id=23705

Custom-Grained Data Transfer Objects
http://www.practicalsoftwarearchitect.com/articles/customgrained/customgrained.html

Dynamically generate DTOs with DynaBeans
http://www.javaranch.com/newsletter/200404/Commons_Part3.html


The final article will focus on some popular open source persistence frameworks.

April 25, 2005

Persistence Strategies: DAO (Data Access Objects) and ORM (Object-Relational Mapping) - Part 1

Article 1: Overview

I've been working on some recent projects, both personal and professional, where the topic of object persistence has come up. As I've learned over the last few months, while this is certainly a common problem and there is no shortage of solutions out there, it can be a challenging area. While researching different solutions, I've focused both on generic design patterns and best practices and on language/platform-specific implementations. My interest has centered on solutions for Perl and Java, although at the moment there are far more (and more mature) options for Java.

Generically speaking, data/object persistence essentially boils down to one of two categories:

  • You need a way to persist, or store, your business objects for retrieval at a later time. Top->Down: Applies to new projects where you aren't constrained by a legacy schema.
  • You need a way to take an existing database schema, and provide a high-level OO domain model (business objects). Bottom->Up: Existing projects where you must map between your database tables and your business objects. Typically a 1:1 mapping between your table and class.

Like most people, I have persistence requirements that fall into category 2. One of the challenges arises when there are many tables and a hierarchical relationship between the objects. As a very generic example, let's say you have 4 objects.

Objects A, B, C, D

Where A can have one or more B, B can have one or more C, and C can have one or more D. While this kind of one-to-many (1:n) chain is not uncommon, the challenge is database load and performance. If you instantiate object A, you more than likely would not want a cascaded load in which every related object is loaded as well. This is particularly true if your collections have tens of thousands of rows. This is where the choice between rolling your own solution from best-practice patterns and adopting an existing persistence framework becomes much easier: solutions such as iBatis and Hibernate support lazy loading and caching, which address these issues.
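The idea behind lazy loading can be sketched by hand in a few lines of Java. This is only an illustration of the concept under simplifying assumptions: the `Lazy`, `A`, and `B` classes are hypothetical, and the loader stands in for a DAO call or query. Frameworks like Hibernate implement lazy loading and caching in their own, far more complete ways.

```java
import java.util.*;
import java.util.function.Supplier;

// Defers an expensive load until the value is first requested,
// then caches it so the loader runs at most once.
class Lazy<T> {
    private Supplier<T> loader;
    private T value;
    private boolean loaded;

    Lazy(Supplier<T> loader) { this.loader = loader; }

    synchronized T get() {
        if (!loaded) {
            value = loader.get();
            loaded = true;
            loader = null;   // release the loader (and anything it captured)
        }
        return value;
    }

    synchronized boolean isLoaded() { return loaded; }
}

class B {
    final long id;
    B(long id) { this.id = id; }
}

class A {
    final long id;
    private final Lazy<List<B>> children;

    // The loader would normally call a DAO or run a query for this A's B rows.
    A(long id, Supplier<List<B>> childLoader) {
        this.id = id;
        this.children = new Lazy<>(childLoader);
    }

    List<B> getBs() { return children.get(); }

    boolean childrenLoaded() { return children.isLoaded(); }
}
```

Instantiating A costs a single row; the B collection (and, by the same technique, each B's C collection) is only fetched when code actually walks down the hierarchy.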

Before I get into the topic of persistence frameworks, I've done some investigation into DAOs (Data Access Objects) and will publish that next. While a DAO is not a full-fledged persistence solution, it does provide a clean way to separate your business objects from your data retrieval mechanism (e.g., it removes SQL from your business code). Over the next few weeks, I plan on posting my findings as I make progress. My hope is that they will be helpful to people who are just starting similar research.