DSpace is probably one of the most widely used digital repository software packages. It has evolved into a complex software stack over the past couple of years; as of this writing (April 2011), the latest version is 1.7.1, with new features that include DSpace Discovery, the Mirage XMLUI theme, a curation administration UI and, most importantly, the AIP Backup & Restore process.

I believe I mentioned in one of my previous posts that my research is focused on arguing that the complex software stacks currently employed in most digital repository tools can in actual fact be abstracted; in other words, the overall architecture can be made simple. So I'm not quite sure how to interpret the statement below (it's on the DSpace about page):

It is free and easy to install “out of the box” and completely customizable to fit the needs of any organization.

I beg to differ: it's not "easy to install" and it sure as hell is not "out of the box". I had to install a mother lode of components, and the installation manual makes reference to specific versions of those same components. And whoever came up with the idea of integrating it with Oracle? We all know the software is not free. Well, the free Express Edition (XE) is, but it limits CPU (a single processor), RAM (no more than 1GB) and, above all, storage space (10g XE has a 4GB cap; the newer 11gR2 XE has an 11GB cap but, as of this writing, it's still a beta version).

Greenstone Digital Library Software

So I've spent the past couple of weeks trying to familiarize myself with the most common open source digital repository software tools, and I've been using the Parasoft services to test them out. The thing is, I'm going to spend the next 2 years of my life researching existing digital library software architectures... and eventually (hopefully) prove that a much simpler architecture can work just as well as the current complex software stacks.

Greenstone is a legend! That's probably because it was one of the first digital repository software packages to be implemented. The other reason it is popular is that it is developed and distributed in cooperation with UNESCO.

The Linux distribution basically comes with statically linked binaries (see output below). The installation package also comes bundled with all the tools (ImageMagick with JPEG 2000 support and the Apache web server) required for Greenstone to function properly.
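If you want to check the "statically linked" bit yourself, here is a rough Python sketch. It is my own heuristic, not anything Greenstone ships: it reads the ELF program headers and treats an executable with no PT_INTERP entry (the entry that names the dynamic loader) as statically linked. The function names are made up for illustration.

```python
import struct

PT_INTERP = 3  # program header type that names the dynamic loader


def has_interpreter(data: bytes) -> bool:
    """Return True if the ELF image in `data` has a PT_INTERP program header."""
    if data[:4] != b"\x7fELF":
        raise ValueError("not an ELF file")
    is_64bit = data[4] == 2                   # EI_CLASS: 1 = 32-bit, 2 = 64-bit
    endian = "<" if data[5] == 1 else ">"     # EI_DATA: 1 = little-endian
    if is_64bit:
        (phoff,) = struct.unpack_from(endian + "Q", data, 0x20)
        phentsize, phnum = struct.unpack_from(endian + "HH", data, 0x36)
    else:
        (phoff,) = struct.unpack_from(endian + "I", data, 0x1C)
        phentsize, phnum = struct.unpack_from(endian + "HH", data, 0x2A)
    for i in range(phnum):
        # p_type is the first 4-byte field of every program header
        (p_type,) = struct.unpack_from(endian + "I", data, phoff + i * phentsize)
        if p_type == PT_INTERP:
            return True
    return False


def looks_statically_linked(path: str) -> bool:
    """Heuristic: an ELF executable without PT_INTERP needs no dynamic loader."""
    with open(path, "rb") as f:
        return not has_interpreter(f.read())
```

Point `looks_statically_linked` at any binary in the Greenstone install directory (the exact path depends on where you unpacked it). The `file` command reports the same information if you would rather not script it.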

The Fourth Paradigm

So my supervisor gave me my first reading last week: "The Fourth Paradigm: Data-Intensive Scientific Discovery". I just finished going through the first chapter (an edited version of the eScience talk Jim Gray gave at the NRC-CSTB meeting in 2007, his last before he was lost at sea) and I must say, it is a rather interesting book.

The foreword is especially interesting as it starts off with a classic example of how useful curated data can be; it basically talks about how Johannes Kepler discovered the laws of planetary motion using Tycho Brahe's catalog of systematic astronomical observations. Gordon Bell then describes data-intensive science as consisting of three basic activities: capture, curation, and analysis.