Apache Solr Indexing Using UpdateRequestHandler

I’ve been running a series of Apache Solr indexing benchmarks on a Dublin Core [1] encoded XML corpus. Batch indexing was trivial using the Data Import Handler [2]. However, there is currently no delta-import support for XPathEntityProcessor; it is only possible with SqlEntityProcessor. (more…)
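
As a rough illustration of what indexing through the update handler involves, here is a minimal sketch that builds a Solr XML `<add>` message from Dublin Core style records. The function name `build_solr_add`, the sample record and its field names are assumptions made for illustration, not the schema or code used in the benchmarks. The resulting string would typically be POSTed to the core's `/update` endpoint with an XML content type and followed by a commit; that network step is left out so the sketch runs offline.

```python
import xml.etree.ElementTree as ET


def build_solr_add(records):
    """Build a Solr XML <add> message.

    `records` is a list of dicts mapping a field name to either a single
    string or a list of strings (for multi-valued fields).
    """
    add = ET.Element("add")
    for record in records:
        doc = ET.SubElement(add, "doc")
        for name, values in record.items():
            # Normalise single values to a one-element list.
            if isinstance(values, str):
                values = [values]
            for value in values:
                field = ET.SubElement(doc, "field", name=name)
                field.text = value
    return ET.tostring(add, encoding="unicode")


# Hypothetical record; the field names are assumptions, not a real schema.
sample = [
    {
        "id": "oai:example.org:1",
        "title": "A thesis",
        "creator": ["Doe, J.", "Roe, R."],
    }
]
payload = build_solr_add(sample)
```

Because each changed record is sent explicitly, only new or modified documents need to be posted, which is one way to work around the missing delta-import support.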

Using R ggplot2 Package for Elegant Graphics

I have been analysing results from a user study I conducted a couple of weeks back, and quite naturally, I have been experimenting with various ways of presenting the results graphically. I have primarily been using R base graphics for constructing my plots, but I recently came across ggplot2. Incidentally, I also briefly experimented with lattice. (more…)

[Screencasts] DSpace Via IIS Using ISAPI Filter Plugin

Today was really nice, except for the weather; it was bitterly cold, and I tend to get really homesick when it gets this cold. I still got to learn quite a bit about how things work in a typical Windows server environment. I spent the early hours of this afternoon with a colleague who manages the MS Win 2K3 i386 production server that the DSpace instance I set up is going to run on; we were essentially configuring IIS to proxy requests through to Tomcat 6… (more…)

Metadata Harvesting via OAI-PMH Using Python

I am conducting my last set of experiments, basically a series of performance evaluations, and I will be using metadata harvested via OAI-PMH [5] from the NDLTD portal [1] as my dataset. It fits in perfectly with what needs to be evaluated: 1,985,695 well-structured records. Incidentally, I had a very interesting chat with my supervisor the other day when we discussed my experimental plan, and one of the things that came up was the question of figuring out whether or not a particular dataset is a prime candidate for performance evaluations… (more…)
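
For readers unfamiliar with OAI-PMH harvesting, the core loop is: request `ListRecords`, read the records, then repeat with the returned `resumptionToken` until none is returned. Below is a minimal sketch of the two offline pieces of that loop, URL construction and response parsing. The helper names, the `example.org` base URL and the sample response are assumptions for illustration only, not the code or endpoint used for the NDLTD harvest.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

# Namespace defined by the OAI-PMH 2.0 protocol.
OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}


def list_records_url(base_url, token=None, metadata_prefix="oai_dc"):
    """Build a ListRecords request URL.

    resumptionToken is an exclusive argument in OAI-PMH, so when a token
    is present the metadataPrefix is omitted.
    """
    if token:
        params = {"verb": "ListRecords", "resumptionToken": token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    return base_url + "?" + urlencode(params)


def parse_list_records(xml_text):
    """Return (record identifiers, resumption token or None)."""
    root = ET.fromstring(xml_text)
    ids = [e.text for e in root.iterfind(".//oai:header/oai:identifier", OAI_NS)]
    token_el = root.find(".//oai:resumptionToken", OAI_NS)
    token = token_el.text.strip() if token_el is not None and token_el.text else None
    return ids, token


# Hypothetical, heavily trimmed response used only to exercise the parser.
SAMPLE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record><header><identifier>oai:example.org:1</identifier></header></record>
    <record><header><identifier>oai:example.org:2</identifier></header></record>
    <resumptionToken>page-2</resumptionToken>
  </ListRecords>
</OAI-PMH>"""

ids, token = parse_list_records(SAMPLE)
next_url = list_records_url("http://example.org/oai", token)
```

A full harvester would fetch `next_url`, parse the response the same way, and stop once `parse_list_records` returns `None` for the token.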