Category: Research

ggplot2 – Multiple Plots in One Graph Using gridExtra

I’ve mostly been using ggplot2’s facet_wrap and facet_grid features because the multiplots I’ve had to produce so far were related in one way or another. However, I needed to plot a multiplot consisting of four (4) distinct datasets. In the past, when working with R base graphics, I used the layout() function to achieve this [1]. (more…)

Randomly Spawning Sample Objects With Python

Too tired to think straight and with very little time left, I came up with a primitive script to randomly spawn workloads for a series of experiments I was running. As it turns out, the natural order of the dataset [1] I am using needs a bit of randomisation: the records in the different SetSpecs have inconsistent structures. The moral of the story is that free-styling isn’t always appropriate 😉 (more…)
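
A minimal sketch of what randomly spawning a workload could look like, assuming the records are available as a list of file names. This is not the original script: the function name `spawn_workload`, the `seed` parameter, and the record names are hypothetical and only illustrate the idea of sampling without replacement.

```python
import random


def spawn_workload(records, size, seed=None):
    """Return `size` records drawn at random, without replacement."""
    if size > len(records):
        raise ValueError("workload size exceeds the number of available records")
    rng = random.Random(seed)  # a fixed seed makes a workload reproducible
    return rng.sample(records, size)


# Hypothetical record names standing in for the real dataset.
records = [f"record_{i:04d}.xml" for i in range(1000)]
workload = spawn_workload(records, 100, seed=42)
```

Sampling without replacement keeps every record in a workload unique, and passing the same seed reproduces the same workload across repeated experiment runs.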

Apache Solr DIH Indexing Benchmarks

I have been benchmarking Apache Solr batch indexing of a linearly increasing workload of XML documents… I am using Solr’s Data Import Handler on a Pentium(R) Dual-Core CPU E5200 @ 2.50GHz with 4 GB RAM. I performed 5 runs for each workload and then picked the successful run with the minimum time for each workload. The smallest workload (100 documents) took 1.11 seconds, while the largest took 18,261 seconds (~5 hours). (more…)
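
The "minimum of repeated runs" procedure can be sketched roughly as follows. This is an illustrative assumption about how such timings might be collected, not the benchmark harness actually used; `best_of` and `task` are hypothetical names, and `task` stands in for whatever triggers one indexing run.

```python
import time


def best_of(task, repeats=5):
    """Run `task` `repeats` times and return the fastest successful time in seconds."""
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        try:
            task()
        except Exception:
            continue  # a failed run is excluded from the timings
        timings.append(time.perf_counter() - start)
    if not timings:
        raise RuntimeError("no run completed successfully")
    return min(timings)
```

Taking the minimum rather than the mean is a common choice when the goal is to filter out noise from other processes on the machine, since interference can only make a run slower.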

Apache Solr Indexing Using UpdateRequestHandler

I’ve been running a series of Apache Solr indexing benchmarks on a Dublin Core [1] encoded XML corpus. Batch indexing was trivial using the Data Import Handler [2]; however, there is currently NO delta-import support for XPathEntityProcessor. It’s only possible with the SqlEntityProcessor. (more…)
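
As a rough illustration of the alternative route, Solr’s update handler accepts XML update messages of the form `<add><doc><field name="…">…</field></doc></add>`. The sketch below only builds such a message from Python dictionaries. The helper name `build_add_message` and the field names are hypothetical, and the Solr URL, schema, and commit strategy are left out because they depend on the setup.

```python
import xml.etree.ElementTree as ET


def build_add_message(docs):
    """Build a Solr XML <add> message from a list of field dictionaries."""
    root = ET.Element("add")
    for doc in docs:
        doc_el = ET.SubElement(root, "doc")
        for name, value in doc.items():
            # A list value becomes one <field> element per item (multi-valued field).
            values = value if isinstance(value, list) else [value]
            for v in values:
                field = ET.SubElement(doc_el, "field", name=name)
                field.text = str(v)
    return ET.tostring(root, encoding="unicode")


# Hypothetical Dublin Core style fields; the real schema may differ.
message = build_add_message([
    {"id": "oai:example:1", "title": "A & B", "creator": ["Doe, J.", "Roe, R."]},
])
```

Only new or changed records need to go into such a message, which is what makes this route usable for incremental updates when delta-import is unavailable.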