Randomly Spawning Sample Objects With Python

Too tired to think straight and with very little time left, I came up with a primitive script to randomly spawn workloads for a series of experiments I was running. As it turns out, the natural order of the dataset [1] I am using needs a bit of randomisation –the records in the different SetSpecs have inconsistent structures. Moral of the story here is that free-styling isn’t always approapriate ;) (more…)

Apache Solr DIH Indexing Benchmarks

I have been benchmarking Apache Solr batch indexing of a linearly increasing workload of XML documents… I am using Solr’s Data Input Handler on a Pentium(R) Dual-Core CPU E5200@ 2.50GHz that has 4 GB RAM. I ran 5 repeated runs for each workload and subsequently picked the minimum successful run for workload. The smallest workload (100 documents) took 1.11 seconds, while the largest took 18,261 seconds (~5 hours). (more…)

Apache Solr Indexing Using UpdateRequestHandler

I’ve been running a series of Apache Sorl indexing benchmarks on a Dublic Core [1] encoded XML corpus. Batch indexing was trivial using Data Import Handler [2], however, there is currently NO delta-import support for XPathEntityProcessor –it’s only possible with the SqlEntityProcessor. (more…)