ApacheBench [1] was an obvious choice for my benchmarking tasks, at least until I spent almost three hours trying to figure out why I kept getting “apr_poll: The timeout specified has expired (70007)” errors… it turns out there is a hardcoded 30-second socket timeout [2]. (more…)
Randomly Spawning Sample Objects With Python
Too tired to think straight and with very little time left, I came up with a primitive script to randomly spawn workloads for a series of experiments I was running. As it turns out, the natural order of the dataset [1] I am using needs a bit of randomisation: the records in the different SetSpecs have inconsistent structures. The moral of the story is that free-styling isn’t always appropriate.
(more…)
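A minimal sketch of the kind of randomisation described above, assuming the records are already grouped by SetSpec in a dictionary. The function name `spawn_workload`, the record identifiers, and the seed are illustrative assumptions, not the script from the post.

```python
import random


def spawn_workload(records_by_setspec, size, seed=None):
    """Return `size` records drawn at random across all SetSpecs."""
    rng = random.Random(seed)  # a fixed seed makes an experiment repeatable
    # Flatten the per-SetSpec lists so the natural ordering no longer matters.
    pool = [rec for recs in records_by_setspec.values() for rec in recs]
    if size > len(pool):
        raise ValueError("workload is larger than the dataset")
    return rng.sample(pool, size)  # sampling without replacement


# Hypothetical record identifiers grouped by SetSpec.
dataset = {
    "setA": ["a1", "a2", "a3"],
    "setB": ["b1", "b2"],
    "setC": ["c1", "c2", "c3", "c4"],
}
workload = spawn_workload(dataset, 5, seed=42)
print(workload)
```

Sampling from the flattened pool means each workload mixes records from every SetSpec instead of inheriting the structure of whichever set happens to come first.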
Apache Solr DIH Indexing Benchmarks
I have been benchmarking Apache Solr batch indexing of a linearly increasing workload of XML documents… I am using Solr’s Data Import Handler on a Pentium(R) Dual-Core CPU E5200 @ 2.50GHz with 4 GB of RAM. I ran each workload five times and picked the fastest successful run for each workload. The smallest workload (100 documents) took 1.11 seconds, while the largest took 18,261 seconds (~5 hours). (more…)
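The selection step can be sketched roughly as follows, assuming each workload maps to a list of run times in seconds and that a failed run is recorded as `None`. The timings in `runs` are made up for illustration and are not the measurements from the post; only the 18,261 second figure comes from the excerpt.

```python
def best_run(timings):
    """Return the fastest successful run, or None if every run failed."""
    successful = [t for t in timings if t is not None]
    return min(successful) if successful else None


# Made-up timings in seconds, keyed by workload size; None marks a failed run.
runs = {
    100: [2.5, 2.1, None, 2.4, 2.2],
    1000: [None, None, None, None, None],
}
best = {workload: best_run(times) for workload, times in runs.items()}
print(best)

# The largest workload reported above, converted from seconds to hours.
largest_hours = 18261 / 3600
print(round(largest_hours, 2))
```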
Apache Solr Indexing Using UpdateRequestHandler
I’ve been running a series of Apache Solr indexing benchmarks on a Dublin Core [1] encoded XML corpus. Batch indexing was trivial using the Data Import Handler [2]; however, there is currently NO delta-import support for XPathEntityProcessor. It is only possible with the SqlEntityProcessor. (more…)
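For context, a rough sketch of what pushing documents through Solr’s XML update handler can look like from Python. The core URL, the field names, and the helper names `build_add_payload` and `build_update_request` are assumptions for illustration; the request is only prepared here, not sent, and this is not necessarily how the benchmarks in the post were driven.

```python
import urllib.request
import xml.etree.ElementTree as ET


def build_add_payload(docs):
    """Serialise dicts of field name -> value as a Solr <add> XML message."""
    add = ET.Element("add")
    for doc in docs:
        doc_el = ET.SubElement(add, "doc")
        for name, value in doc.items():
            field = ET.SubElement(doc_el, "field", name=name)
            field.text = str(value)
    return ET.tostring(add, encoding="utf-8")


def build_update_request(solr_url, docs):
    """Prepare (but do not send) a POST to the update handler."""
    return urllib.request.Request(
        solr_url + "/update?commit=true",
        data=build_add_payload(docs),
        headers={"Content-Type": "text/xml; charset=utf-8"},
        method="POST",
    )


# Hypothetical core URL and field names.
request = build_update_request(
    "http://localhost:8983/solr/collection1",
    [{"id": "oai:example:1", "title": "Sample record"}],
)
print(request.full_url)
print(request.data.decode("utf-8"))
# Sending it would be: urllib.request.urlopen(request)
```

Because each document is posted explicitly, only new or changed records need to be sent, which is one way to work around the missing delta-import support.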
A Taste of the Beamer LaTeX Class Using Torino Theme
tl;dr
The LaTeX source files are on my GitHub account [1]… Use the R Sweave command to generate the TeX file. Then run latex on the resulting TeX file, followed by dvipdf to generate the final PDF output. pdflatex will NOT work with PSTricks, so you need to run latex to produce a DVI file. (more…)
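The three build steps can be chained in a small script. This is a sketch only: the file name `slides.Rnw` and the `.Rnw` extension are assumptions, and the default dry run just prints the commands so nothing is executed unless `R`, `latex`, and `dvipdf` are known to be installed.

```python
import subprocess
from pathlib import Path


def build_commands(rnw_file):
    """Return the Sweave -> latex -> dvipdf commands for one source file."""
    stem = Path(rnw_file).stem
    return [
        ["R", "CMD", "Sweave", rnw_file],  # Sweave source -> .tex
        ["latex", stem + ".tex"],          # .tex -> .dvi (works with PSTricks)
        ["dvipdf", stem + ".dvi"],         # .dvi -> .pdf
    ]


def build(rnw_file, dry_run=True):
    """Print each command, and run them in order unless dry_run is set."""
    for command in build_commands(rnw_file):
        print(" ".join(command))
        if not dry_run:
            subprocess.run(command, check=True)


# "slides.Rnw" is a hypothetical file name.
build("slides.Rnw")
```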

