Apache Solr DIH Indexing Benchmarks

I have been benchmarking Apache Solr batch indexing of a linearly increasing workload of XML documents… I am using Solr’s Data Import Handler on a Pentium(R) Dual-Core CPU E5200 @ 2.50GHz machine with 4 GB RAM. I ran each workload 5 times and picked the fastest successful run for each workload. The smallest workload (100 documents) took 1.11 seconds, while the largest took 18,261 seconds (~5 hours). Continue reading “Apache Solr DIH Indexing Benchmarks”
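
The "repeat each workload and keep the fastest successful run" procedure can be sketched roughly as below. This is only an illustration of the idea, not the harness used for these numbers; `best_of` and the `task` callable are hypothetical names, and `task` stands in for whatever triggers one full import and waits for it to finish.

```python
import time


def best_of(task, runs=5):
    """Call `task` `runs` times and return the shortest wall-clock
    duration (in seconds) among the runs that did not raise."""
    durations = []
    for _ in range(runs):
        start = time.perf_counter()
        try:
            task()
        except Exception:
            # A failed run is discarded rather than timed.
            continue
        durations.append(time.perf_counter() - start)
    if not durations:
        raise RuntimeError("no successful run")
    return min(durations)
```

Taking the minimum rather than the mean is one way to reduce the influence of background noise on the machine, at the cost of hiding run-to-run variance.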

Apache Solr Indexing Using UpdateRequestHandler

I’ve been running a series of Apache Solr indexing benchmarks on a Dublin Core [1] encoded XML corpus. Batch indexing was trivial using the Data Import Handler [2]; however, there is currently NO delta-import support for XPathEntityProcessor. It is only possible with the SqlEntityProcessor. Continue reading “Apache Solr Indexing Using UpdateRequestHandler”
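
As a rough illustration of the alternative, here is a minimal sketch of building a Solr XML `<add>` message and posting it to the update handler. The helper names, the example field names (`id`, `title`, `creator`) and the URL (a local Solr with a core called `dc`) are assumptions for illustration only; the real field names depend on the schema in use.

```python
import urllib.request
import xml.etree.ElementTree as ET


def build_add_message(docs):
    """Build a Solr XML <add> message from a list of dicts.
    A list value becomes one <field> element per item (multi-valued)."""
    add = ET.Element("add")
    for fields in docs:
        doc = ET.SubElement(add, "doc")
        for name, value in fields.items():
            values = value if isinstance(value, list) else [value]
            for v in values:
                field = ET.SubElement(doc, "field", name=name)
                field.text = str(v)
    return ET.tostring(add, encoding="unicode")


def post_update(xml_payload,
                url="http://localhost:8983/solr/dc/update?commit=true"):
    """POST the message to the update handler.
    The default URL is an assumed local setup, not taken from the post."""
    request = urllib.request.Request(
        url,
        data=xml_payload.encode("utf-8"),
        headers={"Content-Type": "text/xml; charset=utf-8"},
    )
    with urllib.request.urlopen(request) as response:
        return response.status


if __name__ == "__main__":
    # Hypothetical Dublin Core style record.
    payload = build_add_message(
        [{"id": "rec-1", "title": "Example", "creator": ["A", "B"]}]
    )
    print(payload)
```

Because Solr replaces an existing document when a new one arrives with the same unique key, re-posting only the changed records in this way can stand in for a delta import.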