Lighton Phiri


DSpace Ingestion Performance Degradation

February 24, 2013

As part of the performance evaluations for my research [1], I designed an experiment comparing certain operations of our approach with DSpace. I had initially estimated that ingesting the last of my 15 workloads [2] would take 7 days; however, today is day 11 since I started ingesting the 1,638,400 records. The results from this workload will in no way change what I’ve been able to infer from the other 14 workloads, but I decided to try and figure out why performance has degraded…

Background Info

I have scripts that take advantage of DSpace’s metadata-import utility, and I resorted to using batch files of 1k records after evaluating varying batch sizes. Incidentally, there is a correlation between batch size and ingestion time, but larger batch sizes come at a cost: more RAM.
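For illustration only, the batching step could be sketched roughly as follows. This is not my actual script: the function name `split_into_batches`, the `batch-00001.csv` naming scheme, and the assumption that the records sit in a single CSV file with one header row are all hypothetical.

```python
import csv
from itertools import islice
from pathlib import Path


def split_into_batches(source_csv, out_dir, batch_size=1000):
    """Split one metadata CSV into numbered files of at most batch_size rows.

    Hypothetical helper: the file layout and naming are assumptions,
    not the layout used in the experiment.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    with open(source_csv, newline="", encoding="utf-8") as src:
        reader = csv.reader(src)
        header = next(reader, None)
        if header is None:  # empty input, nothing to split
            return written
        batch_no = 0
        while True:
            # Read the next chunk lazily so the whole file never sits in RAM.
            rows = list(islice(reader, batch_size))
            if not rows:
                break
            batch_no += 1
            target = out_dir / f"batch-{batch_no:05d}.csv"
            with open(target, "w", newline="", encoding="utf-8") as dst:
                writer = csv.writer(dst)
                writer.writerow(header)  # each batch repeats the header row
                writer.writerows(rows)
            written.append(target)
    return written
```

Each resulting file would then be handed to the import utility one at a time, so memory use depends on `batch_size` rather than on the size of the whole workload.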

Findings

My initial guess was that the performance degradation was a result of larger batch files, so I awk’d the now 2.2GB nohup log file to determine how long the processing is taking, drilling down to the hour level; the plots below show daily and hourly ingest times. However, the batch files I’ve been processing seem to have a relatively consistent size, so no problem here…

[Figure: dspace-ingestion-1k-batches-total]

[Figure: dspace-ingestion-1k-batches]
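The hourly drill-down described above can also be sketched in Python instead of awk. The log format here is an assumption made purely for illustration: each finished batch is taken to write one line of the form `2013-02-13 14:05:33 batch-00001.csv done`. The real nohup log looks different, and `batches_per_hour` is a hypothetical name.

```python
from collections import Counter
from datetime import datetime


def batches_per_hour(lines, marker="done"):
    """Count completed batches per clock hour.

    Assumes (hypothetically) lines shaped like
    '<YYYY-MM-DD> <HH:MM:SS> <batch file> done'; anything else is skipped.
    """
    counts = Counter()
    for line in lines:
        parts = line.split()
        if len(parts) < 3 or parts[-1] != marker:
            continue
        try:
            stamp = datetime.strptime(" ".join(parts[:2]), "%Y-%m-%d %H:%M:%S")
        except ValueError:
            continue  # not a timestamped line
        counts[stamp.strftime("%Y-%m-%d %H:00")] += 1
    # Sort by hour so the result reads as a time series.
    return dict(sorted(counts.items()))
```

With equally sized batches, a falling count from one hour to the next means each batch is taking longer; grouping on `"%Y-%m-%d"` instead gives the daily view.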

Conclusions

At this point, I am still unsure why the ingest process is slowing down. I’ve decided to investigate this further when I have the time, but my guess is that the bottleneck is PostgreSQL.

Bibliography

[1] http://people.cs.uct.ac.za/~lphiri
[2] http://lightonphiri.org/blog/randomly-spawning-sample-objects-with-python

Categories: Research, Technical
Tags: dspace, ingestion, metadata-import
This work is licensed under a Creative Commons Attribution 4.0 International License.