DSpace Ingestion Performance Degradation

Research, Technical
Lighton Phiri · February 24, 2013

As part of the performance evaluations for my research [1], I designed a comparative experiment aimed at comparing certain operations of our approach with DSpace. I had initially estimated that the ingest process for the last of my 15 workloads [2] would complete in 7 days; however, today marks day 11 since I started ingesting the 1,638,400 records, even though results from this workload are in no way going to change what I have been able to infer from the other 14 workloads. So I decided to try to figure out why performance has degraded…

Background Info

I have scripts that take advantage of DSpace’s metadata-import utility, and I resorted to using batch files of 1,000 records each after evaluating varying batch sizes. Incidentally, there is a correlation between batch size and ingestion time, but larger batch sizes come at a cost: more RAM.
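For reference, the driver is conceptually something like the minimal sketch below. The installation path, e-person address and batch file naming scheme are placeholders, and the metadata-import flags (-f, -e, -s) are assumptions based on DSpace’s batch metadata editing tool rather than my actual script.

#!/usr/bin/env python
# Minimal sketch of the batch ingestion driver.
# Assumptions: DSPACE_BIN, EPERSON and the batch CSV naming scheme are
# placeholders, not my real setup.
import glob
import subprocess
import time

DSPACE_BIN = "/dspace/bin/dspace"          # assumed installation path
EPERSON = "submitter@example.com"          # assumed submitter e-person

def ingest_batches(batch_dir):
    """Feed each 1,000-record CSV batch to DSpace's metadata-import utility."""
    for csv_file in sorted(glob.glob(batch_dir + "/batch-*.csv")):
        start = time.time()
        subprocess.check_call([
            DSPACE_BIN, "metadata-import",
            "-f", csv_file,   # batch file with 1k records
            "-e", EPERSON,    # e-person performing the import
            "-s",             # silent mode: skip the confirmation prompt
        ])
        print("%s ingested in %.1f seconds" % (csv_file, time.time() - start))

if __name__ == "__main__":
    ingest_batches("/data/workload-15/batches")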

Findings

My initial guess was that the performance degradation was a result of larger batch files, so I awk’d the now 2.2 GB nohup log file to determine how long the processing is taking, drilling down to the hour level; the plots below show daily and hourly ingest times. However, the batch files I have been processing are of a relatively consistent size, so no problem there…

[Figure: dspace-ingestion-1k-batches-total (daily ingest times)]

[Figure: dspace-ingestion-1k-batches (hourly ingest times)]
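The log crunching itself amounts to grouping per-batch ingest times by day and by hour. A sketch is below; it assumes the nohup log has already been reduced to tab-separated "timestamp, seconds" records (my actual pass is an awk one-liner), so the input format is an assumption.

#!/usr/bin/env python
# Sketch of the log aggregation step.
# Assumption: the nohup log has been reduced to "timestamp<TAB>seconds" lines.
from collections import defaultdict
from datetime import datetime

def aggregate(log_path):
    """Sum per-batch ingest times by day and by hour."""
    daily = defaultdict(float)
    hourly = defaultdict(float)
    with open(log_path) as log:
        for line in log:
            timestamp, seconds = line.rstrip("\n").split("\t")
            when = datetime.strptime(timestamp, "%Y-%m-%d %H:%M:%S")
            daily[when.strftime("%Y-%m-%d")] += float(seconds)
            hourly[when.strftime("%Y-%m-%d %H:00")] += float(seconds)
    return daily, hourly

if __name__ == "__main__":
    daily, hourly = aggregate("nohup-ingest-times.tsv")
    for day in sorted(daily):
        print("%s\t%.0f" % (day, daily[day]))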

Conclusions

At this point, I am still unsure why the ingest process is slowing down. I have decided to investigate this further when I have the time, but my guess is that the bottleneck is PostgreSQL.
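When I do get round to it, the first thing I plan to check is whether the DSpace tables are accumulating dead tuples faster than autovacuum clears them, which would slow inserts as the tables grow. A minimal sketch of that check follows; it assumes psycopg2 is available, that the database is named "dspace", and that the stock DSpace table names apply.

#!/usr/bin/env python
# Sketch of a follow-up PostgreSQL check.
# Assumptions: psycopg2 is installed, the database/user are both "dspace",
# and the table names below are the stock DSpace metadata tables.
import psycopg2

QUERY = """
SELECT relname, n_live_tup, n_dead_tup, last_autovacuum
FROM pg_stat_user_tables
WHERE relname IN ('item', 'metadatavalue', 'handle')
ORDER BY n_dead_tup DESC;
"""

def inspect_tables():
    """Report dead-tuple build-up that could degrade insert performance."""
    conn = psycopg2.connect(dbname="dspace", user="dspace")
    try:
        cur = conn.cursor()
        cur.execute(QUERY)
        for relname, live, dead, last_vacuum in cur.fetchall():
            print("%-15s live=%-10s dead=%-10s last_autovacuum=%s"
                  % (relname, live, dead, last_vacuum))
    finally:
        conn.close()

if __name__ == "__main__":
    inspect_tables()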

Bibliography

[1] http://people.cs.uct.ac.za/~lphiri
[2] http://lightonphiri.org/blog/randomly-spawning-sample-objects-with-python

Tags: dspace, ingestion, metadata-import
