Randomly Spawning Sample Objects With Python

Too tired to think straight and with very little time left, I came up with a primitive script to randomly spawn workloads for a series of experiments I was running. As it turns out, the natural order of the dataset [1] I am using needs a bit of randomisation: the records in the different SetSpecs have inconsistent structures. Moral of the story here is that free-styling isn’t always appropriate 😉

I used the sample function from the random module. You’ll notice that I recursively swallow the entire dataset (~2 million records) into memory and then generate a random sample containing as many files as the workload named by the ‘bin’ value calls for. I did mention that I was tired and not thinking straight; however, I did check that I was doing the right thing [2].
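
As a quick illustration of what random.sample gives you, here is a minimal sketch. The list of file names is a made-up stand-in for the real dataset, and the seed is there only to make the sketch repeatable.

```python
import random

# Hypothetical stand-in for the list of .metadata paths built from the dataset.
records = ['record%d.metadata' % i for i in range(1000)]

rng = random.Random(42)            # seeded only so the sketch is repeatable
sample = rng.sample(records, 100)  # 100 distinct items, drawn without replacement

# Asking for more items than exist raises ValueError rather than repeating items.
try:
    rng.sample(records, len(records) + 1)
    oversample_failed = False
except ValueError:
    oversample_failed = True
```

Because the draw is without replacement, no record can land in the same workload twice, which is what you want when copying files.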


import os
import random
import shutil

def spawnrandomworkload(dataset, destination, bin):
    """Spawns random sets of experiment workloads.

    keyword arguments:
    dataset --location of original dataset to spawn
    destination --base location where workloads will be created
    bin --name of the workload to spawn ('w1' to 'w8'); it determines the sample size

    """
    workloads = (('w1', 100), ('w2', 200), ('w3', 400), ('w4', 800), ('w5', 1600), ('w6', 3200), ('w7', 6400), ('w8', 12800))
    # Swallow the whole dataset: pair every .metadata file with the directory it
    # should be copied into, keeping its location relative to the dataset root.
    swallowdataset = []
    for root, dirs, files in os.walk(dataset):
        for filename in files:
            if filename.endswith('.metadata'):
                source = os.path.abspath(os.path.join(root, filename))
                target = os.path.join(destination, bin, os.path.dirname(os.path.relpath(source, dataset)))
                swallowdataset.append((source, target))
    for workload in workloads:
        if workload[0] == bin:
            payload = random.sample(swallowdataset, workload[1])
            for cargo in payload:
                # Create the target directory before copying into it.
                if not os.path.exists(cargo[1]):
                    os.makedirs(cargo[1])
                shutil.copy2(cargo[0], cargo[1])
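
If holding every path in memory ever becomes a problem, one alternative is reservoir sampling, which keeps only the sample itself while walking the tree. This is a different technique from the random.sample approach above, not what I actually ran; iter_metadata and reservoir_sample are hypothetical names, and the sketch is untested against the real dataset.

```python
import os
import random

def iter_metadata(dataset):
    """Yield the path of every .metadata file under dataset, one at a time."""
    for root, dirs, files in os.walk(dataset):
        for filename in files:
            if filename.endswith('.metadata'):
                yield os.path.join(root, filename)

def reservoir_sample(items, k, rng=random):
    """Return k items chosen uniformly at random from an iterable of unknown length."""
    reservoir = []
    for index, item in enumerate(items):
        if index < k:
            # Fill the reservoir with the first k items.
            reservoir.append(item)
        else:
            # Keep the new item with probability k / (index + 1).
            slot = rng.randint(0, index)
            if slot < k:
                reservoir[slot] = item
    return reservoir
```

Something like reservoir_sample(iter_metadata(dataset), 100) would then replace the list plus random.sample. Unlike random.sample, this sketch quietly returns fewer than k items when the dataset is smaller than k.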

In addition, I had a companion bash ‘workhorse’ script do most of the dirty work…

for workloads in `seq 1 8`
do
   w="w$workloads"  # workload name, w1 to w8 (assumed from the workload names used above)
   echo $workloads
   echo $w
   echo Processing directory.... /home/lphiri/datasets/ndltd/random/workload/$w
   echo Copying contents to... /home/lphiri/datasets/ndltd/random/workload2/$w
   python -c "import simplyctperformance; simplyctperformance.spawnstructworkload('/home/lphiri/datasets/ndltd/random/workload/$w', '/home/lphiri/datasets/ndltd/random/workload2', '$w')";
done
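
The same loop could just as well be driven from Python rather than the shell. Below is a minimal sketch, assuming the workloads are named w1 to w8 and that the spawning function takes (dataset, destination, bin) as in the call above. spawn_all and its parameters are hypothetical names, and the spawner is passed in as an argument because the simplyctperformance module isn't shown here.

```python
import os

def spawn_all(spawn, source_base, destination, count=8):
    """Call spawn(dataset, destination, bin) once for each workload w1..w<count>."""
    names = ['w%d' % number for number in range(1, count + 1)]
    for name in names:
        # Each workload reads from its own directory under source_base.
        spawn(os.path.join(source_base, name), destination, name)
    return names
```

Passing the function in also makes it easy to swap between spawners without touching the loop.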

[1] http://lightonphiri.org/blog/metadata-harvesting-via-oai-pmh-using-python
[2] http://stackoverflow.com/a/855455/664424