Parallel Processing 26 February 2008

Posted by ojmason in programming.

My grammar pattern processing was running at about 7 words per minute, which put the estimated completion time at roughly a week for the 56,000 words in the BNC that I am currently looking at. However, since processes on the Blue Bear system can only run for a limited length of time, it would have required occasional restarts (about two a day). And because of the way it was set up, each restart would go through all the words again, beginning at ‘a’, only to find that the first x words had already been processed. Clearly not very efficient.
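As a quick sanity check on that estimate, the arithmetic can be sketched in a few lines of Python. The function name `estimated_hours` is made up for illustration; only the figures (56,000 words, 7 words per minute) come from the post.

```python
def estimated_hours(total_words, words_per_minute):
    """Return the estimated run time in hours at a constant processing rate."""
    return total_words / words_per_minute / 60


# Figures from the post: 56,000 words at about 7 words per minute.
sequential_hours = estimated_hours(56_000, 7)
sequential_days = sequential_hours / 24

print(f"{sequential_hours:.0f} hours, or {sequential_days:.1f} days")
# prints "133 hours, or 5.6 days"
```

That is close enough to "a week" once the overhead of the restarts is taken into account.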

Now, this is sequential processing. On a highly parallel system it is a lot faster to run things in parallel, so I changed the setup and split the whole task into separate runs, one for each letter. Now there are 27 processes churning away at the letters a-z, and the overall throughput is a lot higher, at approximately 180 words per minute. That means the total elapsed time would be a little over 5 hours, which is roughly 26 times faster (180 / 7 ≈ 25.7) than the sequential run!
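The split by initial letter might look something like the following minimal sketch. It is not the actual setup: `partition_by_initial`, `process_bucket`, and `run_all` are hypothetical names, the real pattern processing is replaced by a placeholder, and the extra "other" bucket for words that do not start with a-z is an assumption. On a cluster each bucket would more likely be submitted as its own batch job; a thread pool is used here only so the example runs on its own.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor
import string


def partition_by_initial(words):
    """Group words into one bucket per initial letter a-z, plus 'other'."""
    buckets = defaultdict(list)
    for word in words:
        first = word[:1].lower()
        # Assumption: anything not starting with a-z goes into an extra bucket.
        key = first if first and first in string.ascii_lowercase else "other"
        buckets[key].append(word)
    return dict(buckets)


def process_bucket(bucket):
    """Placeholder for the real grammar pattern processing of one bucket."""
    return [word.upper() for word in bucket]


def run_all(words, max_workers=27):
    """Process every bucket independently and collect the results per key."""
    buckets = partition_by_initial(words)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        results = pool.map(process_bucket, buckets.values())
        return dict(zip(buckets.keys(), results))


print(run_all(["ant", "apple", "bee", "3d"]))
```

Because each bucket is independent of the others, a run that hits the time limit only has to be restarted for its own letter rather than from ‘a’.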

And once this has finished, I can set it running for the remaining features while I work with the data that has already been gathered by then.
