Parallel Processing 26 February 2008

Posted by Oliver Mason in programming.
My grammar pattern processing was running at about 7 words per minute, which amounted to an estimated completion time of about a week for all 56,000 words from the BNC that I am currently looking at. However, as processes on the Blue Bear system can only run for a limited time, it would have required occasional restarts (about two a day). And because of the way it was set up, each restart would go through all the words again, beginning at ‘a’, only to find that the first x words had already been processed. Clearly not very efficient.
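As a quick check of the "about a week" estimate, here is the arithmetic in Python. It uses only the two figures from the post, 56,000 words at roughly 7 words per minute, and assumes the rate stays constant for the whole run.

```python
# Back-of-the-envelope estimate for the sequential run, using the
# figures from the post: 56,000 words at roughly 7 words per minute.
WORDS = 56_000
WORDS_PER_MINUTE = 7


def completion_minutes(words: int, rate_per_minute: float) -> float:
    """Minutes needed to process `words` at a constant rate."""
    return words / rate_per_minute


minutes = completion_minutes(WORDS, WORDS_PER_MINUTE)
hours = minutes / 60
days = hours / 24
print(f"{minutes:.0f} min = {hours:.1f} h = {days:.1f} days")
```

This prints `8000 min = 133.3 h = 5.6 days`, so "about a week" is a fair rounding once the restarts are taken into account.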

Now, this is sequential processing. On a highly parallel system it is a lot faster to run independent parts of the job at the same time, so I changed the setup and split the whole task into separate runs for each letter. Now there are 27 processes churning away at the letters a-z, and the overall throughput is a lot higher, at approximately 180 words per minute. That means the whole run should take little over 5 hours of wall-clock time, which is roughly 26 times faster than the sequential run!
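A minimal sketch of the split-by-initial-letter idea in Python, assuming the words are available as a plain list of strings. `process_word` is a hypothetical placeholder for the actual grammar pattern processing. A local `multiprocessing.Pool` stands in for the separate processes on Blue Bear. The catch-all `"_"` bucket for words that don't start with a-z is an assumption of this sketch, not necessarily how the real runs were divided.

```python
import string
from collections import defaultdict
from multiprocessing import Pool


def partition_by_initial(words):
    """Group words into buckets keyed by their lower-cased first character.

    Words that do not start with a-z (digits, punctuation, empty strings)
    go into a catch-all bucket "_" (an assumption of this sketch).
    """
    buckets = defaultdict(list)
    for word in words:
        first = word[:1].lower()
        # Check `first` is non-empty: "" is a substring of every string.
        key = first if first and first in string.ascii_lowercase else "_"
        buckets[key].append(word)
    return dict(buckets)


def process_word(word):
    """Hypothetical stand-in for the real per-word grammar pattern analysis."""
    return word, len(word)


def process_bucket(item):
    """Process all words of one bucket; each bucket runs in its own worker."""
    letter, words = item
    return letter, [process_word(w) for w in words]


def run_parallel(words, processes=None):
    """Split the words by initial letter and process the buckets in parallel."""
    buckets = partition_by_initial(words)
    with Pool(processes=processes) as pool:
        results = pool.map(process_bucket, sorted(buckets.items()))
    return dict(results)


if __name__ == "__main__":
    sample = ["apple", "abbey", "bear", "Blue", "corpus", "42nd"]
    print(run_parallel(sample, processes=4))
```

Because each bucket is independent, a restart only has to redo the letters that had not finished, rather than scanning all words from ‘a’ again. With the post's figures, 56,000 / 180 ≈ 311 minutes (about 5.2 hours), and 180 / 7 ≈ 26 times the sequential throughput.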

Once this has finished, I can start the runs for the remaining features and meanwhile work with the data gathered so far.

WordPress after all 26 February 2008

Posted by Oliver Mason in meta.
After playing around with my own hand-written blog engine in PHP for a while, I realise that it’s not worth it. By going ‘mainstream’ I perhaps have a bit less control over things, but at least I get a complete system with categories, navigation, and even comments. I simply don’t have the time to mess about with these things myself anymore.

So the first step is to migrate all worthwhile postings from my old system to this one. That shouldn’t be too hard, as I haven’t posted much recently.