
Parallel Processing 26 February 2008

Posted by Oliver Mason in programming.

My grammar pattern processing was running at a speed of about 7 words per minute, which amounted to an estimated completion time of about a week for all the 56,000 words in the BNC I am currently looking at. However, as processes on the Blue Bear system can only run for a limited length of time, it would have required occasional restarts (about two a day or so). And due to the way it was set up, a restart would go through all the words again, beginning at ‘a’, only to find that the first x words had already been processed. Clearly not very efficient.
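As a rough check on that estimate, the arithmetic can be sketched in a few lines of Python. The function name `estimated_hours` is made up for illustration; the figures are the ones quoted above.

```python
def estimated_hours(total_words, words_per_minute):
    """Hours needed to process total_words at a constant rate."""
    return total_words / words_per_minute / 60


# Figures from the post: 56,000 words at about 7 words per minute.
sequential_hours = estimated_hours(56_000, 7)
print(f"{sequential_hours:.0f} hours, or {sequential_hours / 24:.1f} days of continuous running")
```

This ignores the time lost to restarts, so the real figure would be somewhat higher.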

Now, this is sequential processing. On a highly parallel system it is a lot faster to run things in parallel, so I changed the setup and split the whole task into a separate run for each letter. Now there are 27 processes churning away at the letters a-z, and the overall throughput is a lot higher, at approximately 180 words per minute. That means the total elapsed time would be a little over 5 hours, which is roughly a factor of 26 faster than before!
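A minimal sketch of this kind of split, in Python, might look as follows. It assumes the split is by the first letter of each word, and it sends anything that does not start with a-z to a catch-all bucket, which is my assumption for the sketch and not a description of the actual setup. `process_bucket` is a hypothetical stand-in for the real pattern processing, and on a cluster each bucket would more likely be a separate batch job than a worker in a local process pool.

```python
from collections import defaultdict
from concurrent.futures import ProcessPoolExecutor
import string


def partition_by_initial(words):
    """Group words by first letter; anything else goes into 'other'."""
    buckets = defaultdict(list)
    for word in words:
        first = word[:1].lower()
        # Empty strings and non a-z initials fall into the catch-all bucket.
        key = first if first and first in string.ascii_lowercase else "other"
        buckets[key].append(word)
    return dict(buckets)


def process_bucket(item):
    """Hypothetical placeholder: the real work per word would go here."""
    letter, words = item
    return letter, len(words)


def run_all(words, max_workers=None):
    """Process every bucket in its own worker process."""
    buckets = partition_by_initial(words)
    with ProcessPoolExecutor(max_workers=max_workers) as pool:
        return dict(pool.map(process_bucket, sorted(buckets.items())))
```

Because each bucket is independent, a run that gets cut off only needs its own letter restarted, not the whole word list. When run as a script, the call to `run_all` should sit under an `if __name__ == "__main__":` guard so that worker processes can be started safely.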

And once this has finished, I can set it running for the remaining features while working with the data that has already been gathered.


