Joe Cincotta: Thoughts and such…


Nerdism for the masses.

A Practical Example of Multi-Threading

I will briefly describe how we improved performance using multithreading below, but first, please understand that I was actually pleased that DevX is willing to run the ‘Gauntlet’ series on Intel's home turf (their dev center on DevX). The subterfuge comment was really about the general theme of Intel advertising aimed at developers lately, not this specific series. Please keep bringing it on!

One thing you should realize, though, is that I honestly don’t think huge numbers of organizations are looking to optimize their code for multi-core architectures just yet. The big selling point for many of them right now is the immediate ROI in power savings when moving to multi-core, both in computational efficiency per watt and in reduced cooling and associated infrastructure costs. It’s real.

OK, iVisual performance improvements. Here is a very brief overview of how we segment the system from a data-model perspective, which helps explain how we break down the threading model.

1: We have Customers
2: Customers have multiple Locations
3: Locations have multiple Displays

The thing which takes the time is the render pipeline, and it is the focus of this examination. There are three stages to the render pipeline:
1: The first stage downloads XML feeds from 3rd party data sources and homogenizes the data into a database
2: The second stage composites the data onto the customer-location templates and downloads third party images referred to in the data feeds
3: The third stage packages each location into a compressed package for distribution

When we started with a small-scale proof of concept, the whole thing was sequential, and as it scaled it took hours. Multithreading was the answer, but it needed a clear strategy, and this is how we broke it down:

First we looked at the stages of the render pipeline and examined the code for immediate optimizations (not necessarily multithreading, just common-sense performance improvements). Now, I don’t have a heavyweight profiler, but I was still able to analyze my code without fancy tools. The answer is in two things. First, use Windows perfmon to watch the relationships between CPU, disk, network and database while running strenuous code. Second, use Log4Net (the Apache open source logging project) and take a DateTime.Now before and after a block of code you suspect of being ‘hot’; logging the difference gives you easy performance metrics to guide your optimization.
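The before-and-after timestamp trick can be sketched roughly as follows. This is not the original .NET/Log4Net code; it is a minimal Python sketch using the standard `logging` module, and the names `timed` and `homogenize_feed` are hypothetical stand-ins for illustration.

```python
import logging
import time
from contextlib import contextmanager

logger = logging.getLogger("render.timing")


@contextmanager
def timed(label, log=logger):
    """Log how long the wrapped block took, even if it raises."""
    start = time.perf_counter()
    try:
        yield
    finally:
        elapsed_ms = (time.perf_counter() - start) * 1000.0
        log.info("%s took %.1f ms", label, elapsed_ms)


def homogenize_feed(rows):
    # Hypothetical stand-in for a block of code suspected of being 'hot'.
    return sorted(set(rows))


if __name__ == "__main__":
    logging.basicConfig(level=logging.INFO)
    with timed("homogenize_feed"):
        homogenize_feed(range(100_000))
```

Wrapping only the suspect block keeps the log readable, and comparing the logged times before and after a change shows whether the change actually helped.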

Once the optimization was done, we looked at where multithreading could help. The first download stage was really quite linear, and downloading a large chunk of data was what took the most time, so we left it alone.

The second stage clearly could do with multithreading, but how should we split it out? First, we changed the way images get downloaded. We started caching them as we parsed each template, but we used a common directory for storing files. The image caching engine would download an image when it didn’t have it, so it would block the caller until the image arrived. The ad-hoc nature of the process meant that we really needed to rearchitect the whole approach.

If we sequentially pre-cached every image before rendering, we would know we had every image, but because the web calls to retrieve images block, it would still take forever. So we put the process of downloading an image out to a thread. But don’t just loop through every image and hand each download to a new thread, as this will saturate your network, CPU and OS, and nobody will want to play nice with you anymore. We use an ‘artificial’ thread pool: our own round-robin pool of threads where we control how many are running at once. We also needed some fault tolerance in case of thread collisions, where multiple locations for the same customer use the same image and we happen to try to download it at the same time. It’s highly improbable, but as an aside, this is a good example of the sheer complexity which multithreading adds to system design.
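To make the idea concrete, here is a minimal Python sketch of a capped download pool that also guards against two callers asking for the same image. It is not our round-robin pool: it swaps in the standard library `ThreadPoolExecutor` to cap concurrency, and `ImageCache`, `fetch` and `max_workers` are hypothetical names. The `fetch` callable is assumed to take a URL and return the image bytes.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class ImageCache:
    """Fetch each URL at most once, with a capped number of concurrent fetches."""

    def __init__(self, fetch, max_workers=10):
        self._fetch = fetch  # assumed callable: url -> bytes
        self._pool = ThreadPoolExecutor(max_workers=max_workers)
        self._lock = threading.Lock()
        self._futures = {}  # url -> Future, shared by duplicate requests

    def request(self, url):
        """Return a Future for the image; duplicate requests share one Future."""
        with self._lock:
            future = self._futures.get(url)
            if future is None:
                future = self._pool.submit(self._fetch, url)
                self._futures[url] = future
            return future

    def precache(self, urls):
        """Block until every URL is fetched; returns {url: bytes}."""
        futures = {url: self.request(url) for url in urls}
        return {url: future.result() for url, future in futures.items()}

    def close(self):
        self._pool.shutdown(wait=True)


if __name__ == "__main__":
    def fake_fetch(url):
        # Hypothetical fetcher; a real one would make a web request.
        return url.encode()

    cache = ImageCache(fake_fetch, max_workers=4)
    images = cache.precache(["a.png", "b.png", "a.png"])
    cache.close()
    print(sorted(images))
```

The lock only protects the bookkeeping dictionary, so it is held very briefly; the slow network call runs outside it. In this sketch a failed fetch stays cached as a failed Future, so a real version would want a retry or eviction policy.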

We also made the size of the thread pool an application configuration parameter, which means we can tweak the performance of the system without recompiling.
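As a rough illustration of a configurable pool size, here is a small Python sketch using the standard `configparser` module rather than a .NET application config file. The section name `render`, the key `image_pool_size` and the fallback value are all assumptions made for the example, not our real settings.

```python
import configparser

DEFAULT_POOL_SIZE = 4  # assumed fallback, not a figure from the real system


def load_pool_size(config_text, section="render", key="image_pool_size"):
    """Read the pool size from INI-style text, falling back to a default."""
    parser = configparser.ConfigParser()
    parser.read_string(config_text)
    size = parser.getint(section, key, fallback=DEFAULT_POOL_SIZE)
    return max(1, size)  # never allow a pool with zero workers


if __name__ == "__main__":
    sample = "[render]\nimage_pool_size = 10\n"
    print(load_pool_size(sample))
```

Clamping to at least one worker means a bad edit to the config file slows the system down instead of stalling it.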

If you’re really clever you will see how we chose to handle our logical breakdown of the system to suit multithreading. We used Locations as the target for the main threading model – so instead of looping through a list of locations and calling Location(i).Render(); we create a group of worker classes which contain references to each Location object and when run will Render() their location. Again, we use the artificial thread-pool design to manage how many locations are being rendered concurrently. Because a location deals with its own content in its own directory structure, we didn’t have to worry about any thread contention or race issues with rendering.
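The worker-per-location idea looks roughly like this in Python. Again, this is a sketch and not the original code: `Location`, `RenderWorker` and `render_all` are hypothetical names, the render step just writes a placeholder file, and the standard library `ThreadPoolExecutor` stands in for our own pool.

```python
import os
import tempfile
from concurrent.futures import ThreadPoolExecutor


class Location:
    """Hypothetical location that renders into its own directory."""

    def __init__(self, name, output_root):
        self.name = name
        self.output_dir = os.path.join(output_root, name)

    def render(self):
        # Each location writes only inside its own directory, so workers
        # never touch each other's files.
        os.makedirs(self.output_dir, exist_ok=True)
        path = os.path.join(self.output_dir, "index.html")
        with open(path, "w", encoding="utf-8") as handle:
            handle.write(f"<h1>{self.name}</h1>")
        return path


class RenderWorker:
    """Holds a reference to one Location and renders it when run."""

    def __init__(self, location):
        self.location = location

    def run(self):
        return self.location.render()


def render_all(locations, max_concurrent=4):
    """Render every location, with at most max_concurrent running at once."""
    workers = [RenderWorker(location) for location in locations]
    with ThreadPoolExecutor(max_workers=max_concurrent) as pool:
        return list(pool.map(lambda worker: worker.run(), workers))


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        stores = [Location(f"store-{i}", root) for i in range(8)]
        print(len(render_all(stores, max_concurrent=3)))
```

Because no worker shares mutable state with another, there is no locking anywhere in the render path, which is the payoff of picking Locations as the unit of work.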

Finally, the CPU intensive process of compression was also moved to a thread-pool model and again it improved performance dramatically.

The change to multiple thread pools meant we needed an orchestrator which managed the separate thread pools, ensured that only one was running at once, and ensured that they could not be invoked multiple times (through the web administration interface). We ended up with a ‘Render Pipeline’ which was the only way to invoke these three elements, for added safety.
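One way to get that "only one run at a time" guarantee is a non-blocking lock around the whole pipeline. The Python sketch below shows the idea only; the class shape and the three stage callables are assumptions for illustration, not our actual orchestrator.

```python
import threading


class RenderPipeline:
    """Run the three stages in order and refuse overlapping invocations."""

    def __init__(self, download, render, package):
        # Each stage is assumed to be a no-argument callable that manages
        # its own thread pool internally and returns when it is finished.
        self._stages = [
            ("download", download),
            ("render", render),
            ("package", package),
        ]
        self._running = threading.Lock()

    def run(self):
        """Run all stages in order; return False if a run is already active."""
        if not self._running.acquire(blocking=False):
            return False  # e.g. a second click in the admin interface
        try:
            for _name, stage in self._stages:
                stage()  # stages run one after another, never together
            return True
        finally:
            self._running.release()


if __name__ == "__main__":
    pipeline = RenderPipeline(
        lambda: print("downloading feeds"),
        lambda: print("rendering locations"),
        lambda: print("packaging locations"),
    )
    print(pipeline.run())
```

Returning False rather than waiting means a duplicate request is rejected immediately instead of queuing up a second full render behind the first.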

The really funny thing is that the most CPU intensive part of the process, the compression stage, is the fastest now. We use a quad-Xeon setup with HT and it makes short work of these things.

The place where multithreading really came to shine is in allowing our code to improve the overall load across the system, not just raw CPU improvements. We improved our caching speed by 10 times because our network was able to handle 10 concurrent connections to the content servers for precaching images.

Multithreading also allowed us to push the database server a little more during the rendering process which is both CPU intensive and massively IO intensive.

Bottom line: Multi-threading on Multi-core SMP architectures allows your code to get more than just extra CPU cycles – you can squeeze more power out of your whole system by removing latency and unnecessary waiting.

Hope this helps shine some light on a practical implementation of threading.


Filed under: agile development, Software Development

One Response

  1. Craig Phillips says:

    Joe you are such a brainiac. I never understood anything you were working on at uni and I still don’t. How you have the headspace for all this tech stuff I’ll never know! Maybe one day when we are quiet with work (like that ever happens) we can do a project together purely for fun. You know, animation, 3d stuff etc. I design the pictures, you make them move.
