Monday, July 21, 2008

On Parallel Computing and Programming

When dealing with high performance parallel code on a daily basis, you will find yourself having to think in mostly common terms. These usually distill into some common concepts which I would try to discuss in the next series of articles covering a broad range of topics including concurrency, functional programming, graph theory, and software design. Of course, I will put these in the C++ context and how they map to the work I usually deal with on a daily basis as a C++ programmer.

Generally, you can find that there is a simple matrix of four different parallel programming patterns differentiated by two axes: grains of parallelism and the type of parallelism. There are two 'degrees' of grains when it comes to parallelism: coarse and fine. Then there are two types of parallelism: computational parallelism and data parallelism.

Given these two factors, we get four different quadrants of parallelism:

  • Coarse Grained, Computationally Parallel Programs -- these are programs that have large computation-intensive parts that operate in parallel. An example would be batch processing systems where individual jobs would run concurrently of each other almost independent of each other in coarse grains of 'tasks'.
  • Fine Grained, Computationally Parallel Programs -- these are programs that have small computation-intensive parts that operate in parallel. An example would be graphics rendering programs where short-lived 'tasks' work concurrently usually (but not necessarily) synchronizing on data structures being manipulated.
  • Coarse Grained, Data Parallel Programs -- these are programs that have large data-processing (usually I/O, data filtering, data related routines) parts that operate in parallel. Examples would be data-mining applications that deal mostly with concurrent processing of data rather than actual computations.
  • Fine Grained, Data Parallel Programs -- these are programs that have smaller data-processing parts that operate in parallel, usually (but not necessarily) synchronizing on I/O. One example would be a parallel relational database management system where processing data that comes from many different sources happens concurrently to produce a coherent 'view' of the data.

There are some patterns that arise unique to the quadrant in which your program is located in. I will talk more about these patterns as I write more about them and the patterns I am talking about.

At a more fundamental level however, in any parallel system you will have to delineate regarding the following factors: concurrent computation, data dependencies, and task granularity. These factors all taken together will help you determine how best to approach the problem you are solving, and eventually what kind of program you will be writing.

In this parallel view of things there are a few terms that need to be defined. The first term to be defined is a task which I will hastily define as:

A unit of work which is considered atomic, or executed in a single thread of execution. Usually tasks are modeled as data objects with metadata, manipulated and executed externally.

Using this definition of a task, you can then look at a parallel program as a set or list of tasks which need to be executed -- each task having its own data dependencies (metadata) that gets executed externally by threads of execution. More concretely, these threads of executions can be either real threads (hardware threads), "green" threads (software-driven threads), operating system threads, or remote compute nodes (in the case of Cluster of Workstations or Beowulf clusters).

It's interesting to note that there are arguably infinitely many ways of breaking a problem (or solution) up into tasks, and the challenge really is to find the right recipe to achieve both speedup and efficiency. I will also write more about the difference between just speeding things up, and making your programs more efficient. Effectively, we would also like to consider how the program would scale as the problem sizes increase.

In the next few posts I will discuss ways of breaking up a problem and eventually the solution(s) down into tasks that get processed in parallel. These will involve brute force "pen-and-paper" approaches as well as analytical graph theoretic approaches. I will also present them in the context of C++, maybe highlighting the use of the Intel Threading Building Blocks as well as the Boost C++ Library.

This post is meant to be an introductory in a long line of posts related to parallel computing with C++.

If you would like to learn more about turning your application into parallel applications that leverage multi-core technologies, or are interested in hiring a consultant that has experience with parallel computing and C++, send an email to dean@cplusplus-soup.com -- I should be able to help you connect with people who can help, or be available for consulting depending on the situation.

1 comments:

Jon said...

Looking forward to the upcoming articles, keep up the good work!