Parallel Computing: Amdahl's Law

I was following an e-learning course on Parallel Programming with Java on LinkedIn, and it presented Amdahl’s Law to describe the potential overall speed-up when using parallel code.
If you are like me, you have probably tried to tune the number of threads in your multi-threaded programs. Often the speed gain grows sub-linearly with the number of processing cores: with too many threads, they fight each other for resources; with too few, you under-utilize them. Either way, the speed-up ratio will rarely be close to the number of cores.
What I like about Amdahl’s law is that it provides a simple approximation of the speed gain. But remember that this theoretical formula is accurate only if the model’s assumptions hold and its input variables are measured correctly.
Amdahl’s Law
Amdahl’s law states that the overall speed-up ratio $s$ when using $n$ processing cores is: \[ s = \frac{1}{(1-p) + \frac{p}{n}} \] where $p$ is the proportion of the program that can be run in parallel, using all $n$ cores.
If you look at the denominator, the formula computes a simple weighted ratio of the speed-up. I think the limit of $s$ as $n \rightarrow \infty$ is more obvious if we re-formulate the equation using $q = 1 - p$, that is, the proportion of code that cannot be parallelized. Assuming $q > 0$: \[ \lim_{n \rightarrow \infty} s = \lim_{n \rightarrow \infty} \frac{1}{q + \frac{1-q}{n}} = \frac{1}{q}. \] The result is pretty obvious: even if you have a billion CPU cores, the speed-up will be limited by the non-parallelized section of your program.
For example, if:
- $q=50\%$ of your code is not multi-threaded, your speed-up is at most 2x.
- $q=20\%$ of your code is not multi-threaded, your speed-up is at most 5x.
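To see these numbers concretely, here is a minimal Java sketch (the class and variable names are mine, chosen for illustration) that evaluates the speed-up formula for a few values of $q$ and $n$:

```java
public class AmdahlDemo {
    // s(n) = 1 / (q + (1 - q) / n), where q is the serial fraction of the code.
    static double speedup(double q, int n) {
        return 1.0 / (q + (1.0 - q) / n);
    }

    public static void main(String[] args) {
        for (double q : new double[] {0.5, 0.2}) {
            for (int n : new int[] {2, 8, 64, 1_000_000}) {
                System.out.printf("q=%.1f  n=%-9d  s=%.3f%n", q, n, speedup(q, n));
            }
            // The speed-up approaches 1/q no matter how many cores we add.
            System.out.printf("q=%.1f  limit 1/q = %.1f%n%n", q, 1.0 / q);
        }
    }
}
```

With $q = 20\%$, even a million cores cannot push the speed-up meaningfully past 5x.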
When designing your program and algorithms, a minimal $q$ is critical if you aim for code that scales with the number of CPUs. Finding and optimizing the critical paths of your program can help.
In practice, finding your actual values of $p$ (or $q$) and $n$ is not trivial. Even if your CPU has $n$ cores, it doesn’t mean it can run $n$ threads in parallel. See the next section on logical cores.
Physical versus Logical cores
If you have been using Intel CPU chips, then you have probably heard of hyper-threading and of physical and logical cores. Note that when I check the Intel and AMD websites today (May 2020), they advertise these as cores and threads, which are more accurate terms.
In layman’s terms, with hyper-threading enabled, each physical core can run 2 threads in parallel, provided the instructions use different parts of the core. Therefore, the parameter $n$ is likely to be smaller than the number of logical cores (threads) advertised by your CPU.
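In Java, a quick way to see what the runtime reports is `Runtime.availableProcessors()`. Keep in mind it counts logical cores (hardware threads), not physical cores:

```java
public class CoreCount {
    public static void main(String[] args) {
        // availableProcessors() reports logical cores (hardware threads),
        // typically 2x the physical core count with hyper-threading enabled,
        // so treat it as an upper bound on the parameter n in Amdahl's law.
        int logicalCores = Runtime.getRuntime().availableProcessors();
        System.out.println("Logical cores visible to the JVM: " + logicalCores);
    }
}
```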
How many CPU cores do I need?
Let’s re-define $s$ as a function of $n$: \[ s(n) = \frac{1}{q + \frac{1-q}{n}} = \frac{n}{qn + 1 - q}. \]
The first derivative is positive for $q < 1$: \[ \frac{d}{dn}s(n) = \frac{1-q}{(qn-q+1)^2} > 0. \] The good news is that there will always be some gain from adding more cores.
The bad news is that the second derivative is always negative for $0 < q < 1$: \[ \frac{d^2}{dn^2}s(n) = \frac{-2q(1-q)}{(qn-q+1)^3} < 0. \] This means there is a diminishing return as $n$ increases.
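Here is a small sketch illustrating that diminishing return, assuming a serial fraction of $q = 0.2$ (an arbitrary value for illustration): each doubling of cores buys less and less additional speed-up.

```java
public class DiminishingReturns {
    // Same speed-up function as above: s(n) = 1 / (q + (1 - q) / n).
    static double speedup(double q, int n) {
        return 1.0 / (q + (1.0 - q) / n);
    }

    public static void main(String[] args) {
        double q = 0.2; // assumed serial fraction, for illustration only
        for (int n = 1; n <= 32; n *= 2) {
            // Marginal gain of adding one more core; it shrinks as n grows,
            // which is exactly what the negative second derivative predicts.
            double marginal = speedup(q, n + 1) - speedup(q, n);
            System.out.printf("n=%-3d  s(n)=%.3f  gain of one more core=%.4f%n",
                              n, speedup(q, n), marginal);
        }
    }
}
```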
You probably need fewer CPU cores than you think (or hope).
Optimization problem
We assume there is an unlimited number of jobs waiting. By using multi-threading, you can process more jobs and generate more revenue. Assume you generate a revenue of $\alpha$ when $n=1$. Then your revenue when running $n$ cores is $\alpha s(n)$, because you can process proportionally more jobs.
We assume a fixed cost for buying a total of $n$ cores, let’s say $\beta n$, so $\beta$ is the cost per core. We do not consider price depreciation; but if you treat $\alpha$ as revenue per month, then set $\beta$ as the (adjusted) monthly cost per core.
We have a profit maximization problem, where we want the optimal $\bar{n}$: \[ \bar{n} = \arg \max_{n \ge 1} \left( \alpha s(n) - \beta n \right), \] where $n$ is an integer greater than or equal to 1.
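Since $s(n)$ is concave and the cost is linear, the profit is concave in $n$, so a simple integer scan that stops as soon as adding a core no longer improves profit finds $\bar{n}$. Here is a sketch with made-up values of $\alpha$, $\beta$, and $q$:

```java
public class OptimalCores {
    // Speed-up from Amdahl's law: s(n) = 1 / (q + (1 - q) / n).
    static double speedup(double q, int n) {
        return 1.0 / (q + (1.0 - q) / n);
    }

    // Profit of running n cores: alpha * s(n) - beta * n.
    static double profit(double alpha, double beta, double q, int n) {
        return alpha * speedup(q, n) - beta * n;
    }

    public static void main(String[] args) {
        double alpha = 1000.0; // illustrative monthly revenue at n = 1
        double beta = 25.0;    // illustrative monthly cost per core
        double q = 0.1;        // assumed serial fraction

        // Profit is concave in n, so the first n where adding a core stops
        // improving profit is the maximizer over integers n >= 1.
        int n = 1;
        while (profit(alpha, beta, q, n + 1) > profit(alpha, beta, q, n)) {
            n++;
        }
        System.out.printf("Optimal n = %d, profit = %.2f%n",
                          n, profit(alpha, beta, q, n));
    }
}
```

With these (arbitrary) numbers, the scan settles around $n \approx 51$ cores: beyond that point, the marginal speed-up no longer pays for the marginal core.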