Multiplying two 32-bit numbers without using 64-bit variable
---15 Aug 2023---
In this post, we figure out how to multiplying two 32-bit positive numbers by only using 32-bit variables.
Suppose that we are given two positive numbers \(x,y \in [0, 2^{32}-1]\), the goal is to calculate \(z = x \times y\) where \(z \in [0, 2^{64} - 1]\) without using any 64-bit variable (and also, without using conditional expressions).
We use two variables, say \(z_1\) and \(z_0\), to represent \(z\) where \(z_1\) is 32 upper bits and \(z_0\) is 32 lower bits of \(z\), respectively. In other words, \(z = z_1 \times 2^{32} + z_0\). So, our goal now is to find \(z_1\) and \(z_0\) using only calculations on 32-bit variables.
The first and naive idea is to decompose \(x\) and \(y\) into:
\[x = x_1 \times 2^{16} + x_0,\] \[y = y_1 \times 2^{16} + y_0,\]where \(x_0, x_1, y_0, y_1 \in [0, 2^{16}-1]\) are 16-bit numbers. We then have:
\[z = x_1y_1 \times 2^{32} + (x_0y_1 + x_1y_0) \times 2^{16} + x_0y_0\]When we see the factor \(2^{32}\) here, we might say that let’s take \(z_1 = x_1y_1\) and \(z_0 = (x_0y_1 + x_1y_0) \times 2^{16} + x_0y_0\) !?
Be careful! It is a bit more complicated than that. Why? Let’s take a look on \(x_0y_0\). Notice that \(x_0\) and \(y_0\) are 16-bit numbers. So their product is maximum 32 bits. It is similar to \(x_0y_1\) and \(x_1y_0\). Thus, \((x_0y_1 + x_1y_0) \times 2^{16} + x_0y_0\) does not fit 32 bits of \(z_0\).
To deal with this, instead of considering two 32-bit parts \(z_1\) and \(z_0\), we consider \(z\) as four 16-bit parts \(a_3, a_2, a_1, a_0 \in [0, 2^{16}-1]\) such that:
\[z = a_3 \times 2^{48} + a_2 \times 2^{32} + a_1 \times 2^{16} + a_0,\]or
\[z_1 = a_3 \times 2^{16} + a_2,\] \[z_0 = a_1 \times 2^{16} + a_0.\]Now we sequentially find \(a_0\), \(a_1\), \(a_2\) and \(a_3\). Before starting, we need to introduce the following two functions for a 32-bit number \(v\):
- \(\mathsf{UP}(v)\): returns the 16 upper bits of \(v\),
- \(\mathsf{LO}(v)\): returns the 16 lower bits of \(v\).
Firstly, we start with \(a_0\). Since only \(x_0y_0\) involves in the value of \(a_0\), it can be sure that \(a_0\) is the 16 lower bits of \(x_0y_0\). We write:
\[a_0 = \mathsf{LO}(x_0y_0).\]Secondly, we find \(a_1\). We have that \(x_0y_1\), \(x_1y_0\) and \(\mathsf{UP}(x_0y_0)\) involve in the value of \(a_1\). We need to calculate the sum of them and \(a_1\) is the 16 lower bits of the sum. However, we have to be careful with the sum’s calculation, otherwise, it can cause overflow. To have a closer look, we assign the maximum values for \(x_0, x_1, y_0, y_1\). Namely,
\[x_0 = x_1 = y_0 = y_1 = 65535 = (\texttt{FFFF})_{16},\]then
\[x_0y_1 = x_1y_0 = 4294836225 = (\texttt{FFFE0001})_{16}\]We can observe that it is still possible to add \(2 \times \texttt{FFFF}\) (twice of the maximum value of 16 bits) to \(x_0y_1\) or \(x_1y_0\) before it reaches the maximum value of 32 bits. Plus, we cannot calculate \(x_0y_1 + x_1y_0\) directly. So, we do the following steps:
\[a_1 = \mathsf{LO}(x_1y_0 + \mathsf{LO}(x_0y_1 + \mathsf{UP}(x_0y_0)))\]Thirdly, we find \(a_2\). We have that \(x_1y_1\) and all the previous calculations (\(x_0y_1\), \(x_1y_0\) and \(\mathsf{UP}(x_0y_0)\)) involve in the value of \(a_2\) (in case their sums are propagatedly over 16 bits). We pay attention to the previous calculations in \(a_1\) where we only take the lower bits, including \(x_0y_1 + \mathsf{UP}(x_0y_0)\) and \(x_1y_0 + \mathsf{LO}(x_0y_1 + \mathsf{UP}(x_0y_0))\). Now:
\[\begin{align} a_2 = \mathsf{LO} &( \\ &x_1y_1 + \\ &\mathsf{UP}(x_0y_1 + \mathsf{UP}(x_0y_0)) + \\ &\mathsf{UP}(x_1y_0 + \mathsf{LO}(x_0y_1 + \mathsf{UP}(x_0y_0))) \\ &) \end{align}\]Finally, we find \(a_3\). It is similar to \(a_2\), but we take the upper bits at the end:
\[\begin{align} a_3 = \mathsf{UP} &( \\ &x_1y_1 + \\ &\mathsf{UP}(x_0y_1 + \mathsf{UP}(x_0y_0)) + \\ &\mathsf{UP}(x_1y_0 + \mathsf{LO}(x_0y_1 + \mathsf{UP}(x_0y_0))) \\ &) \end{align}\]NOTE: It is not necessary to calculate \(a_2\) and \(a_3\) separately, because:
\[z_1 = a_3 \times 2^{16} + a_2 = x_1y_1 + \mathsf{UP}(x_0y_1 + \mathsf{UP}(x_0y_0)) + \mathsf{UP}(x_1y_0 + \mathsf{LO}(x_0y_1 + \mathsf{UP}(x_0y_0))))\]Voilà. Now we have \(z_1 = a_3 \times 2^{16} + a_2\) and \(z_0 = a_1 \times 2^{16} + a_0.\) Below is the C implementation and here is the link to github: https://github.com/nvietsang/mult32