Algorithm for computing all "prefixes" of an expression a1#a2#...#an, where # is some associative binary operation (e.g. addition, multiplication, AND, OR, XOR, or something more exotic, such as the weird monoid used in the carry lookahead adder). That is, parallel prefix is a way of calculating

a1
a1#a2
a1#a2#a3
...
a1#a2#a3#...#an
You can calculate them all in linear time; the parallel algorithm uses more computational resources (does more work), but takes only logarithmic time.

Ω(n) processors are needed to do this. For convenience, I'll assume n=2k. First, calculate all these pairs:

a1#a2
a3#a4
...
an-1#an
(n/2 independent operations, time O(1)).

Now combine every adjacent pair of results, to get n/4 "quartets" (n/4 independent operations, time O(1)). Continue in this manner, so as to have 2n-k results of combining 2k elements. Total for this stage: time O(k)=O(log n).

Create all the prefixes by combining the largest possible blocks, using the binary representation as a guide which blocks to combine. For instance, 1310=11012, so we compute

(a1#...#a8)#(a9#...#a12)#(a13)
where all the bracketed expressions have already been calculated. These are n independent operations, each taking time O(k)=O(log n) (the number of binary digits in a number 1,...,n).

We see that the algorithm finishes in time O(log n), and computes all needed prefixes.

Of course, a similar result holds in circuits, since no logic (beyond that for computing #) is required. Just hook up each step. It uses O(n log n) gates and runs in time O(log n).

Y'know, if you log in, you can write something here, or contact authors directly on the site. Create a New User if you don't already have an account.