generating primes with a wheel

Warning: all code is in Racket!

One of the assignments from my CS 145 class was to generate as many twin primes as possible in a specified amount of time. To have received full marks, you needed less than a thousand pairs. My original solution using an optimized trial division got me roughly thirty thousand pairs. But of course, this being the advanced offering of Waterloo’s CS course, some people generated over a million pairs.

While I don’t want to dedicate so much time to constant optimization, I’m still interested in the algorithms behind how they achieved such values. So let’s investigate.

For this article, and the sake of simplicity, I will restrict the problem to just generating a list of primes (not twin pairs) up until some integer limit. The solution boils down to checking the primality of all numbers up until some N, and appending to a list if it is prime. I’ll discuss four different approaches for doing so, in the order of decreasing time complexity, and also in the order of my thought process working through this problem. These are trial division, optimized trial division, Miller-Rabin primality testing, and the prime wheel (Why no sieving?).

trial division

This algorithm is probably the most naive, but intuitive solution. A number ${p}$ is prime if and only if it only has two factors, being one and itself. Thus, an integer ${n > 1}$ is not prime if it has a factor ${1 < a < n}$ . So it suffices to check every integer ${1 < a < n}$ , see if it divides ${n}$ , and conclude accordingly. Each check is ${O(n)}$ . Here’s what this looks like in Racket:

(define (prime? n)
  (define (loop i)
    (cond [(= i n) true]
          [(zero? (remainder n i)) false]
          [else (loop (add1 i))]))
  (loop 2))

(define (get-primes limit)
  (define (helper i acc)
    (cond [(> i limit) acc]
          [(prime? i) (helper (add1 i) (cons i acc))]
          [else (helper (add1 i) acc)]))
  (helper 2 empty))

Here are some benchmarks, note I’m deferring the output with empty?:

> (time (empty? (get-primes 1000)))
cpu time: 2 real time: 2 gc time: 0

> (time (empty? (get-primes 10000)))
cpu time: 281 real time: 258 gc time: 134

> (time (empty? (get-primes 100000)))
cpu time: 14877 real time: 15424 gc time: 4370

optimized trial division

An immediate observation that can be made is to only check factors ${1 < a \le \sqrt{n}}$ . The reasoning behind this is explained in the Wikipedia article so I won’t explain it again here. Each check is ${O(\sqrt{n})}$ . In Racket, we only need to change the (prime? n) definition.

(define (prime? n)
  (define (loop i)
    (cond [(> (* i i) n) true]
          [(zero? (remainder n i)) false]
          [else (loop (add1 i))]))
  (loop 2))

Here are the benchmarks:

> (time (!! (empty? (get-primes 1000))))
cpu time: 5 real time: 5 gc time: 0

> (time (!! (empty? (get-primes 10000))))
cpu time: 68 real time: 71 gc time: 4

> (time (!! (empty? (get-primes 100000))))
cpu time: 1178 real time: 1350 gc time: 40

> (time (!! (empty? (get-primes 1000000))))
cpu time: 29286 real time: 32928 gc time: 1142

miller-rabin primality test

I first stumbled across the Miller-Rabin test back in high school while bashing out competitive programming problems. This is a probabilistic primality test, meaning it will output whether or not a number is likely to be prime.

It’s based on the idea that if an odd integer ${n > 2}$ is prime, then certain equations involving ${n}$ should hold. Let’s rewrite ${n - 1}$ as ${2^s \cdot d}$ , where ${s}$ is a positive integer and ${d}$ is odd. Choose some base ${a}$ , and the two equations are:

${a^d \equiv 1 \pmod{n}}$
${a^{2^r \cdot d} \equiv -1 \pmod{n}}$ for some ${0 \le r < s}$

We can conclude ${n}$ is composite if neither of the equations holds. Otherwise, ${n}$ is probably prime. To increase the probability of primality, we can test with multiple values of ${a}$ . In total, if we test with ${k}$ values of ${a}$ , the time complexity is ${O(k \log^3 n)}$ .

An observation to be made is that the primes we generate will probably never go over around four million, and thus we can test with a few hardcoded values for ${a}$ .

Here is the code in Racket:

(define (qpow base exp mod)
  (define (helper a exp acc)
    (cond
      [(zero? exp) acc]
      [(even? exp) (helper (modulo (* a a) mod)
                           (quotient exp 2) acc)]
      [else (helper (modulo (* a a) mod)
                    (quotient exp 2)
                    (modulo (* acc a) mod))]))
  (helper base exp 1))

(define (check-composite? n base d s)
  (let ([x (qpow base d n)]
        [sub1n (sub1 n)])
    (cond [(= x 1) false]
          [(= x sub1n) false]
          [else
           (define (loop i last_x)
             (cond [(= i s) true]
                   [(= last_x sub1n) false]
                   [else (loop (add1 i) (remainder (* last_x last_x) n))]))
           (loop 1 (remainder (* x x) n))])))

(define bases (list 2 3 5 7 11))
(define (prime? n)  ; assume n is odd
  (define-struct pair (first second))
  (define (get-rd r d)
    (if (odd? d)
        (make-pair r d)
        (get-rd (add1 r) (quotient d 2))))
  (let* ([rd (get-rd 1 (sub1 n))]
         [r (pair-first rd)]
         [d (pair-second rd)])
    (define (loop bases)
      (cond [(empty? bases) true]
            [(= n (car bases)) true]
            [(check-composite? n (car bases) d r) false]
            [else (loop (cdr bases))]))
    (loop bases)))

Here are the benchmarks:

> (time (!! (empty? (get-primes2 1000))))
cpu time: 14 real time: 15 gc time: 1

> (time (!! (empty? (get-primes2 10000))))
cpu time: 137 real time: 148 gc time: 6

> (time (!! (empty? (get-primes2 100000))))
cpu time: 1393 real time: 1566 gc time: 45

> (time (!! (empty? (get-primes2 1000000))))
cpu time: 17087 real time: 19072 gc time: 1937

It’s interesting to notice that this algorithm isn’t necessarily faster than the optimized trial division, at least for smaller numbers.

prime wheel

Something you might have noticed in common with all the previous algorithms is that we are checking every even number for primality. Even numbers are divisible by two and thus aren’t prime. So we can advance our loop function by two each time, skipping all even numbers. We’ve effectively reduced the number of checks we need to do by half. Taking this a step further, why not skip all multiples of three as those are not prime either, but are pretty common. And one final step let’s skip all multiples of five. This significantly reduces the amount of numbers to check. Let’s see what this looks like on the integers 1 through 30.

1  2  3  4  5  6  7  8  9  10
11 12 13 14 15 16 17 18 19 20
21 22 23 24 25 26 27 28 29 30
---------- becomes ----------
1                 7
11    13          17    19
      23                29

Let’s call the remaining numbers our “candidates”. The commonality among all candidates is that none of them are divisible by 2, 3, or 5, and thus all have higher chances of being prime. So for integers 1 through 30, we now only need to run the primality test eight times. We only need to do ~26% of the work as before. To check the next 30 integers, that is, 31 through 60, we add 30 to each of the candidates. This gives

31, 37, 41, 43, 47, 49, 53, 59

None of these integers are divisible by any of 2, 3, or 5. Notice that 30 is the product of 2, 3, and 5. It can be shown that ${30k + c}$ for some non-negative integer ${k}$ and candidate ${c}$ is never divisible by 2, 3, or 5. With that, we’ve built our prime wheel of 2, 3, and 5. It’s possible to extend the wheel to more primes but at a certain point, we will suffer from diminishing returns.

Note: this algorithm is very similar to sieving, but works better for infinite sequences.

To convert this to Racket, we first need a function to build the prime wheel or the list of candidates.

(define (coprime? a b) (= (gcd a b) 1))

(define (build-wheel primes)
  (define product (apply * primes))
  (define (helper acc i)
    (cond [(> i product) acc]
          [(andmap (λ (p) (coprime? p i)) primes)
                   (helper (cons i acc) (add1 i))]
          [else (helper acc (add1 i))]))
  (reverse (helper empty 1)))

;; (build-wheel (list 2 3 5))
;; '(1 7 11 13 17 19 23 29)

And now the function to generate the (infinite) list of primes. (Disclaimer: I couldn’t find a way to make this work elegantly outside of lazy Racket.)

#lang lazy

(define (prime-wheel primes)
  (define wheel (build-wheel primes))
  (define product (apply * primes))

  (define (wheel-sequence candidates offset)
    (cond
     [(empty? candidates)
      (let ([new-candidates (map (λ (p) (+ offset p)) wheel)])
        (wheel-sequence new-candidates (+ offset product)))]
     [else
      (let ([current (car candidates)])
        (if (prime? current)  ; same prime? as defined before
            (cons current
                  (wheel-sequence (cdr candidates) offset))
            (wheel-sequence (cdr candidates) offset)))]))

  (append primes (wheel-sequence (cdr wheel) product)))

(define primes-seq (prime-wheel (list 2 3 5 7)))

;; (!! (take 15 primes-seq))
;; '(2 3 5 7 11 13 17 19 23 29 31 37 41 43 47)

With this, we can make define get-primes.

(define (get-primes limit)
  (define (helper primes acc)
    (if (> (car primes) limit)
        acc
        (helper (cdr primes)
                (cons (car primes) acc))))
  (helper prime-seq empty))

Finally, some benchmarks:

> (time (!! (empty? (get-primes 1000))))
cpu time: 11 real time: 13 gc time: 1

> (time (!! (empty? (get-primes 10000))))
cpu time: 111 real time: 105 gc time: 47

> (time (!! (empty? (get-primes 100000))))
cpu time: 667 real time: 726 gc time: 55

> (time (!! (empty? (get-primes 1000000))))
cpu time: 6352 real time: 7415 gc time: 701

This is quite an improvement over Miller-Rabin.

what about sieving?

As far as I know, the Sieve of Eratosthenes algorithm only works on a finite array of numbers and works better with random access data structures. I couldn’t figure out an efficient way to make it work for Racket, or for this problem where we needed an infinite sequence.