Tommy Ogden

Sieving for Primes

A common task in number theory problems is to find all the primes up to a number $n$. Project Euler is a great resource for problems like this. A simple method of finding primes, the Sieve of Eratosthenes, was known in Ancient Greece. You start with 2 and discard as non-prime all multiples of 2 up to $n$. You then move onto the next number you haven’t discarded, 3, and mark as non-prime all multiples of 3 up to $n$. We’ve discarded 4, so we move on to 5 and continue. We can stop when we get to $\sqrt n$ as we’ll have discarded all the multiples by then.

Here’s what the algorithm looks like for $n = 400$.

When should we use the Sieve? In this notebook I’ll compare it to a naive iterative search by writing example algorithms in Python.

Our first approach is to iteratively build a list of primes up to a limit $n$.

When checking a number $i$ for primality, we only need to check prime factors, so we can check the list as we’re building it. Also, in checking if a number $i$ is prime, we only need to check for factors up to $\sqrt{i}$.

We’ll write a method that appends the next prime to an existing list of primes.

def primes_next(primes):
"""Take an ordered list of prime numbers and append the next

primes: list of ordered primes, with the last prime odd,
e.g. [2,3].

The list primes with the next prime appended,
e.g. [2,3,5].

i = primes[-1]
while True:
i += 2
for p in primes:
if p**2 > i: # No factors found, i prime
return primes
if i%p == 0: # Factor found, try next i

So for example, if we have the list [2, 3, 5] we expect to return [2, 3, 5, 7].

print(primes_next([2, 3, 5]))

Now we repeat this until we get to $n$.

def primes_iterative(n):
""" Build a list of the primes up to n iteratively. """

p = [2,3]
while p[-1] <= n:
p.pop() # Discard the last one.
return p

We’ll compare the methods later by counting the number of primes up to $n$ each returns.

def sol_a(n):
""" Count the number of primes up to n iteratively."""
return len(primes_iterative(n))
print("The number of primes up to 1000 is {0}.".format(sol_a(n=1000)))

Solution B: Sieve of Eratosthenes

For the Sieve of Eratosthenes algorithm, we’ll use some helper functions. First we want a method to sieve out the multiples of a factor $f$.

def sieve_multiples(A, f):
"""Set the primality of multiples of f to False.

A: List of booleans representing primality of each index.
f: Factor to find multiples of.

- A is indexed such that A[0] represents 1, A[1] represents 2, etc.
- Only sieves factors greater than f^2.

for i in range(f**2-1, len(A), f):
A[i] = False

Note that we only need to sieve for factors greater than $f^2$, as the smaller multiples will already have been discarded.

A = [True]*10
sieve_multiples(A, 2)
sieve_multiples(A, 3)
sieve_multiples(A, 5)

Next a couple of simple helper methods. One to get us the next True value in a boolean list.

def next_factor(A, start=0):
"""Returns the next True index in A, after start."""
return next((i for i, a in enumerate(A[start:], start=start) if a), None)

Another to return all the indexes of all remaining True values, shifted by 1. Once we’ve performed the sieve, this will represent the reminaing primes.

def remaining(A):
"""Returns the indexes of all remaining True values, shifted by 1."""
return [i+1 for i, a in enumerate(A) if a]

Now we’re ready to perform the sieve.

def primes_eratosthenes(n):
""" Build a list of the primes below n by the Sieve of Eratosthenes. """

A = [True]*n
A[0] = False # 1 is not prime

primes = []
p = 1 # 2 is prime

while p**2 < n: # Only need to check up to sqrt(n)
A[p] = False
sieve_multiples(A, p+1)
p = next_factor(A, p)

primes.extend(remaining(A)) # All remaining must be prime.

return primes

Again we’ll write a count method to compare.

def sol_b(n):
""" Count the number of primes up to n by the Sieve of Eratosthenes. """
return len(primes_eratosthenes(n))
print("The number of primes up to 1000 is {0}.".format(sol_b(n=1000)))

Compare Counts

To check the result of the methods against each other, we’ll try a few values of $n$. The methods could both be wrong of course.

N = [5, 10, 17, 20, 50, 100, 1000, 10000, 100000, 1000000]

for i, n in enumerate(N):
assert(sol_a(n) == sol_b(n))

Complexity Analysis

Finally, we’ll compare timings by solving the problem over a range of $n$ values.

N = [10, 20, 50, 100, 1000, 10000, 100000, 1000000, 10000000]

sol_times_a = [0.0]*len(N)
sol_times_b = [0.0]*len(N)

for i, n in enumerate(N):
result = %timeit -o sol_a(n)
sol_times_a[i] =

result = %timeit -o sol_b(n)
sol_times_b[i] =
%matplotlib inline

import matplotlib.pyplot as plt
import seaborn as sns

plt.figure(figsize=(10, 5))
plt.loglog(N, sol_times_a, label='A: Iterative', marker='o', clip_on=False)
plt.loglog(N, sol_times_b, label='B: Eratosthenes', marker='o', clip_on=False)
plt.ylabel('time (s)');
Fig. 1: Best-of-three timings.

In figure 1 we compare the best-of-three timings on my laptop of the two methods over a range of $n$ values. We see that the iterative search is quicker for $n = 10$, but for $n = 20$ and beyond, the Eratosthenes method is quicker. Over this range, both are polynomial below quadratic.

speedup = [sol_times_a[i]/sol_times_b[i] for i, x in enumerate(sol_times_b)]

plt.figure(figsize=(10, 5))
plt.semilogx(N, speedup, marker='o', clip_on=False)
plt.axhline(1, c='grey', ls='--', lw=1)
plt.ylabel('Eratosthenes Speedup');
Fig. 2: Speedup of the Eratosthenes method.

In figure 2 we show the speedup of the Eratosthenes method over the iterative search. The speedup peaks at about 7$\times$ faster at $n = 10^4$. Beyond this Eratosthenes method is still significantly faster, but the speedup is coming down. This is likely due to the fact that the Sieve of Eratosthenes for high $n$ becomes memory intensive, as we need to store all of the numbers up to $n$, whereas with the more computationally intensive iterative search, we only need to store the primes we’ve already calculated.

The runtimes at $n = 10^7$ are in the order of minutes and already long enough that we’d want to look at further tuning of the algorithm beyond this.