Krishna Agaram

The prime number theorem, for dummies

A discussion with some friends on the LCM of the first $n$ natural numbers led to a simple, very elementary proof of a relaxation of the prime number theorem. I had to write it up. Of course, since it is so elementary, it is likely not new.

The prime number theorem states that the number of primes less than $n$, denoted $\pi(n)$, is asymptotically $\frac{n}{\log n}$. Formally,

\[\lim_{n \to \infty} \frac{\pi(n)}{n/\log n} = 1.\]

The relaxation of the prime number theorem that we will show is that $\pi(n) = \Theta(\frac{n}{\log n})$. In other words, it grows as $n/\log n$ up to constant factors: there exist constants $c_1, c_2 > 0$ such that for sufficiently large $n$, we have

\[c_1 \frac{n}{\log n} \leq \pi(n) \leq c_2 \frac{n}{\log n}.\]

This proof of the upper bound is folklore. The idea is to examine the quantity ${2n \choose n}$. Notice that every prime $p$ such that $n < p < 2n$ divides ${2n \choose n}$, since $p$ appears in the numerator but cannot appear in the denominator. Therefore, we have

\[\prod_{n < p < 2n} p \leq {2n \choose n} \leq 4^n,\]

where the upper bound follows from the binomial theorem: ${2n \choose n} \leq (1+1)^{2n}$. For now, we can use the crude bound that each prime between $n$ and $2n$ is at least $n$. Since the number of terms in the product is $\pi(2n) - \pi(n)$, we obtain

\[n^{\pi(2n) - \pi(n)} \leq \prod_{n < p < 2n} p \leq 4^n \implies \pi(2n) - \pi(n) \leq \frac{n \log 4}{\log n}.\]

Repeating this argument for $n/2$, $n/4$, and so on and adding the resulting inequalities (the left hand sides telescope) will give us the upper bound $\pi(n) \leq c n/\log n$ for some constant $c$. The details need a bit of care and may look somewhat arbitrary at first, but it is standard fare in analysis and will be second-nature after performing some such arguments. Here is the essential idea. We have, via the telescoping, that

\[\pi(2n) \leq \pi(n) + \frac{n \log 4}{\log n} \leq \pi(n/2) + \frac{n \log 4}{\log (n/2)} + \frac{n \log 4}{\log n} \leq n\log 4\sum_{k = 0}^{\log_2 n - 1} \frac{1}{2^k \log (n/2^k)}.\]

The $1/2^k$ factors are easy to bound via the geometric series. The $\log (n/2^k)$ factors are a bit more annoying, but we can get away cleverly: notice that for smaller $k$ with $n/2^k > \sqrt n$, we have $\log (n/2^k) \geq \log \sqrt n = \frac12 \log n$. And we shall deal with larger $k$ by simply not telescoping them. Let $\widetilde{\sqrt{n}}$ denote the largest number of the form $n/2^k$ for some $k$ that is at most $\sqrt{n}$, i.e., $k = \lceil \log_2 \sqrt{n} \rceil$. Then we telescope down to $\widetilde{\sqrt{n}}$ and get

\[\pi(2n) \leq \pi(\widetilde{\sqrt{n}}) + n\log 4\sum_{k = 0}^{\lceil \log_2 \sqrt{n} \rceil - 1} \frac{1}{2^k \log (n/2^k)}.\]

The first term we bound in trivial fashion: there are at most $\widetilde{\sqrt{n}}$ numbers less than $\widetilde{\sqrt{n}}$, and at most all of them are prime (we can do much better here: for example, at most half are prime) so $\pi(\widetilde{\sqrt{n}}) \leq \widetilde{\sqrt{n}} \leq \sqrt{2n}$. For the second term, as discussed, we have $\log (n/2^k) \geq \log \sqrt{n}$ for all $k$ in the range of the sum (since we chose $\widetilde{\sqrt{n}}$ to be the first time this inequalty is violated). The second term is therefore bounded like so:

\[n\log 4\sum_{k = 0}^{\lceil \log_2 \sqrt{n} \rceil - 1} \frac{1}{2^k \log (n/2^k)} \leq n\log 4\sum_{k = 0}^{\lceil \log_2 \sqrt{n} \rceil - 1} \frac{1}{2^k \log \sqrt{n}} = n\log 4\frac{2}{\log n}\sum_{k \geq 0} \frac{1}{2^k} = \frac{4n\log 4}{\log n}.\]

Putting the two together gives us $\pi(2n) \leq \sqrt{n} + \frac{4n\log 4}{\log n}$. For large enough $n$, $\sqrt{n}$ is at most $0.001 n/\log n$ and $\log (2n) \leq \frac{\log n}{0.999}$, so we get \(\pi(2n) \leq 0.001 n/\log n + 4n\log 4/\log (n) \leq 2.8 \frac{2n}{\log (2n)},\)

where we have used the fact that $\log 4 \leq 1.39$. It follows, upon replacing $2n$ by $n$, that $\pi(n) \leq 2.8 n/\log n$ for sufficiently large $n$.

A couple of remarks. Firstly, it seems as though we have been assuming that $n$ is a power of $2$ for the sake of telescoping; in general, we take the ceiling at each step. We can be off by at most one in each step of the telescoping, which only adds a $\log n$ factor to the final bound, which is negligible compared to the $n/\log n$ term anyways and does not affect the asymptotics. Secondly, the constant factor of $2.8$ can be considerably improved by replacing $\sqrt{n}$ with $n^c$ for $c$ close to $1$ with no change to the rest of the argument, except replacing $2.8$ by $1.4/c$ instead. This yields the bound $\pi(n) \leq 1.41 n/\log n$ for sufficiently large $n$.

Next, we show the lower bound. To do this, we define $U(n) = \mathsf{lcm}(1, 2, \ldots, n)$. Note that $U(n)$ clearly has lots to do with the primes and their powers that are smaller than $n$. In fact, it is easy to see that the largest power of a prime $p$ that divides $U(n)$ is $p^{\lfloor \log_p n \rfloor}$, since $p^k$ divides $U(n)$ if and only if $p^k \leq n$. Therefore, we can write \(U(n) = \prod_{p \leq n} p^{\lfloor \log_p n \rfloor},\)

where the product is taken over all primes $p$ less than or equal to $n$. We use the obvious bound $\lfloor x \rfloor \leq x$ to get

\[U(n) \leq \prod_{p \leq n} p^{\log_p n} = \prod_{p \leq n} n = n^{\pi(n)},\]

where $\pi(n)$ is as before the number of primes less than or equal to $n$. So if we can show a lower bound on $U(n)$, we automatically get a lower bound on $\pi(n)$. Informally, if $U(n)$ is large, then $\pi(n)$ must be large as well.

To show a lower bound on $U(n)$, what can we exploit? The basic definition of $U(n)$ is that for each $1 \leq i \leq n$, $i \mid U(n)$. An ingenious way to put this together is to say that for any sequence of integers $a_1, a_2, \ldots, a_n$, the value

\[\sum_{i=1}^n a_i \frac{U(n)}{i}\]

is an integer; this is equivalent to saying that for every polynomial $P_n(x)$ of degree at most $n-1$ with integer coefficients, the quantity $U(n) \int_0^1 P_n(x)\mathsf{d}x$ is an integer. Suppose that $n$ is odd, and let $P_n(x) = x^{\frac{n-1}{2}}(1-x)^{\frac{n-1}{2}}$. Since $x(1-x) \leq \frac14$ for $x \in [0, 1]$, we have

\[1 \leq U(n) \int_0^1 P_n(x)\mathsf{d}x \leq \frac{U(n)}{2^{n-1}} \implies U(n) \geq 2^{n-1}.\]

And for $n$ even, we pick $P_n(x) = x^{\frac{n}{2}-1}(1-x)^{\frac{n}{2}-1}$ (note that we have to keep the degree of $P_n$ at most $n-1$). For each $x \in [0, 1]$, we have

\[x^{\frac{n}{2}}(1-x)^{\frac{n}{2} - 1} \leq x 2^{-(n-2)},\]

so \(1 \leq U(n) \int_0^1 P_n(x)\mathsf{d}x \leq \frac{U(n)}{2^{n-2}} \int_0^1 x\mathsf{d}x \implies U(n) \geq 2^{n-1}.\)

Putting the two cases together, we have $U(n) \geq 2^{n-1}$ for all $n$. Now we can finish the lower bound on $\pi(n)$:

\[2^{n-1} \leq U(n) \leq n^{\pi(n)} \iff \pi(n) \geq \log 2\frac{n-1}{\log n} \geq 0.692\frac{n}{\log n},\]

where we can drop the $-1$ in the numerator for sufficiently large $n$ at the cost of a small drop in the constant factor since $(n)/(n-1) \to 1$.


To summarize, we have shown that for sufficiently large $n$, we have

\[0.692\frac{n}{\log n} \leq \pi(n) \leq 1.41 \frac{n}{\log n}.\]

The prime number theorem brings the lower and upper bounds to $(1-\varepsilon)$ and $(1+\varepsilon)$ respectively for any $\varepsilon > 0$ as $n \to \infty$. The proof of the prime number theorem is much more involved and relies on complex analysis. There is, however, one interesting point to make. The functions we used in the upper bound and lower bound proofs are closely related to the Chebyshev functions (so we have essentially been bounding them), which are key to the actual proof of the prime number theorem. The Chebyshev functions are defined as follows:

\[\begin{align*} \theta(n) &= \sum_{p \leq n} \log p, \\ \psi(n) &= \sum_{p^k \leq n} \log p. \end{align*}\]

Essentially, they are smoothed-out versions of $\pi(n)$ that are easier to analyze. The prime number theorem is equivalent to the statement that $\theta(n) \sim n$ and $\psi(n) \sim n$, and the proof of the prime number theorem goes via showing that the Riemann zeta function has no zeros on the line $\Re(s) = 1$, followed by relating the logarithm of the Riemann zeta function to the Chebyshev function $\psi(n)$, which then gives us the asymptotics of $\psi(n)$ and hence $\pi(n)$.

We actually upper-bounded $\theta(n)$ and lower-bounded $\psi(n)$ in our proofs. Can you find out where?