Patterns In Primes Via Probability

Anup Dixit, Siddhi Pathak
IMSc, India; CMI, India
Prime numbers appear to be scattered at random, yet their overall distribution follows striking statistical laws. This article explores how probabilistic thinking—especially Cramér’s random model for primes—helps predict typical gaps between primes while revealing surprising limitations. Along the way, it shows how randomness and arithmetic structure coexist in one of mathematics’ oldest problems.

Introduction

The idea of a number lies at the very foundation of human civilization. Some of the earliest archaeological records of writing already reveal an awareness of natural numbers, marking a crucial step in the development of abstract thought. Reflecting on this, R. Dedekind famously wrote that "numbers are free creations of the human intellect; they serve as a means of grasping more easily and more sharply the diversity of things". Since numbers provide a lens through which we interpret the world, their study has fascinated mathematicians for centuries.

One of the earliest and most profound discoveries in mathematics, dating back to Euclid, is that every natural number can be written as a product of prime numbers. In this sense, primes serve as the fundamental building blocks of arithmetic. A prime number is a natural number n>1n>1 that has no divisors other than 11 and itself. For instance, 22, 33, 55, 77, and 1111 are prime, whereas 9=329=3^2 is not. The study of prime numbers has attracted many of the greatest mathematicians, including Euler, Gauss, Dirichlet, and Riemann.

FIG 1. Apparent lack of patterns among the primes.

One of the first striking facts about primes is Euclid’s theorem asserting that there are infinitely many of them. Yet even a brief inspection of the primes among the first 100100 natural numbers reveals an apparent lack of regularity. Aside from the trivial observation that all primes except 22 are odd, no simple pattern emerges. It is therefore unsurprising that there is no fast and fully deterministic procedure for locating very large primes. Indeed, the largest known prime as of December 2025 is 2136,279,8411,2^{136,279,841} - 1, a number discovered through extensive computation rather than a simple formula.

Despite this apparent irregularity, prime numbers exhibit remarkable statistical regularities. In the late eighteenth century, Adrien-Marie Legendre and Carl Friedrich Gauss independently observed, based on extensive numerical evidence, that the number of primes up to a large number NN satisfies π(N):=#{pN:p is prime}NlogNas N.\pi(N) := \#\{ p \le N : p \text{ is prime} \} \sim \frac{N}{\log N}\quad \text{as } N \to \infty. Gauss further suggested that primes occur near a large number nn with density approximately 1logn\frac{1}{log n}, leading to the refined approximation π(N)2Ndtlogt=li(N)\pi(N) \sim \int_2^N \frac{dt}{\log t} = \operatorname{li}(N) known as the logarithmic integral. This approximation is surprisingly accurate: for example, there are exactly 7849878498 primes below 10610^6, while li(106)78,627\mathrm{li}(10^6) \approx 78{,}627, an error of only about 0.16%0.16\%.

FIG 2. The above construction represents numbers in a hexagonal pattern (the innermost hexagon shows 1 through 6, and then the numbers keep spiralling outward). The green mini hexagons represent prime numbers, and their frequency decays as the numbers get bigger (from inside to outside). This relationship is consistent with the prime number theorem which says that the frequency of prime numbers goes down as $1\/log n$. [Idea for the hexagonal representation - Quanta Magazine]

This observation, now known as the Prime Number Theorem, was proved independently by Jacques Hadamard and Charles Jean de la Vall'ee Poussin in 1896, building on ideas introduced earlier by Riemann. Thus, although primes may appear erratic, their overall distribution follows a precise and elegant law.

Still, the intuition that primes behave in many ways like an unpredictable sequence remains useful. In this article, we explore how this perspective can lead to meaningful predictions, focusing in particular on the behavior of gaps between consecutive primes. The smallest possible gap is 22, as seen in twin primes such as (3,5)(3,5) or (11,13)(11,13). On the other hand, by considering the numbers n!+2,n!+3,,n!+n,n!+2, n!+3, \ldots, n!+n, one can construct arbitrarily long stretches of composite numbers, and hence arbitrarily large gaps between primes. This naturally raises the question: near a large number NN, what should a typical gap between consecutive primes look like?

FIG 3. Cramér’s conjecture is a prediction about how large the gaps between consecutive prime numbers can get. Using a probabilistic model in which an integer near $n$ is “randomly prime” with probability $1\/log n$, Cramér argued that unusually large gaps should be rare and that the largest prime gaps near $n$ should grow no faster than about $(log n)^2$. In simple terms, the conjecture says that although prime gaps do grow without bound, they grow much more slowly than the primes themselves, and their size is tightly constrained by logarithmic factors.

To investigate this, consider an interval [N,N+H][N, N+H], where NN is large and HH is much smaller than NN, but still large enough to contain many primes. The Prime Number Theorem predicts that the number of primes in this interval is approximately π(N+H)π(N)HlogN.\pi(N+H) - \pi(N) \approx \frac{H}{\log N}. Dividing the length of the interval by the number of primes it contains suggests that the average gap between consecutive primes near NN is about logNlog N. This reasoning also reinforces Gauss’s original insight that roughly one out of every logNlog N numbers near NN is prime.

This line of thought inspired the Swedish mathematician Harald Cramér in 1936 to introduce a probabilistic model for primes. Imagine an infinite sequence of urns (Un)nN(U_n)_{n\in\mathbb{N}} The urn U1U_1 contains only blue balls, U2U_2 only red balls, and for n3n \ge 3, the urn UnU_n contains both colors. Suppose that when drawing a ball from UnU_n, the probability of obtaining a red ball is 1logn\frac{1}{log n}. Drawing independently from each urn produces an infinite sequence of red and blue outcomes. Let PnP_n denote the index of the urn from which the nn-th red ball is drawn. The resulting sequence (Pn)nN(P_n)_{n\in\mathbb{N}} is increasing and serves as a model for the sequence of prime numbers.

FIG 4. A histogram of gaps between consecutive prime numbers in a large numerical range. While individual prime gaps fluctuate irregularly, the distribution as a whole reflects the Prime Number Theorem: most gaps cluster around a size comparable to $log N$ for primes near N. This visualization supports the heuristic argument that, despite their apparent randomness, primes obey predictable average behavior—an idea that motivates probabilistic models such as Cramér’s and underlies modern conjectures on extreme prime gaps. [Source: InScight]

If we define Π(N):=#{nN:PnN},\Pi(N) := \#\{ n \in \mathbb{N} : P_n \le N \}, then E[Π(N)]=n=1N1lognli(N)NlogN.\mathbb{E}[\Pi(N)] = \sum_{n=1}^N \frac{1}{\log n} \approx \mathrm{li}(N) \approx \frac{N}{\log N}.

This mirrors the behavior of pi(N)pi(N), reinforcing the analogy between (Pn)(P_n) and the primes. In particular, one may expect the largest gaps between primes to resemble the largest gaps between successive PnP_n.

Using results from probability theory, Cramér showed that with probability one, lim supnPn+1Pn(logPn)2=1.\limsup_{n\to\infty} \frac{P_{n+1}-P_n}{(\log P_n)^2} = 1. Motivated by this, he conjectured that prime gaps satisfy a similar bound.

Conjecture 1 (Cramér’s Conjecture)
Let 2=p1<p2<p3<2=p_1<p_2<p_3<\cdots be the sequence of prime numbers.

Although powerful, Cramér’s model is undeniably simplistic. It ignores basic arithmetic constraints, such as the fact that all primes except 22 are odd, or that no number divisible by 33 can be prime. Consequently, relying on the model without modification can lead to misleading predictions.

A striking example concerns twin primes—pairs of primes (p,p+2)(p,p+2). While it is widely believed that infinitely many twin primes exist, this remains unproven. Cramér’s model predicts that the number of twin primes up to NN should be approximately n=2N1(logn)(log(n+2))N(logN)2.\sum_{n=2}^N \frac{1}{(\log n)(\log(n+2))} \approx \frac{N}{(\log N)^2}. However, the same reasoning would predict a similar number of consecutive primes, which is clearly impossible beyond the pair (2,3)(2,3). This illustrates the limitations of the model.

FIG 5. Gauss’s (left) empirical observations on the density of prime numbers led to the Prime Number Theorem and the logarithmic integral. Cramér’s (right) probabilistic model builds on this viewpoint to predict the typical and maximal size of gaps between consecutive primes, highlighting both the power and the limitations of treating primes as a random sequence.

Nevertheless, when refined to account for divisibility by small primes, Cramér’s framework yields more accurate predictions. Granville showed that such corrections replace the constant 11 in the strong conjecture with 2eγ1.2292e^{-\gamma}\approx1.229, where γ\gamma is Euler’s constant. Similarly, the refined model predicts that the number of twin primes up to NN should be 2C2N(logN)2,C2=p primep3(11(p1)2)0.66016.\sim 2C_2\,\frac{N}{(\log N)^2},\quad C_2=\prod_{\substack{p\ \text{prime}\\ p\ge3}}\left(1-\frac{1}{(p-1)^2}\right)\approx0.66016.

Cramér’s model also sheds light on the distribution of primes in short intervals. Under this framework, intervals [N,N+H)[N,N+H) with (logN)2+δ<H<N(\log N)^{2+\delta}<H<N typically contain about HlogN\frac{H}{log N} primes. In 1943, assuming the Riemann Hypothesis, A. Selberg showed that this holds for almost all such intervals. For many years, it was believed that this behavior should hold uniformly. However, in a groundbreaking result, H. Maier demonstrated in 1985 that there are infinitely many short intervals where the number of primes deviates significantly from this expectation.

Seen in this light, Cramér’s model reveals both the power and the limitations of probabilistic reasoning in number theory. It captures the correct scale of prime gaps while falling short of fully encoding arithmetic structure. Its successes and failures together show why such models are best viewed as guiding principles rather than final answers. By exploring where the model works, where it breaks down, and how it can be refined, mathematicians continue to uncover subtle patterns beneath the surface, ensuring that the study of prime numbers remains a vibrant and evolving field.


Anup Dixit is an Assistant Professor at the Institute of Mathematical Science (IMSc) India. His research interest lies in number theory. Apart from various outreach programs for school students, he is also involved in training students for the Mathematics Olympiad