| Download
William Stein -- Talk for Mathematics is a long conversation: a celebration of Barry Mazur
\documentclass[openany]{book}123%\documentclass[11pt,draft]{article} % uncomment this and comment out the above line for *fast* typesetting (no images)45\usepackage{fix-cm}67\usepackage{soul}89\usepackage{float}10\usepackage{tikz}1112\usepackage{wrapfig}131415\usepackage{color}16\definecolor{dblackcolor}{rgb}{0.0,0.0,0.0}17\definecolor{dbluecolor}{rgb}{.01,.02,0.7}18\definecolor{dredcolor}{rgb}{0.8,0,0}19\definecolor{dgraycolor}{rgb}{0.30,0.3,0.30}20\usepackage{listings}21\lstdefinelanguage{Sage}[]{Python}22{morekeywords={True,False,sage,singular},23sensitive=true}24\lstset{frame=none,25showtabs=False,26showspaces=False,27showstringspaces=False,28commentstyle={\ttfamily\color{dredcolor}},29keywordstyle={\ttfamily\color{dbluecolor}\bfseries},30stringstyle ={\ttfamily\color{dgraycolor}\bfseries},31language = Sage,32basicstyle={\scriptsize \ttfamily},33aboveskip=.3em,34belowskip=.1em35}363738\usepackage{fancybox}39\usepackage{graphicx}40\usepackage{amsmath}41\usepackage{amsfonts}42\usepackage{amssymb}43\usepackage{amsthm}4445\usepackage{url}4647\usepackage{makeidx}\makeindex4849\DeclareMathOperator{\Gap}{Gap}50\DeclareMathOperator{\Li}{Li}51\DeclareMathOperator{\li}{li}5253\DeclareMathOperator{\Sing}{Sing}54\DeclareMathOperator{\Triv}{Triv}55\DeclareMathOperator{\Osc}{Osc}56\DeclareMathOperator{\Elem}{Elem}57\DeclareMathOperator{\Double}{Double}58\DeclareMathOperator{\Smooth}{Smooth}59\DeclareMathOperator{\Ces}{Ces}606162\DeclareGraphicsRule{.tif}{png}{.png}{`convert #1 `dirname #1`/`basename #1 .tif`.png}6364\newcommand{\mycaption}[1]{\begin{quote}{\bf Figure: } \large #1\end{quote}}6566\newcommand{\ill}[3]{%67\begin{figure}[H]%68\vspace{-2ex}69\centering%70\includegraphics[width=#2\textwidth]{illustrations/#1}%71\caption{#3}%72\vspace{-2ex}73\end{figure}}7475\newcommand{\illtwo}[4]{%76\begin{figure}[H]\centering%77\includegraphics[width=#3\textwidth]{illustrations/#1}$\qquad$\includegraphics[width=#3\textwidth]{illustrations/#2}%78\caption{#4}%79\end{figure}}8081\newcommand{\illthree}[5]{%82\begin{figure}[H]%83\centering%84\includegraphics[width=#4\textwidth]{illustrations/#1}$\qquad$\includegraphics[width=#4\textwidth]{illustrations/#2}$\qquad$\includegraphics[width=#4\textwidth]{illustrations/#3}%85\caption{#5}%86\end{figure}}8788\newcommand{\illtwoh}[4]{%89\begin{figure}[H]\centering%90\includegraphics[height=#3\textheight]{illustrations/#1}$\qquad$\includegraphics[height=#3\textheight]{illustrations/#2}%91\caption{#4}%92\end{figure}}9394\newcommand{\illthreeh}[5]{%95\begin{figure}[H]%96\centering%97\includegraphics[height=#4\textheight]{illustrations/#1}$\qquad$\includegraphics[height=#4\textheight]{illustrations/#2}$\qquad$\includegraphics[height=#4\textheight]{illustrations/#3}%98\caption{#5}%99\end{figure}}100101102%%%% Theoremstyles103\theoremstyle{plain}104\newtheorem{theorem}{Theorem}[chapter]105\newtheorem{proposition}[theorem]{Proposition}106\newtheorem{corollary}[theorem]{Corollary}107\newtheorem{claim}[theorem]{Claim}108\newtheorem{lemma}[theorem]{Lemma}109\newtheorem{hypothesis}[theorem]{Hypothesis}110\newtheorem{conjecture}[theorem]{Conjecture}111112\theoremstyle{definition}113\newtheorem{definition}[theorem]{Definition}114\newtheorem{question}[theorem]{Question}115\newtheorem{problem}[theorem]{Problem}116\newtheorem{alg}[theorem]{Algorithm}117\newtheorem{openproblem}[theorem]{Open Problem}118119%\theoremstyle{remark}120\newtheorem{goal}[theorem]{Goal}121\newtheorem{remark}[theorem]{Remark}122\newtheorem{remarks}[theorem]{Remarks}123\newtheorem{example}[theorem]{Example}124\newtheorem{exercise}[theorem]{Exercise}125126127128%\hoffset=-0.05\textwidth129%\textwidth = 1.1\textwidth130%\voffset=-0.05\textheight131%\textheight = 1.1\textheight132%133% Set equal margins on book style134\setlength{\oddsidemargin}{53pt}135\setlength{\evensidemargin}{53pt}136\setlength{\marginparwidth}{57pt}137\setlength{\footskip}{30pt}138139% Dutch style of paragraph formatting, i.e. no indents.140\setlength{\parskip}{1.3ex plus 0.2ex minus 0.2ex}141\setlength{\parindent}{0pt}142143\setcounter{tocdepth}{0}144145%\textheight = 9 in146%\oddsidemargin = 0.0 in147%\evensidemargin = 0.0 in148%\topmargin = 0.0 in149%\headheight = 0.0 in150%\headsep = 0.0 in151%\parskip = 0.2in152%\parindent = 0.0in153154155\def\GL{\mathrm{GL}}156\def\PGL{\mathrm{PGL}}157\def\PSL{\mathrm{PSL}}158\def\GSP{\mathrm{GSP}}159\def\Z{\mathrm{Z}}160\def\Q{\mathrm{Q}}161\def\Gal{\mathrm{Gal}}162\def\Hom{\mathrm{Hom}}163\def\Ind{\mathrm{Ind}}164\def\End{\mathrm{End}}165\def\Aut{\mathrm{Aut}}166\def\loc{\mathrm{loc}}167\def\glob{\mathrm{glob}}168\def\Kbar{{\bar K}}169\def\D{{\mathcal D}}170\def\L{{\mathcal L}}171\def\R{{\mathcal R}}172\def\G{{\mathcal G}}173\def\W{{\mathcal W}}174\def\H{{\mathcal H}}175\def\OH{{\mathcal OH}}176177178179\newcommand{\RH}{Riemann Hypothesis\index{Riemann Hypothesis}}180181\title{Prime Numbers and the Riemann Hypothesis}182\date{}183\author{\Large Barry Mazur \and \Large William Stein \and \vspace{20ex}\\184%\begin{center}185%A formula that counts the prime numbers ...186%\includegraphics[width=.65\textwidth]{illustrations/Rk_50_2_100}187%... built out of their spectrum:188% cover = use fractal trace with params 0.3 -0.1 0 0189\includegraphics[width=1.1\textwidth]{illustrations/cover}190%$$191%f(t) = -\sum_{\text{prime powers }p^n}{\frac{\log(p)}{p^{n/2}}}\cos(t\log(p^n))192%$$193%\end{center}194}195196\usepackage{notes2bib}197\bibliographystyle{amsplain}198199\usepackage{hyperref}200201202\newcommand{\todo}[1]{\par\vspace{1em}{\small---------\\{{\bf To be done:} #1}\\-----------}\par\vspace{1em}}203204205% More space between footnotes -- see206% http://www.latex-community.org/forum/viewtopic.php?f=44&t=22437207\setlength{\footnotesep}{1.2\baselineskip}208209210\usepackage{pdfpages}211212\begin{document}213214%\maketitle215\includepdf{cover}216217% Remove parskip for toc218\setlength{\parskip}{0ex plus 0.5ex minus 0.2ex}219220\tableofcontents221222223% Dutch style of paragraph formatting, i.e. no indents.224\setlength{\parskip}{1.3ex plus 0.2ex minus 0.2ex}225226\chapter*{\label{preface}Preface}227\addcontentsline{toc}{chapter}{Preface}228The \RH{} is one of the great unsolved problems of229mathematics, and the reward of \$1{,}000{,}000 of {\em Clay Mathematics230Institute} prize money awaits the person who solves it. But---with231or without money---its resolution is crucial for our understanding of232the nature of numbers.233234235There are several full-length books recently published, written236for a general audience, that have the \RH{} as their main237topic. A reader of these books will get a fairly rich picture of the238personalities engaged in the pursuit, and of related mathematical and239historical issues.\footnote{See, e.g., {\em The Music of the Primes} by Marcus du Sautoy (2003)240and {\em Prime Obsession: Bernhard Riemann and241the Greatest Unsolved Problem in Mathematics} by John Derbyshire (2003).}242243This is {\em not} the mission of the book that you now hold in your244hands. We aim---instead---to explain, in as direct a manner as245possible and with the least mathematical background required, what246this problem is all about and why it is so important. For even before247anyone proves this {\em hypothesis} to be true (or false!), just248getting familiar with it, and with some of the ideas behind it, is249exciting. Moreover, this hypothesis is of crucial importance in a250wide range of mathematical fields; for example, it is a251confidence-booster for computational mathematics: even if the \RH{}252is never proved, assuming its truth (and that of closely253related hypotheses) gives us an excellent sense of254how long certain computer programs will take to run, which, in some255cases, gives us the assurance we need to initiate a computation that256might take weeks or even months to complete.257258259% I (William Stein) took this very picture in the Princeton common room in 2001!260\ill{sarnak}{0.4}{Peter Sarnak\index{Sarnak, Peter}}261262263264Here is how the265Princeton mathematician Peter Sarnak describes the broad impact the266\RH{} has had\footnote{See page 222 of267{\em The Riemann hypothesis: the greatest unsolved problem in mathematics} by Karl Sabbagh (2002).}:268\begin{quote}269``The Riemann hypothesis is the central problem and it implies many,270many things. One thing that makes it rather unusual in mathematics271today is that there must be over five hundred papers---somebody should272go and count---which start `Assume the Riemann hypothesis\footnote{Technically,273a generalized version of the Riemann hypothesis (see274Chapter~\ref{ch:companions} below).},' and275the conclusion is fantastic. And those [conclusions] would then become276theorems ... With this one solution you would have proven five hundred277theorems or more at once.''278\end{quote}279% This is from page 106 of the Borwein book, which is just citing Sabbagh.280281282283So, what {\it is} the \RH{}? Below is a {\it first284description} of what it is about. The task of our book is to develop285the following boxed paragraph into a fuller explanation and to286convince you of the importance and beauty of the mathematics it287represents. We will be offering, throughout our book, a number of288different---but equivalent---ways of precisely formulating this289hypothesis (we display these in boxes). When we say that two290mathematical statements are ``equivalent'' we mean that, given the291present state of mathematical knowledge, we can prove that if either292one of those statements is true, then the other is true. The endnotes293will guide the reader to the relevant mathematical literature.294295\begin{center}296\shadowbox{297\begin{minipage}{.98\textwidth}298\hspace{.015\textwidth}299\begin{minipage}{0.95\textwidth}300\mbox{} \vspace{0.2ex}301\begin{center}{\bf\large What {\em sort} of Hypothesis is the \RH{}?}\end{center}302\medskip303304Consider the seemingly innocuous series of questions:305306\begin{quote}307\begin{itemize}308\item How many prime numbers (2, 3, 5, 7, 11, 13, 17, $\ldots$) are there less than 100?309\item How many less than 10{,}000?310\item How many less than 1{,}000{,}000?311\end{itemize}312313More generally, how many primes are there less than any given number $X$?314\end{quote}315316Riemann\index{Riemann, Bernhard} proposed, a century and half ago, a strikingly317simple-to-describe ``very good approximation'' to the number of318primes less than a given number $X$. We now319see that if we could prove this {\em Hypothesis of Riemann} we would have320the key to a wealth of powerful mathematics. Mathematicians are eager321to find that key.322323324\vspace{1ex}325\end{minipage}\end{minipage}}326\end{center}327328% I got this from http://www.math.harvard.edu/history/bott/329% Barry says there is a good picture in the book330% ''the Index Theorem'' that Yau published.331\ill{raoulbott}{0.35}{Raoul Bott (1923--2005). Photograph by George M. Bergman, Courtesy of the Department of Mathematics, Harvard University.\label{fig:bott}}332333334The mathematician Raoul Bott\index{Bott, Raoul}---in giving advice to335a student---once said that whenever one336reads a mathematics book or article, or goes to a math lecture, one337should aim to come home with something very specific (it can be small,338but should be {\em specific}) that has application to a wider class of339mathematical problems than was the focus of the text or lecture. If we340were to suggest some possible {\em specific} items to come home with,341after reading our book, three key phrases---{\bf prime numbers}\index{prime number}, {\bf342square-root accurate}\index{square-root accurate}, and {\bf spectrum}\index{spectrum}---would head the343list. As for words of encouragement to think hard about the first of344these, i.e., prime numbers, we can do no better than to quote a345paragraph of Don Zagier's\index{Zagier, Don} classic 12-page exposition, {\em The First34650 Million Prime Numbers}:347348% I took this myself when we were hiking in Oberwolfach once...349\ill{zagier}{.35}{Don Zagier}350351352\begin{quote}353``There are two facts about the distribution of prime numbers of354which I hope to convince you so overwhelmingly that they will be355permanently engraved in your hearts. The first is that, [they are]356the most arbitrary and ornery objects studied by mathematicians:357they grow like weeds among the natural numbers, seeming to obey no358other law than that of chance, and nobody can predict where the next359one will sprout. The second fact is even more astonishing, for it360states just the opposite: that the prime numbers exhibit stunning361regularity, that there are laws governing their behavior, and that362they obey these laws with almost military precision.''363\end{quote}364365366Mathematics is flourishing. Each year sees new exciting initiatives that extend and sharpen the applications of our subject, new directions for deep exploration---and finer understanding---of classical as well as very contemporary mathematical domains. We are aided in such explorations by the development of more and more powerful tools. We see resolutions of centrally important questions. And through all of this, we are treated to surprises and dramatic changes of viewpoint; in short: marvels.367368And what an array of wonderful techniques allow mathematicians to do their work: framing {\it definitions;} producing {\it constructions;} formulating {\it analogies relating disparate concepts, and disparate mathematical fields}; posing {\it conjectures,} that cleanly shape a possible way forward; and, the keystone: providing unassailable {\it proofs} of what is asserted, the idea of doing such a thing being itself one of the great glories of mathematics.369370Number theory has its share of this bounty. Along with all these modes of theoretical work, number theory also offers the pure joy of numerical experimentation, which---when it is going well---allows you to witness the intricacy of numbers and profound inter-relations that cry out for explanation. It is striking how little you actually have to know in order to appreciate the revelations offered by numerical exploration.371372Our book is meant to be an introduction to these pleasures. We take an experimental view of the fundamental ideas of the subject buttressed by numerical computations, often displayed as graphs. As a result, our book is profusely illustrated, containing 131 figures,373diagrams, and pictures that accompany the text.\footnote{We created the374figures using the free SageMath375software (see \url{http://www.sagemath.org}).376Complete source code is available, which377can be used to recreate every diagram in this book378(see \url{http://wstein.org/rh}).379More adventurous readers can try380to experiment with the parameters for381the ranges of data illustrated, so as to get an even more382vivid sense of how the numbers ``behave.''383We hope that readers become inspired to carry out numerical experimentation,384which is becoming easier as mathematical software385advances.}386387There are few388mathematical equations in Part~\ref{part1}. This first portion of our book is intended for readers who are generally interested in, or curious about, mathematical ideas, but who may not have studied any advanced topics. Part~\ref{part1} is devoted to conveying the essence of the Riemann Hypothesis and explaining why it is so intensely pursued. It requires a minimum of mathematical knowledge, and does not, for example, use calculus,\index{calculus} although it would be helpful to know---or to learn on the run---the meaning of the concept of {\it function}\index{function}. Given its mission, Part~\ref{part1} is meant to be complete, in that it has a beginning, middle, and end. We hope that our readers who only read Part~\ref{part1} will have enjoyed the excitement of this important piece of mathematics.389390391392Part~\ref{part2} is for readers who have taken at least one class in calculus,\index{calculus} possibly a long time ago. It is meant as a general preparation for the type of Fourier analysis\index{Fourier analysis} that will occur in the later parts. The notion of spectrum\index{spectrum} is key.393394Part~\ref{part3} is for readers who wish to see, more vividly, the link between the placement of prime numbers\index{prime number} and (what we call there) the {\it Riemann spectrum}.\index{Riemann spectrum}395396Part~\ref{part4} requires some familiarity with complex analytic functions, and returns to Riemann's original viewpoint. In particular it relates the ``Riemann spectrum'' that we discuss in Part~\ref{part3} to the {\it nontrivial zeroes\index{nontrivial zeroes} of the Riemann zeta function}.\index{zeta function} We also provide a brief sketch of the more standard route taken by published expositions of the Riemann Hypothesis.397398The end-notes are meant to link the text to references, but also to provide more technical commentary with an increasing dependence on mathematical background in the later chapters. References to the end notes will be in brackets.399400We wrote our book over the past decade, but devoted only one week to it each year (a week in August). At the end of our work-week for the book, each year, we put our draft (mistakes and all) on line to get response from readers.{\footnote{See {\url{http://library.fora.tv/2014/04/25/Riemann_Hypothesis_The_Million_Dollar_Challenge}} which is a lecture---and Q \& A---about the composition of this book.}} We therefore accumulated much important feedback, corrections, and requests from readers.\footnote{Including401Dan Asimov, Bret Benesh, Keren Binyaminov, Harald B\"{o}geholz, Louis-Philippe Chiasson, Keith Conrad, Karl-Dieter Crisman, Nicola Dunn, Thomas Egense, Bill Gosper, Andrew Granville, Shaun Griffith, Michael J. Gruber, Robert Harron, William R. Hearst III, David Jao, Fredrik Johansson, Jim Markovitch, David Mumford, James Propp, Andrew Solomon, Dennis Stein, and Chris Swenson.}402We thank them infinitely.403404405\part{The \RH{}\label{part1}}406407\numberwithin{equation}{chapter}408\numberwithin{figure}{chapter}409\numberwithin{table}{chapter}410411\chapter[Thoughts about numbers]{Thoughts about numbers: ancient, medieval, and modern}412413If we are to believe the ancient Greek philosopher Aristotle\index{Aristotle}, the early414Pythagoreans thought that the principles governing Number are ``the415principles of all things,'' the concept of Number being more basic than416{\em earth, air, fire, or water}, which were according to ancient tradition417the four building blocks of matter. To think about Number is to get418close to the architecture of ``what is.''419420So, how far along are we in our thoughts about numbers?421422423424% This is from http://en.wikipedia.org/wiki/File:Frans_Hals_-_Portret_van_Ren%C3%A9_Descartes.jpg425% There is a higher resolution scan available there426427\ill{descartes}{.35}{Ren\'e Descartes (1596--1650)}428429The French philosopher and mathematician Ren\'e Descartes,\index{Descartes, Ren\'{e}} almost four430centuries ago, expressed the hope that there soon would be ``almost431nothing more to discover in geometry.'' Contemporary physicists dream432of a final theory.\footnote{See Weinberg's book {\em Dreams of a433Final Theory: The Search for the Fundamental Laws of Nature}, by434Steven Weinberg (New York: Pantheon Books, 1992)} But despite its435venerability and its great power and beauty, the pure mathematics of436numbers may still be in the infancy of its development, with depths to437be explored as endless as the human soul, and {\it never} a final theory.438439\ill{quixote}{.5}{Jean de Bosschere, ``Don Quixote and his Dulcinea del Toboso,'' from The History of Don Quixote De La Mancha, by Miguel De Cervantes. Trans. Thomas Shelton. George H. Doran Company, New York (1923).}440441Numbers are obstreperous things. Don Quixote\index{Don Quixote} encountered this when he442requested that the ``bachelor'' compose a poem to his lady Dulcinea del443Toboso, the first letters of each line spelling out her name. The444``bachelor'' found\footnote{See Chapter IV of the Second Part of the {\em Ingenious Gentleman Don Quixote of La Mancha}.}445446\begin{quote}447``a great difficulty in their composition because the number of448letters in her name was $17$, and if he made four Castilian stanzas449of four octosyllabic lines each, there would be one letter too many,450and if he made the stanzas of five octosyllabic lines each, the ones451called {\em d{\'e}cimas} or {\em redondillas,} there would be three452letters too few...''453\end{quote}454455``It must fit in, however you do it,'' pleaded Quixote\index{Don Quixote}, not willing to456grant the imperviousness of the number $17$ to division.457458% http://www.jus.uio.no/sisu/don_quixote.miguel_de_cervantes/60.html#2068459460461462463{\em Seventeen} is indeed a prime number:\index{prime number} there is no way of factoring\index{factor}464it as the product of smaller numbers, and this accounts---people tell465us---for its occurrence in some phenomena of nature, as when466the seventeen-year cicadas all emerged to celebrate a ``reunion'' of some467sort in our fields and valleys.468469\ill{cicada}{.35}{Cicadas emerge every 17 years}470% from http://biology.clc.uc.edu/steincarter/cicadas.htm471472473474475Prime numbers\index{prime number}, despite their {\em primary} position in our modern476understanding of numbers, were not specifically doted over in the477ancient literature before Euclid,\index{Euclid} at least not in the literature that478has been preserved. Primes are mentioned as a class of numbers in the479writings of Philolaus (a predecessor of Plato\index{Plato}); they are not mentioned480specifically in the Platonic dialogues, which is surprising481given the intense interest Plato had in mathematical developments; and482they make an occasional appearance in the writings of Aristotle, which483is not surprising, given Aristotle's emphasis on the distinction484between the {\em composite} and the {\em incomposite}. ``The485incomposite is prior to the composite,'' writes Aristotle\index{Aristotle} in Book 13 of486the Metaphysics.487488489Prime numbers\index{prime number} do occur, in earnest, in Euclid's\index{Euclid} {\it Elements}!490491492% from http://en.wikipedia.org/wiki/File:Euklid-von-Alexandria_1.jpg493\ill{euclid}{.35}{Euclid\index{Euclid}}494495496There is an extraordinary wealth of established truths about whole497numbers; these truths provoke sheer awe for the beautiful complexity498of prime numbers\index{prime number}. But each of the important new discoveries we make499gives rise to a further richness of questions, educated guesses,500heuristics, expectations, and unsolved problems.501502503504505506507508509\chapter{What are prime numbers?}\label{ch:what_are_primes}\index{prime number}510511\noindent {\em Primes as atoms. }\index{atom} To begin from the beginning, think512of the operation of multiplication as a bond that ties numbers513together: the equation $2\times 3= 6$ invites us to imagine the number514$6$ as (a molecule, if you wish) built out of its smaller constituents515$2$ and $3$. Reversing the procedure, if we start with a whole516number, say $6$ again, we may try to factor\index{factor} it (that is, express it as517a product of smaller whole numbers) and, of course, we would518eventually, if not immediately, come up with $6 = 2\times 3$ and519discover that $2$ and $3$ factor no further; the numbers $2$ and $3$,520then, are the indecomposable entities (atoms, if you wish) that521comprise our number.522523\ill{factor_tree_6}{.3}{The number $6 = 2\times 3$}524525526By definition, a {\bf prime number}\index{prime number}527(colloquially, {\em a prime}) is a whole number, bigger than $1$, that528cannot be factored into a product of two smaller whole numbers. So,529$2$ and $3$ are the first two prime numbers. The next number along the530line, $4$, is not prime, for $4= 2\times 2$; the number after that,531$5$, is. Primes are, multiplicatively speaking, the building blocks532from which all numbers can be made. A fundamental theorem of533arithmetic tells us that any number (bigger than $1$) can be factored534as a product of primes, and the factorization is {\em unique} except535for rearranging the order of the primes.536537538539For example, if you try to factor\index{factor} $12$ as a product of540two smaller numbers---ignoring the order of the factors---there are two541ways to begin to do this:542$$54312 = 2 \times 6 \qquad\text{ and }\qquad 12 = 3 \times 4544$$545But neither of these ways is a full factorization of $12$, for both546$6$ and $4$ are not prime, so can be themselves factored, and in each547case after changing the ordering of the factors we arrive at:548$$54912= 2 \times 2 \times 3.550$$551552If you try to factor\index{factor} the number $300$, there are many553ways to begin:554$$555300= 30\times 10\qquad\text{or}\qquad 300 = 6 \times 50556$$557and there are various other starting possibilities. But if you558continue the factorization (``climbing down'' any one of the possible559``factoring trees'') to the bottom, where every factor is a prime560number as in Figure~\ref{fig:factor300}, you always end up with the561same collection of prime numbers\index{prime number}\footnote{See Section~1.1 of562Stein's {\em Elementary Number Theory: Primes, Congruences, and Secrets} (2008) at \url{http://wstein.org/ent/}563for a proof of the ``fundamental theorem of arithmetic,''564which asserts that every positive565whole number factors uniquely as a product of primes.}:566$$300 = 2^2\times 3\times 5^2.$$567568\illtwo{factor_tree_300_a}{factor_tree_300_b}{.47}569{Factor trees that illustrate the factorization of 300 as a product of primes.\label{fig:factor300}}570571\ill{factor_tree_big}{1}{Factorization tree for the product of the primes up to $29$.\label{factor.tree.big}}%NOTE: in our version of the book there is an annoying572%battle between the footnote and this figure. Maybe won't matter in573% CUP version.574575576The \RH{} probes the question: how intimately can we know577prime numbers\index{prime number}, those {\em atoms}\index{atom} of multiplication? Prime numbers are578an important part of our daily lives. For example, often when we visit a579website and purchase something online, prime numbers having hundreds of580decimal digits are used to keep our bank transactions private. This581ubiquitous use to which giant primes are put depends upon a very582simple principle: it is much easier to multiply numbers together than583to factor\index{factor} them. If you had to factor, say, the number $391$ you might584scratch your head for a few minutes before discovering that $391$ is585$17\times 23$. But if you had to multiply $17$ by $23$ you would do it586straightaway. Offer two primes, say, $P$ and $Q$ each with a few hundred587digits, to your computing machine and ask it to multiply them588together: you will get their product $N = P\times Q$ with its hundreds of digits589in about a microsecond. But present that number $N$ to any590current desktop computer, and ask it to factor $N$, and the computer591will (almost certainly) fail to do the task. See592\bibnote{{\bf How not to factor the numerator of a Bernoulli number:}593594As mentioned in Chapter~\ref{ch:envision}, the coefficient $B_k$ of the linear595term of the polynomial596$$597S_k(n)= 1^k+2^k+3^k+\dots +(n-1)^k598$$599is (up to sign) the $k$-th {\bf Bernoulli number}. These numbers are600rational numbers and, putting them in lowest terms, their numerators601play a role in certain problems, and their denominators in602others. (This is an amazing story, which we can't go into here!)603604One of us (Barry Mazur) in the recent article {\em How can we605construct abelian Galois extensions of basic number fields?} (see606\url{http://www.ams.org/journals/bull/2011-48-02/S0273-0979-2011-01326-X/})607found himself dealing (for various reasons) with the fraction608$-B_{200}/400$, where $B_{200}$ is the two-hundredth Bernoulli number.609The numerator of this fraction is quite large: it is---hold your610breath---\ $$389\cdot 691\cdot5370056528687 \ \ \ \ \text{times this611204-digit number:}612$$\vskip 10pt613$N:=3452690329392158031464109281736967404068448156842396721012$\newline614$\mbox{}\,\,\,\qquad 9920642145194459192569415445652760676623601087497272415557$\newline615$\mbox{}\,\,\,\qquad 0842527652727868776362959519620872735612200601036506871681$\newline616$\mbox{}\,\,\,\qquad 124610986596878180738901486527$617\vskip 10pt and he {\it incorrectly asserted} that it was prime. Happily, Bartosz Naskr\c{e}cki spotted this error: our $204$-digit $N$ is {\it not} prime.618% Note, according to http://perso.telecom-paristech.fr/~cappe/local/latex/accents.pdf619% the accent I get with \c{e} is ``NOTE: This is technically incorrect. The accent should be reversed in orientation,620% but there is currently no such accent in plain TEX or LATEX.''621622How did he know this? By using the most basic test in the repertoire623of tests that we have available to check to see whether a number is624prime: we'll call it the ``{\bf Fermat $2$-test.}'' We'll first give a625general explanation of this type of test before we show how $N$ {\it626fails the Fermat $2$-test.}627628629The starting idea behind this test is the famous result known as {\em630Fermat's Little Theorem} where the ``little'' is meant to631alliteratively distinguish it from you-know-what.632\begin{theorem}[Fermat's Little Theorem]633If $p$ is a prime number, and $a$ is any number relatively prime to $p$ then $a^{p-1} - 1$ is divisible by $p$.634\end{theorem}635A good exercise is to try to prove this, and a one-word hint that might lead you to one of the many proofs of it is636{\em induction}.\footnote{Here's the proof:637$$(N+1)^p \equiv N^p +1 \equiv (N+1)\ \mod p,$$638where the first equality is the binomial theorem and the second639equality is induction.}640641Now we are going to use this as a criterion, by---in effect---restating it in what logicians would call its {\it contrapositive}:642\begin{theorem}[The Fermat $a$-test]643If $M$ is a positive integer, and $a$ is any number relatively prime to $M$ such that $a^{M-1} - 1$ is {\it not} divisible by $M$, then $M$ is {\it not} a prime.644\end{theorem}645646% sage: N =345269032939215803146410928173696740406844815684239672101299206421451944591925694154456527606766236010874972724155570842527652727868776362959519620872735612200601036506871681124610986596878180738901486527; z = Mod(2, N)^(N-1) - 1; z647Well, Naskr\c{e}cki computed $2^{N-1}-1$ (for the $204$-digit $N$ above) and saw that648it is {\it not} divisible\footnote{The number $2^{N-1}-1$ has a residue after division by $N$ of\\ $3334581100595953025153969739282790317394606677381970645616725285996925$\newline$6496610000568292727335792620957159782739813115005451450864072425835484898$\newline$650565112763692970799269335402819507605691622173717318335512037457$.} by $N$.651Ergo, our $N$ fails the Fermat $2$-test so is {\it not} prime.652653But then, given that it is so ``easy'' to see that $N$ is not prime, a654natural question to ask is: what, in fact, is its prime factorization?655This---it turns out---isn't so easy to determine;656Naskr\c{e}cki devoted $24$ hours of computer time setting standard factorization657algorithms on the task, and that was not sufficient time to resolve658the issue. The factorization of the numerators of general Bernoulli659numbers is the subject of a very interesting website run by Samuel660Wagstaff (\url{http://homes.cerias.purdue.edu/~ssw/bernoulli}).661Linked to this web page one finds662(\url{http://homes.cerias.purdue.edu/~ssw/bernoulli/composite}) which gives663a list of composite numbers whose factorizations have resisted all664attempts to date. The two-hundredth Bernoulli number is 12th on the list.665666The page667\url{http://en.wikipedia.org/wiki/Integer_factorization_records} lists668record challenge factorization, and one challenge that was completed669in 2009 involves a difficult-to-factor number with 232 digits; its670factorization was completed by a large team of researchers and took around6712000 years of CPU time. This convinced us that672with sufficient motivation it would be possible to factor $N$, and so we673asked some leaders in the field to try. They succeeded!674\begin{quote}\sf675Factorisation of B200676677by Bill Hart on 4 Aug 05, 2012 at 07:24pm678679We are happy to announce the factorization of the numerator of the 200th680Bernoulli number:681682\begin{align*}683N &= 389 \cdot 691 \cdot 5370056528687 \cdot c_{204}\\684c_{204} &= p_{90} \cdot p_{115}\\685p_{90} &= 149474329044343594528784250333645983079497454292\\686&= 838248852612270757617561057674257880592603\\687p_{115} &= 230988849487852221315416645031371036732923661613\\688&= 619208811597595398791184043153272314198502348476\\689&= 2629703896050377709690\end{align*}691692The factorization of the 204-digit composite was made possible with the help693of many people:694\begin{itemize}695\item William Stein and Barry Mazur challenged us to factor this number.696\item Sam Wagstaff maintains a table of factorizations of numerators of Bernoulli697numbers at \url{http://homes.cerias.purdue.edu/~ssw/bernoulli/bnum}. According698to this table, the 200th Bernoulli number is the 2nd smallest index with699unfactored numerator (the first being the 188th Bernoulli number).700\item Cyril Bouvier tried to factor the c204 by ECM up to 60-digit level, using701the TALC cluster at Inria Nancy - Grand Est.702\item {\sf yoyo@home} tried to factor the c204 by ECM up to 65-digit level, using the703help of many volunteers of the distributed computing platform704\url{http://www.rechenkraft.net/yoyo/}.705After ECM was unsuccessful, we decided to factor the c204 by GNFS.706\item Many people at the Mersenne forum helped for the polynomial selection. The best707polynomial was found by Shi Bai, using his implementation of Kleinjung's708algorithm in CADO-NFS:709\url{http://www.mersenneforum.org/showthread.php?p=298264#post298264}.710Sieving was performed by many volunteers using {\sf NFS@home}, thanks to Greg711Childers. See \url{http://escatter11.fullerton.edu/nfs} for more details712about {\sf NFS@home}.713This factorization showed that such a distributed effort might be feasible for a714new record GNFS factorization, in particular for the polynomial selection.715This was the largest GNFS factorization performed by {\sf NFS@home} to date,716the second largest being $2^{1040}+1$ at 183.7 digits.717\item Two independent runs of the filtering and linear algebra were done: one by Greg718Childers with msieve (\url{http://www.boo.net/~jasonp/qs.html}) using a 48-core719cluster made available by Bill Hart, and one by Emmanuel Thom\'{e} and Paul720Zimmermann with CADO-NFS (\url{http://cado-nfs.gforge.inria.fr/}), using the Grid7215000 platform.722\item The first linear algebra run to complete was the one with CADO-NFS, thus we723decided to stop the other run.724\end{itemize}725726Bill Hart727\end{quote}728729We verify the factorization above in SageMath as follows: \\730731{\sf732sage: p90 = 1494743290443435945287842503336459830794974542928382\\73348852612270757617561057674257880592603\\734sage: p115 = 230988849487852221315416645031371036732923661613619\\7352088115975953987911840431532723141985023484762629703896050377709\\736sage: c204 = p90 * p115\\737sage: 389 * 691 * 5370056528687 * c204 == -numerator(bernoulli(200))\\738True\\739sage: is\_prime(p90), is\_prime(p115), is\_prime(c204)\\740(True, True, False)741}742} and \bibnote{\label{bibnote:factor}743Given an integer $n$, there are744many algorithms available for trying to write $n$ as a product of745prime numbers. First we can apply {\em trial division}, where we746simply divide $n$ by each prime $2, 3, 5, 7, 11, 13, \ldots$ in turn,747and see what small prime factors we find (up to a few digits). After748using this method to eliminate as many primes as we have patience to749eliminate, we typically next turn to a technique called {\em Lenstra's750elliptic curve method}, which allows us to check $n$ for751divisibility by bigger primes (e.g., around 10--15 digits). Once752we've exhausted our patience using the elliptic curve method, we would753next hit our number with something called the {\em quadratic sieve},754which works well for factoring numbers of the form $n=pq$, with $p$755and $q$ primes of roughly equal size, and $n$ having less than 100756digits (say, though the 100 depends greatly on the implementation).757All of the above algorithms---and then some---are implemented in SageMath,758and used by default when you type {\tt factor(n)} into SageMath. Try759typing {\tt factor(some number, verbose=8)} to see for yourself.760761If the quadratic sieve fails, a final recourse is to run the {\em762number field sieve} algorithm, possibly on a supercomputer. To give763a sense of how powerful (or powerless, depending on perspective!) the764number field sieve is, a record-setting factorization of a general765number using this algorithm is the factorization of a 232 digit number766called RSA-768 (see \url{https://eprint.iacr.org/2010/006.pdf}):767768$769n = 12301866845301177551304949583849627207728535695953347921973224521\\770517264005072636575187452021997864693899564749427740638459251925573263\\771034537315482685079170261221429134616704292143116022212404792747377940\\77280665351419597459856902143413773$774\par\noindent{}which factors as $pq$, where\par\noindent{}775$776p = 334780716989568987860441698482126908177047949837137685689124313889\\77782883793878002287614711652531743087737814467999489778$779\par\noindent{}and\par\noindent{}780$781q = 367460436667995904282446337996279526322791581643430876426760322838\\78215739666511279233373417143396810270092798736308917.783$784\par\noindent{}We encourage you to try to785factor $n$ in SageMath, and see that it fails.786SageMath does not yet include an implementation of the787number field sieve algorithm, though there are some free implementations788currently available (see \url{http://www.boo.net/~jasonp/qs.html}).789}.790791%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%792793794795796%We illustrate this in Sage:797% \begin{lstlisting}[]798% sage: n = ZZ.random_element(10^500)799% sage: m = ZZ.random_element(10^500)800% sage: timeit('n*m')801% 1.51 microseconds per loop802% sage: timeit('n*m')803% 2.33 microseconds per loop804% sage: factor(n*m)805% [wait forever]806% \end{lstlisting}}.807The safety of much808encryption depends upon this ``guaranteed'' failure!\footnote{Nobody has ever809published a {\em proof} that there is no fast way to factor810integers. This is an article of ``faith'' among some811cryptographers.}812813If we were latter-day number-phenomenologists we might revel in the814discovery and proof that815$$816p=2^{43,112,609}-1 = 3164702693\ldots\ldots\text{\small (millions of digits)}\ldots\ldots6697152511817$$818is a prime number\index{prime number}, this number having $12{,}978{,}189$ digits! This819prime, which was discovered on August 23, 2008 by the820GIMPS project,\footnote{The GIMPS project website is \url{http://www.mersenne.org/}.}821is the first prime ever found with more than ten million digits,822though it is not the largest prime currently known.823824Now $2^{43,112,609}-1$ is quite a hefty number! Suppose someone came825up to you saying ``surely $p = 2^{43,112,609}-1$ is the largest prime826number!'' (which it is not). How might you convince that person that827he or she is wrong without explicitly exhibiting a larger prime?828\bibnote{We can use SageMath (at \url{http://sagemath.org}) to quickly compute the ``hefty829number'' $p = 2^{43,112,609}-1$. Simply type830{\tt p = 2\^{}43112609 - 1} to instantly compute $p$.831In what sense have we {\em computed} $p$? Internally, $p$ is now832stored in base $2$ in the computer's memory; given the special form of833$p$ it is not surprising that it took little time to compute. Much more834challenging is to compute all the base 10 digits of $p$, which takes835a few seconds: {\tt d = str(p)}.836Now type {\tt d[-50:]} to see the last 50 digits of $p$.837To compute the sum $58416637$ of the digits of $p$ type {\tt sum(p.digits())}.838}839840841Here is a neat---and, we hope, convincing---strategy to show there are842prime numbers\index{prime number} larger than $p = 2^{43,112,609} - 1$. Imagine843forming the following humongous number: let $M$ be the product of all844prime numbers\index{prime number} up to and including $p = 2^{43,11,2609} - 1$. Now go845one further than $M$ by taking the next number $N=M+1$.846847848OK, even though this number $N$ is wildly large, it is either a prime849number itself---which would mean that there would indeed be a prime850number larger than $p=2^{43,112,609} - 1$, namely $N$; or in any event it is851surely divisible by some prime number\index{prime number}, call it $P$.852853Here, now, is a way of seeing that this $P$ is bigger than $p$: Since854every prime number smaller than or equal to $p$ divides $M$, these855prime numbers cannot divide $N= M+1$ (since they divide $M$ evenly, if856you tried to divide $N=M+1$ by any of them you would get a remainder857of $1$). So, since $P$ does divide $N$ it must not be any of the858smaller prime numbers: $P$ is therefore a prime number bigger than $p=8592^{43,112,609}-1$.860861This strategy, by the way, is not very new: it is, in fact, well over862two thousand years old, since it already occurred in Euclid's\index{Euclid} {\em863Elements}\index{Euclid}. The Greeks did know that there are infinitely many prime864numbers and they showed it via the same method as we showed that our865$p = 2^{43,112,609} - 1$ is not the largest prime number.866867Here is the argument again, given very succinctly:868Given primes $p_1, \ldots, p_m$, let $n=p_1 p_2 \cdots p_m +8691$. Then $n$ is divisible by some prime not equal to any $p_i$,870so there are more than $m$ primes.871872You can think of this strategy as a simple game that you can873play. Start with the bag of prime numbers\index{prime number}874that contains just the two primes $2$ and $3$. Now each ``move'' of the game875consists of multiplying together all the primes you have in your bag876to get a number $M$, then adding $1$ to $M$ to get the larger877number $N=M+1$, then factoring $N$\index{factor} into prime number factors, and then878including all those new prime numbers\index{prime number} in your bag. Euclid's\index{Euclid} proof879gives us that we will---with each move of this game---be finding more880prime numbers: the contents of the bag will increase. After, say,881a million moves our bag will be guaranteed to contain more than882a million prime numbers.\index{prime number}883884For example, starting the game with your bag containing885only one prime number $2$, here is how your bag grows with886successive moves of the game:887888$\mbox{}\qquad\{2\}$889\newline890$\mbox{}\qquad\{2,3\}$891\newline892$\mbox{}\qquad\{2,3, 7\}$893\newline894$\mbox{}\qquad\{2,3, 7, 43\}$895\newline896$\mbox{}\qquad\{2,3, 7, 43, 13, 139\}$897\newline898$\mbox{}\qquad\{2,3, 7, 43, 13, 139, 3263443\}$899\newline900$\mbox{}\qquad\{2,3, 7, 43, 13, 139, 3263443, 547, 607, 1033, 31051\}$901\newline902$\mbox{}\qquad\{2,3, 7, 43, 13, 139, 3263443, 547, 607, 1033, 31051, 29881, 67003,\\903\mbox{}\qquad\qquad\qquad 9119521, 6212157481\}$904\newline905\mbox{}\qquad{}etc.\footnote{The sequence of prime numbers we find by this906procedure is discussed in more detail with references907in the Online Encyclopedia of Integer Sequences \url{http://oeis.org/A126263}.}908909Though there are infinitely many primes, explicitly finding large primes is a910major challenge. In the 1990s, the Electronic Frontier Foundation\index{Electronic Frontier Foundation}911\url{http://www.eff.org/awards/coop} offered a \$100{,}000 cash reward912to the first group to find a prime with at least 10{,}000{,}000 decimal913digits (the group that found the record prime $p$ above won this prize\footnote{%914See \url{http://www.eff.org/press/archives/2009/10/14-0}.915Also the 46th Mersenne prime was declared by {\em Time Magazine} to be916one of the top 50 best ``inventions'' of 2008: \url{http://www.time.com/time/specials/packages/article/0,28804,1852747_1854195_1854157,00.html}.}), and offers917another \$150{,}000 cash prize to the first group to find a prime with918at least 100{,}000{,}000 decimal digits.919920921The number $p= 2^{43,112,609} - 1$ was for a time922the largest prime known, where by923``know'' we mean that we know it so explicitly that we can {\em924compute} things about it. For example, the last two digits of $p$925are both 1 and the sum of the digits of $p$ is 58,416,637.\label{sumdigits} Of course $p$ is not926the largest prime number\index{prime number} since there are infinitely many primes, e.g.,927the next prime $q$ after $p$ is a prime. But there is no known way928to efficiently compute anything interesting about $q$. For example,929what is the last digit of $q$ in its decimal expansion?930931932\chapter{``Named'' prime numbers}\label{ch:namedprimes}\index{prime number}933Prime numbers come in all sorts of shapes, some more convenient to934deal with than others. For example, the number we have been talking935about, $$p = 2^{43,112,609}-1,$$ is given to us, by its very notation,936in a striking form; i.e., {\it one less than a power of $2$}. It is no937accident that the largest ``currently known'' prime number\index{prime number} has such a938form. This is because there are special techniques we can draw on to939show primality of a number, if it is one less than a power of $2$940and---of course---if it also happens to be prime. The primes of that941form have a name, {\it Mersenne Primes},\index{Mersenne primes} as do the primes that are942{\it one more than a power of $2$}, those being called {\it Fermat943Primes}.944\bibnote{945In contrast to the situation with factorization, testing integers of946this size (e.g., the primes $p$ and $q$) for primality is relatively947easy. There are fast algorithms that can tell whether or not any948random thousand digit number is prime in a fraction of949second. Try for yourself using the SageMath command {\tt is\_prime}.950For example, if $p$ and $q$ are as in endnote~\ref{bibnote:factor}, then951{\tt is\_prime(p)} and {\tt is\_prime(q)} quickly output True952and {\tt is\_prime(p*q)} outputs False. However, if you type953{\tt factor(p*q, verbose=8)} you can watch as SageMath tries954forever and fails to factor $pq$.}955956Here are two exercises that you might try to do, if this is your first957encounter with primes that differ from a power of $2$ by $1$:958959\begin{enumerate}960\item Show that if a number of the form $M=2^n-1$ is prime, then the961exponent $n$ is also prime. [Hint: This is equivalent to proving962that if $n$ is composite, then $2^n-1$ is also composite.]963For example: $ 2^2-1= 3, 2^3-1= 7$ are964primes, but $2^4-1=15$ is not. So {\it Mersenne primes} are numbers965that are966\begin{itemize}967\item of the form $\displaystyle 2^\text{prime number} -1,$ and968\item are themselves prime numbers.969\end{itemize}970971\item Show that if a number of the form $F=2^n+1$ is prime, then the972exponent $n$ is a power of two. For example: $ 2^2+1= 5$ is prime,973but $2^3+1= 9$ is not. So {\it Fermat primes} are numbers that974are975\begin{itemize}976\item of the form $\displaystyle 2^\text{power of two} + 1,$ and977\item are themselves prime numbers.978\end{itemize}979\end{enumerate}980981982Not all numbers of the form $2^{\rm prime\ number} -1$ or of the form983$2^{\rm power\ of\ two} +1$ are prime. We currently know only finitely984many primes of either of these forms. How we have come to know what we985know is an interesting tale. See, for example, \url{http://www.mersenne.org/}.986987%\ill{sieve_boxes_100}{0.7}{Sieve of Eratosthenes\index{Eratosthenes}\label{fig:erat}}988989\chapter{Sieves}\label{ch:sieves}990991Eratosthenes,\index{Eratosthenes} the mathematician from Cyrene (and later, librarian at992Alexandria) explained how to {\em sift} the prime numbers\index{prime number} from the993series of all numbers: in the sequence of numbers,994$$2\ \ 3\ \ 4 \ \ 5\ \ 6\ \ 7\ \ 8\ \ 9\ \ 10\ \ 11995\ \ 12\ \ 13\ \ 14\ \ 15\ \ 16\ \ 17\ \ 18\ \ 19\ \ 20\ \ 21\ \ 22\ \ 23\ \ 24\ \ 25\ \ 26,$$996for example, start by circling the $2$ and crossing out all the other997multiples of $2$. Next, go back to the beginning of our sequence of998numbers and circle the first number that is neither circled nor999crossed out (that would be, of course, the $3$), then cross out all1000the other multiples of $3$. This gives the pattern: go back again to1001the beginning of our sequence of numbers and circle the first number1002that is neither circled nor crossed out; then cross out all of its1003other multiples. Repeat this pattern until all the numbers in our1004sequence are either circled, or crossed out, the circled ones being1005the primes.10061007\includegraphics[width=\textwidth]{illustrations/circled_primes}10081009In Figures~\ref{fig:erat2}--\ref{fig:erat2357} we use the primes $2$,1010$3$, $5$, and finally $7$ to sieve out the primes up to 100, where instead of1011crossing out multiples we grey them out, and instead of circling1012primes we color their box red.10131014\ill{sieve100-2}{.8}{Using the prime 2 to sieve for primes up to 100\label{fig:erat2}}1015Since all the even numbers greater than two are1016eliminated as being composite numbers and not primes they appear as1017gray in Figure~\ref{fig:erat2}, but none of the odd numbers are eliminated so they1018still appear in white boxes.10191020\ill{sieve100-3}{.8}{Using the primes 2 and 3 to sieve for primes up to 100\label{fig:erat23}}1021\ill{sieve100-5}{.8}{Using the primes 2, 3, and 5 to sieve for primes up to 100\label{fig:erat235}}1022Looking at Figure~\ref{fig:erat235}, we see that for all but1023three numbers (49, 77, and 91) up to 100 we have1024(after sieving by 2,3, and 5) determined1025which are primes and which composite.10261027\ill{sieve100-7}{.8}{Using the primes 2, 3, 5, and 7 to sieve for primes up to 100\label{fig:erat2357}}10281029Finally, we see in Figure~\ref{fig:erat2357} that1030sieving by 2, 3, 5, and 7 determines all primes up to $100$.1031See \bibnote{In Sage, the1032function {\tt prime\_range} enumerates primes in a range. For example,1033{\tt prime\_range(50)}1034outputs the primes up to $50$1035and {\tt prime\_range(50,100)}1036outputs the primes between $50$ and $100$.1037Typing {\tt prime\_range(10\^{}8)}1038in SageMath enumerates the primes up to a hundred million in around a second.1039You can also enumerate primes up to a billion by typing {\tt v=prime\_range(10\^{}9)}, but this will use1040a large amount of memory, so be careful not to crash your computer1041if you try this. You can see that there are $\pi(10^9) = 50{,}847{,}534$ primes up to a billion1042by then typing {\tt len(v)}.1043You can also compute $\pi(10^9)$ directly, without enumerating all primes,1044using the command {\tt prime\_pi(10\^{}9)}.1045This is much faster since it uses some clever counting tricks to find the1046number of primes without actually listing them all.10471048In Chapter~\ref{sec:tinkering} we tinkered with the staircase of1049primes by first counting both primes and prime powers.1050There are comparatively few prime powers that are not prime.1051Up to $10^8$, only $1{,}405$ of the $5{,}762{,}860$ prime powers are not themselves primes.1052To see this, first enter1053{\tt a~=~prime\_pi(10\^{}8); pp~=~len(prime\_powers(10\^{}8))}.1054Typing {\tt (a,~pp,~pp-a)} then outputs the triple1055{\tt (5761455, 5762860, 1405)}.} for more about explicitly1056enumerating primes using a computer.10571058% GIVEN WHAT IS ABOVE, THIS IS SILLY.1059%Especially if you have had little experience with math, may I suggest1060%that you actually follow Eratosthenes'\index{Eratosthenes} lead, and perform the repeated1061%circling and crossing-out to watch the primes up to 100 emerge, intriguingly1062%staggered through our sequence of numbers,1063%$$2\ \ 3\ \ \bullet \ \ 5\ \ \bullet\ \ 7\ \ \bullet\ \ \bullet\ \ \bullet\ \ 111064%\ \ \bullet\ \ 13\ \ \bullet\ \ \bullet\ \ \bullet\ \ 17\ \ \bullet\ \ 19\ \ \bullet\ \ \bullet\ \1065%\bullet\1066%\ 23\ \ \bullet\ \ \bullet\ \ \bullet\ \ \bullet\ \ \bullet\ 29,\dots $$1067%10681069%107010711072\chapter[Questions about primes]{Questions about primes that any person might ask}\label{ch:questions}10731074We become quickly stymied when we ask quite elementary questions about1075the spacing of the infinite series of prime numbers.\index{prime number}10761077For example, {\em are there infinitely many pairs of primes whose1078difference is $2$?} The sequence of primes seems to be rich in such1079pairs $$5-3 =2,\ \ \ 7-5=2,\ \ \ 13-11=2,\ \ \ 19-17 =2,$$ and we know1080that there are loads more such pairs\footnote{For example, according to1081\url{http://oeis.org/A007508} there are1082$10{,}304{,}185{,}697{,}298$ such pairs less than1083$10{,}000{,}000{,}000{,}000{,}000$.} but the answer to our question,1084{\em are there infinitely many?}, is not known. The conjecture that there are infinitely many such pairs of primes (``twin primes" as they are called) is known as the {\it Twin Primes Conjecture}.1085{\em Are there1086infinitely many pairs of primes whose difference is $4$, $6$?} Answer:1087equally unknown.1088Nevertheless there is very exciting recent work in this direction, specifically,1089Yitang Zhang\index{Zhang, Yitang} proved that there are infinitely many pairs of primes that differ1090by no more than $7\times 10^7$. For a brief account of1091Zhang's\index{Zhang, Yitang} work, see the Wikipedia entry \url{http://en.wikipedia.org/wiki/Yitang_Zhang}.1092Many exciting results have followed Zhang's\index{Zhang, Yitang} breakthrough;1093we know now, thanks to results\footnote{See \url{https://www.simonsfoundation.org/quanta/20131119-together-and-alone-closing-the-prime-gap/} and for further work1094\url{http://michaelnielsen.org/polymath1/index.php?title=Bounded_gaps_between_primes}.} of James Maynard and1095others, that there are infinitely many pairs of primes1096that differ by no more than $246$.10971098{\em Is every even number greater than $2$ a sum of1099two primes?} Answer: unknown. {\em Are there infinitely many primes1100which are $1$ more than a perfect square?} Answer: unknown.11011102\ill{zhang}{0.25}{Yitang\index{Zhang, Yitang} Zhang\label{fig:zhang}}1103110411051106Remember the Mersenne prime $p= 2^{43,112,609}-1$ from1107Chapter~\ref{ch:namedprimes} and how we1108proved---by pure thought---that there must be1109a prime $P$ larger than $p$? Suppose, though, someone1110asked us whether there was a {\it Mersenne Prime} larger than this1111$p$: that is, {\em is there a prime number\index{prime number} of the form $$2^{\rm some\1112prime\ number}-1$$ bigger than $p= 2^{43,112,609}-1$?} Answer:1113For many years we did not know; however, in 2013 Curtis Cooper discovered1114the even bigger Mersenne prime $2^{57,885,161}-1$, with a whopping111517,425,170 digits! Again we can ask if there is a Mersenne1116prime larger than Cooper's. Answer: we do not know.1117It is possible that there are infinitely many Mersenne primes1118but we're far from being able to answer such questions.11191120% from http://www.york.ac.uk/depts/maths/histstat/people/mersenne.gif1121\ill{mersenne}{.3}{Marin Mersenne (1588--1648)}1122112311241125{\em Is there some neat formula giving the next prime?} More1126specifically, {\em if I give you a number $N$, say $N=$ one million,1127and ask you for the first number after $N$ that is prime, is there a1128method that answers that question without, in some form or other,1129running through each of the successive odd numbers after $N$ rejecting1130the nonprimes until the first prime is encountered?} Answer:1131unknown.113211331134%%%%%%%%%%%%%%%11351136One can think of many ways of ``getting at'' some understanding of the1137placement of prime numbers\index{prime number} among all numbers. Up to this point we have1138been mainly just counting them, trying to answer the question ``how1139many primes are there up to $X$?'' and we have begun to get some feel1140for the numbers behind this question, and especially for the current1141``best guesses'' about estimates.114211431144What is wonderful about this subject is that people attracted to it1145cannot resist asking questions that lead to interesting, and sometimes1146surprising numerical experiments. Moreover, given our current state of1147knowledge, many of the questions that come to mind are still1148unapproachable: we don't yet know enough about numbers to answer them.1149But {\it asking interesting questions} about the mathematics that you1150are studying is a high art, and is probably a necessary skill to1151acquire, in order to get the most enjoyment---and understanding---from1152mathematics. So, we offer this challenge to you:11531154Come up with your own question about primes that1155\begin{itemize}1156\item is interesting to you,1157\item is not a question whose answer is known to you,1158\item is not a question that you've seen before; or at least not exactly,1159\item is a question about which you can begin to make numerical investigations.1160\end{itemize}1161If you are having trouble coming up with a question, read on for more1162examples that provide further motivation.11631164\chapter{Further questions about primes\label{ch:further}}11651166In celebration of Yitang Zhang's\index{Zhang, Yitang} recent result, let us consider more of the numerics1167regarding {\em gaps} between one prime and the next, rather than the tally1168of all primes. Of course, it is no fun at all to try to guess how many1169pairs of primes $p, q$ there are with gap $q-p$ equal to a fixed odd1170number, since the difference of two odd numbers is even, as1171in Chapter~\ref{ch:questions}. The fun,1172though, begins in earnest if you ask for pairs of primes with1173difference equal to $2$ (these being called {\em twin primes}) for it1174has long been guessed that there are infinitely many such pairs of1175primes, but no one has been able to prove this yet.11761177%NOTE: I omitted the commas in the numbers in the displayed formula.1178As of 2014, the largest known twin primes are1179$$3756801695685\cdot 2^{666669} \pm 1.$$1180These enormous primes, which were found in 2011, have $200{,}700$ digits each.\footnote{See \url{http://primes.utm.edu/largest.html\#twin} for the top ten1181largest known twin primes.}118211831184Similarly, it is interesting to consider primes $p$ and $q$1185with difference $4$, or $8$, or---in fact---any even number1186$2k$. That is, people have guessed that there are infinitely many1187pairs of primes with difference $4$, with difference $6$, etc.\ but1188none of these guesses have yet been proved.118911901191119211931194So, define1195$$1196\Gap_{k}(X)1197$$1198to be the number of pairs of {\em consecutive} primes $(p,q)$ with1199$q<X$ that have ``gap $k$'' (i.e., such that their difference $q-p$ is1200$k$). Here $p$ is a prime, $q>p$ is a prime, and there are no primes1201between $p$ and $q$. For example, $\Gap_2(10) = 2$, since the pairs1202$(3,5)$ and $(5,7)$ are the pairs less than $10$ with gap $2$,1203and $\Gap_{4}(10)=0$ because despite $3$ and $7$ being separated1204by $4$, they are not consecutive primes.1205See Table~\ref{tab:gap} for various values of $\Gap_{k}(X)$ and1206Figure~\ref{fig:primegapdist} for the distribution\index{distribution} of prime gaps for1207$X=10^7$.120812091210\begin{table}[H]\centering1211\caption{Values of $\Gap_{k}(X)$ \label{tab:gap}}1212\vspace{1em}12131214{\small1215\begin{tabular}{|l|c|c|c|c|c|c|}\hline1216$X$ & $\Gap_{2}(X)$ & $\Gap_{4}(X)$& $\Gap_{6}(X)$ & $\Gap_{8}(X)$ &1217$\Gap_{100}(X)$ & $\Gap_{246}(X)$\\\hline12181219$10$ & 2 & 0 & 0 & 0 & 0 & 0\\\hline1220$10^{2}$ & 8 & 7 & 7 & 1 & 0 & 0\\\hline1221$10^{3}$ & 35 & 40 & 44 & 15 & 0 & 0\\\hline1222$10^{4}$ & 205 & 202 & 299 & 101 & 0 & 0\\\hline1223$10^{5}$ & 1224 & 1215 & 1940 & 773 & 0 & 0\\\hline1224$10^{6}$ & 8169 & 8143 & 13549 & 5569 & 2 & 0\\\hline1225$10^{7}$ & 58980 & 58621 & 99987 & 42352 & 36 & 0\\\hline1226$10^{8}$ & 440312 & 440257 & 768752 & 334180 & 878 & 0\\\hline12271228\end{tabular}1229}1230\end{table}12311232The recent results of Zhang\index{Zhang, Yitang} as sharpened by Maynard (and others) we mentioned above tell us that for at least one even number $k$ among the even numbers $k \le 246$, $\Gap_{k}(X)$ goes to infinity as $X$ goes to infinity. One expects that this happens for {\it all} even numbers $k$. We expect this as well, of course, for $\Gap_{246}(X)$ despite what might be misconstrued as discouragement by the above data.1233% NOTE: the biggest gap up to $10^{8}$ is 220, as this 30s comp shows:1234% v = prime_range(10^8)1235% gaps = [v[i]-v[i-1] for i in range(1,len(v))]1236% max(gaps)12371238\ill{primegapdist}{1}{Frequency histogram showing the distribution\index{distribution} of1239prime gaps of size $\leq 50$ for all primes up to $10^7$. Six is1240the most popular gap in this data.1241\label{fig:primegapdist}}124212431244\ill{primegap_race}{1}{Plots of $\Gap_k(X)$ for $k=2,4,6,8$. Which wins?}12451246Here is yet another question that deals with the spacing of prime1247numbers that we do not know the answer to:12481249{\em Racing Gap $2$, Gap $4$, Gap $6$, and Gap $8$ against each other:}12501251\begin{quote}1252Challenge: As $X$ tends to infinity which of $\Gap_2(X)$,1253$\Gap_4(X)$,1254$\Gap_6(X),$ or $\Gap_8(X)$ do you think will grow faster? How1255much would you bet on the truth of your guess? \bibnote{%1256Hardy and Littlewood give a nice conjectural answer to such1257questions about gaps between primes. See Problem {\bf A8} of1258Guy's book {\em Unsolved Problems in Number Theory} (2004).1259Note that Guy's book discusses counting the number $P_k(X)$ of pairs of1260primes up to $X$ that differ by a fixed even number $k$; we have1261$P_k(X)\geq \Gap_k(X)$, since for $P_k(X)$ there is no requirement1262that the pairs of primes be consecutive.}1263\end{quote}1264126512661267126812691270127112721273Here is a curious question that you can easily begin to check out for1274small numbers. We know, of course, that the {\em even} numbers and the1275{\em odd} numbers are nicely and simply distributed: after every odd1276number comes an even number, after every even, an odd. There are an1277equal number of positive odd numbers and positive1278even numbers less than any given odd1279number, and there may be nothing else of interest to say about the1280matter. Things change considerably, though, if we focus our1281concentration on {\em multiplicatively even} numbers and {\em1282multiplicatively odd} numbers.128312841285A {\bf multiplicatively even} number is one that can be expressed as a1286product of {\em an even number of} primes; and a {\bf multiplicatively1287odd} number is one that can be expressed as a product of {\em an odd1288number of} primes. So, any prime is multiplicatively odd, the1289number $4 = 2\cdot 2$ is multiplicatively even, and so is $6=2\cdot12903$, $9=3\cdot 3$, and $10= 2\cdot 5$; but $12=2\cdot 2\cdot 3$ is1291multiplicatively odd. Below we list the numbers up to 25, and underline1292and bold the multiplicatively odd numbers.1293\begin{center}12941\, {\bf \underline{2}}\, {\bf \underline{3}}\, 4\, {\bf1295\underline{5}}\, 6\, {\bf \underline{7}}\, {\bf \underline{8}}\, 9\,129610\, {\bf \underline{11}}\, {\bf \underline{12}}\, {\bf1297\underline{13}}\, 14\, 15\, 16\, {\bf \underline{17}}\, {\bf1298\underline{18}}\, {\bf \underline{19}}\, {\bf \underline{20}}\, 21\,129922\, {\bf \underline{23}}\, 24\, 251300\end{center}130113021303Table~\ref{tab:evenodddata} gives some data:13041305\begin{table}[H] \caption{Count of multiplicatively odd and1306even positive numbers $\le X$\label{tab:evenodddata}}1307\vspace{1ex}1308\centering1309{\small1310\begin{tabular}{|l|r|r|r|r|r|r|r|r|r|r|r|r|r|r|r|r|}1311\hline1312$X$ & 1 & 2 & 3 & 4 & 5 & 6 & 7 & 8 & 9 & 10 & 11 & 12 & 13 & 14 & 15 & 16 \\ \hline\hline1313m. odd & 0 & 1 & 2 & 2 & 3 & 3 & 4 & 5 & 5 & 5 & 6 & 7 & 8 & 8 & 8 & 8 \\ \hline1314m. even & 1 & 1 & 1 & 2 & 2 & 3 & 3 & 3 & 4 & 5 & 5 & 5 & 5 & 6 & 7 & 8 \\ \hline1315\end{tabular}}1316\end{table}131713181319Now looking at this data, a natural, and simple, question to ask about the concept of multiplicative {\em oddness} and {\em evenness} is:13201321\noindent {\em Is there some $X\ge 2$ for which there are more1322multiplicatively even numbers less than or equal to $X$ than1323multiplicatively odd ones?}13241325Each plot in Figure~\ref{fig:liouville} gives the number of1326multiplicatively even numbers between $2$ and $X$ minus the number1327of multiplicatively odd numbers between $2$ and $X$, for $X$ equal1328to 10, 100, 1000, 10000, 100000, and 1000000. The above question1329asks whether these graphs would, for sufficiently large $X$, ever1330cross the $X$-axis.13311332\begin{figure}[H]1333\centering1334\includegraphics[width=.45\textwidth]{illustrations/liouville-17}1335\includegraphics[width=.45\textwidth]{illustrations/liouville-100}\\1336\includegraphics[width=.45\textwidth]{illustrations/liouville-1000}1337\includegraphics[width=.45\textwidth]{illustrations/liouville-10000}\\1338\includegraphics[width=.45\textwidth]{illustrations/liouville-100000}1339\includegraphics[width=.45\textwidth]{illustrations/liouville-1000000}\\1340\caption{Racing Multiplicatively Even and Odd Numbers.\label{fig:liouville}}1341\end{figure}13421343A {\em negative} response to this question---i.e., a proof that any1344plot as drawn in Figure~\ref{fig:liouville} never crosses1345the $X$-axis---would imply the Riemann Hypothesis! In contrast to the list of1346previous questions, the answer to this question is known\footnote{For more1347details, see P.~Borwein, ``Sign changes in sums of the Liouville1348Function'' and the nice short paper of Norbert Wiener ``Notes on1349Polya's and Turan's hypothesis concerning Liouville's factor''1350(page 765 of volume II of Wiener's Collected Works); see also:1351G. P\'{o}lya ``Verschiedene Bemerkungen zur Zahlentheorie,''1352{\em Jahresbericht der Deutschen Mathematiker-Vereinigung},1353{\bf 28} (1919)135431--40.}: alas, there is such an $X$. In 1960, Lehman showed that1355for $X=906{,}400{,}000$ there are $708$ more multiplicatively even numbers1356up to $X$ than multiplicatively odd numbers (Tanaka found in 1980 that1357the smallest $X$ such that there are more multiplicative even than odd1358numbers is $X=906{,}150{,}257$).13591360%%%%%%%%%%%%%%%%%1361%\begin{wrapfigure}{r}{0.2\textwidth}1362% \includegraphics[width=0.2\textwidth]{illustrations/questions}1363%\end{wrapfigure}1364These are questions that have been asked about primes (and we could1365give bushels more\footnote{See, e.g., Richard Guy's book {\em Unsolved Problems in Number Theory} (2004).}),1366questions expressible in simple vocabulary, that1367we can't answer today. We have been studying numbers for over two1368millennia and yet we are indeed in the infancy of our understanding.136913701371We'll continue our discussion by returning to the simplest counting1372question about prime numbers.\index{prime number}13731374\chapter{How many primes are there?}13751376\ill{sieve200}{.8}{Sieving primes up to 200}13771378Slow as we are to understand primes, at the very least we can try to1379count them. You can see that there are $10$ primes less than $30$, so1380you might encapsulate this by saying that the chances that a number1381less than $30$ is prime is $1$ in $3$. This frequency does not1382persist, though; here is some more data: There are $25$ primes less1383than $100$ (so $1$ in $4$ numbers up to $100$ are prime), there are1384$168$ primes less than a thousand (so we might say that among the1385numbers less than a thousand the chances that one of them is prime is1386roughly $1$ in $6$).13871388\ill{proportion_primes_100}{1}{Graph of the proportion of primes up to $X$ for each integer $X\leq 100$}13891390There are 78,498 primes less than a million (so we might say that1391the chances that a random choice among the first million numbers is1392prime have dropped to roughly $1$ in $13$).13931394\ill{proportion_primes_1000}{1}{Proportion of primes for $X$ up to $1{,}000$}1395\ill{proportion_primes_10000}{1}{Proportion of primes for $X$ up to $10{,}000$}13961397There are 455,052,512 primes less than ten billion; i.e.,139810{,}000{,}000{,}000 (so we might say that the chances are down to roughly1399$1$ in $22$).14001401Primes, then, seem to be thinning out. We return to the sifting process1402we carried out earlier, and take a look at a few graphs, to get a sense of why1403that might be so. There are a $100$ numbers less than or equal to1404$100$, a thousand numbers less than or equal to $1000$, etc.: the1405graph in Figure~\ref{fig:sieve_2_100} that looks like a regular staircase, each step the1406same length as each riser, climbing up at, so to speak, a 45 degree1407angle, counts all numbers up to and including~$X$.14081409Following Eratosthenes,\index{Eratosthenes} we have sifted those numbers, to pan for1410primes. Our first move was to throw out roughly half the numbers (the1411even ones!) after the number $2$. The graph that is labeled ``Sieve by 2" in Figure~\ref{fig:sieve_2_100} that is, with one hiccup, a regular staircase climbing at a1412smaller angle, each step twice the length of each riser, illustrates1413the numbers that are left after one pass through Eratosthenes'\index{Eratosthenes} sieve,1414which includes, of course, all the primes. So, the chances that a1415number bigger than $2$ is prime is {\em at most} $1$ in $2$. Our1416second move was to throw out a good bunch of numbers bigger than $3$.1417So, the chances that a number bigger than $3$ is prime is going to be1418even less. And so it goes: with each move in our1419sieving process, we are winnowing the field more extensively, reducing1420the chances that the later numbers are prime.14211422\ill{sieve_2_100}{.9}{Sieving by removing multiples of $2$ up to $100$\label{fig:sieve_2_100}}1423\ill{sieve1000}{.9}{Sieving for primes up to $1000$}14241425The red curve in these figures actually counts the primes: it is the1426beguilingly irregular {\em staircase of primes.} Its height above any1427number $X$ on the horizontal line records the number of primes less1428than or equal to $X$, the accumulation of primes up to $X$. Refer to1429this number as $\pi(X)$. So $\pi(2)=1$, $\pi(3) = 2$, $\pi(30) = 10$;1430of course, we could plot a few more values of $\pi(X)$, like1431$\pi(\text{ten billion}) = 455,052,512$.143214331434Let us accompany Eratosthenes\index{Eratosthenes} for a few further steps in his sieving1435process. Figure~\ref{fig:sieve3_100} contains a graph of all whole1436numbers up to 100 after we have removed the even numbers greater than1437$2$, and the multiples of $3$ greater than $3$ itself.143814391440\ill{sieves3_100}{.7}{Sieving out multiples of $2$ and $3$.\label{fig:sieve3_100}}144114421443From this graph you can see that if you go ``out a way'' the1444likelihood that a number is a prime is less than $1$ in $31445$. Figure~\ref{fig:sieve7_100} contains a graph of what Eratosthenes\index{Eratosthenes}1446sieve looks like up to 100 after sifting $2,3,5,$ and $7$.1447144814491450\ill{sieves7_100}{.7}{Sieving out multiples of $2$, $3$, $5$, and $7$.\label{fig:sieve7_100}}145114521453This data may begin to suggest to you that as you go further and1454further out on the number line the percentage of prime numbers\index{prime number} among1455all whole numbers tends towards $0\%$ (it does).145614571458To get a sense of how the primes accumulate, we will take a look at1459the staircase of primes for $X= 25$ and $X=100$ in Figures~\ref{fig:staircase25}1460and \ref{fig:staircase100a}.14611462\ill{prime_pi_25_aspect1}{.8}{Staircase of primes up to 25\label{fig:staircase25}}1463\ill{prime_pi_100_aspect1}{.8}{Staircase of primes up to 100\label{fig:staircase100a}}1464146514661467\chapter{Prime numbers viewed from a distance}\index{prime number}1468The striking thing about these figures is that as the numbers get1469large enough, the jagged accumulation of primes, those1470quintessentially discrete entities, becomes smoother and smoother to1471the eye. How strange and wonderful to watch, as our viewpoint zooms1472out to larger ranges of numbers, the accumulation of primes taking on1473such a smooth and elegant shape.14741475\ill{prime_pi_1000}{.8}{Staircases of primes up to 1,000\label{fig:staircases2}}1476\ill{prime_pi_10000}{.8}{Staircases of primes up to 10{,}000\label{fig:staircases2b}}14771478\ill{prime_pi_100000}{.8}{Primes up to 100{,}000\label{fig:pn100000}.}1479148014811482But don't be fooled by the seemingly smooth shape of the curve in the1483last figure above: it is just as faithful a reproduction of the1484staircase of primes as the typographer's art can render, for there are1485thousands of tiny steps and risers in this curve, all hidden by the1486thickness of the print of the drawn curve in the figure. It is1487already something of a miracle that we can approximately describe the1488build-up of primes, somehow, using a {\em smooth curve}. But {\em1489what} smooth curve?149014911492% NOTE about myriad below: After reading http://english.stackexchange.com/questions/20133/what-is-the-correct-usage-of-myriad, I'm not changing "myriad of smooth" to "myriad smooth". The of version, which we use, is correct when myriad is used as a noun, which is now a legal usage in english, according to the above page.1493That last question is {\em not} rhetorical. If you draw a curve with1494chalk on the blackboard, this can signify a myriad of smooth1495(mathematical) curves all encompassed within the thickness of the1496chalk-line, all---if you wish---reasonable approximations of one1497another. So, there are many smooth curves that fit the chalk-curve.1498With this warning, but very much fortified by the data of Figure~\ref{fig:pn100000},1499let us ask: {\em what is a smooth curve that is a reasonable1500approximation to the staircase of primes?}15011502\chapter{Pure and applied mathematics}\label{ch:pureapplied}15031504Mathematicians seems to agree that, loosely speaking, there are two1505types of mathematics: {\em pure} and {\em applied}. Usually---when we1506judge whether a piece of mathematics is pure or applied---this1507distinction turns on whether or not the math has application to the1508``outside world,'' i.e., that {\em world} where bridges are built,1509where economic models are fashioned, where computers churn away on the1510Internet (for only then do we unabashedly call it {\em applied math}),1511or whether the piece of mathematics will find an important place1512within the context of mathematical theory (and then we label it {\em1513pure}). Of course, there is a great overlap (as we will see later,1514Fourier analysis\index{Fourier analysis} plays a major role both in data compression and in1515pure mathematics).15161517Moreover, many questions in1518mathematics are ``hustlers'' in the sense that, at first view, what is1519being requested is that some simple task be done (e.g., the question1520raised in this book, {\em to find a smooth curve that is a reasonable1521approximation to the staircase of primes}). And only as things1522develop is it discovered that there are payoffs in many unexpected1523directions, some of these payoffs being genuinely applied (i.e., to1524the practical world), some of these payoffs being pure (allowing us1525to strike behind the mask of the mere appearance of the mathematical1526situation, and get at the hidden fundamentals that actually govern the1527phenomena), and some of these payoffs defying such simple1528classification, insofar as they provide powerful techniques in other1529branches of mathematics. The \RH{}---even in its current1530unsolved state---has already shown itself to have all three types of1531payoff.15321533The particular issue before us is, in our opinion, twofold, both1534applied, and pure: can we curve-fit the ``staircase of primes'' by a1535well approximating smooth curve given by a simple analytic formula?1536The story behind this alone is1537marvelous, has a cornucopia of applications, and we will be telling it1538below. But our curiosity here is driven by a question that is pure,1539and less amenable to precise formulation: are there mathematical1540concepts at the root of, and more basic than (and ``prior to,'' to1541borrow Aristotle's use of the phrase) {\em prime numbers}---concepts\index{prime number}1542that account for the apparent complexity of the nature of primes?154315441545\chapter{A probabilistic first guess\label{sec:firstguess} }15461547\ill{gauss}{.3}{Gauss\index{Gauss, Carl Friedrich}}15481549The search for such approximating curves began, in fact, two centuries1550ago when Carl Friedrich Gauss\index{Gauss, Carl Friedrich} defined a certain beautiful curve that,1551experimentally, seemed to be an exceptionally good fit for the1552staircase of primes.15531554\ill{pi_Li}{.8}{Plot of $\pi(X)$ and Gauss's smooth curve $G(X)$}15551556Let us denote Gauss's curve $G(X)$; it has an1557elegant simple formula comprehensible to anyone who has had a tiny bit1558of calculus.\index{calculus} If you make believe that the chances that a number $X$ is1559a prime is inversely proportional to the number of digits of $X$ you1560might well hit upon Gauss's curve.1561That is,1562\vskip10pt1563$$G(X)\hskip40pt {\rm is\ roughly\ proportional\ to} \hskip40pt {\frac{X}{{\rm the\ number\ of\ digits\ of\ } X}}.$$1564\vskip10pt1565But to describe Gauss's\index{Gauss, Carl Friedrich} guess precisely we need to discuss the {\it natural logarithm}\footnote{In advanced mathematics, ``common'' logarithms are sufficiently uncommon that ``log'' almost always denotes natural1566log and the notation $\ln(X)$ is not used.} ``$\log(X)$'' which is an elegant1567smooth function of a real number $X$ that is roughly proportional1568to the number of digits of the whole number part of $X$.1569157015711572\ill{log}{.8}{Plot of the natural logarithm $\log(X)$}15731574%NB: the sequence below converges very, very slowly to e.1575Euler's\index{Euler, Leonhard} famous constant $e=2.71828182\ldots$, which is the limit1576of the sequence1577$$\left(1+{\frac{1}{2}}\right)^2,1578\left(1+{\frac{1}{3}}\right)^3,1579\left(1+{\frac{1}{4}}\right)^4, \dots,$$1580is used in the definition of $\log$:1581\begin{center}1582{\em $A = \log(X)$ is the number $A$ for which $e^A = X$.}1583\end{center}1584Before electronic calculators, logarithms were frequently used to1585speed up calculations, since logarithms translate difficult multiplication1586problems into easier addition problems which can be done mechanically.1587Such calculations use that the logarithm of a product is the sum of the logarithms1588of the factors\index{factor}; that is, $$\log(X Y) = \log(X) + \log(Y).$$1589%FROM -- YURI TSCHINKEL's: ''ABOUT THE COVER: ON THE DISTRIBUTION\index{distribution} OF PRIMESÑGAUSSÕ TABLES ''15901591% From: http://en.wikipedia.org/wiki/File:Slide_rule_scales_back.jpg1592\ill{slide_rule}{.8}{A slide rule computes $2X$ by using that $\log(2X)=\log(2)+\log(X)$\label{fig:slide_rule}}15931594In Figure~\ref{fig:slide_rule} the numbers printed (on each of the slidable pieces of the rule)1595are spaced according to their logarithms, so that when one slides the1596rule arranging it so that the printed number $X$ on one piece lines up1597with the printed number $1$ on the other, we get that for every number $Y$1598printed on the first piece, the printed number on the other piece that1599is aligned with it is the product $XY$; in effect the ``slide'' adds1600$\log(X)$ to $\log(Y)$ giving $\log(XY)$.16011602% I found this here: http://kr.blog.yahoo.com/kimxx168/4981603% It is referenced in http://www.ams.org/bull/2006-43-01/S0273-0979-05-01096-7/home.html1604% but that link is broken.1605\ill{gauss_tables_half}{.9}{A Letter of Gauss\label{fig:gauss_letter}}16061607In 1791, when Gauss\index{Gauss, Carl Friedrich} was 14 years old, he received a book that contained1608logarithms of numbers up to $7$ digits and a table of primes up to 10{,}009.1609Years later, in a letter1610written in 1849 (see Figure~\ref{fig:gauss_letter}), Gauss\index{Gauss, Carl Friedrich}1611claimed that as early as 1792 or 1793 he had already observed that the1612density of prime numbers\index{prime number} over intervals of numbers of a given rough1613magnitude $X$ seemed to average $1/\log(X)$.16141615Very {\em very} roughly speaking, this1616means that {\em the number of primes up to $X$ is approximately $X$ divided by1617twice the number of digits of $X$}. For example,1618the number of primes less than $99$ should be roughly1619$$1620\frac{99}{2\times 2} = 24.75 \approx 25,1621$$1622which is pretty amazing, since the correct number of1623primes up to $99$ is $25$. The number of primes up to $999$ should1624be roughly1625$$1626\frac{999}{2\times 3} = 166.5 \approx 167,1627$$1628which is again close, since there are 168 primes up to $1000$.1629The number of primes up to $999{,}999$ should be roughly1630$$1631\frac{999999}{2\times 6} = 83333.25 \approx 83{,}333,1632$$1633which is close to the correct count of $78{,}498$.16341635Gauss\index{Gauss, Carl Friedrich} guessed that the expected number of primes up to $X$1636is approximated by the area under the1637graph of $1/\log(X)$ from $2$ to $X$ (see Figure~\ref{fig:G}).1638The area under $1/\log(X)$ up to $X=999{,}999$ is $78{,}626.43\ldots$, which1639is remarkably close to the correct count $78{,}498$ of the primes1640up to $999{,}999$.16411642\illthree{area_under_log_graph_30}{area_under_log_graph_100}{area_under_log_graph_1000}{.27}{The1643expected tally of the number of primes $\leq X$ is approximated by the1644area underneath the graph of $1/\log(X)$ from $1$ to $X$.\label{fig:G}}164516461647Gauss\index{Gauss, Carl Friedrich} was an inveterate computer:1648he wrote in his 1849 letter that1649there are $216{,}745$ prime numbers\index{prime number} less than three million. This is1650wrong: the actual number of these primes is $216{,}816$. Gauss's curve $G(X)$1651predicted that there would be $216{,}970$ primes---a miss, Gauss\index{Gauss, Carl Friedrich}1652thought, by1653$$225 = 216970 - 216745.$$1654But actually he was closer than he thought: the1655prediction of the curve $G(X)$ missed by a mere $154 = 216970 - 216816$.1656%; not as close as the 2004 US1657%elections, but pretty close nevertheless).1658Gauss's computation brings up two queries: will this spectacular ``good1659fit'' continue for arbitrarily large numbers? and, the (evidently1660prior) question: what counts as a good fit?1661166216631664\chapter{What is a ``good approximation''?}\label{sec:sqrterror}16651666If you are trying to estimate a number, say, around ten thousand, and1667you get it right to within a hundred, let us celebrate this kind of1668accuracy by saying that you have made an approximation with {\em1669square-root error}\index{square-root error} (${\sqrt{10{,}000}}=100$). Of course, we should1670really use the more clumsy phrase ``an approximation with at worst1671{\em square-root error.}''\index{square-root error} Sometimes we'll simply refer to such1672approximations as {\em good approximations.} If you are trying to1673estimate a number in the millions, and you get it right to within a1674thousand, let's agree that---again---you have made an approximation1675with {\em square-root error}\index{square-root error} (${\sqrt{1{,}000{,}000}}=1{,}000$).1676Again, for short, call this a {\em good approximation.} So, when Gauss\index{Gauss, Carl Friedrich}1677thought his curve missed by $226$ in estimating the number of primes1678less than three million, it was well within the margin we have given1679for a ``good approximation.''168016811682More generally, if you are trying to estimate a number that has $D$1683digits and you get it almost right, but with an error that has no more1684than, roughly, half that many digits, let us say, again, that you have1685made an approximation with {\em square-root error}\index{square-root error} or synonymously, a1686{\em good approximation}.168716881689This rough account almost suffices for what we will be discussing1690below, but to be more precise, the specific {\em gauge of accuracy}1691that will be important to us is not for a mere {\em single} estimate1692of a {\em single} error term,1693$$1694\text{ Error term } = \text{ Exact value } - \text{ Our ``good approximation" }1695$$1696but rather for {\em infinite sequences} of estimates of1697error terms. Generally, if you are interested in a numerical quantity1698$q(X)$ that depends on the real number parameter $X$ (e.g., $q(X)$1699could be $\pi(X)$, ``the number of primes $\leq{}X$'') and if you have an1700explicit candidate ``approximation,'' $q_{\rm approx}(X)$, to this1701quantity, let us say that $q_{\rm approx}(X)$ is {\bf essentially a1702square-root accurate\index{square-root accurate} approximation to $q(X)$} if for {\em any}1703given exponent greater than $0.5$ (you choose it: $0.501$, $0.5001$,1704$0.50001$, $\dots$ for example) and for large enough $X$---where the1705phrase ``large enough'' depends on your choice of exponent---the {\bf1706error term}---i.e., the difference between $q_{\rm approx}(X)$ and1707the {\em true} quantity, $q(X)$, is, in absolute value, less than1708$X$ raised to that exponent (e.g. $< X^{0.501}$, $<1709X^{0.5001}$, etc.). Readers who know calculus\index{calculus} and wish to have a1710technical formulation of this definition of {\em good approximation}1711might turn to the endnote \bibnote{If $f(x)$ and1712$g(x)$ are real-valued functions of a real variable $x$ such that1713for any1714$\epsilon >0$ both of them take their values between $ x^{1-\epsilon}$1715and $ x^{1+\epsilon}$ for1716$x$1717sufficiently large, then say that $f(x)$ and $g(x)$ are {\bf good1718approximations of one another} if,1719for any positive $\epsilon$ the absolute value of their difference1720is less than $ x^{{1\over17212}+\epsilon}$ for1722$x$1723sufficiently large. The functions $\Li(X)$ and $R(X)$ are good approximations of1724one another.} for a precise statement.172517261727If you found the above confusing, don't worry: again, a1728square-root accurate\index{square-root accurate} approximation is one in which at least roughly1729half the digits are correct.173017311732\begin{remark}1733To get a feel for how basic the notion of {\em approximation to data1734being square root close to the true values of the data} is---and1735how it represents the ``gold standard'' of accuracy for1736approximations, consider this fable.17371738Imagine that the devil had the idea of saddling a large committee of1739people with the task of finding values of $\pi(X)$ for various large1740numbers $X$. This he did in the following manner, having already1741worked out which numbers are prime himself. Since the devil is, as1742everyone knows, {\em in the details,} he has made no mistakes: his1743work is entirely correct. He gives each committee member a copy of1744the list of all {\em prime numbers}\index{prime number} between $1$ and one of the large1745numbers $X$ in which he was interested. Now each committee member1746would count the number of primes by doing nothing more than1747considering each number, in turn, on their list and tallying them1748up, much like a canvasser counting votes; the committee members needn't even know that these numbers are prime, they just think of these numbers as items on their list. But since they are human,1749they will indeed be making mistakes, say $1\%$ of the time.1750Assume further that it is just as likely for them to make the1751mistake of undercounting or overcounting. If many people are1752engaged in such a pursuit, some of them might overcount $\pi(X)$;1753some of them might undercount it. The average error (overcounted1754or undercounted) would be proportional to~${\sqrt X}$.17551756In the next chapter we'll view these undercounts and overcounts as analogous to a random walk\index{random walk}.175717581759\end{remark}176017611762\chapter{Square root error and random walk\index{random walk}s}17631764To take a random walk\index{random walk} along a (straight) east--west path you would start at your home base, but every minute, say, take a step along the path, each step being of the same length, but randomly either east or west. After $X$ minutes, how far are you from your home base?17651766The answer to this {\it cannot} be a specific number, precisely because you're making a random decision that affects that number for each of the $X$ minutes of your journey. It is more reasonable to ask a statistical version of that question. Namely, if you took {\it many} random walks\index{random walk} $X$ minutes long, then---on the average---how far would you be from your home base? The answer, as is illustrated by the figures below, is that the average distance you will find yourself from home base after (sufficiently many of) these excursions is proportional to ${\sqrt X}$. (In fact, the average is equal to $\sqrt{\frac{2}{\pi}}\cdot {\sqrt X}$.)17671768To connect this with the committee members' histories of errors, described in the fable in Chapter~\ref{sec:sqrterror}, imagine every error (undercount or overcount by $1$) the committee member makes, as a ``step" East for undercount and West for overcount. Then if such errors were made, at a constant frequency over the duration of time spent counting, and if the over and undercounts were equally likely and random, then one can model the committee members' computational accuracy by a random walk.\index{random walk} It would be---in the terminology we have already discussed---no better than {\it square-root accurate}\index{square-root accurate}; it would be subject to {\it square-root error\index{square-root error}}.17691770To get a real sense of how constrained random walks\index{random walk} are by this ``square-root law," here are a few numerical experiments of random walks. The left-hand squiggly (blue) graphs in Figures~\ref{fig:random_walks_3}--\ref{fig:random_walks_1000} below are computer-obtained random walk trials (three, ten, a hundred, and a thousand random walks). The blue curve in the right-hand graphs of those four figures is the average distance from home-base of the corresponding (three, ten, a hundred, and a thousand) random walks.1771The red curve in each figure below is the graph of the quantity $\sqrt{\frac{2}{\pi}}\cdot {\sqrt X}$ over the $X$-axis.1772As the number of random walks\index{random walk} increases, the red curve1773better and better approximates the average distance.17741775\illtwo{random_walks-3}{random_walks-3-mean}{.45}{Three Random Walk\index{random walk}s\label{fig:random_walks_3}}17761777\illtwo{random_walks-10}{random_walks-10-mean}{.45}{Ten Random Walk\index{random walk}s\label{fig:random_walks_10}}17781779\illtwo{random_walks-100}{random_walks-100-mean}{.45}{One Hundred Random Walk\index{random walk}s\label{fig:random_walks_100}}17801781\illtwo{random_walks-1000}{random_walks-1000-mean}{.45}{One Thousand Random Walk\index{random walk}s\label{fig:random_walks_1000}}178217831784178517861787\chapter[What is Riemann's Hypothesis?] { What is Riemann's Hypothesis? ({\em first formulation})\label{sec:rh1}}1788178917901791Recall from Chapter~\ref{sec:firstguess} that a rough guess for an approximation to1792$\pi(X)$, the number of primes $\leq{} X$, is given by the function1793$X/\log(X)$. Recall, as well, that a refinement of that guess, offered by Gauss\index{Gauss, Carl Friedrich}, stems from this curious thought: the ``probability'' that a number $N$ is a prime is proportional to the reciprocal of its number of digits; more precisely, the probability is $1/\log(N)$. This would lead us to guess that the approximate value of $\pi(X)$ would be1794the area of the region from $2$ to $X$ under the graph of1795$1/\log(X)$, a quantity sometimes referred to as $\Li(X)$.1796``Li'' (pronounced L$\overline{\rm i}$, so the same as ``lie" in ``lie down")1797is short for {\em logarithmic integral},1798because the area of the region from $2$ to $X$ under $1/\log(X)$1799is (by definition) the {\em integral} $\int_2^X 1/\log(t) dt$.18001801180218031804Figure~\ref{fig:threeplots} contains a graph of the three functions1805$\Li(X)$, $\pi(X)$, and $X/\log X$ for $X\leq 200$.1806But data, no matter how impressive, may be deceiving (as we learned in1807Chapter~\ref{ch:further}). If you think1808that the three graphs never cross for all large values of $X$, and1809that we have the simple relationship $X/\log(X) < \pi(X) < \Li(X)$ for1810large $X$, read \url{http://en.wikipedia.org/wiki/Skewes'_number}.181118121813\ill{three_plots}{.9}{Plots of $\Li(X)$ (top), $\pi(X)$ (in the middle), and $X/\log(X)$ (bottom).\label{fig:threeplots}}18141815It is a {\em major challenge} to1816evaluate $\pi(X)$ for large values of $X$.1817For example, let $X=10^{24}$.1818Then \label{pili_vals} (see \bibnote{This computation of $\pi(X)$ was done by David J. Platt in 2012,1819and is the largest value of $\pi(X)$ ever computed. See1820\url{http://arxiv.org/abs/1203.5712} for more details.}) we have:1821\begin{align*}1822\pi(X) &= {\color{dredcolor} 18{,}435{,}599{,}767{,}3}49{,}200{,}867{,}866\\1823\Li(X) &= {\color{dredcolor}18{,}435{,}599{,}767{,}3}66{,}347{,}775{,}143.10580\ldots \\1824X/(\log(X)-1) &=182518{,}429{,}088{,}896{,}563{,}917{,}716{,}962.93869\ldots\\1826\Li(X) - \pi(X) &= \hspace{7em}182717{,}146{,}907{,}277.105803\ldots\\1828\sqrt{X}\cdot \log(X) &= \hspace{5.2em}182955{,}262{,}042{,}231{,}857.096416 \ldots1830\end{align*}1831Note that several of the left-most digits of $\pi(X)$ and $\Li(X)$ are the1832same (as indicated in red), a point we will return to on page~\pageref{page:leftmost}.183318341835More fancifully, we can think of the error in this approximation to $\pi(X)$, i.e., $|\Li(X)-\pi(X)|$, (the absolute value) of the difference between $\Li(X)$ and $\pi(X)$, as (approximately) the result of a walk having roughly $X$ steps where you move by the following rule: go east by a distance of $1/\log N$ feet if $N$ is not a prime and west by a distance of $1-\frac{1}{\log N}$ feet if $N$ is a prime. Your distance, then, from home base after $X$ steps is approximately $|\Li(X)-\pi(X)|$ feet.1836%% Computer code to double check the above claim without using1837%% your brain (will be useful for a SageMath version of the book):1838% def f(X):1839% t = 01840% from math import log1841% for N in [2..floor(X)]:1842% if is_prime(N):1843% t -= 1-1/log(N)1844% else:1845% t += 1/log(N)1846% return t1847%1848% def g(X):1849% return N(Li(X) - prime_pi(X))185018511852We have no idea if this picture of things resembles a truly {\it random walk}\index{random walk} but at least it makes it reasonable to ask the question: is $\Li(X)$ {\it essentially a square root approximation} to1853$\pi(X)$? Our first formulation of Riemann's1854Hypothesis says yes:18551856\begin{center}1857\shadowbox{ \begin{minipage}{0.9\textwidth}1858\mbox{} \vspace{0.2ex}1859\begin{center}{\bf\large The {\bf \RH{}} (first formulation)}\end{center}1860\medskip18611862For any real number $X$ the number of prime numbers\index{prime number} less than $X$ is1863approximately $\Li(X)$ and this approximation is essentially square1864root accurate (see \bibnote{In fact, the Riemann Hypothesis is equivalent to $|\Li(X) - \pi(X)| \leq \sqrt{X}\log(X)$ for all $X\geq 2.01$. See Section 1.4.1 of Crandall and Pomerance's book {\em Prime Numbers: A Computational Perspective}.}).1865\label{rh:first}18661867\vspace{1ex}1868\end{minipage}}1869\end{center}18701871\chapter{The mystery moves to the error term}18721873Let's think of what happens when you have a {\it mysterious quantity} (say, a function of a real number $X$) you wish to understand. Suppose you manage to approximate that quantity by an easy to understand expression---which we'll call the ``dominant term"---that is also simple to compute, but only approximates your mysterious quantity. The approximation is not exact; it has a possible error, which happily is significantly smaller than the size of the dominant term. ``Dominant" here just means exactly that: it is of size significantly larger than the error of approximation.18741875$${\it Mysterious\ quantity}(X)\ \ = \ \ {\it Simple, \ but \ dominant\ quantity}(X) \ + \ {\it Error}(X).$$18761877If all you are after is a general estimate of {\it size} your job is done. You might declare victory and ignore ${\it Error}(X)$ as being---in size---negligible. But if you are interested in the deep structure of your {\it mysterious quantity} perhaps all you have done is to manage to transport most questions about it to ${\it Error}(X)$. In conclusion, ${\it Error}(X)$ is now your new mysterious quantity.18781879Returning to the issue of $\pi(X)$ (our {\it mysterious quantity}) and $\Li(X)$ (our {\it dominant term}) the first formulation of the Riemann Hypothesis (as in Chapter \ref{sec:rh1} above) puts the spotlight on the {\it Error\ term} $|\Li(X) - \pi(X)|$, which therefore deserves our scrutiny, since---after all---we're not interested in merely counting the primes: we want to understand as much as we can of their structure.18801881To get a feel for this error term, we shall smooth it out a bit, and look at a few of its graphs.188218831884\chapter{Ces\`aro smoothing\index{Ces\`aro Smoothing}}\label{ces}18851886Often when you hop in a car and reset the trip counter, the car1887will display information about your trip. For instance, it might show you the1888average speed up until now, a number that is1889``sticky'', changing much less erratically than your1890actual speed, and you might use it to make a rough estimate of1891how long until you will reach your destination.1892Your car is computing the {\em Ces\`aro smoothing\index{Ces\`aro Smoothing} }of your speed.1893We can use this same idea to better understand the behavior1894of other things, such as the sums appearing in1895the previous chapter.1896\ill{camaro}{.7}{The ``Average Vehicle Speed,'' as displayed on the dashboard of the 2013 Camaro SS car that one of us drove during the writing of this chapter.}1897%I took this picture of my rental car -- William Stein.18981899Suppose you are trying to say something intelligent about the behavior of a certain quantity that varies with time in what seems to be a somewhat erratic, volatile pattern. Call the quantity $f(t)$ and think of it as a function defined for positive real values of ``time'' $t$. The natural impulse might be to take some sort of ``average value''\footnote{For readers who know calculus: that average would be ${\frac{1}{T}}\int_0^Tf(t)dt$.} of $f(t)$ over a time interval, say from $0$ to $T$. This would indeed be an intelligent thing to do if this average stabilized and depended relatively little on the interval of time over which one is computing this average, e.g., if that interval were large enough. Sometimes, of course, these averages themselves are relatively sensitive to the times $T$ that are chosen, and don't stabilize. In such a case, it usually pays to consider {\it all} these averages as your chosen $T$ varies, as a function (of $T$) in its own right{\footnote{ For readers who know calculus: this is1900$$F(T):= {\frac{1}{T}}\int_0^Tf(t)dt.$$}.1901This new function $F(T)$, called the {\bf Ces\`aro Smoothing}\index{Ces\`aro Smoothing} of the function $f(t)$, is a good indicator of certain eventual trends of the original function $f(t)$ that is less visible directly from the function $f(t)$. The effect of passing from $f(t)$ to $F(T)$ is to ``smooth out" some of the volatile local behavior of the original function, as can be seen in1902Figure~\ref{fig:cesaro-smoothing}.190319041905\ill{cesaro}{.9}{The red plot is the Ces\`aro smoothing\index{Ces\`aro Smoothing} of the blue plot\label{fig:cesaro-smoothing}}1906\chapter{ A view of $|\Li(X) - \pi(X)|$}1907Returning to our mysterious error term, consider Figure \ref{fig:li-minus-pi-250000} where the volatile blue curve in the middle1908is $\Li(X) - \pi(X)$, its Ces\`aro smoothing\index{Ces\`aro Smoothing} is the red relatively smooth curve on the bottom, and the curve on the top1909is the graph of $\sqrt{\frac{2}{\pi}}\cdot \sqrt{X/\log(X)}$ over the range of $X\le 250{,}000$.19101911\ill{li-minus-pi-250000}{.9}{$\Li(X)-\pi(X)$ (blue middle), its Ces\`aro smoothing\index{Ces\`aro Smoothing} (red bottom), and1912$\sqrt{\frac{2}{\pi}}\cdot \sqrt{X/\log(X)}$ (top), all for $X\leq 250{,}000$\label{fig:li-minus-pi-250000}}19131914% TODO: this is from http://seriesdivergentes.wordpress.com/2011/07/21/john-e-littlewood\index{Littlewood, John Edensor }/1915Data such as this graph can be revealing and misleading at the same time. For example, the volatile blue graph (of $\Li(X)-\pi(X)$) {\it seems} to be roughly sandwiched between two rising functions, but this will not continue for all values of $X$. The Cambridge University mathematician, John Edensor Littlewood\index{Littlewood, John Edensor } (see1916Figure~\ref{fig:littlewood}),\index{Littlewood, John Edensor } proved in 1914 that there {\it exists} a real number $X$ for which the quantity $\Li(X)-\pi(X)$ vanishes, and then for slightly larger $X$ crosses over into negative values. This theorem attracted lots of attention at the time (and continues to do so) because of the inaccessibility of achieving good estimates (upper or lower bounds) for the first such number. That ``first" $X$ for which $\Li(X) =\pi(X)$ is called {\bf Skewes Number} in honor of the South African mathematician (a student of Littlewood) Stanley Skewes, who (in 1933) gave the first---fearfully large---upper bound for that number (conditional on the Riemann Hypothesis!). Despite a steady stream of subsequent improvements, we currently can only locate Skewes Number\index{Skewes Number} as being in the range:19171918$$191910^{14}\ \ \ \le\ \ \ {\rm Skewes\ Number}\ \ \ <\ \ \ 10^{317},1920$$ and it is proven that at some values of $X$ fairly close to the upper bound $10^{317}$ $\pi(X)$ is greater than $\Li(X)$. So the trend suggested in Figure \ref{fig:li-minus-pi-250000} will not continue indefinitely.19211922%\begin{wrapfigure}{l}{0.3\textwidth}1923% \includegraphics[width=0.3\textwidth]{illustrations/littlewood\index{Littlewood, John Edensor }}\\1924% John Edensor Littlewood\index{Littlewood, John Edensor } (1885--1977)1925%\end{wrapfigure}19261927\ill{littlewood}{0.35}{John\index{Littlewood, John Edensor } Edensor Littlewood (1885--1977)\label{fig:littlewood}}19281929193019311932\chapter{The Prime Number Theorem\label{sec:pnt}}\index{prime number}19331934Take a look at Figure~\ref{fig:threeplots} again.1935All three functions, $\Li(X)$, $\pi(X)$ and $X/\log(X)$1936are ``going to infinity with $X$'' (this means1937that for any real number $R$, for all sufficiently large $X$,1938the values of these functions at $X$ exceed $R$).19391940Are these functions ``going to infinity'' at {\it the same rate}?19411942To answer such a question, we have to know what we mean by {\it going1943to infinity at the same rate}. So, here's a definition. Two1944functions, $A(X)$ and $B(X)$, that each go to infinity will be said to1945{\bf go to infinity at the same rate} if their {\it ratio}1946$$A(X)/B(X)$$1947tends to $1$ as $X$ goes to infinity.19481949If for example two functions, $A(X)$ and $B(X)$ that take positive whole number values, have the same number of digits for large $X$ and if, for any1950number you give us, say: a million (or a billion, or a trillion) and if $X$ is large enough, then the1951``leftmost'' million (or billion, or trillion) digits of $A(X)$ and $B(X)$ are the same, then $A(X)$ and $B(X)$ {\it go to infinity at the same rate}. For example,1952$$1953\frac{A(X)}{B(X)} =1954\frac{{\color{dredcolor}2810675971}83743525105611755423}{{\color{dredcolor}2810675971}61361511527766294585}1955= 1.00000000007963213762060\ldots1956$$195719581959While we're defining things, let us say that two functions, $A(X)$1960and $B(X)$, that each go to infinity {\bf go to infinity at1961similar rates} if there are two positive constants $c$ and $C$1962such that for $X$ sufficiently large the {\it ratio}1963$$1964A(X)/B(X)1965$$1966is greater than $c$ and smaller than $C$.19671968\ill{similar_rates}{1.0}{The polynomials $A(X)=2 X^{2} + 3 X - 5$ (bottom)1969and $B(X)= 3 X^{2} - 2 X + 1$ (top) go to infinity at similar rates.\label{fig:simrates}}1970197119721973For example, two polynomials in $X$ with positive leading coefficients1974{\it go to infinity at the same rate} if and only if they have the1975same degrees and the same leading coefficient; they {\it go to1976infinity at similar rates} if they have the same degree.1977See Figures~\ref{fig:simrates} and~\ref{fig:samerate}. %where we may take $c=1/2$ and $C=1$.19781979\ill{same_rates}{1.0}{The polynomials $A(X)=X^{2} + 3 X - 5$ (top)1980and $B(X)= X^{2} - 2 X + 1$ (bottom) go to infinity1981at the same rate.\label{fig:samerate}}1982198319841985Now a theorem from elementary calculus\index{calculus} tells us that the ratio of1986$\Li(X)$ to $X/\log(X)$ tends to $1$ as $X$ gets larger and larger.1987That is---using the definition we've just introduced---$\Li(X)$ and1988$X/\log(X)$ go to infinity at the same rate (see1989\bibnote{For a proof of this here's a hint. Compute the difference1990between the derivatives of $\Li(x)$ and of $x/\log x$. The answer1991is $1/\log^2(x)$. So you must show that the ratio of1992$\int_2^X dx/\log^2(x)$ to $\Li(x)= \int_2^Xdx/\log(x)$1993tends to zero as~$x$ goes to infinity, and this is a good1994calculus exercise.}).199519961997Recall (on page~\pageref{pili_vals} above in Chapter~\ref{sec:rh1}) that1998if $X = 10^{24}$, the left-most\label{page:leftmost}1999twelve digits of $\pi(X)$ and $\Li(X)$ are the same:2000both numbers start2001$18{,}435{,}599{,}767{,}3\ldots$. Well, that's a good start. Can we guarantee that for $X$ large enough, the ``left-most'' million (or billion, or trillion) digits of $\pi(X)$ and $\Li(X)$ are the same, i.e., that these two functions go to infinity at the same rate?20022003The \RH{}, as we have just formulated it, would tell us2004that the {\it difference} between $\Li(X)$ and $\pi(X)$ is pretty small2005in comparison with the size of $X$. This information would imply (but2006would be {\it much} more precise information than) the statement that2007the {\it ratio} $\Li(X)/\pi(X)$ tends to $1$, i.e., that $\Li(X)$ and2008$\pi(X)$ go to infinity at the same rate.20092010This last statement gives, of course, a far less precise relationship2011between $\Li(X)$ and $\pi(X)$ than the \RH{} (once it is proved!)2012would give us. The advantage, though, of the less precise statement2013is that it is currently known to be true, and---in fact---has been2014known for over a century. It goes under the name of201520162017\noindent {\bf The Prime Number Theorem:\ \ } $\Li(X)$ and $\pi(X)$ go to infinity at the same rate.\index{prime number}201820192020Since $\Li(X)$ and $X/\log(X)$ go to infinity at the same rate, we could2021equally well have expressed the ``same'' theorem by saying:202220232024\noindent {\bf The Prime Number Theorem:\ \ } $X/\log(X)$ and $\pi(X)$ go to infinity at the same rate.\index{prime number}202520262027This fact is a very hard-won piece of mathematics! It was proved2028in 1896 independently by Jacques Hadamard\index{Hadamard, Jacques} and Charles de la Vall\'{e}e Poussin.\index{de la Vall\'{e}e Poussin, Charles}202920302031A milestone in the history leading up to the proof of the Prime Number\index{prime number}2032Theorem is the earlier work of Pafnuty Lvovich Chebyshev\index{Chebyshev, Pafnuty Lvovich} (see2033\url{http://en.wikipedia.org/wiki/Chebyshev_function})\index{Chebyshev, Pafnuty Lvovich} showing that2034(to use the terminology we introduced) $X/\log(X)$ and $\pi(X)$ go2035to infinity at similar rates.203620372038The elusive \RH{}, however, is much deeper than the Prime2039Number Theorem, and takes its origin from some awe-inspiring,2040difficult to interpret, lines in Bernhard Riemann's\index{Riemann, Bernhard} magnificent 8-page2041paper, ``On the number of primes less than a given magnitude,''2042published in 18592043(see \bibnote{See \url{http://www.maths.tcd.ie/pub/HistMath/People/Riemann/Zeta/}2044for the original German version and an English translation.}).204520462047\ill{riemann_zoom}{1}{From a manuscript of Riemann's 1859 paper written by a contem- porary of Riemann (possibly the mathematician Alfred Clebsch). \label{fig:riemamn}}20482049Riemann's hypothesis, as it is currently interpreted, turns up as2050relevant, as a key, again and again in different parts of the subject:2051if you accept it as {\em hypothesis} you have an immensely powerful2052tool at your disposal: a mathematical magnifying glass that sharpens2053our focus on number theory. But it also has a wonderful protean2054quality---there are many ways of formulating it, any of these2055formulations being provably equivalent to any of the others.20562057\ill{riemann}{.3}{Bernhard Riemann (1826--1866)\index{Riemann, Bernhard}}205820592060The \RH{} remains unproved to this day, and therefore is ``only a2061hypothesis,'' as Osiander said of Copernicus's theory, but one for2062which we have overwhelming theoretical and numerical evidence in its2063support. It is the kind of conjecture that contemporary Dutch2064mathematician Frans Oort might label a {\em suffusing conjecture} in2065that it has unusually broad implications: many, many results are now2066known to follow, if the conjecture, familiarly known as RH, is true.2067A proof of RH would, therefore, fall into the {\em applied} category,2068given our discussion above in Chapter~\ref{ch:pureapplied}. But2069however you classify RH, it is a central concern in mathematics to2070find its proof (or, a counter-example!). RH is one of the weightiest2071statements in all of mathematics.207220732074\chapter[The staircase of primes]{The {\em information} contained in the staircase of primes\label{sec:information}}2075207620772078We have borrowed the phrase ``staircase of primes'' from the popular2079book {\em The Music of the Primes} by Marcus du Sautoy, for we feel that2080it captures the sense that there is a deeply hidden architecture to2081the graphs that compile the number of primes (up to $N$) and also2082because---in a bit---we will be tinkering with this carpentry. Before2083we do so, though, let us review in Figure~\ref{fig:staircases}2084what this staircase looks like for different ranges.20852086\begin{figure}[H]2087\centering2088\includegraphics[width=.4\textwidth]{illustrations/PN_25}2089\includegraphics[width=.4\textwidth]{illustrations/PN_100}\\20902091\includegraphics[width=.4\textwidth]{illustrations/PN_1000}2092\includegraphics[width=.4\textwidth]{illustrations/PN_10000}\\20932094\includegraphics[width=.8\textwidth]{illustrations/PN_100000}209520962097\caption{The Staircase of Primes\label{fig:staircases}}2098\end{figure}20992100The mystery of this staircase is that the {\em information} contained2101within it is---in effect---the full story of where the primes are2102placed. This story seems to elude any simple description. Can we2103``tinker with'' this staircase without destroying this valuable2104information?21052106210721082109\chapter[Tinkering with the staircase of primes]{Tinkering with the carpentry of the staircase of primes\label{sec:tinkering}}211021112112For starters, notice that all the (vertical) {\em risers} of this staircase (Figure~\ref{fig:staircases} above) have2113unit height. That is, they contain no numerical information except for2114their placement on the $x$-axis. So, we could distort our staircase by2115changing (in any way we please) the height of each riser; and as long2116as we haven't brought new risers into---or old risers out2117of---existence, and have not modified their position over the2118$x$-axis, we have retained all the information of our original2119staircase.212021212122A more drastic-sounding thing we could do is to judiciously add new2123steps to our staircase. At present, we have a step at each prime2124number $p$, and no step anywhere else. Suppose we built a staircase2125with a new step not only at $x=p$ for $p$ each prime number\index{prime number} but also at2126$x =1$ and $x=p^n$ where $p^n$ runs through all powers of prime numbers as2127well. Such a staircase would have, indeed, many more steps than our2128original staircase had, but, nevertheless, would retain much of the2129quality of the old staircase: namely it contains within it the full2130story of the placement of primes {\em and their powers}.21312132A final thing we can do is to perform a distortion of the $x$-axis2133(elongating or shortening it, as we wish) in any specific way, as long2134as we can perform the inverse process, and ``undistort'' it if we wish.2135Clearly such an operation may have mangled the staircase, but hasn't destroyed2136information irretrievably.21372138We shall perform all three of these kinds of operations eventually,2139and will see some great surprises as a result. But for now, we will2140perform distortions only of the first two types. We are about to2141build a new staircase that retains the precious information we need,2142but is constructed according to the following architectural plan.21432144\begin{itemize}21452146\item We first build a staircase that has a new step precisely at $x2147=1$, and $ x= p^n$ for every {\em prime power} $p^n$ with $n\geq21481$. That is, there will be a new step at $x= 1,2,3,4,5,7,8,9,11,2149\dots$21502151\item Our staircase starts on the ground at $x=0$ and the height of the2152riser of the step at $x=1$ will be $\log(2\pi)$. The height of the2153riser of the step at $x=p^n$ will not be $1$2154(as was the height of all risers in the old staircase of primes)2155but rather: the step at $x=p^n$ will have the height of its riser2156equal to $\log p$. So for the first few steps listed in the2157previous item, the risers will be of height $\log(2\pi), \log21582,\log 3,\log 2,\log 5,\log 7, \log 2,\log 3,\log 11, \dots$2159Since $\log(p)>1$,2160these vertical dimensions lead to a steeper ascent but no great loss2161of {\em information}.21622163Although we are not quite done with our architectural work, Figure~\ref{fig:psi} shows2164what our new staircase looks like, so far.21652166\illtwoh{psi_9}{psi_100}{.27}{The newly constructed staircase that counts prime powers\label{fig:psi}}216721682169\end{itemize}21702171Notice that this new staircase looks, from afar, as if it were2172nicely approximated by the $45$ degree straight line, i.e., by the2173simple function $X$. In fact, we have---by this new2174architecture---a second {\em equivalent} way of formulating2175Riemann's hypothesis. For this, let $\psi(X)$ denote the2176function2177of $X$ whose graph is2178depicted in Figure~\ref{fig:psi}2179(see \bibnote{We have $$\psi(X) =\sum_{p^n \le X}\log p$$ where the summation is over prime2180powers $p^n$ that are $\le X$.}).21812182\begin{center}2183\shadowbox{ \begin{minipage}{0.9\textwidth}2184\mbox{} \vspace{0.2ex}2185\begin{center}{\bf\large The {\bf \RH{}} (second formulation)}\end{center}2186\medskip21872188This new staircase is essentially square root close to the 45 degree2189straight line; i.e., the function $\psi(X)$ is essentially square root2190close to the function $f(X)=X$. See Figure~\ref{fig:psi_diag_1000}.2191\vspace{1ex}2192\end{minipage}}2193\end{center}21942195\ill{psi_diag_1000}{0.75}{The newly constructed staircase is close to the 45 degree line.\label{fig:psi_diag_1000}}21962197Do not worry if you do not understand why our first and second2198formulations of Riemann's Hypothesis are equivalent. Our aim, in2199offering the second formulation---a way of phrasing Riemann's\index{Riemann, Bernhard} guess2200that mathematicians know to be equivalent to the first one---is to2201celebrate the variety of equivalent ways we have to express Riemann's2202\index{Riemann, Bernhard} proposed answers to the question ``How many primes are there?'', and to2203point out that some formulations would reveal a startling2204simplicity---not immediately apparent---to the behavior of prime2205numbers, no matter how erratic primes initially appear to us to2206be. After all, what could be simpler than a 45 degree straight line?22072208220922102211\chapter[Computer music files and prime numbers]{What do computer music files,2212data compression, and prime numbers have to do with each other?\label{sec:fourier1}}\index{prime number}22132214221522162217Sounds of all sorts---and in particular the sounds of music---travel2218as vibrations of air molecules at roughly 768 miles per hour. These2219vibrations---fluctuations of pressure---are often represented, or2220``pictured,'' by a graph where the horizontal axis corresponds to2221time, and the vertical axis corresponds to pressure at that time. The2222very purest of sounds---a single sustained note---would look2223something like this (called a ``sine wave'') when pictured (see2224Figure~\ref{fig:sine}), so that if you fixed your position somewhere2225and measured air pressure due to this sound at that position, the2226peaks correspond to the times when the varying air pressure is maximal2227or minimal and the zeroes to the times when it is normal pressure.22282229\ill{sin}{.7}{Graph of a sine wave\label{fig:sine}}22302231You'll notice that there are two features to the graph in Figure~\ref{fig:sine}.22322233\begin{enumerate}2234\item {\em The height of the peaks of this sine wave:} This height is2235referred to as the {\bf amplitude} and corresponds to the {\em2236loudness} of the sound.2237\item {\em The number of peaks per second:} This number is referred to2238as the {\bf frequency} and corresponds to the {\em pitch} of the2239sound.2240\end{enumerate}224122422243Of course, music is rarely---perhaps never---just given by a single2244pure sustained note and nothing else. A next most simple example of a2245sound would be a simple chord (say a C and an E played together on2246some electronic instrument that could approximate pure notes). Its2247graph would be just the {\em sum} of the graphs of each of the pure2248notes (see Figures~\ref{fig:sinetwofreq} and \ref{fig:sinetwofreqsum}).224922502251\ill{sin-twofreq}{0.6}{Graph of two sine waves with different frequencies\label{fig:sinetwofreq}}22522253\ill{sin-twofreq-sum}{.6}{Graph of sum of the two sine waves with different frequencies\label{fig:sinetwofreqsum}}22542255So the picture of the changing frequencies of this chord would be2256already a pretty complicated configuration. What we have described in2257these graphs are two sine waves (our C and our E) when they are played2258{\em in phase} (meaning they start at the same time) but we could2259also ``delay'' the onset of the E note and play them with some2260different phase relationship, for example, as illustrated2261in Figures~\ref{fig:sin-twofreq-phase} and \ref{fig:sum-sin-phase}.22622263\ill{sin-twofreq-phase}{.6}{\label{fig:sin-twofreq-phase}Graph of two ``sine'' waves with different phases.}22642265\ill{sin-twofreq-phase-sum}{.6}{Graph of the sum of the two ``sine'' waves with different frequencies2266and phases.\label{fig:sum-sin-phase}}226722682269So, {\em all you need} to reconstruct the chord graphed above is to2270know five numbers:2271\begin{itemize}2272\item the two frequencies---the collection of frequencies that make2273up the sound is called the {\em spectrum}\index{spectrum} of the sound,2274\item the two {\em amplitudes} of each of these two frequencies,2275\item the {\em phase} between them.22762277\end{itemize}22782279Now suppose you came across such a sound as pictured in2280Figure~\ref{fig:sum-sin-phase} and wanted to ``record it.'' Well,2281one way would be to sample the amplitude of the sound at many2282different times, as for example in Figure~\ref{fig:sum-sin-phase-sample}.22832284\ill{sin-twofreq-phase-sum-points}{.6}{Graph of sampling of a sound wave\label{fig:sum-sin-phase-sample}}22852286Then, fill in the rest of the points to obtain Figure~\ref{fig:sum-sin-phase-sample-fill}.22872288\ill{sin-twofreq-phase-sum-fill}{0.6}{Graph obtained from Figure~\ref{fig:sum-sin-phase-sample} by2289filling in the rest of the points\label{fig:sum-sin-phase-sample-fill}}22902291But this sampling would take an enormous amount of storage space, at2292least compared to storing five numbers, as explained above!2293Current audio compact discs do their sampling 44,100 times a second to2294get a reasonable quality of sound.22952296Another way is to simply record the {\em five} numbers: the {\em2297spectrum\index{spectrum}, amplitudes,} and {\em phase}. Surprisingly, this seems to2298be roughly the way our ear processes such a sound when we hear it.\footnote{%2299We recommend downloading Dave Benson's marvelous book2300{\em Music: A Mathematical Offering} from2301\url{https://homepages.abdn.ac.uk/mth192/pages/html/maths-music.html}.2302This is free, and gives a beautiful account of the superb2303mechanism of hearing, and of the mathematics of music.}23042305Even in this simplest of examples (our pure chord: the pure note C2306played simultaneously with pure note E) the {\em efficiency of data2307compression} that is the immediate bonus of analyzing the picture2308of the chords as built {\em just} with the five numbers giving {\em2309spectrum\index{spectrum}, amplitudes,} and {\em phase} is staggering.23102311\ill{fourier}{0.3}{Jean Baptiste Joseph Fourier (1768--1830)}23122313This type of analysis, in general, is called {\em Fourier Analysis}2314and is one of the glorious chapters of mathematics. One way of2315picturing {\em spectrum}\index{spectrum} and {\em amplitudes} of a sound is by a bar2316graph which might be called the {\em spectral picture} of the sound,2317the horizontal axis depicting frequency and the vertical one depicting2318amplitude: the height of a bar at any frequency is proportional to the2319amplitude of that frequency ``in'' the sound.23202321So our CE chord would have the spectral picture in2322Figure~\ref{fig:ce-spectral}.232323242325\illtwo{sound-ce-general_sum}{sound-ce-general_sum-blips}{.45}2326{Spectral picture of a CE chord\label{fig:ce-spectral}}232723282329This spectral picture ignores the phase but is nevertheless a very2330good portrait of the sound. The spectral picture of a graph gets us2331to think of that graph as ``built up by the superposition of a bunch2332of pure waves,'' and if the graph is complicated enough we may very well2333need {\em infinitely} many pure waves to build it up! Fourier analysis\index{Fourier analysis} is a2334mathematical theory that allows us to start with any graph---we are2335thinking here of graphs that picture sounds, but any graph will2336do---and actually compute its spectral picture (and even keep track of2337phases).233823392340The operation that starts with a graph and goes to its spectral2341picture that records the frequencies, amplitudes, and phases of the2342pure sine waves that, together, compose the graph is called the {\em2343Fourier transform} and nowadays there are very fast procedures for2344getting accurate {\em Fourier transforms} (meaning accurate spectral2345pictures including information about phases) by2346computer \bibnote{See \url{http://en.wikipedia.org/wiki/Fast_Fourier_transform}}.234723482349%If you pass to the worksheet at this point, you'll be able to see the2350%Fourier transforms of many different sounds, and experiment for2351%yourself.2352The theory behind this operation (Fourier transform giving2353us a spectral analysis of a graph) is quite beautiful, but equally2354impressive is how---given the power of modern computation---you can2355immediately perform this operation for yourself to get a sense of how2356different wave-sounds can be constructed from the superposition of2357pure tones.23582359The {\em sawtooth} wave in Figure~\ref{fig:sawtooth} has a spectral picture, its Fourier transform, given in Figure~\ref{fig:sawtooth-spectrum}:23602361\ill{sawtooth}{.7}{Graph of sawtooth wave\label{fig:sawtooth}}2362\ill{sawtooth-spectrum}{.7}{The Spectrum\index{spectrum} of the sawtooth wave has a spike of height $1/k$ at2363each integer $k$\label{fig:sawtooth-spectrum}}2364236523662367\ill{complicated-wave}{.7}{A Complicated sound wave\label{fig:complicated-wave}}23682369Suppose you have a complicated sound wave, say as in2370Figure~\ref{fig:complicated-wave}, and you want to record it.2371Standard audio CDs record their data by intensive sampling as we2372mentioned. In contrast, current MP3 audio compression technology uses2373Fourier transforms plus sophisticated algorithms based on2374knowledge of which frequencies the human ear can hear.2375With this, MP3 technology manages to2376get a compression factor of 8--12 with little {\em perceived} loss in2377quality, so that you can fit your entire music collection on your2378phone, instead of just a few of your favorite CDs.23792380\chapter{The word ``Spectrum"}23812382% TODO: this is from wikipedia -- requires attribution.2383\begin{wrapfigure}{r}{0.2\textwidth}2384\includegraphics[width=0.2\textwidth]{illustrations/rainbow}2385\end{wrapfigure}23862387It is worth noting how often this word appears in scientific literature, with an array of different uses and meanings. It comes from Latin where its meaning is ``image," or ``appearance," related to the verb meaning {\it to look} (the older form being {\it specere} and the later form {\it spectare}). In most of its meanings, nowadays, it has to do with some procedure, an analysis, allowing one to see clearly the constituent parts of something to be analyzed, these constituent parts often organized in some continuous scale, such as in the discussion of the previous chapter.23882389The Oxford Dictionary lists, as one of its many uses:23902391\begin{quote} Used to classify something, or suggest that it can be classified, in terms of its position on a scale between two extreme or opposite points.\end{quote}23922393This works well for the color spectrum\index{spectrum}, as initiated by Newton (as in the figure above, sunlight is separated by a prism into a rainbow continuum of colors): an analysis of white light into its components. Or in mass spectrometry, where2394beams of ions are separated (analyzed) according to their2395mass/charge ratio and the mass spectrum\index{spectrum} is recorded on a photographic plate or film. Or in the recording of the various component frequencies, with their corresponding intensities of some audible phenomenon.23962397In mathematics the word has found its use in many different fields, the most basic use occurring in Fourier analysis\index{Fourier analysis}, which has as its goal the aim of either {\it analyzing} a function $f(t)$ as being comprised of simpler (specifically: sine and cosine) functions, or {\it synthesizing} such a function by combining simpler functions to produce it. The understanding here and in the previous chapter, is that an analysis of $f(t)$ as built up of simpler functions is meant to provide a significantly clearer image of the constitution of $f(t)$. If, to take a very particular example, the simpler functions that are needed for the synthesis of $f(t)$ are of the form $a\cos(\theta t)$ (for $a$ some real number which is the {\it amplitude} or size of the peaks of this periodic function)---and if $f(t)$ is given as the limit of $$a_1\cos(\theta_1t) + a_2\cos(\theta_2t) + a_3cos(\theta_3t) \dots$$ for a sequence of real numbers $\theta_1, \theta_2,2398\theta_3,2399\dots$ (i.e., these simpler functions are functions of {\it periods} ${\frac{2\pi}{\theta_1}}, {\frac{2\pi}{\theta_2}}, {\frac{2\pi}{\theta_3}}, \dots$) it is natural to call these $\theta_i$ the {\bf spectrum}\index{spectrum} of $f(t)$.2400This will eventually show up in our discussion below of trigonometric sums and the {\it Riemann spectrum}.240124022403\chapter{Spectra and trigonometric sums \label{sec:trigsums}}24042405As we saw in Chapter~\ref{sec:fourier1}, a pure tone can be represented by a periodic {\it sine wave}---a function of time $f(t)$---the equation of which might be:2406$$f(t)\ \ = \ \ a\cdot \cos(b +\theta t).$$24072408\ill{pure_tone}{.7}{Plot of the periodic sine wave $f(t) = 2\cdot \cos(1+t/2)$}24092410The $\theta$ determines the {\it frequency} of the periodic wave, the2411larger $\theta$ is the higher the ``pitch.'' The coefficient $a$2412determines the envelope of size of the periodic wave, and we call it2413the {\it amplitude} of the periodic wave.24142415Sometimes we encounter functions $F(t)$ that are not pure tones, but2416that can be expressed as (or we might say ``decomposed into'') a finite2417sum of pure tones, for example three of them:24182419$$F(t) = a_1\cdot \cos(b_1 +\theta_1 t) + a_2\cdot \cos(b_2 +\theta_2 t) + a_3\cdot \cos(b_3 +\theta_3 t)$$24202421\ill{mixed_tone3}{.7}{Plot of the sum $5 \cos\left(-t - 2\right) + 2 \cos\left(t/2 + 1\right) + 3 \cos\left(2 t + 4\right)$\label{fig:mixed_tone3}}24222423We refer to such functions $F(t)$ as in Figure~\ref{fig:mixed_tone3}2424as {\it finite trigonometric2425sums}, because---well---they are. In this example, there are three2426frequencies involved---i.e., $\theta_1,\theta_2,\theta_3$---and we2427say that {\it the spectrum\index{spectrum} of $F(t)$} is the set of these frequencies,2428i.e.,24292430$$2431{\rm The\ spectrum \ of \ } F(t) \ \ = \ \ \{\theta_1,\theta_2,\theta_3\}.2432$$24332434More generally we might consider a sum of any finite number of pure2435cosine waves---or in a moment we'll also see some infinite ones as2436well. Again, for these more general trigonometric sums, their {\it2437spectrum}\index{spectrum} will denote the set of frequencies that compose them.24382439\chapter{The spectrum and the staircase of primes\label{sec:fourier_staircase}}24402441\ill{prime_pi_100_aspect1}{0.95}{The Staircase of primes\label{fig:staircase100}}244224432444In view of the amazing data-compression virtues of Fourier analysis\index{Fourier analysis},2445it isn't unnatural to ask these questions:24462447\begin{itemize}2448\item Is there a way of using Fourier analysis\index{Fourier analysis} to better understand2449the complicated picture of the staircase of primes?24502451\item Does this staircase of primes (or, perhaps, some tinkered2452version of the staircase that contains the same basic information)2453have a {\em spectrum}\index{spectrum}?24542455\item If such a {\em spectrum}\index{spectrum} exists, can we compute it conveniently,2456just as we have done for the saw-tooth wave above, or for the major2457third CE chord?245824592460\item Assuming the spectrum\index{spectrum} exists, and is computable, will our2461understanding of this spectrum\index{spectrum} allow us to reproduce all the2462pertinent information about the placement of primes among all whole2463numbers, elegantly and faithfully?24642465\item And here is a most important question: will that spectrum\index{spectrum} show2466us order and organization lurking within the staircase that we would2467otherwise be blind to?24682469\end{itemize}247024712472Strangely enough, it is towards questions like these that Riemann's2473Hypothesis takes us. We began with the simple question about primes:2474how to count them, and are led to ask for profound, and hidden,2475regularities in structure.24762477\chapter{To our readers of Part~\ref{part1}}2478The statement of the \RH{}---admittedly as elusive as2479before---has, at least, been expressed elegantly and more simply,2480given our new staircase that approximates (conjecturally with {\em2481essential square root accuracy}) a 45 degree straight line.24822483We have offered two equivalent formulations of the \RH{},2484both having to do with the manner in which the prime numbers\index{prime number} are2485situated among all whole numbers.24862487In doing this, we hope that we have convinced you that---in the words2488of Don Zagier---primes seem to obey no other law than that of chance2489and yet exhibit stunning regularity. This is the end of Part~\ref{part1} of our2490book, and is largely the end of our main mission, to explain---in2491elementary terms---{\em what is Riemann's Hypothesis?}249224932494For readers who have at some point studied differential calculus,\index{calculus} in2495Part~\ref{part2} we shall discuss Fourier analysis\index{Fourier analysis}, a fundamental tool that will be used in Part~\ref{part3}2496where we show how2497Riemann's hypothesis provides a key to some deeper structure of the2498prime numbers\index{prime number}, and to the nature of the laws that they obey. We will---if not explain---at least2499hint at how the above series of questions have been answered so far,2500and how the \RH{} offers a surprise for the last question in this2501series.2502250325042505%\chapter{To our readers of Part~\ref{part1}}2506%The statement of the \RH{}---admittedly as elusive as2507%before---has, at least, been expressed elegantly and more simply,2508%given our new staircase that approximates (conjecturally with {\em2509% essential square root accuracy}) a 45 degree straight line.25102511%We have offered two equivalent formulations of the \RH{},2512%both having to do with the manner in which the prime numbers are2513%situated among all whole numbers.25142515%In doing this, we hope that we have convinced you that---in the words2516%of Don Zagier---primes seem to obey no other law than that of chance2517%and yet exhibit stunning regularity. This is the end of Part~\ref{part1} of our2518%book, and is largely the end of our main mission, to explain---in2519%elementary terms---{\em what is Riemann's Hypothesis?}252025212522%For readers who have at some point studied Differential calculus,\index{calculus} in2523%Part~\ref{part2} we shall2524%go further and explain how the pursuit of2525%Riemann's hypothesis may provide a key to some deeper structure of the2526%prime numbers, and to the nature of the laws that they obey.252725282529\part{Distribution\index{distribution}s\label{part2}}253025312532\chapter[Slopes of graphs that have no slopes]{How Calculus\index{calculus} manages to2533find the slopes of graphs that have no slopes}\label{ch:calculusmanages}25342535Differential calculus,\index{calculus} initially the creation of Newton and Leibniz2536in the 1680s, acquaints us with {\it slopes} of graphs of functions of2537a real variable. So, to discuss this we should say a word about what2538a {\it function} is, and what its {\it graph} is.25392540\illtwo{newton}{leibniz}{0.25}{Isaac Newton and Gottfried Leibniz created calculus\index{calculus}}254125422543A {\bf function} (let us refer to it in this discussion as $f$) is2544often described as a {\it kind of machine} that for any specific2545input numerical value $a$ will give, as output, a well-defined2546numerical value.25472548This ``output number'' is denoted $f(a)$ and is called {\it the2549``value'' of the function $f$ at $a$}. For example, the {\it2550machine that adds $1$ to any number} can be thought of as the2551function $f$ whose value at any $a$ is given by the equation $f(a) =2552a+1$. Often we choose a letter---say, $X$---to stand for a ``general2553number'' and we denote the function $f$ by the symbol $f(X)$ so that2554this symbolization allows to ``substitute for $X$ any specific number2555$a$'' to get its value $f(a)$.25562557The {\bf graph} of a function provides a vivid visual representation of the2558function in the Euclidean\index{Euclid} plane where over every point $a$ on the2559$x$-axis you plot a point above it of ``height'' equal to the value of2560the function at $a$, i.e., $f(a)$. In Cartesian coordinates, then,2561you are plotting points $(a, f(a))$ in the plane where $a$ runs2562through all real numbers.25632564\ill{graph_aplusone}{0.6}{Graph of the function $f(a)=a+1$\label{fig:graph_aplusone}}25652566In this book we will very often be talking about ``graphs'' when we2567are also specifically interested in the functions---of which they are2568the graphs. We will use these words almost synonymously since we like2569to adopt a very visual attitude towards the behavior of the functions2570that interest us.257125722573\ill{graph_slope_deriv}{1}{\label{fig:graph_slope_deriv}Calculus\index{calculus}}2574257525762577Figure~\ref{fig:graph_slope_deriv} illustrates a function (blue), the2578slope at a point (green straight line), and the derivative (red) of2579the function; the red derivative is the function whose value at a point2580is the slope of the blue function at that point. Differential2581calculus\index{calculus} explains to us how to calculate slopes of graphs, and2582finally, shows us the power that we then have to answer problems we2583could not answer if we couldn't compute those slopes.25842585Usually, in elementary calculus\index{calculus} classes we are called upon to compute2586slopes only of smooth graphs, i.e., graphs that actually {\em have}2587slopes at each of their points, such as in the illustration just2588above. What could calculus\index{calculus} possibly do if confronted with a graph2589that has {\em jumps}, such as in Figure~\ref{fig:jump}:2590$$f(x) = \begin{cases}1 & x \leq 3\\ 2 & x > 3?\end{cases}$$2591(Note that for purely aesthetic reasons, we draw a vertical line at the point where the jump occurs, though technically that vertical line is not part of the graph of the function.)25922593\ill{jump}{0.5}{The graph of the function $f(x)$ above that jumps---it is $1$ up to $3$ and then $2$ after that point.\label{fig:jump}}259425952596The most comfortable way to deal with the graph of such a function is2597to just approximate it by a differentiable function as in2598Figure~\ref{fig:jumpsmooth}. % Note that the function in2599%Figure~\ref{fig:jumpsmooth} is not smooth, since the2600%derivative is continuous but not differentiable; in our2601%discussion all2602%we will need is that our approximating function is once2603%differentiable.260426052606\ill{jump-smooth}{0.5}{A picture of a smooth graph approximating the2607graph that is $1$ up to some point $x$ and then $2$ after that2608point, the smooth graph being flat mostly.\label{fig:jumpsmooth}}260926102611Then take the {\em derivative} of that smooth function. Of course,2612this is just an approximation, so we might try to make a better2613approximation, which we do in each successive graph starting2614with Figure~\ref{fig:djump1} below.261526162617\ill{jump-smooth-deriv-7}{0.6}{A picture of the derivative of2618a smooth approximation to a function that jumps.\label{fig:djump1}}26192620Note that---as you would expect---in the range where the initial2621function is constant, its derivative is zero. In the subsequent2622figures, our initial function will be {\it nonconstant} for smaller2623and smaller intervals about the origin. Note also that, in our series2624of pictures below, we will be successively rescaling the $y$-axis; all2625our initial functions have the value $1$ for ``large'' negative numbers2626and the value $2$ for large positive numbers.26272628\ill{jump-smooth-deriv-2}{0.5}{Second picture of the derivative of2629a smooth approximation to a function that jumps.\label{fig:djump2}}2630\ill{jump-smooth-deriv-05}{0.5}{Third picture of the derivative of2631a smooth approximation to a function that jumps.\label{fig:djump3}}2632\ill{jump-smooth-deriv-01}{0.5}{Fourth picture of the derivative of2633a smooth approximation to a function that jumps.\label{fig:djump4}}263426352636Notice what is happening: as the approximation gets better and2637better, the derivative will be zero mostly, with a blip at the point2638of discontinuity, and the blip will get higher and higher.2639In each of these pictures, for any interval of real numbers $[a,b]$ the total area under the red graph over that interval is equal to2640\begin{center}2641{\it the height of the blue graph at $x=b$}\\2642minus\\2643{\it the height of the blue graph at $x=a$}.2644\end{center}2645This is a manifestation of one of the fundamental facts of life of2646calculus\index{calculus} relating a function to its derivative:26472648\begin{quote} Given any real-valued function $F(x)$---that has a2649derivative---for any interval of real numbers $[a,b]$ the total2650signed\footnote{When $F(x)<0$ we count that area as negative.} area2651between the graph and the horizontal axis2652of the derivative of $F(x)$ over that interval is2653equal to $F(b)-F(a)$.2654\end{quote}2655What happens if we take the series of figures \ref{fig:djump1}--\ref{fig:djump4}, etc.,2656{\it to the limit}? This is quite curious:26572658\begin{itemize}2659\item {\bf the series of red graphs:} these are getting thinner and2660thinner and higher and higher: can we make any sense of what the2661red graph might mean in the limit (even though the only picture of2662it that we have at present makes it infinitely thin and infinitely2663high)?26642665\item {\bf the series of blue graphs:} these are happily looking2666more and more like the tame Figure~\ref{fig:jump}.2667\end{itemize}26682669Each of our red graphs is the derivative of the corresponding blue2670graph. It is tempting to think of the limit of the red2671graphs---whatever we might construe this to be---as standing for2672the derivative of the limit of the blue graphs, i.e., of the graph2673in Figure~\ref{fig:jump}.26742675Well, the temptation is so great that, in fact, mathematicians and2676physicists of the early twentieth century struggled to give a2677meaning to things like {\it the limit of the red graphs}---such2678things were initially called {\bf generalized functions}, which2679might be considered the derivative of {\it the limit of the blue2680graphs}, i.e., of the graph of Figure~\ref{fig:jump}.268126822683Of course, to achieve progress in mathematics, all the concepts2684that play a role in the theory have to be unambiguously defined,2685and it took a while before {\it generalized functions} such as the2686limit of our series of red graphs had been rigorously introduced.26872688But many of the great moments in the development of mathematics2689occur when mathematicians---requiring some concept not yet2690formalized---work with the concept tentatively, dismissing---if2691need be---mental tortures, in hopes that the experience they2692acquire by working with the concept will eventually help to put2693that concept on sure footing. For example, early mathematicians2694(Newton, Leibniz)|in replacing approximate speeds by instantaneous2695velocities by passing to limits|had to wait a while before later2696mathematicians (e.g., Weierstrass) gave a rigorous foundation for2697what they were doing.269826992700% From http://en.wikipedia.org/wiki/Karl_Weierstrass27012702% From http://en.wikipedia.org/wiki/Laurent_Schwartz2703\illtwo{weierstrass}{schwartz}{.3}{Karl Weierstrass (1815--1897) and Laurent Schwartz (1915--2002)\index{Schwartz, Laurent}}27042705Karl Weierstrass\index{Weierstrass, Karl}, who worked during the latter part of the nineteenth2706century, was known as the ``father of modern analysis.'' He oversaw2707one of the glorious moments of rigorization of concepts that were2708long in use, but never before systematically organized. He, and2709other analysts of the time, were interested in providing a rigorous2710language to talk about {\it functions}, and more specifically {\it2711continuous functions} and {\it smooth} (i.e., {\it differentiable})2712functions. They wished to have a firm understanding of limits (i.e.,2713of sequences of numbers, or of functions).271427152716For Weierstrass\index{Weierstrass, Karl} and his companions, even though the functions they2717worked with needn't be smooth, or continuous, at the very least, the2718functions they studied had {\it well-defined---and usually2719finite---values}. But our ``limit of red graphs'' is not so easily2720formalized as the concepts that occupied the efforts of2721Weierstrass.27222723Happily however, this general process of approximating2724discontinuous functions more and more exactly by smooth functions,2725and taking their derivatives to get the blip-functions as we have2726just seen in the red graphs above was eventually given a2727mathematically rigorous foundation; notably, by the French2728mathematician, Laurent Schwartz\index{Schwartz, Laurent} who provided a beautiful theory that2729we will not go into here, that made perfect sense of ``generalized2730functions'' such as our limit of the series of red graphs, and that2731allows mathematicians to work with these concepts with ease. These2732``generalized functions'' are called {\it distributions}\index{distribution} in2733Schwartz's\index{Schwartz, Laurent} theory \bibnote{2734\url{http://en.wikipedia.org/wiki/Distribution_\%28mathematics\%29}2735contains a more formal definition and treatment of distributions. Here is Schwartz's explanation for his choice of the word {\it distribution:} \begin{quote}``Why did we choose the name distribution?2736Because, if $\mu$ is a measure, i.e., a particular kind of2737distribution, it can be considered as a distribution of electric2738charges in the universe. Distributions give more general types of2739electric charges, for example dipoles and magnetic distributions.2740If we consider the dipole placed at the point $a$ having magnetic2741moment $M$, we easily see that it is defined by the distribution2742$-D_M \delta_{(a)}$. These objects occur in physics. Deny's2743thesis, which he defended shortly after, introduced electric2744distributions of finite energy, the only ones which really occur in2745practice; these objects really are distributions, and do not2746correspond to measures. Thus, distributions have two very2747different aspects: they are a generalization of the notion of2748function, and a generalization of the notion of distribution of2749electric charges in space. [...] Both these interpretations of2750distributions are currently used.''\end{quote}}.27512752275327542755\chapter[Distributions]{Distributions:\index{distribution} sharpening our approximating functions even if2756we have to let them shoot out to infinity\label{sec:dist}}2757275827592760The curious {\it limit of the red graphs} of the previous section,2761which you might be tempted to think of as a ``blip-function'' $f(t)$ that2762vanishes for $t$ nonzero and is somehow ``infinite'' (whatever that2763means) at $0$ is an example of a {\it generalized function} (in the2764sense of the earlier mathematicians) or a {\it distribution}\index{distribution} in the2765sense of Laurent Schwartz.\index{Schwartz, Laurent}27662767This particular {\it limit of the red graphs} also goes by another2768name (it is officially called a2769Dirac $\delta$-function\index{$\delta$-function} (see \bibnote{2770David Mumford suggested that we offer the following paragraph from\\2771\url{http://en.wikipedia.org/wiki/Dirac_delta_function}\\2772on the Dirac delta function:2773\begin{quote}2774An infinitesimal formula for an infinitely tall, unit impulse delta function (infinitesimal version of Cauchy distribution) explicitly appears in an 1827 text of Augustin Louis Cauchy. Sim\'{e}on Denis Poisson considered the issue in connection with the study of wave propagation as did Gustav Kirchhoff somewhat later. Kirchhoff and Hermann von Helmholtz also introduced the unit impulse as a limit of Gaussians, which also corresponded to Lord Kelvin's notion of a point heat source. At the end of the 19th century, Oliver Heaviside used formal Fourier series to manipulate the unit impulse. The Dirac delta function as such was introduced as a ``convenient notation'' by Paul Dirac in his influential 1930 book {\em Principles of Quantum Mechanics}. He called it the ``delta function'' since he used it as a continuous analogue of the discrete Kronecker delta.2775\end{quote}}),2776the adjective ``Dirac'' being in honor of the physicist who first worked with this2777concept, the ``$\delta$'' being the symbol he assigned to these2778objects). The noun ``function'' should be in quotation marks for,2779properly speaking, the Dirac $\delta$-function\index{$\delta$-function} is not---as we have2780explained above---a bona fide function but rather a distribution\index{distribution}.27812782\ill{dirac}{0.3}{Paul\index{Dirac, Paul} Adrien Maurice Dirac (1902--1984)}278327842785Now may be a good time to summarize what the major difference is2786between {\it honest functions} and {\it generalized functions}2787or {\it distribution\index{distribution}s}.27882789An honest (by which we mean {\it integrable}) function of a real2790variable $f(t)$ possesses two ``features.''279127922793\begin{itemize}2794\item {\bf It has values.} That is, at any real number $t$, e.g., $t2795=2$ or $t =0$ or $t =\pi$ etc., our function has a definite real2796number value ($f(2)$ or $f(0)$ or $f(\pi)$ etc.) {\it and if we know2797all those values we know the function.}27982799\item {\bf It has areas under its graph.} If we are given any interval2800of real numbers, say the interval between $a$ and $b$, we can talk2801unambiguously about the area ``under'' the graph of the function2802$f(t)$ over the interval between $a$ and $b$. That is, in the2803terminology of integral calculus,\index{calculus} we can talk of {\it the integral of $f(t)$2804from $a$ to $b$}. And in the notation of calculus,\index{calculus} this---thanks2805to Leibniz---is elegantly denoted2806$$\int_a^bf(t)dt.$$28072808\end{itemize}28092810\ill{oo_integral}{.8}{This figure illustrates2811$\int_{-\infty}^{\infty} f(x) dx$, which is the signed area2812between the graph of $f(x)$ and the $x$-axis, where area below2813the $x$-axis (yellow) counts negative, and area above (grey) is2814positive.}281528162817In contrast, a {\it generalized function} or {\it distribution\index{distribution}}~\begin{itemize}2818\item {\bf may not have ``definite values''} at all real numbers if it2819is not an honest function. Nevertheless,28202821\item {\bf It has well-defined areas under portions of its ``graph.''}2822If we are given any interval of real numbers, say the (open)2823interval between $a$ and $b$, we can still talk unambiguously about2824the {\it area ``under'' the graph of the generalized function $D(t)$2825over the interval between $a$ and~$b$} and we will denote2826this--extending what we do in ordinary calculus---by\index{calculus} the symbol2827$$\int_a^bD(t)dt.$$2828\end{itemize}28292830This description is important to bear in mind and it gives us a handy2831way of thinking about ``generalized functions'' (i.e., distribution\index{distribution}s)2832as opposed to functions: when we consider an (integrable) function of2833a real variable, $f(t)$, we may invoke its {\it value} at every real2834number and for every interval $[a,b]$ we may consider the quantity2835$\int_a^bf(t)dt$. BUT when we are given a generalized function $D(t)$2836we {\it only} have at our disposal the latter quantities. In fact, a2837generalized function of a real variable $D(t)$ is (formally) nothing2838more than a {\it rule} that assigns to any finite interval $(a,b]$ ($a2839\le b$) a quantity that we might denote $\int_a^bD(t)dt$ and that {\it2840behaves as if it were the integral of a function} and in2841particular---for three real numbers $a\le b\le c$ we have the2842additivity relation2843$$ \int_a^cD(t)dt \ = \ \int_a^bD(t)dt \ + \ \int_b^cD(t)dt.$$28442845SO, any honest function integrable over finite intervals clearly {\it2846is} a distribution\index{distribution} (forget about its values!) but $\dots$ there are2847many more generalized functions, and including them in our sights2848gives us a very important tool.284928502851It is natural to talk, as well, of Cauchy sequences, and limits, of2852distributions.\index{distribution} We'll say that such a sequence $D_1(t), D_2(t),2853D_3(t),\dots$ is a {\bf Cauchy sequence} if for every interval2854$[a,b]$ the quantities $$\int_a^bD_1(t)dt,\quad2855\int_a^bD_2(t)dt,\quad\int_a^bD_3(t)dt,\dots$$ form a Cauchy sequence2856of real numbers (so for any $\varepsilon>0$ eventually all terms2857in the sequence of real numbers2858are within $\varepsilon$ of each other).2859Now, any Cauchy sequence of distributions\index{distribution} {\it2860converges to a limiting distribution}\index{distribution} $D(t)$ which is defined by2861the rule that for every interval $[a,b]$,28622863$$ \int_a^bD(t)dt\ =\ \lim_{i \to \infty} \int_a^bD_i(t)dt.$$286428652866If, by the way, you have an infinite sequence---say---of honest,2867continuous, functions that converges uniformly to a limit (which2868will again be a continuous function) then that sequence certainly2869converges---in the above sense---to the same limit when these2870functions are viewed as generalized functions. BUT, there are many2871important occasions where your sequence of honest continuous2872functions doesn't have that convergence property and {\it yet} when2873these continuous functions are viewed as generalized functions they do converge to some2874generalized function as a limit. We will see this soon when we get2875back to the ``sequence of the red graphs.'' This sequence {\bf does}2876converge (in the above sense) to the Dirac $\delta$-function\index{$\delta$-function} when2877these red graphs are thought of as a sequence of generalized2878functions.28792880288128822883The integral notation for distribution\index{distribution} is very useful, and allows us2884the flexibility to define, for nice enough---and honest---functions2885$c(t)$ useful expressions such as $$\int_a^bc(t)D(t).$$28862887\ill{dirac_delta}{0.5}{The Dirac $\delta$-``function'' (actually2888distribution),\index{distribution} where we draw a vertical arrow to illustrate the2889delta function with support at a given point.}28902891For example, the Dirac $\delta$-function\index{$\delta$-function} we have been discussing2892(i.e., the limit of the red graphs of Chapter~\ref{ch:calculusmanages})\index{calculus} {\it is}2893an honest function away from $t=3$ and---in fact---is the ``trivial2894function'' zero away from $3$. And at $3$, we may {\it say}2895that it has the ``value'' infinity, in honor of it being the limit of2896blip functions getting taller and taller at $3$. The feature that2897pins it down as a distribution\index{distribution} is given by its behavior relative to2898the second feature above, the area of its graph over the open2899interval $(a,b)$ between $a$ and $b$:29002901\begin{itemize}2902\item If $3$ is2903not in the open interval spanned by $a$ and $b$, then the ``area2904under the graph of our Dirac $\delta$-function''\index{$\delta$-function} over2905the interval $(a,b)$2906is $0$.2907\item If $3$ is in the open interval $(a,b)$, then the ``area under the2908graph of our Dirac $\delta$-function''\index{$\delta$-function} is $1$---in2909notation $$\int_a^b\delta = 1.$$2910\end{itemize}29112912291329142915We sometimes summarize the fact that these areas vanish so long as $3$2916is not included in the interval we are considering by saying2917that the {\bf support} of this $\delta$-function\index{$\delta$-function} is ``at $3$.''291829192920Once you're happy with {\it this} Dirac $\delta$-function,\index{$\delta$-function} you'll also2921be happy with a Dirac $\delta$-function---call\index{$\delta$-function} it $\delta_x$---with2922support concentrated at any specific real number~$x$.2923This $\delta_x$ vanishes for $t \ne x$ and intuitively speaking, has an2924{\it infinite blip} at $t=x$.29252926So, the original delta-function we were discussing, i.e., $\delta(t)$2927would be denoted $\delta_3(t)$.29282929\noindent {\bf A question:} If you've never seen distribution\index{distribution}s2930before, but know the Riemann integral, can you guess at what the2931definition of $\int_a^bc(t)D(t)$ is, and can you formulate hypotheses2932on $c(t)$ that would allow you to endow this expression with a2933definite meaning?293429352936\noindent {\bf A second question:} If you have not seen distribution\index{distribution}s2937before, and have answered the first question above, let2938$c(t)$ be an honest function for which your definition2939of $$\int_a^bc(t)D(t)$$ applies. Now let $x$ be a real number.2940Can you use your definition to compute29412942$$\int_{-{\infty}}^{+{\infty}}c(t)\delta_x(t)?$$29432944The answer to this second question, by the way, is:2945$\int_{-{\infty}}^{+{\infty}}c(t)\delta_x(t)=c(x).$ This will be useful in the later sections!294629472948The theory of distributions\index{distribution} gives a partial answer to the following funny question:294929502951\begin{quote} How in the world can you ``take the derivative'' of a2952function $F(t)$ that doesn't have a derivative?2953\end{quote}29542955The short answer to this question is that {\it this derivative $F'(t)$2956which doesn't exist as a function may exist as a distribution\index{distribution}.}2957What then is the integral of that distribution?\index{distribution} Well, it is given by2958the original function!29592960$$\int_a^bF'(t)dt \ = \ F(b) -F(a).$$29612962Let us practice this with simple staircase functions. For example,2963what is the {\it derivative}---in the sense of the theory of2964distributions---of\index{distribution} the function in Figure~\ref{fig:simple_staircase}?2965{\bf Answer:} $\delta_0 + 2 \delta_1$.296629672968\ill{simple_staircase}{.6}{The staircase function that is $0$ for $t2969\le 0$, $1$ for $0 <t \le 1$ and $3$ for $1< t \le 2$ has derivative2970$\delta_0 + 2\delta_1$.\label{fig:simple_staircase}}297129722973We'll be dealing with much more complicated staircase functions in the2974next chapter, but the general principles discussed here will nicely2975apply there \bibnote{As discussed in\\ \url{http://en.wikipedia.org/wiki/Distribution_\%28mathematics\%29},2976``generalized functions'' were introduced by Sergei2977Sobolev in the 1930s, then later2978independently introduced in the late29791940s by Laurent Schwartz, who developed a comprehensive theory of2980distributions.2981}.298229832984\chapter{Fourier transforms: second visit}29852986In Chapter~\ref{sec:fourier1} above we wrote:29872988\begin{quote} The operation that starts with a graph and goes to its2989spectral picture that records the frequencies, amplitudes, and2990phases of the pure sine waves that, together, compose the graph is2991called the {\bf Fourier transform}.2992\end{quote}299329942995Now let's take a closer look at this operation {\it Fourier transform}.29962997We will focus our discussion on an {\bf even} function $f(t)$ of a2998real variable $t$. ``{\bf Even}'' means that its graph is symmetric2999about the $y$-axis; that is, $f(-t)= f(t)$. See3000Figure~\ref{fig:even_function}.30013002\ill{even_function}{.8}{The graph of an even function is symmetrical3003about the $y$-axis.\label{fig:even_function}}30043005When we get to apply this discussion to the {\it staircase of3006primes} $\pi(t)$ or the {\it tinkered staircase of primes}3007$\psi(t)$, both of which being defined only for positive values of3008$t$, then we would ``lose little information'' in our quest to3009understand them if we simply ``symmetrized their graphs'' by3010defining their values on negative numbers $-t$ via the formulas3011$\pi(-t)=\pi(t)$ and $\psi(-t)=\psi(t)$ thereby turning each of them3012into {\it even functions}.30133014\ill{even_pi}{.8}{Even extension of the staircase of primes.}30153016The idea behind the Fourier transform is to express $f(t)$ as {\it3017made up out of sine and cosine wave functions}. Since we have3018agreed to consider only even functions, we can dispense with the sine3019waves---they won't appear in our Fourier analysis\index{Fourier analysis}---and ask how to3020reconstruct $f(t)$ as a {\it sum} (with coefficients) of cosine3021functions (if only finitely many frequencies occur in the spectrum\index{spectrum} of3022our function) or more generally, as an {\it integral} if the spectrum\index{spectrum}3023is more elaborate. For this work, we need a little machine that tells3024us, for each real number $\theta$, whether or not $\theta$ is in the3025spectrum\index{spectrum} of $f(t)$, and if so, what the amplitude is of the cosine3026function $\cos(\theta t)$ that occurs in the Fourier expansion of3027$f(t)$---this amplitude answers the awkwardly phrased question:30283029{\it3030How much $\cos(\theta t)$ ``occurs in'' $f(t)$?}30313032We will denote this3033amplitude by ${\hat f}(\theta)$, and refer to it as {\bf the Fourier3034transform} of $f(t)$. The {\bf spectrum},\index{spectrum} then, of $f(t)$ is the3035set of all frequencies $\theta$ where the amplitude is nonzero.30363037\ill{fourier_machine}{.6}{The Fourier transform machine, which transforms $f(t)$ into ${\hat f}(\theta)$\label{fig:fourier_machine}}303830393040Now in certain easy circumstances---specifically, if3041$\int_{-{\infty}}^{+{\infty}}|f(t)|dt$ (exists, and)3042is finite---integral calculus\index{calculus} provides us with an easy construction of that3043machine (see Figure~\ref{fig:fourier_machine}); namely:30443045$${\hat f}(\theta) = \int_{-{\infty}}^{+{\infty}}f(t)\cos(-\theta t)dt.$$304630473048This concise machine manages to ``pick out'' just the part of3049$f(t)$ that has frequency $\theta$! It provides for us the {\it3050analysis} part of the Fourier analysis\index{Fourier analysis} of our function3051$f(t)$.30523053But there is a {\it synthesis} part to our work as well,3054for we can reconstruct $f(t)$ from its Fourier transform, by a3055process intriguingly similar to the analysis part; namely: if3056$\int_{-{\infty}}^{+{\infty}}|{\hat f}(\theta)|d\theta$ (exists, and) is finite, we retrieve $f(t)$ by the integral30573058$$f(t) = {\frac{1}{2\pi}}\int_{-{\infty}}^{+{\infty}}{\hat f}(\theta)\cos(\theta t)d\theta.$$30593060We are not so lucky to have3061$\int_{-{\infty}}^{+{\infty}}|f(t)|dt$ finite when we try our3062hand at a Fourier analysis\index{Fourier analysis} of the staircase of primes, but we'll3063work around this!3064306530663067\chapter[Fourier transform of delta]{What is the Fourier transform of a delta function?\label{sec:ftdelta}}30683069Consider the $\delta$-function\index{$\delta$-function} that we denoted $\delta(t)$ (or3070$\delta_0(t)$). This is also the ``generalized function'' that we3071thought of as the ``limit of the red graphs'' in Chapter~\ref{sec:dist}3072above. Even though $\delta(t)$ is a distribution\index{distribution} and {\it not} a bona3073fide function, it is symmetric about the origin, and3074also $$\int_{-{\infty}}^{+{\infty}}|\delta(t)|dt$$ exists, and is3075finite (its value is, in fact, $1$). All this means that,3076appropriately understood, the discussion of the previous section3077applies, and we can {\it feed} this delta-function into our Fourier3078Transform Machine (Figure~\ref{fig:fourier_machine}) to see what3079frequencies and amplitudes arise in our attempt to express---whatever3080this means!---the delta-function as a sum, or an integral, of cosine3081functions.308230833084So what is the Fourier transform, ${\hat \delta_0}(\theta)$, of the delta-function?308530863087Well, the general formula would give us:3088$$ {\hat \delta_0}(\theta) = \int_{-\infty}^{+\infty}\cos(-\theta t)\delta_0(t)dt.$$3089For any nice function $c(t)$ we3090have that the integral of the product of $c(t)$ by the distribution\index{distribution}3091$\delta_x(t)$ is given by the {\it value} of the function $c(t)$ at3092$t=x$. So:30933094$$ {\hat \delta_0}(\theta) = \int_{-\infty}^{+\infty}\cos(-\theta t)\delta_0(t)dt = \cos(0) = 1.$$309530963097In other words, the Fourier transform of $\delta_0(t)$ is the constant3098function $$ {\hat \delta_0}(\theta)=1.$$ One can think of this3099colloquially as saying that the delta-function is a perfect example of3100{\it white noise} in that {\it every} frequency occurs in its Fourier3101analysis and they all occur in equal amounts.31023103To generalize this computation, let us consider for any real number $x$3104the symmetrized delta-function with support at $x$ and $-x$, given3105by $$d_x(t) \ = \ (\delta_x(t) + \delta_{-x}(t))/2$$3106in Figure~\ref{fig:two_delta}.31073108\ill{two_delta}{0.4}{The sum $(\delta_x(t) + \delta_{-x}(t))/2$, where we draw vertical arrows to illustrate the Dirac delta functions.\label{fig:two_delta}}31093110What is the Fourier transform of this $d_x(t)$? The answer is3111given by making the same computation as we've just made:31123113\begin{align*}\label{dx}3114{\hat d_x}(\theta) &= {\frac{1}{2}}\left(\int_{-\infty}^{+\infty}\cos(-\theta t)\delta_x(t)dt + \int_{-\infty}^{+\infty}\cos(-\theta t)\delta_{-x}(t)dt\right)\\3115&= {\frac{1}{2}}\big(\cos(-\theta x)+ \cos(+\theta x)\big)\\3116&= \cos(x\theta)3117\end{align*}311831193120To summarize this in ridiculous (!) colloquial terms: {\it for any3121frequency $\theta$ the amount of $\cos(\theta t)$ you need to build3122up the generalized function $(\delta_x(t) + \delta_{-x}(t))/2$ is3123$\cos(x\theta).$ }312431253126So far, so good, but remember that the theory of the Fourier transform3127has---like much of mathematics---two parts: an {\it analysis part} and3128a {\it synthesis} part. We've just performed the {\it analysis} part3129of the theory for these symmetrized delta functions $(\delta_x(t) +3130\delta_{-x}(t))/2$.31313132Can we synthesize them---i.e., build them up again---from their Fourier transforms?313331343135We'll leave this, at least for now, as a question for you.31363137%\chapter{Supports, Spectra, Blips and Spikes}31383139%(to be written)31403141% Here is where we will3142% be really justifying our heavy discussion of distribution\index{distribution}s,3143% trigonometric sums, etc.31443145\chapter{Trigonometric series}\label{ch:trigseries}3146Given our interest in the ideas of Fourier, it is not surprising that3147we'll want to deal with things like $$F(\theta) = \sum_{k=1}^{\infty}3148a_k\cos(s_k\cdot \theta)$$ where the $s_k$ are real numbers tending3149(strictly monotonically) to infinity. These we'll just call {\bf3150trigonometric series} without asking whether they converge in any3151sense for all values of $\theta$, or even for {\it any} value of $\theta$. The3152$s_k$'s that occur in such a trigonometric series we will call the3153{\bf spectral values} or for short, the {\bf spectrum}\index{spectrum} of the series,3154and the $a_k$'s the (corresponding) {\bf amplitudes}. We repeat that3155we impose no convergence requirements at all. But we also think of3156these things as providing ``cutoff'' finite trigonometric sums, which3157we think of as functions of two variables, $\theta$ and $C$ (the3158``cutoff'') where $$F(\theta,C): = \sum_{s_k\le C} a_k\cos(s_k\cdot \theta).$$ These functions $F(\theta,C)$ are finite trigonometric series and therefore ``honest functions" having finite values everywhere.315931603161%\section{Trigonometric Series as Fourier Transforms}316231633164Recall, as in Chapter \ref{sec:ftdelta}, that for any real number $x$, we considered3165the symmetrized delta-function with support at $x$ and $-x$, given3166by3167$$3168d_x(t) \ = \ (\delta_x(t) + \delta_{-x}(t))/2,3169$$3170and noted that the Fourier transform of this $d_x(t)$ is3171$$3172\label{dx2}{\hat d_x}(\theta) = \cos(x\theta).3173$$3174\ill{two_delta}{0.4}{The sum $(\delta_x(t) + \delta_{-x}(t))/2$,3175where we draw vertical arrows to illustrate the Dirac delta functions.}31763177It follows, of course, that a cutoff finite trigonometric series,3178$F(\theta,C)$ associated to an infinite trigonometric series\index{trigonometric series}3179$$3180F(\theta) = \sum_{k=1}^{\infty} a_k\cos(s_k\cdot \theta)3181$$3182is the Fourier transform of the distribution\index{distribution}3183$$3184D(t,C):=\sum_{s_k \le C}a_kd_{s_k}(t).3185$$3186Given the discreteness of the set of spectral values $s_k$ ($k=1,2,\dots$) we can3187consider the infinite sum3188$$3189D(t):=\sum_{k=1}^{\infty}a_kd_{s_k}(t),3190$$ viewed as distribution\index{distribution} playing the role of3191the ``inverse Fourier transform" of our trigonometric3192series $F(t)$.319331943195%\section{\bf Spike values}31963197\begin{definition} Say3198that a trigonometric series $F(\theta)$ has a {\bf spike} at the real number $\theta=\tau \in {\bf R}$3199if the set of absolute values $|F(\tau,C)|$ as $C$ ranges through positive number cutoffs is3200unbounded. A real number $\tau \in {\bf R}$ is, in contrast, a {\bf3201non-spike} if those values admit a finite upper3202bound. \end{definition}32033204% One mission in this book will be furthered just by paying attention to the %spike values of certain (infinite) trigonometric series.320532063207% The trigonometric sums we study are3208% going to be particularly nice. Under the assumption of the Riemann3209% Hypothesis, they will have the3210% property that the set of their spike values are particularly interesting {\it %discrete}3211% subsets of the real line.32123213In the chapters that follow we will be exhibiting graphs of trigonometric functions, cutoff at various values of $C$, that (in our opinion) strongly hint that as $C$ goes to infinity, one gets convergence to certain (discrete) sequences of very interesting {\it spike values}. To be sure, no finite computation, exhibited by a graph, can {\it prove} that this is in fact the case. But on the one hand, the vividness of those spikes is in itself worth experiencing, and on the other hand, given RH, there is justification that the {\it strong hints} are not misleading; for some theoretical background see the endnotes.3214321532163217% WARNING: I had to hard-code the III in the title, since3218% otherwise it would be ?? in some places!! (Maybe due to limitations3219% of latex.)3220%\chapter{A sneak preview of Part~\ref{part3}}\label{snpr}3221\chapter{A sneak preview of Part~III}\label{snpr}32223223In this chapter, as a striking illustration of the type of phenomena that will be studied in Part~\ref{part3}, we will consider two infinite trigonometric sums---that seem to be related one to the other in that the {\it frequencies} of the terms in the one trigonometric sum give the {\it spike values} of the other, and vice versa: the {\it frequencies} of the other give the {\it spike values} of the one: a kind of duality as in the theory of Fourier transforms. We show this duality by exhibiting the graphs of more and more accurate finite approximations (cutoffs) of these infinite sums. More specifically,32243225\begin{enumerate}\item The first infinite trigonometric sum $F(t)$ is a sum{\footnote{Here we3226make use of the Greek symbol $\sum$ as a shorthand way of3227expressing a sum of many terms. We are not requesting this sum to3228converge.}} of pure cosine waves with frequencies given by {\it logarithms of powers of primes} and with amplitudes given by the formula $$F(t):=3229-\sum_{p^n}{\frac{\log(p)}{p^{n/2}}}\cos(t\log(p^n))3230$$ the summation being over all powers $p^n$ of all prime numbers\index{prime number} $p$.32313232The graphs of longer and longer finite truncations of these trigonometric sums, as you will see, have ``higher and higher peaks" concentrated more and more accurately at a {\it certain infinite discrete set of real numbers} that we will be referring to as {\bf the Riemann spectrum}, indicated in our pictures below (Figures~\ref{fig:pnsum5}--\ref{fig:pnsum500}) by the series of vertical red lines.32333234\item In contrast, the second infinite trigonometric sum $H(t)$ is a sum of pure cosine waves with frequencies given by what we have dubbed above {\it the Riemann spectrum} and with amplitudes all equal to $1$.32353236$$3237H(s):=1+\sum_{\theta}\cos(\theta \log(s)).3238$$ These graphs will have ``higher and higher peaks" concentrated more and more accurately at {\bf the logarithms of powers of primes} indicated in our pictures below (see Figure~\ref{fig:phihat}) by the series of vertical blue spikes.32393240\end{enumerate}32413242324332443245That the series of {\it blue lines} (i.e., the logarithms of powers of3246primes) in our pictures below determines---via the trigonometric3247sums we describe---the series of {\it red lines} (i.e., what we are3248calling the spectrum)\index{spectrum} and conversely is a consequence of the Riemann3249Hypothesis.3250\begin{enumerate}32513252\item {\bf Viewing the Riemann spectrum as the spike values of a trigonometric series with3253frequencies equal to (logs of) powers of the primes:}32543255To get warmed up, let's plot the positive values of the following3256sum of (co-)sine waves:32573258\begin{align*}3259f(t) =& -{\frac{\log(2)}{2^{1/2}}}\cos(t\log(2))- {\frac{\log(3)}{3^{1/2}}}\cos(t\log(3))\\3260&\qquad -{\frac{\log(2)}{4^{1/2}}}\cos(t\log(4))-{\frac{\log(5)}{5^{1/2}}}\cos(t\log(5))3261\end{align*}3262326332643265\ill{mini_phihat_even}{1}{Plot of $f(t)$\label{fig:mini_phihat_even}}326632673268Look at the peaks of Figure~\ref{fig:mini_phihat_even}. There is nothing very impressive3269about them, you might think; but wait, for $f(t)$ is just a very ``early''3270piece of the infinite trigonometric sum $F(t)$ described above.3271327232733274Let us truncate the infinite sum $F(t)$ taking only finitely many terms, by3275choosing various ``cutoff values'' $C$ and forming the finite sums3276$$3277F_{\le C}(t):= -\sum_{p^n\leq C}{\frac{\log(p)}{p^{n/2}}}\cos(t\log(p^n))3278$$3279and plotting their positive3280values. Figures~\ref{fig:pnsum5}--\ref{fig:pnsum500} show what we get3281for a few values of $C$.328232833284In each of the graphs, we have indicated by red vertical arrows the3285real numbers that give the values of the {\it Riemann spectrum}\index{Riemann spectrum} that we will3286be discussing. These numbers at the red3287vertical arrows in Figures~\ref{fig:pnsum5}--\ref{fig:pnsum500},3288$$3289\theta_1, \theta_2, \theta_3, \dots3290$$3291are {\it spike values}---as described in Chapter~\ref{ch:trigseries}---of the infinite trigonometric series\index{trigonometric series}3292$$3293-\sum_{p^n<C}{\frac{\log(p)}{p^{n/2}}} \cos(t \log(p^n)).3294$$3295They constitute what we are calling the Riemann spectrum\index{Riemann spectrum} and are key to the staircase of primes3296\bibnote{If the \RH{} holds3297they are precisely the {\it imaginary parts of the3298``nontrivial'' zeroes of the Riemann zeta function}.}.32993300\begin{itemize}3301\item {\bf The sum with $p^n \le C=5$}33023303In Figure~\ref{fig:pnsum5} we plot the function $f(t)$ displayed above; it consists of the sum of the first four terms of our infinite sum, and doesn't yet show very much ``structure'':33043305\ill{phihat_even-5}{1}{Plot of $-\sum_{p^n\leq 5}{\frac{\log(p)}{p^{n/2}}}\cos(t\log(p^n))$ with3306arrows pointing to the spectrum\index{spectrum} of the primes\label{fig:pnsum5}}33073308\vskip20pt3309\item {\bf The sum with $p^n \le C=20$}33103311Something,3312(don't you agree?) is already beginning to happen3313in the graph in Figure~\ref{phihat_even-20}:3314\ill{phihat_even-20}{1}{Plot of $-\sum_{p^n\leq 20}{\frac{\log(p)}{p^{n/2}}}\cos(t\log(p^n))$ with arrows pointing to the spectrum\index{spectrum} of the primes\label{phihat_even-20}}\vskip20pt3315\item {\bf The sum with $p^n \le C=50$}33163317Note that the high peaks in Figure~\ref{fig:phihat_even-50}3318seem to be lining up more accurately with3319the vertical red lines. Note also that the $y$-axis has been3320rescaled.33213322\ill{phihat_even-50}{1}{Plot of $-\sum_{p^n\leq 50}{\frac{\log(p)}{p^{n/2}}}\cos(t\log(p^n))$ with arrows pointing to the spectrum\index{spectrum} of the primes\label{fig:phihat_even-50}}\vskip20pt3323\item {\bf The sum with $p^n \le C=500$}33243325Here, the peaks are even sharper, and note that again they are3326higher; that is, we have rescaled the $y$-axis.3327\ill{phihat_even-500}{1}{Plot of $-\sum_{p^n\leq3328500}{\frac{\log(p)}{p^{n/2}}}\cos(t\log(p^n))$ with arrows3329pointing to the spectrum\index{spectrum} of the primes\label{fig:pnsum500}}3330\end{itemize}33313332We will pay attention to:33333334\begin{itemize}3335\item how the spikes ``play out'' as we take the sums of longer and longer3336pieces of the infinite sum of cosine waves above, given by larger3337and larger cutoffs $C$,3338\item how this spectrum\index{spectrum} of red lines more closely matches the high3339peaks of the graphs of the positive values of these finite sums,3340\item how these peaks are climbing higher and higher,3341\item what relationship these peaks have to the Fourier analysis\index{Fourier analysis} of the3342staircase of primes,3343\item and, equally importantly, what these mysterious red lines3344signify.3345\end{itemize}33463347\item{\bf Toward (logs of) powers of the primes, starting from the Riemann spectrum\index{Riemann spectrum}:}33483349Here we will be making use of the series of numbers $$\theta_1,3350\theta_2, \theta_3, \dots$$ comprising what we called the {\it3351spectrum}\index{spectrum}. We consider the infinite trigonometric series\index{trigonometric series}3352$$3353G(t):= 1 + \cos(\theta_1t)+\cos(\theta_2t)+\cos(\theta_3t)+\dots3354$$3355or, using the $\sum$ notation,3356$$3357G(t):= 1+\sum_{\theta}\cos(\theta t)3358$$3359where the summation is over the spectrum\index{spectrum}, $\theta=\theta_1, \theta_2,3360\theta_3, \dots$. Again we will consider finite cutoffs $C$ of this3361infinite trigonometric sum (on a logarithmic scale),3362$$3363H_{\le C}(s):= 1+\sum_{i \leq C}\cos(\log(s) \theta_i)3364$$3365and to see the spikes in $H_{\le 1000}(s)$3366consider Figure~\ref{fig:phihat}.33673368\ill{phi_cos_sum_2_30_1000}{.8}{Illustration of $-\sum_{i=1}^{1000}3369\cos(\log(s)\theta_i)$, where $\theta_1 \sim 14.13, \ldots$ are the3370first $1000$ contributions to the spectrum\index{spectrum}. The red dots3371are at the prime powers $p^n$, whose size is proportional to3372$\log(p)$.\label{fig:phihat}}33733374\end{enumerate}33753376This passage---thanks to the Riemann Hypothesis---from spectrum\index{spectrum} to3377prime powers and back again via consideration of the ``high peaks'' in3378the graphs of the appropriate trigonometric sums provides a kind of3379visual duality emphasizing, for us, that the information inherent in3380the wild spacing of prime powers, is somehow ``packaged'' in the Riemann spectrum\index{Riemann spectrum}, and reciprocally,3381the information given in that series of mysterious numbers is3382obtainable from the sequence of prime powers.3383338433853386\part{The Riemann Spectrum of the Prime Numbers\label{part3}}\index{prime number}338733883389\chapter{On losing no information\label{sec:loseno}}33903391To manage to repackage the ``same'' data in various ways---where each3392way brings out some features that would be kept in the shadows if the3393data were packaged in some different way---is a high art, in3394mathematics. In a sense {\it every} mathematical equation does this,3395for the ``equal sign'' in the middle of the equation tells us that3396even though the two sides of the equation may seem different, or have3397different shapes, they are nonetheless ``the same data.'' For3398example, the equation33993400$$\log(XY) = \log(X) + \log(Y)$$34013402which we encountered earlier in Chapter~\ref{sec:firstguess}, is3403just two ways of looking at the same thing, yet it was the basis for3404much manual calculation for several centuries.34053406340734083409Now, the problem we have been concentrating on, in this book, has3410been---in effect---to understand the pattern, if we can call it3411that, given by the placement of prime numbers\index{prime number} among the natural3412line-up of all whole numbers.3413\ill{primes_line}{1}{Prime numbers up to $37$}34143415There are, of course, many ways for us to present this basic3416pattern. Our initial strategy was to focus attention on the {\it3417staircase of primes} which gives us a vivid portrait, if you wish,3418of the order of appearance of primes among all numbers.3419\ill{PN_38}{.9}{Prime numbers up to $37$}34203421As we have already hinted in the previous sections, however, there3422are various ways open to us to tinker with---and significantly3423modify---our staircase {\it without losing the essential information3424it contains}. Of course, there is always the danger of modifying3425things in such a way that ``retrieval'' of the original data becomes3426difficult. Moreover, we had better remember every change we have3427made if we are to have any hope of retrieving the original data!34283429With this in mind, let us go back to Chapter~\ref{sec:information}3430(discussing the staircase of primes) and Chapter~\ref{sec:tinkering},3431where we tinkered with the original staircase of primes---alias: the3432graph of $\pi(X)$---to get $\psi(X)$ whose risers look---from3433afar---as if they approximated the 45 degree staircase.% (see Figure~\ref{fig:psi38} below).343434353436At this point we'll do some further carpentry on $\psi(X)$ {\it without destroying the valuable3437information it contains}. We will be replacing $\psi(X)$ by a generalized function, i.e., a3438distribution,\index{distribution} which we denote $\Phi(t)$ that has support at all positive integral multiples3439of logs of prime numbers,\index{prime number} and is zero on the complement of that discrete set. Recall that3440by definition, a discrete subset $S$ of real numbers is the {\bf support} of a function,3441or of a distribution,\index{distribution} if the function vanishes on the complement of $S$ and doesn't vanish3442on the complement of any proper subset of $S$.34433444Given the mission of our book, it may be less important for us to elaborate on the construction3445of $\Phi(t)$ than it is {\bf (a)} to note that $\Phi(t)$ contains all the valuable information3446that $\psi(X)$ has and {\bf (b)} to pay close attention to the spike values of the trigonometric3447series that is the Fourier transform of $\Phi(t)$.34483449For the definition of the distribution\index{distribution} $\Phi(t)$ see the end-note3450\bibnote{3451{\it The construction of $ \Phi(t)$ from $\psi(X)$:}34523453Succinctly, for positive real numbers $t$, $$\Phi(t): \ = \ e^{- t/2}\Psi'(t),$$ where $\Psi(t) = \psi(e^t)$3454(see Figure~\ref{fig:psi_200}), and $\Psi'$ is the derivative of $\Psi(t)$, viewed as a distribution. We extend this function to all real arguments $t$ by requiring $\Phi(t)$ to be an even function of $t$, i.e., $\Phi(-t)= \Phi(t)$.3455But, to review this at a more leisurely pace,3456\begin{enumerate}34573458\item Distort the $X$-axis of our staircase by replacing the3459variable $X$ by $e^t$ to get the function $$\Psi(t):=3460\psi(e^t).$$ No harm is done by this for we can retrieve our3461original $\psi(X)$ as $$\psi(X) = \Psi(\log(X)).$$ Our distorted3462staircase has risers at ($0$ and) all positive integral multiples3463of logs of prime numbers.346434653466% \illtwo{psi_38}{bigPsi_38}{.4}{The two staircases $\psi$ (left) and $\Psi$ (right). Notice that the plot of $\Psi$ on the right is just the plot of $\psi$ but with the $X$-axis plotted on a logarithmic scale.\label{fig:psi38}}346734683469\item Now we'll do something that might seem a bit more brutal: {\it3470take the derivative of this distorted staircase $\Psi(t)$.} This3471derivative $\Psi'(t)$ is a {\it generalized} function with support3472at all nonnegative integral multiples of logs of prime numbers.34733474\vskip20pt3475\ill{bigPsi_prime}{1}{$\Psi'(t)$ is a (weighted) sum of Dirac delta functions at the logarithms of prime powers $p^n$ weighted by $\log(p)$ (and by $\log(2\pi)$ at $0$). The taller the arrow, the larger the weight.}34763477\item Now---for normalization purposes---multiply $\Psi'(t)$ by the3478function $e^{-t/2}$ which has no effect whatsoever on the support.3479\end{enumerate}3480\vskip20pt\ill{psi_200}{.4}{Illustration of the staircase $\psi(X)$ constructed in Chapter~\ref{sec:tinkering} that counts weighted prime powers.\label{fig:psi_200}}34813482%So what happens when we take the derivative---in the sense of3483%distributions---of a complicated staircase? For example, see3484%%Figure~\ref{fig:psi_200}. Well, we would have blip-functions (alias:3485%%Dirac $\delta$-functions) at each point of discontinuity of $\Psi(X)$;3486%that is, at $X=$ any power of a prime.348734883489In summary: The generalized function that resulted from the above carpentry:3490$$\Phi(t) = e^{-t/2}\Psi'(t),$$ retains the information we want (the placement of primes) even if in a slightly different format.3491}.% end bibnote349234933494A distribution\index{distribution} that has a discrete set of real numbers as its3495support---as $\Phi(t)$ does---we sometimes like to call {\bf spike3496distributions}\index{distribution} since the pictures of functions approximating it3497tend to look like a series of spikes.34983499We have then before us a spike distribution\index{distribution} with support at integral3500multiples of logarithms of prime numbers\index{prime number}, and this generalized3501function retains the essential information about the placement of3502prime numbers among all whole numbers, and will be playing a major3503role in our story: knowledge of the placement of the ``blips'' constituting this distribution\index{distribution} (its3504support), being at integral multiples of logs of prime numbers, would allow us to reconstruct the position of the prime3505numbers among all numbers. Of course there are many other ways to package this vital information, so we3506must explain our motivation for subjecting our poor initial staircase3507to the particular series of brutal acts of distortion that we3508described, which ends up with the distribution\index{distribution} $\Phi(t)$.35093510351135123513351435153516\chapter[From primes to the Riemann spectrum]{Going from3517the primes to the Riemann spectrum}3518\label{ch:prime-to-spectrum}\index{Riemann spectrum}3519%\todo{This chapter has not been fully written, there are some3520% mathematical issues that we don't yet explain, and we will need to3521% work out in order to do a satisfactory job here.}3522352335243525We discussed the nature of the Fourier transform of (symmetrized)3526$\delta$-functions\index{$\delta$-function} in Chapter~\ref{sec:ftdelta}. In particular, recall3527the ``spike function'' $$d_x(t) \ = \ (\delta_x(t) +3528\delta_{-x}(t))/2$$ that has support at the points $x$ and $-x$. We3529mentioned that its Fourier transform, ${\hat d}_x(\theta)$, is equal3530to $\cos(x\theta)$ (and gave some hints about why this may be true).353135323533Our next goal is to work with the much more interesting ``spike3534function'' $$\Phi(t) \ = \ e^{- t/2}\Psi'(t),$$ which was one of the3535generalized functions that we engineered in Chapter~\ref{sec:loseno},3536and that has support at all nonnegative integral multiples of3537logarithms of prime numbers\index{prime number}.35383539354035413542As with any function---or generalized function---defined for3543non-negative values of $t$, we can ``symmetrize it'' (about the3544$t$-axis) which means that we can define it on negative real numbers3545by the equation3546$$\Phi(-t) = \Phi(t).$$3547Let us make that convention, thereby turning $\Phi(t)$ into an {\it3548even} generalized function, as illustrated in Figure~\ref{fig:bigPhi}. (An {\bf even} function on the real line is a function that takes the same value on any real number and its negative as in the formula above.)35493550%\vskip20pt{\bf William: sum of Dirac delta functions Figure goes here}\vskip20pt3551\ill{bigPhi}{1}{$\Phi(t)$ is a sum of Dirac delta functions at the logarithms of prime powers $p^n$ weighted by $p^{-n/2}\log(p)$ (and $\log(2\pi)$ at $0$).\label{fig:bigPhi}}355235533554We may want to think of $\Phi(t)$ as a limit of this sequence of distribution\index{distribution}s,3555$$\Phi(t)\ = \ \lim_{C \to {\infty}}\Phi_{\le C}(t)$$3556where $\Phi_{\le C}(t)$ is the following finite linear3557combination of (symmetrized) $\delta$-functions\index{$\delta$-function} $d_x(t)$:3558$$\Phi_{\le C}(t)\quad :=\quad 2\sum_{{\rm prime\ powers \ }p^n \le C} p^{-n/2}\log(p)d_{n\log(p)}(t).$$3559Since the Fourier transform of $d_x(t)$ is $\cos(x\theta)$, the Fourier transform of each $d_{n\log(p)}(t)$ is3560$\cos(n\log(p)\theta)$, so the Fourier transform of3561$\Phi_{\le C}(t)$ is35623563$${\hat \Phi}_{\leq C}(\theta):= 2\sum_{{\rm prime\ powers\ }p^n \leq{} C} p^{-n/2} \log(p)\cos(n\log(p)\theta).$$35643565So, following the discussion in Chapter~\ref{ch:trigseries} above, we are dealing with the cutoffs at finite points $C$ of the {\it trigonometric series}{\footnote{The trigonometric series in the text---whose spectral values are the logarithms of prime powers---may also be written as3566$$\sum_{m=2}^{\infty}\Lambda(m)m^{-s} + \sum_{m=2}^{\infty}\Lambda(m)m^{-{\bar s}}$$ for $s={\frac{1}{2}}+i\theta$, where $\Lambda(m)$ is the von-Mangoldt function.}}35673568$${\hat \Phi}(\theta):= 2\sum_{{\rm prime\ powers\ }p^n } p^{-n/2} \log(p)\cos(n\log(p)\theta).$$35693570357135723573For example, when $C=3$, we have the rather severe cutoff of these trigonometric series: ${\hat \Phi}_{\leq 3}(\theta)$ takes account {\it only} of the primes $p=2$ and $p=3$:3574$$3575{\hat \Phi}_{\leq 3}(\theta) =3576\frac{2}{\sqrt{2}} \log(2)\cos(\log(2)\theta)3577+3578\frac{2}{\sqrt{3}} \log(3)\cos(\log(3)\theta),3579$$3580which we plot in Figure~\ref{fig:theta3}.358135823583\ill{theta_3_intro-1}{.85}{Plot of ${\hat \Phi}_{\leq 3}(\theta)$\label{fig:theta3}}358435853586We will be interested in the values of $\theta$ that correspond to higher and higher {\it peaks} of our trigonometric series ${\hat \Phi}_{\leq C}(\theta)$ as $C\to \infty$. For example, the value of $\theta$ that provides the first peak of $3587{\hat \Phi'}_{\leq 3}(\theta)$ such that $3588|{\hat \Phi'}_{\leq 3}(\theta)| > 2$ is $$\theta=14.135375354\ldots.$$35893590So in Figure~\ref{fig:theta3} we begin this exploration by plotting ${\hat \Phi}_{\leq 3}(\theta)$, together with its derivative, highlighting the zeroes of the derivative.359135923593\ill{theta_3_intro-2}{.85}{Plot of ${\hat \Phi}_{\leq 3}(\theta)$ in blue and its derivative in grey\label{fig:theta4}}35943595As we shall see in subsequent figures of this chapter, there seems to be an eventual convergence of the values of $\theta$ that correspond to higher and higher {\it peaks}, the red dots in the figure above, as $C$ tends to $\infty$. These ``limit" $\theta$-values we now insert as the endpoints of red vertical lines into Figure~\ref{fig:theta5} comparing them with the red dots for our humble cutoff $C=3$.3596359735983599\ill{theta_C-3}{0.9}{${\hat \Phi}_{\leq 3}(\theta)$ \label{fig:theta5}}3600360136023603We give a further sample of graphs for a few higher cutoff values $C$ (introducing a few more primes into the game!).36043605Figures~\ref{fig:theta_C-5-10}--\ref{fig:theta_C-200-500}3606contain graphs of various cutoffs of ${\hat \Phi}_{\leq C}(\theta)$.3607As $C$ increases a sequence of spikes down emerge3608which we indicate with red vertical arrows.3609361036113612\illtwo{theta_C-5}{theta_C-10}{0.45}{Plot of ${\hat \Phi}_{\leq C}(\theta)$ for $C=5$ and $10$\label{fig:theta_C-5-10}}36133614Given the numerical-experimental approach we have been adopting in this book, it is a particularly fortunate (and to us: surprising) thing that the convergence to those vertical red lines can already be illustrated using {\it such small} cutoff values $C$. One might almost imagine making hand computations that exhibit this phenomenon! Following in this spirit, see David Mumford's blog post {\url{http://www.dam.brown.edu/people/mumford/blog/2014/RiemannZeta.html}}.36153616To continue:36173618\illtwo{theta_C-20}{theta_C-100}{0.45}{Plot of ${\hat \Phi}_{\leq C}(\theta)$ for $C=10$ and $100$\label{fig:theta_C-10-100}}36193620\illtwo{theta_C-200}{theta_C-500}{0.45}{Plot of ${\hat \Phi}_{\leq C}(\theta)$ for $C=200$ and $500$\label{fig:theta_C-200-500}}362136223623For a theoretical discussion of these spikes, see endnote~\bibnote{3624A version of the Riemann--von Mangoldt explicit formula gives some theoretical affirmation of the phenomena we are seeing here. We thank Andrew Granville for a discussion about this.36253626\ill{granville}{0.25}{Andrew Granville}362736283629Even though the present endnote is not the place to give anything like a full account, we can't resist setting down a few of Granville's comments that might be helpful to people who wish to go further. (This discussion can be modified to analyze what happens unconditionally, but we will be assuming the \RH{} below.) The function ${\hat \Phi}_{\le C}(\theta)$ that we are graphing in this chapter can be written as:3630$${\hat \Phi}_{\le C}(\theta)= \sum_{n\le C}\Lambda(n)n^{-w}$$ where $w = {\frac{1}{2}}+i\theta$. This function, in turn, may be written (by Perron's formula) as3631\begin{align*}3632&{\frac{1}{2\pi i }}\lim_{T \to \infty}\int_{s=\sigma_o-iT}^{s=\sigma_o+iT}\sum_{n}\Lambda(n)n^{-w}\left({\frac {C}{n}}\right)^{s}{\frac{ds}{s}}\\3633&= {\frac{1}{2\pi i }}\lim_{T \to \infty}\int_{s=\sigma_o-iT}^{s=\sigma_o+iT}\sum_{n}\Lambda(n)n^{-w-s}C^{s}{\frac{ds}{s}}\\3634&= -{\frac{1}{2\pi i }}\lim_{T \to \infty}\int_{s=\sigma_o-iT}^{s=\sigma_o+iT}\left({\frac{\zeta'}{\zeta}}\right)(w+s){\frac{C^{s}}{s}}ds. \end{align*}36353636Here, we assume that $\sigma_o$ is sufficiently large and $C$ is not a prime power.36373638One proceeds, as is standard in the derivation of Explicit Formulae, by moving the line of integration to the left, picking up residues along the way. Fix the value of $w= {\frac{1}{2}}+i\theta$ and consider3639$$K_w(s,C):=\ \ {\frac{1}{2\pi i }}\left(\frac{\zeta'}{\zeta}\right)(w+s){\frac{C^{s}}{s}},$$ which has poles at $$s= 0, \ \ \ 1-w,\ \ \ {\rm and }\ \ \ \rho-w, $$ for every zero $\rho$ of $\zeta(s)$.3640We distinguish five cases, giving descriptive names to each:36413642\begin{enumerate} \item {\it Singular pole:} $s= 1-w$.3643\item {\it Trivial poles:} $s= \rho-w$ with $\rho$ a trivial zero of $\zeta(s)$.3644\item {\it Oscillatory poles:} $s= \rho-w = i(\gamma-\theta) \ne 0$ with $\rho= 1/2 + i \gamma (\ne w)$ a nontrivial zero of $\zeta(s)$. (Recall that we are assuming the \RH{}, and our variable $w= {\frac{1}{2}}+i\theta$ runs through complex numbers of real part equal to ${\frac{1}{2}}$. So, in this case, $s$ is purely imaginary.)3645\item {\it Elementary pole:} $s=0$ when $ w$ is not a nontrivial zero of $\zeta(s)$---i.e., when $0=s\ne\rho-w$ for any nontrivial zero $\rho$.3646\item {\it Double pole:} $s=0$ when $ w$ is a nontrivial zero of $\zeta(s)$---i.e., when $0=s=\rho-w$ for some nontrivial zero $\rho$. This, when it occurs, is indeed a double pole, and the residue is given by $m\cdot \log C +\epsilon$. Here $m$ is the multiplicity of the zero $\rho$ (which we expect always---or at least usually---to be equal to $1$) and $\epsilon$ is a constant (depending on $\rho$, but not on $C$).36473648\end{enumerate}36493650The standard technique for the ``Explicit formula'' will provide us with a formula for our function of interest ${\hat \Phi}_{\le C}(\theta)$. The formula has terms resulting from the residues of each of the first three types of poles, and of the {\it Elementary} or the {\it Double} pole---whichever exists. Here is the shape of the formula, given with terminology that is meant to be evocative:365136523653$$ {\bf(1)}\ \ \ {\hat \Phi}_{\le C}(\theta) = {\Sing}_{\le C}(\theta) + {\Triv}_{\le C}(\theta) + {\Osc}_{\le C}(\theta) + {\Elem}_{\le C}(\theta)$$36543655Or:36563657$${\bf(2)}\ \ \ {\hat \Phi}_{\le C}(\theta) = {\Sing}_{\le C}(\theta) + {\Triv}_{\le C}(\theta) + {\Osc}_{\le C}(\theta) + {\Double}_{\le C}(\theta),$$36583659\noindent the first if $w$ is not a nontrivial zero of $\zeta(s)$ and the second if it is.36603661The good news is that the functions ${\Sing}_{\le C}(\theta), {\Triv}_{\le C}(\theta)$ (and also ${\Elem}_{\le C}(\theta)$ when it exists) are smooth (easily describable) functions of the two variables $C$ and $\theta$; for us, this means that they are3662not that related to the essential information-laden {\it discontinuous} structure of $ {\hat \Phi}_{\le C}(\theta)$. Let us bunch these three contributions together and call the sum ${\Smooth}(C,\theta)$, and rewrite the above two formulae as:3663$$ {\bf(1)}\ \ \ {\hat \Phi}_{\le C}(\theta) ={\Smooth}(C,\theta) + {\Osc}_{\le C}(\theta) $$36643665Or:36663667$${\bf(2)}\ \ \ {\hat \Phi}_{\le C}(\theta) = {\Smooth}(C,\theta)+ {\Osc}_{\le C}(\theta) + m\cdot \log C +\epsilon,$$36683669depending upon whether or not $w$ is a nontrivial zero of $\zeta(s)$.367036713672We now focus our attention on the Oscillatory term, ${\Osc}_{\le C}(\theta) $, approximating it by a truncation: $$\Osc_w(C,X):= 2\sum_{|\gamma| <3673X} {{\frac{e^{i\log C\cdot (\gamma-\theta)}}{i(\gamma-\theta)}}}.$$3674Here if a zero has multiplicity $m$, then we count it $m$ times in the sum.3675Also, in this formula we have relegated the ``$\theta$" to the status of a subscript3676(i.e., $w = {\frac{1}{2}} +i\theta$) since we are keeping it constant, and we view the3677two variables $X$ and $C$ as linked in that we want the cutoff ``$X$" to be sufficiently3678large, say $X \gg C^2$, so that the error term can be controlled.36793680At this point, we can perform a ``multiplicative version'' of a Ces\`aro summation---i.e., the operator $F(c) \mapsto ({\Ces} F)(C):= \int_1^CF(c) dc/c.$3681This has the effect of forcing the oscillatory term to be bounded as $C$ tends to infinity.36823683This implies that for any fixed $\theta$,3684\begin{itemize}3685\item ${\Ces}{\hat \Phi}_{\le C}(\theta)$ is bounded independent of $C$ if $\theta$ is not the imaginary part of a nontrivial zero of $\zeta(s)$ and3686\item ${\Ces}{\hat \Phi}_{\le C}(\theta)$ grows as ${\frac{m}{2}}\cdot (\log C)^2 +O(\log C)$ if $\theta$ is the imaginary part of a nontrivial zero of $\zeta(s)$ of multiplicity $m$,3687% WARNING: above item is wrong; grows as should be different: ``we replace "m log(C)+O(1)" by "m/2 log(C)^2 + m gamma log(C)"?''3688\end{itemize}3689giving a theoretical confirmation of the striking feature of the3690graphs of our Chapter~\ref{ch:prime-to-spectrum}.}.369136923693% \ill{phihat_even_all}{1}{Plots of $-\frac{1}{2}{\hat \Phi}_{\le C}(\theta)$ for $C=5$ (top), 20, 50, and 500 (bottom).\label{fig:phic}}3694369536963697%\vskip20pt3698%{\bf William: Figure goes here}3699%\vskip20pt3700% \ill{phihat_even_all}{1}{Plots of $-\frac{1}{2}{\hat \Phi}_{\le C}(\theta)$ for $C=5$ (top), 20, 50, and 500 (bottom).\label{fig:phic}}37013702%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%37033704The $\theta$-coordinates of these spikes {\it seem to be}3705vaguely clustered about a discrete set of positive real numbers.3706%In3707%fact, this is already a hint of what happens to the graphs of these3708%functions ${\frac{1}{C}}{\hat\Phi}_{\le C}(\theta)$ as we allow the cut-off $C$ to3709%tend to infinity.3710These ``spikes" are our first glimpse of a3711certain infinite set of positive real numbers3712$$\theta_1,\theta_2,\theta_3,\dots$$ which constitute the {\bf Riemann spectrum\index{Riemann spectrum} of3713the primes}. If the \RH{} holds, these numbers would3714be the key to the placement of primes on the number line.371537163717371837193720%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%3721372237233724By tabulating these peaks we compute---at least very approximately---$\dots$3725\begin{eqnarray*}3726\theta_1 &=& 14.134725 \dots\\3727\theta_2 &=& 21.022039 \dots\\3728\theta_3 &=& 25.010857 \dots\\3729\theta_4 &=& 30.424876 \dots\\3730\theta_5 &=& 32.935061 \dots\\3731\theta_6 &=& 37.586178 \dots3732\end{eqnarray*}37333734Riemann\index{Riemann, Bernhard} defined this sequence of numbers in his 1859 article in a3735manner somewhat different from the treatment we have given. In that3736article these $\theta_i$ appear as ``imaginary parts of the3737nontrivial zeroes\index{nontrivial zeroes} of his zeta function;''\index{zeta function} we will discuss this3738briefly in Part \ref{part4}, Chapter~\ref{ch:envision} below.37393740374137423743\chapter{How many $\theta_i$'s are there?}37443745The Riemann spectrum\index{Riemann spectrum}, $\theta_1, \theta_2, \theta_3, \dots$ clearly deserves to be well understood! What do we know about this sequence of positive real numbers?37463747Just as we did with the prime numbers\index{prime number} before, we can count3748these numbers $\theta_1=14.1347\ldots$,3749$\theta_2=21.0220\ldots$, etc., and form the staircase3750of Figure~\ref{fig:staircase-riemann-spectrum-30},3751with a step up at $\theta_1$, a step up at $\theta_2$, etc.37523753\ill{staircase-riemann-spectrum-30}{0.8}{The staircase of the Riemann spectrum\index{Riemann spectrum}\label{fig:staircase-riemann-spectrum-30}}37543755Again, just as with the staircase of primes, we might hope3756that as we plot this staircase from a distance as in3757Figures~\ref{fig:staircase-riemann-spectrum-50} and \ref{fig:staircase-riemann-spectrum-1000} that it will3758look like a beautiful smooth curve.375937603761In fact, we know, conditional on RH, the staircase of real numbers $\theta_1, \theta_2, \theta_3,\dots$ is very closely approximated by the3762curve $${\frac{T}{2\pi}}\log {\frac{T}{2\pi e}},$$ (the error term being bounded by a constant times $\log T$).3763376437653766\illtwo{staircase-riemann-spectrum-50}{staircase-riemann-spectrum-100}{0.45}{The staircase of the Riemann spectrum\index{Riemann spectrum} and the curve ${\frac{T}{2\pi}}\log {\frac{T}{2\pi e}}$\label{fig:staircase-riemann-spectrum-50}}37673768\ill{staircase-riemann-spectrum-1000}{0.95}{The staircase of the Riemann spectrum\index{Riemann spectrum} looks like a smooth curve\label{fig:staircase-riemann-spectrum-1000}}376937703771377237733774Nowadays, these mysterious numbers $\theta_i$, these spectral lines for the3775staircase of primes, are known in great abundance and to great accuracy. Here is the smallest3776one, $\theta_1$, given with over $1{,}000$ digits of its decimal3777expansion:37783779\vskip20pt3780{\small3781$14.134725141734693790457251983562470270784257115699243175685567460149\newline37829634298092567649490103931715610127792029715487974367661426914698822545\newline37838250536323944713778041338123720597054962195586586020055556672583601077\newline37843700205410982661507542780517442591306254481978651072304938725629738321\newline37855774203952157256748093321400349904680343462673144209203773854871413783\newline37861735639699536542811307968053149168852906782082298049264338666734623320\newline37870787587617920056048680543568014444246510655975686659032286865105448594\newline37884432062407272703209427452221304874872092412385141835146054279015244783\newline37893835425453344004487936806761697300819000731393854983736215013045167269\newline37906838920039176285123212854220523969133425832275335164060169763527563758\newline37919695376749203361272092599917304270756830879511844534891800863008264831\newline37922516911271068291052375961797743181517071354531677549515382893784903647\newline37934709727019948485532209253574357909226125247736595518016975233461213977\newline37943160053541259267474557258778014726098308089786007125320875093959979666\newline379560675378381214891908864977277554420656532052405$}37963797\vskip20pt3798379938003801\noindent{}and if, by any chance, you wish to peruse the first3802$2{,}001{,}052$3803of these $\theta_i$'s calculated to an accuracy3804within $3\cdot 10^{-9}$, consult Andrew Odlyzko's tables:3805\begin{center}3806\url{http://www.dtc.umn.edu/~odlyzko/zeta_tables}3807\end{center}3808380938103811\chapter{Further questions about the Riemann spectrum\index{Riemann spectrum}}38123813Since people have already computed\footnote{See \url{http://numbers.computation.free.fr/Constants/constants.html} for details.}3814the first $10$ trillion $\theta$'s3815and have never found one with3816multiplicity $>1$, it is generally expected that the multiplicity of all3817the $\theta$'s in the Riemann spectrum\index{Riemann spectrum} is $1$.38183819But, independent of3820that expectation, our convention in what follows will be that we {\it3821count} each of the elements in the Riemann spectrum\index{Riemann spectrum} repeated as many3822times as their multiplicity. So, if it so happens that $\theta_n$3823occurs with multiplicity two, we view the Riemann spectrum\index{Riemann spectrum} as being3824the series of numbers3825$$\theta_1, \theta_2, \dots, \theta_{n-1}, \theta_n, \theta_n, \theta_{n+1}, \dots$$38263827It has been conjectured that there are no infinite arithmetic progressions among these numbers. More broadly, one might expect that there is no visible correlation between the $\theta_i$'s and translation, i.e., that the distribution\index{distribution} of $\theta_i$'s modulo any positive number $T$ is random, as in Figure~\ref{fig:zero-spacing}.3828382938303831\illtwo{zero-spacing-mod2pi}{zero-spacing-mod1}{0.4}{Frequency histogram of Odlyzko's computation of the Riemann spectrum\index{Riemann spectrum} modulo $2\pi$ (left) and modulo 1 (right)\label{fig:zero-spacing}}38323833In analogy with the discussion of prime gaps in Chapter~\ref{ch:further}, we might compute {\it pair correlation functions} and the statistics of gaps between successive $\theta$'s in the Riemann spectrum\index{Riemann spectrum} (see {\url{http://www.baruch.cuny.edu/math/Riemann_Hypothesis/zeta.zero.spacing.pdf}). This study was begun by H.\thinspace{}L. Montgomery and3834F.\thinspace{}J. Dyson.3835As Dyson noticed, the distributions\index{distribution} one gets from the Riemann spectrum\index{Riemann spectrum} bears a similarity to the distributions of eigenvalues of a random unitary matrix.{\footnote{Any connection of the Riemann spectrum to eigenvalues of matrices would indeed be exciting for our understanding of the \RH{}, in view of what is known as the {\it Hilbert-P{\'o}lya Conjecture} (see {\url{http://en.wikipedia.org/wiki/Hilbert\%E2\%80\%93P\%C3\%B3lya_conjecture}}).}} This has given rise to what is know as the {\it random matrix heuristics}, a powerful source of conjectures for number theory and other branches of mathematics.38363837Here is a histogram of the distribution\index{distribution} of differences $\theta_{i+1}-\theta_i$:3838\383938403841\ill{riemann_spectrum_gaps}{0.95}{Frequency histogram of the3842first 99{,}999 gaps in the Riemann spectrum\label{fig:riemann_spectrum_gaps}}38433844\chapter[From the Riemann spectrum to primes]{Going3845from the Riemann spectrum to the primes}\index{Riemann spectrum}38463847To justify the name {\it Riemann spectrum of primes} we will3848investigate graphically whether in an analogous manner we can use this3849spectrum to get information about the placement of prime numbers\index{prime number}. We3850might ask, for example, if there is a trigonometric function with3851frequencies given by this collection of real3852numbers, $$\theta_1,\theta_2,\theta_3,\dots$$ that somehow pinpoints3853the prime powers, just as our functions $${\hat \Phi}(\theta)_{\le C}=38542\sum_{{\rm prime\ powers\ } p^n \le C}3855p^{-m/2}\log(p)\cos(n\log(p)\theta)$$ for large $C$ pinpoint the3856spectrum\index{spectrum} (as discussed in the previous two chapters).38573858To start the return game, consider this sequence of trigonometric functions that have ($zero$ and) the $\theta_i$ as spectrum\index{spectrum}38593860$$G_C(x):= 1+ \sum_{i < C}\cos(\theta_i\cdot x).$$38613862As we'll see presently it is best to view these functions on a logarithmic scale so we will make the substitution of variables $x = \log(s)$ and write38633864$$H_C(s):= G_C(\log(s))= 1+ \sum_{i < C}\cos(\theta_i\cdot \log(s)).$$386538663867The theoretical story behind the phenomena that we will3868see graphically in this chapter is a manifestation of3869Riemann's explicit formula. For modern text references that discuss this general subject, see endnote \bibnote{ A reference for this is:38703871\vskip10pt3872{\bf [I-K]: }H. Iwaniec; E. Kowalski, {\bf Analytic Number Theory}, {\em American Mathematical Society Colloquium Publications} {\bf 53} (2004).38733874(See also the bibliography there.)3875\vskip10pt3876Many3877ways of seeing the explicit relationship are given in Chapter 53878of {\bf [I-K]}. For example, consider Exercise 5 on page 109.38793880$$\sum_{\rho}{\hat \phi}(\rho) = -\sum_{n \ge 1}\Lambda(n)\phi(n) + I(\phi),$$38813882where \begin{itemize} \item $\phi$ is any smooth complex-valued3883function on $[1,+\infty)$ with compact support,\item ${\hat \phi}$ is3884its Mellin transform $${\hat \phi}(s):=3885\int_0^{\infty}\phi(x)x^{s-1}dx,$$ \item the last term on the right,3886$I(\phi)$, is just3887$$I(\phi):= \int_1^{\infty}(1-{\frac{1}{x^3-x}})\phi(x)dx$$ (coming from the pole at $s=1$ and the ``trivial zeroes''). \item The more serious summation on the left hand side of the equation is over the nontrivial zeroes $\rho$, noting that if $\rho$ is a nontrivial zero so is~${\bar \rho}$. \end{itemize}3888388938903891Of course, this ``explicit formulation'' is not {\it immediately}3892applicable to the graphs we are constructing since we cannot naively3893take ${\hat \phi}$ to be a function forcing the left hand side to be3894$G_C(x)$.38953896See also Exercise 7 on page 112, which discusses the sums3897$$3898x - \sum_{|\theta| \le C} {\frac{x^{{\frac{1}{2}} +i\theta}-1}{{\frac{1}{2}} +i\theta}}.3899$$}390039013902\ill{phi_cos_sum_2_30_1000}{.8}{Illustration of $-\sum_{i=1}^{1000}3903\cos(\log(s)\theta_i)$, where $\theta_1 \sim 14.13, \ldots$ are the3904first $1000$ contributions to the Riemann spectrum\index{Riemann spectrum}. The red dots3905are at the prime powers $p^n$, whose size is proportional to3906$\log(p)$.}39073908\ill{phi_cos_sum_26_34_1000}{.8}{Illustration of $-\sum_{i=1}^{1000}3909\cos(\log(s)\theta_i)$ in the neighborhood of a twin prime. Notice3910how the two primes $29$ and $31$ are separated out by the Fourier3911series, and how the prime powers $3^3$ and $2^5$ also appear.}39123913\ill{phi_cos_sum_1010_1026_15000}{.7}{Fourier series from $1,000$ to3914$1,030$ using 15,000 of the numbers $\theta_i$. Note the twin3915primes $1{,}019$ and $1{,}021$ and that $1{,}024=2^{10}$.}391639173918\part{Back to Riemann\label{part4}}391939203921\chapter[Building $\pi(X)$ from the spectrum]{How to build $\pi(X)$ from the spectrum (Riemann's way)}3922We have been dealing in Part~\ref{part3} of our book with $\Phi(t)$, a3923distribution\index{distribution} that---we said---contains all the essential information3924about the placement of primes among numbers. We have given a clean3925restatement of Riemann's hypothesis, the third restatement so far,3926in term of this $\Phi(t)$. But $\Phi(t)$ was the effect of a series3927of recalibrations and reconfigurings of the original untampered-with3928staircase of primes. A test of whether we have strayed from our3929original problem---to understand this staircase---would be whether3930we can return to the original staircase, and ``reconstruct it'' so to3931speak, solely from the information of $\Phi(t)$---or equivalently,3932assuming the \RH{} as formulated in Chapter~\ref{sec:tinkering}---can3933we construct the staircase of primes $\pi(X)$ solely3934from knowledge of the sequence of real numbers $\theta_1,3935\theta_2,\theta_3,\dots$?39363937393839393940The answer to this is yes (given the \RH{}), and is discussed very3941beautifully by Bernhard Riemann\index{Riemann, Bernhard} himself in his famous 1859 article.39423943Bernhard Riemann used the spectrum\index{spectrum} of the prime numbers\index{prime number} to provide3944an exact analytic formula that analyzes and/or synthesizes the3945staircase of primes. This formula is motivated by Fourier's3946analysis of functions as constituted out of cosines.3947%Riemann\index{Riemann, Bernhard} started3948%with a specific smooth function, which we will refer to as $R(X)$, a3949% function that Riemann offered, just as Gauss\index{Gauss, Carl Friedrich} offered his $\Li(X)$,3950% as a candidate smooth function approximating the staircase of3951%primes.3952Recall from Chapter~\ref{sec:rh1} that Gauss's guess is3953$\Li(X)= \int_2^{X}dt/{\rm log}(t).$ To continue this discussion, we do need some familiarity with complex numbers, for the definition of Riemann's\index{Riemann, Bernhard} exact formula3954requires extending3955the definition of the function $\Li(X)$ to make sense for3956complex numbers $X=a+bi$. In fact, more naturally, one might work with the path integral $\li(X):= \int_0^{X}dt/{\rm log}(t).$39573958Riemann \index{Riemann, Bernhard} begins his discussion3959(see Figure~\ref{fig:riemann_RX}) by defining3960$$3961R(X) = \sum_{n=1}^{\infty}{\mu(n)\over n} \li(X^{1\over n})= \lim_{N\to \infty} R^{(N)}(X):= \lim_{N\to \infty} \sum_{n=1}^{N}{\mu(n)\over n} \li(X^{1\over n}),3962$$3963where $R^{(N)}(X)$ denotes the truncated sum, which one can compute as an approximation.3964\ill{riemann_RX}{1}{Riemann\index{Riemann, Bernhard} defining $R(X)$ in his manuscript\label{fig:riemann_RX}}3965In all the discussion of this section the order of summation is3966important. For such considerations and issues regarding actual3967computation we refer to Riesel-Gohl (see \url{http://wstein.org/rh/rg.pdf}).39683969Here $\mu(n)$ is the M\"{o}bius function\index{M\"{o}bius function}3970which is defined by3971$$3972\mu(n) = \begin{cases}39731 &3974\mbox{\begin{minipage}{0.6\textwidth}if $n$ is a square-free3975positive integer with an even number of distinct prime3976factors,\end{minipage}}\vspace{1em}\\3977-1& \mbox{\begin{minipage}{0.6\textwidth}if $n$ is a square-free3978positive integer with an odd number of distinct3979prime factors,\end{minipage}}\vspace{1em}\\39800 & \mbox{if $n$ is not square-free.}3981\end{cases}3982$$3983See Figure~\ref{fig:moebius} for a plot of the M\"{o}bius function.\index{M\"{o}bius function}39843985\ill{moebius}{1}{The blue dots plot the values of the M\"{o}bius function\index{M\"{o}bius function} $\mu(n)$, which is only defined at integers.\label{fig:moebius}}398639873988In Chapter~\ref{sec:pnt} we encountered the Prime Number Theorem,\index{prime number} which3989asserts that $X/\log(X)$ and $\Li(X)$ are both approximations for3990$\pi(X)$, in the sense that both go to infinity at the same rate. That is, the ratio of any two of these three functions tends to $1$ as $X$ goes to $\infty$.3991Our first formulation of the \RH{} (see page~\pageref{rh:first}) was3992that $\Li(X)$ is an essentially square root accurate approximation of3993$\pi(X)$. Figures~\ref{fig:guess100}--\ref{fig:guess10000} illustrate3994that Riemann's function $R(X)$ appears to be an even better3995approximation to $\pi(X)$ than anything we have seen before.39963997\illtwo{pi_riemann_gauss_100}{pi_riemann_gauss_1000}{0.47}{Comparisons of $\Li(X)$ (top), $\pi(X)$ (middle), and $R(X)$ (bottom, computed using 100 terms)\label{fig:guess100}}399839994000\ill{pi_riemann_gauss_10000-11000}{0.8}{Closeup comparison of $\Li(X)$ (top), $\pi(X)$ (middle), and $R(X)$ (bottom, computed using 100 terms)\label{fig:guess10000}}400140024003Think of Riemann's smooth curve $R(X)$ as the {\em fundamental}4004approximation to $\pi(X)$.4005Riemann\index{Riemann, Bernhard} offered much more than just a (conjecturally) better4006approximation to $\pi(X)$ in his wonderful 1859 article4007(see Figure~\ref{fig:riemann_Rk}).4008He found a way to construct what looks vaguely like a Fourier series,4009but with $\sin(X)$ replaced by $R(X)$ and its spectrum\index{spectrum} the $\theta_i$, which4010conjecturally equals $\pi(X)$ (with a slight correction if the number $X$ is itself a prime).4011\ill{riemann_Rk}{1}{Riemann's\index{Riemann, Bernhard} analytic formula for $\pi(X)$\label{fig:riemann_Rk}}4012In this manner, Riemann gave an infinite sequence of improved guesses, beginning with $R_0(X)$ (see equation (18) of Riesel-Gohl at \url{http://wstein.org/rh/rg.pdf}) a modification of $R(X)$ that takes account of the pole and the trivial zeroes of the Riemann zeta-function, and then considered a sequence:401340144015$$4016R_0(X),\quad R_1(X), \quad R_2(X), \quad R_3(X), \quad \ldots4017$$4018and he hypothesized that one and all of them were all4019essentially square root approximations to $\pi(X)$,4020and that the sequence of these better and better approximations converge to give an exact formula4021for $\pi(X)$.40224023Thus not only did Riemann\index{Riemann, Bernhard} provide a ``fundamental'' (that is, a smooth curve4024that is astoundingly close to $\pi(X)$) but he viewed this as just a4025starting point, for he gave the recipe for providing an infinite4026sequence of corrective terms---call them Riemann's {\em harmonics}\index{harmonics}; we4027will denote the first of these ``harmonics'' $C_1(X)$, the second4028$C_2(X)$, etc. Riemann\index{Riemann, Bernhard} gets his first corrected curve, $R_1(X)$, from4029$R_0(X)$ by adding this first harmonic to the fundamental, $$R_1(X) =4030R_0(X) + C_1(X),$$ he gets the second by correcting $R_1(X)$ by adding4031the second harmonic $$R_2(X) = R_1 (X) + C_2(X),$$ and so on $$R_3(X)4032= R_2 (X) + C_3(X),$$ and in the limit provides us with an exact fit.403340344035The \RH{}, if true, would tell us that these correction4036terms $C_1(X), C_2(X), C_3(X),\dots$ are all {\em square-root small}4037%,4038%and all the successively corrected smooth curves $$R_0(X), R_1(X),4039%R_2(X), R_3(X),\dots$$ are good approximations to $\pi(X)$.4040%Moreover,4041%$$4042%\pi(X) = R_0(X) + \sum_{k=1}^{\infty} C_k(X).4043%$$40444045The elegance of Riemann's\index{Riemann, Bernhard} treatment of this problem is that the4046corrective terms $C_k(X)$ are all {\em modeled on} the fundamental4047$R(X)$ and are completely described if you know the sequence of real4048numbers $\theta_1, \theta_2, \theta_3,\dots$ of the last section.404940504051Assuming the \RH{}, the Riemann\index{Riemann, Bernhard} correction4052terms $C_k(X)$ are defined to be4053$$4054C_k(X)= -R(X^{\frac{1}{2} + i\theta_k}) -R(X^{\frac{1}{2} - i\theta_k}),4055$$4056where $\theta_1 = 14.134725\dots, \theta_2 = 21.022039\dots$, etc.,4057is the spectrum\index{spectrum} of the prime numbers\index{prime number} \bibnote{You may well ask how we propose to order these correction terms if RH4058is false. Order them in terms of4059(the absolute value of) their imaginary part, and in the unlikely4060situation that there is more than one zero with the same imaginary4061part, order zeroes of the same imaginary part by their real parts,4062going from right to left.}.40634064In sum, Riemann\index{Riemann, Bernhard} provided an extraordinary recipe that allows us to work4065out the harmonics\index{harmonics}, $$C_1(X),\quad C_2(X),\quad C_3(X),\quad \dots$$ without our having4066to consult, or compute with, the actual staircase of primes. As with4067Fourier's modus operandi where both {\em fundamental} and all {\em4068harmonics}\index{harmonics} are modeled on the sine wave, but appropriately4069calibrated, Riemann\index{Riemann, Bernhard} fashioned his higher harmonics\index{harmonics}, modeling them all4070on a single function, namely $R(X)$.40714072The convergence of $R_k(X)$ to $\pi(X)$ is strikingly illustrated4073in the plots in Figures~\ref{fig:rkfirst}--\ref{fig:rklast} of $R_k$ for various values of $k$.407440754076\ill{Rk_1_2_100}{.9}{The function $R_{1}$ approximating the staircase of primes up to $100$\label{fig:rkfirst}}40774078\ill{Rk_10_2_100}{.9}{The function $R_{10}$ approximating the staircase of primes up to $100$}40794080\ill{Rk_25_2_100}{.9}{The function $R_{25}$ approximating the staircase of primes up to $100$}40814082\ill{Rk_50_2_100}{.9}{The function $R_{50}$ approximating the staircase of primes up to $100$}408340844085\ill{Rk_50_2_500}{.9}{The function $R_{50}$ approximating the staircase of primes up to $500$}40864087\ill{Rk_50_350_400}{.9}{The function $\Li(X)$ (top, green), the function $R_{50}(X)$ (in blue), and the staircase of primes on the interval from 350 to 400.\label{fig:rklast}}408840894090\chapter[As Riemann envisioned it]{As Riemann envisioned it, the zeta function relates the staircase of primes to its Riemann\index{Riemann, Bernhard} spectrum\index{Riemann spectrum}\label{ch:envision}}4091In the previous chapters we have described---using Riemann's4092Hypothesis---how to obtain the {\it4093spectrum}\index{spectrum} $$\theta_1,\theta_2,\theta_3,\dots$$ from the staircase of4094primes, and hinted at how to go back. Roughly speaking, we were4095performing ``Fourier transformations'' to make this transit. But4096Riemann\index{Riemann, Bernhard}, on the very first page of his 1859 memoir, construes the4097relationship we have been discussing, between spectrum\index{spectrum} and staircase,4098in an even more profound way.40994100To talk about this extraordinary insight of Riemann\index{Riemann, Bernhard}, one would need to4101do two things that might seem remote from our topic, given our4102discussion so far.41034104\begin{itemize} \item We will discuss a key idea that Leonhard Euler\index{Euler, Leonhard}4105had (circa 1740).4106\item To follow the evolution of this idea in the hands of Riemann\index{Riemann, Bernhard}, we would have to assume familiarity with basic complex analysis.4107\end{itemize}41084109We will say only a few words here about this, in hopes of giving at4110least a shred of a hint of how marvelous Riemann's idea is. We will be drawing, at this point, on some further mathematical background. For4111readers who wish to pursue the themes we discuss, here is a list of sources,4112that are our favorites among those meant to be read by a somewhat4113broader audience than people very advanced in the subject. We list4114them in order of ``required background.''41154116\begin{enumerate}4117\item John Derbyshire's {\it Prime Obsession: Bernhard Riemann and4118the Greatest Unsolved Problem in Mathematics} (2003). We have already4119mentioned this book in our foreword, but feel that it is so4120good, that it is worth a second mention here.4121\item The Wikipedia entry for Riemann's Zeta Function\index{zeta function}4122(\url{http://en.wikipedia.org/wiki/Riemann\_zeta\_function}). It is4123difficult to summarize who wrote this, but we feel that it is a4124gift to the community in its clarity. Thanks authors!4125\item Enrico Bombieri's article \bibnote{Bombieri, {\em The Riemann4126Hypothesis}, available at4127\url{http://www.claymath.org/sites/default/files/official_problem_description.pdf}}.4128To comprehend all ten pages of this excellent and fairly thorough4129account may require significant background, but try your hand at4130it; no matter where you stop, you will have seen good things in4131what you have read.4132\end{enumerate}4133\vskip20pt4134{\bf Leonhard Euler's\index{Euler, Leonhard} idea ($\simeq$1740):}4135As readers of Jacob Bernoulli's {\it Ars Conjectandi} (or of the works4136of John Wallis) know, there was in the early 18th century4137already a rich mathematical theory of4138finite sums of (non-negative $k$-th powers) of consecutive integers.4139This sum, $$S_k(N) = 1^k+2^k+3^k+\cdots +(N-1)^k,$$ is a polynomial4140in $N$ of degree $k+1$ with no constant term, a leading term equal to4141${1\over k+1}N^{k+1}$, and a famous linear term. The coefficient of4142the linear term of the polynomial $S_k(N)$ is the {\it Bernoulli4143number} $B_{k}$:4144\begin{align*}4145S_1(N) &= 1+2+3+ \dots + (N-1) \ = \ {N(N-1)\over 2} \ = \ {N^2\over 2}\ -\ {1\over 2}\cdot N,\\4146S_2(N) &= 1^2+2^2+3^2+ \dots + (N-1)^2 \ = \ {N^3\over 3} +\cdots -\ {1\over 6}\cdot N,\\4147S_3(N) &= 1^3+2^3+3^3+ \dots + (N-1)^3 \ = \ {N^4\over 4} +\cdots -\ 0\cdot N,\\4148S_4(N) &= 1^4+2^4+3^4+ \dots + (N-1)^4 \ = \ {N^5\over 5} +\cdots -\ {1\over 30}\cdot N,4149\end{align*}4150etc.4151For odd integers $k4152> 1$ this linear term vanishes. For even integers $2k$ the Bernoulli number4153$B_{2k}$ is the rational number given by the coefficient4154of $\frac{x^{2k}}{(2k)!}$ in the power series expansion4155\ \newline \bigskip \ \newline4156$${x\over e^x-1} \ = \ 1 - {x\over 2} + \sum_{k=1}^{\infty}4157(-1)^{k+1}B_{2k}{x^{2k}\over{(2k)}!}.$$4158\ \newline \bigskip \ \newline4159So4160$$ B_2 = {1\over 6},\ \ \ \ \ \ \ \ B_4 = {1\over 30},\ \ \ \ \ \ \ \ B_6 = {1\over 42},\ \ \ \ \ \ \4161\ B_8 = {1\over 30},$$ and to convince you that the numerator of these numbers is not always $1$, here are4162a few more:4163$$B_{10} = {5\over 66},\ \ \ \ \ \ \ \ B_{12}= {691\over 2730},\ \ \ \ \ \ \ \ B_{14} = {7\over 6}.$$4164\ \newline \bigskip \ \newline4165If you turn attention to sums of negative $k$-th powers of consecutive4166integers, then when $k= -1$, $$S_{-1}(N)= {\frac{1}{1}}+4167{\frac{1}{2}}+ {\frac{1}{3}}+ \cdots+ {\frac{1}{N}}$$ tends to infinity4168like $\log(N)$, but for $k < -1$ we are facing the sum of reciprocals4169of powers (of exponent $>1$) of consecutive whole numbers, and4170$S_k(N)$ converges. This is the first appearance of the zeta function\index{zeta function}4171$\zeta(s)$ for arguments $s=2,3,4,\dots$ So let us denote these limits4172by notation that has been standard, after Riemann\index{Riemann, Bernhard}:4173$$\zeta(s):= {\frac{1}{1^s}}+ {\frac{1}{2^s}}+ {\frac{1}{3^s}}+ \cdots$$4174The striking reformulation that Euler\index{Euler, Leonhard} discovered was the expression of4175this infinite sum as an infinite product of factors associated to the4176prime numbers:\index{prime number}4177$$4178\zeta(s)=\sum_n{\frac{1}{n^s}} =4179\prod_{p \ {\rm prime}}{\frac{1}{1-p^{-s}}},4180$$4181where the infinite sum on the left and the infinite product on the4182right both converge (and are equal) if $s > 1$. He also evaluated these4183sums at even positive integers, where---surprise---the Bernoulli4184numbers come in again; and they and $\pi$ combine to yield the values of the zeta function\index{zeta function} at even positive integers:41854186\begin{align*}4187\zeta(2) &= {\frac{1}{1^2}}+{\frac{1}{2^2}}+\cdots = \pi^2/6 \simeq 1.65\dots\\4188\zeta(4) &= {\frac{1}{1^4}}+{\frac{1}{2^4}}+\cdots = \pi^4/90\simeq 1.0823\dots4189\end{align*}4190and, in general,4191$$\zeta(2n) = {\frac{1}{1^{2n}}}+{\frac{1}{2^{2n}}}+\cdots \ \ \ = \ \ \ (-1)^{n+1}B_{2n}\pi^{2n}\cdot {\frac{2^{2n-1}}{(2n)!}}.$$41924193A side note to Euler's\index{Euler, Leonhard} formulas comes from the fact (only known much4194later) that no power of $\pi$ is rational: do you see how to use4195this to give a proof that there are infinitely many primes,4196combining Euler's\index{Euler, Leonhard} infinite product expansion displayed above with4197the formula for $\zeta(2)$, or with the formula for $\zeta(4)$, or,4198in fact, for the formulas for $\zeta(2n)$ for {\it any} $n$ you4199choose?42004201{\bf Pafnuty Lvovich Chebyshev's\index{Chebyshev, Pafnuty Lvovich} idea ($\simeq$1845):} The second moment4202in the history of evolution of this function $\zeta(s)$ is when4203Chebyshev\index{Chebyshev, Pafnuty Lvovich} used the {\it same formula} as above in the extended range4204where $s$ is allowed now to be a real variable---not just an4205integer---greater than $1$. Making use of this extension of the4206range of definition of Euler's\index{Euler, Leonhard} sum of reciprocals of powers of4207consecutive whole numbers, Chebyshev\index{Chebyshev, Pafnuty Lvovich} could prove that for large $x$4208the ratio of $\pi(x)$ and $x/\log(x)$ is bounded above and below by4209two explicitly given constants. He also proved that there exists a4210prime number\index{prime number} in the interval bounded by $n$ and $2n$ for any positive4211integer $n$ (this was called {\it Bertrand's postulate}; see \url{http://en.wikipedia.org/wiki/Proof_of_Bertrand%27s_postulate}).4212\vskip20pt42134214{\bf Riemann's idea (1859):} It is in the third step of the4215evolution of $\zeta(s)$ that something quite surprising4216happens. Riemann\index{Riemann, Bernhard} extended the range of Chebyshev's\index{Chebyshev, Pafnuty Lvovich} sum of4217reciprocals of positive real powers of consecutive whole numbers4218allowing the argument $s$ to range over the entire complex plane $s$4219(avoiding $s=1$). Now this is a more mysterious extension of4220Euler's\index{Euler, Leonhard} function, and it is deeper in two ways:42214222\begin{itemize}4223\item The formula $$\zeta(s):= {\frac{1}{1^s}}+ {\frac{1}{2^s}}+4224{\frac{1}{3^s}}+ \cdots$$ does converge when the real part of the4225exponent $s$ is greater than $1$ (i.e., this allows us to use the4226same formula, as Chebyshev\index{Chebyshev, Pafnuty Lvovich} had done, for the right half plane4227in the complex plane determined by the condition $s=x+iy$ with $x>1$4228but not beyond this). You can't simply use the same formula for the4229extension.\item So you must face the fact that if you wish to4230``extend'' a function beyond the natural range in which its defining4231formula makes sense, there may be many ways of doing it.4232\end{itemize}42334234To appreciate the second point, the theory of a complex variable is4235essential. The {\it uniqueness} (but not yet the {\it existence}) of4236Riemann's extension of $\zeta(s)$ to the entire complex plane is4237guaranteed by the phenomenon referred to as {\it analytic4238continuation}. If you are given a function on any infinite subset4239$X$ of the complex plane that contains a limit point, and if you are4240looking for a function on the entire complex plane{\footnote{ or to4241any connected open subset that contains $X$}} that is4242differentiable in the sense of complex analysis, there may be no4243functions at all that have that property, but if there is one, that4244function is {\it unique}. But Riemann\index{Riemann, Bernhard} succeeded: he was indeed able4245to extend Euler's\index{Euler, Leonhard} function to the entire complex plane except for the4246point $s=1$, thereby defining what we now call {\it Riemann's zeta4247function}. \vskip20pt42484249Those ubiquitous Bernoulli numbers reappear yet again as4250values of this {\it extended} zeta function\index{zeta function} at negative integers:4251$$\zeta(-n) = -B_{n+1}/(n+1)$$4252so since the Bernoulli numbers indexed by odd integers $>1$ all4253vanish, the extended zeta function\index{zeta function} $\zeta(s)$ actually vanishes at4254{\it all} even negative integers.42554256The even integers $-2, -4, -6, \dots$ are often called the {\bf4257trivial zeroes} of the Riemann zeta function.\index{zeta function} There are indeed4258other zeroes of the zeta function,\index{zeta function} and those other zeroes could---in no way---be4259dubbed ``trivial,'' as we shall shortly see.4260\vskip20pt4261It is time to consider these facts:42624263\begin{enumerate}\item {\bf Riemann's zeta function\index{zeta function} codes the4264placement of prime powers among all numbers.} The key here is to4265take the logarithm and then the derivative of $\zeta(s)$ (this boils4266down to forming ${\frac{d{\zeta}}{ds}}(s)/\zeta(s)$). Assuming4267that the real part of $s$ is $>1$, taking the logarithm of4268$\zeta(s)$---using Euler's\index{Euler, Leonhard} infinite product formulation---gives4269us $$\log\zeta(s)= \sum_{p \ {\rm4270prime}}-\log{\big(1-p^{-s}\big)},$$ and we can do this4271term-by-term, since the real part of $s$ is $>1$. Then taking the4272derivative gives us:42734274\vskip20pt4275$${\frac{d{\zeta}}{ds}}(s)/\zeta(s)= -\sum_{n=1}^{\infty}\Lambda(n)n^{-s}$$4276where4277$$4278\Lambda(n) :=4279\begin{cases}4280\log(p) & \text{when $n= p^k$ for $p$ a prime number and $k >0$, and}\\42810 & \text{if $n$ is not a power of a prime number.}4282\end{cases}4283$$4284In particular, $\Lambda(n)$ ``records" the placement of prime powers.42854286%\todo{Put in log deriv of zeta.}4287%$$ {\frac{d{\zeta}}{ds}}(s)/\zeta(s)=-\sum_{p \ {\rm prime}}\log(p)p^{-s}.$$ For any particular $s$ this is an infinite sum which tallies the primes $p$, each weighted by the quantity $\log(p)p^{-s}$ and is absolutely convergent if the real part of $s$ is $>1$. % note : this forgets the prime powers.42884289\vskip20pt42904291\item {\bf You know lots about an analytic function if you know its zeroes and4292poles.} For example for polynomials, or4293even rational functions: if someone told you that a4294certain rational function $f(s)$ vanishes to order $1$ at $0$ and at4295$\infty$; and that it has a double pole at $s=2$ and at all other4296points has finite nonzero values, then you can immediate say that4297this mystery function is a nonzero constant times $s/(s-2)^2$.42984299Knowing the zeroes and poles (in the complex plane) alone of the4300Riemann zeta function\index{zeta function} doesn't entirely pin it down---you have to4301know more about its behavior at infinity since---for example,4302multiplying a function by $e^z$ doesn't change the structure of its4303zeroes and poles in the finite plane. But a complete understanding4304of the zeroes and poles of $\zeta(s)$ will give all the information4305you need to pin down the placement of primes among all numbers.43064307So here is the score:43084309\begin{itemize} \item As for poles, $\zeta(s)$ has only one pole. It4310is at $s=1$ and is of order $1$ (a ``simple pole'').\item As for4311zeroes, we have already mentioned the trivial zeroes (at negative4312even integers), but $\zeta(s)$ also has infinitely many {\it4313nontrivial} zeroes. These nontrivial zeroes\index{nontrivial zeroes} are known to lie in4314the vertical strip $$0\ < \ {\rm real\ part\ of\ } s\ < \43151.$$\end{itemize}43164317\end{enumerate}43184319And here is yet another equivalent statement of Riemann's4320Hypothesis---this being the formulation closest to the one given in4321his 1859 memoir:43224323\begin{center}4324\shadowbox{ \begin{minipage}{0.9\textwidth}4325\mbox{} \vspace{0.2ex}4326\begin{center}{\bf\large The {\bf \RH{}} (fourth formulation)}\end{center}4327\medskip43284329All the nontrivial zeroes\index{nontrivial zeroes} of $\zeta(s)$ lie on the vertical4330line in the complex plane consisting of the4331complex numbers with real part equal to $1/2$.4332These zeroes are none other than4333${\frac{1}{2}}\pm i\theta_1,{\frac{1}{2}}\pm i\theta_2,4334{\frac{1}{2}}\pm i\theta_3,\dots,$ where $\theta_1, \theta_2,4335\theta_3, \dots$ comprise the spectrum\index{spectrum} of primes we talked4336about in the earlier chapters.43374338\vspace{1ex}4339\end{minipage}}4340\end{center}43414342The ``${\frac{1}{2}}$" that appears in this formula is directly related to the fact---correspondingly conditional on RH---that $\pi(X)$ is ``square-root accurately"\index{square-root accurate} approximated by $\Li(X)$. That is, the error term is bounded by $X^{{\bf{\frac{1}{2}}}+\epsilon}.$ It has been conjectured that {\it all} the zeroes of $\zeta(s)$ are simple zeroes.434343444345Here is how Riemann\index{Riemann, Bernhard} phrased RH:43464347\begin{quote}4348``One now finds indeed approximately this number of real roots4349within these limits, and it is very probable that all roots are4350real. Certainly one would wish for a stricter proof here; I have4351meanwhile temporarily put aside the search for this after some4352futile attempts, as it appears unnecessary for the next objective of4353my investigation.''4354\end{quote}43554356In the above quotation, Riemann's roots are the $\theta_i$'s and the statement that they are ``real" is equivalent to RH.43574358The zeta function,\index{zeta function} then, is the vise, that so elegantly clamps4359together information about the placement of primes and their4360spectrum\index{spectrum}!436143624363That a simple geometric property of these zeroes (lying on a line!) is4364directly equivalent to such profound (and more difficult to express)4365regularities among prime numbers\index{prime number} suggests that these zeroes and the4366parade of Riemann's corrections governed by them---when we truly4367comprehend their message---may have lots more to teach us, may4368eventually allow us a more powerful understanding of arithmetic. This4369infinite collection of complex numbers, i.e., the nontrivial zeroes\index{nontrivial zeroes} of4370the Riemann zeta function,\index{zeta function} plays a role with respect to $\pi(X)$ rather4371like the role the {\em spectrum}\index{spectrum} of the Hydrogen atom plays in4372Fourier's theory. Are the primes themselves no more than an4373epiphenomenon, behind which there lies, still veiled from us, a4374yet-to-be-discovered, yet-to-be-hypothesized, profound conceptual key4375to their perplexing orneriness? Are the many innocently posed, yet4376unanswered, phenomenological questions about numbers---such as in the4377ones listed earlier---waiting for our discovery of this deeper level4378of arithmetic? Or for layers deeper still? Are we, in fact, just at4379the beginning?43804381438243834384These are not completely idle thoughts, for a tantalizing analogy4385relates the number theory we have been discussing to an already4386established branch of mathematics---due, largely, to the work of4387Alexander Grothendieck,\index{Grothendieck, Alexandre} and Pierre Deligne\index{Deligne, Pierre}---where the corresponding4388analogue of Riemann's hypothesis has indeed been proved\ldots.4389439043914392\chapter{Companions to the zeta function\index{zeta function}}\label{ch:companions}43934394Our book, so far, has been exclusively about Riemann's $\zeta(s)$ and4395its zeroes. We have been discussing how (the placement of) the zeroes4396of $\zeta(s)$ in the complex plane contains the information needed to4397understand (the placement of) the primes in the set of all whole4398numbers; and conversely.43994400It would be wrong---we think---if we don't even mention that4401$\zeta(s)$ fits into a broad family of similar functions that connect to other problems in number theory.440244034404% for every {\it quadratic number field} $K:={\bf Q}({\sqrt4405% d})$, there is a corresponding ``$L$-function'' $L(K,s)$ with an4406%expansion of the form $$L(K,s)= \sum_{N=1}^{\infty} a_n(K)\cdot4407%n^{-s}$$ convergent for complex numbers $s$ with real part $> 1$---and4408%where the coefficients $a_n(K)$ are either $0$ or $\pm 1$. The4409%product $\zeta(s)\cdot L(K,s)$ is called the {\it zeta function\index{zeta function} of4410% $K$} and is sometimes denoted $\zeta_K(s)$ to distinguish it from4411%the Riemann zeta function.\index{zeta function} The ``nontrivial zeroes''\index{nontrivial zeroes} of $\zeta_K(s)$4412%contains the information needed to understand the (placement of) the4413%norms of prime ideals of the ring of integers of $K$.44144415For example---instead of the ordinary integers---consider the {\it4416Gaussian integers}.\index{Gaussian integers} This is the collection of numbers4417$$ \{ a+bi\}$$4418where $i = {\sqrt{-1}}$ and $a,b$ are ordinary integers. We can add4419and multiply two such numbers and get another of the same form. The4420only ``units'' among the Gaussian integers\index{Gaussian integers} (i.e., numbers whose4421inverse is again a Gaussian integer) are the four numbers $\pm 1, \pm4422i$ and if we multiply any Gaussian integer $a+bi$ by any of these4423four units, we get the collection $\{a+bi, -a-bi, -b+ai, b -ai\}$.4424We measure the {\it size} of a Gaussian integer by the square of its4425distance to the origin, i.e., $$|a+bi|^2 = {a^2+b^2}.$$ This size function is called the {\bf norm} of the Gaussian integer $a+bi$ and can also be thought of as the product of $a+bi$ and its ``conjugate'' $a-bi$. Note that4426the norm is a nice multiplicative function on the set of Gaussian4427integers, in that the norm of a product of two Gaussian integers\index{Gaussian integers} is4428the product of the norms of each of them.44294430We have a natural notion of {\bf prime Gaussian integer}, i.e., one4431with $a>0$ and $b\geq 0$ that cannot be factored\index{factor} as the product of4432two Gaussian integers\index{Gaussian integers} of smaller size.4433Given that every nonzero Gaussian integer is uniquely expressible as a unit times a product of prime Gaussian integers, can you prove that if a Gaussian integer is a prime4434Gaussian integer, then its size must either be an ordinary prime4435number, or the square of an ordinary prime number?\index{prime number}44364437Figure~\ref{fig:gaussian-10} contains a plot of the first few Gaussian4438primes as they display themselves amongst complex numbers:44394440\ill{gaussian_primes-10}{.8}{Gaussian primes with coordinates up to 10\label{fig:gaussian-10}}44414442Figure~\ref{fig:gaussian-100} plots a much larger number of Gaussian primes:44434444\ill{gaussian_primes-100}{1}{Gaussian primes with coordinates up to 100\label{fig:gaussian-100}}44454446Figures~\ref{fig:gaussian-staircase-14}--\ref{fig:gaussian-staircase-10000} plot the number4447of Gaussian primes up to each norm:44484449\ill{gaussian_staircase_14}{.8}{Staircase of Gaussian primes of norm up to 14\label{fig:gaussian-staircase-14}}4450\ill{gaussian_staircase_100}{.8}{Staircase of Gaussian primes of norm up to 100}4451\ill{gaussian_staircase_1000}{.8}{Staircase of Gaussian primes of norm up to 1000}4452\ill{gaussian_staircase_10000}{.8}{Staircase of Gaussian primes of norm up to 10000\label{fig:gaussian-staircase-10000}}44534454The natural question to ask, then, is: how are the Gaussian prime numbers\index{prime number} distributed? Can one provide as close an estimate to their distribution\index{distribution} and structure, as one has for ordinary primes? The answer, here is yes: there is a companion theory, with an analogue to the Riemann zeta function\index{zeta function} playing a role similar to the prototype $\zeta(s)$. And, it seems as if its ``nontrivial zeroes''\index{nontrivial zeroes} behave similarly: as far as things have been computed, they all have the property that4455their real part is equal to ${\frac{1}{2}}$. That is, we4456have a companion to the \RH{}.44574458This is just the beginning of a much larger story related to what has been come to be called the ``Grand Riemann Hypotheses" and connects to analogous problems, some of them actually solved, that give some measure of evidence for the truth of these hypotheses. For example, for any system4459of polynomials in a fixed number of variables (with integer4460coefficients, say) and for each prime number $p$ there are4461``zeta-type'' functions that contain all the information needed to4462count the number of simultaneous solutions in finite fields of4463characteristic $p$. That such counts can be well-approximated with a4464neatly small error term is related to the placement of the zeroes of4465these ``zeta-type'' functions. There is then an analogous ``\RH{}''4466that prescribes precise conditions on the real parts of their4467zeroes---this prescription being called the ``\RH{} for4468function fields.'' Now the beauty of this analogous hypothesis is that4469it has, in fact, been proved!44704471Is this yet another reason to believe the Grand \RH{}?447244734474%4475%\chapter{Glossary}4476%4477%Here we give all the connections with the standard literature and4478%conventional terminology that we restrained4479% ourselves from giving in the text itself.4480%4481% For the moment the list of entries is the following but it will4482% expand.4483%4484% $\pi(X)= \pi_0(X)$, $Q(X)= \psi_0(X)$, $\log, \exp$, $\delta$,4485% distributions,\index{distribution} RSA cryptography, Mersenne prime, $\Li(x)$, random4486% walk, spectrum, harmonic, fundamental, frequency, phase, amplitude,4487% band-pass, complex numbers, complex plane, Riemann Zeta function\index{zeta function},4488% zeroes of zeta.4489%4490%4491% Also mention Brian Conrey's Notices article on RH as ``among the4492% best 12 pages of RH survey material that there is---at least for an4493% audience of general mathematicians.'' And mention4494% mention Sarnak and Bombieri articles at the CMI website on RH.4495%44964497\backmatter449844994500%\eject4501%\addcontentsline{toc}{chapter}{Index}4502%\printindex45034504%\label{lastpage}45054506% no longer in previous chapter, so we increase the counter4507% so that figure and other labels aren't for the previous chapter.4508\stepcounter{chapter}4509\addcontentsline{toc}{chapter}{Endnotes}45104511\renewcommand\bibname{Endnotes}45124513\bibliography{biblio}451445154516\printindex451745184519\end{document}452045214522%sagemathcloud={"zoom_width":115,"latex_command":"killall pdflatex; makeindex rh; pdflatex -synctex=1 -interact=nonstopmode rh.tex; bibtex rh"}452345244525