\documentclass[12pt]{article}
\usepackage{mathtools}
\title{Bayes's theorem and logistic regression}
\author{Allen B. Downey}
\newcommand{\logit}{\mathrm{logit}}
\renewcommand{\P}{\mathrm{P}}
\renewcommand{\O}{\mathrm{O}}
\newcommand{\LR}{\mathrm{LR}}
\newcommand{\LO}{\mathrm{LO}}
\newcommand{\LLR}{\mathrm{LLR}}
\newcommand{\OR}{\mathrm{OR}}
\newcommand{\LOR}{\mathrm{LOR}}
\newcommand{\IF}{\mathrm{if}}
\newcommand{\notH}{\neg H}
\setlength{\headsep}{3ex}
\setlength{\parindent}{0.0in}
\setlength{\parskip}{1.7ex plus 0.5ex minus 0.5ex}
\begin{document}
\maketitle
\begin{abstract}
My two favorite topics in probability and statistics are
Bayes's theorem and logistic regression. Because there are
similarities between them, I have always assumed that there is
a connection. In this note, I demonstrate the
connection mathematically, and (I hope) shed light on the
motivation for logistic regression and the interpretation of
the results.
\end{abstract}
\section{Bayes's theorem}
I'll start by reviewing Bayes's theorem, using an example that came up
when I was in grad school. I signed up for a class on Theory of
Computation. On the first day of class, I was the first to arrive. A
few minutes later, another student arrived. Because I was expecting
most students in an advanced computer science class to be male, I was
mildly surprised that the other student was female. Another female
student arrived a few minutes later, which was sufficiently
surprising that I started to think I was in the wrong room. When
another female student arrived, I was confident I was in the wrong
place (and it turned out I was).
As each student arrived, I used the observed data to update my
belief that I was in the right place. We can use Bayes's theorem to
quantify the calculation I was doing intuitively.
I'll use $H$ to represent the hypothesis that I was in the right
room, and $F$ to represent the observation that the first other
student was female. Bayes's theorem provides an algorithm for
updating the probability of $H$:
\[ \P(H|F) = \P(H)~\frac{\P(F|H)}{\P(F)}\]
where
\begin{itemize}
\item $\P(H)$ is the prior probability of $H$ before the other
student arrived.
\item $\P(H|F)$ is the posterior probability of $H$, updated based
on the observation $F$.
\item $\P(F|H)$ is the likelihood of the data, $F$, assuming that
the hypothesis is true.
\item $\P(F)$ is the total probability of the data, regardless of $H$.
\end{itemize}
Before I saw the other students, I was confident I was in the right
room, so I might assign $\P(H)$ something like 90\%.
When I was in grad school, most advanced computer science classes were
90\% male, so if I was in the right room, the likelihood of the
first female student was only 10\%. And the likelihood of three
female students was only 0.1\%.
If we don't assume I was in the right room, then the likelihood of
the first female student was more like 50\%, so the likelihood
of all three was 12.5\%.
Plugging those numbers into Bayes's theorem yields $\P(H|F) = 0.64$
after one female student, $\P(H|FF) = 0.26$ after the second,
and $\P(H|FFF) = 0.07$ after the third.
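As a check on the arithmetic, here is a minimal Python sketch of the update (the function name and structure are mine; the prior and likelihoods are the numbers given above):

```python
def bayes_update(prior, like_h, like_not_h):
    """Return P(H|D) given P(H), P(D|H), and P(D|not H)."""
    numer = prior * like_h
    denom = prior * like_h + (1 - prior) * like_not_h
    return numer / denom

# Start with P(H) = 0.9 and update once per female student observed.
p = 0.9
for _ in range(3):
    p = bayes_update(p, 0.1, 0.5)
    print(round(p, 2))   # prints 0.64, then 0.26, then 0.07
```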
\section{Logistic regression}
Logistic regression is based on the following functional form:
\[ \logit(p) = \beta_0 + \beta_1 x_1 + \cdots + \beta_n x_n \]
where the dependent variable, $p$, is a probability,
the $x$s are explanatory variables, and the $\beta$s are
coefficients we want to estimate. The $\logit$ function is the
log-odds, or
\[ \logit(p) = \ln \left( \frac{p}{1-p} \right) \]
When you present logistic regression like this, it raises
three questions:
\begin{itemize}
\item Why is $\logit(p)$ the right choice for the dependent
variable?
\item Why should we expect the relationship between $\logit(p)$
and the explanatory variables to be linear?
\item How should we interpret the estimated parameters?
\end{itemize}
The answer to all of these questions turns out to be Bayes's
theorem. To demonstrate that, I'll use a simple example where
there is only one explanatory variable. But the derivation
generalizes to multiple regression.
On notation: I'll use $\P(H)$ for the probability
that some hypothesis, $H$, is true. $\O(H)$ is the odds of the same
hypothesis, defined as
\[ \O(H) = \frac{\P(H)}{1 - \P(H)} \]
I'll use $\LO(H)$ to represent the log-odds of $H$:
\[ \LO(H) = \ln \O(H) \]
I'll also use $\LR$ for a likelihood ratio, and $\OR$ for an odds
ratio. Finally, I'll use $\LLR$ for a log-likelihood ratio, and
$\LOR$ for a log-odds ratio.
\section{Making the connection}
To demonstrate the connection between Bayes's theorem and
logistic regression, I'll start with the odds form
of Bayes's theorem. Continuing the previous example,
I could write
\begin{equation} \label{A}
\O(H|F) = \O(H)~\LR(F|H)
\end{equation}
where
\begin{itemize}
\item $\O(H)$ is the prior odds that I was in the right room,
\item $\O(H|F)$ is the posterior odds after seeing one female student,
\item $\LR(F|H)$ is the likelihood ratio of the data, given
the hypothesis.
\end{itemize}
The likelihood ratio of the data is:
\[ \LR(F|H) = \frac{\P(F|H)}{\P(F|\notH)} \]
where $\notH$ means $H$ is false.
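With the numbers from the example, the odds form gives the same posterior as the probability form did; a minimal sketch (helper names are mine):

```python
def odds(p):
    """O(H) = P(H) / (1 - P(H))."""
    return p / (1 - p)

def prob(o):
    """Convert odds back to a probability."""
    return o / (1 + o)

prior_odds = odds(0.9)         # O(H) = 9
lr_f = 0.1 / 0.5               # LR(F|H) = P(F|H) / P(F|not H) = 0.2
post_odds = prior_odds * lr_f  # O(H|F) = 1.8
print(round(prob(post_odds), 2))   # prints 0.64, as in the first section
```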
Noticing that logistic regression is expressed in terms of
log-odds, my next move is to write the log-odds form of
Bayes's theorem by taking the log of Eqn~\ref{A}:
\begin{equation} \label{B}
\LO(H|F) = \LO(H) + \LLR(F|H)
\end{equation}
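One payoff of the log-odds form is that updates are additive: each observation contributes its log-likelihood ratio. Sketching the three-student example this way (numbers from the text):

```python
import math

lo = math.log(0.9 / 0.1)      # prior log-odds, LO(H) = ln 9
llr_f = math.log(0.1 / 0.5)   # LLR(F|H) = ln 0.2
for _ in range(3):            # three female students arrive
    lo += llr_f
p = 1 / (1 + math.exp(-lo))   # convert log-odds back to a probability
print(round(p, 2))            # prints 0.07, matching the earlier result
```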
If the first student to arrive had been male, we would write
\begin{equation} \label{C} \nonumber
\LO(H|M) = \LO(H) + \LLR(M|H)
\end{equation}
Or more generally if we use $X$ as a variable to represent
the sex of the observed student, we would write
\begin{equation} \label{D}
\LO(H|X) = \LO(H) + \LLR(X|H)
\end{equation}
I'll assign $X=0$ if the observed student is female and
$X=1$ if male. Then I can write:
\begin{equation} \label{E} \nonumber
\LLR(X|H) = \left\{
\begin{array}{lr}
\LLR(F|H) & \IF ~X = 0\\
\LLR(M|H) & \IF ~X = 1
\end{array}
\right.
\end{equation}
Or we can collapse these two expressions into one by using
$X$ as a multiplier:
\begin{equation} \label{F}
\LLR(X|H) = \LLR(F|H) + X [\LLR(M|H) - \LLR(F|H)]
\end{equation}
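A quick numeric check that the multiplier form agrees with the piecewise definition, using the likelihoods from the example (90\% male if $H$ is true, 50\% otherwise; function names are mine):

```python
import math

llr_f = math.log(0.1 / 0.5)   # LLR(F|H)
llr_m = math.log(0.9 / 0.5)   # LLR(M|H)

def llr_piecewise(x):
    """The two-branch definition."""
    return llr_f if x == 0 else llr_m

def llr_multiplier(x):
    """The collapsed form with X as a multiplier."""
    return llr_f + x * (llr_m - llr_f)

for x in (0, 1):
    assert abs(llr_piecewise(x) - llr_multiplier(x)) < 1e-12
```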
\section{Odds ratios}
The next move is to recognize that
the part of Eqn~\ref{F} in brackets is the log-odds ratio
of $H$. To see that, we need to look more closely at odds ratios.
Odds ratios are often used in medicine to describe the association
between a disease and a risk factor. In the example scenario, we
can use an odds ratio to express the odds of the hypothesis
$H$ if we observe a male student, relative to the odds if we
observe a female student:
\[ \OR_X(H) = \frac{\O(H|M)}{\O(H|F)} \]
I'm using the notation $\OR_X$ to represent the odds ratio
associated with the variable $X$.
Applying Bayes's theorem to
the top and bottom of the previous expression yields
\[ \OR_X(H) = \frac{\O(H)~\LR(M|H)}{\O(H)~\LR(F|H)} =
\frac{\LR(M|H)}{\LR(F|H)}\]
Taking the log of both sides yields
\begin{equation} \label{G}
\LOR_X(H) = \LLR(M|H) - \LLR(F|H)
\end{equation}
This result should look familiar, since it appears in
Eqn~\ref{F}.
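Numerically, with the example's likelihoods, the prior odds cancel and the log-odds ratio equals the difference of log-likelihood ratios:

```python
import math

prior_odds = 0.9 / (1 - 0.9)   # O(H) = 9; cancels out below
lr_m = 0.9 / 0.5               # LR(M|H)
lr_f = 0.1 / 0.5               # LR(F|H)

or_x = (prior_odds * lr_m) / (prior_odds * lr_f)   # O(H|M) / O(H|F)
lor_x = math.log(or_x)

# The identity LOR_X(H) = LLR(M|H) - LLR(F|H), checked numerically:
assert abs(lor_x - (math.log(lr_m) - math.log(lr_f))) < 1e-12
print(round(lor_x, 3))         # prints 2.197 (ln 9)
```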
\section{Conclusion}
Now we have all the pieces we need; we just have to assemble them.
Combining Eqns~\ref{F} and \ref{G} yields
\begin{equation} \label{H}
\LLR(X|H) = \LLR(F|H) + X~\LOR_X(H)
\end{equation}
Combining Eqns~\ref{D} and \ref{H} yields
\begin{equation} \label{I}
\LO(H|X) = \LO(H) + \LLR(F|H) + X~\LOR_X(H)
\end{equation}
Finally, combining Eqns~\ref{B} and \ref{I} yields
\[ \LO(H|X) = \LO(H|F) + X~\LOR_X(H) \]
We can think of this equation as the log-odds form of Bayes's theorem,
with the update term expressed as a log-odds ratio. Let's compare
that to the functional form of logistic regression:
\[ \logit(p) = \beta_0 + X \beta_1 \]
The correspondence between these equations suggests the following
interpretation:
\begin{itemize}
\item The predicted value, $\logit(p)$, is the posterior log-odds
of the hypothesis, given the observed data.
\item The intercept, $\beta_0$, is the posterior log-odds of the
hypothesis when $X=0$.
\item The coefficient of $X$, $\beta_1$, is a log-odds ratio
that represents the odds of $H$ when $X=1$, relative to the odds
when $X=0$.
\end{itemize}
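To illustrate the correspondence, here is a sketch that builds $\beta_0$ and $\beta_1$ from the example's numbers, rather than estimating them from data, and confirms that the logistic form reproduces the Bayesian posteriors:

```python
import math

llr_f = math.log(0.1 / 0.5)           # LLR(F|H)
llr_m = math.log(0.9 / 0.5)           # LLR(M|H)

beta0 = math.log(0.9 / 0.1) + llr_f   # LO(H|F): posterior log-odds at X = 0
beta1 = llr_m - llr_f                 # LOR_X(H)

def predict(x):
    """Posterior probability from the logistic functional form."""
    z = beta0 + x * beta1
    return 1 / (1 + math.exp(-z))

print(round(predict(0), 2))   # prints 0.64: posterior after a female student
print(round(predict(1), 2))   # prints 0.94: posterior after a male student
```

A regression fit to data generated from these probabilities should recover coefficients close to these values.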
This relationship between logistic regression and Bayes's theorem
tells us how to interpret the estimated coefficients. It also
answers the questions I posed at the beginning of this note:
the functional form of logistic regression makes sense because
it corresponds to the way Bayes's theorem uses data to update
probabilities.
\end{document}