\documentclass[12pt]{article}

\usepackage{mathtools}

\title{Bayes's theorem and logistic regression}
\author{Allen B. Downey}

\newcommand{\logit}{\mathrm{logit}}
\renewcommand{\P}{\mathrm{P}}
\renewcommand{\O}{\mathrm{O}}
\newcommand{\LR}{\mathrm{LR}}
\newcommand{\LO}{\mathrm{LO}}
\newcommand{\LLR}{\mathrm{LLR}}
\newcommand{\OR}{\mathrm{OR}}
\newcommand{\LOR}{\mathrm{LOR}}
\newcommand{\IF}{\mathrm{if}}
\newcommand{\notH}{\neg H}

\setlength{\headsep}{3ex}
\setlength{\parindent}{0.0in}
\setlength{\parskip}{1.7ex plus 0.5ex minus 0.5ex}

\begin{document}

\maketitle

\begin{abstract}
My two favorite topics in probability and statistics are
Bayes's theorem and logistic regression. Because there are
similarities between them, I have always assumed that there is
a connection. In this note, I demonstrate the
connection mathematically, and (I hope) shed light on the
motivation for logistic regression and the interpretation of
the results.
\end{abstract}

\section{Bayes's theorem}

I'll start by reviewing Bayes's theorem, using an example that came up
when I was in grad school. I signed up for a class on Theory of
Computation. On the first day of class, I was the first to arrive. A
few minutes later, another student arrived. Because I was expecting
most students in an advanced computer science class to be male, I was
mildly surprised that the other student was female. Another female
student arrived a few minutes later, which was sufficiently
surprising that I started to think I was in the wrong room. When
another female student arrived, I was confident I was in the wrong
place (and it turned out I was).

As each student arrived, I used the observed data to update my
belief that I was in the right place. We can use Bayes's theorem to
quantify the calculation I was doing intuitively.

I'll use $H$ to represent the hypothesis that I was in the right
room, and $F$ to represent the observation that the first other
student was female. Bayes's theorem provides an algorithm for
updating the probability of $H$:

\[ \P(H|F) = \P(H)~\frac{\P(F|H)}{\P(F)} \]

where

\begin{itemize}

\item $\P(H)$ is the prior probability of $H$ before the other
student arrived.

\item $\P(H|F)$ is the posterior probability of $H$, updated based
on the observation $F$.

\item $\P(F|H)$ is the likelihood of the data, $F$, assuming that
the hypothesis is true.

\item $\P(F)$ is the likelihood of the data, independent of $H$.

\end{itemize}

Before I saw the other students, I was confident I was in the right
room, so I might assign $\P(H)$ something like 90\%.

When I was in grad school, most advanced computer science classes were
90\% male, so if I was in the right room, the likelihood of the
first female student was only 10\%. And the likelihood of three
female students was only 0.1\%.

If we don't assume I was in the right room, then the likelihood of
the first female student was more like 50\%, so the likelihood
of all three was 12.5\%.

Plugging those numbers into Bayes's theorem yields $\P(H|F) = 0.64$
after one female student, $\P(H|FF) = 0.26$ after the second,
and $\P(H|FFF) = 0.07$ after the third.
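
To make the arithmetic explicit, here is the first update, with
$\P(F)$ expanded using the law of total probability:

\[ \P(H|F) = \frac{\P(H)~\P(F|H)}{\P(H)~\P(F|H) + \P(\notH)~\P(F|\notH)}
           = \frac{0.9 \times 0.1}{0.9 \times 0.1 + 0.1 \times 0.5}
           \approx 0.64 \]

Repeating the same update after each arrival gives the other two
posteriors.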

\section{Logistic regression}

Logistic regression is based on the following functional form:

\[ \logit(p) = \beta_0 + \beta_1 x_1 + \cdots + \beta_n x_n \]

where the dependent variable, $p$, is a probability,
the $x$s are explanatory variables, and the $\beta$s are
coefficients we want to estimate. The $\logit$ function is the
log-odds, or

\[ \logit(p) = \ln \left( \frac{p}{1-p} \right) \]
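
Inverting the $\logit$ function recovers the probability from the
log-odds; this inverse is the logistic (or sigmoid) function:

\[ p = \frac{e^{\logit(p)}}{1 + e^{\logit(p)}} = \frac{1}{1 + e^{-\logit(p)}} \]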

When you present logistic regression like this, it raises
three questions:

\begin{itemize}

\item Why is $\logit(p)$ the right choice for the dependent
variable?

\item Why should we expect the relationship between $\logit(p)$
and the explanatory variables to be linear?

\item How should we interpret the estimated parameters?

\end{itemize}

The answer to all of these questions turns out to be Bayes's
theorem. To demonstrate that, I'll use a simple example where
there is only one explanatory variable. But the derivation
generalizes to multiple regression.

On notation: I'll use $\P(H)$ for the probability
that some hypothesis, $H$, is true. $\O(H)$ is the odds of the same
hypothesis, defined as

\[ \O(H) = \frac{\P(H)}{1 - \P(H)} \]

I'll use $\LO(H)$ to represent the log-odds of $H$:

\[ \LO(H) = \ln \O(H) \]

I'll also use $\LR$ for a likelihood ratio, and $\OR$ for an odds
ratio. Finally, I'll use $\LLR$ for a log-likelihood ratio, and
$\LOR$ for a log-odds ratio.
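
As a quick example with the numbers from the classroom scenario, the
prior $\P(H) = 0.9$ corresponds to odds $\O(H) = 0.9 / 0.1 = 9$, or
9:1 in favor of $H$.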

\section{Making the connection}

To demonstrate the connection between Bayes's theorem and
logistic regression, I'll start with the odds form
of Bayes's theorem. Continuing the previous example,
I could write

\begin{equation} \label{A}
\O(H|F) = \O(H)~\LR(F|H)
\end{equation}

where

\begin{itemize}

\item $\O(H)$ is the prior odds that I was in the right room,

\item $\O(H|F)$ is the posterior odds after seeing one female student,

\item $\LR(F|H)$ is the likelihood ratio of the data, given
the hypothesis.

\end{itemize}

The likelihood ratio of the data is:

\[ \LR(F|H) = \frac{\P(F|H)}{\P(F|\notH)} \]

where $\notH$ means $H$ is false.
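
Running the example through Eqn~\ref{A}: the prior odds are
$\O(H) = 9$, the likelihood ratio is $\LR(F|H) = 0.1 / 0.5 = 0.2$, so
the posterior odds are $\O(H|F) = 9 \times 0.2 = 1.8$, which
corresponds to $\P(H|F) = 1.8 / 2.8 \approx 0.64$, matching the
result above.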

Noticing that logistic regression is expressed in terms of
log-odds, my next move is to write the log-odds form of
Bayes's theorem by taking the log of Eqn~\ref{A}:

\begin{equation} \label{B}
\LO(H|F) = \LO(H) + \LLR(F|H)
\end{equation}

If the first student to arrive had been male, we would write

\begin{equation} \nonumber
\LO(H|M) = \LO(H) + \LLR(M|H)
\end{equation}
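
Numerically, Eqn~\ref{B} gives $\LO(H|F) = \ln 9 + \ln 0.2 \approx
2.20 - 1.61 = 0.59$, and $e^{0.59} \approx 1.8$ recovers the
posterior odds computed above.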

Or more generally, if we use $X$ as a variable to represent
the sex of the observed student, we would write

\begin{equation} \label{D}
\LO(H|X) = \LO(H) + \LLR(X|H)
\end{equation}

I'll assign $X=0$ if the observed student is female and
$X=1$ if male. Then I can write:

\begin{equation} \nonumber
\LLR(X|H) = \left\{
\begin{array}{lr}
\LLR(F|H) & \IF ~X = 0\\
\LLR(M|H) & \IF ~X = 1
\end{array}
\right.
\end{equation}

Or we can collapse these two expressions into one by using
$X$ as a multiplier:

\begin{equation} \label{F}
\LLR(X|H) = \LLR(F|H) + X [\LLR(M|H) - \LLR(F|H)]
\end{equation}
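
As a check, setting $X=0$ in Eqn~\ref{F} leaves $\LLR(F|H)$, and
setting $X=1$ cancels the $\LLR(F|H)$ terms, leaving $\LLR(M|H)$, so
Eqn~\ref{F} agrees with both cases above.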

\section{Odds ratios}

The next move is to recognize that
the part of Eqn~\ref{F} in brackets is the log-odds ratio
of $H$. To see that, we need to look more closely at odds ratios.

Odds ratios are often used in medicine to describe the association
between a disease and a risk factor. In the example scenario, we
can use an odds ratio to express the odds of the hypothesis
$H$ if we observe a male student, relative to the odds if we
observe a female student:

\[ \OR_X(H) = \frac{\O(H|M)}{\O(H|F)} \]

I'm using the notation $\OR_X$ to represent the odds ratio
associated with the variable $X$.

Applying the odds form of Bayes's theorem (Eqn~\ref{A}) to
the top and bottom of the previous expression yields

\[ \OR_X(H) = \frac{\O(H)~\LR(M|H)}{\O(H)~\LR(F|H)} =
\frac{\LR(M|H)}{\LR(F|H)} \]

Taking the log of both sides yields

\begin{equation} \label{G}
\LOR_X(H) = \LLR(M|H) - \LLR(F|H)
\end{equation}

This result should look familiar, since it appears in
Eqn~\ref{F}.
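
In the running example, $\LLR(M|H) = \ln(0.9/0.5) \approx 0.59$ and
$\LLR(F|H) = \ln(0.1/0.5) \approx -1.61$, so $\LOR_X(H) \approx 2.20$:
observing a male student rather than a female student multiplies the
odds of $H$ by $e^{2.20} \approx 9$.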

\section{Conclusion}

Now we have all the pieces we need; we just have to assemble them.
Combining Eqns~\ref{F} and \ref{G} yields

\begin{equation} \label{H}
\LLR(X|H) = \LLR(F|H) + X~\LOR_X(H)
\end{equation}

Combining Eqns~\ref{D} and \ref{H} yields

\begin{equation} \label{I}
\LO(H|X) = \LO(H) + \LLR(F|H) + X~\LOR_X(H)
\end{equation}

Finally, combining Eqns~\ref{B} and \ref{I} yields

\[ \LO(H|X) = \LO(H|F) + X~\LOR_X(H) \]

We can think of this equation as the log-odds form of Bayes's theorem,
with the update term expressed as a log-odds ratio. Let's compare
that to the functional form of logistic regression:

\[ \logit(p) = \beta_0 + X \beta_1 \]

The correspondence between these equations suggests the following
interpretation:

\begin{itemize}

\item The predicted value, $\logit(p)$, is the posterior log
odds of the hypothesis, given the observed data.

\item The intercept, $\beta_0$, is the log-odds of the
hypothesis if $X=0$.

\item The coefficient of $X$, $\beta_1$, is a log-odds ratio
that represents the odds of $H$ when $X=1$, relative to
when $X=0$.

\end{itemize}
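
Under this correspondence,

\[ \beta_0 = \LO(H|F), \qquad \beta_1 = \LOR_X(H) \]

and in the running example $\beta_0 = \ln 1.8 \approx 0.59$ and
$\beta_1 = \ln 9 \approx 2.20$.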

This relationship between logistic regression and Bayes's theorem
tells us how to interpret the estimated coefficients. It also
answers the questions I posed at the beginning of this note:
the functional form of logistic regression makes sense because
it corresponds to the way Bayes's theorem uses data to update
probabilities.

\end{document}