Area under the receiver operating characteristic curve
Copyright 2019 Allen Downey
Area under ROC
As a way of understanding AUC ROC, let's look at the relationship between AUC and Cohen's effect size.
Cohen's effect size, d, expresses the difference between two groups as the number of standard deviations between the means.
As d increases, we expect it to be easier to distinguish between groups, so we expect AUC to increase.
I'll start in one dimension and then generalize to multiple dimensions.
Here are the means and standard deviations for two hypothetical groups.
I'll generate two random samples with these parameters.
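The cell that sets the parameters and draws the samples is not shown here, so the following is only a minimal sketch. The means, standard deviations, sample size, and seed are all assumptions, chosen so that d = 1; the names `sample0` and `sample1` are also hypothetical.

```python
import numpy as np

# Assumed parameters (the original values are not shown); they give d = 1.
mu0, sigma0 = 0.0, 1.0   # Group 0
mu1, sigma1 = 1.0, 1.0   # Group 1

rng = np.random.default_rng(17)   # arbitrary fixed seed
n = 1000                          # assumed sample size
sample0 = rng.normal(mu0, sigma0, size=n)
sample1 = rng.normal(mu1, sigma1, size=n)

# Cohen's d estimated with a pooled standard deviation (equal group sizes)
pooled_sd = np.sqrt((sample0.var(ddof=1) + sample1.var(ddof=1)) / 2)
d_hat = (sample1.mean() - sample0.mean()) / pooled_sd
print(f"estimated d = {d_hat:.2f}")
```

The estimate should land close to 1, but it will vary with the seed and sample size.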
If we put a threshold at the midpoint between the means, we can compute the fraction of Group 0 that would be above the threshold.
I'll call that the false positive rate.
And here's the fraction of Group 1 that would be below the threshold, which I'll call the false negative rate.
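As a sketch of these two computations, assuming unit-variance normal groups with means 0 and 1 (my assumption, not values taken from the original cells):

```python
import numpy as np

rng = np.random.default_rng(17)
mu0, mu1, sigma = 0.0, 1.0, 1.0        # assumed parameters (d = 1)
sample0 = rng.normal(mu0, sigma, 1000)
sample1 = rng.normal(mu1, sigma, 1000)

thresh = (mu0 + mu1) / 2               # midpoint between the means

# Group 0 members above the threshold are misclassified as Group 1
false_pos_rate = np.mean(sample0 > thresh)
# Group 1 members below the threshold are misclassified as Group 0
false_neg_rate = np.mean(sample1 < thresh)
print(false_pos_rate, false_neg_rate)
```

Because the threshold sits midway between two groups with the same spread, the two rates should come out roughly equal.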
Plotting misclassification
To see what these overlapping distributions look like, I'll plot a kernel density estimate (KDE).
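The plotting cell is not shown, so here is a rough sketch of the estimate behind such a plot. It is a hand-written Gaussian KDE with a normal-reference rule-of-thumb bandwidth, not necessarily what a plotting library would use; `gaussian_kde_1d` is a hypothetical helper and the sample parameters are assumptions.

```python
import numpy as np

def gaussian_kde_1d(sample, xs):
    """Evaluate a Gaussian kernel density estimate of `sample` at `xs`."""
    n = len(sample)
    # Normal-reference rule of thumb for the bandwidth
    bandwidth = 1.06 * sample.std(ddof=1) * n ** (-1 / 5)
    z = (xs[:, None] - sample[None, :]) / bandwidth
    kernel = np.exp(-0.5 * z**2) / np.sqrt(2 * np.pi)
    # Average the kernels centered on each data point
    return kernel.mean(axis=1) / bandwidth

rng = np.random.default_rng(17)
sample0 = rng.normal(0.0, 1.0, 1000)   # assumed parameters (d = 1)
sample1 = rng.normal(1.0, 1.0, 1000)

xs = np.linspace(-4, 5, 181)
density0 = gaussian_kde_1d(sample0, xs)
density1 = gaussian_kde_1d(sample1, xs)
```

Plotting `density0` and `density1` against `xs` gives the two overlapping curves; the area of one curve on the wrong side of a threshold corresponds to that group's misclassification rate.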
Here's what it looks like with the threshold at 0. There are many false positives, shown in blue, and few false negatives, in orange.
With a higher threshold, we get fewer false positives, at the cost of more false negatives.
The receiver operating characteristic curve
The receiver operating characteristic (ROC) curve represents this tradeoff.
To plot the ROC, we have to compute the false positive rate (which we saw in the figure above), and the true positive rate (not shown in the figure).
The following function computes these metrics.
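The function itself is not shown here. One way it might look, with `fpr_tpr` as a hypothetical name and the convention that values above the threshold are classified as Group 1:

```python
import numpy as np

def fpr_tpr(sample0, sample1, thresh):
    """Return (false positive rate, true positive rate) at `thresh`.

    Values above the threshold are classified as Group 1 ("positive").
    """
    fpr = np.mean(sample0 > thresh)   # Group 0 wrongly flagged positive
    tpr = np.mean(sample1 > thresh)   # Group 1 correctly flagged positive
    return fpr, tpr

# Tiny worked example: one of four Group 0 values and three of four
# Group 1 values exceed 2.5, so fpr = 0.25 and tpr = 0.75.
fpr, tpr = fpr_tpr(np.array([0, 1, 2, 3]), np.array([2, 3, 4, 5]), 2.5)
```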
When the threshold is high, the false positive rate is low, but so is the true positive rate.
As we decrease the threshold, the true positive rate increases, but so does the false positive rate.
The ROC shows this tradeoff over a range of thresholds.
I sweep thresholds from high to low so the ROC goes from left to right.
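A sketch of the sweep, again assuming unit-variance normal groups with means 0 and 1. The threshold range and the number of steps are arbitrary choices of mine.

```python
import numpy as np

rng = np.random.default_rng(17)
sample0 = rng.normal(0.0, 1.0, 1000)   # assumed parameters (d = 1)
sample1 = rng.normal(1.0, 1.0, 1000)

# High to low, so the first point is near (0, 0) and the last near (1, 1)
thresholds = np.linspace(5, -3, 81)
fprs = np.array([np.mean(sample0 > t) for t in thresholds])
tprs = np.array([np.mean(sample1 > t) for t in thresholds])
```

Plotting `tprs` against `fprs` traces the curve from the lower left to the upper right.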
Here's the ROC for the samples.
With d=1, the area under the curve is about 0.75. That might be a good number to remember.
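To check that number, here is a sketch that integrates the empirical ROC with the trapezoid rule and compares it with a closed form. The closed form, AUC = Φ(d/√2), holds for two normal groups with equal variance, which is the setup I am assuming; the sample parameters are assumptions as before.

```python
import math
import numpy as np

rng = np.random.default_rng(17)
sample0 = rng.normal(0.0, 1.0, 1000)   # assumed parameters (d = 1)
sample1 = rng.normal(1.0, 1.0, 1000)

thresholds = np.linspace(5, -3, 161)
fprs = np.array([np.mean(sample0 > t) for t in thresholds])
tprs = np.array([np.mean(sample1 > t) for t in thresholds])

# Trapezoid rule written out by hand: width times average height
auc = np.sum(np.diff(fprs) * (tprs[1:] + tprs[:-1]) / 2)

# Phi(d / sqrt(2)), using Phi(x) = (1 + erf(x / sqrt(2))) / 2
d = 1.0
auc_theory = 0.5 * (1 + math.erf(d / 2))
print(f"empirical {auc:.3f}, theoretical {auc_theory:.3f}")
```

The theoretical value for d = 1 is about 0.76, so "about 0.75" is a fair number to carry around.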
Now let's see what that looks like for a range of d.
This function computes AUC as a function of d.
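The function is not shown, so this is a hypothetical version. Instead of sweeping thresholds, it uses the fact that AUC equals the probability that a randomly chosen Group 1 value exceeds a randomly chosen Group 0 value. The name `auc_for_d`, the sample size, and the seed are my assumptions.

```python
import numpy as np

def auc_for_d(d, n=1000, seed=17):
    """Simulate two unit-variance normal groups whose means differ by `d`
    and return the empirical AUC (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    sample0 = rng.normal(0.0, 1.0, n)
    sample1 = rng.normal(d, 1.0, n)
    # Fraction of (Group 1, Group 0) pairs in which Group 1 is larger
    return np.mean(sample1[:, None] > sample0[None, :])

ds = np.linspace(0, 3, 7)
aucs = [auc_for_d(d) for d in ds]
```

Reusing the same seed for every value of d means the only thing that changes is the shift between the groups, which keeps the resulting curve smooth.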
The following figure shows AUC as a function of d.
Not surprisingly, AUC increases as d increases.
Multivariate distributions
Now let's see what happens if we have more than one variable, with a difference in means along more than one dimension.
First, I'll generate a 2-D sample with d=1 along both dimensions.
The mean of sample1 should be near 0 for both features.
And the mean of sample2 should be near 1.
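A sketch of how such samples could be drawn. The helper `make_sample` is hypothetical, and the choices of zero correlation and 1000 points per group are assumptions on my part.

```python
import numpy as np

def make_sample(mean, rho, n, rng):
    """Draw `n` points from a 2-D normal with unit variances and
    correlation `rho` between the two features (hypothetical helper)."""
    cov = [[1.0, rho], [rho, 1.0]]
    return rng.multivariate_normal(mean, cov, size=n)

rng = np.random.default_rng(17)
d, rho, n = 1.0, 0.0, 1000          # rho = 0 and n = 1000 are assumptions
sample1 = make_sample([0.0, 0.0], rho, n, rng)
sample2 = make_sample([d, d], rho, n, rng)

print(sample1.mean(axis=0))   # should be near [0, 0]
print(sample2.mean(axis=0))   # should be near [1, 1]
```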
The following scatterplot shows what this looks like in 2-D.
Some points are clearly classifiable, but there is substantial overlap in the distributions.
We can see the same thing if we estimate a 2-D density function and make a contour plot.
Classification with logistic regression
To see how distinguishable the samples are, I'll use logistic regression.
To get the data into the right shape, I'll make two DataFrames, label them, concatenate them, and then extract the labels and the features.
Summary statistics for the first sample:

| | 0 | 1 | label |
|---|---|---|---|
| count | 1000.000000 | 1000.000000 | 1000.0 |
| mean | 0.012044 | -0.051937 | 1.0 |
| std | 0.971861 | 0.976814 | 0.0 |
| min | -3.580857 | -3.061129 | 1.0 |
| 25% | -0.596927 | -0.696824 | 1.0 |
| 50% | 0.071937 | -0.044057 | 1.0 |
| 75% | 0.655457 | 0.615113 | 1.0 |
| max | 3.053507 | 3.292066 | 1.0 |

Correlation between the features in the first sample:

| | 0 | 1 |
|---|---|---|
| 0 | 1.000000 | 0.021376 |
| 1 | 0.021376 | 1.000000 |

Summary statistics for the second sample:

| | 0 | 1 | label |
|---|---|---|---|
| count | 1000.000000 | 1000.000000 | 1000.0 |
| mean | 0.979477 | 1.023589 | 2.0 |
| std | 0.983136 | 0.967058 | 0.0 |
| min | -2.231272 | -2.027548 | 2.0 |
| 25% | 0.291482 | 0.417082 | 2.0 |
| 50% | 1.008545 | 1.008277 | 2.0 |
| 75% | 1.670930 | 1.647037 | 2.0 |
| max | 3.869119 | 4.138071 | 2.0 |

Correlation between the features in the second sample:

| | 0 | 1 |
|---|---|---|
| 0 | 1.00000 | -0.04433 |
| 1 | -0.04433 | 1.00000 |
X is the array of features; y is the vector of labels.
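For readers who want to reproduce X and y, here is a sketch that builds them by stacking NumPy arrays directly. This is a shortcut of mine rather than the DataFrame route described above; the labels 1 and 2 follow the summary tables, and the sample parameters are assumptions.

```python
import numpy as np

rng = np.random.default_rng(17)
cov = np.eye(2)                      # assumed: independent unit-variance features
sample1 = rng.multivariate_normal([0.0, 0.0], cov, size=1000)
sample2 = rng.multivariate_normal([1.0, 1.0], cov, size=1000)

# Stack the features row-wise and build a matching label vector
X = np.vstack([sample1, sample2])
y = np.concatenate([np.full(len(sample1), 1), np.full(len(sample2), 2)])

print(X.shape, y.shape)
```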
Now we can fit the model.
And compute the AUC.
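The fitting and scoring cells are not shown. The sketch below swaps in a hand-rolled logistic regression (Newton's method) and a pairwise AUC so that it needs only NumPy; it is a stand-in, not the library routine the original presumably used. `fit_logistic` and `auc_score` are hypothetical names, and labels are coded 0 and 1 here for convenience.

```python
import numpy as np

def fit_logistic(X, y, n_iter=25, ridge=1e-6):
    """Fit logistic regression by Newton's method; y must be 0/1.
    Returns the weights with the intercept first."""
    A = np.column_stack([np.ones(len(X)), X])
    w = np.zeros(A.shape[1])
    for _ in range(n_iter):
        p = 1 / (1 + np.exp(-A @ w))          # predicted probabilities
        grad = A.T @ (p - y)                  # gradient of the log loss
        # Hessian, plus a tiny ridge to keep the solve well-conditioned
        hess = (A * (p * (1 - p))[:, None]).T @ A + ridge * np.eye(len(w))
        w -= np.linalg.solve(hess, grad)
    return w

def auc_score(scores, y):
    """AUC as the fraction of (positive, negative) pairs ranked correctly."""
    pos, neg = scores[y == 1], scores[y == 0]
    return np.mean(pos[:, None] > neg[None, :])

rng = np.random.default_rng(17)
X = np.vstack([rng.multivariate_normal([0, 0], np.eye(2), 1000),
               rng.multivariate_normal([1, 1], np.eye(2), 1000)])
y = np.repeat([0, 1], 1000)

w = fit_logistic(X, y)
scores = w[0] + X @ w[1:]   # log-odds; any monotone score gives the same AUC
auc = auc_score(scores, y)
print(f"AUC = {auc:.3f}")
```

Note that this AUC is computed on the same data used to fit the model, so it is slightly optimistic.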
With two features, we can do better than with just one.
AUC as a function of rho
The following function contains the code from the previous section, with rho, the correlation between the two features, as a parameter.
Now we can sweep a range of values for rho.
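Here is a sketch of such a function and sweep, assuming rho is the correlation between the two features. As before, the logistic regression is a hand-rolled Newton's-method stand-in, and `auc_for_rho`, the sample size, the seed, and the range of rho are all my own choices.

```python
import numpy as np

def fit_logistic(X, y, n_iter=25, ridge=1e-6):
    """Newton's method for logistic regression; intercept comes first."""
    A = np.column_stack([np.ones(len(X)), X])
    w = np.zeros(A.shape[1])
    for _ in range(n_iter):
        p = 1 / (1 + np.exp(-A @ w))
        hess = (A * (p * (1 - p))[:, None]).T @ A + ridge * np.eye(len(w))
        w -= np.linalg.solve(hess, A.T @ (p - y))
    return w

def auc_for_rho(rho, d=1.0, n=1000, seed=17):
    """Simulate two 2-D groups with feature correlation `rho`, fit a
    logistic regression, and return the in-sample AUC."""
    rng = np.random.default_rng(seed)
    cov = [[1.0, rho], [rho, 1.0]]
    X = np.vstack([rng.multivariate_normal([0, 0], cov, n),
                   rng.multivariate_normal([d, d], cov, n)])
    y = np.repeat([0, 1], n)
    scores = X @ fit_logistic(X, y)[1:]   # intercept does not affect AUC
    pos, neg = scores[y == 1], scores[y == 0]
    return np.mean(pos[:, None] > neg[None, :])

rhos = np.linspace(-0.8, 0.8, 9)
aucs = [auc_for_rho(rho) for rho in rhos]
```

In this setup I would expect AUC to fall as rho rises, because positively correlated features carry overlapping information about group membership.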
AUC as a function of d
The following function contains the code from the previous section, generalized to handle more than 2 dimensions.
Confirming what we have seen before:
Now we can sweep a range of effect sizes.
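A sketch of the generalized function and the sweep, assuming independent unit-variance features that each differ by d between the groups. The logistic regression is again a hand-rolled stand-in, and `auc_for_dims` and the ranges swept are hypothetical.

```python
import numpy as np

def fit_logistic(X, y, n_iter=25, ridge=1e-6):
    """Newton's method for logistic regression; intercept comes first."""
    A = np.column_stack([np.ones(len(X)), X])
    w = np.zeros(A.shape[1])
    for _ in range(n_iter):
        p = 1 / (1 + np.exp(-A @ w))
        hess = (A * (p * (1 - p))[:, None]).T @ A + ridge * np.eye(len(w))
        w -= np.linalg.solve(hess, A.T @ (p - y))
    return w

def auc_for_dims(d, n_dims, n=1000, seed=17):
    """Simulate two groups with `n_dims` independent features, each
    shifted by `d`, and return the in-sample AUC of a logistic fit."""
    rng = np.random.default_rng(seed)
    X = np.vstack([rng.normal(0.0, 1.0, (n, n_dims)),
                   rng.normal(d, 1.0, (n, n_dims))])
    y = np.repeat([0, 1], n)
    scores = X @ fit_logistic(X, y)[1:]
    pos, neg = scores[y == 1], scores[y == 0]
    return np.mean(pos[:, None] > neg[None, :])

ds = np.linspace(0, 1.5, 7)
results = {k: [auc_for_dims(d, k) for d in ds] for k in (1, 2, 3)}
```

Each entry of `results` is one curve of AUC against d, one curve per number of features.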
And plot the results.
With more features, the AUC gets better, assuming the features are independent.