import platform
print(platform.python_version())
3.7.5
Run you jupyter notebook
positioned on the stackoverflow
project folder.
The the following notebook cell will append to your path the current folder where the jupyter notebook is runnig, in order to be able to import auc_delong_xu.py
script for this example.
import os
import sys
import pandas as pd
import numpy as np
from sklearn import datasets
notebook_folder_path = !pwd
prj_path = os.path.abspath(os.path.join(notebook_folder_path[0], '', ''))
sys.path.append(prj_path)
print('Append to path: %s' % prj_path)
Append to path: /Users/lsanchez/roc_curve_with_confidence_intervals
I used the iris dataset to create a binary classification task where the possitive class corresponds to the setosa
class.
The y_score
is simply the sepal length
feature rescaled between [0, 1]
.
data = pd.DataFrame(
datasets.load_iris().data,
columns=datasets.load_iris().feature_names)
target = pd.Series([
datasets.load_iris().target_names[x]
for x in datasets.load_iris().target])
data.head()
sepal length (cm) | sepal width (cm) | petal length (cm) | petal width (cm) | |
---|---|---|---|---|
0 | 5.1 | 3.5 | 1.4 | 0.2 |
1 | 4.9 | 3.0 | 1.4 | 0.2 |
2 | 4.7 | 3.2 | 1.3 | 0.2 |
3 | 4.6 | 3.1 | 1.5 | 0.2 |
4 | 5.0 | 3.6 | 1.4 | 0.2 |
The AUC and Delong Confidence Interval is calculated via the Yantex's implementation of Delong (see script: auc_delong_xu.py
for further details)
from auc_delong_xu import auc_ci_Delong
y_true = (target != 'setosa').astype(int)
y_score = data['sepal length (cm)']
y_score = (y_score - y_score.min()) / (y_score.max() - y_score.min())
y_pred = (y_score > .5).astype(int)
auc, auc_var, ci = auc_ci_Delong(
y_true=y_true,
y_scores=y_score)
print('ROC AUC: %s, Conf.' % auc)
print('Confidence Interval: %s (95%% confidence)' % str(ci))
ROC AUC: 0.9586000000000001, Conf.
Confidence Interval: [0.93020851 0.98699149] (95% confidence)
version
_
platform x86_64-apple-darwin13.4.0
arch x86_64
os darwin13.4.0
system x86_64, darwin13.4.0
status
major 3
minor 6.1
year 2019
month 07
day 05
svn rev 76782
language R
version.string R version 3.6.1 (2019-07-05)
nickname Action of the Toes
I used the iris dataset to create a binary classification task where the possitive class corresponds to the setosa
class.
The y_score
is simply the sepal length
feature rescaled between [0, 1]
.
library(pROC)
library(datasets)
data(iris)
y_true = as.integer(iris$Species == 'setosa')
y_score = iris$Sepal.Length
y_score = (y_score - min(y_score)) / (max(y_score) - min(y_score))
y_pred = as.integer(y_score > .5)
roc = roc(y_true, y_score)
roc
Type 'citation("pROC")' for a citation.
Attaching package: ‘pROC’
The following objects are masked from ‘package:stats’:
cov, smooth, var
Setting levels: control = 0, case = 1
Setting direction: controls > cases
Call:
roc.default(response = y_true, predictor = y_score)
Data: y_score in 100 controls (y_true 0) > 50 cases (y_true 1).
Area under the curve: 0.9586
print(ci(roc))
95% CI: 0.9302-0.987 (DeLong)