I have been attempting to plot ROC curves with so.Line, and unless I'm missing something, it appears that it's plotting some line segments out of order.
It appears that these problems arise when two consecutive datapoints have the same x value but different y values.
(Note: I'm using NumPy data. Perhaps so.Plot() is not yet intended to be used with NumPy data?)
First, here's the code to create the data that I've noticed the problem with.
# roc curve and auc
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_curve
import seaborn.objects as so
import seaborn as sns
import matplotlib.pyplot as plt
import plotly.express as px
import plotly.io as pio
# GENERATE DATASET WITH 2 CLASSES
X, y = make_classification(n_samples=1000, n_classes=2, random_state=1)
# SPLIT DATA INTO TRAIN/TEST
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=2)
# FIT MODEL
my_logistic_reg = LogisticRegression()
my_logistic_reg.fit(X_train, y_train)
# predict probabilities
probabilities_logistic_reg = my_logistic_reg.predict_proba(X_test)
# keep probabilities for the positive outcome only
probabilities_logistic_posclass = probabilities_logistic_reg[:, 1]
# calculate roc curves
falseposrate_logistic, trueposrate_logistic, _ = roc_curve(y_test, probabilities_logistic_posclass)
Next, we plot the data as a line chart, with the Seaborn Objects interface:
# PLOT: SEABORN OBJECTS
# - the line segments between the 11th, 12th, 13th, and 14th datapoints seem to be wrong
(so.Plot()
.add(so.Line(color = 'red'),x =falseposrate_logistic, y = trueposrate_logistic)
.add(so.Dot(color = 'red', pointsize = 4),x = falseposrate_logistic, y = trueposrate_logistic)
.layout(size = (10,7))
)
OUT:
You'll notice that there's a zig-zag pattern from the 11th to the 14th points and in a few other places. This appears to be incorrect.
We'd expect it to have a stair-step pattern, like this (plotted with Matplotlib):
Additionally, if we subset the data down to try to isolate the issue in the 11th to the 14th points, you'll see that the original mis-plotting resolves, and a new issue appears between points 7 and 10:
# NOW, PLOT A SUBSET
# – with a different subset,
# the original issue disappears
# but a new issue arises between points 7 and 10
subset_size = 17
true_positive_subset = trueposrate_logistic[0:subset_size]
false_positive_subset = falseposrate_logistic[0:subset_size]
(so.Plot()
.add(so.Line(color = 'red'),x =false_positive_subset , y = true_positive_subset)
.add(so.Dot(color = 'red'),x =false_positive_subset , y = true_positive_subset)
)
OUT:
Unless I'm missing something obvious, it appears to be a bug in the order of how so.Line plots line segments between points.
Additionally, I'll note that if you try to plot the data with Plotly express, or sns.lineplot, the plots look fine, like the Matplotlib plot.
# PLOT: PLOTLY
# works fine!
# pio.renderers.default = 'svg'
px.line(x = falseposrate_logistic, y = trueposrate_logistic)
# PLOT: TRADITIONAL SEABORN
# works fine!
plt.figure(figsize = (10,7))
sns.lineplot(x = falseposrate_logistic
,y = trueposrate_logistic
,estimator = None
)
Hi, thanks for raising and the reproducible example.
What's happening here is that Line sorts observations according to the x (or actually, orient) variable. The return values from roc_curve are already sorted (by definition), but the weird behavior you're seeing happens on ties, i.e., where the FPR is identical but the TPRs differ.
I think a reasonable action item here is for Line to use mergesort, rather than the default quicksort, which is unstable.
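To illustrate the tie problem outside of seaborn, here is a minimal NumPy sketch. The `fpr` and `tpr` arrays are made-up ROC-like values, not the data from the report, and this is not seaborn's internal code; it only shows what a stable sort guarantees for data that is already sorted but contains ties.

```python
import numpy as np

# Hypothetical ROC-like data: x (FPR) is already sorted but contains ties,
# while y (TPR) increases within each tie, giving a stair-step shape.
fpr = np.array([0.0, 0.0, 0.1, 0.1, 0.1, 0.3, 0.3, 1.0])
tpr = np.array([0.0, 0.2, 0.2, 0.5, 0.6, 0.6, 0.9, 1.0])

# A stable sort keeps tied x values in their original relative order,
# so sorting already-sorted data leaves the row order untouched.
stable_order = np.argsort(fpr, kind="stable")
tpr_after_stable_sort = tpr[stable_order]

# The default kind ("quicksort") makes no such guarantee: rows with tied
# x values may come back in a different order, which would scramble the
# vertical steps when the points are joined by a line.
default_order = np.argsort(fpr)
tpr_after_default_sort = tpr[default_order]
```

With a stable sort, `tpr_after_stable_sort` is identical to `tpr`. Whether the default sort actually reorders ties depends on the array size and the NumPy version, which is consistent with the glitch moving around when the data is subset.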
In the meantime, you could use Path, which does not sort but is fine for something like an ROC curve where the data is already sorted. BTW, you don't need to pass data multiple times to Plot.add (and you can use a second Dot layer, but you don't need to).