ROC curve for multiclass random forest in Python

I was trying to plot ROC curves with classifiers other than svm.SVC, which is the one used in the documentation example. My code works well with svm.SVC; however, after I switched to KNeighborsClassifier, MultinomialNB, and DecisionTreeClassifier, it keeps failing in check_consistent_length(y_true, y_score) with "Found input variables with inconsistent numbers of samples: [26632, 53264]". My CSV file looks like this:

And here is my code

```python
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from itertools import cycle
import sys
from sklearn import svm, datasets
from sklearn.metrics import roc_curve, auc
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import label_binarize
from sklearn.multiclass import OneVsRestClassifier
from scipy import interp
from sklearn.neighbors import KNeighborsClassifier
from sklearn.naive_bayes import MultinomialNB
from sklearn.tree import DecisionTreeClassifier

# Import some data to play with
df = pd.read_csv("E:\\autodesk\\Hourly and weather categorized2.csv")
X = df[['TTI', 'Max TemperatureF', 'Mean TemperatureF', 'Min TemperatureF', ' Min Humidity']].values
y = df['TTI_Category'].as_matrix()
y = y.reshape(-1, 1)

# Binarize the output
y = label_binarize(y, classes=['Good', 'Bad'])
n_classes = y.shape[1]

# shuffle and split training and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=.5, random_state=0)

# Learn to predict each class against the other
classifier = OneVsRestClassifier(DecisionTreeClassifier(random_state=0))
y_score = classifier.fit(X_train, y_train).predict_proba(X_test)

# Compute ROC curve and ROC area for each class
fpr = dict()
tpr = dict()
roc_auc = dict()
for i in range(n_classes):
    fpr[i], tpr[i], _ = roc_curve(y_test[:, i], y_score[:, i])
    roc_auc[i] = auc(fpr[i], tpr[i])

# Compute micro-average ROC curve and ROC area
fpr["micro"], tpr["micro"], _ = roc_curve(y_test.ravel(), y_score.ravel())
roc_auc["micro"] = auc(fpr["micro"], tpr["micro"])

plt.figure()
lw = 1
plt.plot(fpr[0], tpr[0], color='darkorange', lw=lw, label='ROC curve (area = %0.2f)' % roc_auc[0])
plt.plot([0, 1], [0, 1], color='navy', lw=lw, linestyle='--')
plt.xlim([0.0, 1.0])
plt.ylim([0.0, 1.05])
plt.xlabel('False Positive Rate')
plt.ylabel('True Positive Rate')
plt.title('Receiver operating characteristic example')
plt.legend(loc="lower right")
plt.show()
```

I suspect the error occurs at these lines:

```python
fpr["micro"], tpr["micro"], _ = roc_curve(y_test.ravel(), y_score.ravel())
roc_auc["micro"] = auc(fpr["micro"], tpr["micro"])
```

but I'm a beginner with ROC curves, so could someone kindly guide me through this traceback? Thanks a lot for your time and help. Here is another question of mine about ROC curves. By the way, here is the whole traceback; hopefully my explanation is clear enough.

```
Traceback (most recent call last):
  File "<ipython-input-1-16eb0db9d4d9>", line 1, in <module>
    runfile('C:/Users/Think/Desktop/Python Practice/ROC with decision tree.py', wdir='C:/Users/Think/Desktop/Python Practice')
  File "C:\Users\Think\Anaconda2\lib\site-packages\spyder\utils\site\sitecustomize.py", line 880, in runfile
    execfile(filename, namespace)
  File "C:\Users\Think\Anaconda2\lib\site-packages\spyder\utils\site\sitecustomize.py", line 87, in execfile
    exec(compile(scripttext, filename, 'exec'), glob, loc)
  File "C:/Users/Think/Desktop/Python Practice/ROC with decision tree.py", line 47, in <module>
    fpr["micro"], tpr["micro"], _ = roc_curve(y_test.ravel(), y_score.ravel())
  File "C:\Users\Think\Anaconda2\lib\site-packages\sklearn\metrics\ranking.py", line 510, in roc_curve
    y_true, y_score, pos_label=pos_label, sample_weight=sample_weight)
  File "C:\Users\Think\Anaconda2\lib\site-packages\sklearn\metrics\ranking.py", line 302, in _binary_clf_curve
    check_consistent_length(y_true, y_score)
  File "C:\Users\Think\Anaconda2\lib\site-packages\sklearn\utils\validation.py", line 173, in check_consistent_length
    " samples: %r" % [int(l) for l in lengths])
ValueError: Found input variables with inconsistent numbers of samples: [26632, 53264]
```
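A likely explanation, based on the shapes in the error: the two lengths differ by exactly a factor of two. With only two classes ('Good' and 'Bad'), label_binarize returns a single column, so y_test has shape (26632, 1) and n_classes is 1. OneVsRestClassifier then fits one binary estimator, but its predict_proba still returns two columns (one probability per class), so y_score has shape (26632, 2) and y_score.ravel() has 53264 entries. The svm.SVC version presumably worked because the documentation example calls decision_function, which in this single-estimator case returns one score per sample. The same mismatch also affects the per-class loop: for i = 0 it pairs the 'Bad' indicator column with y_score[:, 0], the probability of 'Good', which would flip the curve (giving 1 - AUC). The sketch below reproduces the shapes on synthetic data (make_classification stands in for the CSV, which is not available here; sizes and random_state values are arbitrary assumptions) and shows one possible fix for the binary case.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.metrics import auc, roc_curve
from sklearn.model_selection import train_test_split
from sklearn.multiclass import OneVsRestClassifier
from sklearn.preprocessing import label_binarize
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the CSV (assumption): two classes named 'Good' and 'Bad'
X, y_int = make_classification(n_samples=200, n_features=5, random_state=0)
y = np.where(y_int == 1, 'Bad', 'Good')

# With only two classes, label_binarize returns a SINGLE column (1 = 'Bad')
y_bin = label_binarize(y, classes=['Good', 'Bad'])
print(y_bin.shape)  # (200, 1)

X_train, X_test, y_train, y_test = train_test_split(X, y_bin, test_size=.5, random_state=0)
classifier = OneVsRestClassifier(DecisionTreeClassifier(random_state=0))
y_score = classifier.fit(X_train, y_train).predict_proba(X_test)

# predict_proba still returns TWO columns: [P('Good'), P('Bad')]
print(y_test.shape, y_score.shape)  # (100, 1) (100, 2)
# ...so y_test.ravel() has 100 entries but y_score.ravel() has 200,
# the same kind of mismatch as [26632, 53264] in the traceback.

# One possible fix for the binary case: pair the single label column with
# the positive-class ('Bad') probability only, i.e. column 1 of y_score.
fpr, tpr, _ = roc_curve(y_test.ravel(), y_score[:, 1])
roc_auc = auc(fpr, tpr)
print('AUC: %0.2f' % roc_auc)
```

With a single binary label column there is nothing to micro-average, so in this sketch the micro-average step is simply dropped; it only becomes meaningful with three or more classes.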

While working through my first modeling project as a Data Scientist, I found that a ROC curve was an excellent way to compare my models! However, I ran into a bit of a glitch: for the first time, I had to create a ROC curve for a dataset with multiclass predictions instead of binary predictions. I also had to learn how to create a ROC curve for a Random Forest Classifier for the first time. Since it took me an entire afternoon of googling to figure these things out, I thought I would blog about them to hopefully help someone in the future, that being you!

Let’s begin!

After running my random forest classifier, I realized there is no .decision_function method to produce the y_score, which is what I thought I needed for my ROC curve. However, I learned that for a random forest classifier you must use .predict_proba instead.
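A quick way to confirm this on a model of your own: a minimal sketch, assuming synthetic 7-class data from make_classification relabeled 1 to 7 to mirror this post (the sample counts, split, and random_state values are arbitrary choices). It checks that RandomForestClassifier has no decision_function and that predict_proba returns one probability column per class.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Synthetic 7-class data (assumption), relabeled from 0..6 to 1..7
X, y = make_classification(n_samples=700, n_features=10, n_informative=6,
                           n_classes=7, random_state=0)
y = y + 1
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y)

model = RandomForestClassifier(random_state=123).fit(X_train, y_train)

print(hasattr(model, "decision_function"))  # False: no decision_function here
y_score = model.predict_proba(X_test)
print(y_score.shape)  # (210, 7): one probability column per class
print(model.classes_)  # [1 2 3 4 5 6 7]: the order of those columns
print(np.allclose(y_score.sum(axis=1), 1.0))  # True: each row sums to 1
```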

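```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.pipeline import Pipeline

# Construct baseline pipeline
pipe_rf = Pipeline([('clf', RandomForestClassifier(random_state=123))])

# Fit the model
model = pipe_rf.fit(X_train, y_train)

# Calculate the y_score: one probability column per class
y_score = model.predict_proba(X_test)
```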

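Using .predict_proba gives you a y_score array with one probability column per class. It is the true labels, y_test, that then need to be binarized with label_binarize from sklearn.preprocessing, so that each class gets its own 0/1 column to compare against the matching column of y_score. The columns of y_score follow the order of model.classes_ (the sorted labels), so pass the classes to label_binarize in that same order. In my case, I had 7 classes, labeled 1 through 7.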

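```python
from sklearn.preprocessing import label_binarize

# Binarize the true labels (not y_score): one 0/1 column per class
y_test_bin = label_binarize(y_test, classes=[1, 2, 3, 4, 5, 6, 7])
n_classes = y_test_bin.shape[1]
```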
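The column alignment between y_test_bin and y_score is what makes the per-class comparison valid. A minimal sketch on synthetic 7-class data (an assumption standing in for the post's dataset) passes model.classes_ as the classes argument instead of a hand-written list, which keeps column i of both arrays tied to the same label.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import label_binarize

# Synthetic 7-class data relabeled 1..7 (assumption)
X, y = make_classification(n_samples=700, n_features=10, n_informative=6,
                           n_classes=7, random_state=0)
y = y + 1
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y)
model = RandomForestClassifier(random_state=123).fit(X_train, y_train)
y_score = model.predict_proba(X_test)

# Binarize the TRUE labels in the model's own class order, so that column i
# of y_test_bin and column i of y_score describe the same class
y_test_bin = label_binarize(y_test, classes=model.classes_)
n_classes = y_test_bin.shape[1]

print(y_test_bin.shape, y_score.shape)  # (210, 7) (210, 7)
print(y_test_bin[:3])  # each row has a single 1 in its class's column
```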

Now you can finally create ROC curves (and calculate AUC values) for each of your classes using the code below!

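```python
import matplotlib.pyplot as plt
from sklearn.metrics import roc_curve, auc

fpr = dict()
tpr = dict()
roc_auc = dict()
for i in range(n_classes):
    # One-vs-rest curve: class i against all other classes
    fpr[i], tpr[i], _ = roc_curve(y_test_bin[:, i], y_score[:, i])
    roc_auc[i] = auc(fpr[i], tpr[i])
    # i + 1 maps column i back to the labels 1-7
    plt.plot(fpr[i], tpr[i], lw=2,
             label='Class {} (AUC = {:.2f})'.format(i + 1, roc_auc[i]))
    print('AUC for Class {}: {}'.format(i + 1, roc_auc[i]))

# Diagonal reference line: a classifier that guesses at random
plt.plot([0, 1], [0, 1], color='navy', lw=2, linestyle='--')
plt.xlim([0.0, 1.0])
plt.ylim([0.0, 1.05])
plt.xlabel('False Positive Rate')
plt.ylabel('True Positive Rate')
plt.title('Receiver Operating Characteristic Curves')
plt.legend(loc='lower right')
plt.show()
```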
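As a cross-check on the loop, roc_auc_score can return the same per-class AUCs in one call when it is given the binarized labels and average=None. A minimal sketch on synthetic 7-class data (an assumption, not the post's dataset):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import auc, roc_auc_score, roc_curve
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import label_binarize

# Synthetic 7-class data relabeled 1..7 (assumption)
X, y = make_classification(n_samples=700, n_features=10, n_informative=6,
                           n_classes=7, random_state=0)
y = y + 1
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y)
model = RandomForestClassifier(random_state=123).fit(X_train, y_train)
y_score = model.predict_proba(X_test)
y_test_bin = label_binarize(y_test, classes=model.classes_)

# Per-class AUC via roc_curve + auc, as in the loop above
loop_aucs = []
for i in range(y_test_bin.shape[1]):
    fpr, tpr, _ = roc_curve(y_test_bin[:, i], y_score[:, i])
    loop_aucs.append(auc(fpr, tpr))

# Same numbers in one call: with a 0/1 indicator matrix and average=None,
# roc_auc_score returns one AUC per column
per_class = roc_auc_score(y_test_bin, y_score, average=None)
print(np.allclose(loop_aucs, per_class))  # True
```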

And that’s it! I hope this saved you an afternoon of googling!

How do you plot a ROC curve for multiclass in Python?

To plot a multiclass ROC curve, binarize the true labels with the label_binarize function and then compute one curve per class, as in the code above; adjust the code to fit your application. You can also print y_score to inspect the predicted probabilities.

Can ROC curve be used for multiclass classification?

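Yes. For the area under the ROC curve in a multiclass problem, scikit-learn's roc_auc_score function supports multiclass classification through its multi_class parameter ('ovr' for one-vs-rest or 'ovo' for one-vs-one), given per-class probability estimates.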
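A minimal sketch of roc_auc_score on multiclass labels, assuming synthetic 7-class data and scikit-learn 0.22 or later (the version that added the multi_class parameter). It also checks that the one-vs-rest macro average equals the plain mean of the per-class AUCs.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import label_binarize

# Synthetic 7-class data relabeled 1..7 (assumption)
X, y = make_classification(n_samples=700, n_features=10, n_informative=6,
                           n_classes=7, random_state=0)
y = y + 1
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y)
model = RandomForestClassifier(random_state=123).fit(X_train, y_train)
y_score = model.predict_proba(X_test)  # rows must sum to 1 for multi_class

# One-vs-rest: each class against all others, then an unweighted (macro) mean
auc_ovr = roc_auc_score(y_test, y_score, multi_class="ovr", average="macro")
# One-vs-one: every pair of classes, then averaged
auc_ovo = roc_auc_score(y_test, y_score, multi_class="ovo", average="macro")

# Sanity check: OvR macro equals the mean of the per-class AUCs
per_class = roc_auc_score(label_binarize(y_test, classes=model.classes_),
                          y_score, average=None)
print("OvR macro AUC: {:.3f}".format(auc_ovr))
print("OvO macro AUC: {:.3f}".format(auc_ovo))
print(np.isclose(auc_ovr, per_class.mean()))  # True
```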

How do AUC ROC plots work for multiclass models?

For multiclass problems, ROC curves can be plotted using the one-versus-rest methodology: treat each class in turn as the positive class and all the other classes as negative. This gives as many curves as there are classes, and the AUC score can be calculated for each class individually.

Is it possible to perform ROC analysis for a multiclass classification problem?

The ROC curve is commonly used to compare the performance of models. It is usually applied to binary classification, but it can also be used for multiclass classification by computing one-vs-rest curves and combining them with averaging methods such as micro- or macro-averaging.
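The two common averaging methods can be sketched as follows, adapting the approach of scikit-learn's multiclass ROC example to synthetic 7-class data (an assumption). Micro-averaging pools every sample-class pair into one binary problem; it works here because y_test_bin and y_score share the same (n_samples, n_classes) shape. Macro-averaging interpolates each class's curve onto a shared false-positive-rate grid with np.interp (used in place of the deprecated scipy.interp alias) and averages the true positive rates.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import auc, roc_curve
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import label_binarize

# Synthetic 7-class data relabeled 1..7 (assumption)
X, y = make_classification(n_samples=700, n_features=10, n_informative=6,
                           n_classes=7, random_state=0)
y = y + 1
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y)
model = RandomForestClassifier(random_state=123).fit(X_train, y_train)
y_score = model.predict_proba(X_test)
y_test_bin = label_binarize(y_test, classes=model.classes_)
n_classes = y_test_bin.shape[1]

# Per-class one-vs-rest curves
fpr, tpr, roc_auc = {}, {}, {}
for i in range(n_classes):
    fpr[i], tpr[i], _ = roc_curve(y_test_bin[:, i], y_score[:, i])
    roc_auc[i] = auc(fpr[i], tpr[i])

# Micro-average: flatten both (n_samples, n_classes) arrays into one
# binary problem; ravel() gives equal lengths because the shapes match
fpr["micro"], tpr["micro"], _ = roc_curve(y_test_bin.ravel(), y_score.ravel())
roc_auc["micro"] = auc(fpr["micro"], tpr["micro"])

# Macro-average: evaluate every class's TPR on a shared FPR grid, then average
all_fpr = np.unique(np.concatenate([fpr[i] for i in range(n_classes)]))
mean_tpr = np.zeros_like(all_fpr)
for i in range(n_classes):
    mean_tpr += np.interp(all_fpr, fpr[i], tpr[i])
mean_tpr /= n_classes
fpr["macro"], tpr["macro"] = all_fpr, mean_tpr
roc_auc["macro"] = auc(fpr["macro"], tpr["macro"])

print("micro-average AUC: {:.3f}".format(roc_auc["micro"]))
print("macro-average AUC: {:.3f}".format(roc_auc["macro"]))
```

Micro-averaging weights every individual prediction equally, so frequent classes dominate it; macro-averaging weights every class equally, which matters more when the classes are imbalanced.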
