ROC curve multiclass random forest Python

I am trying to plot a ROC curve with classifiers other than svm.SVC, which is the one used in the documentation. My code works well for svm.SVC; however, after I switched to KNeighborsClassifier, MultinomialNB, and DecisionTreeClassifier, the call to check_consistent_length(y_true, y_score) keeps failing with "Found input variables with inconsistent numbers of samples: [26632, 53264]". My CSV file looks like this: [screenshot of the CSV file]

And here is my code:

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from itertools import cycle
import sys
from sklearn import svm, datasets
from sklearn.metrics import roc_curve, auc
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import label_binarize
from sklearn.multiclass import OneVsRestClassifier
from scipy import interp
from sklearn.neighbors import KNeighborsClassifier
from sklearn.naive_bayes import MultinomialNB
from sklearn.tree import DecisionTreeClassifier

# Import some data to play with
df = pd.read_csv("E:\\autodesk\\Hourly and weather categorized2.csv")
X = df[['TTI','Max TemperatureF','Mean TemperatureF','Min TemperatureF',' Min Humidity']].values
y = df['TTI_Category'].as_matrix()
y = y.reshape(-1,1)

# Binarize the output
y = label_binarize(y, classes=['Good','Bad'])
n_classes = y.shape[1]

# shuffle and split training and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=.5, random_state=0)

# Learn to predict each class against the other
classifier = OneVsRestClassifier(DecisionTreeClassifier(random_state=0))
y_score = classifier.fit(X_train, y_train).predict_proba(X_test)

# Compute ROC curve and ROC area for each class
fpr = dict()
tpr = dict()
roc_auc = dict()
for i in range(n_classes):
    fpr[i], tpr[i], _ = roc_curve(y_test[:, i], y_score[:, i])
    roc_auc[i] = auc(fpr[i], tpr[i])

# Compute micro-average ROC curve and ROC area
fpr["micro"], tpr["micro"], _ = roc_curve(y_test.ravel(), y_score.ravel())
roc_auc["micro"] = auc(fpr["micro"], tpr["micro"])

plt.figure()
lw = 1
plt.plot(fpr[0], tpr[0], color='darkorange', lw=lw, label='ROC curve (area = %0.2f)' % roc_auc[0])
plt.plot([0, 1], [0, 1], color='navy', lw=lw, linestyle='--')
plt.xlim([0.0, 1.0])
plt.ylim([0.0, 1.05])
plt.xlabel('False Positive Rate')
plt.ylabel('True Positive Rate')
plt.title('Receiver operating characteristic example')
plt.legend(loc="lower right")
plt.show()

I suspect that the error occurs at these lines:

fpr["micro"], tpr["micro"], _ = roc_curve(y_test.ravel(), y_score.ravel())
roc_auc["micro"] = auc(fpr["micro"], tpr["micro"])

However, I am a beginner with ROC curves, so could someone kindly guide me through this traceback? Thanks a lot for your time and help. Here is the whole traceback; hopefully my explanation is clear enough.

Traceback (most recent call last):
  File "<ipython-input-1-16eb0db9d4d9>", line 1, in <module>
    runfile('C:/Users/Think/Desktop/Python Practice/ROC with decision tree.py', wdir='C:/Users/Think/Desktop/Python Practice')
  File "C:\Users\Think\Anaconda2\lib\site-packages\spyder\utils\site\sitecustomize.py", line 880, in runfile
    execfile(filename, namespace)
  File "C:\Users\Think\Anaconda2\lib\site-packages\spyder\utils\site\sitecustomize.py", line 87, in execfile
    exec(compile(scripttext, filename, 'exec'), glob, loc)
  File "C:/Users/Think/Desktop/Python Practice/ROC with decision tree.py", line 47, in <module>
    fpr["micro"], tpr["micro"], _ = roc_curve(y_test.ravel(), y_score.ravel())
  File "C:\Users\Think\Anaconda2\lib\site-packages\sklearn\metrics\ranking.py", line 510, in roc_curve
    y_true, y_score, pos_label=pos_label, sample_weight=sample_weight)
  File "C:\Users\Think\Anaconda2\lib\site-packages\sklearn\metrics\ranking.py", line 302, in _binary_clf_curve
    check_consistent_length(y_true, y_score)
  File "C:\Users\Think\Anaconda2\lib\site-packages\sklearn\utils\validation.py", line 173, in check_consistent_length
    " samples: %r" % [int(l) for l in lengths])
ValueError: Found input variables with inconsistent numbers of samples: [26632, 53264]
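One likely reading of this error: the two counts differ by exactly a factor of two (53264 = 2 × 26632), which suggests that the binarized labels have one column while the score array has two. With only two classes, label_binarize returns a single column, whereas predict_proba returns one column per class. The sketch below reproduces that shape mismatch on synthetic data (the dataset, the sample count, and the use of a plain DecisionTreeClassifier without the OneVsRestClassifier wrapper are assumptions made for illustration) and shows one possible way around it, namely passing only the positive-class column to roc_curve.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.metrics import auc, roc_curve
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import label_binarize
from sklearn.tree import DecisionTreeClassifier

# Synthetic two-class data standing in for the CSV file (an assumption).
X, y_raw = make_classification(n_samples=200, n_features=5, random_state=0)
labels = np.where(y_raw == 1, "Bad", "Good")

# With two classes, label_binarize returns ONE column, not two.
y = label_binarize(labels, classes=["Good", "Bad"])

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.5, random_state=0)

clf = DecisionTreeClassifier(random_state=0).fit(X_train, y_train.ravel())
y_score = clf.predict_proba(X_test)

print(y_test.shape)   # (100, 1)
print(y_score.shape)  # (100, 2)

# Raveling both arrays would compare 100 labels with 200 scores.
# Column 1 of y_score holds the probability of the class encoded as 1,
# so it lines up with the single label column.
fpr, tpr, _ = roc_curve(y_test.ravel(), y_score[:, 1])
roc_auc = auc(fpr, tpr)
print("AUC: {:.2f}".format(roc_auc))
```

In a two-class problem the micro-average adds nothing over this single curve, so the micro-average lines can simply be dropped.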

While working through my first modeling project as a data scientist, I found that a ROC curve was an excellent way to compare my models! However, I ran into a bit of a glitch: for the first time, I had to create a ROC curve from a dataset with multiclass predictions instead of binary predictions. I also had to learn how to create a ROC curve using a random forest classifier for the first time. Since it took me an entire afternoon of googling to figure these things out, I thought I would blog about them to hopefully help someone in the future, that being you!

Let’s begin!

After running my random forest classifier, I realized there is no .decision_function method to produce the y_score, which is what I thought I needed for my ROC curve. For a random forest classifier, I learned that you must use .predict_proba instead.

from sklearn.ensemble import RandomForestClassifier
from sklearn.pipeline import Pipeline

# Construct baseline pipeline
pipe_rf = Pipeline([('clf', RandomForestClassifier(random_state=123))])

# Fit the model
model = pipe_rf.fit(X_train, y_train)

# Calculate the y_score
y_score = model.predict_proba(X_test)
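To make the shape of y_score concrete, here is a minimal sketch on synthetic data (the three-class dataset and the parameter values are assumptions, not the data from this post). It checks which scoring methods a random forest exposes and shows that predict_proba returns one column per class, in the order given by classes_.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Synthetic 3-class data (an assumption; the post's dataset is not shown).
X, y = make_classification(n_samples=300, n_features=8, n_informative=4,
                           n_classes=3, random_state=123)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=123)

rf = RandomForestClassifier(n_estimators=50, random_state=123)
rf.fit(X_train, y_train)

# Random forests expose predict_proba but not decision_function.
print(hasattr(rf, "decision_function"))  # False
print(hasattr(rf, "predict_proba"))      # True

y_score = rf.predict_proba(X_test)
print(rf.classes_)    # the column order of y_score
print(y_score.shape)  # (90, 3): one row per test sample, one column per class

# Each row holds class probabilities, so it sums to 1.
rows_sum_to_one = np.allclose(y_score.sum(axis=1), 1.0)
print(rows_sum_to_one)
```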

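Using .predict_proba gives you a y_score with one column of probabilities per class. To compare against it, the true labels in y_test need to be binarized into the same one-column-per-class layout using label_binarize from sklearn.preprocessing. In my case, I had 7 classes, labeled 1 through 7.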

from sklearn.preprocessing import label_binarize

# Binarize the output
y_test_bin = label_binarize(y_test, classes=[1, 2, 3, 4, 5, 6, 7])
n_classes = y_test_bin.shape[1]
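As a small illustration of what the binarized labels look like, here is a sketch with four hypothetical labels (the label values are made up for the example). Each row gets a single 1 in the column of its class.

```python
import numpy as np
from sklearn.preprocessing import label_binarize

# Four hypothetical test labels drawn from the classes 1-7.
y_test = np.array([1, 3, 7, 3])

y_test_bin = label_binarize(y_test, classes=[1, 2, 3, 4, 5, 6, 7])

print(y_test_bin.shape)  # (4, 7)
print(y_test_bin)
# [[1 0 0 0 0 0 0]
#  [0 0 1 0 0 0 0]
#  [0 0 0 0 0 0 1]
#  [0 0 1 0 0 0 0]]
```

The column order follows the classes argument, so it should list the labels in the same order as the classifier's classes_ attribute, which scikit-learn classifiers keep sorted.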

Now you can finally create a ROC Curve (and calculate your AUC values) for your multiple classes using the code below!

import matplotlib.pyplot as plt
from sklearn.metrics import roc_curve, auc

fpr = dict()
tpr = dict()
roc_auc = dict()
for i in range(n_classes):
    # One-vs-rest ROC curve for class i+1
    fpr[i], tpr[i], _ = roc_curve(y_test_bin[:, i], y_score[:, i])
    roc_auc[i] = auc(fpr[i], tpr[i])
    plt.plot(fpr[i], tpr[i], color='darkorange', lw=2)
    print('AUC for Class {}: {}'.format(i+1, roc_auc[i]))

# Diagonal reference line for a random classifier
plt.plot([0, 1], [0, 1], color='navy', lw=2, linestyle='--')
plt.xlim([0.0, 1.0])
plt.ylim([0.0, 1.05])
plt.xlabel('False Positive Rate')
plt.ylabel('True Positive Rate')
plt.title('Receiver Operating Characteristic Curves')
plt.show()
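Because the code above draws every class in the same color, the curves can be hard to tell apart. One possible variation, sketched here with a hypothetical helper named plot_multiclass_roc and made-up demo arrays, lets matplotlib cycle through its default colors and puts each class's AUC in the legend.

```python
import matplotlib.pyplot as plt
import numpy as np
from sklearn.metrics import auc, roc_curve


def plot_multiclass_roc(y_test_bin, y_score, class_labels):
    """Draw one one-vs-rest ROC curve per class and return the Axes.

    Hypothetical helper for illustration; not part of scikit-learn.
    """
    fig, ax = plt.subplots()
    for i, label in enumerate(class_labels):
        fpr, tpr, _ = roc_curve(y_test_bin[:, i], y_score[:, i])
        ax.plot(fpr, tpr, lw=2,
                label="Class {} (AUC = {:.2f})".format(label, auc(fpr, tpr)))
    # Diagonal reference line for a random classifier.
    ax.plot([0, 1], [0, 1], color="navy", lw=2, linestyle="--", label="Chance")
    ax.set_xlim([0.0, 1.0])
    ax.set_ylim([0.0, 1.05])
    ax.set_xlabel("False Positive Rate")
    ax.set_ylabel("True Positive Rate")
    ax.set_title("Receiver Operating Characteristic Curves")
    ax.legend(loc="lower right")
    return ax


# Made-up binarized labels and scores for three classes, for the demo only.
y_demo_bin = np.array([[1, 0, 0], [0, 1, 0], [0, 0, 1],
                       [1, 0, 0], [0, 1, 0], [0, 0, 1]])
y_demo_score = np.array([[0.7, 0.2, 0.1], [0.3, 0.5, 0.2], [0.2, 0.2, 0.6],
                         [0.4, 0.4, 0.2], [0.1, 0.8, 0.1], [0.3, 0.3, 0.4]])

ax = plot_multiclass_roc(y_demo_bin, y_demo_score, class_labels=[1, 2, 3])
```

Call plt.show() afterwards to display the figure. With real data, pass y_test_bin, y_score, and the list of class labels in the same order as the columns.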

And that’s it! I hope this saved you an afternoon of googling!

How do you plot a ROC curve for multiclass in Python?

To plot a multiclass ROC curve, binarize the labels with the label_binarize function, then compute and plot one curve per class as in the code above. Adjust the code to suit your application. You can also print y_score to inspect the predicted scores.

Can ROC curve be used for multiclass classification?

Yes. To compute the area under the ROC curve for a multiclass problem, you can use the roc_auc_score function, which supports multiclass classification.
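A minimal sketch of that call, assuming synthetic three-class data and a random forest (both chosen only for illustration): for multiclass targets, roc_auc_score needs the full predict_proba matrix and an explicit multi_class setting, because the default raises an error.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

# Synthetic 3-class data (an assumption made for the example).
X, y = make_classification(n_samples=400, n_features=8, n_informative=4,
                           n_classes=3, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.5, stratify=y, random_state=0)

rf = RandomForestClassifier(n_estimators=50, random_state=0)
rf.fit(X_train, y_train)
y_score = rf.predict_proba(X_test)  # shape (n_samples, n_classes)

# multi_class="ovr" scores each class against the rest;
# average="macro" takes the unweighted mean of the per-class AUCs.
macro_ovr_auc = roc_auc_score(y_test, y_score,
                              multi_class="ovr", average="macro")
print("Macro one-vs-rest AUC: {:.3f}".format(macro_ovr_auc))
```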

How do you use the AUC ROC curve for the multi

How do AUC ROC plots work for multiclass models? For multiclass problems, ROC curves can be plotted using a one-versus-rest approach: each class in turn is treated as the positive class and all other classes as the negative class. Applying this to every class gives you as many curves as there are classes. The AUC score can also be calculated for each class individually.
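The one-versus-rest idea can be sketched without label_binarize by building each binary problem by hand. The data, the model, and the variable names below are assumptions for illustration only.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

# Synthetic 3-class data (an assumption made for the example).
X, y = make_classification(n_samples=400, n_features=8, n_informative=4,
                           n_classes=3, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.5, stratify=y, random_state=42)

rf = RandomForestClassifier(n_estimators=50, random_state=42)
rf.fit(X_train, y_train)
y_score = rf.predict_proba(X_test)

per_class_auc = {}
for col, cls in enumerate(rf.classes_):
    # Binary problem: "is this sample of class cls?" versus all other classes.
    is_cls = (y_test == cls).astype(int)
    per_class_auc[int(cls)] = roc_auc_score(is_cls, y_score[:, col])

for cls, value in per_class_auc.items():
    print("Class {} vs rest AUC: {:.3f}".format(cls, value))
```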

Is it possible to perform ROC analysis for a multiclass classification problem?

The ROC curve is commonly used to compare the performance of models. It is usually applied to binary classification, but it can also be used for multiclass classification by means of averaging methods such as micro- and macro-averaging.
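As a rough sketch of two common averaging schemes (synthetic data and variable names are assumptions): micro-averaging pools every (sample, class) pair into one large binary problem, while macro-averaging, as computed here, takes the unweighted mean of the per-class AUCs.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import auc, roc_curve
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import label_binarize

# Synthetic 3-class data (an assumption made for the example).
X, y = make_classification(n_samples=400, n_features=8, n_informative=4,
                           n_classes=3, random_state=1)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.5, stratify=y, random_state=1)

rf = RandomForestClassifier(n_estimators=50, random_state=1)
rf.fit(X_train, y_train)
y_score = rf.predict_proba(X_test)

# Binarize with the classifier's own class order so the columns line up.
y_test_bin = label_binarize(y_test, classes=rf.classes_)

# Micro-average: flatten both matrices and treat every entry as one decision.
fpr_micro, tpr_micro, _ = roc_curve(y_test_bin.ravel(), y_score.ravel())
micro_auc = auc(fpr_micro, tpr_micro)

# Macro-average: compute one AUC per class, then take the plain mean.
per_class_auc = []
for i in range(y_test_bin.shape[1]):
    fpr, tpr, _ = roc_curve(y_test_bin[:, i], y_score[:, i])
    per_class_auc.append(auc(fpr, tpr))
macro_auc = float(np.mean(per_class_auc))

print("Micro-average AUC: {:.3f}".format(micro_auc))
print("Macro-average AUC: {:.3f}".format(macro_auc))
```

Micro-averaging weights each sample equally, so frequent classes dominate, whereas macro-averaging weights each class equally regardless of its size.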
