

kNN From Scratch

Introduction

This repository contains the code and example implementations for my Medium article on building k-Nearest Neighbors from scratch and evaluating it with k-Fold Cross Validation, which is also built from scratch.

For the PyPI package version, please refer to this repository.


Neighbors (Image Source: Freepik)

k-Nearest Neighbors

k-Nearest Neighbors, kNN for short, is a very simple but powerful technique used for making predictions. The principle behind kNN is to use the “most similar historical examples to the new data.”

k-Nearest Neighbors in 4 easy steps

  • Choose a value for k
  • Find the distance of the new point to each record of the training data
  • Get the k-Nearest Neighbors
  • Make predictions (a minimal sketch of these steps follows this list)
    • For a classification problem, the new data point belongs to the class that most of its neighbors belong to.
    • For a regression problem, the prediction can be the average or weighted average of the labels of the k-Nearest Neighbors.
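
A minimal from-scratch sketch of these four steps might look like the following; the function names and toy data are illustrative and not necessarily those used in this repository:

import math
from collections import Counter

def euclidean_distance(a, b):
    # Step 2: distance between the new point and one training record
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def knn_predict(train_X, train_y, new_point, k=3):
    # Steps 2-3: compute all distances, keep the k nearest records
    distances = sorted(
        ((euclidean_distance(row, new_point), label)
         for row, label in zip(train_X, train_y)),
        key=lambda pair: pair[0],
    )
    k_nearest_labels = [label for _, label in distances[:k]]
    # Step 4 (classification): majority vote among the k nearest neighbors
    return Counter(k_nearest_labels).most_common(1)[0][0]

# Step 1: choose k, then predict for a new point
train_X = [[1.0, 2.0], [2.0, 3.0], [8.0, 8.0], [9.0, 7.0]]
train_y = ["A", "A", "B", "B"]
print(knn_predict(train_X, train_y, [1.5, 2.5], k=3))  # -> "A"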

Finally, we evaluate the model using the k-Fold Cross Validation technique.

k-Fold Cross Validation

This technique involves randomly dividing the dataset into k groups, or folds, of approximately equal size. The first fold is held out for testing and the model is trained on the remaining k-1 folds. A from-scratch sketch of the splitting step appears after the figure below.


5-fold cross validation. The blue block is the fold used for testing. (Image Source: sklearn documentation)
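
A minimal from-scratch sketch of the splitting step, assuming the dataset is a plain Python list of rows (all names below are illustrative):

import random

def cross_validation_split(dataset, k=5, seed=1):
    # Shuffle a copy of the dataset, then cut it into k folds of roughly equal size
    rows = list(dataset)
    random.Random(seed).shuffle(rows)
    fold_size = len(rows) // k
    folds = [rows[i * fold_size:(i + 1) * fold_size] for i in range(k)]
    # Spread any leftover rows over the first folds
    for i, row in enumerate(rows[k * fold_size:]):
        folds[i].append(row)
    return folds

folds = cross_validation_split(list(range(23)), k=5)
print([len(f) for f in folds])  # -> [5, 5, 5, 4, 4]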

Datasets Used

The datasets used here are taken from the UCI Machine Learning Repository:

  • Hayes-Roth Dataset
  • Car Evaluation Dataset
  • Breast Cancer Dataset

The Car Evaluation and Breast Cancer datasets contain text attributes. Since the classifier cannot be run on text attributes, the categorical input features need to be converted to numeric values. This is done using LabelEncoder from sklearn.preprocessing. LabelEncoder can be applied to a DataFrame or a list, and it encodes labels with values between 0 and n_classes-1.

Applying LabelEncoder on entire dataframe

from sklearn import preprocessing
import pandas as pd

df = pd.DataFrame(data)  # data: the raw dataset loaded earlier
# Encode each categorical column to integers, one column at a time
df = df.apply(preprocessing.LabelEncoder().fit_transform)

Applying LabelEncoder on a list

labels = preprocessing.LabelEncoder().fit_transform(inputList)
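
To see the 0 to n_classes-1 encoding in action, here is a small illustrative example (the category values are made up):

from sklearn import preprocessing

encoder = preprocessing.LabelEncoder()
print(encoder.fit_transform(["low", "med", "high", "low"]))  # -> [1 2 0 1]
print(encoder.classes_)  # -> ['high' 'low' 'med'], classes are sorted alphabetically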

References

  • More info on Cross Validation can be seen here
  • kNN
  • kFold Cross Validation


How do you do k-fold cross-validation?

Below are the steps for it (a sketch of this loop follows the list):

  • Randomly split your entire dataset into k folds.
  • For each fold, build your model on the other k - 1 folds of the dataset.
  • Record the error you see on each of the predictions.
  • Repeat this until each of the k folds has served as the test set.
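
Combined with the cross_validation_split sketch above, the loop that records a score for every fold might look like this; train_and_predict and accuracy are placeholders for your own classifier and metric, not functions from this repository:

def evaluate_with_kfold(dataset, k, train_and_predict, accuracy):
    folds = cross_validation_split(dataset, k)
    scores = []
    for i, test_fold in enumerate(folds):
        # Build the model on the other k - 1 folds ...
        train_rows = [row for j, fold in enumerate(folds) if j != i for row in fold]
        predictions = train_and_predict(train_rows, test_fold)
        # ... and record the score observed on the held-out fold
        scores.append(accuracy(test_fold, predictions))
    return scores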

How do you do k-fold cross-validation?

k-Fold cross-validation, step by step (a scikit-learn equivalent is sketched after the list):

  • Pick a number of folds, k.
  • Split the dataset into k equal (if possible) parts, called folds.
  • Choose k - 1 folds as the training set; the remaining fold is the test set.
  • Train the model on the training set.
  • Validate on the test set.
  • Save the result of the validation.
  • Repeat steps 3-6 k times.
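
For comparison only, here is the same procedure using scikit-learn's built-in KFold; the toy data and the choice of classifier are illustrative, not part of this repository:

import numpy as np
from sklearn.model_selection import KFold
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score

X, y = np.random.rand(100, 4), np.random.randint(0, 2, 100)  # toy data
kf = KFold(n_splits=5, shuffle=True, random_state=42)  # steps 1-2
scores = []
for train_idx, test_idx in kf.split(X):                # steps 3-7
    model = KNeighborsClassifier(n_neighbors=3).fit(X[train_idx], y[train_idx])
    scores.append(accuracy_score(y[test_idx], model.predict(X[test_idx])))
print(scores)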

What is the best k for k-fold cross-validation?

Sensitivity analysis for k: the key configuration parameter for k-fold cross-validation is k, which defines the number of folds into which a given dataset is split. Common values are k=3, k=5, and k=10, and by far the most popular value used in applied machine learning to evaluate models is k=10.
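
A small sketch of such a sensitivity check, comparing a few common values of k on the same data (the toy data and classifier below are illustrative):

import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

X, y = np.random.rand(200, 4), np.random.randint(0, 2, 200)  # toy data
for k in (3, 5, 10):
    scores = cross_val_score(KNeighborsClassifier(), X, y, cv=k)
    print(f"k={k}: mean accuracy {scores.mean():.3f} (+/- {scores.std():.3f})")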

How do you calculate k-fold cross-validation?

The general procedure is as follows:

  • Shuffle the dataset randomly.
  • Split the dataset into k groups.
  • For each unique group: take the group as a hold-out or test data set, take the remaining groups as a training data set, then fit a model on the training set and evaluate it on the test set.
  • Summarize the skill of the model using the sample of model evaluation scores.
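
That final summarizing step is usually just the mean and spread of the per-fold scores (the numbers below are illustrative):

import statistics

scores = [0.82, 0.79, 0.85, 0.80, 0.84]  # per-fold accuracies collected during cross-validation
print(f"mean={statistics.mean(scores):.3f}, stdev={statistics.stdev(scores):.3f}")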