

kNN From Scratch

Introduction

This repository contains the code and example implementations for my Medium article on building k-Nearest Neighbors from scratch and evaluating it with k-Fold Cross Validation, which is also built from scratch.

For the PyPI package version, please refer to this repository.


Neighbors (Image Source: Freepik)

k-Nearest Neighbors

k-Nearest Neighbors, kNN for short, is a very simple but powerful technique used for making predictions. The principle behind kNN is to use the “most similar historical examples to the new data.”

k-Nearest Neighbors in 4 easy steps

  • Choose a value for k
  • Find the distance of the new point to each record of the training data
  • Get the k-Nearest Neighbors
  • Make predictions (a minimal sketch of these steps follows this list)
    • For a classification problem, the new data point belongs to the class that most of its neighbors belong to.
    • For a regression problem, the prediction can be the average or weighted average of the labels of the k-Nearest Neighbors.
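
A minimal from-scratch sketch of these four steps might look like the following; the function names and toy data are illustrative and not necessarily those used in this repository:

import math
from collections import Counter

def euclidean_distance(a, b):
    # Step 2: distance between the new point and one training record
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def knn_predict(train_X, train_y, new_point, k=3):
    # Steps 2-3: compute all distances, keep the k nearest records
    distances = sorted(
        ((euclidean_distance(row, new_point), label)
         for row, label in zip(train_X, train_y)),
        key=lambda pair: pair[0],
    )
    k_nearest_labels = [label for _, label in distances[:k]]
    # Step 4 (classification): majority vote among the k nearest neighbors
    return Counter(k_nearest_labels).most_common(1)[0][0]

# Step 1: choose k, then predict for a new point
train_X = [[1.0, 2.0], [2.0, 3.0], [8.0, 8.0], [9.0, 7.0]]
train_y = ["A", "A", "B", "B"]
print(knn_predict(train_X, train_y, [1.5, 2.5], k=3))  # -> "A"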

Finally, we evaluate the model using the k-Fold Cross Validation technique.

k-Fold Cross Validation

This technique involves randomly dividing the dataset into k groups, or folds, of approximately equal size. The first fold is held out for testing and the model is trained on the remaining k-1 folds. A from-scratch sketch of the splitting step appears after the figure below.


5-fold cross validation. The blue block is the fold used for testing. (Image Source: sklearn documentation)
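
A minimal from-scratch sketch of the splitting step, assuming the dataset is a plain Python list of rows (all names below are illustrative):

import random

def cross_validation_split(dataset, k=5, seed=1):
    # Shuffle a copy of the dataset, then cut it into k folds of roughly equal size
    rows = list(dataset)
    random.Random(seed).shuffle(rows)
    fold_size = len(rows) // k
    folds = [rows[i * fold_size:(i + 1) * fold_size] for i in range(k)]
    # Spread any leftover rows over the first folds
    for i, row in enumerate(rows[k * fold_size:]):
        folds[i].append(row)
    return folds

folds = cross_validation_split(list(range(23)), k=5)
print([len(f) for f in folds])  # -> [5, 5, 5, 4, 4]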

Datasets Used

The datasets used here are taken from the UCI Machine Learning Repository:

  • Hayes-Roth Dataset
  • Car Evaluation Dataset
  • Breast Cancer Dataset

The Car Evaluation and Breast Cancer datasets contain text attributes. Since the classifier cannot be run on text attributes, the categorical input features need to be converted to numeric values. This is done using LabelEncoder from sklearn.preprocessing. LabelEncoder can be applied to a DataFrame or a list, and it encodes labels with values between 0 and n_classes-1.

Applying LabelEncoder on entire dataframe

from sklearn import preprocessing
import pandas as pd

df = pd.DataFrame(data)  # data: the raw dataset loaded earlier
# Encode each categorical column to integers, one column at a time
df = df.apply(preprocessing.LabelEncoder().fit_transform)

Applying LabelEncoder on a list

labels = preprocessing.LabelEncoder().fit_transform(inputList)
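
To see the 0 to n_classes-1 encoding in action, here is a small illustrative example (the category values are made up):

from sklearn import preprocessing

encoder = preprocessing.LabelEncoder()
print(encoder.fit_transform(["low", "med", "high", "low"]))  # -> [1 2 0 1]
print(encoder.classes_)  # -> ['high' 'low' 'med'], classes are sorted alphabetically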

References

  • More info on Cross Validation can be seen here
  • kNN
  • kFold Cross Validation


How do you do k-fold cross-validation?

Below are the steps for it (a sketch of this loop follows the list):

  • Randomly split your entire dataset into k folds.
  • For each fold, build your model on the other k - 1 folds of the dataset.
  • Record the error you see on each of the predictions.
  • Repeat this until each of the k folds has served as the test set.
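
Combined with the cross_validation_split sketch above, the loop that records a score for every fold might look like this; train_and_predict and accuracy are placeholders for your own classifier and metric, not functions from this repository:

def evaluate_with_kfold(dataset, k, train_and_predict, accuracy):
    folds = cross_validation_split(dataset, k)
    scores = []
    for i, test_fold in enumerate(folds):
        # Build the model on the other k - 1 folds ...
        train_rows = [row for j, fold in enumerate(folds) if j != i for row in fold]
        predictions = train_and_predict(train_rows, test_fold)
        # ... and record the score observed on the held-out fold
        scores.append(accuracy(test_fold, predictions))
    return scores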

How do you do k-fold cross-validation?

k-Fold cross-validation, step by step (a scikit-learn equivalent is sketched after the list):

  • Pick a number of folds, k.
  • Split the dataset into k equal (if possible) parts, called folds.
  • Choose k - 1 folds as the training set; the remaining fold is the test set.
  • Train the model on the training set.
  • Validate on the test set.
  • Save the result of the validation.
  • Repeat steps 3-6 k times.
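
For comparison only, here is the same procedure using scikit-learn's built-in KFold; the toy data and the choice of classifier are illustrative, not part of this repository:

import numpy as np
from sklearn.model_selection import KFold
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score

X, y = np.random.rand(100, 4), np.random.randint(0, 2, 100)  # toy data
kf = KFold(n_splits=5, shuffle=True, random_state=42)  # steps 1-2
scores = []
for train_idx, test_idx in kf.split(X):                # steps 3-7
    model = KNeighborsClassifier(n_neighbors=3).fit(X[train_idx], y[train_idx])
    scores.append(accuracy_score(y[test_idx], model.predict(X[test_idx])))
print(scores)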

What is the best k for k-fold cross-validation?

Sensitivity analysis for k: the key configuration parameter for k-fold cross-validation is k, which defines the number of folds into which a given dataset is split. Common values are k=3, k=5, and k=10, and by far the most popular value used in applied machine learning to evaluate models is k=10.
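
A small sketch of such a sensitivity check, comparing a few common values of k on the same data (the toy data and classifier below are illustrative):

import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

X, y = np.random.rand(200, 4), np.random.randint(0, 2, 200)  # toy data
for k in (3, 5, 10):
    scores = cross_val_score(KNeighborsClassifier(), X, y, cv=k)
    print(f"k={k}: mean accuracy {scores.mean():.3f} (+/- {scores.std():.3f})")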

How do you calculate k-fold cross-validation?

The general procedure is as follows:

  • Shuffle the dataset randomly.
  • Split the dataset into k groups.
  • For each unique group: take the group as a hold-out or test data set, take the remaining groups as a training data set, then fit a model on the training set and evaluate it on the test set.
  • Summarize the skill of the model using the sample of model evaluation scores.
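
That final summarizing step is usually just the mean and spread of the per-fold scores (the numbers below are illustrative):

import statistics

scores = [0.82, 0.79, 0.85, 0.80, 0.84]  # per-fold accuracies collected during cross-validation
print(f"mean={statistics.mean(scores):.3f}, stdev={statistics.stdev(scores):.3f}")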