Scikit Learn Cross-Validation

Introduction to Scikit Learn Cross-Validation

Scikit learn cross-validation is a technique for validating the performance of a model. Instead of judging the model on a single split, it evaluates the model on several chunks of the dataset that are held out for validation in turn. With scikit learn cross-validation we divide our dataset into k folds, where k represents the number of folds into which we want to split the cross-validation data.

Key Takeaways

  • K-fold cross-validation is a popular technique in scikit learn. The KFold class is configured with a number of folds, and we then call the split function, passing in the dataset (see the sketch after this list).
  • Enumerating the result of the split function yields the row indexes of the train and test sets of each fold.
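
A minimal sketch of that workflow, using a toy six-sample dataset and three folds chosen only for illustration:

import numpy as np
from sklearn.model_selection import KFold

# Toy dataset: six samples with one feature each
data = np.arange(6).reshape(-1, 1)
# Configure the class with the number of folds, then call split on the dataset
kfold = KFold(n_splits=3)
for fold, (train_idx, test_idx) in enumerate(kfold.split(data)):
    # Each fold yields the row indexes of its train and test sets
    print("fold %d: train=%s test=%s" % (fold, train_idx, test_idx))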

Overview of Scikit Learn Cross Validation

A model that merely repeats the labels of the samples it has just seen would achieve a perfect score but would fail to predict anything useful on unseen data; in scikit learn this situation is called overfitting. To avoid it, the common best practice when performing machine learning is to hold part of the data out as a test set.
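
As a minimal sketch of that practice, a hold-out split with train_test_split might look like this (the 80/20 ratio is an illustrative choice):

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
# Hold out 20% of the data; the model never sees it during training,
# so its score there estimates performance on unseen data
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0)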

When evaluating different settings for an estimator, such as the C parameter that must be set manually for an SVM, there is still a risk of overfitting to the test set, so a third, validation, set is often held out as well. However, by partitioning the available data into three sets we reduce the number of samples that can be used for learning the model, and the results can depend on a particular random choice for the pair of train and validation sets.

K-fold cross-validation addresses this: the performance measure it reports is the average of the values computed in the loop over the folds. This approach can be computationally expensive, but it does not waste data, which is a major advantage in problems where the number of samples is small.
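
That loop can be written out explicitly; the following sketch assumes a linear SVM and five shuffled folds purely for illustration:

import numpy as np
from sklearn import datasets, svm
from sklearn.model_selection import KFold

X, y = datasets.load_iris(return_X_y=True)
model = svm.SVC(kernel='linear')
scores = []
# Every sample is used for testing exactly once across the k iterations,
# so no data is wasted, at the cost of fitting the model k times
for train_idx, test_idx in KFold(n_splits=5, shuffle=True, random_state=0).split(X):
    model.fit(X[train_idx], y[train_idx])
    scores.append(model.score(X[test_idx], y[test_idx]))
# The reported performance measure is the average over the folds
print(np.mean(scores))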

Scikit Learn Cross Validation Performance

Splitting the data into training and test sets is an essential, basic task when building a machine learning model. To determine whether our model is overfitting, we need to test it on data it has not seen. A model that performs well only on its validation set is likely to perform poorly when dealing with real data. This is what makes cross-validation such an important concept for ensuring model stability.

Instead of committing to one specific configuration, we can use an automatic configuration procedure that searches over several candidates; the final model is then the best configuration found by that procedure.
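
A grid search is one such procedure; this sketch assumes a small illustrative grid over the SVM's C parameter:

from sklearn import datasets, svm
from sklearn.model_selection import GridSearchCV

X, y = datasets.load_iris(return_X_y=True)
# Cross-validate every candidate value of C and keep the best one
search = GridSearchCV(svm.SVC(kernel='linear'),
                      param_grid={'C': [0.1, 1, 10]}, cv=5)
search.fit(X, y)
# best_estimator_ is the final model, refit on the full dataset by default
print(search.best_params_, search.best_score_)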

The example below sets up the data for measuring cross-validation performance; it loads the iris dataset and inspects its shape:

Code:

import numpy as np
from sklearn import datasets

# Load the iris dataset as separate feature and target arrays
X, y = datasets.load_iris(return_X_y=True)
# Inspect the shapes of the data and the target
X.shape, y.shape

Output:

((150, 4), (150,))

In the example below, we fit a linear-kernel SVM on that data and validate its performance with seven-fold cross-validation:

Code:

from sklearn import svm
from sklearn import datasets
from sklearn.model_selection import cross_val_score

# Reload the data so the snippet is self-contained
X, y = datasets.load_iris(return_X_y=True)
# Fit a linear-kernel SVM and validate it with seven-fold cross-validation
learn = svm.SVC(kernel='linear', C=3, random_state=48)
scikit = cross_val_score(learn, X, y, cv=7)
scikit

Output:

[Output image: an array of seven cross-validation scores]

Scikit Learn Cross Validation Metrics

The simplest way to use cross-validation is to call the cross_val_score helper function on the estimator and the dataset. The following example estimates the accuracy of a linear-kernel support vector machine on the iris dataset by splitting the data ten consecutive times (cv = 10):

Code:

from sklearn import svm
from sklearn import datasets
from sklearn.model_selection import cross_val_score

X, y = datasets.load_iris(return_X_y=True)
# Ten-fold cross-validation of a linear-kernel SVM; the result is named
# scores so it does not shadow the sklearn.metrics module
cross = svm.SVC(kernel='linear', C=2, random_state=52)
scores = cross_val_score(cross, X, y, cv=10)
scores

Output:

[Output image: an array of ten cross-validation scores]

In the example below, we print the scikit learn cross-validation accuracy together with its standard deviation:

Code:

from sklearn import svm
from sklearn import datasets
from sklearn.model_selection import cross_val_score

X, y = datasets.load_iris(return_X_y=True)
cross = svm.SVC(kernel='linear', C=2, random_state=52)
scores = cross_val_score(cross, X, y, cv=10)
# Report the mean accuracy and its standard deviation across the ten folds
print("%0.2f accuracy with a standard deviation of %0.2f" % (scores.mean(), scores.std()))

Output:

[Output image: the mean accuracy and its standard deviation]

In the example below, we use ShuffleSplit, a cross-validation iterator whose randomized splits are controlled by a random state:

Code:

from sklearn import svm
from sklearn import datasets
from sklearn.model_selection import ShuffleSplit, cross_val_score

X, y = datasets.load_iris(return_X_y=True)
sea = svm.SVC(kernel='linear', C=1, random_state=46)
# Five randomized splits, each holding out 30% of the data for testing
metrics = ShuffleSplit(n_splits=5, test_size=0.3, random_state=0)
cross_val_score(sea, X, y, cv=metrics)

Output:

[Output image: an array of five ShuffleSplit cross-validation scores]

Scikit Learn Cross Validation Iterators

Cross-validation iterators assume that the data is independent and identically distributed (i.i.d.): all samples stem from the same generative process, and that process is assumed to have no memory of previously generated samples.

The example below shows 2-fold cross-validation of a dataset with 4 samples:

Code:

from sklearn.model_selection import KFold

scikit = ["M", "A", "O", "Q"]
learn = KFold(n_splits=2)
# Print the train indexes followed by the test indexes of each fold
for train, test in learn.split(scikit):
    print("%s %s" % (train, test))

Output:

[2 3] [0 1]
[0 1] [2 3]

In the example below, we use repeated k-fold, which repeats k-fold n times with different splits in each repetition. It is used when we need to run k-fold n times.

Code:

import numpy as np
from sklearn.model_selection import RepeatedKFold

data = np.array([[21, 32], [13, 24], [31, 42], [53, 64]])
seed = 14527
# Repeat 3-fold cross-validation three times, reshuffling on each repeat
learn = RepeatedKFold(n_splits=3, n_repeats=3, random_state=seed)
for train, test in learn.split(data):
    print("%s %s" % (train, test))

Output:

[Output image: train and test indexes for each of the nine repeated k-fold splits]

Examples of Scikit Learn Cross-Validation

Given below are the examples mentioned:

Example #1

Below is an example of scikit learn cross-validation. Here we define k-fold cross-validation with five folds:

Code:

from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score, KFold
from sklearn.linear_model import LogisticRegression

learn = load_iris()
A1 = learn.data
A2 = learn.target
# max_iter is raised so the lbfgs solver converges on the iris data
exp = LogisticRegression(max_iter=1000)
scikit = KFold(n_splits=5)
sci = cross_val_score(exp, A1, A2, cv=scikit)
print("Cross Validation {}".format(sci))
print("Average :{}".format(sci.mean()))

Output:

[Output image: five cross-validation scores and their average]

Example #2

In the example below, we define a cross-validation scheme that combines stratified k-fold and group k-fold. The class and group labels in the original listing were incomplete, so illustrative values are assumed for the 18 samples:

Code:

from sklearn.model_selection import StratifiedGroupKFold

P1 = list(range(18))
# Class labels: 6 positives and 12 negatives are assumed here so that the
# 18 samples can be stratified
P2 = [1] * 6 + [0] * 12
# Group labels are assumed for illustration; each sample needs a group id
sci = [1, 2, 3, 3, 4, 4, 1, 1, 2, 2, 3, 4, 5, 5, 5, 6, 6, 6]
learn = StratifiedGroupKFold(n_splits=3)
for train, test in learn.split(P1, P2, groups=sci):
    print("%s %s" % (train, test))

Output:

[Output image: train and test indexes for each of the three stratified group k-fold splits]

FAQ

Given below are the FAQs mentioned:

Q1. What is the use of scikit learn cross-validation in python?

Answer:

Scikit learn cross-validation is used to validate the performance of a model: it evaluates the model on each held-out chunk of the data in turn and reports the resulting scores.

Q2. Which libraries do we need to use while working with scikit learn cross-validation?

Answer:

The core requirement is scikit-learn itself, in particular its sklearn.model_selection module, together with numpy; pandas and matplotlib are commonly used alongside it for data handling and plotting.

Q3. What is the use of time series split in scikit learn cross-validation?

Answer:

The time series split is a variation of k-fold designed for time-ordered data: in the k-th split it returns the first k folds as the train set and the (k + 1)-th fold as the test set, so the model is never trained on observations from the future.
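
A minimal sketch with six toy observations (the data and the split count are illustrative):

import numpy as np
from sklearn.model_selection import TimeSeriesSplit

data = np.arange(6).reshape(-1, 1)
# Each successive split trains on all earlier folds and tests on the next one
for train, test in TimeSeriesSplit(n_splits=3).split(data):
    print("train=%s test=%s" % (train, test))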

Conclusion

Scikit learn cross-validation is the technique used to validate the performance of a model; by using it we divide our dataset into k folds. It is especially valuable when evaluating different settings for an estimator, such as the C parameter that must be set manually for an SVM, because it avoids tuning those settings to a single test set.

Recommended Articles

This is a guide to Scikit Learn Cross-Validation. Here we discuss the introduction, performance & metrics, iterators, examples, and FAQ. You may also have a look at the following articles to learn more –

  1. Scikit Learn Logistic Regression
  2. Scikit Learn Version
  3. Scikit Learn SVM
  4. Scikit learn