Python ttest_ind tutorial

In this Python tutorial, we will learn about “Python Scipy Ttest_ind”, which compares the means of two independent samples through hypothesis testing, and how to use it with Python SciPy. We will also cover the following topics.

What is a T-test in Statistics
Python Scipy ttest_ind
Python Scipy ttest_ind alternative
Python Scipy ttest_ind nan
Python Scipy ttest_ind output
Python Scipy ttest_ind equal_var
Python Scipy ttest_ind axis
Python Scipy ttest_ind statistic
Python Scipy ttest_ind degrees of freedom

What is a T-test in Statistics

A t-test is an inferential statistical test used to assess whether the means of two groups differ significantly. T-tests are used when the data sets are approximately normally distributed and their variances are unknown.

When evaluating a hypothesis, the t-test uses the t-statistic, the t-distribution, and the degrees of freedom to assess statistical significance. The t-test takes a sample from each of the two sets and states the problem mathematically, taking as the null hypothesis that the two means are equal.

Three essential data values are needed to calculate a t-test: the difference between the mean values of the two data sets, the standard deviation of each group, and the number of data values in each group.
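As a rough sketch of how those three values determine the result, the example below feeds only the means, sample standard deviations, and group sizes into scipy.stats.ttest_ind_from_stats() and compares the outcome with ttest_ind() on the raw data. The arrays group_a and group_b are made-up illustrative numbers, not data from this tutorial.

```python
import numpy as np
from scipy.stats import ttest_ind, ttest_ind_from_stats

# Hypothetical measurements for two independent groups (illustrative only).
group_a = np.array([5.1, 4.8, 6.0, 5.5, 5.2, 4.9])
group_b = np.array([4.2, 4.9, 4.4, 5.0, 4.1, 4.6])

# The three ingredients: mean, sample standard deviation (ddof=1), and size.
mean_a, std_a, n_a = group_a.mean(), group_a.std(ddof=1), group_a.size
mean_b, std_b, n_b = group_b.mean(), group_b.std(ddof=1), group_b.size

# T-test computed from the summary values alone.
from_summary = ttest_ind_from_stats(mean_a, std_a, n_a,
                                    mean_b, std_b, n_b, equal_var=True)
# T-test computed from the raw data, for comparison.
from_raw = ttest_ind(group_a, group_b, equal_var=True)

print(from_summary.statistic, from_summary.pvalue)
print(from_raw.statistic, from_raw.pvalue)
```

Both calls should print the same statistic and p-value, since the raw data enter the equal-variance test only through these summary values.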

This comparison determines how likely the observed difference is to arise by chance and whether it falls outside that range. In other words, the t-test investigates whether the difference between the groups is a genuine difference or merely a chance difference.

In this tutorial, we will compute the T-test of independent samples using the methods of Python Scipy.

Python Scipy ttest_ind

The Python Scipy module scipy.stats provides the method ttest_ind() to compute the T-test for the means of two independent samples of scores. This is a test of the null hypothesis that the average values of the two independent samples are the same. By default, the test assumes that the populations’ variances are identical.

The syntax is given below.

scipy.stats.ttest_ind(a, b, axis=0, equal_var=True, nan_policy='propagate', permutations=None, random_state=None, alternative='two-sided', trim=0)

Where the parameters are:

a, b (array_like): The input arrays. They must have the same shape, except along the dimension corresponding to axis.
axis (int or None): The axis along which the test is computed. The default is 0. If None, the test is computed over the whole arrays a and b.
nan_policy: Defines what to do when the input contains nan. The following choices are available (‘propagate’ is the default):

‘propagate’: returns nan.
‘raise’: raises an error.
‘omit’: performs the calculations ignoring nan values.

equal_var (boolean): If True (the default), perform a standard independent two-sample test that assumes equal population variances. If False, perform Welch’s t-test, which does not assume equal population variances.
permutations: If 0 or None (the default), p-values are calculated using the t-distribution. Otherwise, permutations is the number of random permutations used to estimate p-values with a permutation test. If the number of permutations equals or exceeds the number of distinct partitions of the pooled data, an exact test is performed instead.
random_state (None, int, numpy Generator, or RandomState): The state of the pseudorandom number generator used to produce permutations. If None (or np.random), the numpy.random.RandomState singleton is used. If an integer, a new RandomState instance is created and seeded with it. If already a Generator or RandomState instance, that instance is used.
trim (float): If non-zero, performs a trimmed (Yuen’s) t-test. It specifies the fraction of elements to remove from each end of the input samples. If 0 (the default), no elements are trimmed. The number of elements trimmed from each tail is the floor of trim multiplied by the number of elements. The valid range is [0, 0.5).
alternative: Defines the alternative hypothesis. The following choices are available (the default is “two-sided”):

“two-sided”: the means of the distributions from which the samples are drawn are not equal.
“less”: the mean of the distribution underlying the first sample is less than the mean of the distribution underlying the second sample.
“greater”: the mean of the distribution underlying the first sample is greater than the mean of the distribution underlying the second sample.

The method ttest_ind() returns the statistic and the pvalue, each a float or an array.
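To illustrate the trim parameter described above, here is a minimal sketch comparing Welch’s t-test with a trimmed (Yuen’s) t-test on data containing one extreme outlier. The arrays are invented for illustration, and the sketch assumes a SciPy version that supports trim.

```python
import numpy as np
from scipy.stats import ttest_ind

# Illustrative data: group_a contains one extreme outlier (50.0).
group_a = np.array([2.1, 2.5, 2.3, 2.8, 2.6, 2.4, 2.7, 2.2, 2.9, 50.0])
group_b = np.array([3.1, 3.4, 3.0, 3.6, 3.3, 3.2, 3.5, 3.7, 3.8, 3.9])

# Welch's t-test on the untrimmed data: the outlier dominates mean and variance.
plain = ttest_ind(group_a, group_b, equal_var=False)

# trim=0.2 removes floor(0.2 * 10) = 2 values from each tail of each sample.
trimmed = ttest_ind(group_a, group_b, equal_var=False, trim=0.2)

print("Welch:", plain.statistic, plain.pvalue)
print("Yuen (trim=0.2):", trimmed.statistic, trimmed.pvalue)
```

With the outlier trimmed away, the test is no longer driven by a single value, so the two results can differ considerably.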

Let’s take an example and compute the T-test of the independent samples by following the below steps:

Import the required libraries using the below python code.

import numpy as np
from scipy.stats import norm, ttest_ind

Define a random number generator using np.random.default_rng() and generate two samples from a normal distribution with the same mean using the method norm.rvs().

rnd_num_gen = np.random.default_rng()
samp1 = norm.rvs(loc=3, scale=7, size=250, random_state=rnd_num_gen)
samp2 = norm.rvs(loc=3, scale=7, size=250, random_state=rnd_num_gen)

Now perform the T-test on the samples with the same means using the below code.

ttest_ind(samp1,samp2)

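Here ttest_ind returns two values: in the run shown, statistic = 0.295 and pvalue = 0.76. Because the random number generator is not seeded, the exact values differ from run to run.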
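The following sketch shows one way the returned p-value might be turned into a decision. The seed 12345, the significance level alpha = 0.05, and the extra sample with a shifted mean are assumptions chosen for illustration; they are not part of the example above.

```python
import numpy as np
from scipy.stats import norm, ttest_ind

rng = np.random.default_rng(12345)  # fixed seed so the run is repeatable
alpha = 0.05  # conventional significance level, chosen for illustration

same_a = norm.rvs(loc=3, scale=7, size=250, random_state=rng)
same_b = norm.rvs(loc=3, scale=7, size=250, random_state=rng)
shifted = norm.rvs(loc=8, scale=7, size=250, random_state=rng)  # different mean

for label, other in [("same mean", same_b), ("shifted mean", shifted)]:
    result = ttest_ind(same_a, other)
    # Reject the null hypothesis of equal means when p falls below alpha.
    decision = "reject H0" if result.pvalue < alpha else "fail to reject H0"
    print(f"{label}: t={result.statistic:.3f}, p={result.pvalue:.3g} -> {decision}")
```

A large p-value, such as the 0.76 reported above, gives no evidence against equal means, while a p-value below alpha suggests the means differ.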

Python Scipy ttest_ind alternative

The parameter alternative of the method ttest_ind() defines the alternative hypothesis.

It accepts the following options.

“two-sided”: the means of the distributions from which the samples are drawn are not equal.
“less”: the mean of the distribution underlying the first sample is less than the mean of the distribution underlying the second sample.
“greater”: the mean of the distribution underlying the first sample is greater than the mean of the distribution underlying the second sample.

Let’s understand with an example how to perform the T-test with an alternative hypothesis by following the below steps:

Import the required libraries or methods using the below python code.

from scipy.stats import ttest_ind

Create a sample using the below code.

samp_1 = [[1.2,2.1,5.6,1.3],[3.4,2.1,1.6,4.8]]
samp_2 = [[2.4,1.1,3.6,5.8],[0.2,4.1,2.6,6.3]]

Apply the T-test with the alternative hypothesis set to two-sided.

ttest_ind(samp_1,samp_2,axis =1,alternative="two-sided")

Apply the T-test again with the alternative hypothesis set to less.

ttest_ind(samp_1,samp_2,axis =1,alternative="less")

Now perform the T-test once more with the alternative hypothesis set to greater.

ttest_ind(samp_1,samp_2,axis =1,alternative="greater")

This is how to use the alternative hypothesis with the help of Python SciPy ttest_ind.
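The three alternatives are closely related. As a small sketch using the same samp_1 and samp_2 as above, the one-sided p-values should sum to 1, and the two-sided p-value should be twice the smaller one-sided value when the p-values come from the t-distribution.

```python
import numpy as np
from scipy.stats import ttest_ind

samp_1 = [[1.2, 2.1, 5.6, 1.3], [3.4, 2.1, 1.6, 4.8]]
samp_2 = [[2.4, 1.1, 3.6, 5.8], [0.2, 4.1, 2.6, 6.3]]

# One p-value per row, because the test runs along axis 1.
p_two = ttest_ind(samp_1, samp_2, axis=1, alternative="two-sided").pvalue
p_less = ttest_ind(samp_1, samp_2, axis=1, alternative="less").pvalue
p_greater = ttest_ind(samp_1, samp_2, axis=1, alternative="greater").pvalue

# The two one-sided tests split the probability between them.
print(np.allclose(p_less + p_greater, 1.0))
# The two-sided p-value doubles the smaller one-sided tail.
print(np.allclose(p_two, 2 * np.minimum(p_less, p_greater)))
```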

Python Scipy ttest_ind nan

The method ttest_ind() accepts the parameter nan_policy, introduced in the subsection above, to handle nan values within the arrays or samples.

nan_policy: Defines what to do when the input contains nan. The following choices are available (‘propagate’ is the default):

‘propagate’: returns nan.
‘raise’: raises an error.
‘omit’: performs the calculations ignoring nan values.

Let’s see with examples how to handle the nan values in arrays or samples while performing the T-test.

Import the required methods or libraries using the below python code.

from scipy.stats import ttest_ind
import numpy as np

Generate data with nan values using the below code.

data1 = np.random.randn(30)
data2 = np.random.randn(30)
mask_nan = np.random.choice([1, 0], data1.shape, p=[.1, .9]).astype(bool)
data1[mask_nan] = np.nan
data2[mask_nan] = np.nan

Perform the T-test on the data with nan_policy equal to raise using the below code. If the data contains nan values, this raises an error.

ttest_ind(data1,data2, nan_policy='raise')

Again perform the T-test with nan_policy equal to omit using the below code.

ttest_ind(data1,data2, nan_policy='omit')

Finally, perform the T-test with nan_policy equal to propagate using the below code.

ttest_ind(data1,data2, nan_policy='propagate')

This is how to handle nan values within the samples while computing the T-test using the method ttest_ind() of Python Scipy with the parameter nan_policy.
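Since the data above is random, the effect of each policy is easier to see on fixed input. The sketch below uses small hand-written arrays (illustrative values only) to show that propagate yields nan, raise throws a ValueError, and omit matches a test run on the data with the nan values removed by hand.

```python
import numpy as np
from scipy.stats import ttest_ind

# Illustrative data with one nan in each sample.
data1 = np.array([1.2, np.nan, 0.4, -0.3, 2.1, 0.9])
data2 = np.array([0.1, 1.5, np.nan, 0.7, -0.8, 1.1])

# 'propagate': the nan flows through to the result.
propagated = ttest_ind(data1, data2, nan_policy="propagate")
print(np.isnan(propagated.statistic))

# 'raise': an exception is thrown instead of returning a result.
try:
    ttest_ind(data1, data2, nan_policy="raise")
    raised = False
except ValueError as exc:
    raised = True
    print("raise ->", exc)

# 'omit': equivalent to dropping the nan values from each sample first.
omitted = ttest_ind(data1, data2, nan_policy="omit")
cleaned = ttest_ind(data1[~np.isnan(data1)], data2[~np.isnan(data2)])
print(np.isclose(omitted.statistic, cleaned.statistic))
```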

Python Scipy ttest_ind output

The method ttest_ind() of Python Scipy returns two values after performing the T-test on the samples. The first value is the statistic and the second is the pvalue.

Using these two values, we determine whether the difference between the means of the two samples is significant. To learn more about the method ttest_ind(), refer to the subsection “Python Scipy ttest_ind” above.

Let’s see with an example and compute the T-test by following the below steps:

Import the required libraries or methods using the below python code.

from scipy.stats import ttest_ind

Generate two sample data using the below code.

sample_1 = [2.4,5.1,2.6,1.8]
sample_2 = [1.4,2.1,5.6,3.8]

Perform the T-test to get the two values that we have discussed above.

ttest_ind(sample_1,sample_2)

This is how to perform the T-test on the samples and use the output to judge whether the difference between them is significant.
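The returned object can be read in two ways. As a brief sketch with the same sample_1 and sample_2, the values are available as named attributes and can also be unpacked like a tuple.

```python
from scipy.stats import ttest_ind

sample_1 = [2.4, 5.1, 2.6, 1.8]
sample_2 = [1.4, 2.1, 5.6, 3.8]

result = ttest_ind(sample_1, sample_2)

# Access the values by attribute name...
print(result.statistic, result.pvalue)

# ...or unpack the result like a two-element tuple.
t_stat, p_value = result
print(t_stat, p_value)
```

The statistic is negative here because the mean of sample_1 is smaller than the mean of sample_2.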

Python Scipy ttest_ind axis

The axis parameter of the method ttest_ind() of Python Scipy allows us to compute the T-test along the specified axis of the given array or sample.

A 2-dimensional array has two axes: axis 0 runs vertically down the rows, and axis 1 runs horizontally across the columns.

Here we will see an example of how to compute the T-test along the specified axis of data by following the below steps:

Import the required libraries or methods using the below python code.

from scipy.stats import ttest_ind

Generate sample data using the below code.

samp_1 = [[1.2,2.1,5.6,1.3],[2.4,1.1,3.6,5.8]]
samp_2 = [[2.4,1.1,3.6,5.8],[1.2,2.1,5.6,1.3]]

Perform the T-test with the default axis (axis=0), which compares the samples column by column.

ttest_ind(samp_1,samp_2)

Now compute the T-test on the specified axis of the data using the below code.

ttest_ind(samp_1,samp_2,axis =1)

This is how to compute the T-test along the specified axis of the given array or sample using the method ttest_ind() with parameter axis.
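To make the effect of axis concrete, this sketch runs the test on the same data with axis=0, axis=1, and axis=None, and prints the shape of the resulting statistic in each case.

```python
import numpy as np
from scipy.stats import ttest_ind

samp_1 = np.array([[1.2, 2.1, 5.6, 1.3], [2.4, 1.1, 3.6, 5.8]])  # shape (2, 4)
samp_2 = np.array([[2.4, 1.1, 3.6, 5.8], [1.2, 2.1, 5.6, 1.3]])

by_column = ttest_ind(samp_1, samp_2, axis=0)     # one test per column
by_row = ttest_ind(samp_1, samp_2, axis=1)        # one test per row
flattened = ttest_ind(samp_1, samp_2, axis=None)  # one test on all values

print(np.shape(by_column.statistic))  # four columns, so four statistics
print(np.shape(by_row.statistic))     # two rows, so two statistics
print(np.shape(flattened.statistic))  # a single scalar statistic
```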

Python Scipy ttest_ind equal_var

What if we have data samples with equal variances? In that case, we use the boolean parameter equal_var of the method ttest_ind() of Python Scipy.

The equal-variance t-test, an independent t-test, is used when each group has the same number of samples or when the variances of the two data sets are similar.

The parameter accepts two values, True or False. Let’s see an example by following the below steps:

Import the required libraries or methods using the below code.

import numpy as np
from scipy.stats import norm, ttest_ind

Generate data with equal variance using the below code.

rnd_num_gen = np.random.default_rng()
samp1 = norm.rvs(loc=4, scale=5, size=100, random_state=rnd_num_gen)
samp2 = norm.rvs(loc=4, scale=5, size=200, random_state=rnd_num_gen)

Compute the T-test on the above sample with equal variances using the below code.

ttest_ind(samp1,samp2,equal_var=True)

This is how to compute the T-test of samples with equal variances using the method ttest_ind() with the parameter equal_var.
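For contrast, here is a sketch of what can happen when the equal-variance assumption does not hold. The samples below are an assumed scenario, not the data above: they share a mean, but the smaller sample has a much larger spread, and the seed 0 is arbitrary.

```python
import numpy as np
from scipy.stats import norm, ttest_ind

rng = np.random.default_rng(0)  # arbitrary seed for repeatability

# Same mean, but the smaller sample has the much larger spread.
large_tight = norm.rvs(loc=4, scale=1, size=200, random_state=rng)
small_wide = norm.rvs(loc=4, scale=10, size=50, random_state=rng)

# Standard t-test: pools the two variances into one estimate.
student = ttest_ind(large_tight, small_wide, equal_var=True)
# Welch's t-test: keeps a separate variance estimate per sample.
welch = ttest_ind(large_tight, small_wide, equal_var=False)

print("equal_var=True :", student.statistic, student.pvalue)
print("equal_var=False:", welch.statistic, welch.pvalue)
```

In this setup the pooled variance is dominated by the large, low-variance sample, so the equal-variance test reports a statistic of larger magnitude than Welch’s test. When the variances clearly differ, equal_var=False is usually the safer choice.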

Python Scipy ttest_ind statistic

The method ttest_ind() of Python Scipy returns the t-statistic, which we have already seen in the subsection “Python Scipy ttest_ind output”. The t-statistic measures how far an estimated value of a parameter deviates from its hypothesized value, relative to its standard error.

Let’s do an example by following the below steps:

Import the required libraries or methods using the below python code.

from scipy.stats import ttest_ind

Generate sample data using the below code.

samp_data1 = [[0.2,5.1,1.6,1.3],[2.4,1.1,3.6,5.8]]
samp_data2 = [[1.4,2.1,5.6,3.8],[2.2,5.1,1.6,5.3]]

Compute the T-test and get the t-statistic value using the below code.

ttest_ind(samp_data1,samp_data2)

In the above output, statistic=array([-0.42717883, -0.2, ...]) is the t-statistic value.
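To see where these numbers come from, the sketch below recomputes the equal-variance t-statistic column by column with NumPy, as the difference in means divided by its standard error, and compares it with the output of ttest_ind(). The variable names are illustrative.

```python
import numpy as np
from scipy.stats import ttest_ind

samp_data1 = np.array([[0.2, 5.1, 1.6, 1.3], [2.4, 1.1, 3.6, 5.8]])
samp_data2 = np.array([[1.4, 2.1, 5.6, 3.8], [2.2, 5.1, 1.6, 5.3]])

# With the default axis=0, each column is a sample of size 2.
n1, n2 = samp_data1.shape[0], samp_data2.shape[0]
mean_diff = samp_data1.mean(axis=0) - samp_data2.mean(axis=0)
var1 = samp_data1.var(axis=0, ddof=1)
var2 = samp_data2.var(axis=0, ddof=1)

# Pooled variance and the standard error of the difference in means.
pooled_var = ((n1 - 1) * var1 + (n2 - 1) * var2) / (n1 + n2 - 2)
std_error = np.sqrt(pooled_var * (1 / n1 + 1 / n2))
manual_t = mean_diff / std_error

print(manual_t)
print(ttest_ind(samp_data1, samp_data2).statistic)
```

The first two entries of both arrays should match the values -0.42717883 and -0.2 shown in the output above.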

Python Scipy ttest_ind degrees of freedom

First, what are degrees of freedom? The degrees of freedom of an estimate are the number of independent data points used to calculate it.

This is not the same as the sample size. For an estimate based on a single sample, we subtract 1 from the number of items to obtain the degrees of freedom.

Imagine we were looking for the average weight loss for a diet. We could use 50 people, giving df = 49, or 10 people, giving 9 degrees of freedom (10 - 1 = 9).

Another way to think about degrees of freedom is as the number of values in a data set that are free to vary. What does “free to vary” mean? The following example uses the mean (average):

Choose a set of three numbers with a mean of 10, for example 7, 9, 14 or 2, 10, 18 or 4, 8, 18.

Once we have selected the first two numbers, the third is fixed. In other words, we cannot choose the third number freely. Only the first two numbers can vary. We can choose 7 and 9, or 2 and 10, but once we have made our choice, the third number must be the one that yields the desired mean. Therefore, a set of three numbers has two degrees of freedom.
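For ttest_ind() with equal variances, one mean is estimated from each sample, so the degrees of freedom are n1 + n2 - 2. The sketch below, reusing small samples from earlier in this tutorial, computes that value and checks that the two-sided p-value can be reproduced from the t-distribution with those degrees of freedom.

```python
import numpy as np
from scipy.stats import t, ttest_ind

sample_1 = np.array([2.4, 5.1, 2.6, 1.8])
sample_2 = np.array([1.4, 2.1, 5.6, 3.8])

# One mean is estimated per sample, so two degrees of freedom are used up.
df = sample_1.size + sample_2.size - 2

result = ttest_ind(sample_1, sample_2, equal_var=True)

# Two-sided p-value from the t-distribution with df degrees of freedom.
manual_p = 2 * t.sf(abs(result.statistic), df)

print(df)
print(manual_p, result.pvalue)
```

With equal_var=False (Welch’s t-test), the degrees of freedom are instead estimated from the sample variances and are generally not a whole number.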

So, in this tutorial, we have learned about the “Python Scipy ttest_ind” and covered the following topics.

What is a T-test in Statistics
Python Scipy ttest_ind
Python Scipy ttest_ind alternative
Python Scipy ttest_ind nan
Python Scipy ttest_ind output
Python Scipy ttest_ind equal_var
Python Scipy ttest_ind axis
Python Scipy ttest_ind statistic
Python Scipy ttest_ind degrees of freedom
