Two-Sample t-Test in Python

A two sample t-test is used to test whether or not the means of two populations are equal.


This tutorial explains how to conduct a two sample t-test in Python.

Researchers want to know whether or not two different species of plants have the same mean height. To test this, they collect a simple random sample of 20 plants from each species.

Use the following steps to conduct a two sample t-test to determine if the two species of plants have the same height.

Step 1: Create the data.

First, we’ll create two arrays to hold the measurements of each group of 20 plants:

import numpy as np

group1 = np.array([14, 15, 15, 16, 13, 8, 14, 17, 16, 14, 19, 20, 21, 15, 15, 16, 16, 13, 14, 12])
group2 = np.array([15, 17, 14, 17, 14, 8, 12, 19, 19, 14, 17, 22, 24, 16, 13, 16, 13, 18, 15, 13])

Step 2: Conduct a two sample t-test.

Next, we’ll use the ttest_ind() function from the scipy.stats library to conduct a two sample t-test, which uses the following syntax:

ttest_ind(a, b, equal_var=True)

where:

a: an array of sample observations for group 1
b: an array of sample observations for group 2
equal_var: if True, perform a standard independent 2 sample t-test that assumes equal population variances. If False, perform Welch’s t-test, which does not assume equal population variances. This is True by default.

Before we perform the test, we need to decide if we’ll assume the two populations have equal variances or not. As a rule of thumb, we can assume the populations have equal variances if the ratio of the larger sample variance to the smaller sample variance is less than 4:1.

#find variance for each group (np.var defaults to ddof=0)
print(np.var(group1), np.var(group2))

7.73 12.26

The ratio of the larger sample variance to the smaller sample variance is 12.26 / 7.73 = 1.586, which is less than 4. This means we can assume that the population variances are equal.
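One detail worth noting: np.var() divides by n by default (ddof=0), whereas the usual sample variance divides by n - 1 (ddof=1). With equal group sizes the ratio is the same either way. The rule-of-thumb check could be sketched as follows; the helper name variance_ratio is hypothetical and used only for illustration.

```python
import numpy as np

group1 = np.array([14, 15, 15, 16, 13, 8, 14, 17, 16, 14,
                   19, 20, 21, 15, 15, 16, 16, 13, 14, 12])
group2 = np.array([15, 17, 14, 17, 14, 8, 12, 19, 19, 14,
                   17, 22, 24, 16, 13, 16, 13, 18, 15, 13])


def variance_ratio(a, b):
    """Ratio of the larger sample variance to the smaller one (ddof=1)."""
    var_a = np.var(a, ddof=1)
    var_b = np.var(b, ddof=1)
    return max(var_a, var_b) / min(var_a, var_b)


ratio = variance_ratio(group1, group2)

# Rule of thumb from the text: treat variances as equal if the ratio is below 4
assume_equal_var = bool(ratio < 4)
print(ratio, assume_equal_var)
```

The resulting boolean can be passed directly as the equal_var argument of ttest_ind().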

Thus, we can proceed to perform the two sample t-test with equal variances:

import scipy.stats as stats

#perform two sample t-test with equal variances
stats.ttest_ind(a=group1, b=group2, equal_var=True)

(statistic=-0.6337, pvalue=0.53005)

The t test statistic is -0.6337 and the corresponding two-sided p-value is 0.53005.
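To see where these numbers come from, the pooled-variance t statistic can be computed by hand and compared with the SciPy result. This is a minimal sketch of the textbook formula, not a description of SciPy's internals; the variable names are illustrative.

```python
import numpy as np
from scipy import stats

group1 = np.array([14, 15, 15, 16, 13, 8, 14, 17, 16, 14,
                   19, 20, 21, 15, 15, 16, 16, 13, 14, 12])
group2 = np.array([15, 17, 14, 17, 14, 8, 12, 19, 19, 14,
                   17, 22, 24, 16, 13, 16, 13, 18, 15, 13])

n1, n2 = len(group1), len(group2)

# Sample variances (ddof=1) and the pooled variance
s1_sq = np.var(group1, ddof=1)
s2_sq = np.var(group2, ddof=1)
pooled_var = ((n1 - 1) * s1_sq + (n2 - 1) * s2_sq) / (n1 + n2 - 2)

# Standard error of the difference in means, then the t statistic
std_err = np.sqrt(pooled_var * (1 / n1 + 1 / n2))
t_manual = (group1.mean() - group2.mean()) / std_err

# Two-sided p-value from the t distribution with n1 + n2 - 2 degrees of freedom
df = n1 + n2 - 2
p_manual = 2 * stats.t.sf(abs(t_manual), df)

# Compare with SciPy's equal-variance test
result = stats.ttest_ind(group1, group2, equal_var=True)
print(t_manual, p_manual)
print(result.statistic, result.pvalue)
```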

Step 3: Interpret the results.

The two hypotheses for this particular two sample t-test are as follows:

H0: µ1 = µ2 (the two population means are equal)

HA: µ1 ≠ µ2 (the two population means are not equal)

Because the p-value of our test (0.53005) is greater than alpha = 0.05, we fail to reject the null hypothesis of the test. We do not have sufficient evidence to say that the mean height of plants between the two populations is different.
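The decision rule can also be written in code. The sketch below assumes a significance level of 0.05, as in the text; the function name interpret is hypothetical.

```python
import numpy as np
from scipy import stats

group1 = np.array([14, 15, 15, 16, 13, 8, 14, 17, 16, 14,
                   19, 20, 21, 15, 15, 16, 16, 13, 14, 12])
group2 = np.array([15, 17, 14, 17, 14, 8, 12, 19, 19, 14,
                   17, 22, 24, 16, 13, 16, 13, 18, 15, 13])


def interpret(p_value, alpha=0.05):
    """Return the test decision for a given p-value and significance level."""
    if p_value < alpha:
        return "reject H0"
    return "fail to reject H0"


result = stats.ttest_ind(group1, group2, equal_var=True)
decision = interpret(result.pvalue)
print(decision)
```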


In this article, we will see how to conduct a two-sample t-test in Python.

This test is also known as the independent samples t-test. It is used to check whether the unknown population means of two groups are equal, that is, it tests the null hypothesis that the means of the two groups are equal.

Assumptions

Before conducting the two-sample t-test in Python, let us discuss the assumptions of this parametric test. There are three assumptions about the data groups:

The two samples are independent.
The data in each group follow a normal distribution.
The two samples have similar variances. This assumption is also known as the homogeneity assumption.

Note that the analysis need not stop if our data groups do not satisfy the three assumptions above. An alternative test can be used if the data are not normally distributed, or we can transform the dependent variable using techniques such as the square root or the log.
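One possible way to check these assumptions in code is sketched below. The specific tests are assumptions of this example and are not named in the article: Shapiro-Wilk for normality, Levene for equal variances, and the Mann-Whitney U test as a non-parametric alternative. All three are available in scipy.stats.

```python
import numpy as np
from scipy import stats

data_group1 = np.array([14, 15, 15, 16, 13, 8, 14, 17, 16, 14,
                        19, 20, 21, 15, 15, 16, 16, 13, 14, 12])
data_group2 = np.array([15, 17, 14, 17, 14, 8, 12, 19, 19, 14,
                        17, 22, 24, 16, 13, 16, 13, 18, 15, 13])

# Normality: Shapiro-Wilk test on each group (small p suggests non-normal data)
_, p_normal1 = stats.shapiro(data_group1)
_, p_normal2 = stats.shapiro(data_group2)

# Homogeneity of variances: Levene's test (small p suggests unequal variances)
_, p_levene = stats.levene(data_group1, data_group2)

# Non-parametric alternative if normality looks doubtful
_, p_mwu = stats.mannwhitneyu(data_group1, data_group2,
                              alternative="two-sided")

print("Shapiro p-values:", p_normal1, p_normal2)
print("Levene p-value:", p_levene)
print("Mann-Whitney U p-value:", p_mwu)
```

Independence of the samples cannot be tested this way; it follows from how the data were collected.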

Two sample T-Test in Python

Let us consider an example. We are given two samples, each containing the heights of students from one class (the arrays in Method 1 below hold 20 values each). We need to check whether the students of the two classes have the same mean height. There are three ways to conduct a two-sample t-test in Python.

Method 1: Using Scipy library

SciPy stands for scientific Python; as the name implies, it is a scientific Python library, and it uses NumPy under the hood. It provides a variety of functions that are useful in data science. First we create the sample data, and then we perform the two-sample t-test using the ttest_ind() function from scipy.stats.

Syntax: ttest_ind(data_group1, data_group2, equal_var=True/False)
Here,
data_group1: First data group
data_group2: Second data group
equal_var=True: The standard independent two-sample t-test is conducted, which assumes equal population variances.
equal_var=False: Welch's t-test is conducted, which does not assume equal population variances.

Note that equal_var is True by default.
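The effect of the equal_var flag can be seen by running both variants on the same data. This is a small sketch using the arrays from this section. With equal group sizes the two t statistics coincide, and only the degrees of freedom (and therefore the p-value) differ.

```python
import numpy as np
from scipy import stats

data_group1 = np.array([14, 15, 15, 16, 13, 8, 14, 17, 16, 14,
                        19, 20, 21, 15, 15, 16, 16, 13, 14, 12])
data_group2 = np.array([15, 17, 14, 17, 14, 8, 12, 19, 19, 14,
                        17, 22, 24, 16, 13, 16, 13, 18, 15, 13])

# Standard (pooled-variance) two-sample t-test
pooled = stats.ttest_ind(data_group1, data_group2, equal_var=True)

# Welch's t-test, which does not assume equal population variances
welch = stats.ttest_ind(data_group1, data_group2, equal_var=False)

print("pooled:", pooled.statistic, pooled.pvalue)
print("welch: ", welch.statistic, welch.pvalue)
```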

Before conducting the two-sample t-test, we need to check whether the given data groups have similar variances. If the ratio of the larger variance to the smaller variance is less than 4:1, we can treat the data groups as having equal variance. To find the variance of a data group, we can use the syntax below.

Syntax: print(np.var(data_group))
Here,
data_group: The given data group

Python3

import numpy as np

# Heights for the two groups (20 observations each)
data_group1 = np.array([14, 15, 15, 16, 13, 8, 14,
                        17, 16, 14, 19, 20, 21, 15,
                        15, 16, 16, 13, 14, 12])

data_group2 = np.array([15, 17, 14, 17, 14, 8, 12,
                        19, 19, 14, 17, 22, 24, 16,
                        13, 16, 13, 18, 15, 13])

# Print the variance of each group
print(np.var(data_group1), np.var(data_group2))

Output (rounded):

7.7275 12.26

Here, the ratio is 12.26 / 7.7275 ≈ 1.59, which is less than 4:1, so we can treat the variances as equal.

Performing Two-Sample T-Test

Python3

import numpy as np
import scipy.stats as stats

# Heights for the two groups (20 observations each)
data_group1 = np.array([14, 15, 15, 16, 13, 8, 14,
                        17, 16, 14, 19, 20, 21, 15,
                        15, 16, 16, 13, 14, 12])

data_group2 = np.array([15, 17, 14, 17, 14, 8, 12,
                        19, 19, 14, 17, 22, 24, 16,
                        13, 16, 13, 18, 15, 13])

# Two-sample t-test assuming equal variances
stats.ttest_ind(a=data_group1, b=data_group2, equal_var=True)

Output (rounded):

(statistic=-0.6337, pvalue=0.53005)

Analyzing the result:

The two-sample t-test has the following hypotheses:

H0 => µ1 = µ2 (the population mean of dataset1 is equal to that of dataset2)
HA => µ1 ≠ µ2 (the population mean of dataset1 is different from that of dataset2)

Here, since the p-value (about 0.530) is greater than alpha = 0.05, we cannot reject the null hypothesis of the test. We do not have sufficient evidence to say that the mean height of students differs between the two data groups.

Method 2: Two-Sample T-Test with Pingouin

Pingouin is a statistical package based on Pandas and NumPy. It provides a wide range of features: the package can be used not only to conduct the t-test but also to compute the degrees of freedom, the Bayes factor, etc.

First we create the sample data as two arrays, and then we perform the two-sample t-test using the ttest() function from the pingouin package. The syntax is given below.

Syntax: ttest(data_group1, data_group2, correction=True/False)
Here,
data_group1: First data group
data_group2: Second data group
correction=True: Welch's t-test is conducted, which corrects for unequal variances and does not rely on the homogeneity assumption.
correction=False: The standard independent two-sample t-test is conducted, which relies on the homogeneity assumption.

Note that by default correction is 'auto', which applies the Welch correction when the sample sizes are unequal.

Example:

Python3

import numpy as np
import pingouin as pg

# Heights for the two classes (10 observations each)
data_group1 = np.array([160, 150, 160, 156.12, 163.24,
                        160.56, 168.56, 174.12,
                        167.123, 165.12])

data_group2 = np.array([157.97, 146, 140.2, 170.15,
                        167.34, 176.123, 162.35, 159.123,
                        169.43, 148.123])

# correction=True applies Welch's correction for unequal variances
result = pg.ttest(data_group1, data_group2, correction=True)

print(result)

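Output:

A one-row table (a pandas DataFrame) whose columns include the t statistic, degrees of freedom, p-value, Cohen's d, and Bayes factor.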

Interpreting the result

Now we analyze the result. The p-value of the test is 0.523, which is greater than the significance level alpha (that is, 0.05). This means we fail to reject the null hypothesis: we do not have sufficient evidence to say that the average height of students in one class differs from that in the other class. The output also reports Cohen's d, which measures the relative strength (effect size) of the difference. According to Cohen:

cohen-d = 0.2 is considered as the ‘small’ effect size
cohen-d = 0.5 is considered as the ‘medium’ effect size
cohen-d = 0.8 is considered as the ‘large’ effect size

This implies that if the means of the two data groups differ by less than 0.2 standard deviations, the difference is trivial, even if it is statistically significant.
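For reference, Cohen's d can be computed by hand as the difference in means divided by the pooled standard deviation. The sketch below uses that common definition; whether it reproduces Pingouin's reported value to the last digit is an assumption, and the function name cohen_d is hypothetical.

```python
import numpy as np


def cohen_d(a, b):
    """Cohen's d: mean difference divided by the pooled standard deviation."""
    n_a, n_b = len(a), len(b)
    var_a = np.var(a, ddof=1)
    var_b = np.var(b, ddof=1)
    pooled_sd = np.sqrt(((n_a - 1) * var_a + (n_b - 1) * var_b)
                        / (n_a + n_b - 2))
    return (np.mean(a) - np.mean(b)) / pooled_sd


data_group1 = np.array([160, 150, 160, 156.12, 163.24,
                        160.56, 168.56, 174.12, 167.123, 165.12])
data_group2 = np.array([157.97, 146, 140.2, 170.15, 167.34,
                        176.123, 162.35, 159.123, 169.43, 148.123])

d = cohen_d(data_group1, data_group2)
print(d)
```

The absolute value of d is what gets compared against the 0.2, 0.5 and 0.8 benchmarks; the sign only indicates which group has the larger mean.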

Method 3: Two-Sample T-Test with Statsmodels

Statsmodels is a Python library used to estimate statistical models and to conduct statistical tests. It supports R-style formulas and pandas DataFrames.

First we create the sample data as two arrays, and then we perform the two-sample t-test. The Statsmodels library provides the ttest_ind() function for this, whose syntax is given below.

Syntax: ttest_ind(data_group1, data_group2)
Here,
data_group1: First data group
data_group2: Second data group

Example:

Python3

import numpy as np
from statsmodels.stats.weightstats import ttest_ind

# Heights for the two classes (10 observations each)
data_group1 = np.array([160, 150, 160, 156.12, 163.24,
                        160.56, 168.56, 174.12,
                        167.123, 165.12])

data_group2 = np.array([157.97, 146, 140.2, 170.15,
                        167.34, 176.123, 162.35,
                        159.123, 169.43, 148.123])

# Returns the t statistic, the p-value, and the degrees of freedom
t_stat, p_value, df = ttest_ind(data_group1, data_group2)
print(t_stat, p_value, df)

Output:

The t statistic, the p-value, and the degrees of freedom.

Interpreting the result:

Now we analyze the result. The p-value of the test is 0.521, which is greater than the significance level alpha (that is, 0.05). This means we fail to reject the null hypothesis: we do not have sufficient evidence to say that the average height of students in one class differs from that in the other class.
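The small gap between the p-values reported in Method 2 (0.523) and Method 3 (0.521) can be cross-checked with SciPy alone. The sketch below assumes that the Statsmodels call above uses the pooled-variance test and that Pingouin's correction=True corresponds to Welch's test; on the same data, SciPy's two variants should then land close to those two figures.

```python
import numpy as np
from scipy import stats

data_group1 = np.array([160, 150, 160, 156.12, 163.24,
                        160.56, 168.56, 174.12, 167.123, 165.12])
data_group2 = np.array([157.97, 146, 140.2, 170.15, 167.34,
                        176.123, 162.35, 159.123, 169.43, 148.123])

# Pooled-variance test (assumed to match the Statsmodels result)
pooled = stats.ttest_ind(data_group1, data_group2, equal_var=True)

# Welch's test (assumed to match the Pingouin result with correction=True)
welch = stats.ttest_ind(data_group1, data_group2, equal_var=False)

print("pooled p-value:", round(pooled.pvalue, 3))
print("Welch p-value: ", round(welch.pvalue, 3))
```

Either way the conclusion is the same: both p-values are well above 0.05.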

What is ttest_ind() in Python?

scipy.stats.ttest_ind() calculates the t-test for the means of two independent samples of scores. This is a two-sided test for the null hypothesis that two independent samples have identical average (expected) values. By default, this test assumes that the populations have identical variances.