Mann-Whitney U test confidence interval in Python

Question

I have two datasets (pandas Series), ds1 and ds2, for which I want to calculate a 95% confidence interval for the difference in means (if normal) or medians (if non-normal).

For the difference in means, I calculate the t-test statistic and CI like this:

import statsmodels.api as sm
tstat, p_value, dof = sm.stats.ttest_ind(ds1, ds2)
CI = sm.stats.CompareMeans.from_data(ds1, ds2).tconfint_diff()

For the median, I do:

from scipy.stats import mannwhitneyu
U_stat, p_value = mannwhitneyu(ds1, ds2, use_continuity=True, alternative="two-sided")

How do I calculate the CI for the difference in medians?

asked Aug 14, 2018 at 16:07

I came across a paper ("Calculating confidence intervals for some non-parametric analyses" by Michael J. Campbell and Martin J. Gardner) that gives a CI formula.

Based on that:

from scipy.stats import norm

ct1 = ds1.count()  #items in dataset 1
ct2 = ds2.count()  #items in dataset 2
alpha = 0.05       #95% confidence interval
N = norm.ppf(1 - alpha/2) # percent point function - inverse of cdf

# The confidence interval for the difference between the two population
# medians is derived through these nxm differences.
diffs = sorted([i-j for i in ds1 for j in ds2])

# For an approximate 100(1-a)% confidence interval first calculate K:
k = int(round(ct1*ct2/2 - (N * (ct1*ct2*(ct1+ct2+1)/12)**0.5)))

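# The Kth smallest to the Kth largest of the n x m differences
# ct1 and ct2 should be > ~20
# Python lists are 0-indexed, so the Kth smallest value sits at index k-1
# and the Kth largest at index len(diffs)-k.
CI = (diffs[k-1], diffs[len(diffs)-k])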

answered Aug 15, 2018 at 15:27

VanTan

Complete python code with worked examples for Mann-Whitney U test and Wilcoxon signed-rank test.

Photo by Daiga Ellaby on Unsplash

“I think I need to visit the doctor”, said sleepy-eyed Janhvi.

“Why?”, I asked, still half-asleep, enjoying the lazy morning vibes.

“Feeling quite tired since the last few days, and my appetite’s gone. I better get diagnosed”, she answered, while pulling my blanket away and half-kicking me out of bed. That was my cue to get ready and drive her to the doctor — there went my lovely morning!

“O! But you look stunning in that white dress. I’m sure you’ll be okay in a couple of hours” — buttering always helps, I thought. The expression in her eyes while looking at me could have given the “dull” emoji a run for its money, so I cut my losses and got up. Time to go and visit our trusted doctor.

What Shalt Thou Choose — Love or Money?

“Your blood plasma count has dipped a bit. Working under stress and not eating right, are you Janhvi?”, said “Doctor Uncle” scoldingly.

“I can write you the prescription for your usual medicines, should be fine in a couple of weeks. There is however, this new brand which seems to be giving better results in the patients I have tried them on, in the past 6 months. I think you should try it. Results are evident in a week.”

“Same price?”, I enquired, albeit a bit too eagerly, I must confess.

“I thought asking about any new side effects would have been the first question. But yeah, it costs a bit more. Which one then?”, Doctor uncle quipped dryly without shifting his eyes from Janhvi.

The costly one it was then. My mind raced for the next 10 minutes to come up with a lengthy excuse to save face. In the car, driving back home, the data scientist in me kicked in. “You know, statisticians have a horde of tests to compare two categories or treatments. They just don’t go about saying that one is better than the other based on their gut feeling. With so few patients, it’s difficult to generalize findings of difference between approaches.”

Janhvi: “Mmm…hmmm”

Me: “I imagine in this case, one could very well document the plasma count for every patient undergoing treatment under our doctor, divide it into two groups based on the treatment they underwent and then use statistical tests to check if there’s really any difference.”

Janhvi: “Mmm…hmmm”

Me: “It’s your blood plasma count we are measuring, isn’t it? I think that does not follow a normal distribution. A Mann-Whitney U test should give us the answer.”

Janhvi: “Say what?! My blood plasma is not normal? Is that what you are saying?”

Me: “No, no! I said it does not follow a normal distribution. See that far-away mountain? Imagine that, just in 2D. A peak in the middle with gradual tapering to the left and right. Many of the things we measure in our universe tend to follow a normal distribution, like height, blood pressure, test scores or IQ. Without going into the maths, suppose I were to count all the people on this planet, or at least many of them, by height in the range of, say, 6 inches to 20 feet, and jot the counts down in a table in the following manner: count of the shortest on the left-most side → count of those 1 inch taller to its right → count of those 1 inch taller to its right → and so on, till the count of the tallest as the right-most entry. Then I will see an interesting pattern, observed in so many things we experience around us.”

Height & Count

Janhvi: “Which is?” Her interest grew.

Me: “A mountain, a Bell curve. If I go to our garden and start stacking bricks to create a new column for each entry I made in my table (each new column erected just to the right of the previous one), with the height of the brick column equal to the count I entered in my table, then you will have a mountain-shaped structure.”

Normal distribution

Janhvi: “Interesting. Don’t do that in our garden.”

Me: “Ok.”

“I think that’s one of the reasons why we are able to empathize with others in this world. Most of us go through the same things, at some point or another. We share a lot”, she said softly while looking away into the horizon.

I smiled. I had never thought of it that way. I guess she was right.

“And plasma counts don’t follow this normal distribution?”. She came back to this world as quickly as she had floated away from it.

Me: “No. Not all things do, and plasma counts across people do not, I think. But we can check that later with the doctor.”

Janhvi: “So why does that matter?”

Me: “To put it simply, the type of statistical test to be used depends on the type of data you deal with. In our case, we compare 2 categories of treatments based on the output data, plasma count which is continuous data but non-normal. Mann-Whitney U test is used for such cases.”

Janhvi: “And you will explain this to the doctor, ask him to do this comparison test and will then be able to tell for sure if the new treatment’s better than the old one?”

Me: “Well…… I will be able to say it with some level of confidence. I would be able to make a statement which sounds something like: we can be 95% confident that the difference in the median plasma counts between the treatments lies in the range between x and y. If this range includes zero, there’s a good chance both the treatments have pretty much the same effect on patients. If it does not, they are likely to be different.”

“So much work for that statement!”, she exclaimed.

“Anything for you dear! I would never let you take a new medicine without being reasonably confident about it myself”, I said, hoping to convey a glint of pride and care in my voice. “You may not realize it right now, but such tests are critical to how studies are conducted in the pharmaceutical or financial sectors, among others.”

Janhvi: “Yeah, right. And I suppose just by luck you had a chance to study about these tests in the past week and have found them quite interesting?”

Me: “Yes! Finding out the confidence intervals for some of these tests, called non-parametric tests, is not easy and no standard programming packages exist, at least not in python. So I did some digging around and created custom functions for those. I plan to write an article on it soon. How did you know?”

Janhvi: “I know you too well my dearest. We’re almost home. Tell you what, I feel like I need to sleep till late evening and I’ll be famished when I get up. How about you finish off your article in peace in the next few hours and then cook me a wonderful dinner, so that I totally forget about the ‘price’ question you asked Doctor uncle?”

“Sounds fair”, I said, grinning sheepishly.

Getting Our Hands Dirty with Code

So here it is, the step-by-step code.

The methodology behind the code has been referenced from the British Medical Journal, volume 296, 21 May 1988: “Calculating confidence intervals for some non-parametric analyses” by M J Campbell & M J Gardner.

You can download my complete code either from Colab or GitHub.

For Unpaired Samples

non_param_unpaired_CI python function

Let’s go through the example listed in the paper itself, so that we can compare results:

Unpaired dataset

n = size of Group 1 (Sample 1) = 10

m = size of Group 2 (Sample 2) = 10

Sort the values of both samples and put them as column headers and row indices respectively. We need to calculate the difference between each observation in Sample 1 and every observation in Sample 2, giving us n x m differences. Here’s how it looks:

n x m differences

In our python function, non_param_unpaired_CI( ), this is achieved by the following line of code:

n x m differences in python
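The original snippet was embedded as an image and is not reproduced here. A minimal sketch of this step, assuming NumPy and a pair of made-up samples (these are not the values from the paper), could look like this:

```python
import numpy as np

# Hypothetical illustrative values, NOT the data from the paper.
sample_1 = np.array([3.0, 5.0, 8.0])
sample_2 = np.array([4.0, 6.0])

# Row i, column j holds sample_1[i] - sample_2[j], i.e. the n x m table.
diff_table = np.subtract.outer(sample_1, sample_2)

# Flatten and sort so the Kth smallest / largest can be read off by position.
diffs = np.sort(diff_table.ravel())

print(diff_table)
print(diffs)
```

For large samples the table holds n x m values, so memory grows quickly; for the sample sizes discussed here that is not a concern.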

The estimate of the difference in population medians or means is now given by the median of these differences. From the 100 differences in the table, the 50th smallest difference is -6 g/l and the 51st is -5 g/l, so the median difference is estimated as (-6 + (-5))/2 = -5.5 g/l. This can be calculated in Python by:

median of the differences
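Again, the original line of code was an image. One way to sketch it, assuming NumPy and hypothetical sample values, is to take the median of all pairwise differences. With an even number of differences, np.median averages the two middle values, which is exactly the (-6 + (-5))/2 arithmetic in the worked example:

```python
import numpy as np

# Hypothetical illustrative values, NOT the data from the paper.
sample_1 = np.array([3.0, 5.0, 8.0])
sample_2 = np.array([4.0, 6.0])

diffs = np.subtract.outer(sample_1, sample_2).ravel()

# Point estimate of the shift between the two groups.
point_estimate = np.median(diffs)

# The arithmetic from the worked example: mean of the 50th and 51st values.
worked_example_median = np.mean([-6.0, -5.0])

print(point_estimate, worked_example_median)
```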

We need the Kth smallest and Kth largest of these differences to find the limits of the interval. K is given as:

K for unpaired samples: K = nm/2 - N * sqrt(nm(n + m + 1) / 12), rounded to the nearest integer

Here, N is simply the 100(1 - alpha/2) percentile of the standard normal distribution for a given confidence level. If our confidence level is 0.95, then alpha is (1 - 0.95) = 0.05 and N is about 1.96. We can use this in Python to calculate N as:

N calculated

and hence K as :

K calculated

This comes out to be 24. The 24th smallest difference is -10 g/l and the 24th largest is +1 g/l. Hence the 95% confidence interval for the difference in population medians is from -10 g/l to +1 g/l.
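To tie the steps together, here is a rough sketch of the whole unpaired calculation. It assumes NumPy and SciPy, and the function names unpaired_k and unpaired_ci are my own placeholders for illustration, not the article's non_param_unpaired_CI function. Treat it as one possible reading of the method rather than a reference implementation.

```python
import numpy as np
from scipy.stats import norm


def unpaired_k(n, m, confidence=0.95):
    """Approximate K for two independent samples of sizes n and m."""
    alpha = 1 - confidence
    big_n = norm.ppf(1 - alpha / 2)  # standard normal percentile, ~1.96 for 95%
    return int(round(n * m / 2 - big_n * np.sqrt(n * m * (n + m + 1) / 12)))


def unpaired_ci(sample_1, sample_2, confidence=0.95):
    """Return the Kth smallest and Kth largest of the n x m differences."""
    x = np.asarray(sample_1, dtype=float)
    y = np.asarray(sample_2, dtype=float)
    diffs = np.sort(np.subtract.outer(x, y).ravel())
    k = unpaired_k(len(x), len(y), confidence)
    if k < 1:
        raise ValueError("samples are too small for this approximation")
    # The text counts positions from 1, so the Kth smallest is index k - 1.
    return diffs[k - 1], diffs[len(diffs) - k]


print(unpaired_k(10, 10))  # 24, matching the worked example
```

Calling unpaired_ci(ds1, ds2) on two pandas Series or lists returns the lower and upper limits as a tuple. Missing values are not handled in this sketch, so drop them first.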

For Paired Samples

This can be used alongside the Wilcoxon signed-rank test.

non_param_paired_CI python function

Worked example — paired samples

Notice that the very first step here is to calculate the difference between the paired observations (after minus before).

n = size of each Sample = 11

Let’s now calculate our N and K, with confidence level 0.95 and alpha = (1–0.95) = 0.05 :

N calculated

K formula: K = n(n + 1)/4 - N * sqrt(n(n + 1)(2n + 1) / 24), rounded to the nearest integer

K calculated

K comes out to be 11.
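The N and K snippets were images in the original. A small sketch of the calculation, assuming SciPy and the paired-sample formula as I understand it from the Campbell and Gardner paper, K = n(n + 1)/4 - N * sqrt(n(n + 1)(2n + 1)/24), is shown here. The name paired_k is a placeholder of mine.

```python
from math import sqrt

from scipy.stats import norm


def paired_k(n, confidence=0.95):
    """Approximate K for n paired differences."""
    alpha = 1 - confidence
    big_n = norm.ppf(1 - alpha / 2)  # standard normal percentile, ~1.96 for 95%
    return int(round(n * (n + 1) / 4 - big_n * sqrt(n * (n + 1) * (2 * n + 1) / 24)))


print(paired_k(11))  # 11, matching the worked example
```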

We’ll now create an n x n table holding the average of every pair of differences. Since the column headers and row indices are the same, the table is symmetric and we only need n(n+1)/2 of its values:

Average of differences

In python :

Average of differences in python

The 11th smallest and the 11th largest averages are 11.9 and 25.1, which therefore form the limits of our 95% confidence interval.
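As a hedged sketch of the averaging step (the original code was an image), the n(n+1)/2 pairwise averages can be built with NumPy's upper-triangle indices. The function names and the tiny input are hypothetical and are not the paper's data.

```python
import numpy as np


def pairwise_averages(differences):
    """Sorted averages of every pair (i <= j) of differences, each value paired with itself too."""
    d = np.asarray(differences, dtype=float)
    i, j = np.triu_indices(len(d))  # upper triangle, diagonal included
    return np.sort((d[i] + d[j]) / 2)


def paired_ci_from_k(differences, k):
    """Kth smallest and Kth largest of the pairwise averages (k counted from 1)."""
    averages = pairwise_averages(differences)
    return averages[k - 1], averages[len(averages) - k]


# Hypothetical after-minus-before differences, for illustration only.
example = [1.0, 3.0, 5.0]
print(pairwise_averages(example))    # six averages for n = 3
print(paired_ci_from_k(example, 2))  # 2nd smallest and 2nd largest
```

For the worked example with n = 11, pairwise_averages would return 66 values and k would be 11.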

Photo by Lana Abie on Unsplash

Phew! Just in time! Hope this helps someone out there. Do let me know if it does. Now off I go, got to make a spectacular dinner :)

Does Mann Whitney U have confidence interval?

The Mann-Whitney test is a commonly used non-parametric alternative to the two-sample t-test. Despite its frequent use, it is only rarely accompanied by confidence intervals for an effect size.

How do you do a Mann

A Mann-Whitney U test is used to compare two samples when the sample distributions are not normally distributed and the sample sizes are small (n < 30).

Step 1: Create the data. ...

Step 2: Conduct a Mann-Whitney U Test. ...

Step 3: Interpret the results.
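The three steps can be sketched as follows, assuming SciPy's mannwhitneyu and two small made-up groups. The data and the 0.05 threshold are illustrative choices of mine, not values from the source.

```python
import numpy as np
from scipy.stats import mannwhitneyu

# Step 1: create the data (hypothetical values, for illustration only).
group_a = np.array([1.1, 2.3, 2.9, 3.4, 4.0, 4.8])
group_b = np.array([5.2, 6.1, 6.7, 7.5, 8.3, 9.0])

# Step 2: conduct a two-sided Mann-Whitney U test.
result = mannwhitneyu(group_a, group_b, alternative="two-sided")

# Step 3: interpret the result against a chosen significance level.
alpha = 0.05
print(result.statistic, result.pvalue)
if result.pvalue < alpha:
    print("Reject the null hypothesis: the two groups appear to differ.")
else:
    print("Fail to reject the null hypothesis.")
```

Here every value in group_a is smaller than every value in group_b, so the U statistic is 0 and the p-value falls below 0.05.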

How do you find the confidence interval in Python?

Confidence interval calculator in Python.

import numpy as np ...

x = np.random.normal(size=100) ...

m = x.mean() ...

t_crit = np.abs(t.ppf((1-confidence)/2,dof)) ...

(m-s*t_crit/np.sqrt(len(x)), m+s*t_crit/np.sqrt(len(x))) # (-0.14017768797464097, 0.259793719043611)
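The fragment above skips several lines, so here is one way to make it runnable. The missing pieces are filled with assumptions of mine: t is taken from scipy.stats, s is the sample standard deviation, dof is len(x) - 1, and confidence is 0.95. A seeded generator is used so the run is repeatable, which means the numbers will not match the ones quoted above.

```python
import numpy as np
from scipy.stats import t

rng = np.random.default_rng(0)  # seeded for repeatability (my choice)
x = rng.normal(size=100)

confidence = 0.95               # assumed confidence level
m = x.mean()
s = x.std(ddof=1)               # sample standard deviation (assumed meaning of s)
dof = len(x) - 1                # degrees of freedom (assumed meaning of dof)

# Two-sided critical value of the t distribution.
t_crit = np.abs(t.ppf((1 - confidence) / 2, dof))

half_width = s * t_crit / np.sqrt(len(x))
ci = (m - half_width, m + half_width)
print(ci)
```

This interval is for a mean under a normality assumption, so it complements the rank-based intervals discussed earlier; it does not replace them.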

What is p

The p-value represents the probability of getting a test statistic at least as extreme as the one you had in your sample, if the null hypothesis were true.
