Mean, Median, and Mode

What can we learn from looking at a group of numbers? In Machine Learning (and in mathematics) there are often three values that interest us:

- Mean: the average value
- Median: the middle value
- Mode: the most common value

Example: We have registered the speed of 13 cars:

    speed = [99,86,87,88,111,86,103,87,94,78,77,85,86]

What is the average, the middle, or the most common speed value?

Mean

The mean value is the average value. To calculate the mean, find the sum of all values, and divide the sum by the number of values:

    (99+86+87+88+111+86+103+87+94+78+77+85+86) / 13 = 89.77
The NumPy module has a method for this. Learn about the NumPy module in our NumPy Tutorial.

Example

Use the NumPy mean() method to find the average speed:

    import numpy

    speed = [99,86,87,88,111,86,103,87,94,78,77,85,86]

    x = numpy.mean(speed)

    print(x)

Median

The median value is the value in the middle, after you have sorted all the values:

    77, 78, 85, 86, 86, 86, 87, 87, 88, 94, 99, 103, 111

The value in the middle is 87. It is important that the numbers are sorted before you can find the median. The NumPy module has a method for this:
Example

Use the NumPy median() method to find the middle value:

    import numpy

    speed = [99,86,87,88,111,86,103,87,94,78,77,85,86]

    x = numpy.median(speed)

    print(x)

If there are two numbers in the middle, divide the sum of those numbers by two.

    77, 78, 85, 86, 86, 86, 87, 87, 88, 94, 99, 103

    (86 + 87) / 2 = 86.5

Example

Using the NumPy module:

    import numpy

    speed = [99,86,87,88,86,103,87,94,78,77,85,86]

    x = numpy.median(speed)

    print(x)

Mode

The mode value is the value that appears the most number of times. In the 13 speeds registered above, 86 appears three times, more often than any other value.
The SciPy module has a method for this. Learn about the SciPy module in our SciPy Tutorial.

Example

Use the SciPy mode() method to find the number that appears the most:

    from scipy import stats

    speed = [99,86,87,88,111,86,103,87,94,78,77,85,86]

    x = stats.mode(speed)

    print(x)

Chapter Summary

The Mean, Median, and Mode are techniques that are often used in Machine Learning, so it is important to understand the concept behind them.

Introduction

When we're trying to describe and summarize a sample of data, we probably start by finding the mean (or average), the median, and the mode of the data. These are central tendency measures and are often our first look at a dataset. In this tutorial, we'll learn how to find or compute the mean, the median, and the mode in Python. We'll first code a Python function for each measure, and then use Python's statistics module to find the same measures more quickly.
With this knowledge, we'll be able to take a quick look at our datasets and get an idea of the general tendency of the data.

Calculating the Mean of a Sample

If we have a sample of numeric values, then its mean, or average, is the total sum of the values (or observations) divided by the number of values. For a sample of n values x1, x2, ..., xn, the mean is (x1 + x2 + ... + xn) / n.
The mean (arithmetic mean) is a general description of our data. Suppose you buy 10 pounds of tomatoes. When you count the tomatoes at home, you get 25 tomatoes. In this case, you can say that the average weight of a tomato is 0.4 pounds. That would be a good description of your tomatoes.

The mean can also be a poor description of a sample of data. Say you're analyzing a group of dogs. If you take the cumulative weight of all dogs and divide it by the number of dogs, then the result would probably be a poor description of the weight of an individual dog, as different breeds of dogs can have vastly different sizes and weights.

How well the mean describes a sample depends on how spread out the data is. In the case of tomatoes, they're almost the same weight each and the mean is a good description of them. In the case of dogs, there is no typical dog. They can range from a tiny Chihuahua to a giant German Mastiff. So, the mean by itself isn't a good description in this case.

Now it's time to get into action and learn how we can calculate the mean using Python.

Calculating the Mean With Python

To calculate the mean of a sample of numeric data, we'll use two of Python's built-in functions: one to calculate the total sum of the values and another to calculate the length of the sample. The first function is sum(), which returns the total of the values in an iterable. The second function is len(), which returns the number of items in a sequence.
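The calculation described above can be sketched as follows. The function name my_mean is hypothetical, and the sample is borrowed from the car speed example earlier on this page; the sketch assumes a non-empty sample.

```python
def my_mean(sample):
    """Return the arithmetic mean of a non-empty sequence of numbers."""
    # sum() adds up the values, len() counts them
    return sum(sample) / len(sample)


# Sample borrowed from the car speed example (13 values, total 1167)
speeds = [99, 86, 87, 88, 111, 86, 103, 87, 94, 78, 77, 85, 86]
print(round(my_mean(speeds), 2))  # 89.77
```

An empty sample would make len() return 0 and raise ZeroDivisionError, so a real implementation would likely want to check for that case first.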
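We first sum the values in the sample using sum(), and then divide that sum by the length of the sample, which is the value returned by len().

Using Python's mean()

Since calculating the mean is a common operation, Python includes this functionality in the statistics module. The statistics.mean() function takes a sample of numeric data and returns its mean.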
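As a brief illustration of the standard library route, the sketch below calls statistics.mean() on the car speed sample used earlier on this page. The variable names are illustrative only.

```python
import statistics

# Sample borrowed from the car speed example
speeds = [99, 86, 87, 88, 111, 86, 103, 87, 94, 78, 77, 85, 86]

result = statistics.mean(speeds)
print(round(result, 2))  # 89.77
```

Unlike the hand-written version, statistics.mean() raises statistics.StatisticsError when the sample is empty.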
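We just need to import the statistics module and then call mean() with our sample as an argument.

Finding the Median of a Sample

The median of a sample of numeric data is the value that lies in the middle when we sort the data. The data may be sorted in ascending or descending order; the median remains the same. To find the median, we need to:

1. Sort the sample
2. Locate the value in the middle of the sorted sample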
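When locating the number in the middle of a sorted sample, we can face two kinds of situations:

1. If the sample has an odd number of observations, then the middle value of the sorted sample is the median
2. If the sample has an even number of observations, then the median is the mean of the two middle values of the sorted sample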
Let's take a look at how we can use Python to calculate the median.

Finding the Median With Python

To find the median, we first need to sort the values in our sample. We can achieve that using the built-in sorted() function.

The second step is to locate the value that lies in the middle of the sorted sample. To locate that value in a sample with an odd number of observations, we can divide the number of observations by 2. The result will be the index of the value in the middle of the sorted sample. Since the division operator (/) returns a float, we need to use the floor division operator (//) to get an integer index.

If the sample has an even number of observations, then we need to locate the two middle values. For a sample of length 6, floor division by 2 gives 3, which is the index of the upper-middle value. Subtracting 1 gives 2, which is the index of the lower-middle value.

Let's put all of this together in a function that calculates the median of a sample.
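One possible implementation of the steps above is sketched here. The names my_median, ordered, n, and index are hypothetical, the two samples are borrowed from the car speed examples earlier on this page, and the sketch assumes a non-empty sample.

```python
def my_median(sample):
    """Return the median of a non-empty sequence of numbers."""
    ordered = sorted(sample)   # step 1: sort the sample
    n = len(ordered)
    index = n // 2             # middle (or upper-middle) index
    if n % 2:
        # Odd number of observations: a single middle value
        return ordered[index]
    # Even number of observations: mean of the two middle values.
    # The slice stops before index + 1, so it yields exactly two items.
    return sum(ordered[index - 1:index + 1]) / 2


odd_sample = [99, 86, 87, 88, 111, 86, 103, 87, 94, 78, 77, 85, 86]
even_sample = [99, 86, 87, 88, 86, 103, 87, 94, 78, 77, 85, 86]
print(my_median(odd_sample))   # 87
print(my_median(even_sample))  # 86.5
```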
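This function takes a sample of numeric values and returns its median. We first find the length of the sample, and then compute the index of the middle (or upper-middle) value with a floor division of that length by 2.

The if statement checks whether the sample has an odd number of observations. If so, the median is the value at the middle index.

The final return statement runs if the sample has an even number of observations. In that case, the median is the mean of the two middle values.

Note that the slicing operation that selects the two middle values runs from the lower-middle index up to one past the upper-middle index, because a slice excludes the value at its final index.

Using Python's median()

Python's statistics.median() takes a sample of data and returns its median.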
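A short sketch of statistics.median() in use, again with the car speed samples from earlier on this page. The variable names are illustrative only.

```python
import statistics

odd_sample = [99, 86, 87, 88, 111, 86, 103, 87, 94, 78, 77, 85, 86]
even_sample = [99, 86, 87, 88, 86, 103, 87, 94, 78, 77, 85, 86]

odd_median = statistics.median(odd_sample)
even_median = statistics.median(even_sample)
print(odd_median)   # 87
print(even_median)  # 86.5
```

The input does not need to be sorted beforehand; the function sorts a copy of the data internally.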
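Note that median() automatically handles samples with either an odd or an even number of observations.

Finding the Mode of a Sample

The mode is the most frequent observation (or observations) in a sample.

The mode doesn't have to be unique. Some samples have more than one mode: when two or more observations share the highest number of occurrences, each of them is a mode.

The mode is commonly used for categorical data. Common categorical data types are:

- boolean: can take only two values, such as true or false
- nominal: can take more than two values, with no inherent order
- ordinal: can take more than two values, and the values have a logical order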
When we're analyzing a dataset of categorical data, we can use the mode to know which category is the most common in our data.

We can find samples that don't have a mode. If all the observations are unique (there aren't repeated observations), then your sample won't have a mode.

Now that we know the basics about mode, let's take a look at how we can find it using Python.

Finding the Mode with Python

To find the mode with Python, we'll start by counting the number of occurrences of each value in the sample at hand. Then, we'll get the value(s) with the highest number of occurrences.

Since counting objects is a common operation, Python provides the collections.Counter class, which is designed for counting hashable objects.

The Counter class provides the .most_common([n]) method. It returns a list of (element, count) tuples for the n most common elements. If n is omitted or None, then it returns all of the elements.

Let's use Counter and .most_common() to code a function that takes a sample of data and returns its mode or modes.
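A minimal sketch of such a function follows. The names my_mode, counts, and highest are hypothetical, and returning an empty list for an empty sample is a choice made for this sketch rather than something the text prescribes.

```python
from collections import Counter


def my_mode(sample):
    """Return a list of every value that has the highest count in sample."""
    counts = Counter(sample)
    if not counts:
        return []  # sketch-level choice: an empty sample has no mode
    # most_common(1) returns [(value, count)]; [0][1] is the highest count
    highest = counts.most_common(1)[0][1]
    # Keep every value whose count matches the highest count
    return [value for value, count in counts.items() if count == highest]


speeds = [99, 86, 87, 88, 111, 86, 103, 87, 94, 78, 77, 85, 86]
print(my_mode(speeds))                     # [86]
print(my_mode(["a", "b", "a", "b", "c"]))  # ['a', 'b']
```

One caveat of this sketch: when every observation is unique, all counts equal 1, so the function returns every value rather than reporting that the sample has no mode.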
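We first count the observations in the sample using a Counter object. Then, we use a list comprehension to collect every observation that appears the highest number of times in the sample.

Since .most_common(1) returns a list with one tuple of the form (observation, count), the highest count is the item at index 1 of the tuple at index 0 of that list.

Note that the comprehension's condition compares the count of each observation with the count of the most common observation. This lets us return several observations in the case of a multi-mode sample.

Using Python's mode()

Python's statistics.mode() takes some data and returns its (first) mode.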
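The sketch below shows statistics.mode() on a single-mode sample and on a sample with two equally common values. It assumes Python 3.8 or later; earlier versions raise statistics.StatisticsError for the second call. The sample values are illustrative only.

```python
import statistics

# Single-mode sample borrowed from the car speed example: 86 appears 3 times
speeds = [99, 86, 87, 88, 111, 86, 103, 87, 94, 78, 77, 85, 86]
single = statistics.mode(speeds)
print(single)  # 86

# "a" and "b" both appear twice; Python 3.8+ returns the first one encountered
first_of_many = statistics.mode(["a", "b", "a", "b", "c"])
print(first_of_many)  # a
```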
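With a single-mode sample, Python's mode() returns the most common value. When several values share the highest count, mode() returns only the first one it encounters, and the others are not reported.

Since Python 3.8 we can also use statistics.multimode(), which accepts an iterable and returns a list of all the modes.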
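A short sketch of statistics.multimode() follows, assuming Python 3.8 or later. The sample values and variable names are illustrative only.

```python
import statistics

speeds = [99, 86, 87, 88, 111, 86, 103, 87, 94, 78, 77, 85, 86]

several = statistics.multimode(["a", "b", "a", "b", "c"])
one = statistics.multimode(speeds)
none = statistics.multimode([])

print(several)  # ['a', 'b']
print(one)      # [86]
print(none)     # []
```

The modes are listed in the order in which they first appear in the data.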
Note: The multimode() function always returns a list, even if you pass a single-mode sample.

Conclusion

The mean (or average), the median, and the mode are commonly our first looks at a sample of data when we're trying to understand the central tendency of the data. In this tutorial, we've learned how to find or compute the mean, the median, and the mode using Python. We first covered, step-by-step, how to create our own functions to compute them, and then how to use Python's statistics module to get the same results more quickly.

How do you find the mode in Python?

Use the mode() function from the statistics module to find the mode of a list in Python. The mode() function takes a dataset as a parameter and returns its mode value. It raises StatisticsError when the dataset is empty. Before Python 3.8 it also raised StatisticsError when more than one mode was present; since Python 3.8 it returns the first mode encountered.
How do you find the mean and mode?

The mean (average) of a data set is found by adding all numbers in the data set and then dividing by the number of values in the set. The median is the middle value when a data set is ordered from least to greatest. The mode is the number that occurs most often in a data set.

How do you do mean in Python?

The arithmetic mean is the sum of the data divided by the number of data points. It is a measure of the central location of data in a set of values that vary in range. In Python, we usually compute it by dividing the sum of the given numbers by the count of numbers present.

What is mode in Python?

Python has two basic modes: script and interactive. Script mode, the normal mode, is where finished .py files are run by the Python interpreter. Interactive mode is a command line shell that gives immediate feedback for each statement, while keeping previously entered statements in active memory.