Paths To Knowledge (dot Science)

What is actually real in Objective Reality? How do you know? Now, prove it's real!

The Mean Median, A Mode of Confusion

Posted by pwl on January 14, 2010

Not to be mean but to clarify a conversation about global warming where there was a mode of confusion about the meaning of “mean” when “median” was meant by the other party. (I apologize for being in pun mode but there are some mean puns in there that were not meant to be avoided). [:)]

The mean may often be confused with the median, mode or range. The mean is the arithmetic average of a set of values, or distribution; however, for skewed distributions, the mean is not necessarily the same as the middle value (median), or the most likely (mode).

This confusion was the case in the conversation. It’s important that all parties get the same meaning for terms being used. This primer is to clarify for anyone who wants to comprehend these basic yet important statistical terms.

“The median is primarily used for skewed distributions, which it summarizes differently than the arithmetic mean. Consider the multiset { 1, 2, 2, 2, 3, 9 }. The median is 2 in this case, as is the mode, and it might be seen as a better indication of central tendency than the arithmetic mean of 3.166. Calculation of medians is a popular technique in summary statistics and summarizing statistical data, since it is simple to understand and easy to calculate, while also giving a measure that is more robust in the presence of outlier values than is the mean.”

“The term central tendency refers to the “middle” value or perhaps a typical value of the data, and is measured using the mean, median, or mode. Each of these measures is calculated differently, and the one that is best to use depends upon the situation.

Mean: The mean is the most commonly-used measure of central tendency. When we talk about an “average”, we usually are referring to the mean. The mean is simply the sum of the values divided by the total number of items in the set. The result is referred to as the arithmetic mean. …

Median: The median is determined by sorting the data set from lowest to highest values and taking the data point in the middle of the sequence. There is an equal number of points above and below the median. For example, in the data set {1,2,3,4,5} the median is 3; there are two data points greater than this value and two data points less than this value. In this case, the median is equal to the mean. But consider the data set {1,2,3,4,10}. In this data set, the median still is three, but the mean is equal to 4. If there is an even number of data points in the set, then there is no single point at the middle and the median is calculated by taking the mean of the two middle points.

The median can be determined for ordinal data as well as interval and ratio data. Unlike the mean, the median is not influenced by outliers at the extremes of the data set. For this reason, the median often is used when there are a few extreme values that could greatly influence the mean and distort what might be considered typical. This often is the case with home prices and with income data for a group of people, which often is very skewed. For such data, the median often is reported instead of the mean. For example, in a group of people, if the salary of one person is 10 times the mean, the mean salary of the group will be higher because of the unusually large salary. In this case, the median may better represent the typical salary level of the group.

Mode: The mode is the most frequently occurring value in the data set. For example, in the data set {1,2,3,4,4}, the mode is equal to 4. A data set can have more than a single mode, in which case it is multimodal. In the data set {1,1,2,3,3} there are two modes: 1 and 3.

The mode can be very useful for dealing with categorical data. For example, if a sandwich shop sells 10 different types of sandwiches, the mode would represent the most popular sandwich [or sandwiches].

In probability theory and statistics, a median is described as the numeric value separating the higher half of a sample, a population, or a probability distribution, from the lower half. The median of a finite list of numbers can be found by arranging all the observations from lowest value to highest value and picking the middle one. If there is an even number of observations, then there is no single middle value, so one often takes the mean of the two middle values.

In a sample of data, or a finite population, there may be no member of the sample whose value is identical to the median (in the case of an even sample size) and, if there is such a member, there may be more than one so that the median may not uniquely identify a sample member. Nonetheless the value of the median is uniquely determined with the usual definition,

At most half the population have values less than the median and at most half have values greater than the median. If both groups contain less than half the population, then some of the population is exactly equal to the median. For example, if a < b < c, then the median of the list {a, b, c} is b, and if a < b < c < d, then the median of the list {a, b, c, d} is the mean of b and c, i.e. it is (b + c)/2.

The median can be used as a measure of location when a distribution is skewed, when end values are not known, or when one requires reduced importance to be attached to outliers, e.g. because they may be measurement errors. A disadvantage of the median is the difficulty of handling it theoretically.

MEAN aka Arithmetic Mean aka Average
In statistics, mean has two related meanings:

* the arithmetic mean (and is distinguished from the geometric mean or harmonic mean).
* the expected value of a random variable, which is also called the population mean.

There are other statistical measures that use samples that some people confuse with averages – including ‘median’ and ‘mode.’ Other simple statistical analyses use measures of spread, such as range, interquartile range, or standard deviation. For a real-valued random variable X, the mean is the expectation of X. Note that not every probability distribution has a defined mean (or variance); see the Cauchy distribution for an example.

For a data set, the mean is the sum of the observations divided by the number of observations. The mean of a set of numbers x1, x2, …, xn is typically denoted by \bar{x}, pronounced “x bar”. The mean is often quoted along with the standard deviation: the mean describes the central location of the data, and the standard deviation describes the spread.

An alternative measure of dispersion is the mean deviation, equivalent to the average absolute deviation from the mean. It is less sensitive to outliers, but less mathematically tractable.

As well as statistics, means are often used in geometry and analysis; a wide range of means have been developed for these purposes, which are not much used in statistics.

The arithmetic mean is the “standard” average, often simply called the “mean”.

The mean may often be confused with the median, mode or range. The mean is the arithmetic average of a set of values, or distribution; however, for skewed distributions, the mean is not necessarily the same as the middle value (median), or the most likely (mode). For example, mean income is skewed upwards by a small number of people with very large incomes, so that the majority have an income lower than the mean. By contrast, the median income is the level at which half the population is below and half is above. The mode income is the most likely income, and favors the larger number of people with lower incomes. The median or mode are often more intuitive measures of such data.

Nevertheless, many skewed distributions are best described by their mean.

In statistics, the mode is the value that occurs the most frequently in a data set or a probability distribution. In some fields, notably education, sample data are often called scores, and the sample mode is known as the modal score. [1]

Like the statistical mean and the median, the mode is a way of capturing important information about a random variable or a population in a single quantity. The mode is in general different from the mean and median, and may be very different for strongly skewed distributions.

The mode is not necessarily unique, since the same maximum frequency may be attained at different values. The most ambiguous case occurs in uniform distributions, wherein all values are equally likely.

One Response to “The Mean Median, A Mode of Confusion”

  1. fayetteville north carolina…

    […]The Mean Median, A Mode of Confusion « Paths To Knowledge (dot NET)[…]…

Leave a Reply

Fill in your details below or click an icon to log in: Logo

You are commenting using your account. Log Out /  Change )

Google photo

You are commenting using your Google account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s

%d bloggers like this: