Some statistics are generated to describe your data sets or the relationships between your variables. They are called descriptive statistics. These helpful statistics summarize large amounts of data so they can be presented and understood with minimal effort.
Descriptive statistics are widely applied. A good example of a real-life application is the US Census. By using some of the popular descriptive statistics, we get a sense of important characteristics of households in the United States. For example, descriptive statistics that are available in census data may indicate:
* Average household size
* Ethnic and gender breakdowns
* Employment rates
* Average cost of single family homes and rental units
* Percent of children in different age categories
* Per capita income
* High school completion rates
There are three commonly reported descriptive statistics called measures of central tendency: the mean, the median and the mode. The mean is the preferred statistic for describing the center of your data set. But when outliers are present, the median is an alternative statistic that is not affected by these unrepresentative scores. If you are in a hurry and want a ballpark feeling for the typical score, the mode can be read off with a quick glance at the frequency distribution table.
The Mean (x bar)
The mean represents a whole data set of scores with a single number. To obtain this statistic, add up all of the scores (x) and divide by the total number of scores in your distribution. Since it is an average, you have probably been calculating means often in your everyday life.
> It is based on more precise measurement scales such as interval and ratio. Sometimes ordinal is used, too. This may occur with rating scales, performance scales or satisfaction scales where averages are useful. Certainly, no mean can be calculated with nominal scaled variables.
> As mentioned before, all the scores in a data set are used to calculate the mean. This is another plus for this statistic.
> Finally, many of the powerful statistical analyses rely on the mean in their formulas for statistical significance. So the mean has become the Queen of Central Tendency. But this royal designation does not hold if your data set contains outliers.
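The calculation described above (add up all the scores, then divide by the number of scores) can be sketched in a few lines of Python. The function name `mean` and the sample scores are assumptions made up for illustration; they do not come from any particular data set.

```python
def mean(scores):
    """Return the arithmetic mean: the sum of all scores divided by their count."""
    if not scores:
        raise ValueError("mean requires at least one score")
    return sum(scores) / len(scores)


# Hypothetical quiz scores, invented for this example
scores = [4, 8, 6, 5, 7]
print(mean(scores))  # 30 / 5 = 6.0
```

Note that every score contributes to the result, which is the property the points above describe.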
Beware of Outliers!
An outlier is an extremely high or low score in your distribution. Outliers pull the mean in their own direction, so it no longer represents the typical score. You can tell if you have outliers in your data set by setting up a frequency distribution and then graphing a frequency polygon. What should you do? Examine your data through the grunt work of developing a frequency distribution, and then use more than one measure of central tendency. One of the best, when your data has outliers, is the statistic called the median.
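As a rough illustration of how a single outlier shifts the mean while leaving the median nearly unchanged, the sketch below uses Python's standard `statistics` module. The scores are hypothetical and chosen only to make the effect visible.

```python
from statistics import mean, median

# Hypothetical scores without an outlier
typical = [50, 52, 55, 58, 60]
# The same scores plus one extremely high score
with_outlier = typical + [300]

print(mean(typical), median(typical))    # 55 55
print(round(mean(with_outlier), 1))      # mean is pulled up toward the outlier
print(median(with_outlier))              # 56.5, barely moved
```

Reporting both statistics side by side, as here, is one simple way to follow the advice of using more than one measure of central tendency.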
Use the mean:
* If you want the greatest reliability
* If you will be calculating variability and other statistical computations
* If your distribution has no outliers
* If your data are interval, ratio or ordinal scaled
When there are extreme scores or outliers in your distribution, the median is the preferred statistic to measure central tendency. It is simply the midpoint in your distribution of ranked, ordered scores. To calculate the median, list all of the values in your distribution from the lowest to the highest and then find the midpoint – the place where it divides your distribution into equal halves. That is, fifty percent of all scores are above it and fifty percent are below it.
The Median - odd number of scores in your distribution
If there is an odd number of scores in your distribution, the median is easy to identify. Find the midpoint in the ordered list of scores. That midpoint is the median, since half of the scores are above it and half are below it. A formula can be used to locate the position of the median in an ordered set of data:
[Position of the median = (Number of scores + 1) / 2]
If you had 11 scores (an odd number), the formula would give (11 + 1) / 2 = 6. The median would be the sixth score in the set of data where the scores are listed in order from lowest to highest.
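The position formula for an odd number of scores can be sketched as follows. The function name `median_odd` and the 11 scores are hypothetical, chosen so the example matches the 11-score case above.

```python
def median_odd(scores):
    """Return the median of an odd-length list using the (n + 1) / 2 position rule."""
    ordered = sorted(scores)          # rank the scores from lowest to highest
    n = len(ordered)
    if n % 2 == 0:
        raise ValueError("this sketch handles only an odd number of scores")
    position = (n + 1) // 2           # 1-based position of the median
    return ordered[position - 1]      # Python lists are 0-based


# 11 hypothetical scores, in no particular order
scores = [12, 3, 9, 15, 7, 21, 5, 18, 10, 14, 1]
print(median_odd(scores))  # the 6th score in sorted order: 10
```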
The Median - even number of scores in your distribution
If you have an even number of scores, find the two scores at the center of the ordered list and average them. In this case the median may or may not be an actual score, depending on what the two middle scores are. If they are identical, that value is the median. If they differ, the median is their average and not an actual score in your distribution. That is why we say the median is the midpoint, a point not a score.
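A minimal sketch that covers both the odd and the even case might look like this. The function name `median` and the sample lists are assumptions made for illustration.

```python
def median(scores):
    """Return the midpoint of the ranked scores, averaging the two middle scores if needed."""
    ordered = sorted(scores)
    n = len(ordered)
    if n == 0:
        raise ValueError("median requires at least one score")
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]                       # odd count: the single middle score
    return (ordered[mid - 1] + ordered[mid]) / 2  # even count: average the two middle scores


print(median([2, 4, 6, 8]))  # 5.0, not an actual score in the list
print(median([2, 5, 5, 9]))  # 5.0, the two middle scores are identical
```

The two calls show both situations described above: in the first the median falls between two scores, in the second it coincides with a score that actually occurs.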
Use the median:
* If your distribution is skewed by outliers
* If your data are interval, ratio or ordinal scaled
The mode is the most frequently occurring value in your distribution. The mode as a statistic does not need to be calculated. If you look at your frequency distribution, a simple eyeball inspection of your data can tell you which score occurred most often.
Many frequency distributions have more than one mode. That is, more than one score turns up at the same high level of frequency. Two modes in a frequency distribution create a bimodal frequency distribution. This might occur in a set of data where two groups are performing very differently. If graphed, there would be two humps in the curve or shape. If there are more than two modes, the distribution is called multimodal and the graphing shows several humps or curves.
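Finding the mode amounts to counting how often each score occurs, which is what a frequency distribution table does. The sketch below uses `collections.Counter` from the Python standard library and returns every score tied for the highest frequency, so bimodal and multimodal data are handled too. The function name `modes` and the sample scores are hypothetical.

```python
from collections import Counter


def modes(scores):
    """Return all scores that share the highest frequency, in sorted order."""
    counts = Counter(scores)          # frequency distribution: score -> count
    if not counts:
        return []
    top = max(counts.values())        # the highest frequency observed
    return sorted(score for score, count in counts.items() if count == top)


print(modes([1, 2, 2, 3, 4]))     # [2], a single mode
print(modes([1, 2, 2, 3, 3, 4]))  # [2, 3], a bimodal distribution
```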
Use the mode:
* If you need a quick estimate
* If you have nominal, ordinal, interval or ratio scaled data
* If you can eyeball the data from a frequency distribution