**Descriptive statistics for your dissertation research**

Some statistics are generated to describe your data sets or the relationships between your variables. They are called **descriptive statistics**. These very useful statistics summarize large amounts of data so they can be presented and comprehended with minimal effort.

Descriptive statistics are widely applied. A good example of a real-life application is the US Census. By using some of the popular descriptive statistics, we get a sense of important characteristics of households in the United States. For example, descriptive statistics that are available in census data may indicate:

* Average household size

* Ethnic and gender breakdowns

* Employment rates

* Average cost of single family homes and rental units

* Percent of children in different age categories

* Per capita income

* High school completion rates

There are three commonly reported descriptive statistics called **measures of central tendency**. The **mean** is the preferred method of calculating the center of your data set. But when outliers are present, the **median** is an alternative that is not affected by these unrepresentative scores. If you are in a hurry and want a ballpark feel for the typical score, the **mode** can be found with a quick glance at the frequency distribution table.

**The Mean (x bar)**

The *mean* represents a whole data set of scores with one single number! To obtain the mean, you add up all of the scores (x) and divide by the total number of scores in your distribution. Since it is an average, you have probably been calculating means often in your everyday life.
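The calculation described above can be sketched in a few lines of Python. This is only an illustration: the scores are invented, and the function name `mean_of_scores` is a hypothetical one chosen for this example.

```python
# Hypothetical scores, invented purely for illustration
scores = [70, 75, 80, 85, 90]


def mean_of_scores(values):
    """Add up all of the scores and divide by the number of scores."""
    if not values:
        raise ValueError("cannot take the mean of an empty data set")
    return sum(values) / len(values)


print(mean_of_scores(scores))
```

Here the five scores add up to 400, so the mean is 400 / 5 = 80.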

> The mean is based on more precise measurement scales such as interval and ratio. Sometimes ordinal data are used, too. This may occur with rating scales, performance scales or satisfaction scales where averages are useful. Certainly, no mean can be calculated with nominal scaled variables.

> As mentioned before, *all* the scores in a data set are used to calculate the mean. This is another plus.

> Finally, many of the powerful statistical analyses rely on the mean in their formulas for statistical significance. So the mean has become the Queen of Central Tendency. But it loses this royal status if your data set possesses *outliers*.

**Beware of Outliers!**

An **outlier** is an extremely high or low score in your distribution. Outliers pull the mean in their own direction, so it no longer represents the typical score. You can tell if you have outliers in your data set by setting up a frequency distribution and then graphing a frequency polygon. What should you do? Examine your data through the grunt work of developing a frequency distribution, and then use *more than one measure* of central tendency. One of the best, when your data have outliers, is the *median*.
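To see how a single extreme score moves the mean, here is a small sketch using Python's standard `statistics` module. Both data sets are invented for illustration; the second is the same as the first except that the highest score has been replaced by an extreme value.

```python
from statistics import mean, median

# Hypothetical scores, invented for illustration
typical = [70, 75, 80, 85, 90]
# Same scores, but the highest one is replaced by an extreme value
with_outlier = [70, 75, 80, 85, 900]

for label, data in [("typical", typical), ("with outlier", with_outlier)]:
    # Reporting more than one measure makes the outlier's effect visible
    print(label, "mean:", mean(data), "median:", median(data))
```

In this example the mean moves from 80 to 242 when the outlier is present, while the median stays at 80. That gap between the two measures is a useful warning sign.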

*Use the mean:*

* If you want the greatest reliability

* If you will be calculating variability and other statistical computations

* If your distribution has no outliers

* If your data are interval, ratio or ordinal scaled

**The Median**

When there are extreme scores or outliers in your distribution, the *median* is the preferred measure. It is simply the midpoint in your distribution of ranked, ordered scores. To calculate the median, list all of the values in your distribution from the lowest to the highest and then find the midpoint, the place that divides your distribution into equal halves. That is, fifty percent of all scores are above it and fifty percent are below it.

*The Median - odd number of scores in your distribution*

If there is an *odd* number of scores in your distribution, the median is easy to identify. Find the midpoint in the list of scores ordered from low to high. That midpoint is the median, since half of the scores are above it and half are below it. A formula can be used to locate the position of the median in an ordered set of data:

[Median position = (Number of scores + 1) / 2]

If you had 11 scores (an odd number), the formula would be (11 + 1) / 2 = 6. The median would be the sixth score in the set of data where the scores are listed in order from lowest to highest.
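The position rule for an odd number of scores can be sketched in Python as follows. The 11 scores are invented for illustration, and `median_odd` is a hypothetical helper name, not a standard function.

```python
def median_odd(values):
    """Median of an odd number of scores using the (n + 1) / 2 position rule."""
    ordered = sorted(values)          # list the scores from lowest to highest
    n = len(ordered)
    if n % 2 == 0:
        raise ValueError("this sketch only handles an odd number of scores")
    position = (n + 1) // 2           # 1-based position of the median
    return ordered[position - 1]      # Python lists are 0-based


# 11 hypothetical scores, deliberately out of order
scores = [12, 3, 7, 9, 15, 1, 8, 20, 5, 11, 6]
print(median_odd(scores))
```

Sorted, the scores are 1, 3, 5, 6, 7, 8, 9, 11, 12, 15, 20, and the sixth one is 8. Five scores fall below it and five fall above it.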

*The Median - even number of scores in your distribution*

If you have an even number of scores, find the two scores closest to the center and average them. In this case the median may or may not be an actual score, depending on what the two midpoints are. If the midpoints are identical scores, then that score is the median. If they differ, the median will be their average and not an actual score in your distribution. That is why we say the median is the midpoint: a point, not a score.
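A minimal Python sketch of the even-numbered case, covering both situations described above. The data sets are invented, and `median_even` is a hypothetical helper name used only for this example.

```python
def median_even(values):
    """Median of an even number of scores: average the two middle scores."""
    ordered = sorted(values)
    n = len(ordered)
    if n == 0 or n % 2 != 0:
        raise ValueError("this sketch only handles a non-empty, even number of scores")
    lower_mid = ordered[n // 2 - 1]   # last score of the lower half
    upper_mid = ordered[n // 2]       # first score of the upper half
    return (lower_mid + upper_mid) / 2


# Hypothetical data: the two middle scores are identical (6 and 6)
identical_midpoints = [2, 4, 6, 6, 8, 10]
# Hypothetical data: the two middle scores differ (6 and 7)
different_midpoints = [2, 4, 6, 7, 8, 10]

print(median_even(identical_midpoints))
print(median_even(different_midpoints))
```

In the first data set the median is 6, which is an actual score. In the second it is 6.5, a point that falls between two scores and does not appear in the data.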

*Use the median:*

* If your distribution is skewed by outliers

* If your data are interval, ratio or ordinal scaled

**The Mode**

The *mode* is the most frequently occurring value in your distribution. The mode does not need to be calculated. If you look at your frequency distribution, a simple eyeball inspection of your data can tell you which score occurred most often.

Many frequency distributions have more than one mode. That is, more than one score turns up at the same highest level of frequency. Two modes in a frequency distribution create a *bimodal* frequency distribution. This might occur in a set of data where two groups are performing very differently. If graphed, there would be two humps in the curve. If there are more than two modes, the distribution is called *multimodal* and the graph shows several humps.
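Finding the mode from a frequency distribution can be sketched in Python with `collections.Counter`, which counts how often each value occurs. The data sets and the helper name `modes` are assumptions made for this illustration.

```python
from collections import Counter


def modes(values):
    """Return every value that occurs at the highest frequency."""
    counts = Counter(values)              # the frequency distribution
    highest = max(counts.values())        # the top frequency
    return sorted(v for v, c in counts.items() if c == highest)


# Hypothetical data sets, invented for illustration
one_mode = [1, 2, 2, 3, 4]
bimodal = [1, 2, 2, 3, 5, 5, 6]

print(modes(one_mode))
print(modes(bimodal))
```

The first data set has a single mode, 2. In the second, both 2 and 5 occur twice, so the distribution is bimodal. Because the function only counts occurrences, the same approach works for nominal data such as category labels.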

*Use the mode:*

* If you need a quick estimate

* If you have nominal, ordinal, interval or ratio scaled data

* If you can eyeball the data from a frequency distribution
