7 Tests for Nominal Data

(PSY206) Data Management and Analysis

Author

Md Rasel Biswas

6.1 Nominal Data

  • Nominal data are also known as categorical data
  • Categories have no natural order
  • Numbers used for coding are labels only, not quantities

Examples:

  • Sex: Male / Female
  • Smoking status: Smoker / Non-smoker
  • School type: Private / State

Nominal data are sometimes called qualitative data, but this is different from qualitative research.


6.2 Dichotomous Variables

  • A dichotomous variable is a special case of nominal data
  • It has only two categories

Examples:

  • Yes / No
  • Alive / Dead
  • Disease present / Disease absent

Every dichotomous variable is nominal, but not every nominal variable is dichotomous.


6.3 Descriptive Statistics for Nominal Data

For nominal data, the following are meaningless:

  • Mean
  • Median
  • Standard deviation

The only appropriate summaries are:

  • Frequencies (counts)
  • Percentages

Recommended displays:

  • Frequency tables
  • Bar charts

Histograms should NOT be used for nominal data.
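
As a minimal sketch of the summaries above, outside SPSS, the following Python snippet builds a frequency table with counts and percentages. The ten responses and the helper name `frequency_table` are hypothetical and used for illustration only; they are not taken from the lecture dataset.

```python
from collections import Counter

# Hypothetical responses for illustration only (not from the lecture dataset).
smoking_status = ["Smoker", "Non-smoker", "Non-smoker", "Smoker", "Non-smoker",
                  "Non-smoker", "Non-smoker", "Smoker", "Non-smoker", "Non-smoker"]

def frequency_table(values):
    """Return {category: (count, percentage)} for a nominal variable."""
    counts = Counter(values)
    n = len(values)
    return {category: (count, 100 * count / n) for category, count in counts.items()}

table = frequency_table(smoking_status)

# Print one line per category: label, count, percentage of the sample.
for category, (count, percent) in sorted(table.items()):
    print(f"{category:<12} {count:>3} {percent:5.1f}%")
```

The category labels are treated purely as names, so no mean or standard deviation is computed at any point.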


6.4 Chi-square Test: Overview

  • The chi-square test is designed for nominal data
  • It compares observed frequencies with expected frequencies

Two main types:

  1. Goodness-of-fit chi-square
  2. Multidimensional chi-square (also called the chi-square test of independence)

6.5 Goodness-of-Fit Chi-square

Purpose:

  • To test whether an observed distribution differs from what is expected by chance

Examples:

  • Do children prefer toy A, B, and C equally in a play-therapy setting?
  • Do smokers choose Brand A and Brand B equally often?
  • Are different coping strategies (avoidance, problem-focused, emotion-focused) used equally often by stressed students?

Null hypothesis:

  • Categories occur in the expected (usually equal) proportions

This test is used less frequently in psychology than the multidimensional chi-square.
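
The calculation behind the toy-preference question can be sketched in Python as follows. The counts are hypothetical, the function name `goodness_of_fit` is illustrative, and the sketch assumes equal expected proportions. It uses the standard chi-square statistic, the sum of (O − E)² / E over categories, with degrees of freedom equal to the number of categories minus one.

```python
import math

def goodness_of_fit(observed):
    """Chi-square goodness-of-fit against equal expected frequencies."""
    n = sum(observed)
    k = len(observed)
    expected = n / k  # same expected count in every category
    chi_sq = sum((o - expected) ** 2 / expected for o in observed)
    return chi_sq, k - 1  # statistic and degrees of freedom

# Hypothetical counts: 60 children choosing toy A, B or C.
toy_counts = [20, 30, 10]
chi_sq, df = goodness_of_fit(toy_counts)

# For df = 2 only, the chi-square upper-tail probability is exp(-x / 2).
p_value = math.exp(-chi_sq / 2)
print(f"chi-square({df}, N = {sum(toy_counts)}) = {chi_sq:.2f}, p = {p_value:.3f}")
```

The closed-form p-value used here holds only for two degrees of freedom; for other cases a statistics package would normally supply it.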


6.6 Multidimensional Chi-square

This is the most commonly used chi-square test in psychology and public health.

It can be viewed as:

  • A test of association, or
  • A test of difference between groups

Research question:

  • Are two nominal variables independent?

6.7 Examples of Research Questions

  • Is smoking status associated with income level?
  • Is treatment received associated with survival status?
  • Is gender associated with help-seeking behaviour (Yes/No)?
  • Is exposure to trauma associated with PTSD diagnosis?
  • Is school type associated with exam anxiety level (High/Low)?

A significant result indicates association, not causation.


6.8 Assumptions of Chi-square Test

To use chi-square, the following must hold:

  1. Variables are nominal
  2. Data are frequency counts
  3. Categories are mutually exclusive
  4. Observations are independent

Repeated measures data violate independence.


6.9 Contingency Table

  • Data are summarized using an r × c contingency table (r rows by c columns); the two variables need not have the same number of categories

Example: 2 × 2 table

              High income   Low income   Total
Smokers                10           20      30
Non-smokers            35           35      70
Total                  45           55     100

6.10 Expected Frequencies

If variables are independent: \[ E = \frac{(\text{Row total}) (\text{Column total})}{\text{Grand total}} \]

  • Chi-square compares observed (O) and expected (E) counts

Large differences between O and E lead to a larger chi-square value.
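
To make the formula concrete, here is a short Python sketch that applies it to the smoking-by-income table from 6.9 and then sums (O − E)² / E over the four cells, which is the standard Pearson chi-square statistic. The function names are illustrative, and no continuity correction is applied.

```python
def expected_counts(observed):
    """Expected counts under independence: row total * column total / grand total."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in col_totals] for r in row_totals]

def chi_square(observed):
    """Pearson chi-square: sum over all cells of (O - E)^2 / E."""
    expected = expected_counts(observed)
    return sum((o - e) ** 2 / e
               for obs_row, exp_row in zip(observed, expected)
               for o, e in zip(obs_row, exp_row))

# Smoking status (rows) by income level (columns), from the table in 6.9.
observed = [[10, 20],   # Smokers: high income, low income
            [35, 35]]   # Non-smokers: high income, low income

print(expected_counts(observed))       # [[13.5, 16.5], [31.5, 38.5]]
print(round(chi_square(observed), 2))  # 2.36
```

For example, the expected count for smokers with high income is 30 × 45 / 100 = 13.5, compared with an observed count of 10.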


6.11 Example Data

In this lecture, we use a sample psychology dataset to demonstrate the chi-square test of independence. Eighty young women completed an eating questionnaire and were classified as having either a high or a low tendency toward anorexia (1 = high, 2 = low); those classified as high are at greater risk of developing anorexia. The dataset also includes three nominal background variables: cultural background (1 = Asian, 2 = Caucasian, 3 = Other), employment status of each woman's mother (1 = Full-time, 2 = None, 3 = Part-time), and type of school attended (1 = Comprehensive, 2 = Private).

Download data: (In Excel) (In SPSS Format)

Previous research has suggested that the incidence of anorexia is higher among girls attending private schools than state schools, and higher among girls whose mothers are not in full-time employment. In addition, the incidence seems to be higher in Caucasian girls than non-Caucasian girls.

We therefore hypothesised that there would be an association between these factors and the classification on the eating questionnaire. To test this hypothesis, we conducted a series of chi-square analyses.


6.12 SPSS: Multidimensional Chi-square

Menu path:

Analyze → Descriptive Statistics → Crosstabs

Steps:

  1. Put one variable in Rows
  2. Put the other variable in Columns
  3. Click Statistics → select Chi-square
  4. Click Cells → select Observed, Expected, Row %, Column %

SPSS Syntax

CROSSTABS
  /TABLES=var1 BY var2
  /STATISTICS=CHISQ PHI
  /CELLS=COUNT EXPECTED ROW COLUMN.

6.13 Interpreting SPSS Output

When Crosstabs options are selected properly, each cell reports:

  • Observed Count: Actual number of cases in the cell
  • Expected Count: Number expected if variables were independent
  • Row %: Percentage within the row category
  • Column %: Percentage within the column category
  • Total %: Percentage of the full sample (shown only if Total is also selected under Cells)

Always describe results using row or column percentages, not raw counts alone.
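
As an illustration of how row and column percentages are obtained, the Python sketch below computes both for the smoking-by-income table from 6.9. The function names are illustrative, and SPSS produces the same figures directly in the Crosstabs output.

```python
def row_percentages(observed):
    """Each cell as a percentage of its row total."""
    return [[100 * cell / sum(row) for cell in row] for row in observed]

def column_percentages(observed):
    """Each cell as a percentage of its column total."""
    col_totals = [sum(col) for col in zip(*observed)]
    return [[100 * cell / total for cell, total in zip(row, col_totals)]
            for row in observed]

# Smoking status (rows) by income level (columns), from the table in 6.9.
observed = [[10, 20],   # Smokers: high income, low income
            [35, 35]]   # Non-smokers: high income, low income

for label, row in zip(["Smokers", "Non-smokers"], row_percentages(observed)):
    print(label, [f"{value:.1f}%" for value in row])
```

Read across the rows, 33.3% of smokers fall in the high-income group compared with 50.0% of non-smokers, which describes the pattern more clearly than the raw counts of 10 and 35.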


6.14 Reporting Chi-square Results

Standard Reporting Format

χ²(df, N = sample size) = value, p = value

Example (non-significant):

There was no association between mother’s employment status and anorexia tendency: χ²(2, N = 80) = 0.29, p = .862.

Example (significant):

There was a significant association between school type and anorexia tendency: χ²(1, N = 80) = 28.19, p < .001.

Always follow this with a description of the pattern observed in the contingency table.
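
The reporting format can also be produced programmatically. The following Python sketch is one possible helper, with hypothetical function names. It assumes the convention used in the examples above: two decimals for the statistic, three decimals for p with no leading zero, and "p < .001" for very small values.

```python
def format_p(p):
    """Format a p-value with no leading zero, three decimals, floor at '< .001'."""
    if p < 0.001:
        return "p < .001"
    return "p = " + f"{p:.3f}".lstrip("0")

def report_chi_square(chi_sq, df, n, p):
    """Build a string in the form: χ²(df, N = n) = value, p = value."""
    return f"χ²({df}, N = {n}) = {chi_sq:.2f}, {format_p(p)}"

# The two results reported above (the second p-value is only known to be below .001,
# so a placeholder below that threshold is passed in).
print(report_chi_square(0.29, 2, 80, 0.862))
print(report_chi_square(28.19, 1, 80, 0.0001))
```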


6.15 Degrees of Freedom

\[ df = (r - 1)(c - 1) \]

  • r = number of rows
  • c = number of columns

Example:

  • 2 × 2 table → df = 1
  • 2 × 3 table → df = 2
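
Applied to the example data from 6.11: mother's employment status has three categories and anorexia tendency has two, so

\[ df = (3 - 1)(2 - 1) = 2 \]

School type has two categories, so for school type by anorexia tendency

\[ df = (2 - 1)(2 - 1) = 1 \]

These agree with the degrees of freedom in the results reported in 6.14.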

6.16 Effect Size: Phi and Cramer’s V

  • Phi (φ): used for 2 × 2 tables
  • Cramer’s V: used for larger tables
  • In the Crosstabs → Statistics dialog box:
    • Select Phi for 2 × 2 tables
    • Select Cramer’s V for larger tables
  • Interpretation is similar to correlation coefficients.
  • Interpretation Guidelines (rule of thumb):
    • 0.10 → small association
    • 0.30 → moderate association
    • 0.50 → strong association

Report the effect size alongside the chi-square result: statistical significance alone does not show how strong the association is.
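
The formulas are not shown above, but the standard definitions are φ = √(χ² / N) for 2 × 2 tables and V = √(χ² / (N(k − 1))) for larger tables, where k is the smaller of the number of rows and columns. The Python sketch below applies them to the two results reported in 6.14. The function names are illustrative, and the values are magnitudes only; SPSS may report a signed φ for 2 × 2 tables.

```python
import math

def phi_coefficient(chi_sq, n):
    """Phi for a 2 x 2 table: sqrt(chi-square / N)."""
    return math.sqrt(chi_sq / n)

def cramers_v(chi_sq, n, n_rows, n_cols):
    """Cramer's V: sqrt(chi-square / (N * (k - 1))), k = min(rows, columns)."""
    k = min(n_rows, n_cols)
    return math.sqrt(chi_sq / (n * (k - 1)))

# School type by anorexia tendency (2 x 2): chi-square = 28.19, N = 80.
phi = phi_coefficient(28.19, 80)

# Mother's employment by anorexia tendency (3 x 2): chi-square = 0.29, N = 80.
v = cramers_v(0.29, 80, 3, 2)

print(f"phi = {phi:.2f}, Cramer's V = {v:.2f}")
```

By the rule of thumb above, φ ≈ 0.59 indicates a strong association for school type, while V ≈ 0.06 for mother's employment falls below the threshold for a small association.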


6.17 Small Expected Frequencies

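  • The chi-square approximation is unreliable when expected counts are below 5 (a common rule of thumb allows at most 20% of cells below 5, and none below 1)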

Solutions:

  • For 2 × 2 tables: use Fisher’s Exact Test
  • For larger tables: use the Exact option in the SPSS Crosstabs dialog

Always check the footnote in SPSS output.
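
The check that the SPSS footnote summarises can be sketched in Python as follows. The function name `small_expected_cells` and the second table are hypothetical, and the threshold of 5 follows the rule stated above.

```python
def small_expected_cells(observed, threshold=5):
    """Return (cells with expected count < threshold, total cells, minimum expected count)."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)
    expected = [r * c / grand_total for r in row_totals for c in col_totals]
    n_small = sum(1 for e in expected if e < threshold)
    return n_small, len(expected), min(expected)

tables = {
    "Smoking by income (6.9)": [[10, 20], [35, 35]],
    "Hypothetical small sample": [[3, 2], [4, 11]],  # illustration only
}

for name, observed in tables.items():
    n_small, n_cells, min_expected = small_expected_cells(observed)
    percent = 100 * n_small / n_cells
    print(f"{name}: {n_small} cells ({percent:.1f}%) have expected count less than 5. "
          f"The minimum expected count is {min_expected:.2f}.")
```

For the smoking table no cell falls below 5, so the ordinary chi-square can be reported. In the hypothetical small sample half of the cells do, so Fisher's Exact Test would be the safer choice.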


6.18 Summary

  • Nominal data require special handling
  • Use frequencies and percentages only
  • Chi-square tests association between nominal variables
  • Always check assumptions and expected counts
  • Association does not imply causation