7 Tests for Nominal Data
(PSY206) Data Management and Analysis
6.1 Nominal Data
- Nominal data are also known as categorical data
- Categories have no natural order
- Numbers used for coding are labels only, not quantities
Examples:
- Sex: Male / Female
- Smoking status: Smoker / Non-smoker
- School type: Private / State
Nominal data are sometimes called qualitative data, but this is different from qualitative research.
6.2 Dichotomous Variables
- A dichotomous variable is a special case of nominal data
- It has only two categories
Examples:
- Yes / No
- Alive / Dead
- Disease present / Disease absent
Every dichotomous variable is nominal, but not every nominal variable is dichotomous.
6.3 Descriptive Statistics for Nominal Data
For nominal data, the following are meaningless:
- Mean
- Median
- Standard deviation
The only appropriate summaries are:
- Frequencies (counts)
- Percentages
Recommended displays:
- Frequency tables
- Bar charts
Histograms should NOT be used for nominal data.
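As a minimal sketch of these summaries, the Python snippet below counts how often each category occurs and converts the counts to percentages. The `smoking_status` list is made-up illustration data and the function name `frequency_table` is hypothetical; neither comes from the lecture dataset.

```python
from collections import Counter

# Hypothetical responses, for illustration only
smoking_status = ["Smoker", "Non-smoker", "Non-smoker", "Smoker", "Non-smoker"]

def frequency_table(values):
    """Return {category: (count, percentage)} for a list of nominal values."""
    counts = Counter(values)
    total = sum(counts.values())
    return {cat: (n, 100 * n / total) for cat, n in counts.items()}

table = frequency_table(smoking_status)
for category, (count, percent) in table.items():
    print(f"{category}: n = {count} ({percent:.1f}%)")
```

Note that the category labels are never averaged or ordered; only counting is applied to them.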
6.4 Chi-square Test: Overview
- The chi-square test is designed for nominal data
- It compares observed frequencies with expected frequencies
Two main types:
- Goodness-of-fit chi-square
- Multidimensional chi-square
6.5 Goodness-of-Fit Chi-square
Purpose:
- To test whether an observed distribution differs from what is expected by chance
Examples:
- Do children prefer toy A, B, and C equally in a play-therapy setting?
- Do smokers choose Brand A and Brand B equally often?
- Are different coping strategies (avoidance, problem-focused, emotion-focused) used equally often by stressed students?
Null hypothesis:
- Categories occur in the expected (usually equal) proportions
This test is used less frequently in psychology.
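To make the logic concrete, here is a rough Python sketch of a goodness-of-fit calculation for the toy-preference question. The counts in `toy_counts` are invented for illustration, and the function name `goodness_of_fit` is hypothetical. The p-value line relies on the fact that, for df = 2 only, the upper-tail chi-square probability equals exp(-χ²/2).

```python
import math

def goodness_of_fit(observed, expected_props=None):
    """Chi-square goodness-of-fit statistic and degrees of freedom.

    observed: list of category counts.
    expected_props: expected proportions (defaults to equal proportions).
    """
    n = sum(observed)
    k = len(observed)
    if expected_props is None:
        expected_props = [1 / k] * k
    expected = [n * p for p in expected_props]
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    return chi2, k - 1

# Hypothetical counts: 60 children choosing toy A, B, or C
toy_counts = [30, 18, 12]
chi2, df = goodness_of_fit(toy_counts)

# Closed form valid for df = 2 only: p = exp(-chi2 / 2)
p_value = math.exp(-chi2 / 2)
print(f"chi2({df}, N = {sum(toy_counts)}) = {chi2:.2f}, p = {p_value:.3f}")
```

With equal expected proportions, each toy would be expected to be chosen 20 times; the statistic grows as the observed counts move away from that.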
6.6 Multidimensional Chi-square
This is the most commonly used chi-square test in psychology and public health.
It can be viewed as:
- A test of association, or
- A test of difference between groups
Research question:
- Are two nominal variables independent?
6.7 Examples of Research Questions
- Is smoking status associated with income level?
- Is treatment received associated with survival status?
- Is gender associated with help-seeking behaviour (Yes/No)?
- Is exposure to trauma associated with PTSD diagnosis?
- Is school type associated with exam anxiety level (High/Low)?
A significant result indicates association, not causation.
6.8 Assumptions of Chi-square Test
To use chi-square, the following must hold:
- Variables are nominal
- Data are frequency counts
- Categories are mutually exclusive
- Observations are independent
Repeated measures data violate independence.
6.9 Contingency Table
- Data are summarized in an r × c contingency table (r rows by c columns); the table does not have to be square
Example: 2 × 2 table
| | High income | Low income | Total |
|---|---|---|---|
| Smokers | 10 | 20 | 30 |
| Non-smokers | 35 | 35 | 70 |
| Total | 45 | 55 | 100 |
6.10 Expected Frequencies
If variables are independent: \[ E = \frac{(\text{Row total}) (\text{Column total})}{\text{Grand total}} \]
- Chi-square compares observed (O) and expected (E) counts
Large differences between O and E lead to a larger chi-square value.
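The calculation can be sketched in Python using the smoking-by-income counts from the 2 × 2 table in section 6.9. The function names are hypothetical, and the statistic computed is the plain Pearson chi-square, without any continuity correction.

```python
def expected_counts(table):
    """Expected counts under independence: row total * column total / grand total."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in col_totals] for r in row_totals]

def chi_square(table):
    """Pearson chi-square statistic: sum of (O - E)^2 / E over all cells."""
    expected = expected_counts(table)
    return sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(table, expected)
        for o, e in zip(obs_row, exp_row)
    )

# Rows: smokers, non-smokers; columns: high income, low income
observed = [[10, 20], [35, 35]]
expected = expected_counts(observed)
statistic = chi_square(observed)
print(expected)
print(round(statistic, 2))
```

For the smokers / high income cell, for instance, E = (30 × 45) / 100 = 13.5, compared with an observed count of 10.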
6.11 Example Data
In this lecture, we use a sample psychology dataset to demonstrate the chi-square test of independence. Eighty young women completed an eating questionnaire, on the basis of which each was classified as having either a high or a low tendency toward anorexia (1 = high, 2 = low); those classified as high are at greater risk of developing anorexia. The dataset also includes several nominal background variables: cultural background (1 = Asian, 2 = Caucasian, 3 = Other), mother’s employment status (1 = Full-time, 2 = None, 3 = Part-time), and type of school attended (1 = Comprehensive, 2 = Private).
Download data: (In Excel) (In SPSS Format)
Previous research has suggested that the incidence of anorexia is higher among girls attending private schools than state schools, and higher among girls whose mothers are not in full-time employment. In addition, the incidence seems to be higher in Caucasian girls than non-Caucasian girls.
We therefore hypothesised that there would be an association between these factors and the classification on the eating questionnaire. To test this hypothesis, we conducted a series of chi-square analyses.
6.12 SPSS: Multidimensional Chi-square
Menu path:
Analyze → Descriptive Statistics → Crosstabs
Steps:
- Put one variable in Rows
- Put the other variable in Columns
- Click Statistics → select Chi-square
- Click Cells → select Observed, Expected, Row %, Column %
SPSS Syntax
CROSSTABS
/TABLES=var1 BY var2
/STATISTICS=CHISQ PHI
/CELLS=COUNT EXPECTED ROW COLUMN.
6.13 Interpreting SPSS Output
When Crosstabs options are selected properly, each cell reports:
- Observed Count: Actual number of cases in the cell
- Expected Count: Number expected if variables were independent
- Row %: Percentage within the row category
- Column %: Percentage within the column category
- Total %: Percentage of the full sample (shown only if Total is also selected under Cells)
Always describe results using row or column percentages, not raw counts alone.
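As an illustration of why percentages are easier to read than raw counts, the sketch below computes row and column percentages for the smoking-by-income counts from section 6.9. The function names are hypothetical and the code is only meant to mirror what SPSS prints in each cell.

```python
def row_percentages(table):
    """Each cell as a percentage of its row total."""
    return [[100 * cell / sum(row) for cell in row] for row in table]

def column_percentages(table):
    """Each cell as a percentage of its column total."""
    col_totals = [sum(col) for col in zip(*table)]
    return [
        [100 * cell / total for cell, total in zip(row, col_totals)]
        for row in table
    ]

# Rows: smokers, non-smokers; columns: high income, low income
observed = [[10, 20], [35, 35]]
row_pct = row_percentages(observed)
col_pct = column_percentages(observed)

print(f"Smokers with high income: {row_pct[0][0]:.1f}%")
print(f"Non-smokers with high income: {row_pct[1][0]:.1f}%")
```

The row percentages make the comparison direct: about a third of smokers fall in the high-income group, against half of non-smokers, even though the group sizes differ.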
6.14 Reporting Chi-square Results
Standard Reporting Format
χ²(df, N = sample size) = value, p = value
Example (non-significant):
There was no association between mother’s employment status and anorexia tendency: χ²(2, N = 80) = 0.29, p = .862.
Example (significant):
There was a significant association between school type and anorexia tendency: χ²(1, N = 80) = 28.19, p < .001.
Always follow this with a description of the pattern observed in the contingency table.
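If many tests have to be reported, the format can be generated programmatically. The following is a small sketch, not an official APA tool: the helper names `format_p` and `report_chi_square` are hypothetical, and the p-value passed for the significant example is a placeholder, since the lecture only states p < .001.

```python
def format_p(p):
    """Format a p-value with no leading zero; values below .001 become 'p < .001'."""
    if p < 0.001:
        return "p < .001"
    return "p = " + f"{p:.3f}".lstrip("0")

def report_chi_square(chi2, df, n, p):
    """Build a report string of the form chi-square(df, N = n) = value, p = value."""
    return f"χ²({df}, N = {n}) = {chi2:.2f}, {format_p(p)}"

# Values from the two examples above
print(report_chi_square(0.29, 2, 80, 0.862))
# Placeholder p-value: the source reports only p < .001
print(report_chi_square(28.19, 1, 80, 0.0000001))
```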
6.15 Degrees of Freedom
\[ df = (r - 1)(c - 1) \]
- r = number of rows
- c = number of columns
Example:
- 2 × 2 table → df = 1
- 2 × 3 table → df = 2
6.16 Effect Size: Phi and Cramer’s V
- Phi (φ): used for 2 × 2 tables
- Cramer’s V: used for larger tables
- In the Crosstabs → Statistics dialog box, select Phi and Cramer’s V (a single checkbox), then:
  - Report Phi for 2 × 2 tables
  - Report Cramer’s V for larger tables
- Interpretation is similar to correlation coefficients.
- Interpretation guidelines (rule of thumb):
  - 0.10 → small association
  - 0.30 → moderate association
  - 0.50 → strong association
Effect size should be reported even if the chi-square result is statistically significant.
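Both measures can be obtained from the chi-square value using the standard formula V = √(χ² / (N × min(r − 1, c − 1))), which reduces to φ = √(χ² / N) for a 2 × 2 table. The sketch below applies it to the school type result reported in section 6.14; the function name `cramers_v` is hypothetical, and the value returned is the magnitude only (it carries no sign).

```python
import math

def cramers_v(chi2, n, n_rows, n_cols):
    """Cramer's V from a chi-square value; equals phi (unsigned) for a 2 x 2 table."""
    k = min(n_rows - 1, n_cols - 1)
    return math.sqrt(chi2 / (n * k))

# School type by anorexia tendency: chi-square = 28.19, N = 80, 2 x 2 table
phi = cramers_v(28.19, 80, 2, 2)
print(f"phi = {phi:.2f}")
```

By the rule of thumb above, a value of this size would count as a strong association.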
6.17 Small Expected Frequencies
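- Chi-square results may be unreliable when expected counts fall below 5 (a common rule of thumb: no more than 20% of cells should have an expected count below 5)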
Solutions:
- For 2 × 2 tables: use Fisher’s Exact Test
- For larger tables: use the Exact option in SPSS
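Always check the footnote beneath the Chi-Square Tests table in the SPSS output, which reports how many cells have an expected count below 5.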
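The check that footnote summarises can be sketched in Python: compute the expected count for every cell, then count how many fall below 5 and find the smallest one. The function name `expected_count_check` and the `small_table` counts are hypothetical; `observed` reuses the smoking-by-income counts from section 6.9.

```python
def expected_count_check(table, threshold=5):
    """Return (cells below threshold, percentage of such cells, minimum expected count)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    expected = [r * c / grand_total for r in row_totals for c in col_totals]
    n_small = sum(1 for e in expected if e < threshold)
    return n_small, 100 * n_small / len(expected), min(expected)

# Smoking-by-income table: all expected counts are comfortably above 5
observed = [[10, 20], [35, 35]]
print(expected_count_check(observed))

# Hypothetical small sample: half of the cells have expected counts below 5
small_table = [[3, 2], [4, 11]]
print(expected_count_check(small_table))
```

For a 2 × 2 table like `small_table`, this result would point toward Fisher’s Exact Test rather than the ordinary chi-square.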
6.18 Summary
- Nominal data require special handling
- Use frequencies and percentages only
- Chi-square tests association between nominal variables
- Always check assumptions and expected counts
- Association does not imply causation