4 Data Handling
(PSY206) Data Management and Analysis
3.1 Introduction
- SPSS provides a range of commands to modify, manipulate, or transform data, collectively referred to as Data Handling commands.
- These commands are particularly useful when working with large datasets containing numerous variables for each participant, such as survey or questionnaire data.
- Questionnaires often include items (questions) that can be grouped into subscores, which can be calculated using Data Handling commands.
- Data transformations, such as log transformations, can also be performed to reduce distortions like skewness and improve the validity of statistical analyses.
- Another common use of these commands is to filter data to analyze specific groups of participants — for example, analyzing males and females separately, or excluding respondents who do not meet certain inclusion criteria.
Example Data File
To illustrate the use of these commands, we will use a small dataset (download) based on a fictitious survey exploring people’s attitudes toward adoption.
This dataset includes:
- Participant number
- Demographic variables (age, sex, ethnicity, religious belief, and adoption experience)
- Responses to 10 statements on adoption, measured on a 5-point Likert scale ranging from Strongly Agree (1) to Strongly Disagree (5)
Each response is recorded in variables q1 to q10.
3.2 Sorting Data
Although the order of cases usually does not affect statistical analysis, sorting can make it easier to inspect and verify data. For example, sorting participants by sex and then by ethnicity can help detect data entry issues or compare group distributions.
In this example, we sort the data first by sex, and then within each sex by ethnicity.
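The same sort can also be written as SPSS command syntax. The lines below are a minimal sketch, assuming the variables are named sex and ethnicity in the example file; (A) requests ascending order.

```spss
* Sort by sex, then by ethnicity within each sex (ascending).
* The variable names are assumed; check them against your data file.
SORT CASES BY sex (A) ethnicity (A).
```

Replacing (A) with (D) would sort that variable in descending order instead.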
3.3 Splitting Data
The Split File function allows SPSS to temporarily divide a dataset into groups, so that all subsequent analyses are performed separately for each group.
For instance, you may want to produce separate statistical outputs for male and female participants.
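To split a file, open Data > Split File, choose either Compare groups or Organize output by groups, and move the grouping variable (for example, sex) into the Groups Based on box.
The difference between the two options is important: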
- Compare groups: produces one combined output section showing group comparisons.
- Organize output by groups: generates separate output sections for each group.
We usually prefer Organize output by groups for clearer interpretation, but you should explore both options to understand their differences.
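In syntax form, the two options correspond to the LAYERED and SEPARATE keywords of the SPLIT FILE command. The sketch below assumes the grouping variable is named sex and uses a frequency table of q1 purely as an illustrative analysis.

```spss
* Split File expects the data to be sorted by the grouping variable.
SORT CASES BY sex.

* Organize output by groups: a separate output section for each group.
SPLIT FILE SEPARATE BY sex.
FREQUENCIES VARIABLES=q1.

* Compare groups: the groups appear together within the same tables.
SPLIT FILE LAYERED BY sex.
FREQUENCIES VARIABLES=q1.
```

Running both versions on the same analysis is a quick way to see how the output layout differs.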
Undoing Split File
The Split File command remains active until you manually turn it off. You can check whether it is on by looking at the bottom-right corner of the Data View window. When Split File is active, SPSS displays a message like “Split by Sex.”
To disable it, open the Split File dialog again (Data > Split File) and select Analyze all cases, do not create groups.
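In syntax, Split File is switched off with a single command. A minimal sketch:

```spss
* Turn off Split File so later analyses use the whole dataset again.
SPLIT FILE OFF.
```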
3.4 Selecting Cases
Sometimes you may wish to analyze only a subset of your data, such as respondents who have been adopted.
The Select Cases command allows you to temporarily exclude all other participants from analysis. It differs from Split File in which cases are analyzed:
- Split File analyzes all data but displays separate outputs by group.
- Select Cases analyzes only the chosen subset and filters out all other cases.
Use Select Cases when you want to restrict analysis to specific participants.
Selection Rules
You can define complex selection criteria using logical operators such as AND, OR, and NOT.
Rules can be typed directly or created using the on-screen calculator.
For example, to select only Chinese Christians with experience of adoption, the expression would be: religion = 3 and ethnicity = 3 and adopted > 0
You can also create more advanced selection rules by combining logical conditions with the built-in functions available in the dialog box.
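The Chinese Christians rule above can be sketched in syntax as a filter variable. The variable name selected is hypothetical, and the codes (3, 3, and greater than 0) are taken from the example expression rather than from a codebook.

```spss
* Flag the cases that satisfy the rule (1 = selected, 0 = not selected).
* The variable name selected is hypothetical.
COMPUTE selected = (religion = 3 AND ethnicity = 3 AND adopted > 0).

* Analyze only the flagged cases; the others are filtered out, not deleted.
FILTER BY selected.
EXECUTE.
```

Cases with a missing value on any of the three variables may receive a missing flag, in which case they are treated as not selected.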
Reselecting All Cases
The selection remains in effect until you manually reset it.
To restore all participants, open the Select Cases dialog and choose All cases.
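The syntax equivalent of choosing All cases is short. A minimal sketch:

```spss
* Switch the filter off and make every case available again.
FILTER OFF.
USE ALL.
EXECUTE.
```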
3.5 Recoding Values
Recoding is the process of changing the values of a variable — often to correct errors, merge categories, or prepare data for specific analyses.
For instance, if preliminary results show very few participants with adoption experience through “immediate family” or “other family,” these categories could be combined.
SPSS provides two main recode options:
- Recode into Same Variables — replaces original values (riskier)
- Recode into Different Variables — creates a new variable (safer and recommended)
Tip: Always use Recode into Different Variables to preserve the original data in case of mistakes.
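As a sketch of Recode into Different Variables, the syntax below merges two adoption categories. The numeric codes are hypothetical (2 = immediate family, 3 = other family), as is the new variable name adopted_r; check the value labels in the data file before using anything like this.

```spss
* Hypothetical coding: 2 = immediate family, 3 = other family.
* Merge both into one category (2) in a new variable, adopted_r.
* ELSE = COPY carries every other value over unchanged.
RECODE adopted (2, 3 = 2) (ELSE = COPY) INTO adopted_r.
EXECUTE.
```

Because the result goes into adopted_r, the original adopted variable stays untouched and can be compared against the recoded version.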
Conditional Recoding
You can also apply conditional recoding, where values are changed only if specific conditions are met — for example, recoding age values only for female participants.
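A conditional recode can be sketched by wrapping RECODE in DO IF. Everything specific here is an assumption for illustration: that sex = 2 codes female, that age is recorded in whole years, the age cut-offs, and the new variable name age_band.

```spss
* Hypothetical example: band age into three groups, for females only.
* Assumes sex = 2 codes female and age is recorded in whole years.
DO IF (sex = 2).
RECODE age (LOWEST THRU 29 = 1) (30 THRU 49 = 2) (50 THRU HIGHEST = 3) INTO age_band.
END IF.
EXECUTE.
```

Participants who do not meet the condition are left with a system-missing value on age_band.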
3.6 Computing New Variables
The Compute Variable command allows you to create new variables from existing ones.
This is useful when:
- Summing item scores into total or subscale scores
- Calculating averages
- Applying mathematical transformations
In our example, the 10 questionnaire items can be combined into two subscales by summing or averaging specific variables (q1–q5, q6–q10).
SPSS also provides built-in functions such as SUM(), MEAN(), SD(), etc., that simplify computation.
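A syntax sketch of these computations follows. The new variable names are hypothetical, and the q1 TO q5 shorthand assumes the items are stored next to each other in the data file. The log transformation is applied to age only as an illustration of the technique.

```spss
* Subscale totals and means; the new variable names are hypothetical.
* q1 TO q5 assumes the items sit next to each other in the file.
COMPUTE sub1_sum = SUM(q1 TO q5).
COMPUTE sub2_sum = SUM(q6 TO q10).
COMPUTE sub1_mean = MEAN(q1 TO q5).
COMPUTE sub2_mean = MEAN(q6 TO q10).

* Natural log transformation, shown on age purely as an illustration.
COMPUTE ln_age = LN(age).
EXECUTE.
```

Note that SUM() and MEAN() work from whichever items are not missing, whereas an expression such as q1 + q2 + q3 + q4 + q5 returns a missing value if any single item is missing.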
3.7 Counting Values
Sometimes we need to count how many times a particular response occurs across several variables.
For example, you may want to know how many times each participant selected “Strongly Agree (1)” across all 10 questionnaire items (q1–q10).
The Count Values within Cases function creates a new variable representing this count.
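In syntax, this count can be sketched with the COUNT command. The variable name n_strongly_agree is hypothetical, and q1 TO q10 again assumes the items are adjacent in the file.

```spss
* Count how many of q1 to q10 equal 1 (Strongly Agree) for each participant.
* The variable name n_strongly_agree is hypothetical.
COUNT n_strongly_agree = q1 TO q10 (1).
EXECUTE.
```

The new variable ranges from 0 (never chose Strongly Agree) to 10 (chose it for every item).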
Summary
In this chapter, you learned how to:
- Sort and split datasets
- Select specific cases for analysis
- Recode and compute variables
- Count responses across variables
These data handling skills are fundamental for data preparation and cleaning — an essential step before conducting any statistical analysis in SPSS.