Data Sets for the 2nd Edition
Psychology example datasets are prefixed with psy, medical example datasets with med, and general datasets have no prefix. Data are fabricated to demonstrate aspects of the methods except where otherwise stated.
Alternatively you can download all the datasets as a .ZIP file.
Data Sets from the 1st Edition can be found here.
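The prefix convention above can be applied mechanically when sorting the downloaded files. The following is a minimal sketch, not part of the original materials; the function name dataset_domain is hypothetical and it assumes the prefix is everything before the first dot in the filename.

```python
def dataset_domain(filename: str) -> str:
    """Classify a dataset file by its prefix (hypothetical helper).

    Assumes the prefix is the text before the first dot:
    'psy' -> psychology, 'med' -> medical, anything else -> general.
    """
    prefix = filename.split(".", 1)[0].lower()
    if prefix == "psy":
        return "psychology"
    if prefix == "med":
        return "medical"
    return "general"


# Filenames taken from the lists on this page.
for name in ["psy.anova.oneway.sav", "med.ancova.sav", "cluster.sav"]:
    print(name, "->", dataset_domain(name))
```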
Chapter 2
- psy.anova.oneway.sav (Table 2.3) Depth of processing may influence people's ability to recall stimuli. For a between subjects oneway ANOVA the DV is a count of words RECALLed, the IV is DEPTH of processing required.
- med.anova.oneway.sav (Table 2.3 with medical labels substituted) To find the correct dosage for a drug expected to reduce systolic blood pressure we have a between subjects oneway ANOVA where the DV is reduction in SBP (SPBDROP). The IV is drug DOSAGE.
- psy.anova.between.sav (Table 2.5) Three levels of practice and three methods of teaching students to work with data are compared. For a 3*3 factorial between subjects ANOVA the DV is a test SCORE, the IVs are TRAINING method and number of practice TRIALS.
- med.anova.between.sav (Table 2.5 with medical labels substituted) Three drug types and three doses are compared for effect on migraine prevention. For a 3*3 factorial between subjects ANOVA the DV is a migraine reduction SCORE, the IVs are DRUG type and DOSE.
- psy.anova.within.sav (Table 2.6) Participants must decide whether or not two objects are identical when differently oriented. For a 3*2 factorial within subjects ANOVA the DV is TIME to make the decision, the IVs are ANGLE of relative orientation (30, 60 or 90 degrees) and MODE (letter or abstract shape).
- med.anova.within.sav (Table 2.6 with medical labels substituted) Choice reaction time may be a marker for presymptomatic Huntington's disease. For a 3*2 factorial within subjects ANOVA the DV is choice reaction TIME, the IVs are YEAR of test and MODE (visual or auditory).
- psy.anova.mixed.sav (Table 2.7) Ability and method of presentation may affect the ability to recall word lists. For a 3*2 mixed ANOVA the DV is the free RECALL score. The IVs are IQ (between subjects) and METHOD of presentation (within subjects). For a between subjects MANOVA with two DVs, the two recall scores obtained for each subject are the two DVs and there is only one IV, IQ.
- med.anova.mixed.sav (Table 2.7 with medical labels substituted) Difficulty of hearing is compared for degree of hearing loss and type of hearing aid. For a 3*2 mixed ANOVA the DV is a hearing difficulty score (DIFFIC). The IVs are level of hearing LOSS (between subjects) and TYPE of hearing aid (within subjects). For a between subjects MANOVA with two DVs, the two difficulty scores obtained for each subject are the two DVs and there is only one IV, LOSS.
Chapter 3
- psy.manova.between.sav (Table 3.1) The effect of training method and number of training sessions are compared for recruits in a travel query call centre. For a 2*3 between subjects MANOVA with two DVs, we have as DVs a count of CORRECT answers to queries and the DELAY before they begin to answer. TIME, the reciprocal of CORRECT, is used as a DV after diagnostic analysis. The IVs are training METHOD and number of practice SESSIONS.
- med.manova.between.sav (Table 3.1 with medical labels substituted) The effect of electronic reminder METHOD and amount of OT involvement in TRAINING are compared for dementia patients. For a 2*3 between subjects MANOVA with two DVs, we have as DVs the time to PROCESS the reminder before beginning tasks and a count of tasks completed (COMPLETE). SPEED, the reciprocal of PROCESS, is used as a DV after diagnostic analysis. The IVs are reminder device METHOD and type of TRAINING.
- psy.manova.within.sav (Table 3.2) Participants must decide whether or not two objects are identical when differently oriented. For a 3*2 factorial within subjects MANOVA with two DVs, the DVs are time to make the decision (reaction time or RT) and the amount of head movement (HEADMOVE). The IVs are ANGLE of relative orientation (30, 60 or 90 degrees) and MODE (letter or abstract shape).
- med.manova.within.sav (Table 3.2 with medical labels substituted) Choice reaction time and strength of response may be markers for presymptomatic Huntington's disease. For a 3*2 factorial within subjects MANOVA the DVs are choice reaction TIME and STRENGTH of response. The IVs are YEAR of test and MODE (visual or auditory).
Chapter 4
- psy.regression.stepwise.sav (Table 4.1) We attempt to predict success at IT trouble shooting. The DV is a SUCCESS score. The IVs are aptitude test scores (APT1 and APT2), a personality score (EXTRAVER) and educational variables (ITDEGREE and OTHERIT).
- psy.regression.crossvalidation.sav An extra column is added (XVAL) to the dataset above to enable alternate cases to be used as a training set (XVAL = 1) and a validation set (XVAL = 2).
- med.regression.stepwise.sav (Table 4.1 with medical labels substituted) We attempt to predict the success of cognitive behaviour therapy for patients with an eating disorder. The DV is a SUCCESS score. The IVs are a self esteem score (SELFEST), a measure of SEVERITY of the illness, a DEPRESSION score, and referral variables (GP and FAMILY).
- med.regression.crossvalidation.sav An extra column is added (XVAL) to the dataset above to enable alternate cases to be used as a training set (XVAL = 1) and a validation set (XVAL = 2).
- psy.regression.hierarchical.sav We try again to predict success at IT trouble shooting. The DV is a new success score (SUCCESS2). The IVs are aptitude test scores (APT1 and APT2), personality scores (EXTRAVER and CONTROL), educational variables (ITDEGREE and OTHERIT) and amount of online TRAINING.
- med.regression.hierarchical.sav We try again to predict success of treatment for eating disorder patients. The DV is a new success score (SUCCESS2). The IVs are a self esteem score (SELFEST) and a measure of SEVERITY of the illness, a DEPRESSION score and a social ISOLATION score, referral variables (GP and FAMILY) and a count of full weekly DIARY entries.
- regression.nonlinear.sav (Table 4.3) Astronomy data are used to demonstrate diagnostics. The DV is DISTANCE of planets from the sun and the IV is the NUMBER of the planet counted outwards from the sun.
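The XVAL column in the crossvalidation files above marks alternate cases as training (XVAL = 1) and validation (XVAL = 2). The following is a minimal sketch of that labelling and the resulting split, using made-up rows rather than the real data. It assumes the first case is labelled 1; the helper names add_xval and split_by_xval are hypothetical.

```python
def add_xval(rows):
    """Return copies of the rows with XVAL alternating 1, 2, 1, 2, ...

    Assumption: the first case gets XVAL = 1.
    """
    return [{**row, "XVAL": 1 if i % 2 == 0 else 2} for i, row in enumerate(rows)]


def split_by_xval(rows):
    """Split labelled rows into (training, validation) lists."""
    training = [r for r in rows if r["XVAL"] == 1]
    validation = [r for r in rows if r["XVAL"] == 2]
    return training, validation


# Made-up SUCCESS scores, for illustration only.
cases = [{"SUCCESS": s} for s in (10, 20, 30, 40, 50)]
labelled = add_xval(cases)
training, validation = split_by_xval(labelled)
print(len(training), "training cases,", len(validation), "validation cases")
```

With an odd number of cases the training set ends up one case larger than the validation set.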
Chapter 5
- psy.ancova.sav (Table 5.1) We examine the effect of a factor (TRAINING method) on a test score (POSTTEST) allowing for a covariate (PRETEST).
- med.ancova.sav (Table 5.1 with medical labels substituted) We examine the effect of a factor (TREATment method) on a platelet count (POSTCOUNT) allowing for a covariate (PRECOUNT).
Chapter 6
- psy.partial.sav (Table 6.1) For a partial correlation analysis we use student data with outcome variable being a measure of computer anxiety (CA), and predictor variables being SEX, psychological gender (PSYGEN) and the university FACULTY in which the student studies.
- med.partial.sav (Table 6.1 with medical labels substituted) For a partial correlation analysis we use a DEPRESSION score as the outcome variable, with predictor variables family history of depression (FAMILYHIST), state-trait ANXIETY and employment status (EMPLOY).
- psy.moderation.sav (Table 6.2) For a moderation analysis we use CA (as in psy.partial.sav) as the outcome variable, with predictor variables PSYGEN and hours of computer experience (COMPEXP).
- med.moderation.sav (Table 6.2 with medical labels substituted) For a moderation analysis we use DEPRESSION (as in med.partial.sav) as the outcome variable, with predictor variables FAMILYHIST and number of anxiety free days (ANXFREEDAYS).
Chapter 7
- psy.path1.sav (Table 7.1) The principal DV is time spent using computers (USE), the exogenous variable is a measure of relaxed early experience of computers (RELAX), and the endogenous variables are computer anxiety (CA) and experience of feeling in CONTROL.
- med.path1.sav (Table 7.1 with medical labels substituted) The principal DV is number of suicidal THOUGHTS, the exogenous variable is family history of mental health problems (FAMILYHIST), and the endogenous variables are positive mental health (POSMENTAL) and SUSCEPTIBILITY to depression.
- psy.path2.sav (Table 7.2) This is part of a real dataset relating to the controversy about ordaining women priests in the Church of England, adapted from Sani and Todman (2002). The principal DV is schismatic intentions (SCHISM). The exogenous variable is the extent to which the ordination of women was perceived as changing a core aspect of the Church (CHANGE). The endogenous variables are the perception that ordination subverted the identity of the Church (SUBVERT), the extent to which the Church is perceived as entitative, that is, as an entity (ENTITATI) and the perceived ability of opponents of ordination to voice their dissent (VOICE).
- med.path2.sav (Table 7.2 with medical labels substituted) We have fabricated a medical context for the dataset above, psy.path2.sav. Here we suppose that the data refer to 211 patients with chronic fatigue syndrome (CFS). The principal DV is reduction in ACTIVITY with CFS. The exogenous variable is MONTHS since onset of CFS. The endogenous variables are the extent to which the illness is MANAGEd, the amount of PAIN experienced and loss of physical and cognitive FUNCTION.
Chapter 8
- psy.factor.sav (Table 8.2) This is a real dataset: the Wechsler Adult Intelligence Scale (WAIS) was administered to a sample of psychology students. The 11 variables record the subtests: information, digit span, vocabulary, arithmetic, comprehension, similarities, picture completion, picture arrangement, block design, object assembly and digit symbol, abbreviated in the datasheet to INFORM, DIGSPAN, VOCAB, ARITH, COMPREH, SIMIL, PICCOMP, PICARRAN, BLOCK, OBJASSEM, and DIGSYM.
- med.factor.sav (Table 8.2 with medical labels substituted) For this example we use the same (real) WAIS data but we have fabricated a medical context for it. Here we suppose that a Positive Health Inventory (PHI) is being developed in order to monitor the effectiveness of a free check-up programme. The PHI comprises 11 subtests, concerning healthy function of lungs, muscular system, liver, skeletal system, kidneys and heart, and scores on a step test, a stamina test, a stretch test, a blow test, and a urine flow test. These are abbreviated in the datasheet to LUNG, MUSCLE, LIVER, SKELETON, KIDNEYS, HEART, STEP, STAMINA, STRETCH, BLOW, and URINE.
Chapter 9
- discriminant.sav (Table 9.1) Here we have data on SEX, HEIGHT in cm and WEIGHT in kg for 35 adults.
- psy.discriminant.sav (Table 9.3) We have data on four test results and subsequent reading progress classification for primary children. The tests are TEST1, TEST2, TEST3 and LATER, the classification variable is GROUP.
- psy.discriminant.newcases.sav (Table 9.4) We have TEST1, TEST2 and TEST3 results for 10 new cases for whom we will attempt to predict progress classification (GROUP).
- med.discriminant.sav (Table 9.3 with medical labels substituted) Here we have data on four test results obtained soon after patients experience a traumatic brain injury (TBI), and a subsequent classification according to recovery status. The tests are EEG (an EEG-derived score), COMA (Glasgow coma score), PUPIL (a pupil reactivity score), and LATER (scan data). The classification variable is GROUP.
- med.discriminant.newcases.sav (Table 9.4 with medical labels substituted) We have EEG, COMA and PUPIL test results for 10 new cases for whom we will attempt to predict recovery status classification (GROUP).
- psy.logistic.sav (Table 9.6) This is a variant on the reading example where we have the TEST1 and TEST2 results for the cases from psy.discriminant.sav. We also have a new predictor variable SUPPORT. The classification variable is now INITIAL, where the children are classified on whether or not they progress on the first intervention.
- psy.logistic.newcases.sav (Table 9.7) We have TEST1, TEST2 and SUPPORT data on 10 new cases for whom we hope to predict the classification INITIAL.
- psy.logistic.predict.sav The new cases above are added to the bottom of psy.logistic.sav. An extra column CHOOSE is added to enable original cases (CHOOSE = 1) to be used to calculate discriminant functions and new cases (CHOOSE = 2) to have their classification on INITIAL predicted.
- psy.logistic.crossvalidation.sav An extra column, XVAL, with values alternately 1 and 2, is added to psy.logistic.sav to enable alternate cases to be used as a training set (XVAL = 1) and a validation set (XVAL = 2).
- med.logistic.sav (Table 9.6 with medical labels substituted) This is a variant on the TBI example where we have the EEG and COMA scores from med.discriminant.sav. We also have a different pupil reactivity variable, REACT. The classification variable is now WORK, where the patients are classified on whether or not they were able to return to work within six months.
- med.logistic.newcases.sav (Table 9.7 with medical labels substituted) We have EEG, COMA and REACT data on 10 new cases for whom we hope to predict the classification WORK.
- med.logistic.predict.sav The new cases above are added to the bottom of med.logistic.sav. An extra column CHOOSE is added to enable original cases (CHOOSE = 1) to be used to calculate discriminant functions and new cases (CHOOSE = 2) to have their classification on WORK predicted.
- med.logistic.crossvalidation.sav An extra column, XVAL, with values alternately 1 and 2, is added to med.logistic.sav to enable alternate cases to be used as a training set (XVAL = 1) and a validation set (XVAL = 2).
Chapter 10
- cluster.sav (Table 10.4) We have data on mathematical and verbal aptitude tests (MATH and LANG) and a MANUAL dexterity test for 52 students.
- psy.cluster.variables.sav (Table 10.5) We have presence/absence data on eight problems for 50 children. The variables are named PROB1, PROB2, …, PROB8.
- med.cluster.variables.sav (Table 10.5 with medical labels substituted) We have presence/absence data on eight symptoms for 50 TBI patients. The variables are named SYMPT1, SYMPT2, …, SYMPT8.
Chapter 11
- mds1.sav (Table 11.1) This is the lower triangle of the matrix of distances between eight British cities.
- mds2.sav (Table 11.2) These are real data adapted from Sani and Reicher (1999). We have the lower triangle of the matrix of distances between nine Church of England organisations according to opponents of women's ordination, obtained before the first ordinations took place.
- mds3.sav (Table 11.3) These are real data adapted from Sani and Reicher (1999). We have the lower triangle of the matrix of distances between nine Church of England organisations according to supporters of women's ordination, obtained before the first ordinations took place.
- mds.similaritya.sav [1] (Table 11.4a) This is the lower triangle of counts of interactions between pairs of students before an intervention (psychology example). It is also the lower triangle of similarities between pairs of MS patients provided by physiotherapy staff before an intervention (medical example).
- mds.similarityb.sav [1] (Table 11.4b) This is the lower triangle of counts of interactions between pairs of students after an intervention (psychology example). It is also the lower triangle of similarities between pairs of MS patients provided by physiotherapy staff after an intervention (medical example).
- mds.distancea.sav [1] (Table 11.5) This is the lower triangle of distances between the pairs of students before the intervention, calculated from the counts of interactions in mds.similaritya.sav (psychology example). It is also the lower triangle of distances between pairs of MS patients before the intervention, calculated from the similarities provided in mds.similaritya.sav (medical example).
- mds.distanceb.sav [1] This is the lower triangle of distances between the pairs of students after the intervention, calculated from the counts of interactions in mds.similarityb.sav (psychology example). It is also the lower triangle of distances between pairs of MS patients after the intervention, calculated from the similarities provided in mds.similarityb.sav (medical example).
- mds.distancec.sav [1] (Table 11.6) This is the lower triangle of distances between the pairs of students calculated from 'control' questions (psychology example). It is also the lower triangle of distances between pairs of MS patients calculated from 'how you see the future' questions (medical example).
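The mds files above store only the lower triangle of each matrix, because distances and similarities are symmetric. The following is a minimal sketch of how a lower triangle can be expanded into the full symmetric matrix. It assumes each row includes the diagonal entry (row i has i + 1 values), which may not match the layout in the files; the function name and the small example matrix are hypothetical.

```python
def symmetric_from_lower(lower_rows):
    """Expand a lower triangle (diagonal included) into a full symmetric matrix.

    Assumption: row i (0-based) holds exactly i + 1 values.
    """
    n = len(lower_rows)
    full = [[0.0] * n for _ in range(n)]
    for i, row in enumerate(lower_rows):
        if len(row) != i + 1:
            raise ValueError(f"row {i} should have {i + 1} values, got {len(row)}")
        for j, value in enumerate(row):
            full[i][j] = value  # entry below (or on) the diagonal
            full[j][i] = value  # mirrored entry above the diagonal
    return full


# Made-up distances among three items, for illustration only.
lower = [
    [0],
    [3, 0],
    [4, 5, 0],
]
full = symmetric_from_lower(lower)
for row in full:
    print(row)
```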
Chapter 12
- psy.loglinear.sav (Table 12.3) This example examines relationships among mathematical ability, sex and finger length. We have three binary variables, SEX, INDEX finger longer or shorter than third finger, MATHS ability good or not, and the FREQuencies of participants in each of the eight possible combinations.
- med.loglinear.sav (Table 12.3 with medical labels substituted) This example examines relationships among obesity, diabetes and a variant gene. We have three binary variables, DIABETES present or not, OBESITY present or not, FTO variant gene present or not, and the FREQuencies of participants in each of the eight possible combinations.
- loglinear.sav (Table 12.6) This example concerns the voting behaviour of young and old people. We have three variables, AGE and VOTE which are binary and PARTY with three categories, and the FREQuencies of participants in each of the 12 possible combinations.
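The loglinear files above hold one frequency per cell of the cross-classification: eight cells for three binary variables, and 12 cells for two binary variables and one three-category variable. The following is a minimal sketch that enumerates those cells. The numeric category codes (1, 2, 3) are assumptions for illustration and may differ from the coding in the files.

```python
from itertools import product


def cells(levels):
    """List every combination of category levels as a dict per cell."""
    names = list(levels)
    return [dict(zip(names, combo)) for combo in product(*levels.values())]


# Category codes are hypothetical; only the number of levels comes from the text.
three_binary = {"SEX": (1, 2), "INDEX": (1, 2), "MATHS": (1, 2)}
voting = {"AGE": (1, 2), "VOTE": (1, 2), "PARTY": (1, 2, 3)}

print(len(cells(three_binary)), "cells for three binary variables")
print(len(cells(voting)), "cells for AGE x VOTE x PARTY")
```

Each cell would be paired with one value of FREQ in the data file.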
Chapter 13
- psy.poissonregression.equaltimes.sav (Table 13.1) This example investigates the relationship between TREATMENT group and the number of violent incidents (EVENTS) instigated by prisoners in a year of observation, with covariates which are their scores on a self esteem scale (ESTEEM) and whether they experienced abuse as a child (ABUSE).
- med.poissonregression.equaltimes.sav (Table 13.1 with medical labels substituted) This example investigates the relationship between TREATMENT group and the number of epileptic seizures (EVENTS) experienced by patients in a year of observation, with covariates which are their scores on a self esteem scale (ESTEEM) and whether they exceed the recommended maximum alcohol consumption (ALCOHOL).
- psy.poissonregression.unequaltimes.sav (Table 13.2) This example extends the one with equal times to the period spent in prison before the intervention. Additional variables are YEARS in prison before the intervention, the log of years (OFFSET), and the number of violent incidents before the intervention (PRE_TRT_EVENTS).
- med.poissonregression.unequaltimes.sav (Table 13.2 with medical labels substituted) This example extends the one with equal times to the period registered with the treatment unit before the intervention. Additional variables are YEARS in the unit before the intervention, the log of years (OFFSET), and the number of seizures before the intervention (PRE_TRT_EVENTS).
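In the unequal-times files above, OFFSET is the log of YEARS. The following is a minimal sketch of why such an offset is used, assuming a natural log and the usual log link for Poisson regression: adding log(years) to the linear predictor multiplies the expected count by the observation time. The function names and the numbers are hypothetical.

```python
import math


def offset_from_years(years: float) -> float:
    """Return the natural log of the observation time (assumed definition of OFFSET)."""
    if years <= 0:
        raise ValueError("years must be positive")
    return math.log(years)


def expected_events(linear_predictor: float, years: float) -> float:
    """Expected count under a log link with log(years) as the offset."""
    return math.exp(linear_predictor + offset_from_years(years))


# Made-up linear predictor: log of the yearly event rate.
lp = math.log(1.5)
print(expected_events(lp, 1.0))  # about 1.5 events expected in one year
print(expected_events(lp, 4.0))  # about 6 events expected in four years
```

With equal observation times the offset is the same constant for every case, so it is absorbed into the intercept and can be omitted.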
Chapter 14
- psy.survival.sav (Table 14.1) This example investigates the relationship between TREATMENT group and the time to relapse (TTF), if relapse occurs (RELAPSE) in a smoking cessation trial. Covariates are mean NUMBER smoked daily before the trial and length of time the person has smoked (YEARS).
- med.survival.sav (Table 14.1 with medical labels substituted) This example investigates the relationship between TREATMENT group and the time to a dyspnoea event (TTF), if one occurs (RELAPSE) in a drug trial for chronic obstructive pulmonary disease (COPD) patients. Covariates are mean NUMBER of cigarettes smoked daily in the month before the trial and length of time the person has smoked (YEARS).
Chapter 15
- longitudinal.anova.sav (Table 15.1) This example investigates whether choice reaction time (RT) might serve as a marker of neurological function to assess the impact of experimental therapies on presymptomatic gene carriers for Huntington’s disease. A test in two MODEs is carried out three times at intervals of a YEAR.
- longitudinal.manova.sav (Table 15.2) As for longitudinal.anova.sav but with the addition of an extra DV, STRENGTH.
- longitudinal.MRM.sav (Table 15.3) This is a study of body mass index. For seven participants we record AGE at the start and BMI every month for 10 months, along with treatment group.
- psy.gee.sav (Table 15.4) This is a study of three TREATMENT groups for anxiety, with a binary RESPONSE (have you had trouble sleeping in the last week, yes/no?) and two covariates, GENDER and a mood score (SF_36). Each participant has three observations (VISITs).
[1] These mds files serve for both psychology and medical examples because the variable names are people's names: the same names are used for the students in the psychology example and the MS patients in the medical example.