Chapter 20: Simulator Data Reduction
Handbook of Driving Simulation for Engineering, Medicine, and Psychology
Michelle L. Reyes, The University of Iowa
John D. Lee, University of Wisconsin
The Problem. Simulators produce a potentially overwhelming volume of data. The raw data generated by a simulator need to be parsed, aggregated, and combined to produce summary variables that relate to the theoretical constructs underlying the research questions that motivated the study. This transformation can be complex and error-prone, and manual reduction using a spreadsheet is often infeasible.
Role of Driving Simulators. Driving simulator studies provide a more detailed account of human behavior than many other experimental approaches. This chapter provides suggestions for exploring this rich source of information through three basic steps: planning, writing, and testing. Planning emphasizes a focus on data reduction throughout the entire research process, beginning with links to the theoretical constructs being measured and manipulated in the study. Writing describes a series of tips to avoid common frustrations in developing code for data reduction. Testing advocates a systematic plan that includes automatic checks for the bounds of the reduced variables and visualization to identify unexpected failures of the software.
Key Results of Driving Simulator Studies. The data reduction demands of simulator studies repeatedly frustrate both novice and experienced researchers. Data reduction requires the power of software, and researchers often find themselves victims of the many pitfalls associated with developing software. Undiscovered errors in the data reduction process have the potential to invalidate a research program and undermine the collective understanding of driver behavior.
Scenarios and Dependent Variables. Future trends toward standardized scenarios and measures might avoid many data reduction challenges, but such standardization makes it more likely that researchers will blindly interpret outcome variables without careful consideration of how they relate to the theoretical constructs of interest.
Platform Specificity and Equipment Limitations. The substantial differences between simulator platforms make it difficult for this chapter to address the specific challenges any given researcher is likely to face. Data from different simulators reflect different underlying assumptions, definitions, and hardware configurations (e.g., eye trackers), but the general processes described in this chapter should help researchers avoid the pitfalls commonly encountered when interpreting simulator data by increasing the opportunities for finding errors.
Data Reduction, Data Verification, Data Analysis, Visualization, Scenario Development, Pilot Testing, Research Process, Software Development
• Data reduction is critical for interpreting the outcomes of a study, and practices inspired by formal software development procedures can help researchers avoid undiscovered errors in the data reduction process.
• Planning the data reduction process should begin long before data collection, should gather specifications for data reduction from nearly every phase of the research process, and should include plans for testing the data reduction process.
• Practices like writing and testing the code incrementally, using pseudocode, avoiding hard coding, becoming thoroughly familiar with the simulator data, and using shortcuts with caution can help avoid frustrations while writing the data reduction code.
• Testing the data reduction code calls for a systematic plan that includes verifying the raw data and using visualization to validate the summary measures.
• Although standardized scenarios and measures can simplify the data reduction process, they also embody assumptions about driver behavior that can make it more difficult to understand behavior that is not consistent with those assumptions.
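The points above about avoiding hard coding and building automatic bounds checks into the reduction can be illustrated with a minimal sketch. It is not the chapter's own code: the names `ReductionConfig`, `reduce_lane_position`, and `check_bounds`, the lane geometry, and the plausibility limits are all assumptions chosen for illustration, and a real study would take these values from the scenario specification of its own simulator.

```python
from dataclasses import dataclass
from statistics import mean, stdev


@dataclass(frozen=True)
class ReductionConfig:
    """Hypothetical settings, kept in one place rather than hard-coded in the logic."""
    lane_half_width_m: float = 1.8   # assumed lane geometry, not from the chapter
    max_plausible_sd_m: float = 1.0  # assumed upper bound on SD of lane position


def check_bounds(summary, config):
    """Raise ValueError if a reduced variable falls outside its plausible range."""
    problems = []
    if abs(summary["mean_m"]) > config.lane_half_width_m:
        problems.append(f"mean lane position {summary['mean_m']:.2f} m is outside the lane")
    if not 0.0 <= summary["sd_m"] <= config.max_plausible_sd_m:
        problems.append(f"SD of lane position {summary['sd_m']:.2f} m is implausible")
    if problems:
        raise ValueError("; ".join(problems))


def reduce_lane_position(samples, config=ReductionConfig()):
    """Reduce raw lane-position samples (metres from lane centre) to summary measures."""
    if len(samples) < 2:
        raise ValueError("need at least two samples to compute a standard deviation")
    summary = {
        "n": len(samples),
        "mean_m": mean(samples),
        "sd_m": stdev(samples),  # sample standard deviation
    }
    check_bounds(summary, config)  # automatic check runs on every reduction
    return summary


if __name__ == "__main__":
    # Made-up samples for two hypothetical drivers, for illustration only.
    raw_by_driver = {
        "driver_01": [0.10, 0.15, 0.05, -0.02, 0.08],
        "driver_02": [-0.30, -0.10, 0.20, 0.35, 0.05],
    }
    for driver, samples in raw_by_driver.items():
        print(driver, reduce_lane_position(samples))
```

Because the check runs inside the reduction function, an implausible value stops the run at the point where it is produced instead of surfacing later in the statistical analysis. A bounds check only catches gross failures, so it complements rather than replaces visual inspection of the raw data.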
Web Figures 20.1-20.3.
Web Figure 20.1: Research process implications for data reduction planning (Figure 20.1 in printed chapter).
Web Figure 20.2: Visualization plot of raw, mean, and standard deviation of lane position, as well as raw steering angle for four drivers (Figure 20.2 in printed chapter).
Web Figure 20.3: Visualization of components of the lane position. Identical values of the standard deviation of lane position do not reflect similar behavior (Figure 20.6 in printed chapter).
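The caption's point that identical standard deviations need not reflect similar behavior can be reproduced with synthetic data. The sketch below does not use the data behind the figure: the two traces are invented sine waves, and `lane_trace` and `mean_abs_step` are hypothetical helpers introduced only for this illustration.

```python
import math
from statistics import pstdev


def lane_trace(cycles, amplitude_m=0.3, n_samples=600):
    """Synthetic lane position: a sine wave spanning a whole number of cycles."""
    return [
        amplitude_m * math.sin(2 * math.pi * cycles * i / n_samples)
        for i in range(n_samples)
    ]


def mean_abs_step(trace):
    """Average absolute change between consecutive samples (a crude activity measure)."""
    return sum(abs(b - a) for a, b in zip(trace, trace[1:])) / (len(trace) - 1)


slow_drift = lane_trace(cycles=1)    # one slow excursion across the lane
rapid_weave = lane_trace(cycles=12)  # many quick corrections, same amplitude

if __name__ == "__main__":
    for name, trace in (("slow drift", slow_drift), ("rapid weave", rapid_weave)):
        print(f"{name}: SD = {pstdev(trace):.4f} m, "
              f"mean |step| = {mean_abs_step(trace):.4f} m")
```

Both traces have the same amplitude and cover whole cycles, so their standard deviations agree to within floating-point error, yet the sample-to-sample change differs by roughly an order of magnitude. A summary table of standard deviations alone would hide that difference, which is why plotting the raw signal alongside the reduced measure is worthwhile.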