Criteria for Testing and Evaluating Occupational Safety Interventions: An Overview of Shannon, Robson, & Guastello (1999)

Organizations can sink a lot of resources into improving safety, but how can we make sure that these interventions are effective? Easy – science! Well, not so easy. Especially if you want the science to be done well. Testing safety interventions requires considerable time, effort, and funding. However, if done well, these tests can be invaluable. You may ask, though, what do I mean by ‘science done well’?

Shannon and colleagues (1999) give an inspired answer to this question in their paper by providing a practical but evidence-based list of criteria to consider when testing interventions. I list these criteria below along with some commentary to emphasize the main take-aways.

Criteria for evaluating occupational safety intervention research

Program objectives and conceptual basis

  • Were the program objectives stated?
  • Was the conceptual basis of the program explained and sound?

These questions are a great place to start. The objectives of the study are key because they will determine which intervention is appropriate and how we would determine whether the intervention was successful or not. Meanwhile, the conceptual basis provides us with a way of organizing our thoughts, making it easier to connect the vast array of moving parts within interventions.

Study design

  • Was an experimental or quasi-experimental design employed instead of a non-experimental design?

As far as the hierarchy of rigor goes, experimental randomized controlled trials with baseline measures sit at the peak of the pyramid. This is how you would want to design your study to have the most confidence in your results. However, this is rarely feasible given the nature of field research. As such, quasi-experimental designs are next down the hierarchy. There are several variations of quasi-experiments that I won’t expand upon here, but there are great resources for figuring out which quasi-experimental features to weigh above others (see, for example, Cook & Campbell, 1979). Finally, non-experimental designs can give you the ability to observe change, but without a control group it is hard to infer where that change comes from. Non-experimental designs should therefore remain a last resort.

External validity

  • Were the program participants/study population fully described?
  • Was the intervention explicitly described?
  • Were contextual factors described?

Detailed information about the participants (e.g., demographics, means of recruitment, and drop-out rate), the intervention (e.g., duration and program content), and the context (e.g., the current state of safety performance within the given organization) provides crucial insight for extrapolating any one intervention to other workplaces. As such, it is good practice to be as explicit about these features as possible.

Outcome measurement

  • Were all relevant outcomes measured?
  • Were the measurement methods shown to be valid and reliable?
  • Was the outcome measurement standardized by exposure?

The first point brings us back to the objective of the intervention. In short, your main outcome should be tied directly to your objective: if the goal was to decrease injuries, you should measure injuries. It may also be worth measuring other outcomes, such as those that help explain how the intervention works (implementation and mechanistic outcomes) or that check for unintended consequences (reducing major injuries may, for example, increase minor injuries or underreporting). Finally, the latter two points ask whether you’re actually measuring what you want to measure (validity), whether your measurement is consistent (reliability), and whether the comparison you’re making is a fair one based on relative exposure to hazards (standardization).
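To make the standardization point concrete, here is a minimal sketch (the injury counts, site names, and the `incidence_rate` helper are hypothetical; the 200,000-hour denominator is a convention commonly used for US incidence rates, roughly 100 full-time workers for a year). Two sites with identical injury counts can have very different rates once exposure is taken into account:

```python
SCALE = 200_000  # hours worked; a common incidence-rate denominator

def incidence_rate(injuries, hours_worked):
    """Injuries per 200,000 hours worked (~100 full-time workers per year)."""
    return injuries / hours_worked * SCALE

# Same raw count (10 injuries), very different exposure:
site_a = incidence_rate(10, 250_000)    # small site -> rate of 8.0
site_b = incidence_rate(10, 1_000_000)  # large site -> rate of 2.0
```

Comparing raw counts would make the two sites look identical; comparing rates shows one site is four times riskier per hour worked.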

Qualitative data

  • Were qualitative methods used to supplement quantitative data?

Adding qualitative features, such as interviews, observations, and primary/secondary documents, to the test of an intervention can be very insightful. At a minimum it adds to the richness of the study, and at best it can help spot threats to internal validity, or even strengthen it.

Threats to internal validity

  • Were the major threats to internal validity addressed in the study?

There are a number of potential threats to internal validity (i.e., conditions or events that may lead a researcher to the wrong answer), with more adding up as the rigor of the research design goes down. For randomized designs, the main threats are those which counteract randomization (e.g., improper randomization), diminish treatment (e.g., unintentional diffusion of treatment to control groups), or stem from the reactions of those participating in or administering the study (e.g., going along with or against expectations on purpose). For non-randomized control groups, selection biases are vital to consider. Meanwhile, major threats to the internal validity of before-and-after designs relate to the temporal aspect of the design, such as history (i.e., changes attributable to factors other than the intervention), maturation (i.e., natural changes in the study group that occur independently of the intervention), testing (i.e., changes caused by the act of repeated measurement itself), instrumentation (i.e., changes in measurement over the course of the study), and regression to the mean (i.e., individuals at the extremes at one point in the study naturally moving towards the average). For more details about these threats, see Cook & Campbell (1979).
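Regression to the mean is easy to see in a quick simulation. The sketch below (all numbers hypothetical) gives every site the same true injury rate, "intervenes" at the sites with the worst baseline year, and shows them improving at follow-up with no intervention at all:

```python
import random

random.seed(0)

# Every site has the same true long-run rate; observed yearly
# figures fluctuate around it.
N_SITES = 1000
TRUE_RATE = 5.0
baseline = [random.gauss(TRUE_RATE, 2.0) for _ in range(N_SITES)]
followup = [random.gauss(TRUE_RATE, 2.0) for _ in range(N_SITES)]

# "Intervene" at the 100 worst-looking sites at baseline.
worst = sorted(range(N_SITES), key=lambda i: baseline[i], reverse=True)[:100]
before = sum(baseline[i] for i in worst) / len(worst)
after = sum(followup[i] for i in worst) / len(worst)

# `before` is well above the true rate of 5; `after` drifts back
# toward 5 even though nothing was actually done.
```

A naive before-and-after comparison at those sites would credit the "intervention" with a large improvement that is pure statistical noise, which is exactly why a control group matters.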

Statistical analysis

  • Were the appropriate statistical analyses conducted?
  • If study results were negative, were statistical power or confidence intervals calculated?

Typically, the analyses for testing differences between experimental and control groups are fairly straightforward, but they become slightly more complicated for pre- and post-intervention measurements. Essentially, the less rigorous the study design, the more likely you will need to add statistical controls to adjust for differences between groups or within individuals over time.
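On the confidence-interval point, here is a rough sketch (hypothetical counts and hours; a simple normal approximation treating injury counts as Poisson) of a 95% interval for the difference in injury rates between two periods. When a "negative" result's interval straddles zero, reporting the interval is far more informative than a bare non-significant p-value:

```python
import math

SCALE = 200_000  # hours; a common incidence-rate denominator

def rate_diff_ci(inj_a, hours_a, inj_b, hours_b, z=1.96):
    """95% CI for the difference in injury rates per 200,000 hours,
    using a normal approximation to Poisson injury counts."""
    rate_a = inj_a / hours_a * SCALE
    rate_b = inj_b / hours_b * SCALE
    # Poisson variance of each rate: count / hours^2, rescaled.
    var = (inj_a / hours_a**2 + inj_b / hours_b**2) * SCALE**2
    diff = rate_a - rate_b
    half = z * math.sqrt(var)
    return diff - half, diff + half

# Hypothetical: 12 injuries in 400,000 hours before the intervention
# vs. 5 injuries in 380,000 hours after it.
lo, hi = rate_diff_ci(12, 400_000, 5, 380_000)
```

With these numbers the rate drops from 6.0 to about 2.6, yet the interval includes zero: the apparent improvement is compatible with chance, and the study may simply have been underpowered.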

Conclusions

  • Did conclusions address program objectives?
  • Were the limitations of the study addressed?
  • Were the conclusions supported by the analysis?
  • Was the practical significance of the result discussed?

Finally, the concluding points are good questions to be able to answer after the intervention. By following the criteria above, it will be clear whether the intervention objectives were met, what the specific limitations are for interpreting the intervention, how these affect the conclusions that can be drawn, and what this means for practice. Striking the right balance between these considerations is critical in the conclusion. Field research is still rare, and good field research is rarer still. The more closely a study follows the criteria above and maximizes its validity, the more power it will have to help us improve safety in the workplace.


Cook, T. D., & Campbell, D. T. (1979). Quasi-experimentation: Design & analysis issues for field settings. Boston, MA: Houghton Mifflin.

Shannon, H. S., Robson, L. S., & Guastello, S. J. (1999). Methodological criteria for evaluating occupational safety intervention research. Safety Science, 31(2), 161-179.
