The Effectiveness of Restorative Justice Practices: A Meta-Analysis

3. Method

Following the techniques of Rosenthal (1991), a meta-analysis was designed to test the effectiveness of restorative justice practices. One of the major issues in conducting this form of research is agreeing on a definition of restorative justice. Generally, it is much easier to identify a non-restorative approach than it is to provide a precise definition of what constitutes restorative justice. For the purpose of this meta-analysis, the following operational definition was developed: restorative justice is a voluntary, community-based response to criminal behaviour that attempts to bring together the victim, the offender and the community in an effort to address the harm caused by the criminal behaviour.

While this definition may be open to debate, an operational definition is necessary to conduct research. Therefore, for the present meta-analysis, programs that contained “restorative” elements, such as restitution or community service, but did not attempt to bring together the victim, the offender and the community, were not considered. This definition provided us with a guide for the study selection process and ensured that we were examining a consistent response to criminal behaviour.

We also needed to identify appropriate outcomes that were measurable and linked directly to the goals of restorative justice. Although several outcome measures have been used, we selected victim satisfaction, offender satisfaction, restitution compliance and recidivism, as these were the only outcomes reported often enough to be subjected to a meta-analysis. Furthermore, these four outcomes are clear and quantifiable indicators of the effectiveness of restorative justice.

3.1 Literature Review: Study Identification Criteria

To gather eligible studies for the meta-analysis, we conducted a comprehensive search of the restorative justice literature over the last 25 years. The studies were primarily drawn from the Internet, social science journals, and governmental and non-governmental reports. A secondary search was conducted using the bibliographies of the identified studies and by contacting researchers active in the field to identify new, unpublished and/or undiscovered research. An explicit set of criteria was established in order to select studies for inclusion in the meta-analysis:

3.2 Data Collection: Coding Procedures

The standardized information listed in Table 1 was drawn from each study using a pre-designed coding manual. In designing a coding manual, the definition of certain variables can be problematic. For example, studies operationalized recidivism in different ways. In keeping with standard meta-analytic practice, we accepted multiple definitions of recidivism (i.e. a new criminal conviction, a new criminal charge, pre-post test offending). We also accepted two definitions of restitution compliance (the proportion of offenders who repaid their restitution and the proportion of total restitution dollars repaid by offenders).

Table 1. Primary Variables in Meta-analysis

For an overall mean effect size, in cases where multiple control/comparison groups were used in a single study, we combined the results to generate a single effect size for each program. In addition, where multiple follow-up periods were reported in a single study, we selected the longest at-risk period. To examine the impact of follow-up length and the use of different control/comparison groups, we did, however, also code multiple effect sizes for each program. The results of the two coding methods will be presented separately.
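The rule of keeping the longest at-risk period per program can be sketched as follows. This is only an illustration: the record layout and the field names (`program`, `follow_up_months`, `effect_size`) are hypothetical and do not come from the coding manual, and the example values are invented.

```python
def longest_follow_up(records):
    """Keep, for each program, the record with the longest follow-up period.

    `records` is a list of dicts with the hypothetical keys
    'program', 'follow_up_months' and 'effect_size'.
    """
    best = {}
    for rec in records:
        current = best.get(rec["program"])
        if current is None or rec["follow_up_months"] > current["follow_up_months"]:
            best[rec["program"]] = rec
    return best


# Invented example: program A reports two follow-up periods, program B one.
records = [
    {"program": "A", "follow_up_months": 12, "effect_size": 0.10},
    {"program": "A", "follow_up_months": 24, "effect_size": 0.05},
    {"program": "B", "follow_up_months": 6, "effect_size": 0.20},
]
selected = longest_follow_up(records)
```

Here `selected["A"]` holds the 24-month record, so each program contributes one effect size to the overall mean.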

Since a large proportion of programs accepted referrals along multiple entry points, we coded both the earliest and the latest entry points in the criminal justice system. This provided us with two methods of conducting analysis on the entry point of the program and its subsequent impact on each outcome.

While we did identify those programs that randomly assigned participants to treatment and control groups, it should be noted that the label of random assignment is somewhat misleading. Although participants are initially assigned to one group or the other, participation in restorative justice is by definition voluntary, so participants can choose to withdraw from a program. Consequently, the problem of self-selection bias, which random assignment strives to eliminate, remains, as the attrition rate in many studies was quite high.

To effectively compare victim and offender satisfaction between restorative and traditional approaches, it was necessary to create a binary satisfaction variable. This was achieved by coding positive measures of satisfaction as satisfied whereas neutral and negative responses were collapsed into an unsatisfied category. For example, if a study employed a five-point scale to measure satisfaction (i.e. very satisfied, somewhat satisfied, neutral, somewhat dissatisfied, very dissatisfied), we selected the top two categories as indicating satisfaction and considered the last three as unsatisfied.
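The collapsing rule described above can be illustrated with a short sketch. The response labels are taken from the five-point example in the text; actual wording would vary by study, and the function name is hypothetical.

```python
# Labels from the five-point example; only the top two count as satisfied.
SATISFIED_LABELS = {"very satisfied", "somewhat satisfied"}


def to_binary_satisfaction(response):
    """Return 1 for a positive response and 0 for a neutral or negative one."""
    return 1 if response.strip().lower() in SATISFIED_LABELS else 0


# Invented responses for illustration.
responses = ["Very satisfied", "Neutral", "Somewhat dissatisfied", "Somewhat satisfied"]
coded = [to_binary_satisfaction(r) for r in responses]
```

Neutral responses fall into the unsatisfied category, which makes the binary variable a conservative measure of satisfaction.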

In certain studies, the actual number of victims was not indicated but the study reported the percentage of satisfied versus unsatisfied victims. In these cases, we assumed the number of victims was equal to the number of offenders in order to calculate an effect size. In meta-analytic work, there is usually a trade-off between the comprehensiveness of the research and the precision of the coding techniques due to the reporting practices contained in most studies.
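As a rough sketch of this assumption, the counts needed for an effect size can be rebuilt from a reported percentage and an assumed group size. The function name and the example figures are hypothetical, and rounding to whole persons is an assumption of this sketch rather than a documented step.

```python
def counts_from_percentage(pct_satisfied, n_assumed):
    """Rebuild (satisfied, unsatisfied) counts from a reported percentage.

    `n_assumed` is the number of victims, taken here to equal the number
    of offenders when the study does not report it.
    """
    if not 0 <= pct_satisfied <= 100:
        raise ValueError("percentage must lie between 0 and 100")
    satisfied = round(n_assumed * pct_satisfied / 100)
    return satisfied, n_assumed - satisfied


# Invented example: 80 percent of victims satisfied, 50 offenders in the group.
satisfied, unsatisfied = counts_from_percentage(80.0, 50)
```

Because the group size is assumed, any estimate that weights by sample size inherits this uncertainty.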

To test the reliability of the coding procedures, a second individual coded six randomly selected studies containing a total of 15 effect sizes. The general rate of agreement between the coders ranged from 47 percent to 100 percent, with an overall rate of agreement of 91 percent. In cases of coder disagreement, both coders discussed the discrepancy until a consensus was reached and this decision was then entered as the final code. Those variables that fell below 80 percent agreement were not included in the analysis.
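A simple percent-agreement check of this kind might look like the sketch below. The variable names and codes are invented for illustration, and the text does not state whether agreement was computed in exactly this way.

```python
def percent_agreement(codes_a, codes_b):
    """Percentage of items on which two coders assigned the same code."""
    if len(codes_a) != len(codes_b) or not codes_a:
        raise ValueError("both coders must code the same non-empty set of items")
    matches = sum(a == b for a, b in zip(codes_a, codes_b))
    return 100.0 * matches / len(codes_a)


def retained_variables(coder_a, coder_b, threshold=80.0):
    """Return agreement rates for variables at or above the threshold."""
    rates = {var: percent_agreement(coder_a[var], coder_b[var]) for var in coder_a}
    return {var: rate for var, rate in rates.items() if rate >= threshold}


# Invented codes for two hypothetical variables across five effect sizes.
coder_a = {"entry_point": [1, 2, 2, 3, 1], "sample_age": ["adult", "youth", "youth", "adult", "youth"]}
coder_b = {"entry_point": [1, 2, 1, 1, 2], "sample_age": ["adult", "youth", "youth", "adult", "adult"]}
kept = retained_variables(coder_a, coder_b)
```

In this invented example `entry_point` reaches only 40 percent agreement and is dropped, while `sample_age` reaches 80 percent and is kept.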

3.3 Data Analysis: Effect Size Calculations

The relationship between participation in a restorative justice program and each of the four outcomes (victim satisfaction, offender satisfaction, restitution compliance and recidivism) was calculated from the raw statistics reported within each study. The phi coefficient (Pearson’s product-moment correlation r applied to dichotomous data) was used as the effect size estimate. If the necessary data were not contained in an individual study, but a non-significant relationship between participation in a restorative justice program and the outcome was reported, the effect size was recorded as zero.
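For a 2x2 table with cell counts a, b, c and d, the phi coefficient is (ad - bc) divided by the square root of the product of the four marginal totals. The sketch below applies that standard formula. The table layout (rows for group, columns for outcome), the function names and the example counts are assumptions made for illustration; the text does not specify them.

```python
from math import sqrt


def phi_coefficient(a, b, c, d):
    """Phi for a 2x2 table.

    Assumed layout:
        a = restorative, favourable      b = restorative, unfavourable
        c = comparison, favourable       d = comparison, unfavourable
    """
    denominator = sqrt((a + b) * (c + d) * (a + c) * (b + d))
    if denominator == 0:
        raise ValueError("phi is undefined when a row or column total is zero")
    return (a * d - b * c) / denominator


def code_effect_size(table=None, reported_nonsignificant=False):
    """Apply the coding rule: compute phi, or record zero for a
    non-significant finding reported without usable data."""
    if table is not None:
        return phi_coefficient(*table)
    if reported_nonsignificant:
        return 0.0
    return None  # the study cannot be coded


# Invented counts: 40 of 50 favourable in the program, 30 of 50 in the comparison group.
example_phi = phi_coefficient(40, 10, 30, 20)
```

With this layout a positive phi indicates a more favourable outcome in the restorative group.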

Once the effect sizes from each of the studies were calculated, we conducted a series of analyses across each of our four outcome measures of interest. First, the overall mean effect size, along with the corresponding confidence intervals and standard deviation (SD), was calculated. It should be noted that both the weighted and unweighted mean effect sizes were calculated but only the unweighted estimates were used in interpreting the results and in the moderator analyses listed below. This was done because, as stated previously, we had to estimate the actual number of victims, thus reducing the reliability of the weighted estimates. Furthermore, the weighted mean effect sizes were only marginally lower or higher than the unweighted effect sizes and would not have made a significant difference to the results of the analysis.
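The summary statistics named above can be sketched as follows. Several details are assumptions of this sketch and are not stated in the text: the weighted mean is weighted by sample size, the confidence interval uses a normal approximation (critical value 1.96), and the effect sizes and sample sizes are invented.

```python
from math import sqrt
from statistics import mean, stdev


def summarize_effect_sizes(effects, sample_sizes, critical_value=1.96):
    """Unweighted mean, SD, approximate 95% CI and sample-size-weighted mean."""
    if len(effects) != len(sample_sizes):
        raise ValueError("each effect size needs a sample size")
    unweighted = mean(effects)
    sd = stdev(effects)  # sample standard deviation (n - 1 denominator)
    se = sd / sqrt(len(effects))
    weighted = sum(n * r for n, r in zip(sample_sizes, effects)) / sum(sample_sizes)
    return {
        "unweighted_mean": unweighted,
        "sd": sd,
        "ci": (unweighted - critical_value * se, unweighted + critical_value * se),
        "weighted_mean": weighted,
    }


# Invented effect sizes and sample sizes for three studies.
summary = summarize_effect_sizes([0.10, 0.20, 0.30], [100, 200, 100])
```

Comparing `unweighted_mean` with `weighted_mean` shows how much the estimated sample sizes influence the result.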

We also determined whether the overall difference between the restorative programs and the non-restorative control/comparison groups was statistically significant by conducting a one-sample t-test. This determines if the mean effect size is significantly different from zero (a zero effect size would indicate that participation in restorative justice had no effect on the subsequent outcomes). Additional analyses were conducted to explore whether certain variables, such as demographic or study characteristics, had a moderating impact on effect size magnitude. For example, if adequate information was available, we explored whether the age of the study sample (adult versus youth) had a significant effect on program outcome. This provided us with a mechanism whereby specific program impacts could be isolated for further study.
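The one-sample t-test against zero reduces to dividing the mean effect size by its standard error, with k - 1 degrees of freedom for k effect sizes. The sketch below computes the statistic and compares it with a two-tailed critical value supplied by the caller. The function name and the effect sizes are invented; the critical value 4.303 is the standard tabled value for 2 degrees of freedom at the .05 level.

```python
from math import sqrt
from statistics import mean, stdev


def one_sample_t(effects, critical_value):
    """Test whether the mean effect size differs from zero.

    Returns (t statistic, degrees of freedom, significant?), where
    `critical_value` is the two-tailed critical t for the chosen alpha.
    """
    k = len(effects)
    sd = stdev(effects)  # raises StatisticsError when k < 2
    if sd == 0:
        raise ValueError("t is undefined when all effect sizes are identical")
    t_stat = mean(effects) / (sd / sqrt(k))
    return t_stat, k - 1, abs(t_stat) > critical_value


# Invented effect sizes from three studies; 4.303 is the critical t for df = 2.
t_stat, df, significant = one_sample_t([0.10, 0.20, 0.30], critical_value=4.303)
```

With only three invented effect sizes the mean of 0.20 does not reach significance, which illustrates how the test depends on the number of studies as well as on the size of the effect.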