The Effectiveness of School-Based Violence Prevention Programs for Reducing Disruptive and Aggressive Behavior: A Meta-analysis
by Sandra WILSON, Mark W. LIPSEY, Center for Evaluation Research and Methodology, Institute for Public Policy Studies, Vanderbilt University

Theme : International Journal on Violence and School, n°1, May 2006

The issue addressed in this paper was the effectiveness of school-based programs for preventing or reducing aggressive and disruptive behavior. The results of 219 experimental and quasi-experimental studies of school-based psychosocial programs were synthesized using meta-analysis techniques. We divided the school violence prevention programs into four groups that represented distinct program formats: universal, selected/indicated, special classes/schools, and comprehensive. Overall, the universal, selected/indicated, and comprehensive programs were generally effective at reducing the more common types of aggressive behavior seen in schools, including fighting, name-calling, intimidation, and other negative interpersonal behaviors, especially among higher risk students. Overall, the special programs did not significantly impact aggressive and disruptive behavior, though some special programs were effective.

Extremely violent events in schools draw our attention to programs designed to prevent and reduce school violence. Highly publicized school shootings, though tragic, are thankfully rare. In fact, fewer than 1% of the more than 2,000 homicides of American children in 1999-2000 happened at school (DeVoe et al., 2004). Further, over the past 10 years, victimization of school-aged children has declined both at and away from school in the United States. Nevertheless, some forms of antisocial behavior are common in schools. For example, according to principal reports from the 1999-2000 school year, 71% of public schools experienced a violent crime and over half took serious disciplinary action for some children (DeVoe et al., 2004).

Teachers and students also report negative behaviors in schools. According to Gottfredson et al. (2000), American teachers report that student misconduct is common and interferes with their efforts to teach. Among middle school students surveyed, 22% reported being threatened and 41% reported hitting or threatening to hit other students. Fewer high school students report such incidents (16% report being threatened and 32% report hitting or threatening to hit other students), but the numbers are not trivial. Minor behavior problems appear to be relatively common in schools. In addition, though serious violence is rare, some schools can be dangerous places for children.

There are a variety of approaches to preventing school violence and maintaining safe schools, including surveillance (e.g., metal detectors, security guards), deterrence (e.g., rules, regulations, zero tolerance policies), and psychosocial intervention. This research focuses on summarizing the effectiveness of programs of the last sort. Because extreme violence is rare in schools, the psychosocial programs that are the subject of this paper typically focus on more common forms of aggressive behavior such as fighting, bullying, verbal conflict, and disruptive behavior. These behaviors, even when not overtly violent, may inhibit learning and create interpersonal problems for those involved. In addition, minor forms of aggressive behavior can escalate (Garofalo, Siegel, & Laub, 1987) and schools that do not effectively counteract this progression may create an environment in which violence is normatively acceptable (Goldstein, Harootunian, & Conoley, 1994). Thus, it is appropriate for schools to attempt to reduce behaviors such as fighting, name-calling, bullying, and general intimidation that can create a negative school climate, disrupt learning, and lead to more serious violence. This paper updates and expands the meta-analysis reported in Wilson, Lipsey, and Derzon (2003) on the effectiveness of school-based programs for preventing or reducing these forms of aggressive and disruptive behavior.

The work presented here uses meta-analysis to summarize the available primary research on the effectiveness of school-based violence prevention programs. Meta-analysis is a technique for recording and analyzing the statistical results of a collection of empirical research studies. The methodology for our meta-analysis is described briefly below.


Eligibility Criteria

Studies were selected for the meta-analysis based on a set of detailed criteria, summarized as follows:

    The study involved an evaluation of a school-based program for children attending any grade, pre-Kindergarten through 12th grade.
    The study presented intervention results on at least one outcome variable, measured on children, representing aggressive or disruptive behavior, or a risk factor for aggressive behavior.
    The study used a control group design that compared students exposed to an identifiable intervention with students in a comparison condition on at least one qualifying outcome variable. Both randomized and non-randomized designs were considered eligible.
    The study provided post-treatment results sufficient for calculating a standardized mean difference effect size (defined below).

Any published or unpublished research produced since 1950 that met the above criteria was considered suitable for our meta-analysis.

Search and Retrieval of Studies

An attempt was made to identify and retrieve the entire population of studies of school violence prevention programs that met the eligibility criteria specified above. The primary source was a comprehensive search of bibliographic databases, including Psychological Abstracts, Dissertation Abstracts International, ERIC (Educational Resources Information Center), U.S. Government Printing Office publications, National Criminal Justice Reference Service, and MedLine. Second, the bibliographies of previous meta-analyses and literature reviews (e.g., Durlak, 1995; 1997; Lösel & Beelmann, 2003; Wilson, Gottfredson, & Najaka, 2001), and the tables of contents of relevant journals were reviewed for eligible studies. Finally, the bibliographies of retrieved studies were themselves examined for candidate studies. Identified studies were retrieved from the university library, obtained via interlibrary loan, or requested directly from the author. We obtained and screened more than 95% of the reports identified as potentially eligible through these sources.

Coding of Study Reports

Effect Size Coding. Central to meta-analysis is the effect size statistic, which represents the quantitative findings of each study in a standardized way that permits comparison across studies. Study findings were coded to represent the mean difference on the outcome measures between the experimental conditions at the posttest measurement. The effect size statistic used for these purposes was the standardized mean difference (Cohen, 1988; Lipsey & Wilson, 2001), defined as the difference between the treatment and control group means on an outcome variable divided by their pooled standard deviation. This effect size statistic indexes the outcomes for the treatment group relative to the control group in standard deviation units.
Study Descriptor Coding. Studies of school-based violence prevention programs involve youth from a variety of different age groups, ethnicities, risk levels, and so forth, and use different procedures and methods to evaluate program effectiveness. Of course, the interventions themselves also differ across studies. Thus, in addition to the effect sizes, our meta-analytic database includes detailed information about each study's methods, subjects, treatments, and other such descriptive characteristics that may be relevant for understanding the results. The coded study information allows us to investigate relationships between various characteristics of the studies and the effect sizes those studies produce. In addition, statistical techniques can be used to control for many of the sources of error and natural differences between studies so that better estimates of the effects of intervention can be derived (Cooper & Hedges, 1994; Hedges & Olkin, 1985). The items coded to describe study methods and procedures included details of the design, measures, and attrition. Those coded to describe the subject samples included age, gender, ethnicity, and risk for antisocial behavior. The intervention was described by coding the type of program, duration, intensity, setting, and format of the program, delivery personnel, and other such characteristics.
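
As a minimal sketch of the standardized mean difference defined under Effect Size Coding, the following computes the difference between treatment and control group means divided by their pooled standard deviation. The function names and the example scores are hypothetical, and the sign convention (a positive value favors the treatment group) is an assumption made for illustration.

```python
import math

def pooled_sd(sd_t, n_t, sd_c, n_c):
    """Pooled standard deviation of the treatment and control groups."""
    numerator = (n_t - 1) * sd_t ** 2 + (n_c - 1) * sd_c ** 2
    return math.sqrt(numerator / (n_t + n_c - 2))

def standardized_mean_difference(mean_t, sd_t, n_t, mean_c, sd_c, n_c):
    """Difference between group means in pooled standard deviation units."""
    return (mean_t - mean_c) / pooled_sd(sd_t, n_t, sd_c, n_c)

# Hypothetical posttest scores on a scale where higher means better behavior.
d = standardized_mean_difference(52.0, 10.0, 40, 50.0, 10.0, 40)
print(round(d, 2))
```

A value of 0.2 here would mean the treatment group scored two-tenths of a standard deviation better than the control group at posttest.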


Eligible studies were coded by trained research assistants familiar with social science research. Coding was done directly into a computer database using pre-formatted computer screens and supported by detailed computerized and paper coding manuals. The coding was reviewed by the first author and disagreements or questions were resolved through discussion with the coding team. In addition, to assess coder reliability, approximately ten percent of the studies were selected at random and recoded by a different coder. The reliability of the coding for the study descriptors was generally high. For categorical items, intercoder agreement ranged from 73% to 100%. For continuous items, the intercoder correlations ranged from .76 to .99. A copy of the full coding protocol is available from the authors.
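
The two reliability indices mentioned above can be illustrated with a short sketch. The text does not say which correlation coefficient was used for continuous items, so the Pearson correlation here is an assumption; the item names and coded values are hypothetical.

```python
import math

def percent_agreement(codes_a, codes_b):
    """Percentage of items on which two coders assigned the same category."""
    matches = sum(a == b for a, b in zip(codes_a, codes_b))
    return 100.0 * matches / len(codes_a)

def pearson_r(x, y):
    """Pearson correlation between two coders' values on a continuous item."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical double-coded items for four studies.
coder_1_design = ["random", "random", "matched", "nonrandom"]
coder_2_design = ["random", "random", "matched", "random"]
agreement = percent_agreement(coder_1_design, coder_2_design)

coder_1_weeks = [10, 12, 20, 36]
coder_2_weeks = [10, 12, 22, 36]
r = pearson_r(coder_1_weeks, coder_2_weeks)
print(agreement, round(r, 3))
```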


General Analytic Procedures


Effect sizes based on small samples are known to be biased; to adjust for this, all effect sizes were multiplied by the small sample correction factor, 1 - 3/(4n - 9), where n is the sample size for the study (Lipsey & Wilson, 2001). Also, each effect size was weighted by its inverse variance in all computations so that its contribution was proportionate to its reliability (Hedges & Olkin, 1985). Examination of the effect size distribution identified a small number of outliers with potential to distort the analysis; these were recoded (i.e., Winsorized) to less extreme values (Hedges & Olkin, 1985; Lipsey & Wilson, 2001). In addition, several studies used unusually large samples. Because the inverse variance weights chiefly reflect sample size, those few studies would dominate any analysis in which they were included. Therefore, the extreme tail of the sample size distribution was recoded to a maximum of 250 subjects per intervention or control group for the computation of weights. These adjustments to outliers allow us to retain them in the analysis with high-end values, but make those values less extreme so that they do not exercise highly disproportionate influence on the analysis results.
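
The adjustments described above can be sketched as follows. The variance formula is the standard one for a standardized mean difference given in the cited sources and is assumed, not confirmed, to be the one used here. The Winsorizing cut points are hypothetical, since the text does not report them; only the cap of 250 subjects per group comes from the text.

```python
def small_sample_correction(d, n_total):
    """Multiply an effect size by 1 - 3/(4n - 9), with n the study sample size."""
    return d * (1 - 3 / (4 * n_total - 9))

def inverse_variance_weight(d, n_t, n_c, cap=250):
    """Inverse variance weight, with each group size capped for weighting only."""
    n_t, n_c = min(n_t, cap), min(n_c, cap)
    variance = (n_t + n_c) / (n_t * n_c) + d ** 2 / (2 * (n_t + n_c))
    return 1 / variance

def winsorize(values, low, high):
    """Recode values beyond the cut points to the cut points themselves."""
    return [min(max(v, low), high) for v in values]

g = small_sample_correction(0.5, 20)             # small study: noticeable shrinkage
w = inverse_variance_weight(0.0, 1000, 1000)     # weighted as if 250 per group
trimmed = winsorize([-1.5, 0.2, 3.0], -1.0, 1.5) # hypothetical cut points
print(round(g, 3), w, trimmed)
```

Capping the sample size only in the weight computation keeps a very large study in the analysis while limiting how much it can dominate the weighted results.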


To create sets of independent effect size estimates for analysis, only one effect size from each subject sample was used in any analysis. When more than one was available, the effect size from the measurement source (or informant) most frequently represented across all studies (e.g., teachers' reports, self-reports) was selected. Because we wanted to retain informant as a variable for analysis, we elected not to average across effect sizes from different informants when more than one was reported. If there was more than one effect size from the same informant or source, however, their mean value was used.
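
One way to read the selection rule above is sketched below. The record layout, sample identifiers, and values are hypothetical, and the handling of ties between equally common informants (the first one encountered wins) is an assumption the text does not address.

```python
from collections import defaultdict

def independent_effect_sizes(records):
    """Keep one effect size per subject sample.

    records: list of (sample_id, informant, effect_size) tuples.
    Returns {sample_id: (informant, effect_size)}.
    """
    # Count how many samples each informant appears in across all studies.
    samples_by_informant = defaultdict(set)
    for sample_id, informant, _ in records:
        samples_by_informant[informant].add(sample_id)
    frequency = {inf: len(ids) for inf, ids in samples_by_informant.items()}

    # Group effect sizes by sample and informant.
    grouped = defaultdict(lambda: defaultdict(list))
    for sample_id, informant, es in records:
        grouped[sample_id][informant].append(es)

    selected = {}
    for sample_id, informants in grouped.items():
        # Choose the informant most frequently represented overall,
        # then average that informant's effect sizes within the sample.
        chosen = max(informants, key=lambda inf: frequency[inf])
        values = informants[chosen]
        selected[sample_id] = (chosen, sum(values) / len(values))
    return selected

records = [
    ("A", "teacher", 0.30), ("A", "teacher", 0.10), ("A", "self", 0.50),
    ("B", "teacher", 0.20), ("C", "self", 0.40), ("D", "teacher", 0.25),
]
selected = independent_effect_sizes(records)
print(selected["A"], selected["C"])
```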


Finally, many studies provided data sufficient for calculating mean difference effect sizes on the outcome variables at the pretest. In cases where pretest effect sizes were available, we adjusted the posttest effect sizes for pretest differences by subtracting the pretest value from the posttest value. In the regression models presented below, we tested whether there were systematic differences between effect sizes that were adjusted and those that were not by including dummy codes for adjustment in the regression models.


Our analysis of the effect sizes had several stages. For any given set of effect sizes, we first tested the homogeneity of the effect size distribution using the Q-statistic (Hedges & Olkin, 1985). When the Q-statistic showed significant variability in the effect sizes, moderator analyses were performed to identify the characteristics of the most effective violence prevention programs using weighted mixed effect multiple regression. The aggressive and disruptive behavior outcomes were our primary dependent variable.
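
A minimal sketch of the homogeneity test follows. Q is the weighted sum of squared deviations of the effect sizes from their weighted mean and is compared with a chi-square distribution on k - 1 degrees of freedom. To stay within the standard library, the p-value below uses the Wilson-Hilferty normal approximation to the chi-square distribution; that approximation is a convenience chosen for this illustration, not a procedure described in the text.

```python
import math

def q_statistic(effect_sizes, weights):
    """Return the homogeneity statistic Q and its degrees of freedom."""
    total = sum(weights)
    mean = sum(w * d for w, d in zip(weights, effect_sizes)) / total
    q = sum(w * (d - mean) ** 2 for w, d in zip(weights, effect_sizes))
    return q, len(effect_sizes) - 1

def chi_square_p_value(q, df):
    """Approximate upper-tail p-value (Wilson-Hilferty approximation)."""
    spread = 2 / (9 * df)
    z = ((q / df) ** (1 / 3) - (1 - spread)) / math.sqrt(spread)
    return 0.5 * math.erfc(z / math.sqrt(2))

# Hypothetical effect sizes with equal inverse variance weights.
q, df = q_statistic([0.1, 0.3, 0.5], [10, 10, 10])

# The Q of 87 on 60 degrees of freedom reported for the universal programs.
p = chi_square_p_value(87, 60)
print(round(q, 2), df, p < 0.05)
```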



General Effects of School Violence Prevention Programs


The literature retrieval and coding process yielded data from 372 school-based studies. The research studies included in this meta-analysis examined program effects on many different outcomes, ranging from aggression and violence to social skills, academic performance, and self-esteem. This report will focus primarily on the outcomes most relevant to school violence prevention, namely aggressive and disruptive behavior. However, we will first present the overall mean effect sizes and confidence intervals for each outcome category.

We have grouped each of the outcome measures into one of fourteen construct categories, described below:

    Aggressive and disruptive behavior: involves a variety of negative interpersonal behaviors including fighting, hitting, bullying, disruptiveness, acting out and the like.
    Problem behavior: measures that include both internalizing and externalizing behaviors (e.g., Child Behavior Checklist total score; Achenbach, 1991).
    Anger, hostility, rebelliousness, and the like.
    Activity level, attention deficit problems, ADD, ADHD.
    Antisocial peers: measures of whether the target child has antisocial peers or is supportive of peers' antisocial behavior.
    Substance use: including alcohol, tobacco, marijuana, and other substances.
    Social skills: including communication skills, social problem solving, conflict resolution skills, and the like.
    Social adjustment: measures of how well children get along with their peers, i.e., Do they have friends? Are they popular? Are they well-liked or rejected? etc.
    School performance: includes achievement tests, grades, school progress, etc.
    School participation: includes tardiness, truancy, absences, and dropout.
    Personal adjustment: includes measures of self-esteem, self-concept, and other measures of general well-being.
    Students' knowledge and attitudes about problem behavior.
    Internalizing: includes measures of anxiety, depression, and the like.
    Family outcomes: includes a wide range of family functioning variables.

The weighted mean effect sizes and 95% confidence intervals for each of the fourteen construct categories are shown in Figure 1. Means whose confidence intervals lie entirely above the zero line are statistically significant, while those whose intervals cross this line are not significantly different from zero. The confidence intervals for all but three of the categories do not cross the zero line. In general, therefore, school violence programs have positive effects on a wide variety of outcomes, ranging from aggressive behavior to internalizing problems such as anxiety. The effects are particularly strong in the area of social skills, which is not surprising since the majority of programs specifically target aspects of social behavior. The means for substance use, antisocial peers, and family outcomes are not significantly different from zero, indicating that the programs in our database did not significantly impact these outcomes. These are the three smallest categories of outcomes and were not generally the primary target of any of the programs we reviewed.
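
The weighted means and confidence intervals described above can be illustrated with the usual inverse variance formulas, in which the standard error of the weighted mean is the square root of one over the sum of the weights. This is a fixed-effect sketch with hypothetical effect sizes and weights; whether the reported intervals also include a random-effects variance component is not stated here.

```python
import math

def weighted_mean_ci(effect_sizes, weights, z=1.96):
    """Inverse variance weighted mean with a 95% confidence interval."""
    total = sum(weights)
    mean = sum(w * d for w, d in zip(weights, effect_sizes)) / total
    se = math.sqrt(1 / total)
    return mean, mean - z * se, mean + z * se

# Hypothetical effect sizes and weights for one outcome category.
mean, lower, upper = weighted_mean_ci([0.10, 0.25, 0.30], [40, 25, 35])

# The mean is significant when the interval excludes zero.
significant = lower > 0 or upper < 0
print(round(mean, 4), round(lower, 4), round(upper, 4), significant)
```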

Figure 1. Weighted Means and Confidence Intervals for Each Outcome


General Characteristics of Studies Measuring Program Effects on Aggressive and Disruptive Behavior


Of the 372 school-based studies, 236 studies included outcomes related to aggressive and disruptive behavior. We excluded a small number of programs from that group (n=17) because they did not directly target aggressive behavior or behaviors closely related to aggression (e.g., social skills). These programs were either academically oriented or focused on a specific target population (e.g., children from divorced families). Although these programs may have important influences on aggressive behavior in school, these benefits are secondary to the programs' primary goals. In addition, most schools would not select programs of this type as their primary violence prevention strategy.


Therefore, this report focuses on 219 studies[i]. These studies generated over 600 posttest group comparison effect sizes on some form of aggressive or disruptive behavior and represent nearly 50,000 individual students. The general characteristics of these studies are shown in Table 1. Ninety percent of the studies were conducted in the United States and nearly 75% were conducted by researchers in the fields of psychology or education. Fewer than 20% of the studies were conducted prior to 1980 and most were published in peer-reviewed journals (60%), with the remainder reported in unpublished media such as dissertations, theses, conference papers, and technical reports.


The subject samples had a range of demographic characteristics. Most samples comprised a mix of boys and girls, but some all-boy samples (16%) and a few all-girl samples (6%) were also present. Minority children were well represented, with about a third of the study samples having primarily minority youth. However, an additional third of the studies did not report ethnicity information on their subject samples. All school ages were included, from preschool through high school; the average age was around 10. A range of risk levels was also present, from general population students to indicated students already exhibiting aggressive behavior. Among the general population studies, about a third were conducted in low socioeconomic, disadvantaged areas.


A wide range of program and methodological characteristics is evident from Table 1. Most notable among these is that most studies were conducted mainly for research purposes with high levels of researcher involvement (i.e., research and demonstration programs), that nearly two-thirds of the programs were less than 20 weeks in length, and that almost 40% suffered from implementation problems.

Table 1. Characteristics of the Studies in the Meta-Analysis

a Percentages may not add up to 100 because of rounding.

b It was often impossible to distinguish between a study with no attrition between pretest and posttest and a study that reported only the number of subjects available at posttest. Thus, although no attrition and unreported attrition are clearly different, they are, of necessity, combined in the same category.

Program Format and Treatment Modality


The collection of school-based violence prevention programs analyzed here represents a wide range of intervention programming and illustrates the variety of strategies available for school-based intervention. We have divided these programs into four groups based on the general format of the programs and (in some cases) by treatment modality within format. The different formats tend to differ on a number of methodological, participant, and intervention characteristics that make it unwise to combine them into a single analysis. For example, universally delivered programs tend to have different kinds of student subjects than studies of programs that focus on youth selected or indicated on the basis of various risk factors. Beyond these differences in subject samples, there were other important differences across studies, most notably in study procedure and method: nearly all of the studies of programs delivered under a selected/indicated approach used random assignment of individual subjects to produce treatment and comparison groups, while only a few studies of universal programs used individual randomization. Based on these differences in study and subject characteristics, and the possibility that these factors might be associated with study outcomes, we elected to separate the 219 studies into four separate format groups. This separation by format has the added benefit of allowing us to better identify important study and subject characteristics that are associated with positive outcomes for particular program formats. The four intervention formats are as follows:


    Universal programs: these programs are delivered in classroom settings to the entire classroom; children are generally not selected individually for treatment but receive treatment simply because they are students in a program classroom. However, schools are frequently selected because they are in low socioeconomic status and/or high crime neighborhoods. The children in these universal programs may be considered at risk by virtue of their socioeconomic background or neighborhood risk.
    Selected/Indicated programs: these programs are delivered to students who are selected especially to receive treatment by virtue of the presence of some risk factor, including disruptiveness, aggressive behavior, activity level, etc. Most of these programs are delivered to the selected children outside of their regular classrooms (and may use either group or individual formats), although some programs are delivered in the regular classrooms but are targeted for the selected children.
    Special schools or classes: these programs involve special schools or classrooms that (for the students involved) serve as a usual classroom or school. Children are placed in these special schools or classrooms because of some behavioral or school difficulty that is judged to warrant their placement outside of mainstream classrooms. The programs in this category include special education classrooms for behavior disordered children, alternative high schools, and schools within schools programs.
    Comprehensive/Multimodal programs: these programs generally involve multiple modalities and multiple formats, including both classroom-based and pull-out programs. They may also involve programs for parents and capacity building components for school administrators and teachers in addition to the programming provided for the students. The defining characteristic of these programs is that they include multiple treatment elements and formats.


The weighted mean effect sizes and 95% confidence intervals for each program format are shown in Figure 2. The confidence intervals for the selected/indicated, universal, and comprehensive programs are beyond the zero line, indicating that the means for these programs are statistically significant. In addition, the means for selected/indicated and universal programs are significantly different from the mean for comprehensive programs. Does this mean that selected/indicated or universal programs are "better" than comprehensive programs? We don't believe so. As mentioned above, there are important differences in study and subject characteristics across the different formats. Because of these differences, it makes little sense to compare outcomes across formats. For example, the selected/indicated programs are delivered to specially identified and selected children; it is not surprising that this format has the largest overall mean since the children in these programs likely have significant room for improvement. Although some comprehensive programs have components for such specially selected children, we would not expect the outcomes for a whole school to be similar to the kinds of outcomes that might be expected with a targeted program on a specialized sample.

Figure 2. Weighted Mean Effect Sizes and Confidence Intervals by Program Format

Results for Universal Programs

There were sixty-one universal programs in the database. All of these were delivered in classroom settings to entire classes of students[ii]. The overall weighted mean posttest effect size on aggressive behavior outcomes for the universal programs was .18. This was significantly different from zero. Four treatment modalities were used with the universal format, as shown in Table 2.

Table 2. Treatment Modalities for Universal Format Programs

General Moderators of Observed Effects on Aggressive and Disruptive Behavior for Universal Programs  

Tests of the homogeneity of the effect sizes using the Q-statistic (Hedges & Olkin, 1985) showed significant variability in outcome across the 61 universal programs (Q = 87, df = 60). That is, some studies produced effect sizes that were larger than the corresponding mean across studies while others produced effects that were smaller. This variation was expected to be associated with the nature of the interventions, subjects, and methods in the studies of universal programs. Our next step, therefore, was to identify the study characteristics most strongly associated with effect size. An inverse variance weighted multiple regression analysis was conducted using mixed effects models (Raudenbush, 1994). The dependent variable in these analyses was the effect size for aggressive and disruptive behavior.
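
In a mixed effects analysis, each study's weight reflects both its sampling variance and an estimate of the remaining between-study variance. The sketch below shows a simplified, intercept-only version using a method-of-moments estimate of that variance component. It is an illustration under stated assumptions: the models reported here include moderator variables, the estimator actually used is not specified in the text, and the effect sizes and variances are hypothetical.

```python
def method_of_moments_tau2(effect_sizes, variances):
    """Method-of-moments estimate of the between-study variance component."""
    weights = [1 / v for v in variances]
    total = sum(weights)
    mean = sum(w * d for w, d in zip(weights, effect_sizes)) / total
    q = sum(w * (d - mean) ** 2 for w, d in zip(weights, effect_sizes))
    df = len(effect_sizes) - 1
    c = total - sum(w ** 2 for w in weights) / total
    # Negative estimates are truncated to zero.
    return max(0.0, (q - df) / c)

def mixed_effects_weights(variances, tau2):
    """Weights that add the variance component to each sampling variance."""
    return [1 / (v + tau2) for v in variances]

# Hypothetical effect sizes with equal sampling variances.
effect_sizes = [0.0, 0.3, 0.6]
variances = [0.01, 0.01, 0.01]
tau2 = method_of_moments_tau2(effect_sizes, variances)
new_weights = mixed_effects_weights(variances, tau2)
print(round(tau2, 3), [round(w, 2) for w in new_weights])
```

Adding the variance component shrinks every weight and narrows the gap between large and small studies, which is why mixed effects results are typically more conservative than fixed-effect results.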

The results of the weighted regression analysis for the universal programs are shown in Table 3. The model was statistically significant and left a non-significant residual; together, the variables in the model account for the significant variability in the study effects. Note that not all variables retained in the model were statistically significant. We elected to retain some important variables, though they were non-significant, to make their weak relationships explicit. Among the variables relating to study method and procedure, two variables representing study attrition and the form of measurement were retained in the model, although attrition was not significant. Archival and observational measures both produced larger effects than measures reported by teachers or by the subjects themselves. This finding illustrates that different types of measurement instruments can produce different results and suggests that the common self- and teacher-reported measures of aggressive behavior (e.g., Child Behavior Checklist) may not be sensitive to changes induced by typical universal violence prevention programs. Since paper-and-pencil teacher surveys are used so frequently in this kind of research, their link to actual classroom behavior should be established clearly.

Table 3. Weighted Mixed Effects Multiple Regression: Universal Programs

Of the three subject characteristics included in the model, only two were significant – age and the socioeconomic status by age interaction. Overall, programs delivered to groups of younger children produced larger effects than those for older children. The interaction effect describes even larger effects for younger, low socioeconomic status children. The significant interaction suggests that universal programs may be particularly beneficial for elementary schools in troubled areas. Note, however, that there are only three universal programs for high school children, so it is difficult to evaluate the effects of universal programs on the full spectrum of ages.


A variable associated with the operation and delivery of the programs was also retained in the model, though it was not significant. This variable identifies programs as routine practice programs versus programs delivered primarily by researchers for research or demonstration purposes. With the other variables controlled in the model, routine practice programs were not significantly different from researcher-involved programs.


Three attributes of the programs themselves were retained in the model. Two of these variables relate to the strength or dose of the intervention. Implementation quality was not significant in the model, but it was retained because we thought it important to make this finding explicit. The other dosage-related variable in the model (treatment duration) was significant and indicated that longer programs were associated with smaller effects. Although the longer programs tended to have more implementation difficulties, the lower effectiveness of the longer programs cannot be explained completely by implementation failures because the implementation variable is controlled in the model. It is possible that when programs are delivered over a longer period of time, children receive less intense treatment contact, resulting in reduced program effectiveness.


The final program attribute in the model is a code for the cognitively-oriented programs (the most common in this format category). Though the effect is negative, it is not significant. In general, this suggests that when other variables are controlled, there are no significant differences in effectiveness across the different treatment modalities used in universal programs. Note that we also tested the other treatment modalities and none were significant.


Though implementation quality was not a significant contributor to the regression model, Figure 3 shows that the relationship was in the expected direction. For the figure, we have converted the effect sizes to percentage estimates using Cohen's (1988) arcsine transformation. Using the Youth Risk Behavior Survey (Centers for Disease Control and Prevention, 2002), we estimated that approximately 15% of students will be in fights at school. Assuming then that about 15% of students who do not receive violence prevention programming will get into a physical fight at school, the figure presents the estimated percentage of fighting for well-implemented versus poorly-implemented programs. The overall pattern is as expected, with smaller program effects for programs that had difficulties with implementation. For each treatment modality for which a comparison can be made, the average effect size for the programs without implementation difficulties is greater than that for similar programs with implementation problems.
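
One plausible reading of the conversion described above is sketched here. It assumes that the effect size is treated as Cohen's h, the difference between arcsine-transformed proportions, and that a positive effect size corresponds to less fighting in the treated group. Both are assumptions made for illustration; the exact procedure is not spelled out in the text.

```python
import math

def treated_rate(control_rate, effect_size):
    """Estimated rate in the treated group, given a control rate and an
    effect size interpreted as Cohen's h = phi(control) - phi(treated),
    where phi(p) = 2 * asin(sqrt(p))."""
    phi_control = 2 * math.asin(math.sqrt(control_rate))
    phi_treated = phi_control - effect_size
    # Keep the transformed value inside the valid range [0, pi].
    phi_treated = min(max(phi_treated, 0.0), math.pi)
    return math.sin(phi_treated / 2) ** 2

# 15% base rate of fighting and the overall universal program mean of .18.
rate = treated_rate(0.15, 0.18)
print(f"{100 * rate:.1f}% estimated fighting with the program")
```

Under these assumptions, an effect size of .18 applied to a 15% base rate corresponds to an estimated fighting rate of roughly 9% among students who receive the program.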

Figure 3. Estimated Fighting Percentages by Treatment Modality and Implementation Quality

In the original article (Wilson, Lipsey, & Derzon, 2003), we did not separate the program formats (i.e., universal, selected/indicated, comprehensive, special). We did, however, present the results for the routine practice programs separately from the results of the research and demonstration programs. While both routine programs and implementation quality were important in our original analysis, they were notably not significant in the current analysis of the universal programs. This has mainly to do with our separation of the program formats. In the case of the universal programs, there were only seven routine practice programs. The weighted mean for these seven routine programs was smaller than the mean for the research and demonstration programs, but with only seven programs, the difference was not significant. The implementation quality variable also trends in the expected direction, but was not significant for the universal programs. As we shall see below, implementation quality is more important with some of the other program formats. It thus appears that the children's age and socioeconomic status were the stronger influences on the outcomes for the universal programs.


Results for Selected/Indicated Programs


There were 103 programs of this format, distinguished by their selective targeting of interventions to individually selected children. Nearly all of these programs were delivered outside of the classroom to small groups or to individual students. The overall weighted mean effect size for the selected/indicated programs was .29 and was significantly different from zero. Five treatment modalities were identified. Three of the five modalities, social skills training, counseling, and cognitively-oriented programs, had generally the same features as those used in the universal format. The behavioral programs, however, were typically implemented with small groups of children or with individual children rather than on a class-wide basis. The peer mediation programs were typical; the subjects in studies of mediation programs were those students who experienced an interpersonal conflict and received mediation services from their peers[iii].

Table 4. Treatment Modalities for Selected/Indicated Program Formats

Tests of the homogeneity of effect sizes for the selected/indicated programs showed significant variability across studies (Q = 157, df = 102). As with the universal programs, this variability was expected to be associated with methodological and substantive characteristics of the studies of selected/indicated programs. Our analysis for these programs proceeded much the same as the analysis for universal programs. An inverse variance weighted multiple regression analysis was fit to the data using mixed effects models (Raudenbush, 1994). The dependent variable, as above, was the effect size for aggressive and disruptive behavior. We have retained some nonsignificant method and study variables in the model to make explicit their weak relationship with effect size. The results of that analysis are shown in Table 5 below.
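
The homogeneity test and inverse variance weighting described above can be illustrated with a short Python sketch. It is not the authors' code: the effect sizes and variances are made-up numbers, the function names are hypothetical, and the DerSimonian-Laird moment estimator for the between-study variance is an assumption, since the text does not state which estimator was used in the mixed effects models.

```python
def weighted_mean_and_q(effects, variances):
    """Return the inverse variance weighted mean, the homogeneity Q, and its df."""
    weights = [1.0 / v for v in variances]
    total_w = sum(weights)
    mean = sum(w * es for w, es in zip(weights, effects)) / total_w
    # Q sums the weighted squared deviations of each effect size from the mean.
    q = sum(w * (es - mean) ** 2 for w, es in zip(weights, effects))
    return mean, q, len(effects) - 1


def between_study_variance(effects, variances):
    """Moment estimate of the random effects variance (assumed estimator)."""
    weights = [1.0 / v for v in variances]
    _, q, df = weighted_mean_and_q(effects, variances)
    c = sum(weights) - sum(w * w for w in weights) / sum(weights)
    return max(0.0, (q - df) / c)


def random_effects_mean(effects, variances):
    """Weighted mean after adding the between-study variance to each study."""
    tau2 = between_study_variance(effects, variances)
    adjusted = [v + tau2 for v in variances]
    mean, _, _ = weighted_mean_and_q(effects, adjusted)
    return mean


# Hypothetical data: three studies with equal sampling variances.
effects = [0.0, 0.3, 0.6]
variances = [0.04, 0.04, 0.04]

mean, q, df = weighted_mean_and_q(effects, variances)
tau2 = between_study_variance(effects, variances)
print(f"mean ES = {mean:.2f}, Q = {q:.2f}, df = {df}, tau^2 = {tau2:.3f}")
```

Q is referred to a chi-square distribution with k - 1 degrees of freedom, where k is the number of studies. A value well above its degrees of freedom indicates more variability among effect sizes than sampling error alone would be expected to produce.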

Table 5. Weighted Mixed Effects Multiple Regression: Selected/Indicated Programs

The model includes twelve variables relating to study method, subject characteristics, implementation, service personnel, and treatment modality. Regarding study method, effect sizes that were adjusted for pretest differences between the treatment and comparison groups were smaller than effect sizes that were not adjusted. Studies that experienced more attrition had smaller program effects than those with less attrition. The method of assignment to treatment and comparison groups was not significant in the model, indicating that, with the other variables in the model held constant, experimental and quasi-experimental designs did not produce appreciably different results. Note that randomized designs made up the majority (70%) of programs in the selected/indicated format, while the universal format programs were evaluated predominantly with nonrandom designs.


Higher risk subjects achieved greater benefits from violence prevention programs than lower risk subjects. Note that overall, the students in these programs were generally at much higher risk than the students in universal programs. With the selected/indicated programs, a few lower risk children were involved, but the distinction here is mainly between students who are already exhibiting serious behavior problems and those who have risk factors that may lead to later problems. The age variable was marginally significant (and also positively correlated with subject risk level); programs with older students tended to show larger program effects than those with younger children.


Regarding the characteristics of treatment delivery, programs that did not experience implementation difficulties tended to produce larger effect sizes than programs that had problems with implementation. Beyond the implementation variable, delivery personnel and session format (one-on-one vs. group treatment) did not significantly influence program effects. Finally, the three dummy codes for cognitively-oriented programs, social skills programs, and counseling were not significant, indicating that differences in effectiveness across different treatment modalities were small and nonsignificant.


The finding here that implementation quality is important for the selected/indicated programs parallels the similar finding in our original publication. Also borne out in this update is the influence of subject risk level on program outcomes. However, there were no differences between routine practice and the research/demonstration programs in the present analysis. Overall, the mean for routine programs is smaller than that for research/demonstration programs, but the outcomes for the few routine practice programs we have are quite variable and spread over the four formats. By separating the studies into the four format categories, we have created more circumscribed sets of programs, but we cannot readily examine differences between routine practice and research/demonstration programs.

Results for Comprehensive or Multimodal Programs

There were 17 comprehensive school violence programs in our database, distinguished by their multiple treatment components and formats. The average number of distinct treatment components for comprehensive programs was four, whereas the universal and selected/indicated programs typically have one treatment component (and occasionally two or three). The studies of comprehensive programs tended to involve larger samples of students. In addition, comprehensive programs were generally longer than the universal and selected/indicated programs. The modal program covered an entire school year and almost half of the programs were longer than one year. In contrast, the average program length for universal and selected/indicated programs was about 20 weeks.


The overall mean effect size for comprehensive programs was significant, though small (mean ES = .06). The homogeneity statistic showed that there was significant heterogeneity in the set of 17 comprehensive programs, indicating that some programs produced larger effect sizes than others (Q = 34, df = 16). Although the overall mean effect size for the comprehensive programs was small at .06, most of the individual program effect sizes were greater than zero. Identifying the characteristics that are associated with the most effective programs can help practitioners select appropriate comprehensive programming and provide insights into what aspects of program delivery might be important to emphasize.


Our analysis of the comprehensive programs proceeded similarly to the analyses of heterogeneity for the universal and selected/indicated formats. Weighted multiple regression analysis was conducted to identify influential method, subject, and/or treatment characteristics. In this case, however, fixed effects models were used because of the small number of studies involved. The results of that analysis are shown in Table 6.

Table 6. Weighted Fixed Effects Multiple Regression: Comprehensive Programs

Two variables relating to intervention dosage were retained in the model for comprehensive programs, treatment duration and treatment frequency. The total duration of the program in weeks was not significantly associated with effect size. However, the programs that had more sessions per week produced significantly greater reductions in antisocial behavior than programs with fewer sessions. Because the comprehensive programs have multiple treatment components and are (overall) longer than the other program formats, it seems that a critical factor is having the program elements delivered on a more frequent basis to the students.
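
As a rough illustration of how a fixed effects weighted regression relates a dosage variable to effect size, the sketch below fits a single predictor by inverse variance weighted least squares. It is a simplified sketch under stated assumptions: the model reported in Table 6 has several predictors, whereas this one has only one, and the data, the variable `sessions_per_week`, and the function name are all hypothetical.

```python
import math


def fixed_effects_meta_regression(x, effects, variances):
    """Inverse variance weighted regression of effect size on one predictor.

    Returns (intercept, slope, se_slope). The standard error treats the
    sampling variances as known, as in fixed effects meta-regression.
    """
    weights = [1.0 / v for v in variances]
    total_w = sum(weights)
    x_bar = sum(w * xi for w, xi in zip(weights, x)) / total_w
    y_bar = sum(w * yi for w, yi in zip(weights, effects)) / total_w
    sxx = sum(w * (xi - x_bar) ** 2 for w, xi in zip(weights, x))
    sxy = sum(
        w * (xi - x_bar) * (yi - y_bar)
        for w, xi, yi in zip(weights, x, effects)
    )
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    se_slope = math.sqrt(1.0 / sxx)
    return intercept, slope, se_slope


# Hypothetical data: three programs differing in weekly session frequency.
sessions_per_week = [1, 2, 3]
effects = [0.0, 0.1, 0.2]
variances = [0.01, 0.01, 0.01]

intercept, slope, se_slope = fixed_effects_meta_regression(
    sessions_per_week, effects, variances
)
print(f"slope = {slope:.2f}, SE = {se_slope:.3f}, z = {slope / se_slope:.2f}")
```

A positive slope in such a model would correspond to larger effects for programs with more sessions per week. With only a handful of studies, as in the comprehensive format, estimates of this kind are imprecise.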


Two variables describing the subjects were retained in the model; both the age and risk variables were significantly associated with effect size. Younger children tended to achieve greater benefits from comprehensive programs than older children. Note that the direction of effect for the risk variable is negative. This suggests that lower risk subjects show larger treatment outcomes from comprehensive treatments than do the higher risk subjects, which is counter to what we found for the selected/indicated programs. However, we believe the risk and age variables are confounded in this analysis. All of the programs for higher risk children were delivered to samples whose average age was below 9, while nearly all of the lower risk samples were over the age of 9. Thus the larger effects for younger children might be just as easily explained by their higher risk status. With both variables in the model, the shared age-risk variance seems to be accounted for by the age variable. Examination of the effect size distribution shows that there are no obvious differences in outcome across age for the lower risk children. For the higher risk children, the studies with the youngest children achieved the largest outcomes. Unfortunately, there were no programs for higher risk subjects older than age 9 so understanding the age differences in outcome for the higher risk children is difficult.


Results for Special Schools or Classes


There were 37 programs delivered in special schools or classrooms. Programs in special schools or classes generally involved an academic curriculum plus programming that targeted social or aggressive behavior. The students in these programs typically had serious behavioral (and often academic) difficulties that resulted in their placement outside of mainstream classrooms. The weighted mean posttest effect size for the special programs was .07 and was not statistically significant.


Although the mean effect for the special programs was not significant overall, the Q-test was significant, indicating that the distribution of effect sizes was heterogeneous. Therefore, we performed a weighted mixed effects multiple regression analysis to identify critical study variables. The results of that analysis are shown in Table 7.

Table 7. Weighted Mixed Effects Multiple Regression: Special Classes and Schools

Three variables showed up as important moderators of effect size: method of group assignment, level of risk of students, and implementation quality. Nonrandom assignment methods resulted in larger effects. Among the four formats, the method of assignment was associated with effect size only for the special programs. Studies with implementation problems found smaller effects than studies in which programs were implemented well. Finally, programs with higher risk samples showed more positive results. The implementation variable is the largest in the model, indicating that fidelity to program components may be particularly critical for the special programs.


Summary and Conclusions


The issue addressed in this paper was the effectiveness of school-based programs for preventing or reducing aggressive and disruptive behavior. The results of 219 experimental and quasi-experimental studies of school-based psychosocial programs were synthesized using meta-analysis techniques. We divided the school violence prevention programs into four groups that represented distinct program formats: universal, selected/indicated, special classes/schools, and comprehensive. Overall, the universal, selected/indicated, and comprehensive programs were generally effective at reducing the more common types of aggressive behavior seen in schools, including fighting, name-calling, intimidation, and other negative interpersonal behaviors, especially among higher risk students. Overall, the special programs did not significantly impact aggressive and disruptive behavior, though some special programs were effective.


The mean effect size for selected/indicated programs was .29 for aggressive behavior. We can translate this into terms that are more concrete by converting it into typical levels of aggressive behavior in schools using the arcsine transformation (Cohen, 1988). According to the 1999 Youth Risk Behavior Survey, 14.2% of students reported being in a physical fight on school grounds in the year prior to the survey. For 1995 and 1997, 15.5% and 14.8% of students reported being in physical fights (Centers for Disease Control and Prevention, 2002). If we use these figures to estimate that about 15% of untreated school children will get into a fight during a school year, the overall effect size of .29 for selected/indicated programs translates into about a nine percentage point reduction in fighting. That is, if 15% of students who received no violence prevention programming were getting into fights before intervention, only about 6% of children in selected/indicated programs were getting into fights, less than half of the baseline rate. The most effective programs produced larger effects than this and, thus, would reduce rates of aggressive behavior even more. In addition, since many of the children in the selected/indicated programs were already exhibiting some problem behavior, it is likely that their baseline level of fighting behavior was higher than the general estimate of 15%. Thus, the reduction in aggressive behavior would be even greater.
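
The conversion described above can be reproduced with a few lines of Python. This is a minimal sketch that assumes the effect size is treated as a difference on Cohen's arcsine scale, where a proportion p is transformed to 2·arcsin(√p). The function names are hypothetical, and the exact procedure the authors followed may differ in detail.

```python
import math


def arcsine_transform(p):
    """Cohen's arcsine transformation of a proportion."""
    return 2.0 * math.asin(math.sqrt(p))


def treated_rate(baseline_rate, effect_size):
    """Rate implied for treated students, given a baseline rate and an
    effect size read as a difference on the arcsine scale."""
    phi = arcsine_transform(baseline_rate) - effect_size
    # Guard against effect sizes larger than the transformed baseline.
    phi = max(0.0, phi)
    return math.sin(phi / 2.0) ** 2


# 15% baseline rate of fighting and the .29 mean effect size from the text.
rate = treated_rate(0.15, 0.29)
print(f"{rate:.0%}")  # prints "6%"
```

Under these assumptions the sketch gives a treated rate of about 6%, in line with the figure reported above. The same function can be applied to other baseline rates, for example a baseline above 15% for children who are already exhibiting problem behavior.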


For the universal programs, our mean effect size of .18 translates into about a five percentage point reduction in fighting over the 15% baseline for untreated children (from 15% to 10%). Though this effect size is smaller than that for the selected/indicated programs, this effect is not trivial. And, as with the selected/indicated programs, the most effective universal programs produced effects larger than this.


Not all programs were equally successful in reducing aggressive behavior. Treatment dose (in the form of treatment duration, frequency, or implementation quality) was uniformly influential. Programs with no or few implementation difficulties tended to produce greater reductions in aggressive behavior. In addition, comprehensive programs with greater session frequency per week were more effective than programs with fewer sessions. For universal programs, shorter programs appeared to be more effective than longer programs. We hypothesize that longer universal programs may be less intense than shorter ones and thus may not have the salience to influence student behavior.


In general, larger treatment effects were achieved with higher risk students. For the universal programs, students from high poverty, disadvantaged neighborhoods achieved the greatest benefits from violence prevention programming; this was especially true for children in early elementary grades. For the selected/indicated programs and special programs whose students were generally already exhibiting potentially problematic behavior, programs delivered to youth with more serious problems tended to have larger treatment effects.


Overall, the different treatment modalities within the universal and selected/indicated formats (e.g., social skills training, cognitively-oriented programs, behavioral programs, counseling) were not significantly different from each other; that is, the modalities appeared to be equally effective at reducing aggressive behavior.



This research was supported by grants from the National Institute of Mental Health (Mark W. Lipsey), the William T. Grant Foundation (Mark W. Lipsey), and funding from the National Institute of Justice to the first author. A bibliography of studies used in the meta-analysis is available from the first author. Correspondence concerning this article should be addressed to: Sandra Jo Wilson, Center for Evaluation Research and Methodology, Vanderbilt Institute for Public Policy Studies, 1207 18th Ave. South, Nashville, TN 37212, (615) 343-7215.


[i] The original meta-analysis published in the Journal of Consulting and Clinical Psychology included 221 studies, but only 172 of these involved treatment-control designs; thus the sample of studies presented here includes an additional 47 studies.


[ii] There were three universal programs that were delivered to entire classrooms, but certain children (those at risk) were selected for analysis. These were retained in the universal format category because the experiences of these children were more similar to the universal programs than the selected/indicated programs.


[iii] There are other peer mediation programs in the database, but the outcomes are measured either on the student mediators themselves or the whole school. There are two of these and they are included in the social skills category with the universal programs.


Achenbach, T. M. (1991). Integrative guide to the 1991 CBCL/4-18, YSR, and TRF profiles. Burlington, VT: University of Vermont, Department of Psychology.

Cohen, J. (1988). Statistical power analysis for the behavioral sciences (2nd ed.). Hillsdale, NJ: Lawrence Erlbaum.

Cooper, H., & Hedges, L. V. (Eds.). (1994). The handbook of research synthesis. New York: Russell Sage.

DeVoe, J. F., Peter, K., Kaufman, P., Miller, A., Noonan, M., Snyder, T. D., & Baum, K. (2004). Indicators of school crime and safety: 2004 (NCES 2005-002/NCJ 205290). U. S. Departments of Education and Justice. Washington, DC: Government Printing Office.

Durlak, J. A. (1995). School-based prevention programs for children and adolescents. Thousand Oaks, CA: Sage.

Durlak, J. A. (1997). Primary prevention programs in schools. Advances in Clinical Child Psychology, 19, 283-318.

Garofalo, J., Siegel, L., & Laub, J. (1987). School-related victimizations among adolescents: An analysis of National Crime Survey (NCS) narratives. Journal of Quantitative Criminology, 3, 321-338.

Goldstein, A. P., Harootunian, B., & Conoley, J. C. (1994). Student aggression: Prevention, management, and replacement training. New York: The Guilford Press.

Gottfredson, G. D., Gottfredson, D. C., Czeh, E. R., Cantor, D., Crosse, S., & Hantman, I. (2000). National study of delinquency prevention in schools. Final Report, Grant No. 96-MU-MU-0008. Ellicott City, MD: Gottfredson Associates, Inc. Available online:

Hedges, L. V., & Olkin, I. (1985). Statistical methods for meta-analysis. San Diego, CA: Academic Press.

Lipsey, M. W., & Wilson, D. B. (2001). Practical meta-analysis. Thousand Oaks, CA: Sage Publications.

Lösel, F., & Beelmann, A. (2003). Effects of child skill training in preventing antisocial behavior: A systematic review of randomized evaluations. Annals of the American Academy of Political and Social Science, 587, 84-109.

Raudenbush, S. W. (1994). Random effects models. In H. Cooper & L. V. Hedges (Eds.), The handbook of research synthesis (pp. 301-321). New York: Russell Sage Foundation.

Wilson, D. B., Gottfredson, D. C., & Najaka, S. S. (2001). School-based prevention of problem behaviors: A meta-analysis. Journal of Quantitative Criminology, 17, 247-272.

Wilson, S. J., Lipsey, M. W., & Derzon, J. H. (2003). The effects of school-based intervention programs on aggressive and disruptive behavior: A meta-analysis. Journal of Consulting and Clinical Psychology, 71(1), 136-149.
