This approach compares actual postprogram data to targets set in prior periods, usually before implementation of the program. The analyst sets specific goals and targets for pre-established evaluation criteria for known time periods, and obtains data on the performance that actually occurs. Finally, the analyst compares actual performance to target performance, and seeks plausible explanations for differences that might have been brought about by program and nonprogram factors. In practice, this approach has been modified to involve comparison of actual program performance with implied rather than explicit targets. Targets can be set each year for one or more years in advance, and annual evaluations can be made of programs that have existed for a number of years (where preprogram data may not have much relevance). Although this method may be helpful in revealing year-to-year and other short-term changes, it does not allow us to determine the extent to which changes can be attributed to the policy or program.
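The comparison of actual with planned performance can be sketched in a few lines of Python. This is a minimal illustration only: the `PeriodResult` record, the `compare_to_targets` helper, and all of the figures are hypothetical and do not come from any particular program. As the text notes, the resulting gaps show how far performance departed from the target, not how much of the departure can be attributed to the program.

```python
from dataclasses import dataclass


@dataclass
class PeriodResult:
    period: str
    target: float  # target set in a prior period (assumed nonzero)
    actual: float  # performance actually observed


def compare_to_targets(results):
    """Return (period, gap, percent_of_target) for each period."""
    rows = []
    for r in results:
        gap = r.actual - r.target          # positive means target exceeded
        pct = 100.0 * r.actual / r.target  # actual as a share of target
        rows.append((r.period, gap, round(pct, 1)))
    return rows


# Hypothetical figures, for illustration only.
results = [
    PeriodResult("Year 1", target=500, actual=450),
    PeriodResult("Year 2", target=550, actual=561),
]

for period, gap, pct in compare_to_targets(results):
    print(f"{period}: gap = {gap:+.0f}, {pct}% of target")
```

The analyst would still need to look for plausible program and nonprogram explanations for each gap; the calculation only identifies where such explanations are needed.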
These types of designs, which do not have equivalent experimental and control groups or pretests and posttests to help measure and determine the causes of change in key criteria, present a variety of interpretation problems, referred to as problems of internal and external validity. Internal validity refers to the ability to determine whether unequivocal conclusions can be drawn about the experiment itself, and external validity refers to the ability to generalize from the experiment to other settings. In general, internal-validity problems for the simple evaluation designs described above include the inability to determine whether observed changes occurred because of the program or because of non-experimental events. Such events include learning by or maturing of participants, improved scoring on a posttest as a result of taking a pretest, changes in measurements or procedures, sampling errors, false conclusions drawn from statistical tests, use of treatment and comparison groups that are not equivalent, dropping out of participants, and uneven growth or maturation of experimental and comparison groups. Any of these conditions may call into question conclusions about the particular experiment or evaluation, including its application to other settings. When research results are replicated in various settings, they tend to acquire greater credibility or external validity. It is important to be aware, however, that repetition is not replication. The republishing of a single analysis in different contexts, for example, adds nothing to its validity. Consequently, analysts must distinguish between similar experiments that have been replicated in a number of settings and research results that are merely reprinted in numerous sources.