
One of the main purposes of assessment is to determine whether programming actually results in student learning. That can be tricky. Most faculty are experts in their discipline and comfortable working with data, but what if the data being gathered cannot accurately tell us whether, or what, a student learned? This problem may be so common that it is one of the primary reasons programs do not make changes to curriculum based on assessment findings (Banta & Blaich, 2011).

To make evaluative statements about the effectiveness of educational programs, we must assess if students achieve intended learning and development outcomes. One barrier to using outcomes assessment results for program improvement (i.e., closing the assessment loop) is that we do not always know how to make changes because we do not always know where the problems exist.

This barrier can be addressed, in part, with implementation fidelity data, which is coupled with outcomes assessment data to support more valid inferences about program effectiveness, facilitate accurate communication about the quality of programs, and guide program and learning improvement.

Dr. Sara J. Finney, Professor in the Department of Graduate Psychology at James Madison University, delves into this in her webinar, “Using Implementation Fidelity Data to Evaluate & Improve Program Effectiveness,” which is the basis for this article.

Purpose of Assessment: Improvement
Faculty and staff strive for educational excellence in several ways. Programming is designed to support and improve student performance. However, it can be very challenging to determine whether performance improves and, if so, which intervention the improvement can be attributed to. This is the priority of student learning outcomes assessment, but are interventions being evaluated for their effectiveness against the right data?

To begin, let’s look at the typical assessment cycle:

  • Identify outcomes
  • Tie programming to outcomes
  • Select instruments for measurement
  • Collect data
  • Analyze and interpret data
  • Use results to make decisions (close the loop)

That last step is not happening as often as everyone hoped it would, and it is the most critical part of the process.

Why Aren’t We “Closing the Loop”?

You are likely familiar with the well-known article by Trudy Banta and Charles Blaich, which began with a simple research question: how are student learning outcomes assessment results used to improve teaching, learning, and student services programs? Surprisingly, the authors could find only a handful of examples of tangible improvements. Although most campuses had some kind of process for outcomes assessment, few used the results to make actual changes to programming.

Why?

  • Confusion about the purpose and process of outcomes assessment
  • Not enough time or resources allocated to the work
  • No institutional value or reward for assessment
  • Lack of understanding of learning and development theories to purposefully build, evaluate, and improve programming (Bresciani, 2011)

Finney offers an additional reason: lack of knowledge of the programming students actually received. How can we expect faculty and staff to improve a program without knowing what is actually (or not actually) occurring in the program?

An Example:
Baseline: Average Ethical Reasoning in 2017 = 30
Programming: Curriculum/Program Implemented
Outcome: Average Ethical Reasoning in 2018 = 80
Reaction: The program may be effective with respect to increasing students’ ability to engage in ethical reasoning.

Baseline: Average Ethical Reasoning in 2017 = 30
Programming: Curriculum/Program Implemented
Outcome: Average Ethical Reasoning in 2018 = 30
Reaction: The curriculum & programming doesn’t work!

Baseline: Average Ethical Reasoning in 2017 = 30
Programming: Although created, NO “Ethical Reasoning” Curriculum/Program Implemented
Outcome: Average Ethical Reasoning in 2018 = 30
Reaction: We may observe this result when NO intervention is actually implemented.

Baseline: Average Ethical Reasoning in 2017 = 30
Programming: Poor implementation of “Ethical Reasoning” Curriculum/Program
Outcome: Average Ethical Reasoning in 2018 = 30
Reaction: We may observe this result when uncoordinated, “sloppy,” or incomplete curriculum/programming is implemented.

So…which is it?

  • Program doesn’t “work”
  • Program wasn’t implemented
  • Program was implemented but poorly

As the example shows, the typical assessment cycle gives you no way to distinguish among these three possibilities, which limits the use of assessment results for program improvement.

Even when programs attempt to close the loop with the results they have and implement changes or high-impact practices (HIPs) to improve learning, we fall short. Responding to a recent study questioning the value of high-impact practices, George Kuh and Jillian Kinzie argue that the study mistakenly assumes that simply making HIPs available is enough; how they are implemented is crucial: “Simply offering and labeling an activity an HIP does not necessarily guarantee that students who participate in it will benefit in the ways much of the extant literature claims.” Implementation quality is critical.

Enter: Implementation Fidelity.

Incorporating Implementation Fidelity into the Assessment Cycle

Implementation fidelity is not a new concept, but it goes by other names: Opportunity to Learn (OTL) in K‐12 education, designed vs. delivered curriculum in higher education, manipulation fidelity/checks in experimental research, and treatment integrity in behavioral consultation.

For Institutional Effectiveness, it fits in between the third and fourth steps of the assessment cycle:

  • Identify outcomes
  • Tie programming to outcomes
  • Select instruments for measurement
  • IMPLEMENTATION FIDELITY
  • Collect data
  • Analyze and interpret data
  • Use results to make decisions (close the loop)

To translate assessment evidence into program improvement, we need to know when gathering more data would help focus and clarify potential actions, and we need to know what data is needed: specifically, what programming the students actually received.

When developing a program, a great deal of attention is given to designing the curriculum and training the instructor or faculty member, but less (sometimes none) is devoted to assessing the alignment of the planned vs. the actual implemented programming.

An analogy of a “black box” is helpful when explaining the premise. We conceptualize a program as

Planned intervention → Outcome → Outcome Measure

In reality, we also need to know the actual intervention, which is usually unknown (hidden in a black box). To open the box and see the actual intervention, we use implementation fidelity data.

An example:
Planned Intervention: Drug Intervention: 4 pills per day for 2 months
Outcome: Eliminate presence of disease
Outcome Measure: Blood test
Reaction: If patient tests positive for disease after, does that mean the planned intervention (drug treatment) is ineffective?
Actual Intervention/ Fidelity Measures: Record exact number of pills taken for number of days

Planned Intervention: Physical Fitness Program: diet and exercise regimes
Outcome: Become more physically fit
Outcome Measure: Weight, measurements, stamina, pictures
Reaction: If the person doesn’t lose weight, show a change in measurements or stamina, and the pictures look the same as before the program, does that mean the planned fitness program is ineffective?
Actual Intervention/ Fidelity Measures: Record food intake and exercise precisely

Planned Intervention: Civic engagement program: speakers, debates, videos
Outcome: Increase value of civic engagement
Outcome Measure: Essay detailing the importance of civic engagement; self-reported valuation measure
Reaction: If students can’t articulate the value of civic engagement or their self‐reported value doesn’t increase, does that mean the planned curriculum is ineffective?
Actual Intervention/ Fidelity Measures: Record curriculum and activities actually completed
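To make the idea concrete, here is a minimal sketch (in Python) of how the fidelity record from the drug-treatment example might be turned into a simple exposure figure. The function name, the reading of “2 months” as 60 days, and the sample record are illustrative assumptions, not part of the webinar.

```python
# Hypothetical sketch: turning the drug-treatment fidelity record into a
# simple exposure proportion. All names and numbers are illustrative.

PLANNED_PILLS_PER_DAY = 4
PLANNED_DAYS = 60  # "2 months" read as roughly 60 days

def exposure_proportion(pills_taken_per_day: list[int]) -> float:
    """Proportion of the planned drug treatment actually received."""
    planned_total = PLANNED_PILLS_PER_DAY * PLANNED_DAYS
    actual_total = sum(pills_taken_per_day)
    return actual_total / planned_total

# Example record: the patient took 2 pills per day for only 30 days.
record = [2] * 30
print(f"Exposure: {exposure_proportion(record):.0%} of the planned treatment")
# A positive blood test after only 25% exposure says little about whether
# the planned regimen itself is ineffective.
```

The same logic applies to the fitness and civic engagement examples: until the delivered “dose” is recorded, a disappointing outcome measure cannot be pinned on the planned intervention.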

Adding the implementation fidelity step to the process, so that the delivered intervention (not just the planned one) is documented, yields data that support more trustworthy inferences and can then be used to inform program decisions.

In the table below, consider what happens if fidelity assessment is not performed: we would be making uninformed assumptions about curriculum, intervention, and program quality. This could lead to no action being taken (even though action may be needed) or to trying to fix something that may not affect student learning. Implementation fidelity helps us better interpret results, favorable or not, and then make informed decisions.

Reality 1
Fidelity Assessment Results: High (+)
Outcomes Assessment Results: Good (+)
Common Conclusion without Fidelity Data: “Program” looks great!
More Accurate Inference with Fidelity Data: Program may be effective.

Reality 2
Fidelity Assessment Results: Low (‐)
Outcomes Assessment Results: Poor (‐)
Common Conclusion without Fidelity Data: “Program” is not working.
More Accurate Inference with Fidelity Data: No conclusions can be made about the planned program.

Reality 3
Fidelity Assessment Results: High (+)
Outcomes Assessment Results: Poor (‐)
Common Conclusion without Fidelity Data: “Program” is not working.
More Accurate Inference with Fidelity Data: Program is ineffective in meeting outcomes.

Reality 4
Fidelity Assessment Results: Low (‐)
Outcomes Assessment Results: Good (+)
Common Conclusion without Fidelity Data: “Program” looks great!
More Accurate Inference with Fidelity Data: No conclusions can be made about the planned program.
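Read one way, the table above is a small decision rule: an outcome result is only interpretable as evidence about the planned program when fidelity is high. A hypothetical sketch, assuming simple high/low and good/poor codings:

```python
# Hypothetical sketch encoding the fidelity-by-outcome table above as a
# small decision rule; the high/low and good/poor codings are illustrative.

def interpret(fidelity_high: bool, outcomes_good: bool) -> str:
    """More accurate inference given both fidelity and outcomes evidence."""
    if fidelity_high and outcomes_good:
        return "Program may be effective."                    # Reality 1
    if fidelity_high and not outcomes_good:
        return "Program is ineffective in meeting outcomes."  # Reality 3
    # Low fidelity (Realities 2 and 4): the planned program was not really
    # delivered, so the outcome data say nothing about the planned program.
    return "No conclusions can be made about the planned program."

print(interpret(fidelity_high=False, outcomes_good=True))
# -> No conclusions can be made about the planned program.
```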

Implementation Fidelity in a Nutshell

The following components can be used to create a rubric to assess implementation fidelity. See examples from Dr. Finney and Dr. Gerster.

  • Program Differentiation
    • Definition: detailing specific features of a program that theoretically enable students to meet intended outcomes. Essential for assessing other fidelity components.
    • Assessment: not “assessed”; involves describing specific activities & curriculum. Completed as Step 2 in assessment cycle (mapping programming to outcomes).
  • Adherence
    • Definition: whether or not specific features of the program were implemented as planned.
    • Assessment: recording whether or not (i.e., “yes” or “no”) each specific program feature was implemented.
  • Quality
    • Definition: how well the program was implemented or caliber of delivered program features.
    • Assessment: rating the quality of implementation (e.g., 1 = Low to 5 = High).
  • Exposure
    • Definition: extent to which all students participating in the program receive the full amount of treatment.
    • Assessment: recording duration of program components and/or proportion of program participants that received the component.
  • Responsiveness
    • Definition: receptiveness of those exposed to the treatment.
    • Assessment: rating levels of engagement (e.g., 1 = Not engaged to 5 = Very engaged).

Once a rubric is developed, it is critical to base ratings on valid evidence of the program (live observation, video of the program, program materials) and to use multiple sources of raters (independent auditors, program facilitators, program participants).
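As one illustration (not Dr. Finney’s or Dr. Gerster’s rubric), a fidelity record for a single program feature might capture the adherence, quality, exposure, and responsiveness components and be aggregated across raters roughly like this; every field name, scale, and value below is an assumption for the sketch.

```python
# Hypothetical sketch of a fidelity rubric record for one program feature,
# rated by multiple sources. Field names and scales are illustrative only.
from dataclasses import dataclass
from statistics import mean

@dataclass
class FeatureRating:
    feature: str          # named during program differentiation / mapping
    adhered: bool         # Adherence: was the feature implemented at all?
    quality: int          # Quality: 1 (low) to 5 (high)
    exposure: float       # Exposure: proportion of students who received it
    responsiveness: int   # Responsiveness: 1 (not engaged) to 5 (very engaged)

def summarize(ratings: list[FeatureRating]) -> dict:
    """Aggregate one feature's ratings across rater sources."""
    return {
        "adhered_per_all_raters": all(r.adhered for r in ratings),
        "mean_quality": mean(r.quality for r in ratings),
        "mean_exposure": mean(r.exposure for r in ratings),
        "mean_responsiveness": mean(r.responsiveness for r in ratings),
    }

ratings = [
    FeatureRating("Ethics case-study debate", True, 4, 0.70, 3),  # auditor
    FeatureRating("Ethics case-study debate", True, 5, 0.80, 4),  # facilitator
]
print(summarize(ratings))
```

Collecting the same record from an independent auditor, the facilitator, and participants, as suggested above, makes disagreement between rater sources visible rather than hidden.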

Other Advantages of Implementation Fidelity
“An effective assessment program should spend more time and money on using data than on gathering it” (Banta & Blaich, 2011). It makes sense to prioritize collecting data that informs use.

By incorporating Implementation Fidelity into the Assessment Cycle, we have much more information about the actual programming students received and can make more valid inferences about the impact of our programming.

Finally, including faculty and staff engages them in the process of closely examining the actual program, which can fuel their desire to examine student learning and development outcome results, as well as facilitate their interpretation and use of assessment results for program improvement.
