
Kirkpatrick’s 4 Levels of Evaluation Model

Donald L. Kirkpatrick, Professor Emeritus at the University of Wisconsin (BBA, MBA, and PhD), first published his ideas in 1959, in a series of articles in the Journal of the American Society of Training Directors. The articles were later incorporated into Kirkpatrick's book Evaluating Training Programs (first published in 1994; now in its 3rd edition, from Berrett-Koehler Publishers).

Donald Kirkpatrick was president of the American Society for Training and Development (ASTD) in 1975. His 1994 book Evaluating Training Programs reiterated the ideas he first published in 1959, greatly expanding awareness of them, and his theory has arguably become the most widely used and popular model for the evaluation of training and learning. Kirkpatrick's four-level model is now considered an industry standard across the HR and training communities. The four levels of the model measure:

  1. reaction of the learner - what they thought and felt about the training
  2. learning - the resulting increase in knowledge or capability
  3. behavior - the extent of behavior and capability improvement, and its implementation/application on the job
  4. results - the effects on the business or environment resulting from the trainee's performance

All these measures are recommended for a full and meaningful evaluation of learning in organizations, although their application broadly increases in complexity, and usually cost, from level 1 through level 4.

Kirkpatrick’s Simplified Structure

Level 1 - Reaction
Description: How the delegates felt about the training or learning experience.
Tools and methods: 'Happy sheets' and feedback forms; verbal reaction; post-training surveys or questionnaires.
Relevance and applicability: Quick and very easy to obtain; not expensive to gather or to analyze.

Level 2 - Learning
Description: The measurement of the increase in knowledge - before and after.
Tools and methods: Assessments before and after the training; interview or observation can also be used.
Relevance and applicability: Relatively simple to set up; clear-cut for quantifiable skills; less easy for complex learning.

Level 3 - Behavior
Description: The extent of applied learning back on the job.
Tools and methods: Observation and interviews over time are required to assess change, relevance, and sustainability of change.
Relevance and applicability: Measurement of behavior change typically requires the cooperation and skill of line managers.

Level 4 - Results
Description: The employee's effect on the business and work environment as a result of the training.
Tools and methods: Measures are normally already in place via management systems and reporting; the challenge is to relate them to the employee.
Relevance and applicability: Not difficult for an individual, but much harder across a whole organization; the process must attribute clear accountabilities.

Kirkpatrick’s Detailed Structure

Level 1 - Reaction

Description and characteristics: How the learners felt, and their personal reactions to the training or learning experience, for example: Did the trainees like and enjoy the training? Did they consider the training relevant? Was it a good use of their time? Did they like the venue, the style, the timing, the domestic arrangements, etc.? Other indicators are the level of participation, the ease and comfort of the experience, the level of effort required to make the most of the learning, and the perceived practicability of, and potential for, applying the learning.

Tools and methods: Typically 'happy sheets' - feedback forms based on subjective personal reaction to the training experience. Verbal reactions, which can be noted and analyzed. Post-training surveys or questionnaires. Online evaluation or grading by delegates. Subsequent verbal or written reports given by delegates to managers back at their jobs. (A simple sketch of group-level feedback analysis follows below.)

Relevance and applicability: Can be done immediately after the training ends. Reaction feedback is very easy to obtain, and is not expensive to gather or difficult to analyze for groups. It is important to know that people were not upset or disappointed, and that they give a positive impression when relating their experience to others who might be deciding whether to attend the same training.
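
Where reaction feedback is gathered on a numeric scale, group analysis reduces to simple averaging. Below is a minimal sketch in Python; the five-point scale, question names, and responses are illustrative assumptions, not part of Kirkpatrick's model.

```python
# Minimal sketch: aggregating 'happy sheet' responses for a group.
# Assumes a five-point scale (1 = poor, 5 = excellent); the questions
# and responses are illustrative only.

responses = [
    {"relevance": 4, "enjoyment": 5, "use_of_time": 4},
    {"relevance": 3, "enjoyment": 4, "use_of_time": 3},
    {"relevance": 5, "enjoyment": 4, "use_of_time": 5},
]

for question in responses[0]:
    scores = [r[question] for r in responses]
    print(f"{question}: mean {sum(scores) / len(scores):.2f} (n={len(scores)})")
```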

Level 2 - Learning

Description and characteristics: The measurement of the increase in knowledge or intellectual capability from before to after the learning experience. Did the trainees learn what was intended to be taught? Did they experience what was intended for them to experience? What is the extent of advancement or change in the trainees after the training, in the direction or area that was intended?

Tools and methods: Assessments before and after the training. Interview or observation can be used before and after, although this is time-consuming and can be inconsistent. Methods of assessment need to be closely related to the aims of the learning. Measurement and analysis are possible and straightforward on a group scale (a simple pre/post sketch follows below). Reliable, clear scoring and measurements need to be established, so as to limit the risk of inconsistent assessment. Hard-copy, electronic, online, or interview-style assessments are all possible.

Relevance and applicability: Relatively simple to set up, but more investment and thought are required than for reaction evaluation. Highly relevant and clear-cut for certain training, such as quantifiable or technical skills. Less easy for more complex learning such as attitudinal development, which is famously difficult to assess. Cost escalates if systems are poorly designed, which increases the work required to measure and analyze.
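
To make the before-and-after comparison concrete, here is a minimal pre/post scoring sketch in Python. The names, scores, and the percentage-point gain measure are illustrative assumptions; real assessments would use whatever scoring the learning aims dictate.

```python
# Minimal sketch: measuring learning as the change between pre- and
# post-training assessment scores (out of 100; data is illustrative).

pre_scores  = {"alice": 55, "bob": 62, "carol": 48}
post_scores = {"alice": 78, "bob": 74, "carol": 70}

gains = {name: post_scores[name] - pre_scores[name] for name in pre_scores}

for name, gain in gains.items():
    print(f"{name}: {pre_scores[name]} -> {post_scores[name]} (gain {gain:+d})")

group_gain = sum(gains.values()) / len(gains)
print(f"group mean gain: {group_gain:.1f} points")
```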

Level 3 - Behavior

Description and characteristics: The extent to which the trainees applied the learning and changed their behavior; this can be immediate, or may emerge over several months after the training. Did the trainees put their learning into effect when back on the job? Were the relevant skills and knowledge used? Was there noticeable and measurable change in the activity and performance of the trainees when back in their roles? Was the change in behavior and the new level of knowledge sustained? Would the trainee be able to transfer their learning to another person? Is the trainee aware of their change in behavior, knowledge, or skill level?

Tools and methods: Observation and interviews over time are required to assess change, the relevance of change, and the sustainability of change. Arbitrary snapshot assessments are not reliable because people change in different ways at different times. Assessments need to be subtle and ongoing, and then transferred to a suitable analysis tool. Assessments need to be designed to reduce the subjective judgment of the observer or interviewer, which is a variable factor that can affect the reliability and consistency of measurements. The opinion of the trainee, which is a relevant indicator, is also subjective and unreliable, and so needs to be measured in a consistent, defined way. 360-degree feedback is a useful method and need not be used before training, because respondents can make a judgment as to change after training, and this can be analyzed for groups of respondents and trainees (a simple tallying sketch follows below). Assessments can be designed around relevant performance scenarios and specific key performance indicators or criteria. Online and electronic assessments are more difficult to incorporate; assessments tend to be more successful when integrated within existing management and coaching protocols. Self-assessment can be useful, using carefully designed criteria and measurements.

Relevance and applicability: Measurement of behavior change is less easy to quantify and interpret than reaction and learning evaluation. Simple, quick response systems are unlikely to be adequate. The cooperation and skill of observers, typically line managers, are important factors and difficult to control. Management and analysis of ongoing, subtle assessments are difficult, and virtually impossible without a well-designed system from the beginning. Evaluation of implementation and application is extremely important: there is little point in a good reaction and a good increase in capability if nothing changes back in the job, so evaluation in this area is vital, albeit challenging. Behavior change evaluation is possible given good support and involvement from line managers or trainees, so it is helpful to involve them from the start, and to identify benefits for them, which links to the level 4 evaluation below.
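
As one way the post-training 360-degree judgment of change might be tallied for a group, here is a hedged sketch in Python; the rating scale, respondent roles, and the threshold for "change observed" are assumptions for illustration only.

```python
# Minimal sketch: summarizing post-training 360-degree judgments of
# behavior change. Respondents rate observed change from -2 (worse)
# to +2 (clear, sustained improvement); data is illustrative.

ratings = {
    "trainee_1": {"manager": 2, "peer_a": 1, "peer_b": 1, "self": 2},
    "trainee_2": {"manager": 0, "peer_a": 1, "peer_b": 0, "self": 1},
}

for trainee, by_respondent in ratings.items():
    values = list(by_respondent.values())
    mean = sum(values) / len(values)
    verdict = "change observed" if mean >= 1.0 else "little/no change"
    print(f"{trainee}: mean rating {mean:+.2f} -> {verdict}")
```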

Level 4 - Results

Description and characteristics: The effect on the business and work environment resulting from the improved performance of the trainee; this is the acid test. Measures would typically be business or organizational key performance indicators, such as volumes, values, percentages, timescales, return on investment (ROI; a worked sketch follows below), and other quantifiable aspects of organizational performance, for instance numbers of complaints, staff turnover, attrition, failures, wastage, non-compliance, quality ratings, achievement of standards and accreditations, growth, retention, etc.

Tools and methods: It is likely that many of these measures are already in place via normal management systems and reporting. The challenge is to identify which measures relate to the trainee's input and influence, and how. It is therefore important to identify and agree accountability and relevance with the trainee at the start of the training, so they understand what is to be measured. This process overlays normal good management practice; it simply needs linking to the training input. Failure to link results to the type and timing of the training input will greatly reduce the ease with which results can be attributed to the training. For senior people particularly, annual appraisals and ongoing agreement of key business objectives are integral to measuring business results derived from training.

Relevance and applicability: Individually, results evaluation is not particularly difficult; across an entire organization it becomes very much more challenging, not least because of the reliance on line management, and the frequency and scale of changing structures, responsibilities, and roles, which complicate the process of attributing clear accountability. External factors also greatly affect organizational and business performance, clouding the true cause of good or poor results.
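
Since return on investment is named above as a typical level 4 measure, here is a worked sketch of the conventional training ROI calculation. The benefit and cost figures are invented for illustration; in practice, isolating the benefit attributable to the training is the hard part, as noted above.

```python
# Minimal sketch: the conventional training ROI calculation,
# ROI (%) = (net benefit / cost) * 100. Figures are illustrative.

monetary_benefit = 120_000  # e.g. estimated value of reduced wastage and complaints
training_cost    = 45_000   # design, delivery, materials, trainee time

net_benefit = monetary_benefit - training_cost
roi_percent = net_benefit / training_cost * 100

print(f"net benefit: {net_benefit}")
print(f"ROI: {roi_percent:.0f}%")  # here: 167%
```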

 

Click here to compare how SOJT and TWI meet Kirkpatrick’s Evaluation Model.