Kirkpatrick and Beyond: A review of models of training evaluation
Tamkin P, Yarnall J, Kerrin M Report 392, Institute for Employment Studies, October 2002
a study supported by the IES Research Networks
Training evaluation is a bit like eating five portions of fruit and vegetables a day; everyone knows that they are supposed to do it, everyone says they are planning to do better in the future and few people admit to having got it right.
Despite Investors in People, despite high levels of future intention, many organisations are not satisfied that their methods of evaluating training are rigorous or extensive enough to answer questions of value to the organisation. And this is all despite the fact that there is a model of training evaluation that has been popular for decades, ie Kirkpatrick’s four-level model of training evaluation, beginning with immediate participant reactions to a training experience and ending with organisational impact.
It may be that the discomfort with our activity is because our models are no longer up to the job and need a serious overhaul. If this is so, who are the main contenders? In this review, we look critically at the Kirkpatrick model and compare it to others. We explore what these models might imply about the process of learning and changing behaviour, and we also review some of the research evidence on training evaluation that throws light on issues associated with evaluating at different levels.
Kirkpatrick
Kirkpatrick developed his four-step model in 1959 and provided a simple and pragmatic model for helping practitioners think about training programmes. It has, however, been criticised for implying a hierarchy of value related to the different levels, with organisational performance measures being seen as more important than reactions. More fundamentally, there have been criticisms of the assumption that each level is associated with the previous and next levels. This implied causal relationship has not always been established by research. Other complaints are that the model is too simple and fails to take account of the various intervening variables affecting learning and transfer.
Descendant models
In response, others have developed models of their own that purport to resolve some of these difficulties. Several of these might be thought of as Kirkpatrick progeny, in that they take much that was inherent in the original model and extend it either at the front end, with the inclusion of training design or needs analysis, or at the back end, with an evaluation of societal outcomes - and sometimes both. We consider six models in detail and a further five more briefly.
Fresh blood
Other models are unrelated to Kirkpatrick, having a rather different approach to how training evaluation might take place. These include:
- responsive evaluation (Pulley, 1994), which focuses on what decision makers in the organisation would like to know and how this need might be met
- context evaluation (Newby, 1992), which focuses on appropriate evaluation for different contexts, and
- evaluative enquiry (Preskill and Torres, 1999), which approaches evaluation as a learning experience using dialogue, reflection and challenge to distil learning opportunities, to create a learning environment and to develop enquiry skills.
The final group of models emphasise the importance of different measures of impact, including the learning outcomes approach of Kraiger et al. (1993) linking training evaluation to cognitive, skill-based and affective learning outcomes, and the balanced scorecard approach of Kaplan and Norton (1996), which focuses on different perspectives of finance, customers and internal processes.
An underlying model
All of the models tacitly assume that there is a chain of impact from a developmental process to individual learning, changed behaviour and resulting organisational or social impact. However, they rarely make such a model explicit, and therefore all are open to the criticism that they ignore some of the key variables that impact on this chain of events. We explore what such a model of learning might include if it is to recognise the intervening factors affecting the strength of the relationship between one link in the chain and the next. Such a model is not necessarily a model of training evaluation with all the complexity inherent in a model of learning; rather, it is meant to help the practitioner undertake sensible and coherent evaluation that is of use to the organisation. Inevitably, this also involves knowing what not to evaluate, and simplification is a vital part of the evaluation process.
Evidence on issues affecting evaluation
Evaluating at different links in the chain (or at different Kirkpatrick levels) is affected by different variables. The evaluation ought at least to be cognisant of these, as they can influence how strongly one level affects the level that follows it.
Reaction
At the reaction level, research has shown that there is relatively little correlation between learner reactions and measures of learning, or subsequent measures of changed behaviour. It has been suggested that ‘satisfaction’ is not necessarily related to good learning and that sometimes discomfort is essential. Mixed results may indicate that what is measured at the reaction level matters, and that more focused reaction-level questionnaires may be more informative about the value of training.
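As an illustration of how an organisation might check this relationship in its own data, the following is a minimal sketch that computes a Pearson correlation between end-of-course reaction ratings and later test scores. The function name `pearson` and the two lists of scores are hypothetical and are not taken from the report; a real analysis would need a far larger sample than six trainees.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length lists with at least two values")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of cross-products and sums of squares around each mean
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when one list is constant")
    return cov / sqrt(var_x * var_y)


# Hypothetical data: one entry per trainee (illustrative values only)
reaction_scores = [5, 4, 5, 3, 4, 2]        # satisfaction rating, 1 to 5
learning_scores = [60, 75, 55, 80, 65, 70]  # post-course test, percent

r = pearson(reaction_scores, learning_scores)
print(f"reaction vs learning: r = {r:.2f}")
```

A value of r close to zero would be consistent with the finding above that reactions say little about learning; it would not, on its own, show that the reaction questionnaire is without value.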
Learning
There is much literature encouraging the use of before-and-after questionnaires to gauge learning gain from courses. Some have urged caution, raising concerns that a trainee might be able to repeat what they have learnt but may not be able to apply it, that performance during training may not be a predictor of post-training performance, that testing may not be appropriate for measuring the attainment of soft skills, or indeed for skills in general.
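A before-and-after comparison of this kind can be sketched in a few lines. The example below is illustrative only: the function name `learning_gain`, the scores and the 100-point scale are assumptions, and the normalised gain (the share of available headroom that was actually gained) is a common convention rather than a measure taken from the report.

```python
def learning_gain(pre, post, max_score):
    """Return (raw_gain, normalised_gain) for one trainee.

    raw_gain is post minus pre. normalised_gain divides the raw gain by
    the headroom left above the pre-course score; it is None when the
    trainee already scored the maximum before the course.
    """
    if not (0 <= pre <= max_score and 0 <= post <= max_score):
        raise ValueError("scores must lie between 0 and max_score")
    raw = post - pre
    headroom = max_score - pre
    normalised = raw / headroom if headroom > 0 else None
    return raw, normalised


# Hypothetical questionnaire scores out of 100, as (before, after) pairs
scores = [(40, 70), (55, 60), (80, 90)]
for pre, post in scores:
    raw, norm = learning_gain(pre, post, max_score=100)
    print(f"pre={pre} post={post} raw gain={raw} normalised gain={norm}")
```

As the cautions above make clear, a positive gain on such a test shows only that the trainee can reproduce the material, not that they can apply it at work.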
Behaviour change
There is a wealth of studies that comment on the failure of training to transfer into the workplace and that have identified a range of organisational factors that inhibit success. Some have identified the importance of organisational culture and learning confidence. The more difficult an individual finds the training, the less likely they are to be able to apply it; the more supportive line managers are, the more likely the application of learning. Other important factors are perceived usefulness, job autonomy and commitment.
Similarly, there are a number of individual factors that influence transfer and application of learning: self-efficacy, motivation to learn, and general intelligence have all been associated with transfer.
Not surprisingly, several researchers have suggested that evaluation of behaviour change needs to become much more complex to take account of these factors. There have been suggestions of using manager- and self-assessment, but with concerns that they are not always accurate.
Organisational results
Whilst this is probably the most difficult level of evaluation, many writers have expounded the view that training must be evaluated using hard outcome data. The difficulties of doing so tend to be dismissed by these researchers. Others, however, express caution, pointing out the many assumptions that are made, the inherent difficulties in linking soft skills training to hard results, the time delays that are rarely taken into account, and the fact that hard measures miss much that is of value.
Organisational activity
The evidence from a range of research studies indicates that training evaluation has been steadily becoming more common, but that the predominant level of analysis is level 1, with very few attempting levels 3 or 4. Surprisingly, despite the emphasis on measuring business results, relatively few companies with comprehensive training evaluation try to justify training spend.
Our review would indicate that although there is an abundance of models that purport to improve on the Kirkpatrick model, there is a huge similarity in many of the models now on offer. The trends have been to extend the model to include the foundations for training and take into account the need that the training is meant to address. At the other end, the model extends to include measures of societal impact. The overall conclusion, however, is that the model remains very useful for framing where evaluation might be made. Organisations would do well to consider some of the findings on the issues that affect the linkage between the levels in the model. They need to think much more carefully about how they structure their reaction questionnaires, about the other factors that can inhibit the transfer of learning to the workplace, and about what they might do to maximise impact.
The important message is to conduct the best evaluation possible: one that provides information that meets the needs of the organisation, within the inevitable constraints of organisational life.
Kirkpatrick and Beyond: A review of models of training evaluation, Tamkin P, Yarnall J, Kerrin M. Report 392, Institute for Employment Studies, 2002. ISBN: 978-1-85184-321-3. £19.95. [PDF price: £8.00]