Kirkpatrick's Framework for Categorising Training Outcomes
Following the worldwide economic slump that began in 2008, businesses have continued to keep tight control of their costs and expenditure. At the same time, they have sought to remain competitive by keeping abreast of the latest developments in their industries. As a result, senior executives have come to see training and development in two ways: on the one hand, as one of a number of competing internal requests for investment; on the other, as a potential source of competitive advantage. Because of this tension, HR business leaders are under increasing pressure from senior managers to justify the value of training with supporting evidence, such as business cases and ROI forecasts. However, studies in recent years suggest that fewer than five per cent of organisations can provide any hard data on how their investment in learning and development has affected their bottom line (Bersin, 2013). Indeed, training managers in the corporate learning function have routinely identified measurement and evaluation as their top challenge (Bersin, 2006). This paper discusses the challenges of measuring the business impact of learning and development within organisations. It examines the advantages and disadvantages of Donald L. Kirkpatrick's four-level framework for evaluating the effectiveness of training programmes (Kirkpatrick, 1998), before drawing conclusions about its relevance in today's economic environment.
The tension between the benefits and costs of training also characterises the literature on the relationship between training, human resource practices, employee performance and financial outcomes. Some argue that workplace learning is essential to an organisation's competitiveness and that substantial investment in it should lead to improved performance and/or results (Salas & Cannon-Bowers, 2001). Others criticise training for failing to transfer to the job and for being too expensive (Kraiger, 2003), and question the link between training and results criteria (Alliger et al, 1997). These contrasting opinions point to a lack of consensus, both practical and theoretical, about how to evaluate learning and development programmes. To understand why confidence in training evaluation is lacking, it is helpful to consider current practice. The best-known model for evaluating training programmes was developed by Donald Kirkpatrick in the late 1950s, and a cursory glance at popular business websites shows that his four-level framework continues to shape training evaluation models today. The following section describes the model in more detail before discussing the benefits and drawbacks that may underpin the ongoing cost-benefit debate.
In his model, Kirkpatrick set out to evaluate the impact of training by assessing four key areas: (1) reaction, or the extent to which learners were satisfied with the programme; (2) learning, or the extent to which learners absorbed the course content; (3) behaviour, or the extent to which learners applied their knowledge in their roles; and (4) results, or the extent to which targeted outcomes, such as reduced costs and increased quality and productivity, were achieved.
Level One: Reaction
Results at Level One are typically measured by means of post-training questionnaires that ask participants to appraise criteria such as the topic, materials and instructor. Reaction-level evaluation is popular with training professionals because it is relatively easy to administer and gives managers and supervisors immediate information about how valuable participants found the programme. Indeed, Morrow et al (1997) describe how some professionals choose to rely solely on this level of evaluation. However, using the reaction level exclusively as a measure of training effectiveness overlooks its limitations. 'Smile sheets' (Davis et al, 1998) do not indicate the extent to which participants have internalised the programme's goals, nor do they offer direct insight into how the organisation will benefit from the investment. Moreover, participants' subjective responses may be influenced by a wide variety of personal factors, from lack of interest in the topic to personal problems and distractions. By responding to this level of feedback in isolation, organisations risk revising programmes needlessly (Aldrich, 2002). Organisations therefore need further, complementary levels of evaluation to build a more holistic view of training's impact.
Level Two: Learning
Learning results are frequently measured either by end-of-training examinations or by participants' self-assessment of whether their learning expectations have been met. Whereas the latter method remains open to criticism on the grounds of participants' subjectivity, the former does not necessarily indicate whether participants can transfer their classroom knowledge to the workplace and apply it there. Indeed, research that is still quoted today suggests that only 10%-30% of training content translates into workplace knowledge and skills (Ford & Weissbein, 1997). As Wisher et al (2001) point out, data sources need to be unbiased, understandable and immune to irrelevant influences if they are to indicate a training session's effectiveness accurately. Thus Level Two, like Level One, remains a useful source of information, but it does not provide hard evidence of impact in the workplace and therefore cannot be relied on exclusively as a measure of effectiveness.
Level Three: Behaviour
Kirkpatrick's third level aims to measure the continuity between learning and practice by assessing how training participants apply their new knowledge and skills in the workplace. Traditionally, this was measured subjectively by supervisors, whose evaluation skills and working relationships with the employee inevitably varied greatly. Increasingly, however, technological solutions are used to assess objectively and consistently whether a participant can apply their knowledge and skills to perform tasks, take actions and solve problems (Galloway, 2005). As technology advances, these indicators of proficiency and competency are likely to become more sophisticated and accurate. Level Three evaluation thus attempts to address the barriers to transfer that Levels One and Two both neglect. In doing so, it contributes to an organisation's understanding of the strengths and weaknesses of its training and development process. It permits the identification of successful participants on the one hand and, on the other, creates the opportunity to reinforce important points with those who have not grasped them. As such, Level Three evaluation begins to indicate how well training is aligned with particular organisational goals and how likely those goals are to be achieved (Phillips, 1994).
Level Four: Results
Evaluation at Kirkpatrick's fourth level aims to produce evidence that training has had a measurable impact on an organisation's performance. Hard data, such as sales, costs, profit, productivity and quality metrics, are used to quantify the benefits and to justify or improve subsequent training and development activities. For business leaders, this is arguably the most important level of evaluation. Yet it is also the most difficult level to understand, define and execute well. As Wile (2009) points out, the challenge is to connect the results specifically to the training: it is necessary not only to identify the most relevant measures, but also to attribute any change in those measures to the training intervention.
Kirkpatrick's model is relatively simple to understand and presents a useful taxonomy for considering the impact of training programmes at different organisational levels. As discussed above, there are risks and weaknesses in using the individual levels in isolation. However, Kirkpatrick did not intend the framework to be used in this way. Rather, each level of evaluation is intended to establish whether a fundamental requirement of the training programme was met, with a view to building up a picture of the training's impact on the whole business. All levels are important because each provides diagnostic checkpoints for the levels that follow it, enabling root cause analysis of any problems identified. For example, if participants did not learn (Level Two), the reactions gathered at Level One (Reaction) may reveal barriers to learning that can be addressed in subsequent programmes. Used correctly, therefore, the evaluation framework can benefit organisations in a number of ways.
Firstly, the evaluation framework can validate training as a business tool. Training is one of many options that can improve performance and profitability, and proper evaluation allows it to be compared with other methods and selected in preference to, or in combination with, them. Secondly, effective evaluation can justify the costs incurred in training. When money is tight, training budgets are amongst the first to be sacrificed, and only thorough, quantitative analysis allows training departments to make the case necessary to resist these cuts. Thirdly, the right measurement and feedback can help to improve the design of training. Training programmes need continuous improvement and updating to provide better value and increased benefits; without formal evaluation, the basis for change is subjective. Lastly, systematic evaluation techniques can allow organisations to make informed choices about the best training methods to deliver specific results. A variety of training approaches are available at different prices and with different outcomes. By using comparative evaluation techniques, organisations can make evidence-based decisions about how to get the most value for money, and thereby minimise the risk of wasting resources on ineffective training programmes.
Despite its popularity, Kirkpatrick's model is not without its critics. Some argue that the model is conceptually too simple and does not take into account the wide range of organisational, individual, and design and delivery factors that can influence training effectiveness before, during or after training. As Bates (2004) points out, contextual factors such as organisational learning cultures and values, workplace support for skill acquisition and behavioural change, and the adequacy of tools, equipment and supplies can greatly influence both the process and the outcomes of training. Other detractors criticise the model's assumption of linear causality: that positive reactions lead to greater learning, which in turn increases the likelihood of better transfer and, ultimately, more positive organisational results (Alliger et al, 1997).
Training professionals also criticise the simplicity of the Kirkpatrick model on a practical level. Bersin (2006) observes that practitioners routinely struggle to apply the model fully. Because it offers no guidance on how to measure its levels and concepts, users often find it difficult to translate the model into concrete measurement initiatives. They are often obliged to make assumptions and leaps of logic that leave their cost-benefit analyses open to criticism. Most are able to gather Level One and Level Two feedback and metrics with relative ease, but find that the difficulty, complexity and cost of evaluation increase at each higher level, as the levels themselves become vaguer. Bersin claims that only five per cent of organisations measure ROI (and then only for a small percentage of their programmes) and that fewer than ten per cent regularly measure business impact. Paradoxically, therefore, the elements that Heads of Learning and Development most want to measure are precisely those they end up measuring least.
On a more fundamental level, some have taken issue with the content of Kirkpatrick's model. Phillips (1994), for example, adds a fifth level to the framework to address organisations' recurring need to measure the return on investment in training and development activity. Bersin (2006) goes further still and questions the overall relevance of Kirkpatrick's framework as a means of measuring the business impact of training. He argues that the model fundamentally overlooks the role of learning and development as a business support function. Whilst it is appropriate for business-critical lines to be measured according to the outputs for which they are directly accountable, e.g. revenue, profit or customer satisfaction, it is not reasonable to measure HR and training by the same means. Because these non-revenue-generating functions exist to support strategic initiatives and to make business lines run better, their business impact needs to be measured differently. As Kirkpatrick's model overlooks this distinction, practitioners who attempt to apply it to their activity end up spending large amounts of time and energy trying to evaluate direct business impact for outcomes over which they have only indirect responsibility.
Kirkpatrick's four-level framework is a simple, flexible and comprehensible means of evaluating the business impact of training. Its enduring influence on the evaluation methods used by training professionals today is a testament to its adaptability and practicality. However, evidence suggests that most organisations succeed only partially in executing all levels of measurement. By focusing on the reaction and learning levels, they rely on subjective, participant-related feedback at the expense of assessing the full impact at the organisational level. Confusion about precisely what to measure at the higher levels, and how to do so, further undermines evaluation. Thus, although Kirkpatrick provides a useful point of reference for evaluating the business impact of learning and development, its limitations are evident in training professionals' ongoing call for a simple, repeatable, standardised measurement process that is more flexible, scalable and business-orientated.
Aldrich, C. (2002) Measuring success: In a post-Maslow/Kirkpatrick world, which metrics matter? Online Learning 6(2), 30-32
Alliger, G. M., Tannenbaum, S. I., Bennett, W. Jr., Traver, H. & Shotland, A. (1997) A meta-analysis of the relations among training criteria. Personnel Psychology 50, 341-358
Bates, R. (2004) A critical analysis of evaluation practice: The Kirkpatrick model and the principle of beneficence. Evaluation and Program Planning 27, 341-347
Bersin, J. (2006) High-Impact Learning Measurement: Best Practices, Models, and Business-Driven Solutions for the Measurement and Evaluation of Corporate Training. [Online] Available from http://www.bersin.com
Bersin by Deloitte (2013) The Corporate Learning Factbook 2013. [Online] Available from http://www.bersin.com
Davis, A., Davis, J. & Van Wert, F. (1998) Effective training strategies: A comprehensive guide to maximising learning in organisations. Philadelphia: Berrett-Koehler
Ford, J. K. & Weissbein, D. A. (1997) Transfer of training: An updated review and analysis. Performance Improvement Quarterly 10(2), 22-41
Galloway, D. L. (2005) Evaluating distance delivery and e-learning: Is Kirkpatrick's Model Relevant? Performance Improvement 44(4), 21-27
Kirkpatrick, D. L. (1998) Evaluating training programs: The four levels. Philadelphia: Berrett-Koehler
Kraiger, K. (2003) Perspectives on training and development. In W. C. Borman, D. R. Ilgen & R. J. Klimoski (eds.), Handbook of Psychology: Industrial and Organisational Psychology (pp. 171-192). Hoboken: Wiley
Morrow, C. C., Jarrett, M. Q. & Rupinski, M. T. (1997) An investigation of the effect and economic utility of corporate-wide training. Personnel Psychology 50, 91-119
Phillips, J. J. (1994) Measuring return on investment. Alexandria: American Society for Training and Development
Salas, E. & Cannon-Bowers, J. A. (2001) The science of training: A decade of progress. Annual Review of Psychology 52, 471-499
Wile, N. (2009) Kirkpatrick four level evaluation model. In B. Hoffman (ed.) Encyclopaedia of educational technology. [Online] Available from http://eet.sdsu.edu
Wisher, R. A., Curnow, C. K. & Drenth, D. J. (2001) From student reactions to job performance: A cross-sectional analysis of distance learning effectiveness. Proceedings of the 17th Annual Conference on Distance Teaching and Learning (pp. 399-404). Madison: University of Wisconsin