Reviewing The State Of Workplace Based Assessments Nursing Essay

There are many ways to view workplace-based assessment (WPBA). PMETB {2005 #32} defines it as the assessment of working practices based on what doctors actually do in the clinical setting and predominantly carried out in the workplace itself. This definition includes two attributes of WPBA—it is 1) direct observation of performance and it is 2) conducted in the workplace.

WPBA is not a replacement for some other form of assessment {Swanwick, 2009 #18}. Rather, it is a vehicle for collecting quantitative and qualitative data about trainee performance from various sources and using it to provide feedback that creates learning. Consequently, many WPBA programs assume a continuum based on a developmental model of competence (e.g. expertise model of Dreyfus and Dreyfus, {Dreyfus, 1986 #36}). With this framework, WPBA is a good tool for providing feedback along the way while providing some assurance that there is ongoing growth in competence.

In some settings, however, WPBA is used to provide summative as well as formative assessment. Additional work is needed before such use is recommended. Because most of the methods rely on clinical encounters and assessors that vary in difficulty across trainees and sites, final results are not necessarily equivalent.

This chapter will review the state of the art in WPBA. Specifically, it will provide the rationale for using workplace-based assessment in view of its ability to assess authentic tasks. Because it takes place in real-life settings, WPBA offers certain advantages in terms of validity. We will also look at how learning happens in the workplace and the role that timely, effective and specific feedback can play in improving that learning.

Rationale for Workplace-based Assessment

WPBA plays an important role in assessment for four reasons. First, it targets clinical skills, which are critical to diagnosis and treatment. Second, observation and feedback are critical to learning but are lacking in clinical training. Third, the forms of assessment traditionally used in clinical education have a number of limitations. Fourth, there is a misalignment between traditional assessment and the way learning occurs in the workplace.

Clinical Skills are Important

There are many reports to suggest that a good history and physical examination can provide an accurate clinical diagnosis in a majority of cases. Hampton et al ({Hampton, 1975 #69}) reported this figure to be as high as 80%, with laboratory investigations adding only marginally to diagnostic accuracy. Peterson et al ({Peterson, 1992 #70}) reported a figure of 76% in a primary care setting. This can be considered a high figure compared to diagnostic imaging, which provided the correct diagnosis in only 35% of cases ({Kirch, 1996 #71}).

There is another aspect to this. Assessments generally focus on ‘hard’ or tacit competencies, but a number of other skills and competencies are also important for a physician. A physician often has to use both: tacit knowledge forms the basis of mental schemas, for example, but pattern recognition to arrive at a diagnosis is highly dependent on what has been called the ‘non-cognitive’ component {Norman, 2010 #117}. A recent review summarizes the variety of non-cognitive abilities that may be important in professional education {Lane, 2010 #118}. Assessment in the workplace should encourage a collaborative rather than a competitive environment; it should promote integration and interaction. Lane emphasizes that “Competence in health care professions includes nontechnical skills that tap into diverse talents and ‘intelligences.’ Evidence is growing that such skills can influence overall success in school and clinical settings”.

Interpersonal and communication skills of physicians, for example, can play a large role not only in gathering a useful history but also in building a meaningful relationship with the patient. A number of verbal and non-verbal cues can help in developing such a relationship. However, these skills do not lend themselves to assessment except by direct observation. Standardized patients may be able, to some extent, to provide information about these skills, but they may not be a substitute for direct observation of encounters with real patients. Context plays a major role in deciding the way physicians act in a given situation {Regehr, 2006 #119}. Physicians are known to perform differently in the controlled setting of the examination and the real setting of work {Rethans, 2002 #120}.

Observation and Feedback are Lacking

A meta-analysis by Hattie {Hattie, 1999 #57} remains an important study of the importance of feedback as a tool for creating learning. Summarizing the results of over 1800 studies, it demonstrated a feedback effect size of 0.79 on student achievement compared to 0.40 for overall schooling. There was variability depending on the type of feedback, and the largest effects were seen when information was provided around a specific task. In the field of medical education, a meta-analysis by Veloski et al {Veloski, 2006 #21} demonstrated a beneficial effect of feedback in 74% of the 41 studies reviewed. The magnitude of these effects increased further when feedback was combined with other educational interventions. Studies by Gipps {Gipps, 1999 #54} and Burch et al {Burch, 2006 #58} also suggest positive effects of feedback on learning behaviors.

Despite its importance as a tool to create learning, feedback based on the observation of performance does not seem to be as frequent as one might expect ({Holmboe, 2004 #59}; {Kassebaum, 1999 #60}). Observed assessment of clinical performance was used with only 7.4% to 21.3% of medical students during clinical clerkships in one study {Kassebaum, 1999 #60}, and up to 40% of students were not observed performing a clinical examination during any given clerkship in another {Colleges, 2004 #61}. Other reports also suggest that less than one third of clinical encounters are actually observed during training ({Kogan, 2006 #62}; {Daelmans, 2004 #63}). This limits the number of opportunities for providing feedback to students.

Even at the postgraduate level, where there are fewer trainees, the situation is no better. Day et al ({Day, 1990 #64}) reported that up to 80% of the postgraduates had only one observed clinical encounter and less than one third of them had more than one. Isaacson et al. ({Isaacson, 1995 #65}) reported that 80% of the trainees never, or only infrequently, received feedback following direct observation. Even in situations where feedback was provided, less than a third elicited any reflection from the trainee ({Holmboe, 2004 #59}). It seems that even when direct observation occurs, it is not being used as an opportunity to provide feedback. While assessors may not fully appreciate the role of feedback in creating learning, lack of training in providing quality feedback may be another important factor.

Concerns have also been raised regarding assessors missing student errors and thereby not being able to provide appropriate feedback ({Noel, 1992 #66}). Use of checklists may increase the error detection rate, but it does not seem to influence the accuracy of the assessors. Many assessors tend to rate marginal performance of students as superior ({Herbers, 1992 #67}; {Kalet, 1992 #68}), further limiting the utility of feedback.

Traditional Clinical Assessment is Flawed

For various reasons, assessment of postgraduate training is still not as well developed as assessment in undergraduate education. Smaller numbers of trainees, scattered resources, and the need to focus on clinical skills and performance while postgraduates are still not completely responsible for patient care all add to the problem. The focus of assessment has to be more on potential to practice rather than on actual practice {Norcini, 2010 #39}. The curriculum for postgraduate training is itself less structured, with trainees generally having more responsibilities, not only for clinical skills but also for procedural skills. In addition, it is important to assess integrated skills rather than individual ones.

To be meaningful, assessments have to be longitudinal, sample multiple areas of work, and focus on the process of learning as much as on its product. At the same time, the results should be used to provide timely and appropriate feedback that improves learning. It is also important to distinguish between competence and performance. In general, competence assessment is easier to administer. However, the perspectives of patients and society demand that doctors meet the assessment standards in their actual working conditions, in any given situation {Rethans, 2002 #120}.

Miller’s pyramid is frequently used to design assessment at various levels. While it is a good conceptual model for setting up curricula and learning experiences, its utility for assessment in practice settings is questionable {Rethans, 2002 #120}. A major issue, among others, is its failure to take into account the various factors that can influence practice. System-related influences, such as facilities and infrastructure, and individual-related influences, such as the mental state of the trainee at the time of assessment, can have a major effect on practice. It is generally agreed that assessment under examination settings can meaningfully predict future performance only if factors like efficiency and consultation time are taken into consideration {Rethans, 1991 #14}.

The major flaw of traditional assessment is its focus on the summative function without using the results to improve learning. As discussed elsewhere in this chapter, feedback, though a very useful tool for improvement, is not really used to its full potential.

Lack of Alignment with Learning in the Workplace

Research suggests that learning in the workplace is triggered by specific problems in patient care, which are solved by consulting directly available resources (colleagues, books, etc.) {Wiel, 2010 #79}. In contrast, courses and independent study are useful for learning about more general problems. This difference between on-the-spot learning and planned learning has been reported {Hoffman, 2004 #80}. On-the-spot learning is influenced by the type of patient cases dealt with, the tension between workload and the time available for teaching and learning, and the learning climate. Differences of opinion, explaining things to others, feedback and criteria for performance all seem to influence on-the-spot learning. The activities contributing to learning in medical practice seem inherent in the job of providing high-quality patient care.

Although these ideas might be unsurprising to practicing physicians, it is startling to realize that doctors do not engage as extensively in the type of deliberate practice as professionals in other, competitive domains {Ericsson, 2004 #81}. This work implies that learning opportunities in the workplace could be better recognized and more deliberately exploited. The challenge for improving diagnosis and treatment is to increase experience with challenging cases that afford more in-depth knowledge in the context of an interaction between faculty and trainees. A well-developed knowledge base that allows control by direct retrieval of relevant alternatives and enables reasoning about a patient’s problem in case of uncertainty is a prerequisite for high-quality patient care {Ericsson, 2004 #81}; {Norman, 2006 #82}.

Tools for WPBA

Commonly used assessment tools for WPBA generally fit into one of the following categories (modified from {Swanwick, 2005 #1}):

Documentation of work experience through logs, such as clinical encounter cards (CEC)

Observation of individual clinical encounters, such as the mini-clinical evaluation exercise (mini-CEX), direct observation of procedural skills (DOPS), and clinical work sampling (CWS)

Discussion of individual clinical cases, such as chart stimulated recall (called case-based discussion, or CbD, in the UK)

Feedback on routine performance from peers, coworkers, and patients, collected by survey and usually called multisource feedback. Tools for gathering these data include the mini-peer assessment tool (mini-PAT), team assessment of behaviors (TAB), and the various patient satisfaction questionnaires (PSQ).

In addition, data from these tools and information from other sources are often combined into a portfolio which serves as documentation of experiences and achievements.

Documentation of work experience

Clinical encounter cards (CEC). CECs originated as a means of documenting the learning experiences of medical students, although they are applicable across the educational continuum. In general, CECs are packets of 5 × 8 inch computer-readable cards, provided to the students at the beginning of the rotation along with an instruction booklet. The contents of the cards can be tailored to reflect the type of clinical material seen in a department {Rattner, 2001 #101}. A diagnosis list is provided in the accompanying booklet, along with a list of codes, which are used to indicate the clinical condition. The booklet also provides examples of a disease staging system (stage 1 indicates a disease with no complications, stage 2 a disease with local complications, and stage 3 a disease with systemic complications). Trainees complete the card each time they participate in the care of a patient (e.g., taking a history or performing a physical exam). Multiple encounters with the same patient are recorded on the same card. Trainees encode the age and sex of the patient, the location (outpatient or inpatient), the level of involvement and supervision, and the procedures seen or performed. The cards are scanned weekly. Reports generated twice a year are reviewed by the program directors. Individual reports and peer group comparisons are available to the students {Richards, 2008 #104}.
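
The card layout described above can be sketched as a simple record. The following Python fragment is only an illustration: the field names, the patient identifier and the diagnosis codes are hypothetical and are not taken from the original cards or booklet. Only the three-stage scheme and the rule that repeat encounters with the same patient go on the same card come from the description above.

```python
from collections import Counter
from dataclasses import dataclass, field
from enum import IntEnum


class DiseaseStage(IntEnum):
    """Three-stage scheme as described in the text."""
    NO_COMPLICATION = 1
    LOCAL_COMPLICATIONS = 2
    SYSTEMIC_COMPLICATIONS = 3


@dataclass
class EncounterCard:
    # All field names below are hypothetical stand-ins for the card's boxes.
    patient_id: str
    diagnosis_code: str          # code looked up in the booklet's diagnosis list
    stage: DiseaseStage
    age: int
    sex: str
    location: str                # "outpatient" or "inpatient"
    involvement: str             # level of involvement and supervision
    procedures: list = field(default_factory=list)
    encounters: int = 1          # number of encounters recorded on this card


def record_encounter(cards: dict, card: EncounterCard) -> None:
    """Store a new card, or fold a repeat encounter into the patient's existing card."""
    existing = cards.get(card.patient_id)
    if existing is None:
        cards[card.patient_id] = card
    else:
        existing.encounters += 1
        existing.procedures.extend(card.procedures)


def diagnosis_counts(cards: dict) -> Counter:
    """Count how many cards (patients) carry each diagnosis code."""
    return Counter(card.diagnosis_code for card in cards.values())
```

A tally such as `diagnosis_counts` is the kind of summary from which periodic reports and peer group comparisons could be built.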

Experience has shown a good degree of concordance between the diagnoses entered on the patient charts and those coded by the trainees. When secondary diagnoses were also considered, the concordance rate reached almost 97%. The authors {Rattner, 2001 #101} also reported similar patterns of diagnostic reporting, indicating reliability of the method.

Analysis of the trends generated by this method can help to identify disease patterns not seen by students during a particular year, allowing remedial action. Keely et al {Keely, 2010 #107} used this tool as a teaching encounter card, wherein students provided feedback on the teaching skills of the faculty. Kim et al {Kim, 2005 #103} modified these cards to record oral case presentation skills and required the trainees to have one card completed per week following a case presentation, which was graded on 9 competencies using a 9-point scale. Feedback was provided based on these ratings, which correlated with performance on other tools. Greenberg {Greenberg, 2004 #106} reported in a qualitative study that CEC significantly improved the quality of feedback. Students’ perception of feedback was also positively influenced {Ozuah, 2007 #105}. Although the generalizability of these findings can be challenged, the common element that emerges is assessment following direct observation.

Despite the different uses made of these cards, the common thread running through them is the observation and assessment of the trainees followed by contextual feedback and remedial action. The quality of feedback improves following use of these cards and they can also serve as a means of helping faculty to improve their teaching {Paukert, 2002 #17}.

Observation of single clinical encounters

Mini-CEX: mini-CEX is a snapshot observation of a trainee in an actual practice setting. The trainee engages in a patient care activity (e.g. data gathering, physical examination, counseling etc.). The assessor observes the performance, scores it, and then provides educational feedback. Usually the encounter takes 10-15 minutes and another 5-10 minutes may be spent on providing feedback to the trainee.

Trainees are assessed many times during the course of their training by many assessors. The mini-CEX assessment focuses on the core clinical skills that trainees demonstrate with real patients {Norcini, 2003 #37}. It can be easily implemented in a variety of clinical settings and can therefore be integrated into the normal working pattern.

For the mini-CEX, a global rating is given rather than the checklist-based recording typical of the OSCE. Whereas checklists usually capture a nominal rating (right or wrong), global ratings capture ordinal information. In the process, some subjectivity may be involved. In fact, the assessor can use his/her discretion and calibrate the ratings according to the performance expected depending on the level of trainee, case setting (ambulatory, emergency etc.), and complexity of the patient’s problem(s). This makes the whole process more authentic.

The mini-CEX was first used to assess trainees during postgraduate training programs in internal medicine. A structured format was used for ratings, which were made on a 9-point scale, with 1-3 considered unsatisfactory, 4-6 satisfactory, and 7-9 superior. Performance was assessed across a number of dimensions, including interviewing skills, physical examination, clinical judgment, counseling, organization and efficiency, and overall competence. There was flexibility in the sense that not all domains were assessed during all encounters; specific areas were assessed depending on the context and place. Aggregation of ratings across domains and across cases was guided by the purpose of assessment.
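
Because not every domain is rated in every encounter, any aggregation has to cope with missing ratings. The sketch below shows one simple possibility, a per-domain mean over the encounters in which that domain was actually rated. It is an assumption made for illustration, not the aggregation rule of any particular program, and the domain keys are hypothetical labels for the dimensions listed above.

```python
from statistics import mean

# Hypothetical keys for the dimensions named in the text.
DOMAINS = (
    "interviewing",
    "physical_examination",
    "clinical_judgment",
    "counseling",
    "organization_efficiency",
    "overall_competence",
)


def domain_means(encounters):
    """Average each domain over the encounters in which it was rated.

    `encounters` is a list of dicts mapping a domain to a 1-9 rating.
    A domain that is absent (or None) was not assessed in that encounter.
    Domains never assessed are reported as None rather than as zero.
    """
    summary = {}
    for domain in DOMAINS:
        ratings = [e[domain] for e in encounters if e.get(domain) is not None]
        for rating in ratings:
            if not 1 <= rating <= 9:
                raise ValueError(f"rating {rating} for {domain} is outside 1-9")
        summary[domain] = mean(ratings) if ratings else None
    return summary
```

For example, two encounters rated `{"interviewing": 4, "counseling": 6}` and `{"interviewing": 6, "physical_examination": 5}` give a mean of 5 for interviewing and leave clinical judgment unreported.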

Since its introduction, there has been considerable experience with the use of the mini-CEX. Norcini et al reported a number of presenting complaints covering almost all systems and various contexts (outpatients, emergency, behavioral problems etc.) which have been assessed using mini-CEX. It has also been useful in assessing trainees when patients have multiple problems.

There is increasing realization that mini-CEX shows acceptable correlation with other methods of assessing competence. Kogan et al {Kogan, 2003 #41} reported a modest correlation with examination scores, clerkship ratings and final course grades. Similarly Durning et al {Durning, 2002 #42} reported correlation between individual components of mini-CEX and corresponding evaluations by faculty as well as the results of an in-training examination. Boulet et al {Boulet, 2003. #15} demonstrated a good predictive relationship between standardized patients (SP) checklists and global faculty ratings as well as with faculty ratings of communication. Holmboe et al {Holmboe, 2004 #59} demonstrated successful discrimination of videotaped performance into unsatisfactory, satisfactory or superior using mini-CEX forms.

Mini-CEX demonstrated a modest correlation with overall future clinical performance, as judged by standardized patients (SPs), in both the short and the long run {Ney, 2009 #83}. However, since the SP exams were used for high-stakes purposes in the study, trainees may have had different degrees of motivation, and this may have affected the correlations. A number of other studies have demonstrated the mini-CEX’s ability to discriminate between different levels of performance.

A number of changes have been made to the mini-CEX process over the years to make it more useful in different settings and in different countries. Its use has been extended to various settings in undergraduate as well as postgraduate training. The basic character of the tool has, however, remained unaltered: making judgments on the basis of an observed clinical encounter and following those judgments with feedback. As with all other methods, task specificity limits the generalizability of a single encounter, so multiple observations in different settings and with different faculty members are needed.
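
One standard psychometric way to see why multiple observations are needed is the Spearman-Brown formula, which projects the reliability of an average of k comparable encounters from the reliability of a single one. The sketch below is not drawn from the studies cited in this chapter. The single-encounter reliability of 0.3 and the target of 0.8 used in the note that follows are hypothetical values chosen only for illustration, and the formula assumes the encounters behave as parallel measurements, which real clinical encounters only approximate.

```python
import math


def spearman_brown(single_reliability: float, n_encounters: int) -> float:
    """Projected reliability of the mean of n comparable encounters."""
    r = single_reliability
    return n_encounters * r / (1 + (n_encounters - 1) * r)


def encounters_needed(single_reliability: float, target: float) -> int:
    """Smallest number of encounters whose projected reliability reaches the target."""
    if not (0 < single_reliability < 1 and 0 < target < 1):
        raise ValueError("reliabilities must lie strictly between 0 and 1")
    r = single_reliability
    # Spearman-Brown solved for the number of encounters.
    k = target * (1 - r) / (r * (1 - target))
    return max(1, math.ceil(k))
```

With the hypothetical values above, `encounters_needed(0.3, 0.8)` gives 10: nine encounters project to a reliability of about 0.79 and ten to about 0.81.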

Direct Observation of Procedural skills: DOPS was designed to provide formative assessment and feedback on procedural skills. It is completely analogous to the mini-CEX with a focus on procedures rather than clinical skills. The assessors observe a procedure, rate it, and then provide developmental feedback. The observation typically lasts for 10-15 minutes with 5-10 minutes of feedback.

The procedures are generally selected from a predetermined list which often includes those commonly used in practice. Understanding of the indications, techniques, asepsis, analgesia and communication are assessed. Trainees may also be asked about their earlier experience of performing similar procedures.

Logistics and measurement characteristics of DOPS are more or less as for the mini-CEX. A trainee undergoes multiple assessments with different procedures and different assessors. Consultants, senior registrars, specialists, general practitioners or even nurses can act as assessors.

Global ratings of procedural skills have been shown {Larson, 2005 #121} to produce valid results and assessors are able to distinguish between various levels of performance. Other studies {Goff, 2002 #122} have also shown that global ratings can distinguish between levels of performance.

Clinical work sampling (CWS): CWS was developed as a tool to capture assessment information as the opportunity arises, without recourse to retrospective recall. Many such opportunities occur in daily interactions among health professionals but go largely unreported. Various forms are used to capture the essential points in trainees’ interactions {Turnbull, 2000 #109}. Although there may be some local modifications, these broadly consist of an admission rating form, a ward rating form, a multidisciplinary rating form and a patient rating form. The first two are completed by the supervisors, while the multidisciplinary team and the patients complete the other two. Generally one of each form per patient is required.

Experience with this method has been mixed. While two-thirds of the other forms were returned, the patient rating forms were returned for only one-tenth of the situations. Admission and ward rating forms provide acceptable reliability with reasonable numbers of assessments.

Discussion of individual cases

Case based discussion (CbD): This is a variation of what was called chart stimulated recall (CSR). In CbD, the trainee is required to select 2-3 cases they have seen recently and give the patient records to the assessor in advance of the assessment. Since the discussion is case based, trainees are sometimes encouraged to select patients they have seen a number of times. The case should represent a patient problem for which the trainee should be competent. The assessor selects one case (or two if time permits) for discussion. A statement on why that case has been chosen and what competencies it allows to be demonstrated helps the assessor in choosing.

The discussion is generally focused on one aspect of the case, for example the choice of investigations or a particular therapeutic method. The assessor probes the reasoning behind the actions taken by the trainee. As a general rule, the discussion revolves around the case and does not extend to hypothetical situations. The trainee is rated on one of a number of different scales, such as the four-point scale used by {Mehay, #84}, which comprises a) insufficient evidence, b) needs further development, c) competent, and d) excellent. A prerequisite for making such an assessment is a description of the competencies in each of the assessed areas, appropriate for the level of training.

CbD offers certain advantages in assessing clinical reasoning skills since it encourages a focus on the application of knowledge, decision making, and other related issues. Since record keeping is an important activity as well, its evaluation is also included. Like other tools used for WPBA, CbD uses a single encounter for making judgments about the quality of clinical acumen, investigations, treatment, referrals and ethical issues.

There are many similarities with other tools of WPBA. The patient problems can be selected from a core list. There are supposed to be multiple encounters, each lasting 10-15 minutes with 5-10 minutes of feedback. Consultants, senior registrars, specialists or general practitioners can act as assessors. The feedback after the encounter focuses on strengths, suggestions for development and an action plan.

There are some subtle differences between CbD and what is generally known as a ‘case presentation’. Whereas during a case presentation, ‘what would you do next’ is the usual question, CbD looks at what the trainees have actually done. Hence, CbD is not a replacement for a case presentation because they focus on different aspects of competence. CbD is not intended as a test of knowledge, or as an oral or clinical examination. It is intended to assess the clinical decision-making process and the way in which the trainee used medical knowledge when managing a single case.

CbD aims to explore professional judgment exercised in clinical cases. Recognizing uncertainty, application of medical knowledge, application of ethical frameworks and ability to prioritize options are some of the components addressed during a CbD. The question-answer portion of the encounter is meant primarily to help the assessors gather evidence rather than to teach.

There have been reports on the validity of CbD from North America. The score distribution and pass rates were consistent with other methods including oral examinations and record audit {Norman, 1989 #86}. CbD also correlated well with an SP examination and was able to discriminate between levels of performance. Solomon et al. {Soloman, 1990 #85} also found reasonable correlation with an oral exam and, when combined with an oral exam, good correlations with written and oral exams conducted 10 years earlier.

Mini-Peer Assessment Tool (mini-PAT): Peers are a good source of information about one’s performance and have been in use for many years. In the recent times, there has been a renewed interest in them and in making the collection of this information more systematic.

Mini-PAT is one such tool. It has been modified from the Sheffield Peer Review Assessment Tool (SPRAT), an established multisource feedback instrument for assessing senior doctors {Davies, 2005 #88}, and its content has been validated against the Foundation Program curriculum in the UK ({Archer, 2005 #87}). In effect, it requires trainees to obtain anonymous feedback (collected centrally) from their co-workers against a range of competencies. The trainee completes a self-rating and receives the aggregated ratings of their peers alongside national norms. The focus is on assessing professional competence within an environment of teamwork. Comments made by assessors are also included anonymously but verbatim. These are then reviewed together by the trainee and the supervisor to agree on strengths, areas for improvement and an action plan. This process is undertaken twice a year. Like most other tools for WPBA, mini-PAT is meant to provide formative feedback to the trainee and is not intended to be used for summative purposes.
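
The comparison a trainee receives, a self-rating set beside aggregated peer ratings and a national norm, can be sketched as follows. This is an illustrative assumption about how such a report might be assembled, not the actual mini-PAT scoring procedure. The competency names, the use of a simple mean and the structure of the report are all hypothetical.

```python
from statistics import mean


def feedback_report(self_ratings, peer_ratings, national_means):
    """Set a trainee's self-rating beside the peer mean and national norm.

    self_ratings:   dict of competency -> the trainee's own rating
    peer_ratings:   list of dicts, one per anonymous co-worker
    national_means: dict of competency -> national norm (hypothetical input)
    """
    report = {}
    for competency, self_score in self_ratings.items():
        # Co-workers may leave a competency unrated; use only ratings given.
        scores = [r[competency] for r in peer_ratings if competency in r]
        peer_mean = mean(scores) if scores else None
        report[competency] = {
            "self": self_score,
            "peer_mean": peer_mean,
            "national_mean": national_means.get(competency),
            # A positive gap means the trainee rated themselves above their peers' view.
            "self_minus_peer": None if peer_mean is None else self_score - peer_mean,
        }
    return report
```

Only aggregated figures appear in the report, which is consistent with feedback being collected centrally and returned anonymously. Gaps between the self-rating and the peer mean are natural starting points for the discussion between trainee and supervisor.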

Peer assessments have been used for a long time now in a number of programs. Their use has been mixed. At some places, most of the negative reports about trainee behaviors have emanated from peers while at others, this type of assessment has been used to recognize trainees with outstanding professionalism.

The validity of mini-PAT has been established in a number of publications. Ramsey et al {Ramsey, 1993 #108} found higher mini-PAT scores among certified doctors as compared to non-certified ones. SPRAT, the predecessor of mini-PAT, has also been shown {Archer, 2005 #87} to be feasible, reliable and not susceptible to influence by extraneous factors.


Portfolios

A portfolio serves as a tool for collecting and presenting evidence of learning and competencies at all levels of training. It has been defined as “a collection of evidence that is gathered together to show a person’s learning journey over time and to demonstrate their abilities” {Rees, 2005 #113}. In a way, portfolio is a general term, since it can include many different types of content depending on the purpose for which it is being used. It can contain educational experiences (procedures, CMEs, conferences etc.), reflections on those experiences, publications, critical incidents, performance on WPBA methods, or MCQ tests {Norcini, 2010 #39}. The information may pertain to a single encounter or to an extended period of time, and may be included either as representative samples or in its entirety. Portfolios can be paper based or electronic.

Portfolios can be evaluated by assessors, who make judgments about them. These judgments may relate to the occurrences, qualities, or fitness of the educational experiences and outcomes of a trainee. The power of the portfolio lies in its reflective component, which sets it apart from log-books or dossiers {Izzat, 2007 #112} in that it allows learning from actions.

Portfolios are a response to the felt need that assessment should enhance and support learning as well as measure performance {DAVID, 2001 #114}. If done well, they counteract the reductionist approach to assessment by facilitating a broad view of complex and integrated abilities while simultaneously taking account of the level and context of learning. Portfolios can provide evidence for learning and progress towards desirable educational outcomes.

Portfolios have high face validity since they are a direct reflection of what is actually required to be done in the workplace. Their validity and reliability depend entirely on the quality of their contents. At our present level of understanding, portfolios are best used for formative purposes. It is possible to increase their quality by making them more standardized, developing acceptable scoring criteria and training both trainees and assessors. It has also been suggested that using qualitative methodologies such as triangulation, member checking, and prolonged engagement will increase their dependability {Driessen, 2005 #111}.


Feedback

Based on direct observation, feedback is the backbone of WPBA. Providing timely and high-quality feedback is crucial to learning, especially in areas related to patient care. In general, feedback is provided by an observer who compares the performance of the trainee with an expected standard. The expected performance standards may be either explicitly stated or, in some cases, arise out of the observer’s own professional judgment. The trainee is expected to reflect upon this information and then use it to reshape behavior. The effect of feedback on physician performance has been fairly well researched, and the BEME review identified 683 studies on the topic published between 1966 and 2003 {Veloski, 2006 #21}.

Since it is the trainee who ultimately has to use feedback, it is important to present it in a form that makes it acceptable and actionable. Although one of the widely held beliefs about feedback is that it should be non-judgmental, some degree of judgment is always involved if it is to be useful. However, the person giving the feedback can present it in a way that is not off-putting. If it is too negative, there is a danger of the trainee shutting down or becoming defensive and attempting to justify the actions. Learners generally welcome good feedback {Gordon, 2003 #22}.

Various models of giving feedback have been described. The simplest one is the ‘sandwich model’, where the observer tries to provide critical feedback between layers of praise. While this has the appeal of simplicity, it is now recognized that, following a line of praise, the trainee generally recognizes what is coming next and may not pay attention to the positives. This takes away the reinforcing utility of feedback. Pendleton’s framework is another model commonly used for this purpose. In this, the trainee first identifies what went well. Next the trainee indicates what could have been done better and how the performance could be improved. The trainer then provides suggestions for improvement. In the end, both the trainer and the trainee agree on areas for improvement and formulate an action plan. While this model is theoretically sound, it is criticized for being too rigid. There is sometimes a comparison of what was done well with what was not done well, taking away the non-judgmental nature of the feedback. As with the sandwich model, trainees can predict what is coming and may even be insincere in reporting and listening to what went well.

It is interesting to see an analogy between doctor-patient communication and educational feedback. In clinical communication, the first step is to build a therapeutic relationship through empathy and rapport. In educational feedback, one needs to build a climate of trust and comfort for the learner by being objective and reducing emotionally charged situations {Kaprielian, 1998 #24}. The rapport building skills used in a clinical situation can also be used in the educational setting. The ‘PEARLS’ mnemonic {Prochaska, 1992 #23} (Partnership for joint problem solving; Empathetic understanding; Apology for barriers to learner’s success; Respect for learner’s values; Legitimation of feelings; and Support for efforts at correction) is a useful framework for this.

Whatever model of feedback the observer chooses, it is important to own the feedback through the use of ‘I’ statements. It is equally important to be clear and to avoid general statements. Using the current observation as the basis of feedback avoids inferential comments. Other attributes of good feedback include being descriptive rather than evaluative (‘you did not make eye contact with the patient’ rather than ‘you were not interested in the patient’), avoiding interpretation (‘I think you meant’) and advice giving (‘I think you should’), and focusing on behavior that can be changed.

Quality assurance

Compared with paper and pencil tests, WPBA is more prone to measurement errors, which can compromise its validity and reliability. The problem is further compounded by relatively small sample sizes. While a number of psychometric methods can be applied to improve the reliability of such assessments {Boulet, 2003. #15}, some believe {Govaerts, 2007 #16} that the approach to WPBA should take a constructivist, social-psychological perspective and integrate elements of theories of cognition, motivation and decision making. A central assumption in this proposition is that performance assessment is a judgment and decision making process, in which rating outcomes are influenced by interactions between individuals and the social context in which assessment occurs.

Research has shown many problems related to the reliability and accuracy of performance ratings ({vanderVleuten, 2000 #92}). Assessors tend to give above average ratings, which fail to distinguish between students despite obvious differences in performance ({Nahum, 2004 #93}). Raters appear to use a one- or two-dimensional concept of performance and do not tend to distinguish between more detailed performance dimensions ({Silber, 2004 #94}). Leniency, halo effects and range restriction are other rater errors that contribute to the low reliability of performance ratings. There are reports of a lack of rating consistency between raters and within raters across different occasions, with reliability coefficients approaching zero ({Littlefield, 1991 #99}). Even the validity of interpretations could be doubted because of content specificity and small sample sizes.

Govaerts et al {Govaerts, 2007 #16} have argued that many of these issues stem from the quantitative psychometric framework, which aims at getting a ‘true’ score reflecting ‘true’ performance. Psychometrics demands a specified level of consistency from a technically sound measurement and assumes error when measurements fail to give consistent results. Trainees’ ability is assumed to be fixed, permanent and acontextual, and any changes due to context or rater interactions are considered unwanted sources of bias. Efforts to improve in-training assessments have been made from this perspective and include the development of precise rating formats and rater training to improve consistency.

Studies on expertise provide some interesting insights into the design of WPBA systems. Task specific expertise is a key variable in information processing. Experts demonstrate rapid, automatic pattern recognition but are likely to take more time to gather and analyze information when confronted with unfamiliar problems {Murphy, 1986 #77}. Studies on teacher supervision {Kerrins, 2000 #78} have shown that inexperienced supervisors provided literal descriptions while analyzing verbal protocols, whereas experienced ones interpreted their findings and made evaluative comments. Experts focused on learning; non-experts focused on discrete teaching. Experts spend time monitoring, gathering and analyzing information; non-experts focus on providing the correct solution. Experts have more elaborate and well-structured mental models, replete with contextual information.

More enriched processing and better incorporation of contextual cues by experienced raters can result in qualitatively different, more holistic feedback to trainees, focusing on a variety of issues. In addition, thanks to more elaborate performance scripts, expert raters may rely more often on top-down information processing or pattern recognition when observing and judging performance, especially when time constraints and/or competing responsibilities play a role {Govaerts, 2010 #76}. This may come at the expense of recall of specific behaviors and aspects of performance. Optimization of WPBA may therefore require rating procedures and formats that force raters to elaborate on their judgments and substantiate their ratings with concrete and specific examples of observed behaviors.

Assessment of competence under examination circumstances can have higher predictive value for performance in actual practice when factors such as efficiency and consultation time are taken into account. Below-standard performance of physicians does not necessarily reflect a lack of competence: performance and competence are two distinct constructs {Rethans, 1991 #14}. This has important implications for assessment design. What really matters is how physicians perform in actual practice rather than what they are capable of doing; therefore, assessments in real settings are likely to reflect and predict future performance better than standardized assessments.

Faculty development remains an important strategy for maintaining the quality of WPBA. The two areas that need to be developed are assessment and the provision of feedback. Some good examples of focused training programs are available {Holmboe, 2004 #123}; these target behavioural observation with minimum obtrusiveness, performance dimensions and frame of reference training. Such training has been reported to have a positive impact on the process of WPBA. Provision of educational feedback is a vital component of many of the tools used for WPBA, and efforts to improve the quality of feedback will help make WPBA more effective.

Advantages of WPBA

Assessment is most commonly associated with ‘having to prove’, wherein the assessor tries to prove a hypothesis about the knowledge and skills of the trainee. However, an equally important function of assessment is to ‘improve’ learning. Even though there may not yet be enough evidence that WPBA has better predictive utility, it makes sense, at least theoretically, to use this form of assessment as a tool for better learning.

An advantage of the workplace-based methods is that they fulfill the three basic requirements for assessment techniques that facilitate learning (Frederiksen 1984; Crooks 1988; Swanson et al. 1995; Shepard 2000): (1) the content of the training program, the competencies expected as outcomes, and the assessment practices are aligned; (2) trainee feedback is provided during and/or after assessment events; and (3) assessment events are used to steer trainee learning towards the desired outcomes. The potential of WPBA to foster and shape learning seems to be its biggest strength. Over the past several years there has been growing interest in workplace-based assessment.

WPBA has generally been seen from an assessment perspective as being analogous to class tests. In this role, it has many strengths, especially as a tool to foster learning. However, this also puts it at a disadvantage, as ensuring equivalence amongst institutions or even amongst different assessors may be problematic.


While it is true that most of the tools used for WPBA are unstandardized, they are not low in terms of utility. Moreover, their reliability can be improved by sensible and expert use of these tools. In addition, adequate and representative sampling of competencies will improve reliability just as it does for OSCEs and MCQs. Triangulating the results with those obtained from other instruments and from out-of-workplace assessments also helps in building confidence.

Reports on the reliability of these instruments are rather interesting {Pelgrim, 2010 #100}. They often require 8-12 encounters to reach an acceptable level of reliability, which compares very well in terms of testing time with standardized and objective instruments. This is because they pick up information that is generalizable across encounters; broad sampling of assessors may even out any inter-rater differences.
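The relationship between the number of encounters and reliability can be illustrated with the Spearman-Brown prophecy formula, a standard psychometric result. The Python sketch below is only an illustration under stated assumptions: it treats encounters as parallel measurements (a simplification, given content specificity), and the single-encounter reliability of 0.30 and the target of 0.80 are hypothetical values, not figures taken from the review cited above.

```python
import math


def spearman_brown(single_reliability: float, n_encounters: int) -> float:
    """Projected reliability of the mean score over n parallel encounters."""
    if not 0 < single_reliability < 1:
        raise ValueError("single_reliability must lie strictly between 0 and 1")
    if n_encounters < 1:
        raise ValueError("n_encounters must be at least 1")
    r = single_reliability
    return n_encounters * r / (1 + (n_encounters - 1) * r)


def encounters_needed(single_reliability: float, target: float) -> int:
    """Smallest number of encounters whose projected reliability meets the target."""
    if not 0 < single_reliability < 1 or not 0 < target < 1:
        raise ValueError("reliabilities must lie strictly between 0 and 1")
    r = single_reliability
    # Spearman-Brown solved for n: n = target * (1 - r) / (r * (1 - target))
    return math.ceil(target * (1 - r) / (r * (1 - target)))


if __name__ == "__main__":
    # Hypothetical inputs chosen for illustration only.
    n = encounters_needed(single_reliability=0.30, target=0.80)
    print(n, round(spearman_brown(0.30, n), 3))
```

With these hypothetical inputs the sketch suggests 10 encounters, which falls within the 8-12 range reported; a lower single-encounter reliability would push the number up.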

While it may be possible to recreate a given clinical situation using simulations and then test it using standardized tools, there are concerns that this encourages the introduction of indirect or surrogate measures {Sadler, 1987 #9}. However, low reliability by traditional standards is one of the reasons why less reliance is placed on WPBA. Reliability of work based assessments remains an issue because ‘they depend on subjective judgement of unstandardized material’ {Southgate, 2001 #12}. Reliability of ‘subjective’ assessments can, however, be improved by a number of interventions {Wass, 2004 #10}. Increasing the size and representativeness of the sample, increasing the number of assessors, using subject experts as assessors, and assessor training and calibration can all be expected to improve reliability. It may also be time to take a fresh look at the issue of reliability {Gipps, 1994 #11} as ‘comparability’ or ‘dependability’, especially when dealing with performance testing.

WPBA has the appeal of being able to directly, relevantly and holistically measure the elements that go into the making of clinical competence. The challenge, however, is how to make this form of assessment more acceptable and dependable. The silver lining is that methodology has come full circle: from patient based examinations to simulations and back to real patients {L W Schuwirth, 2003 #13}.

Strategies to improve WPBA have typically followed the psychometric approach, focussing on precision of measurement and trying to limit uncontrollable variables. Low inter-rater reliability, halo effects, leniency and range restriction have been reported for WPBA ({Kreiter, 2001 #72}; {Gray, 1996 #73}; {Silber, 2004 #74}). Attempts to improve objectivity and standardization through various means have met with only limited success ({Williams, 2003 #75}).

Another important issue concerns trainees who are rated unsatisfactory during WPBA. For them, alternative methods of assessment may be needed, especially to provide remediation and diagnostic feedback. Traditional tests of knowledge and/or skills, and sometimes even SP examinations, may be required. It should also be noted that, because of role conflict, many assessors may be reluctant to give trainees a negative rating.

Trust in and acceptance of WPBA by raters and trainees may be a crucial factor in making it more useful. Authenticity, fairness, honesty, transparency and quality feedback have been cited as some of the factors that promote trust ({Messick, 1994 #97}; {Piggot-Irvine, 2003 #98}). The concept of trust may also be related to consequential validity.

Assessment of performance in the workplace is a complex task because it is affected by a number of factors and is highly contextual. Doctors work in their own environments and their performance depends increasingly on how well they function in teams and how well the health care system around them functions. Rater behavior and rating outcomes have been shown to be influenced by contextual factors. From this perspective, WPBA is not really a measurement issue; rather, it is a decision making issue. It has been suggested {Govaerts, 2010 #76} that WPBA may benefit more from a better understanding of raters’ reasoning and the cognitive processes they engage in while assessing than from the quantitative properties of scores.

WPBA has the advantage of assessing performance under actual work conditions, climbing to the third and fourth levels of Miller’s pyramid. It also allows a perfect fit between curriculum, instruction and assessment. The feedback loop makes early and appropriate remedial action possible.

The utility of any assessment, whether an individual tool or an entire program, can be conceptualized as a product of its validity, reliability, feasibility, acceptability and educational impact (van der Vleuten, 1996). It follows that there is invariably a trade-off between the various components. Attempts to increase reliability by tightly structuring the assessments may reduce their educational impact. What WPBA loses in terms of standardization, it gains in terms of educational impact by providing developmental feedback to the trainee. In contrast to standardized, externally administered tests at the end of the program, WPBA integrates teaching, learning and assessment. It has been rightly pointed out that WPBA is ‘built in’ to the program rather than being a ‘bolt-on’ {Swanwick, 2009 #18}.
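The multiplicative form of the utility model can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the unweighted product is a simplification of the conceptual model, and every component score below (on a 0 to 1 scale) is a hypothetical value invented to show the trade-off, not an empirical estimate for any real assessment.

```python
from math import prod
from typing import Mapping

COMPONENTS = (
    "validity",
    "reliability",
    "feasibility",
    "acceptability",
    "educational_impact",
)


def utility(scores: Mapping[str, float]) -> float:
    """Unweighted product of the five utility components (each scored 0 to 1)."""
    missing = [name for name in COMPONENTS if name not in scores]
    if missing:
        raise ValueError(f"missing components: {missing}")
    return prod(scores[name] for name in COMPONENTS)


# Hypothetical profiles, for illustration only.
tightly_structured = {
    "validity": 0.6, "reliability": 0.9, "feasibility": 0.7,
    "acceptability": 0.7, "educational_impact": 0.4,
}
workplace_based = {
    "validity": 0.8, "reliability": 0.6, "feasibility": 0.7,
    "acceptability": 0.7, "educational_impact": 0.8,
}

if __name__ == "__main__":
    print("tightly structured:", round(utility(tightly_structured), 3))
    print("workplace based:", round(utility(workplace_based), 3))
```

Because the components are multiplied, a score of zero on any one of them reduces overall utility to zero, and a gain in reliability can be offset by a loss in educational impact.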

There has been renewed interest in using assessment as a tool for improvement. Typically, formative assessment is low stakes and opportunistic, and is intended to stimulate learning. It provides specific, actionable feedback in an ongoing and timely manner. If this is the purpose of formative assessment, then the need for high reliability is lower than the need for educational impact. This conceptualization of the criteria for good assessment has been recently endorsed by an expert group (

A review {Pelgrim, 2010 #100} of the properties of direct observation assessment yielded interesting results. The feasibility studies have generally been positive. Although the evidence is weak, training seems to improve their effective implementation. Skills of providing feedback and ability to reliably discriminate between levels of performance have been identified as key areas for training. Learner training is likely to increase the educational effect of such assessments. Mini-CEX and CEC have been reported to show strong criterion validity. Increase in scores over
