1. Critically review a test item that you have designed for inclusion in an assessment instrument
It is nearly two decades since the St Vincent Declaration (1989) called for a marked reduction in morbidity from diabetes-related problems, to be achieved through better patient management. The available evidence suggests that the process of care in Britain is still very variable in quality. Mason et al. (1999) reviewed the evidence on diabetic foot care and provided an important message: vigilant and trained health care professionals can identify the emerging risk factors for ulceration at relatively low cost. The National Institute for Clinical Excellence (NICE, 2004) clinical guideline on the prevention and management of foot problems in type 2 diabetes recommends that ‘healthcare professionals and other personnel involved in the assessment of diabetic feet should receive adequate training’.
Hence, it is imperative that Diabetes Specialist Registrars [SPRs], the future diabetologists who look after diabetic feet in the community and in secondary care, are trained and adequately assessed if they are to make an impact on patient outcomes. Thus, the need arises for a CME [Continued Medical Education] training day on the management of the diabetic foot, followed by an assessment, before they treat patients with these problems. The main aim of the assessment is to optimise the capabilities of all learners and practitioners by providing motivation and direction for future learning. The assessment in this case has to be formative, guiding future learning by providing reassurance, promoting reflection and shaping values.
At the end of the training day (Appendix 2) the plan is to conduct an assessment made up of 3-4 modified, observed long case items and a knowledge-based assessment. One such item, an observed modified long case, is described in Appendix 1.
A real case is used to illustrate some of the day-to-day difficulties and uncertainties one faces in clinical judgement. The time breakdown prepares candidates to think about the various aspects of the case within a given time. They are assessed on different generic skills as well as on speciality knowledge and management of the case. Appendix 1 [page 24-26]
Properties – Even though it is used as an item here, the long case has traditionally been used as a summative tool in its own right, and its properties in that setting have been studied extensively. Hence, properties such as reliability and validity are discussed only briefly here, along with item-level properties such as facility.
The reliability of a long case can be improved by structuring it, i.e. by standardising the elements of discussion and questioning (Olson et al 2000, Wass and Jolly 2001). Observation and structuring are therefore applied here to improve reliability. This also increases students’ perception of fairness.
Validity – This can be addressed by introducing examiners who observe trainee performance throughout, so that the trainee is observed by the team in all parts of the long case (Olson et al 2000).
Facility – The item could be made a low-facility (i.e. difficult) one, depending on the complexity of the case. Discrimination – This is not really applicable, since the item will be used as part of a formative tool, but the item can be made highly discriminating by using global assessment. The other properties of the assessment tool are not discussed, as it is used mainly as an item.
Item Response Theory – This is not discussed here as it is mainly used in the context of Multiple Choice Questions. There is not much in the literature about the strengths and weaknesses of the long case when used as an item in a formative assessment tool. So here is an attempt to look at both the positive aspects and potential drawbacks of this item should this be used in the context of this formative assessment.
The positive aspects of this item – Observed Modified Long Case
Holistic and Robust – The case used in this item is real, and the item looks at competency and some aspects of performance.
Multidisciplinary Approach – The item mimics how clinics are conducted in most hospitals and assesses candidates’ knowledge across specialities.
Written and Verbal Constructive Feedback – Ende (1983) proposed that feedback should be expected and well timed, based on first-hand data (observable behaviour) and regulated in quantity. A systematic review by the Best Evidence in Medical Education (BEME) collaboration (Veloski, et al., 2006) confirmed the positive impact of the feedback process, with the most marked effects when feedback is provided by an authoritative source. These principles are strictly followed in this item. The method of feedback can be either Pendleton or ALOBA (Agenda-Led Outcome-Based Analysis).
Some of the potential drawbacks of this item and how they can be overcome
Risk of assessing the short term memory – This argument stems from the fact that the assessment is done just after the CME program as suggested.
Based on Bligh (2000), retention is greater when learners are tested soon after the learning experience. Hence the counter-argument: administering the item immediately after the CME should increase retention.
Resources and Standardisation – The number of people involved, including specialists, and the organisation of the day will require resources, including money, but some of the funding could come from the regional educational fund for SPRs. Some contribution could be arranged from other sources, including support from pharmaceutical companies. Since this is a formative assessment tool intended to promote the professional development of the SPRs, it need not be rigidly standardised.
Why rigid time limits and not just global assessment? – The rigid structure gives each of the specialists an opportunity to observe and to feed back. In real life, the time spent on each task may differ, but stringent time limits also make it possible to observe how candidates perform under time constraints. Global assessment is avoided so that feedback is broken down by component, which makes it meaningful and points to specific areas for improvement.
Bias – Some of the examiners and experts will have taught these SPRs during the course. Having observed the candidates’ interactions over the day, they may be biased and may want particular candidates to do well, or otherwise, in the assessment. This can be minimised by formal training of the examiners.
Organisational consideration – The main difficulty will be getting the team of assessors together as given in the item and making sure they are trained in giving constructive feedback to the candidates. Getting a real life patient to participate in such a scenario might not be difficult as patients are often happy to share their experiences.
If used in conjunction with a variety of other items, it could perhaps serve as a cog in the wheel of summative assessment for high-stakes purposes, e.g. as part of the portfolio in the final year Rotation In Training Assessment [RITA].
It fulfils the learning objectives of the course and what the item intends to measure.
This item also comes close to what is usually measured in items of work place based assessment [WPBA].
Element 2 – Assessment option
Using relevant theoretical and/or research literature, critically review one instrument of assessment used in clinical education. LAP and Modified LAP
BACKGROUND – In the traditional long case, candidates spend 30-45 minutes with a patient, from whom they take a history and whom they examine. An examiner is not present and the candidate is not observed. The candidate then summarises the case and is examined by a pair of examiners over a 20-30 minute period. The usual practice in long cases is to use patients who are already known to the examiner or who fall within the examiner’s own specialty. Long case, patient-based examinations have been used for decades in both undergraduate and postgraduate settings, as both formative and summative tools. They were specifically used for final certification exams for postgraduates both here and elsewhere. Their strength is that they evaluate performance with real patients and enable candidates to gather information and develop treatment plans under realistic circumstances (Norcini, 2002). However, this method has drawn a lot of criticism. The problem lies with inter-case reliability (Wass, et al., 2001): when subjected to psychometric analysis, these examinations were found to be unreliable and so have fallen from favour (Turnbull, et al., 2005). Particular problems were found with the reproducibility of the scores generated by the long case. Improving this required larger sampling, which in turn required far more resources and made the long case difficult to use as a summative assessment tool.
Recent work suggests that the long case is still a highly relevant tool in that it appears to test a different clinical process from that of the structured short case examination (Wass and Jolly, 2001). A study performed with undergraduates in London found that, by observing the process in the long case, the above problems could be overcome. This probably led to the return of long cases in the form of the OSLER (Objective Structured Long Examination Record) introduced by Gleeson (1997) and the LAP (Leicester Assessment Package).
The LAP was originally developed to assess the consultation competence of general practitioners in the UK. In the LAP, the patient is not known to the examiners and at least one of the examiners is not an expert in the specialty being examined. It has been designed for use in ‘live’ and/or video recorded consultations with either real or simulated patients. It was subsequently adapted for use in undergraduate teaching. The LAP is designed to provide assessment by directly observed consultations with real patients/simulated patients, but to present this in a structured format closer to an OSCE, which also allows other aspects of performance to be assessed. Seven prioritized categories of consultation competence which need to be mastered are assessed with marking (Appendix 3).
The modified LAP differs from the LAP in a couple of areas. Performance is assessed against predefined standards that differ from those of the LAP, and the examiners attend training before they become assessors. The guidelines to the examiners [how to mark and assess] and the assessment forms are appended (Appendix 4 & 5). Inevitably, some overlap occurs between components of differing categories. Bhakta et al. (2005) acknowledge that no single assessment format can adequately assess all the learning objectives within the course blueprint; a combination of assessments (including OSCE, EMQ, essays, short case and long case) is therefore currently used to assess the students’ competence. The author’s main objective is to use theoretical and research literature to critically review the LAP and modified LAP as used in the assessment of clinical practice. This review is based on the seven key concepts (Van der Vleuten, 1996; Schuwirth and van der Vleuten, 2006; PMETB, 2007) listed below:
1. PURPOSES – It can be used for both formative and summative assessment. Fraser et al. (1994) argued that the modified LAP is designed for both purposes.
The focus of the intermediate clinical exam for undergraduates is to promote further improvement, as the students have completed only one clinical year [e.g. WMS and Leicester Medical School]. Hence, feedback is handed to the students, which helps them to focus on their strengths and improve their performance. The 3rd and 4th year medical students of these schools believed that it was likely to enhance their consultation performance (McKinley, et al., 2000). It has also been used as a formative tool for improving professional competence in different countries and different specialties (Lau, et al., 2003, Redsell, et al., 2003).
Teoh and Bowden (2008), arguing for the resurrection of the long case, state that an observed long case such as the modified LAP does not encourage the ‘reductionist’ approach seen in the Objective Structured Clinical Examination [OSCE]. Thus, it can be an ideal summative assessment tool for high-stakes exams, but perhaps has to be used in conjunction with other tools, as discussed below. Additionally, in most cases the assessment is an end-of-year ‘high-stakes’ assessment and, for failing students, there is generally only a short time available for remediation. The feedback provides a way to focus them on the weaker areas of their consultation skills and to prepare for their remedial exam.
2. ALIGNMENT – The Education Committee of the General Medical Council (GMC) sets and monitors standards in all UK medical schools. Medical students must be able to demonstrate their competency and professionalism through a list of educational outcomes set out in the publication of ‘Tomorrow’s Doctors’ (2003 and 2009) prior to graduation. The intended outcome envisaged by WMS is to produce a generation of doctors who have knowledge, proficient clinical skills and the appropriate attitudes and behaviours ready for work as Foundation Year one doctors. The modified LAP forms a part of the summative assessment in assessing mainly clinical skills e.g. examination techniques.
As described, it has various components and requires proper, planned blueprinting against the learning objectives of the MBChB course and the competencies of the various specialties (Wass, Van der Vleuten, Shatzer and Jones, 2001). Thus, assessment and curriculum design should be intricately interwoven, and assessment, of course, drives learning (Wass, Van der Vleuten, Shatzer and Jones, 2001). Similarly, in postgraduate exams it usually follows a knowledge-based assessment in the form of MCQs, and careful alignment is needed with the curriculum set by institutions such as the Royal Colleges.
3. PROPERTIES –
The properties of an assessment, more commonly described as its ‘utility’ or usefulness, were originally described by Van der Vleuten (1996) as a product of its validity, reliability, educational impact, cost-effectiveness and acceptability. In later years, ‘feasibility’ has been explicitly acknowledged and described as an added component of an assessment’s utility in clinical education (Schuwirth and Van der Vleuten, 2006; PMETB, 2007).
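The multiplicative idea described above can be written compactly. The symbols below are shorthand introduced here for illustration only (U for utility, V for validity, R for reliability, E for educational impact, C for cost-effectiveness, A for acceptability, F for feasibility); the expression is a conceptual model rather than a quantity that is calculated in practice.

```latex
U = V \times R \times E \times C \times A
\qquad \text{and, with feasibility added,} \qquad
U = V \times R \times E \times C \times A \times F
```

Because the components are multiplied rather than added, an instrument that scores close to zero on any one component has low overall utility, however strong it is on the others.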
Validity – Validity represents the extent to which a measurement actually measures what it is intended to measure. In medical education, this signifies the degree of meaningfulness of any interpretation of a test score (Downing, 2003).
A recent study demonstrated that observation does measure a useful and distinctive component of history-taking clinical competence over and above the contribution made by the presentation (Wass and Jolly, 2001). It would seem logical that, rather than relying on a presentation alone, observing the candidate while eliciting the history and carrying out the examination would be a more valid assessment of the candidate’s competencies in the LAP. There are data in the literature on the face and content validity of the seven categories and the various components of consultation competence contained in the Leicester Assessment Package (Fraser, McKinley and Mulholland, 1994). Whether or not the test scores obtained in any particular LAP are an accurate representation of ‘real world’ competency is subject to a vast array of variables (Downing and Haladyna, 2004). For instance, the design of the test items, the number of representative cases, the experience, training and leniency of the examiners, the consistency of simulated patients [used mainly in psychiatry], the completeness of marking schemes and the characteristics of the candidates can all affect the validity of a LAP, making it a valid assessment in one educational institution but not in another. However, McKinley et al, in their study of the modified LAP in the general practice setting, concluded that students will be exposed to a valid set of challenges to their consultation skills during consultations with a minimum of six largely unselected patients (McKinley, Fraser, Vleuten and Hastings, 2000).
Concurrent validity – Are the results consistent with those of tests of similar constructs? There are studies comparing OSCEs with observed long cases [which are similar to LAPs in some ways]; however, the author has not come across studies comparing different types of observed long case, e.g. OSLER vs. modified LAP. Whether the LAP predicts future performance needs to be assessed by further studies that follow up cohorts of students for whom the LAP was used as an assessment tool. More studies are also needed to investigate the construct validity of the LAP.
Reliability – Reliability refers to the degree of consistency within a measurement tool, the extent to which an instrument is capable of repeatedly producing the same test score even when administered at different times and locations, with different candidates (Schuwirth and van der Vleuten, 2006). An assessment approach may be considered reliable when it yields consistent results regardless of when it is used, who uses it and which item or case is assessed. The importance of a specific type of reliability depends upon what is being assessed and the method by which it is being assessed. Generally speaking, a reliability or generalisability coefficient [since there are multiple potential sources of variability in this assessment tool] of 0.8 or higher is desirable (Shea and Fortna, 2002). Since the LAP has its roots in the long case and has evolved as a modified observed long case, the author will attempt to address how the deficiencies in the reliability of long cases were dealt with in developing the LAP. Attempts to improve the reliability of the long case and make it into an effective LAP fall into three categories.
First, studies have considered how many additional long cases would be required, with Kroboth et al (1992) suggesting that 6-10 long cases (each of 1-1.5 h) would achieve a generalisability coefficient of 0.8. Thus it would take a minimum of 4 different cases with at least 2 assessors in each to be reliable and, therefore, careful sampling of representative cases with the use of blueprints is of paramount importance (Cangelosi, 1990).
Second, commentators have attempted to increase the number of long cases, but have done so by employing a format that draws on shorter assessments (20-45 min) and multiple cases (4-6) assessed directly after each other in a single session (McKinley et al, 2000; Wass & Jolly, 2001; Hamdy et al, 2003; Norcini et al, 2003).
Third, elements of the discussion and questioning aspects of the long case have been standardised in an attempt to improve reliability and student perceptions of fairness (Olson et al, 2000). Thus, covering all relevant domains with a sufficient number of cases would increase the reliability and validity of the instrument. All of these are incorporated in the LAP and modified LAP to make a reliable instrument. McKinley et al further demonstrated that the required levels of reliability can be achieved when the modified LAP is used by multiple markers assessing the same consultation, that is, the package produces inter-assessor reliability. Their generalisability analysis indicates that two independent assessors assessing the performance of students across six consultations would achieve a reliability of 0.94 in making pass or fail decisions. Also in this study, 98% of students perceived that their particular strengths and weaknesses were correctly identified, 99% that they were given specific advice on how to improve their performance and 98% believed that the feedback they had received would have long-term benefit (McKinley, Fraser, Vleuten and Hastings, 2000). The example assessment criteria and guidelines of the modified LAP used in the study are incorporated as Appendix 4 and 5.
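The link between the number of cases sampled and the reliability achieved can be illustrated with the Spearman-Brown prophecy formula. This is a simplified, single-facet stand-in for the generalisability analyses cited above, not the method used in those studies. The function names and the single-case reliabilities (0.4 and 0.3) are hypothetical values chosen for illustration only.

```python
import math


def spearman_brown(single_case_reliability: float, n_cases: int) -> float:
    """Predicted reliability when a test is lengthened to n_cases parallel cases."""
    r = single_case_reliability
    return n_cases * r / (1 + (n_cases - 1) * r)


def cases_needed(single_case_reliability: float, target: float = 0.8) -> int:
    """Smallest number of parallel cases predicted to reach the target reliability."""
    r = single_case_reliability
    if not (0 < r < 1) or not (0 < target < 1):
        raise ValueError("reliabilities must lie strictly between 0 and 1")
    # Spearman-Brown rearranged for n: n = target * (1 - r) / (r * (1 - target))
    n = target * (1 - r) / (r * (1 - target))
    # A small tolerance guards against floating-point error when n is a whole number.
    return math.ceil(n - 1e-9)


if __name__ == "__main__":
    # Hypothetical single-case reliabilities, not taken from the cited studies.
    for r in (0.4, 0.3):
        print(f"single-case reliability {r}: {cases_needed(r)} cases for 0.8")
```

Under these assumed values, the lower the reliability of a single case, the more cases must be sampled to reach 0.8, which is the reasoning behind broad sampling across cases.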
Reliability of the LAP would not be severely threatened if the details of the test items have leaked out to the candidates unintentionally. However, it might if they have seen this patient before in the clinical setting. There were chances of this happening as the same banks of patients were used. This has been rectified by updating the bank and recruiting different patients. Broad sampling across cases is essential to assess clinical competence reliably (Wass and Vleuten, 2004).
Feasibility – The design and running of the modified LAP has significant resource implications. The crux of the issue for this kind of assessment is feasibility and cost effectiveness in terms of finance, space and manpower. Lots of time and effort are required to prepare and administer the instrument with high quality. Recruiting enough assessors, real patients, [simulated patients in case of psychiatry] and the equipment is always a huge challenge. Finding the trained assessors, who mostly tend to be busy clinicians, to take time from their heavy work schedule is also a challenge. However, lots of these issues can be overcome if this test is administered regularly i.e. sequential testing. This is further helped by having a database for the patients and assessors. Good advanced planning will also go a long way.
Acceptability – Cookson et al. (2008), in a letter to the BMJ, say that feedback from students who have faced this examination in Leicester over the past 9 years strongly indicates that it is perceived as a fair and acceptable test of their abilities. From the organisational point of view, it has been acceptable because of sequential testing. There is a healthy debate about replacing it with OSCEs, or at least using OSCEs to supplement it, for the assessment of clinical practice.
Educational Impact – From the students’ point of view, the high-stakes LAP certainly exerts a great educational impact. The outcome of the examination will be used to decide the fate of the students. The LAP provides a platform for students to demonstrate their knowledge, skills, attitudes and behaviours in a single, directly observed setting. Among other advantages, it encourages students to develop the essential links between history, physical examination, diagnosis and management in each clinical challenge as the consultation progresses, not at some remote point thereafter. This holds in practice, as students prepare for these exams in this way with their peer group or under supervision. The educational impact is large because students keep the final assessment in mind and practise the required domains in a structured way with colleagues and peers. The impact is even greater in some medical schools, such as WMS, where feedback is given to the students in the formative intermediate exams. This enables the students to reflect on their performance and improve upon it.
4. STANDARDS – Standards can be criterion-referenced (absolute standard) or peer-referenced (relative standard). The ‘borderline approach’, ‘fixed percentage’, ‘Angoff’ and ‘Hofstee’ methods are but a few of the many methods described in the literature for standard-setting (Norcini, 2003). Livingstone and Zieky (1989) proposed that the higher the stakes of the assessment, the greater the significance of using criterion-referenced standards. Thus criterion-referenced standards can be used for the LAP in the setting of high-stakes exams such as final professional exams in WMS or postgraduate exit exams.
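As an illustration of a criterion-referenced method, the following is a minimal sketch of an Angoff-style calculation, one of the methods named above. It assumes that each judge estimates, for every scored component, the probability that a borderline candidate would succeed; the function name and the ratings are hypothetical and are not drawn from any LAP marking scheme.

```python
from statistics import mean


def angoff_cut_score(ratings: list[list[float]]) -> float:
    """Return the Angoff cut score.

    ratings[j][i] is judge j's estimate of the probability that a borderline
    candidate succeeds on component i. The cut score is the sum, over
    components, of the mean estimate across judges.
    """
    if not ratings:
        raise ValueError("at least one judge is required")
    n_components = len(ratings[0])
    if any(len(judge) != n_components for judge in ratings):
        raise ValueError("every judge must rate every component")
    component_means = [
        mean(judge[i] for judge in ratings) for i in range(n_components)
    ]
    return sum(component_means)


if __name__ == "__main__":
    # Hypothetical estimates from three judges on three components.
    example = [
        [0.6, 0.5, 0.7],
        [0.8, 0.5, 0.5],
        [0.7, 0.5, 0.6],
    ]
    print(f"cut score: {angoff_cut_score(example):.2f} out of {len(example[0])}")
```

The resulting cut score is an absolute standard: it depends on the judges’ view of a borderline candidate rather than on how the rest of the cohort performs.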
SAMPLING – It is impractical to combine all of the learning outcomes into a single summative assessment. On a practical note, care is needed when allotting students and examiners to the different stations, patients and items so that each student meets a wide variety of cases, giving a different case mix. It also needs to be ensured that students are observed and examined by different examiners, to increase the sampling.
Thus careful and effective blueprinting should be done to make the LAP a high-utility instrument. Moreover, qualitative triangulation of evidence [sampling] from different sources, such as satisfactory performance in each block and other types of exam like the OSCE for practical skills, will further improve reliability (Stern, et al., 2003).
Evidence from a single point is not sufficiently generalisable to be extrapolated to all occasions (PMETB, 2007). Studies of validity and reliability are costly and difficult to design. ‘Triangulation’ is an excellent way to critique qualitatively collated evidence, whereby evidence from at least 3 occasions or sources indicating the same outcome is analysed (PMETB, 2007). Each LAP item should be treated independently, as an entity of its own, and, although the literature suggests that LAPs have the potential to be highly valid and reliable, such studies would need to be conducted on separate occasions to provide convincing evidence. This is a continuous, ongoing process at most institutions, including WMS.
It is suggested that using the LAP for direct observation of the consultation would be a useful way to assess whether the student has successfully acquired the competencies expected at the end of undergraduate or postgraduate training. One of the interesting components is the attempt to judge the overall relationship with the patient. Attitudes are most likely to be conveyed to the patient through the doctor’s behaviour and should, therefore, be assessed by the observation of behaviours in the clinical setting (Rezler, 1976). Nevertheless, this approach relies on judgements, made by ‘experts’, of non-standardised material and is, therefore, open to question. Because professionalism is a complex construct, it is unlikely that a single assessment will adequately measure it, even though this assessment makes the attempt. Systematic assessment of professionalism should also include many different assessors, more than one assessment method and assessment in different settings (Lynch, et al., 2004). Hence, these assessments are a continuous process throughout the MBChB and, indeed, afterwards. In essence the LAP, as an assessment tool, is close to competency-based assessment and sits at the ‘shows how’ level of Miller’s triangle.
Long cases on their own have been criticised for poor reliability of examiner assessments and the lack of direct observation by the examiner of the trainee patient encounter [reducing the validity of the assessments].
There is evidence that adding an observing examiner to the history and physical examination part of the long case assessment increases reliability and helps to reconcile the complex interactions between the context and the skills/knowledge (construct) that the long case attempts to measure (Wass and Jolly, 2001).
The LAP is one such tool, combining observation during history taking and physical examination with structured assessment, and it proves to be of high utility. This is supported by some studies in the formative setting (McKinley, Fraser, Vleuten and Hastings, 2000).
The LAP, when analysed via its properties, is a good tool to assess observed clinical practice. It might not be so effective for practical skills and, for these, it probably needs to be supplemented by work based assessments or even OSCEs.
When supplemented with other assessment tools [triangulation], LAPs can be used effectively for summative assessment in high-stakes assessments such as the final examinations for medical students.
The main drawbacks are feasibility, difficulty in blue-printing and cost effectiveness.
It encourages students to develop the essential skills together rather than combining them afterwards.
Bhakta, B., Tennant, A., Horton, M., Lawton, G. and Andrich, D. (2005) Using item response theory to explore the psychometric properties of extended matching questions examination in undergraduate medical education. BMC Medical Education, 5 (1): 9.
Fraser, R. C., McKinley, R. K. and Mulholland, H. (1994) Consultation competence in general practice: establishing the face validity of prioritized criteria in the Leicester assessment package.[see comment]. British Journal of General Practice, 44 (380): 109-113.
Gleeson, F. (1997) AMEE Medical Education Guide No. 9. Assessment of clinical competence using the objective structured long examination record (OSLER). Medical Teacher, 19 (1): 7-14.
Mason, J., O’Keeffe, C., McIntosh, A., Hutchinson, A., Booth, A. and Young, R. J. (1999) A systematic review of foot ulcer in patients with Type 2 diabetes mellitus. I: prevention. Diabetic Medicine, 16 (10): 801-812.
Lau, J. K. C., Fraser, R. C. and Lam, C. L. K. (2003) Establishing the content validity in Hong Kong of the prioritised criteria of consultation competence in the Leicester Assessment Package (LAP). Hong Kong Practitioner, 25 (12): 596-602.
Lynch, D. C., Surdyk, P. M. and Eiser, A. R. (2004) Assessing professionalism: a review of the literature. Medical Teacher, 26 (4): 366-373.
McKinley, R. K., Fraser, R. C., Vleuten, C. v. d. and Hastings, A. M. (2000) Formative assessment of the consultation performance of medical students in the setting of general practice using a modified version of the Leicester Assessment Package. Medical Education, 34 (7): 573-579.
Norcini, J. J. (2002) The death of the long case? BMJ, 324 (7334): 408-409.
Norcini, J. J. (2003) Setting standards on educational tests. Medical Education, 37 (5): 464-469.
Redsell, S. A., Hastings, A. M., Cheater, F. M. and Fraser, R. C. (2003) Devising and establishing the face and content validity of explicit criteria of consultation competence in UK primary care nurses. Nurse Education Today, 23 (4): 299-306.
Rezler, A. G. (1976) Methods of attitude assessment for medical teachers. Medical Education, 10 (1): 43-51.
Shea, J. A. and Fortna, G. S. (2002) 3 Psychometric Methods. International handbook of research in medical education, 97.
Stern, D. T., Wojtczak, A. and Schwarz, M. R. (2003) The assessment of global minimum essential requirements in medical education. Medical Teacher, 25 (6): 589-595.
Teoh, N. C. and Bowden, F. J. (2008) The case for resurrecting the long case. BMJ, 336 (7655): 1250.
Turnbull, J., Turnbull, J., Jacob, P., Brown, J., Duplessis, M. and Rivest, J. (2005) Contextual Considerations in Summative Competency Examinations: Relevance to the Long Case. Academic Medicine, 80 (12):
Wass, V. and Jolly, B. (2001) Does observation add to the validity of the long case? Medical Education, 35 (8): 729-734.
Wass, V., Van der Vleuten, C., Shatzer, J. and Jones, R. (2001) Assessment of clinical competence. The Lancet, 357 (9260): 945-949.
Wass, V. and Vleuten, C. v. d. (2004) The long case. Medical Education, 38 (11): 1176-1180.
PROPOSED ITEM – AN OBSERVED MODIFIED LONG CASE
Following the Continued Medical Education day for Diabetes Specialist Registrars [SPRs] (CME, Appendix 2)
Aimed at senior SPRs, i.e. those in years 4-5, about a year before completion of their training
Can be used in their portfolios for Continued Professional Development [CPD]
Generic skills assessed – Communication, Professionalism, Clinical reasoning in an uncertain environment, Teamwork and Multidisciplinary Approach
The assessment involves 4 observed modified long case items and an MCQ paper aimed mainly at the knowledge base. One such item, an observed modified long case, is described below.
Expected learning outcomes for this formative assessment item –
Able to assess the vascular and neurological status of the foot in a patient with diabetes
Diagnose pedal pathologies in the