Investigating the effectiveness of three different test types (SBA, EMQ, SAQ).


  1. Introduction

Testing in the medical and science fields is very important because students need to be able to apply information clinically, which will make them safe practitioners (Hayes and McCrorie 2010). Examinations should be effective. In this regard, Froncek, Hirschfeld and Thielsch (2014) have constructed a 12-point quality assessment instrument for constructing effective exams. A multitude of test types can be used in an exam, either individually or in combination, none of which is difficult to construct (Froncek, Hirschfeld and Thielsch 2014). These test types vary, however, in the cognitive order they assess: lower order cognitive skills (LOCS) or higher order cognitive skills (HOCS) (Palmer and Devitt 2007).

Papers written on test types and their effectiveness are few and mainly focus on which test type is best within a certain category. Some of the questions asked: is SBA more suitable than traditional true/false MCQ for assessing HOCS in the medical field (Tan and McAleer 2008)? Does student performance differ by test type between SAQ and modified essay questions (MEQ) (Wallerstedt, Erickson and Wallerstedt 2012)? Other papers have evaluated whether learning styles influence student performance in MCQ and SAQ tests and have shown no influence (Wilkinson, Boohan and Stevenson 2014). Papers comparing these 3 test types in pairs, or all 3 together, are scarce, and the few available do not all focus on test effectiveness. SBA and EMQ have been shown to assess factual knowledge but can be constructed to assess HOCS, while SAQ mostly assesses HOCS (Hayes and McCrorie 2010). SBA has been shown to assess more complex knowledge than traditional MCQ and to offer greater coverage of the curriculum than EMQ (Davies and Murphy 2011). However, EMQ is still preferred over traditional MCQ because it can test HOCS (Hayes and McCrorie 2010).

A well-built method should give information on the characteristics of the volunteers (Froncek, Hirschfeld and Thielsch 2014) and a rigorous description of the tests (Wass, McGibbon and Van der Vleuten 2001); however, most methods fail to give an account of one of these categories or contain only a very short and generalised description.

  2. Aims of the study

The majority of studies have been done on SBA and demonstrate its superiority to traditional MCQ because it tests HOCS (Davies and Murphy 2011). The occasional paper shows that SBA is preferred to EMQ because it can cover more of the curriculum, while EMQ is preferred over MCQ because it can also test HOCS (Hayes and McCrorie 2010). SAQ has only been compared to MEQ (Wallerstedt, Erickson and Wallerstedt 2012). As studies analysing all 3 test types together are scarce, the following aims have been proposed for the study:

  1. To compare the student performance in 3 popular types of testing (SBA, EMQ, SAQ) in the same paper.
  2. To enrich the available literature with a paper looking at 3 testing methods that may require the application of different cognitive abilities.
  3. To observe whether student performance is reliant on the order of presentation of the 3 test types.
  3. Hypotheses

The hypotheses formulated to carry out the study are:

  1. SBA is a more effective method to assess student performance compared to EMQ in exam conditions.
  2. EMQ is a more effective method to assess student performance compared to SAQ in exam conditions.

The associated null hypotheses are:

1. There is no difference in effectiveness between SBA and EMQ in assessing student performance in exam conditions.

2. There is no difference in effectiveness between EMQ and SAQ in assessing student performance in exam conditions.

  4. Selection process

Defining the sample population as tightly as possible allows for better selection of individuals and restricts variation (De Smith 2015). The sample population will be selected from Dundee University students currently enrolled on an undergraduate course in BSc Anatomical Sciences, BSc Anatomical and Physiological Sciences or BMSc Medicine; all students have to be from the same university and taught the same curriculum. The 3 test types will be integrated into the end-of-semester summative exam the students will already be taking. In order to participate, the students only need to give consent for their already obtained results to be used for the study. The sample size will be 84 students, which represents the majority of students enrolled on these courses. If more students were available, the sample size would have been calculated according to Altman's nomogram at a power of 80% and standard deviation values obtained from previous studies (Noordzij, Tripepi, Dekker, Zoccali, Tanck and Jager 2009).
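The nomogram-based calculation mentioned above can also be approximated numerically. The sketch below is illustrative only: the mean difference and standard deviation are hypothetical placeholders, since the real values would be taken from previous studies as described.

```python
from statistics import NormalDist

def n_per_group(delta, sd, alpha=0.05, power=0.80):
    # Normal-approximation sample size per group for comparing two means
    # (two-sided test) - the calculation Altman's nomogram performs graphically.
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)   # critical value for the significance level
    z_beta = z.inv_cdf(power)            # critical value for the desired power
    return 2 * ((z_alpha + z_beta) * sd / delta) ** 2

# Hypothetical values: detecting a 5-point mean difference with SD = 10
print(round(n_per_group(delta=5, sd=10)))  # 63 students per group
```

With a standardised effect size of 0.5 this gives the familiar figure of roughly 63 participants per group at 80% power and a 5% significance level.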

  5. Volunteer characteristics

Both male and female students within the age range 18 to 23 years old will be accepted into the study. It is important that students have not been engaged in non-cognitive activities or jobs for an extended period of time, as that may skew the data. Furthermore, this age range is reflective of individuals who, in general, have more time and fewer commitments, and who can prioritise and invest in studying.

  6. Inclusion/Exclusion

First and second year students will not be recruited for the study due to the high probability of their being below 18 years old and not having specialised in Anatomy. Students should all be from an undergraduate course, with no substantial age difference or experience of human anatomy from prior education, placements or jobs. Scottish universities see an increasing number of EEA and international students enrolling within their schools. It is therefore important that the study accommodates both these student categories, regardless of the student's educational background.

  7. Assumptions or precautions

Prior to taking part in the study, the students will be asked to sign a consent form stating the voluntary nature of the study, their right to withdraw as volunteers and to withdraw their data at any time with no consequences, and a data usage and sharing code of practice. No credit is awarded in any other subject for participation in this study, and the students' grades and progression will not be affected if they decline to take part. Student performance in the academic year 2014/2015, when this testing system is launched, will be comparable with future years as long as the curriculum is not changed and the same test is used. This allows for follow-up to observe whether, after several years, students' performances continue to show the same results, which would allow for a bigger study to be reported.

  8. Methodology to be used

The 3 test types to be used in the study have all been placed in 1 exam that will replace the current summative exam of third year Anatomy and Anatomy/Physiology and the fourth year summative exam for Medicine, testing students' knowledge of Anatomy. Students will have 2 hours to complete the exam and should devote an equal 40 minutes to each of the 3 sections. The assessment will take place on a computer; however, the order of the tests will be randomised to account for tiredness as a limitation. Therefore, groups of 14 students will each receive the test starting on a different test type (group 1: test 1 – SBA, test 2 – EMQ, test 3 – SAQ; group 2: test 1 – EMQ, test 2 – SAQ, test 3 – SBA; etc.) to account for all 6 combinations.

An Anatomical Sciences Tool (AST) has been developed based on Bloom's Taxonomy assessment tool (Crowe, Dirks and Wenderoth 2008) for creating a pool of questions for all 3 test types. The questions will vary in difficulty and will test both lower order cognitive skills (recall) and higher order cognitive skills, which require conceptual understanding (application, synthesis). The questions will be the same for all students, and the material assessed will be from the second semester curricula on pelvis, lower limbs, head and neck. Great attention will be given so that none of the 3 test types assesses harder topics. The number of questions for each test type will be calculated to fit the same time frame: SBA will have 52 questions (0.75 points/correct answer), EMQ will have 40 questions (1 point/correct answer) and SAQ will have 4 questions (10 points/correct answer). Students will be informed about the possibility of their results being used in the study only after they have undergone the summative exam, to limit under- or over-performance.
None of the questions will have been previously presented as an example or in a past paper (Wallerstedt, Erickson and Wallerstedt 2012). None of the questions in any of the tests will give hints or answers to subsequent questions, and there will be no diagrams in any of the test types. Ethical approval for the study will be obtained from the Joint Ethics Committee of the College of Art, Science & Engineering and the Medical School of Dundee University.
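The rotation of test-type order across groups described above can be sketched as follows. The group size (14) and cohort size (84) follow from the selection process; the student labels are hypothetical placeholders.

```python
from itertools import permutations

tests = ["SBA", "EMQ", "SAQ"]
orders = list(permutations(tests))  # all 6 possible orderings of the 3 sections

# 84 students split into 6 groups of 14, one group per ordering,
# so every combination of test order is covered once.
students = [f"S{i:02d}" for i in range(1, 85)]
assignment = {
    order: students[i * 14:(i + 1) * 14]
    for i, order in enumerate(orders)
}

for order, group in assignment.items():
    print(" -> ".join(order), ":", len(group), "students")
```

In practice the groups themselves would be formed by random allocation before each group is paired with an ordering.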

  9. Data collection and analysis

All test types will be administered on the computer using the Question Mark assessment software. The tests will be checked manually for minor spelling mistakes which may award students extra points. The percentage grade will be recalculated to obtain the test results to be used further in the study. The data will be analysed using the RStudio software for statistical analysis. The mean, median and standard deviation will be calculated for each of the 3 test type results. A bar plot can be created for better visualisation of the results. The variance will be calculated for ANOVA; the latter will be used to compare the 3 test type results for differences between the groups, while the correlation coefficient will be calculated between the SBA and EMQ, EMQ and SAQ, and SBA and SAQ results. A boxplot would also show whether the difference is significant. A chi-square test will be used to compare observed data with the expected data from the hypothesis. A p value where p<0.05 will be considered significant for any of the 3 test types.
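Although the analysis above specifies R, the central between-group comparison can be illustrated in Python. The scores below are hypothetical placeholders; the F statistic computed is the ratio the one-way ANOVA step would produce.

```python
from statistics import mean

def one_way_anova_F(*groups):
    """F statistic for a one-way ANOVA across k groups:
    between-group mean square divided by within-group mean square."""
    k = len(groups)
    n = sum(len(g) for g in groups)
    grand = mean(x for g in groups for x in g)
    ss_between = sum(len(g) * (mean(g) - grand) ** 2 for g in groups)
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))

# Hypothetical percentage scores for the three sections
sba = [72, 68, 75, 80, 66]
emq = [65, 70, 62, 68, 71]
saq = [58, 61, 55, 64, 60]
print(round(one_way_anova_F(sba, emq, saq), 2))  # -> 10.74
```

The resulting F value would then be compared against the F distribution with (k-1, n-k) degrees of freedom to obtain the p value checked against the 0.05 threshold.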

  10. Results

The results of this novel study will show whether there is a difference between the 3 test types; they will enrich the existing literature with a valuable contribution and will test whether performance declines due to tiredness in normal exam conditions. They will show which of the 3 tests are most and least effective in exam conditions and whether exams should be made shorter for better results. The results will allow universities to improve their exams in order to test a varied range of questions according to content, depth of knowledge and skill set, which is especially useful in the medical and science context.

  11. Potential outcomes and recommendations

A disadvantage of SAQ is the reduced number of questions that can be asked within the time limit; this may negatively affect content validity and reliability (Wallerstedt, Erickson and Wallerstedt 2012). Lengthy examinations with more questions can increase reliability; however, for tests such as SBA, increasing the testing time beyond 3 hours is not useful because the increase in reliability is very small.

The effectiveness of each test type has been assessed individually and by comparison to past papers. SBA emerges as the most effective, assessing a broad range of questions and both HOCS and LOCS (Davies and Murphy 2011). EMQ is similar to SBA except that it does not assess questions that are as varied (Davies and Murphy 2011). SAQ, however, only assesses HOCS, and while it assesses very in-depth knowledge, it covers very few topics. Future studies could compare the individual test results from this study with other test types such as true/false MCQ, MEQ or the Objective Structured Clinical Examination (OSCE) (Wass, McGibbon and Van der Vleuten 2001), or could compare exam effectiveness of the 3 individual test types against a single exam combining all 3. Sample data should be collected and analysed over multiple years, through comparison and observation, to decide whether the examination needs to evolve in order to ensure it continues to meet the best educational and assessment standards (Davies and Murphy 2011).


  12. References

Crowe A, Dirks C, Wenderoth MP, 2008 Biology in Bloom: Implementing Bloom's Taxonomy to Enhance Student Learning in Biology CBE Life Sciences Education 7:368–381

Davies N, Murphy MG, 2011 Update on the MRCOG examination Obstetrics, Gynaecology & Reproductive Medicine 21:212–213

De Smith MJ, 2010 Statistical Analysis [e-book] Available through: statsref website [accessed 20th March 2015]

Froncek B, Hirschfeld G, Thielsch MT, 2014 Characteristics of effective exams—Development and validation of an instrument for evaluating written exams Studies in Educational Evaluation 43:79-87

Hayes K, McCrorie P, 2010 The principles and best practice of question writing for postgraduate examinations Best Practice & Research Clinical Obstetrics & Gynaecology 24:783–794

Noordzij M, Tripepi G, Dekker FW, Zoccali C, Tanck MW, Jager KJ, 2009 Sample size calculations: basic principles and common pitfalls Nephrology Dialysis Transplantation 25:1388-1393

Palmer EJ, Devitt PG, 2007 Assessment of higher order cognitive skills in undergraduate education: modified essay or multiple choice questions? BMC Medical Education 7:49-56

Tan LT, McAleer JJ, 2008 The introduction of single best answer questions as a test of knowledge in the final examination for the fellowship of the Royal College of Radiologists in Clinical Oncology Clinical Oncology 20:571-576

Wallerstedt S, Erickson G, Wallerstedt SM, 2012 Short Answer Questions or Modified Essay Questions— More than a Technical Issue International Journal of Clinical Medicine 3:28-30

Wass V, McGibbon D, Van der Vleuten C, 2001 Composite undergraduate clinical examinations: how should the components be combined to maximize reliability? Medical Education 35:326-330

Wilkinson T, Boohan M, Stevenson M, 2014 Does learning style influence academic performance in different forms of assessment? Journal of Anatomy 224:304–308