An analysis of potential bias in faculty evaluations

This essay has been submitted by a student. This is not an example of the work written by our professional essay writers.

This literature review highlights the importance of student evaluation of teaching (SET) and sheds light on biases related to the student, the course or the instructor, which can distort the true picture by producing inaccurate results. The aim is to relate the biases found in faculty evaluations to their sources.

Pounder (2007) carried out a study to determine whether student evaluation of teaching is worthwhile. The research was organized around a triad of student-related, course-related and instructor-related factors, within which potential sources of bias were identified. The study was purely qualitative and descriptive, and each bias was supported with reference to previous research on the topic. Student-related sources of bias included "student academic level and maturity", i.e. senior students are more lenient raters than junior students; the "gender effect", in which female students tend to rate female professors highly whereas male students are neutral in their ratings; and the tendency of students to give lower ratings to teachers they dislike. Course-related factors included "class size", i.e. the larger the class, the poorer the SET scores, because interaction between instructor and student decreases; "course content", i.e. teachers of quantitative courses have a tougher time and receive lower ratings than those who teach qualitative courses; and "class timing", i.e. courses taught at the end of the week receive better ratings because students are more relaxed then. Teacher-related characteristics also influence ratings. These include "grade inflation", in which teachers grade students leniently in order to obtain good SET scores; "gender stereotyping", in which students hold gender-based expectations (for example, associating female teachers with warmth and care) and reflect unmet expectations in their SET scores; "age, experience, rank", i.e. older, more experienced teachers receive better ratings than less experienced ones; "teachers' influencing tactics", i.e. making the course or exams easier simply to become popular and obtain a good rating; and "perception of instructor", which significantly affects SET scores.
The research also notes that SETs have been widely criticized for their inaccuracy, and that other measures, such as classroom experience and leadership, must be incorporated to obtain a fuller picture. Pounder's answer to the research question is therefore that SETs should be used alongside other measures to obtain accurate results: SETs and the other measures are equally important in analyzing the whole situation, and neither is sufficient on its own.

Ahmadi, Helms and Raiszadeh (2001) investigated business students' perceptions of faculty evaluations. Students hold differing views about the weight given to SETs and their usefulness, and the study focused on how students feel about evaluations. A survey questionnaire was distributed to a sample of 385 students at the University of Tennessee, and responses were classified by GPA, gender, ethnicity and other demographics. A 95% confidence level was chosen, giving a sampling error of less than 5%. To maintain anonymity and confidentiality, students were instructed not to write their names on the questionnaire. The results showed that students took about 2.23 minutes to complete the form, which contained about 35 questions on teacher effectiveness and six questions about the students themselves. Moreover, 79.4% of students believed they were objective and serious when completing the form and considered the evaluations important for teachers. On the other hand, students disagreed that they rated faculty more highly for assigning less work or out of fear that their grades would be affected. The research also found that students were unsure whether the ratings would improve instruction and wanted more accountability. Responses differed on the influence of peer pressure on faculty ratings and on the timing of evaluations. Handwritten comments were considered the most important element because they provided better feedback. Furthermore, 51% of students suggested that other modes of evaluation, such as evaluations by deans or the results of certification exams, should also be used. In addition, many students felt that the results of student evaluations should be published in the newspaper or on the web.
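The combination of 385 respondents, a 95% confidence level and a sampling error below 5% matches the textbook sample-size formula for estimating a proportion, n = z²p(1 − p)/e². The study does not state which formula it used, so the sketch below is an assumption: it applies this formula to a simple random sample with the most conservative proportion p = 0.5 and z = 1.96, ignoring any finite-population correction. The function names are illustrative only.

```python
import math

def required_sample_size(z: float = 1.96, p: float = 0.5, e: float = 0.05) -> int:
    """Minimum n to estimate a proportion p within margin e at critical value z."""
    # n = z^2 * p * (1 - p) / e^2, rounded up to a whole respondent
    return math.ceil(z ** 2 * p * (1 - p) / e ** 2)

def margin_of_error(n: int, z: float = 1.96, p: float = 0.5) -> float:
    """Sampling error achieved by a simple random sample of size n."""
    return z * math.sqrt(p * (1 - p) / n)

if __name__ == "__main__":
    n = required_sample_size()
    print(n)                             # 385
    print(round(margin_of_error(n), 4))  # 0.0499
```

Under these assumptions, 384 respondents would leave the sampling error just above 5%, which is why 385 is the usual minimum quoted for a 95% confidence level.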

Al-Issa and Sulieman (2007) studied students' perceptions of SETs and the factors that bias them. A questionnaire with a Likert scale was designed and distributed to 819 students at the American University of Sharjah. The students were majoring in arts, science, business and management and came from the Gulf region, the Levant, Africa and the Indian subcontinent. The questionnaire aimed to capture students' views of SETs and the non-instructional factors that lead to bias. Students were given 15 minutes to complete the form at the beginning of class. The percentage of responses to each question was reported and mean responses were calculated. Regarding perceptions, 79% of students believed that the university should continue with student evaluations, but only 32% believed that instruction improves as a result of SETs. The overall mean for each biasing factor was calculated to identify which variable had the highest mean and therefore the greatest influence. Academic status was found to be the most influential variable, followed by gender and GPA. Other biasing factors highlighted in the survey were the expected grade in the course, age, personality and being on good terms with the instructor. The findings indicated that evaluations at the university were biased and that students from different cultural or linguistic backgrounds responded differently.

Liaw and Goh (2003) studied the evidence for, and control of, biases in student evaluations of teaching. The research used data from the first teaching exercise of the Faculty of Economics & Administration at the University of Malaya. A questionnaire was administered whose most important statements concerned organization, knowledge, presentation, clarity and appropriateness. The mean for each statement was calculated by weighting each score by the percentage of responses, and the average across statements gave the overall teaching rating. The analysis was then carried out using multiple regression. To address problems with the error variance, adjusted standard errors were used in calculating the t-statistics for the significance tests. A characteristic that is statistically significant is treated as a source of bias; one that is not significant is not. The results indicate that only class size is significant, at the 10% level, and the non-significance of the other variables suggests that they are not sources of bias. Because of multicollinearity, the regression was re-estimated, but similar results were found. Thus, the central finding was that class size is a major biasing factor in evaluations.
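The rating procedure summarized above weights each scale point by the percentage of students who chose it to obtain a statement mean, then combines the statement means into one overall rating. A minimal sketch, assuming a 1 to 5 scale and reading the overall teaching rating as the simple average of the statement means; the response percentages are invented for illustration and are not the study's data.

```python
def statement_mean(percentages: dict[int, float]) -> float:
    """Weight each scale score by the percentage of respondents who chose it."""
    total = sum(percentages.values())  # normally 100
    return sum(score * pct for score, pct in percentages.items()) / total

def overall_rating(statements: dict[str, dict[int, float]]) -> float:
    """Assumed aggregation: unweighted average of the statement means."""
    means = [statement_mean(p) for p in statements.values()]
    return sum(means) / len(means)

# Hypothetical response distributions (score -> % of respondents)
responses = {
    "organization":    {1: 0, 2: 10, 3: 20, 4: 40, 5: 30},
    "knowledge":       {1: 0, 2: 0, 3: 10, 4: 40, 5: 50},
    "presentation":    {1: 5, 2: 10, 3: 25, 4: 40, 5: 20},
    "clarity":         {1: 0, 2: 10, 3: 30, 4: 40, 5: 20},
    "appropriateness": {1: 0, 2: 0, 3: 20, 4: 50, 5: 30},
}

if __name__ == "__main__":
    print(round(overall_rating(responses), 2))  # 3.94
```

Dividing by the total rather than a fixed 100 keeps the statement mean correct even if the reported percentages do not sum exactly to 100 because of rounding.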

Dickey and Pearson (2005) studied the recency effect in college student course evaluations. This effect means that students tend to remember, and rate teachers on, the events closest to the time of the evaluation. The purpose of the study was to find out whether students' ratings were susceptible to the recency effect. A total of 113 students taught by the same instructor were randomly selected at the University of Florida. In addition, five other instructors and a number of randomly selected students were interviewed. The evaluation form had two main sections, course organization and teaching skill, and the total score was obtained by summing the items in each section. The mean and standard deviation of these items were calculated. The study used both qualitative and quantitative methods. In the quantitative analysis, two separate classes (training plus diary, and diary only, both pre- and post-tested) were analyzed. The results showed that students who received training about the recency effect tended to avoid it over the long run. In the qualitative analysis, interviews revealed that few students were aware of this rating error; that the majority felt student evaluations were not worthwhile; and that instructors believed SET scores should not be used for salary decisions but only to improve instruction. Overall, the research indicated that trained students who kept a diary gave stable ratings over time, which led to less bias. One limitation of the research was that it focused entirely on students' behavior rather than the teacher's behavior. Furthermore, recency error is difficult to examine and measure.

Germain and Scandura (2005) studied two important variables, grade inflation and student individual differences, that act as systematic biases in faculty evaluations. The research was descriptive and qualitative, being based on past research. The study notes that the use of SETs increased by 57% between 1973 and 1993, yet there is little research in this area. Grade inflation as a bias is seen as a process of reciprocity in which teacher and student reward each other. Student individual differences were also identified. One is learning style: different students rate the content and the presentation of a lecture differently. Another is learning disability: some students with learning disabilities feel that the instructor did not meet their expectations and therefore rate the instructor low. Other sources of bias, such as how useful the course is for students, their reason for attending college, their rapport with the instructor, gender, age, status and cultural beliefs, strongly influence SET scores. In addition, the research draws attention to ways of combating these biases: educating students about how to avoid the halo effect; involving instructors in designing the questionnaire so that they take the feedback seriously; and conducting evaluations after the midterm, when students and instructors are in a better position to judge each other, to ensure accurate results. Better instructor evaluations would in turn lead to better educational quality and thus greater student retention.

Griffin (2004) studied the relationship between grading leniency, grade discrepancy and SET scores. Two theories were examined to explain the effect of grade discrepancy. Under attribution theory, students blame instructors for lower grades but credit themselves for higher grades; under retribution theory, students punish instructors for lower grades but reward them for higher grades. A questionnaire with 12 statements on a five-point scale was distributed to a sample of 754 students at a university in the southeastern United States. Both expected and deserved grades were assessed, and certain variables were coded as 1 or 0 to serve as dummy variables. Confidentiality was complete: the forms were distributed in the instructors' absence and the results were released only after course grades had been submitted. A multilevel regression was run to analyze differences in ratings within and across classes. The results indicated a weak relationship between grading leniency and ratings of course content, and a strong relationship between grading leniency and ratings of fair evaluation of students. The findings for grade discrepancy were more complicated: the smallest difference was estimated for the teacher's knowledge and the largest for fair evaluation of students. Overall, a strong positive relationship was estimated between grading leniency and ratings. Moreover, a self-serving bias was present in the evaluations, which supports attribution theory.

Rovai, Ponton, Derrick and Davis (2005) carried out a comparative analysis of student evaluations of teaching in virtual and traditional classrooms. The study compared students' answers to open-ended questions about online and face-to-face courses, with the aim of detecting any bias due to the delivery medium that obscured the true picture. An instrument containing five neutral open-ended questions was used to collect data, and the same instrument was used for both online and face-to-face course evaluations. It was distributed to 4,500 students at a university in eastern Virginia at the end of the term. Themes such as praise, constructive criticism and negative criticism were developed for the analysis, and a two-way chi-square contingency table was used to compare observed and expected values for online and on-campus evaluations. The results indicated that online courses received a greater percentage of both praise and negative criticism, whereas on-campus evaluations received a higher percentage of constructive criticism. Overall, online course evaluations contained more negative comments than on-campus evaluations. A major limitation of the study was that many discrepancies were observed within a class, where one aspect was criticized by some students and praised by others. Moreover, ratings were positive where anonymity was not maintained, whereas where anonymity was ensured students made negative comments about their instructors freely and without restraint. In addition, many students experienced communication problems, and another major issue was that students did not share their dissatisfaction with the instructor during the semester, when it could have been addressed, preferring instead to comment at the end of the semester.
The research argues that closed-ended questions are a better approach than open-ended questions. In addition, students should be shown how the two delivery media differ so that biases can be avoided, because evaluations of online courses are biased. The importance of feedback for effective teaching is also emphasized. The results themselves face a threat of bias, as only 70% of students filled out the forms properly. Finally, the study's limitations point to topics for further research: teachers' communication and writing skills, and the use of closed-ended questions.
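The two-way chi-square analysis used by Rovai et al. compares the observed count in each cell of a medium × theme table with the count expected if comment theme were independent of delivery medium, E = (row total × column total) / grand total. The sketch below computes this from scratch as an illustration; the counts are hypothetical, not the study's, and in practice a library routine such as SciPy's `chi2_contingency` would normally be used.

```python
def expected_counts(observed: list[list[float]]) -> list[list[float]]:
    """Expected cell counts under independence of rows and columns."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand = sum(row_totals)
    return [[r * c / grand for c in col_totals] for r in row_totals]

def chi_square(observed: list[list[float]]) -> tuple[float, int]:
    """Pearson chi-square statistic and its degrees of freedom."""
    expected = expected_counts(observed)
    stat = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(observed, expected)
        for o, e in zip(obs_row, exp_row)
    )
    dof = (len(observed) - 1) * (len(observed[0]) - 1)
    return stat, dof

# Hypothetical comment counts; columns: praise, constructive, negative
table = [
    [30, 10, 20],  # online
    [20, 30, 40],  # on-campus
]

if __name__ == "__main__":
    stat, dof = chi_square(table)
    print(round(stat, 2), dof)  # 13.19 2
```

The statistic is then compared with the chi-square critical value for the computed degrees of freedom (about 5.99 for 2 degrees of freedom at the 5% level); a larger value indicates that the mix of comment themes depends on the delivery medium.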

Arnold (2009) studied the impact of examinations on student evaluations of teaching. The aim was to determine whether conducting evaluations before or after the exams affected SET scores. A survey was conducted among 3,000 students at the Erasmus School of Economics (ESE), focusing on within-class differences, which were made possible by demeaning the evaluation scores, i.e. subtracting the class average from each score. First, a test for equality of means of student characteristics was carried out, which showed that exam timing was not an important biasing factor. Tests for equality of means were also carried out for the evaluation scores overall and separately for students who passed and students who failed. The mean, standard deviation and p-values were calculated for both the pre-exam and post-exam scenarios. The results indicated that exam timing had no effect on students who passed and little effect on those who failed. There was also no significant difference in the grades of students who completed the evaluation before or after the exams. A regression analysis was then carried out to address the research question. The results show a positive relationship between grades and SET scores, and most of the findings confirm the results of the tests for equality of means. Adding other student characteristics to the regression model did not change the results. The study recommends that evaluations be conducted after the exams to obtain accurate results, as students can then better judge what they have gained from the course.
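Demeaning, as described for Arnold's study, subtracts each class's average from its students' evaluation scores, so that only within-class variation remains and level differences between courses or instructors drop out of the comparison. A minimal sketch; the class labels and scores are hypothetical.

```python
from collections import defaultdict

def demean_by_class(records: list[tuple[str, float]]) -> list[float]:
    """Return each score minus the mean score of its class."""
    totals: dict[str, float] = defaultdict(float)
    counts: dict[str, int] = defaultdict(int)
    for cls, score in records:
        totals[cls] += score
        counts[cls] += 1
    means = {cls: totals[cls] / counts[cls] for cls in totals}
    return [score - means[cls] for cls, score in records]

# Hypothetical (class, SET score) pairs
scores = [("A", 4), ("A", 5), ("A", 3), ("B", 2), ("B", 4)]

if __name__ == "__main__":
    print(demean_by_class(scores))  # [0.0, 1.0, -1.0, -1.0, 1.0]
```

Because the demeaned scores of each class sum to zero, comparing students who evaluated before the exam with those who evaluated after it is not distorted by one class simply being rated higher than another.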

Gordon (2000) studied student evaluations of college instructors, drawing on previous research and using secondary data and statistics. The study identifies two purposes of SETs: formative, i.e. improving instruction, and summative, i.e. informing salary, tenure and similar decisions. Non-normative evaluations are recommended, which the study suggests is the case when only ranking questions are used in SETs. The literature also does not support the claim that SETs are unreliable. Special emphasis is given to Dr. Herbert Marsh's Student Evaluation of Educational Quality (SEEQ) factors, which indicate that student evaluations are multidimensional, i.e. no single factor measures good teaching. The quantity of learning was the most significant characteristic and course difficulty the least significant. Moreover, an instructor tended to be rated similarly over the years. Many non-instructional factors were found to be sources of bias, such as course difficulty, the teacher's reputation, prior interest in the course, GPA and class size. Perception of the instructor plays a very important role: in the Dr. Fox experiment, a lecturer delivered a talk that lacked substantive content but was very enthusiastic, and students gave high scores on the basis of enthusiasm alone, producing flawed ratings. Overall, the study suggests that evaluations should be used for formative purposes, to improve teaching quality, which is the ultimate aim, and that students should have a voice because they are the final consumers of education. The study also reports, for different years, the percentages of students who perceived SETs as valid, biased or neither.

Khan and Shah (2005) investigated the teaching behavior of university teachers as perceived by their students. Students are considered to make better evaluations when they have a set standard against which to assess teaching. The characteristics of effective teaching highlighted were lecture clarity, variability, enthusiasm, task orientation and students' opportunity to learn. The purpose of the study was to determine the extent to which Morton C. Shipley's 14 principles were applied at Gomal University, and the effort teachers made with regard to individual differences, teaching styles and guidance for students. A questionnaire was developed and distributed to randomly selected students from the Social Science and Physical Science departments, each of whom was asked to evaluate any four teachers. The results showed no considerable difference among the mean scores of teachers in the Physical Science department, whereas there was a minor difference in the Social Science department. Furthermore, the F-ratios calculated indicated differences in students' responses between the two departments. This points to the presence of biases that lead to varying results.

Rowley (2003) studied the design of student feedback questionnaires. The study stresses the need to redesign questionnaires because SETs are widely used in many universities for summative decisions. There are growing concerns about whether students can evaluate their instructors immediately after taking a course or only later, for example after three years of study. In addition, bias appears more in questions about performance than in those about other assessment criteria. Questionnaire design is therefore of supreme importance and should address a few major questions. First, what are the objectives of the evaluation process, for example providing feedback for instructors and giving students a platform to express themselves freely? Second, what purposes should the questionnaire serve? It ought to be standardized, reliable and comparable. Third, which issues should it address? It should cover only those issues that students are competent to judge. Fourth, how will the data be collected, analyzed and used? Anonymity should be ensured and protected throughout. For data collection, the research suggests distribution by mail, at a central point or through tutors; for analysis, a report on the entire data set can be used wherever it is needed; and for use, the results should be disclosed to all concerned parties. Thus, if the questionnaire is designed with all of these questions in mind, it is likely to yield accurate results.