Make Performance Appraisal Relevant
Winston Oberg, United Nations Environment Programme, http://www.unep.org/restrict/pas/paspa.htm

FOREWORD

The author's position is that performance appraisal programs can be made considerably more effective if management will fit practice to purpose when setting goals and selecting appraisal techniques to achieve them. He presents a catalog of the strengths and weaknesses of nine of these techniques; then he shows how they can be used singly and in combination with different performance appraisal objectives. He maintains that if management will undertake this matching effort, many familiar pitfalls of appraisal programs can be avoided. Mr. Oberg is Professor of Management at the Graduate School of Business Administration, Michigan State University.

These frequently voiced goals of performance appraisal programs underscore the importance of such programs to any ongoing business organization:
It has been estimated that over three fourths of U.S. companies now have performance appraisal programs. (1) In actual practice, however, formal performance appraisal programs have often yielded unsatisfactory and disappointing results, as the growing body of critical literature attests. (2) Some critics even suggest that we abandon performance appraisal as a lost hope, and they point to scores of problems and pitfalls as evidence. But considering the potential of appraisal programs, the issue should not be whether to scrap them; rather, it should be how to make them better. I have found that one reason for failures is that companies often select indiscriminately from the wide battery of available performance appraisal techniques without really thinking about which particular technique is best suited to a particular appraisal objective. For example, the most commonly used appraisal techniques include:
Each of these has its own combination of strengths and weaknesses, and none is able to achieve all the purposes for which management institutes performance appraisal systems. Nor is any one technique able to evade all of the pitfalls. The best anyone can hope to do is to match an appropriate appraisal method to a particular performance appraisal goal. In this article, I shall attempt to lay the groundwork for such a matching effort. First, I shall review some familiar pitfalls in appraisal programs; then, against this background, I shall assess the strengths and weaknesses of the nine commonly used appraisal techniques. In the last section, I shall match the organizational objectives listed at the outset of this article with the techniques best suited to achieving them.

(1) See W.R. Spriegel and Edwin W. Mumma, Merit Rating of Supervisors and Executives (Austin, Bureau of Business Research, University of Texas, 1961); and Richard V. Miller, "Merit Rating in Industry: A Survey of Current Practices and Problems," ILR Research, Fall 1959.

(2) See, for example, Douglas McGregor, "An Uneasy Look at Performance Appraisal," HBR May-June 1957, p.89; Paul H. Thompson and Gene W. Dalton, "Performance Appraisals: Managers Beware," HBR January-February 1970, p.149; and Albert W. Schrader, "Let's Abolish the Annual Performance Review," Management of Personnel Quarterly, Fall 1969, p.293.

Some common pitfalls

Obstacles to the success of formal performance appraisal programs should be familiar to most managers, either from painful personal experience or from the growing body of critical literature. Here are the most troublesome and frequently cited drawbacks:
(3) See Herbert H. Meyer, Emanuel Kay, and John R.P. French, Jr., "Split Roles in Performance Appraisal," HBR January-February 1965, p. 123. A look at methods The foregoing list of major program pitfalls represents a formidable challenge, even considering the available battery of appraisal techniques. But attempting to avoid these pitfalls by doing away with appraisals themselves is like trying to solve the problems of life by committing suicide. The more logical task is to identify those appraisal practices that are (a) most likely to achieve a particular objective and (b) least vulnerable to the obstacles already discussed. Before relating the specific techniques to the goals of performance appraisal stated at the outset of the article, I shall briefly review each, taking them more or less in an order of increasing complexity. The best-known techniques will be treated most briefly. 1. Essay appraisal In its simplest form, this technique asks the rater to write a paragraph or more covering an individual's strengths, weaknesses, potential, and so on. In most selection situations, particularly those involving professional, sales, or managerial positions, essay appraisals from former employers, teachers, or associates carry significant weight. The assumption seems to be that an honest and informed statement -either by word of mouth or in writing- from someone who knows a man well, is fully as valid as more formal and more complicated methods. The biggest drawback to essay appraisals is their variability in length and content. Moreover, since different essays touch on different aspects of a man's performance or personal qualifications, essay ratings are difficult to combine or compare. For comparability, some type of more formal method, like the graphic rating scale, is desirable. 2. Graphic rating scale This technique may not yield the depth of an essay appraisal, but it is more consistent and reliable. 
Typically, a graphic scale assesses a person on the quality and quantity of his work (is he outstanding, above average, average, or unsatisfactory?) and on a variety of other factors that vary with the job but usually include personal traits like reliability and cooperation. It may also include specific performance items like oral and written communication. The graphic scale has come under frequent attack, but remains the most widely used rating method. In a classic comparison between the "old-fashioned" graphic scale and the much more sophisticated forced-choice technique, the former proved to be fully as valid as the best of the forced-choice forms, and better than most of them. (4) It is also cheaper to develop and more acceptable to raters than the forced-choice form. For many purposes there is no need to use anything more complicated than a graphic scale supplemented by a few essay questions.

(4) James Berkshire and Richard Highland, "Forced-Choice Performance Rating: A Methodological Study," Personnel Psychology, Autumn 1953, p. 355.

3. Field review

When there is reason to suspect rater bias, when some raters appear to be using higher standards than others, or when comparability of ratings is essential, essay or graphic ratings are often combined with a systematic review process. The field review is one of several techniques for doing this. A member of the personnel or central administrative staff meets with small groups of raters from each supervisory unit and goes over each employee's rating with them to (a) identify areas of inter-rater disagreement, (b) help the group arrive at a consensus, and (c) determine that each rater conceives the standards similarly. This group-judgment technique tends to be more fair and more valid than individual ratings and permits the central staff to develop an awareness of the varying degrees of leniency or severity - as well as bias - exhibited by raters in different departments.
On the negative side, the process is very time consuming. 4. Forced-choice rating Like the field review, this technique was developed to reduce bias and establish objective standards of comparison between individuals, but it does not involve the intervention of a third party. Although there are many variations of this method, the most common one asks raters to choose from among groups of statements those which best fit the individual being rated and those which least fit him. The statements are then weighted or scored, very much the way a psychological test is scored. People with high scores are, by definition, the better employees; those with low scores are the poorer ones. Since the rater does not know what the scoring weights for each statement are, in theory at least, he cannot play favorites. He simply describes his people, and someone in the personnel department applies the scoring weights to determine who gets the best rating. The rationale behind this technique is difficult to fault. It is the same rationale used in developing selection test batteries. In practice, however, the forced-choice method tends to irritate raters, who feel they are not being trusted. They want to say openly how they rate someone and not be second-guessed or tricked into making "honest" appraisals. A few clever raters have even found ways to beat the system. When they want to give average employee Harry Smith a high rating, they simply describe the best employee they know. If the best employee is Elliott Jones, they describe Jones on Smith's forced-choice form. Thus, Smith gets a good rating and hopefully a raise. An additional drawback is the difficulty and cost of developing forms. Consequently, the technique is usually limited to middle- and lower-management levels where the jobs are sufficiently similar to make standard or common forms feasible. 
Finally, forced-choice forms tend to be of little value - and probably have a negative effect - when used in performance appraisal interviews.

5. Critical incident appraisal

The discussion of ratings with employees has, in many companies, proved to be a traumatic experience for supervisors. Some have learned from bitter experience what General Electric later documented: people who receive honest but negative feedback are typically not motivated to do better - and often do worse - after the appraisal interview.(5) Consequently, supervisors tend to avoid such interviews, or if forced to hold them, avoid giving negative ratings when the ratings have to be shown to the employee. One stumbling block has no doubt been the unsatisfactory rating form used. Typically, these are graphic scales that often include rather vague traits like initiative, cooperativeness, reliability, and even personality. Discussing these with an employee can be difficult.

The critical incident technique looks like a natural to some people for performance review interviews, because it gives a supervisor actual, factual incidents to discuss with an employee. Supervisors are asked to keep a record, a "little black book," on each employee and to record actual incidents of positive or negative behavior. For example: Bob Mitchell, who has been rated as somewhat unreliable, fails to meet several deadlines during the appraisal period. His supervisor makes a note of these incidents and is now prepared with hard, factual data: "Bob, I rated you down on reliability because, on three different occasions over the last two months, you told me you would do something and you didn't do it. You remember six weeks ago when I. . ."

Instead of arguing over traits, the discussion now deals with actual behavior. Possibly, Bob has misunderstood the supervisor or has good reasons for his apparent "unreliability." If so, he now has an opportunity to respond. His performance, not his personality, is being criticized.
He knows specifically how to perform differently if he wants to be rated higher the next time. Of course, Bob might feel the supervisor was using unfairly high standards in evaluating his performance. But at least he would know just what those standards are. There are, however, several drawbacks to this approach. It requires that supervisors jot down incidents on a daily or, at the very least, a weekly basis. This can become a chore. Furthermore, the critical incident rating technique need not, but may, cause a supervisor to delay feedback to employees. And it is hardly desirable to wait six months or a year to confront an employee with a misdeed or mistake. Finally, the supervisor sets the standards. If they seem unfair to a subordinate, might he not be more motivated if he at least has some say in setting, or at least agreeing to, the standards against which he is judged? (5) Meyer, Kay, and French, op. cit. 6. Management by objectives To avoid, or to deal with, the feeling that they are being judged by unfairly high standards, employees in some organizations are being asked to set - or help set - their own performance goals. Within the past five or six years, MBO has become something of a fad and is so familiar to most managers that I will not dwell on it here. It should be noted, however, that when MBO is applied at lower organizational levels, employees do not always want to be involved in their own goal setting. As Arthur N. Turner and Paul R. Lawrence discovered, many do not want self-direction or autonomy.(6) As a result, more coercive variations of MBO are becoming increasingly common, and some critics see MBO drifting into a kind of manipulative form of management in which pseudo-participation substitutes for the real thing. Employees are consulted, but management ends up imposing its standards and its objectives.(7) Some organizations, therefore, are introducing a work-standards approach to goal setting in which the goals are openly set by management. 
In fact, there appears to be something of a vogue in the setting of such work standards in white-collar and service areas.

(6) Industrial Jobs and the Worker (Boston, Division of Research, Harvard Business School, 1965).

(7) See, for example, Harry Levinson, "Management by Whose Objectives?" HBR July-August 1970, p. 125.

7. Work-standards approach

Instead of asking employees to set their own performance goals, many organizations set measured daily work standards. In short, the work-standards technique establishes work and staffing targets aimed at improving productivity. When realistically used, it can make possible an objective and accurate appraisal of the work of employees and supervisors. To be effective, the standards must be visible and fair. Hence a good deal of time is spent observing employees on the job, simplifying and improving the job where possible, and attempting to arrive at realistic output standards.

It is not clear, in every case, that work standards have been integrated with an organization's performance appraisal program. However, since the work-standards program provides each employee with a more or less complete set of his job duties, it would seem only natural that supervisors will eventually relate performance appraisal and interview comments to these duties. I would expect this to happen increasingly where work standards exist. The use of work standards should make performance interviews less threatening than the use of personal, more subjective standards alone.

The most serious drawback appears to be the problem of comparability. If people are evaluated on different standards, how can the ratings be brought together for comparison purposes when decisions have to be made on promotions or on salary increases? For these purposes some form of ranking is necessary.

8.
Ranking methods For comparative purposes, particularly when it is necessary to compare people who work for different supervisors, individual statements, ratings, or appraisal forms are not particularly useful. Instead, it is necessary to recognize that comparisons involve an overall subjective judgment to which a host of additional facts and impressions must somehow be added. There is no single form or way to do this. Comparing people in different units for the purpose of, say, choosing a service supervisor or determining the relative size of salary increases for different supervisors, requires subjective judgment, not statistics. The best approach appears to be a ranking technique involving pooled judgment. The two most effective methods are alternation ranking and paired comparison ranking. Alternation ranking: In this method, the names of employees are listed on the left-hand side of a sheet of paper - preferably in random order. If the rankings are for salary purposes, a supervisor is asked to choose the "most valuable" employee on the list, cross his name off, and put it at the top of the column on the right-hand side of the sheet. Next, he selects the "least valuable" employee on the list, crosses his name off, and puts it at the bottom of the right-hand column. The ranker then selects the "most valuable" person from the remaining list, crosses his name off and enters it below the top name on the right-hand list, and so on. Paired-comparison ranking: This technique is probably just as accurate as alternation ranking and might be more so. But with large numbers of employees it becomes extremely time consuming and cumbersome. To illustrate the method, let us say we have five employees: Mr. Abbott, Mr. Barnes, Mr. Cox, Mr. Drew, and Mr. Eliot. We list their names on the left-hand side of the sheet. We compare Abbott with Barnes on whatever criterion we have chosen, say, present value to the organization. 
If we feel Abbott is more valuable than Barnes, we put a tally beside Abbott's name. We then compare Abbott with Cox, with Drew, and with Eliot. The process is repeated for each individual. The man with the most tallies is the most valuable person, at least in the eyes of the rater; the man with no tallies at all is regarded as the least valuable person. Both ranking techniques, particularly when combined with multiple rankings (i.e., when two or more people are asked to make independent rankings of the same work group and their lists are averaged), are among the best available for generating valid order-of-merit rankings for salary administration purposes. 9. Assessment centers So far, we have been talking about assessing past performance. What about the assessment of future performance or potential? In any placement decision and even more so in promotion decisions, some prediction of future performance is necessary. How can this kind of prediction be made most validly and most fairly? One widely used rule of thumb is that "what a man has done is the best predictor of what he will do in the future." But suppose you are picking a man to be a supervisor and this person has never held supervisory responsibility? Or suppose you are selecting a man for a job from among a group of candidates, none of whom has done the job or one like it? In these situations, many organizations use assessment centers to predict future performance more accurately. Typically, individuals from different departments are brought together to spend two or three days working on individual and group assignments similar to the ones they will be handling if they are promoted. The pooled judgment of observers - sometimes derived by paired comparison or alternation ranking - leads to an order-of-merit ranking for each participant. Less structured, subjective judgments are also made. 
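As a rough illustration of how such a pooled order-of-merit ranking might be tallied, the sketch below walks through paired comparison, alternation ranking, and the averaging of two independent rankings for the five employees named in section 8. It is only a sketch under stated assumptions: each rater's subjective judgment is stood in for by made-up numeric scores, and the function names and score values are hypothetical, not taken from the article.

```python
from itertools import combinations
from statistics import mean


def paired_comparison_rank(names, prefers):
    """Compare every pair once; the preferred person in each pair gets a tally."""
    tallies = {name: 0 for name in names}
    for a, b in combinations(names, 2):
        tallies[prefers(a, b)] += 1
    # Most tallies first; ties keep the original list order (sorted is stable).
    return sorted(names, key=lambda n: -tallies[n])


def alternation_rank(names, value):
    """Alternately move the most and the least valuable remaining name
    to the top and the bottom of the ranked column."""
    remaining = list(names)
    top, bottom = [], []
    while remaining:
        best = max(remaining, key=value)
        remaining.remove(best)
        top.append(best)
        if remaining:
            worst = min(remaining, key=value)
            remaining.remove(worst)
            bottom.append(worst)
    # Names placed at the bottom were added last-place first, so reverse them.
    return top + bottom[::-1]


def pooled_rank(rankings):
    """Average each person's position over several independent rankings."""
    names = rankings[0]
    average = {n: mean(r.index(n) + 1 for r in rankings) for n in names}
    return sorted(names, key=lambda n: average[n])


employees = ["Abbott", "Barnes", "Cox", "Drew", "Eliot"]

# Hypothetical stand-ins for two raters' private judgments (higher = more valuable).
rater_1 = {"Abbott": 3, "Barnes": 5, "Cox": 1, "Drew": 4, "Eliot": 2}
rater_2 = {"Abbott": 5, "Barnes": 4, "Cox": 1, "Drew": 3, "Eliot": 2}

rank_1 = paired_comparison_rank(
    employees, lambda a, b: a if rater_1[a] > rater_1[b] else b
)
rank_2 = alternation_rank(employees, rater_2.get)
pooled = pooled_rank([rank_1, rank_2])

print(rank_1)
print(rank_2)
print(pooled)
```

In practice the `prefers` and `value` arguments would be a person's judgment rather than a lookup table. The numeric scores are there only so the sketch can run unattended. Note also that paired comparison needs n(n-1)/2 judgments, which is why it becomes cumbersome for large groups.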
There is a good deal of evidence that people chosen by assessment center methods work out better than those not chosen by these methods.(8) The center also makes it possible for people who are working for departments of low status or low visibility in an organization to become visible and, in the competitive situation of an assessment center, show how they stack up against people from more well-known departments. This has the effect of equalizing opportunity, improving morale, and enlarging the pool of possible promotion candidates. (8) See, for example, Robert C. Albrook, "Spot Executives Early," Fortune, July 1968, p. 106; and William C. Byham, "Assessment Centers for Spotting Future Managers," HBR July-August 1970, p. 150. Fitting practice to purpose In the foregoing analysis, I have tried to show that each performance appraisal technique has its own combination of strengths and weaknesses. The success of any program that makes use of these techniques will largely depend on how they are used relative to the goals of that program. For example, goal-setting and work-standards methods will be most effective for objective coaching, counseling, and motivational purposes, but some form of critical incident appraisal is better when a supervisor's personal judgment and criticism are necessary. Comparisons of individuals, especially in win-lose situations when only one person can be promoted or only a limited number can be given large salary increases, necessitate a still different approach. Each person should be rated on the same form, which must be as simple as possible, probably involving essay and graphic responses. Then order-of-merit rankings and final averaging should follow. To be more explicit, here are the appraisal goals listed at the outset of this article and the techniques best suited to them. Help or prod supervisors to observe their subordinates more closely and to do a better coaching job. 
The critical incident appraisal appears to be ideal for this purpose, if supervisors can be convinced they should take the time to look for, and record, significant events. Time delays, however, are a major drawback to this technique and should be kept as short as possible. Still, over the longer term, a supervisor will gain a better knowledge of his own performance standards, including his possible biases, as he reviews the incidents he has recorded. He may even decide to change or reweight his own criteria. Another technique that is useful for coaching purposes is, of course, MBO. Like the critical incident method, it focuses on actual behavior and actual results which can be discussed objectively and constructively, with little or no need for a supervisor to "play God." Motivate employees by providing feedback on how they are doing. The MBO approach, if it involves real participation, appears to be most likely to lead to an inner commitment to improved performance. However, the work-standards approach can also motivate, although in a more coercive way. If organizations staff to meet their work standards, the work force is reduced and people are compelled to work harder. The former technique is more "democratic," while the latter technique is more "autocratic." Both can be effective; both make use of specific work goals or targets, and both provide for knowledge of results. If performance appraisal information is to be communicated to subordinates, either in writing or in an interview, the two most effective techniques are the management-by-objectives approach and the critical incident method. The latter, by communicating not only factual data but also the flavor of a supervisor's own values and biases, can be effective in an area where objective work standards or quantitative goals are not available. Provide back-up data for management decisions concerning merit increases, promotions, transfers, dismissals, and so on. 
Most decisions involving employees require a comparison of people doing very different kinds of work. In this respect, the more specifically job-related techniques like management by objectives or work standards are not appropriate, or, if used, must be supplemented by less restricted methods. For promotion to supervisory positions, the forced-choice rating form, if carefully developed and validated, could prove best. But the difficulty and cost of developing such a form and the resistance of raters to its use render it impractical except in large organizations. Companies faced with the problem of selecting promotable men from a number of departments or divisions might consider using an assessment center. This minimizes the bias resulting from differences in departmental "visibility" and enlarges the pool of potential promotables. The best appraisal method for most other management decisions will probably involve a very simple kind of graphic form or a combined graphic and essay form. If this is supplemented by the use of field reviews, it will be measurably strengthened. Following the individual appraisals, groups of supervisors should then be asked to rank the people they have rated, using a technique like alternation ranking or paired comparison. Pooled or averaged rankings will then tend to cancel out the most extreme forms of bias and should yield fair and valid order-of-merit lists. Improve organization development by identifying people with promotion potential and pinpointing development needs. Comparison of people for promotion purposes has already been discussed. However, identification of training and development needs will probably best -and most simply- come from the essay part of the combined graphic/essay rating form recommended for the previous goal. Establish a reference and research base for personnel decisions. For this goal, the simplest form is the best form. A graphic/essay combination is adequate for most reference purposes. 
But order-of-merit salary rankings should be used to develop criterion groups of good and poor performers.

Conclusion

Formal systems for appraising performance are neither worthless nor evil, as some critics have implied. Nor are they panaceas, as many managers might wish. A formal appraisal system is, at the very least, a commendable attempt to make visible, and hence improvable, a set of essential organization activities. Personal judgments about employee performance are inescapable, and subjective values and fallible human perception are always involved. Formal appraisal systems, to the degree that they bring these perceptions and values into the open, make it possible for at least some of the inherent bias and error to be recognized and remedied.

By improving the probability that good performance will be recognized and rewarded and poor performance corrected, a sound appraisal system can contribute both to organizational morale and organizational performance. Moreover, the alternative to a bad appraisal program need not be no appraisal program at all, as some critics have suggested. It can and ought to be a better appraisal program. And the first step in that direction is a thoughtful matching of practice to purpose.