
Performance Appraisal: Verisimilitude Trumps Veracity

James S. Bowman
Public Personnel Management
Page 557
Copyright 1999 International Personnel Management Association. All rights reserved.

Few administrative functions have attracted more attention and so successfully resisted solution than employee evaluation. Since performance appraisal is impossible, what actually happens is personnel appraisal. When such hypocrisy occurs, civil service systems predicated on merit are undermined.

This article commences with the evolution of the appraisal function, the root of ethical problems found in service ratings. Common types of evaluation (with their strengths and drawbacks), who does them, and typical rating errors are then examined. This climaxes with a discussion of the fundamental and beguiling reason for these deficiencies. Diagnosis completed, attention shifts to ways to improve appraisals, which leads to a specification of the characteristics of a system that could withstand legal, if not ethical, scrutiny. The analysis closes by sketching future, not necessarily promising, trends.

"Personnel Appraisal (pers'-n-el a-pra'-zel) n: given by someone who does not want to give it to someone who does not want to get it." -- Anonymous

One's work in modern organizations will be evaluated to assess the extent to which individual and collective needs coincide--or conflict. Since many decisions can hinge on these ratings, the process is central to human resource management. Key to employee compliance, performance improvement and system validation functions, such reviews are mechanisms to reinforce organizational values. They provide data on the effectiveness of recruitment, position management, training, and compensation (where such information is most frequently used).

Clearly, then, employee evaluation is a chief function of management. While an effective process can benefit an agency, creating, implementing, and maintaining it is no easy task. Programs serving multiple purposes may do none of them particularly well. In business, for instance, less than 20 percent accomplish their goals, and under 10 percent of organizations judge their appraisal systems to be effective.[1] There is no reason to believe that the situation is any different in government.

Personnel appraisal, in short, is one of a manager's most difficult issues precisely because it is both important and problematic. Few managerial functions have attracted more attention and so successfully resisted solution than employee evaluation.[2] Patently, personnel systems predicated on rewarding merit are undermined when questionable appraisal practices take place. What these widely-used and intensely-disliked systems reveal is that instead of being a solution, they are often part of the problem.

Not surprisingly, paradoxes abound: people are often less certain about "where they stand" after the appraisal than before it; the higher one rises in a department, the lower the likelihood that quality feedback will be received; and most employees perceive little connection between performance and pay.[3] Despite--or perhaps because of--the vexing, intractable nature of personnel appraisal, political pressures to "just do it" are substantial. While the general public knows appraisal problems from its own work experiences, it nevertheless makes an odd assumption: since evaluations are done successfully (somewhere) in business bureaucracies, they should especially be used in government agencies.

This article begins with the evolution, as eerie as it is, of the appraisal function. Common types of appraisals, who does them, and typical, if robust, rating errors are then examined. This climaxes with a discussion of the fundamental and beguiling reason for these problems. Diagnosis completed, attention then shifts to ways to design and improve evaluation programs. This leads to a specification of the characteristics of a system that could withstand legal scrutiny. The article closes by sketching future trends in personnel appraisal.

Evolution

The root of the paradoxical nature of service ratings--rarely do they deliver in practice what is promised in theory--is a legacy of the spoils system. Aghast at widespread looting, plunder, and corruption during that era, good government groups, armed with scientific management techniques like job analysis, sought to guarantee competence by insulating employees from political influence.

In order to assure that public employees would not be rewarded or punished for partisan reasons, the reformers worked hard to establish merit systems under which supervisory discretion was severely limited and closely policed by non-partisan civil service commissions...empowered to defend merit principles.[4]

As the merit system evolved, the emphasis was on recruiting meritorious people and protecting them from partisan entanglements. Less attention was devoted to divining ways to evaluate their work; after all, the system was designed to select competent workers in the first place.

It should not be surprising, then, that although concern for appraisal has existed for a long time (Congress mandated evaluations as early as 1842), the topic for decades "was a theoretical and administrative backwater, ignored by scholars and practitioners alike."[5] The dramatic growth of government during the Great Depression and World War II, however, culminated in considerable interest in appraisal programs so that by the 1950s many jurisdictions had adopted them.

Characteristic of the times, an underlying faith in science to control, direct, and measure human performance resulted in the continuing search for, if not the perfect evaluative scheme, at least ways to improve existing technology. Thus, many of the early systems, based on personal traits (discussed in the next section), were widely criticized for failing to differentiate between employees since virtually everyone received a "satisfactory" rating.

Aimed at correcting this, the 1978 Civil Service Reform Act sought to evaluate employees not on subjective characteristics but on objective, job-related performance standards. This effort, in turn, produced its own set of problems, so that in 1993 the National Performance Review declared it dysfunctional to the success of governmental programs. For its part, the NPR called for simplified, decentralized, team-based evaluation, de-emphasizing the need for results-oriented appraisals; this approach, as discussed below, may be no more successful than it has been in business.

Today, service ratings remain "the most maligned area of personnel and...seem to be tolerated only because no one can think of any better, realistic alternatives."[6] Yet abandoning the function altogether may not be a solution, since human beings have always made informal or formal evaluations of others. The challenge is to decide what to appraise in a manner that meets the needs of the organization and the individual.

Common Types of Appraisal

Since there are few jobs with clear, comprehensive, objective output measures that eliminate the need for judgment, the most widely used methods are judgmental in nature. What differentiates them is the degree of subjectivity that is likely in the judgments to be made. The approaches can be readily grouped as (1) trait-, (2) behavior- and (3) result-based systems. Recognize, however, that there is considerable variety in available techniques, and not only are they frequently combined with one another, but there are also different systems that may be used for various types of employees.[7] Only the most well-known are examined here; yet even these, albeit in differing degrees, produce evaluations that are either deficient (not all pertinent factors are considered) or contaminated (irrelevant considerations are included).

Trait-Based Systems

This method requires judgments on the degree to which someone possesses certain desired personal characteristics deemed important for the job. Despite the inherent subjectivity of this format, it continues to be practiced because human beings routinely make trait judgments about others in daily life. The approach, although often inscrutable, seems intuitively sensible as a result.

There are colorful iterations of such graphic rating scales based on the characteristics chosen, their definitions (if any), and the number of categories (adjective or numeric) used. None, however, overcome serious validity and reliability questions. Thus, because it is difficult to define personality characteristics (much less the extent to which someone has them), subordinates may become suspicious, if not resentful, especially because this technique has little value for the purpose of performance improvement. Human traits, after all, are relatively stable aspects of individuals.

This is not to suggest that vivid personal traits are unimportant in job performance; people can hardly perform without them. Indeed, the use of flexible, subjective criteria seems inevitable, especially for ambiguous managerial jobs. The problem is valid measurement. When used with accurate job descriptions and trained evaluators, such ratings may become more credible. However, even when the traits measured are job-related (e.g., job knowledge, dependability), a landmark court opinion (Brito v. Zia, 1973) criticized their subjective nature because the results were not anchored in or related to actual work behavior.[8] Just as trait rating is no longer likely to be used alone, neither is the narrative essay technique; in fact, in one form or another written descriptions often supplement most appraisal formats.

Behavior-Based Systems

Unlike trait-focused methods, which emphasize who a person is, behavior-oriented procedures attempt to discern what someone actually does. The relatively tangible, objective nature of these systems makes them more legally defensible than personality scales. In point of fact, civil rights legislation of the 1960s and 1970s led to the development of a number of tools that concentrate on behavioral data, two of which are considered here.

The Critical Incident Technique (CIT) is used to record behaviors that are unusually superior or inferior. It can be implemented in a responsive and flexible manner; supervisors can be trained to pay more attention to incidents of exceptional behavior in some performance areas at certain times and in other areas in different periods.[9] Notably, a critical incident log may be helpful in supporting other appraisal methods.
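
To make the mechanics concrete, the following sketch shows one way a critical incident log might be structured; the record fields, class names, and summary logic are illustrative assumptions, not a format prescribed by the article or by Halachmi.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Dict, List, Tuple

@dataclass
class CriticalIncident:
    """One observed instance of unusually superior or inferior behavior."""
    when: date
    dimension: str       # performance area, e.g. "responsiveness to citizens"
    description: str     # what the employee actually did, in behavioral terms
    favorable: bool      # True = exceptional, False = deficient

@dataclass
class IncidentLog:
    """A supervisor's running log for one employee over a rating period."""
    employee: str
    incidents: List[CriticalIncident] = field(default_factory=list)

    def record(self, incident: CriticalIncident) -> None:
        self.incidents.append(incident)

    def tally(self) -> Dict[Tuple[str, bool], int]:
        """Count favorable and unfavorable incidents by performance dimension."""
        counts: Dict[Tuple[str, bool], int] = {}
        for i in self.incidents:
            key = (i.dimension, i.favorable)
            counts[key] = counts.get(key, 0) + 1
        return counts
```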

Important drawbacks, however, include its "micro-management" feature as supervisors keep a "book" on people; mistakes, rather than achievements, may be more apt to be recorded since employees are supposed to be competent. Another concern is that subordinates may engage in easily-documented activities, while hiding errors and neglecting tasks not readily observed. Then too, valuable, steady performers, not generally involved in spectacular events, may be overlooked. Halachmi also notes that the record could be incomplete or unreliable due either to the rater's knowledge or to the nature of the appraisee's job--either of which makes comparisons between individuals problematic. The anecdotal nature of the method, in short, is both its strength and weakness.

The behaviorally-anchored rating system (BARS) builds on the incident method as well as the graphic rating scales discussed earlier. It defines the dimensions to be evaluated in behavioral terms and uses critical events to anchor or describe different performance levels. When introduced in the 1960s, BARS was claimed to be a breakthrough technology since raters could match observed activity on a scale instead of judging it as desired or undesired.[10] Since the scales are developed from the experience of employees, it was also thought that user acceptance was likely. Because the system is job-related, it remains relatively invulnerable to legal challenge.
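
A minimal sketch of how one BARS dimension might be represented follows; the dimension, anchor wording, and scale points are invented for illustration and are not drawn from any validated instrument. In practice, the anchors would be written and scaled by job incumbents and subject-matter experts.

```python
# Hypothetical anchors for a single dimension ("responding to citizen inquiries").
BARS_CITIZEN_SERVICE = {
    7: "Calms an upset caller, resolves the problem, and follows up without prompting",
    5: "Answers routine inquiries accurately and courteously",
    3: "Gives correct information, but only when pressed for details",
    1: "Refers callers elsewhere without attempting to help",
}

def rate(scale: dict, anchor_level: int) -> str:
    """Return the behavioral description the rater matched the observation to."""
    if anchor_level not in scale:
        raise ValueError("observation must be matched to a defined anchor")
    return f"Level {anchor_level}: {scale[anchor_level]}"

print(rate(BARS_CITIZEN_SERVICE, 5))
```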

Yet the method is often not practical, as each job category requires its own BARS; for economic reasons or for lack of employees in a specific job, the approach is often infeasible. Secondly, Gomez-Mejia et al.[11] argue that if personal attributes are a more natural way to think about other people, then requiring supervisors to use BARS (or for that matter any non-trait technique) is merely a sleight-of-hand that introduces psychometric errors (discussed below). Indeed, they cite research that finds both employers and employees prefer trait systems. Other studies demonstrate that employers and employees do not make much of a distinction between BARS and trait scales.[12] Not surprisingly, there is little evidence to support the superiority of this technique over other approaches.[13]

Finally, most experts in both business and public administration do not find that the potential gains in using BARS warrant the substantial investment required in time and resources. Thus, where this technique is used, it often plays a residual role limited to either a small number of selected job categories and/or to the developmental function of personnel appraisal. Overall, then, whatever else trait- and behavior-based systems may do, they are largely silent on the question of what an employee is to accomplish.

Results-Based Systems

Neither a measure of personal characteristics nor of employee behaviors, outcome-oriented approaches attempt to calibrate one's contribution to the success of the organization. Although "results" have always been of keen interest to administrators, management by objectives (MBO)[14] promises to achieve substantial organization-individual goal congruence. Introduced in the 1950s, this most common results-focused approach establishes agency objectives, followed in cascading fashion by derivative objectives for every department, all managers, and each employee. These systems require specific, realistic objectives, mutually-agreed upon goals, interim progress reviews, and comparison between actual and expected accomplishments at the end of the rating period.
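
As a rough illustration of the cascade and the end-of-period comparison, the sketch below links an employee objective to departmental and agency objectives; the objectives, targets, and attainment calculation are hypothetical and are not drawn from the article.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Objective:
    """One objective, optionally linked to the higher-level objective it supports."""
    owner: str                # agency, department, manager, or employee
    statement: str
    target: float             # expected result for the rating period
    actual: Optional[float] = None
    supports: Optional["Objective"] = None

    def attainment(self) -> Optional[float]:
        """Ratio of actual to expected accomplishment at the end of the period."""
        if self.actual is None or self.target == 0:
            return None
        return self.actual / self.target

# Cascading example: agency -> department -> employee.
agency = Objective("Agency", "Process 95% of permit applications within 30 days", 0.95)
dept = Objective("Permits Dept.", "Clear the current application backlog", 1200, supports=agency)
worker = Objective("Reviewer", "Complete 25 application reviews per week", 25, supports=dept)

worker.actual = 22
print(worker.attainment())   # 0.88 -- compared against the agreed target at appraisal time
```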

Despite its rationality, as well as evidence of effectiveness,[15] MBO like other appraisal techniques has serious drawbacks:

  • While development of objectives may not be as technically demanding as BARS, the process nonetheless is quite time-consuming as an effective program takes 3-5 years to implement (accordingly, few organizations adopt the formal hierarchical process to ensure organization-department-manager-employee linkage).
  • There likely will be conflicting objectives, differing views on the appropriateness of the objectives, and disagreements about the extent to which objectives are mutually agreed upon and fulfilled.
  • Because it focuses on short-term goals, a compulsive "results-no-matter-what" mentality can produce predictable quality and ethical problems as anything that gets in the way of the objective gets shunted aside (in any public or private service organization, how a job is done often is as critical as its output).
  • Not only is establishing equally challenging objectives for all people difficult, but also expectations that they will invariably improve (an MBO-induced "treadmill") can lead to user acceptance problems.
  • The technique can stifle creativity as employees may define their job narrowly (as they "work to quota") leaving some problems undetected and unresolved.
  • Teamwork is apt to suffer if employees become preoccupied with personal objectives at the expense of collegiality (they may fulfill their goals, but not be good all-round performers).
  • Since performance outcomes do not indicate how to change, the method may not assist in the employee development function.

MBO, nonetheless, remains a popular technique to appraise managers since their roles are often ambiguous and it does provide a measure of accomplishment against predetermined objectives.

Commentary: "Man plans, God laughs." --Jewish proverb

To summarize, Exhibit 1 specifies the promise, problems, and prospects of trait, behavior, and results approaches to appraisal. While the intuitive appeal of trait rating is considerable, it is highly susceptible to both contamination and deficiency errors; its future potential, accordingly, is limited to a supplemental role in the review process due to subjectivity and vulnerability to court challenge.

Exhibit 1. Promise, Problems, and Prospects of Person-centered Appraisal Systems

System           Promise                  Problems                                      Prospects
Trait-based      High: intuitive appeal   High: contamination and deficiency errors     Low: supplemental role
Behavior-based   High: job related        Average: susceptible to deficiency errors     Average: high technical demands
Results-based    High: face validity      Average: deficiency problems                  Average to high: emphasizes accomplishments

Systems based on employee behavior also hold substantial promise since they are job-related--something most judges expect. Yet they too are likely to play a modest role in the years ahead largely because of their susceptibility to deficiency errors and, in the case of BARS, high technical demands coupled with limited applicability. Results-derived approaches, like the others, have face validity, but often suffer from a host of deficiency and implementation problems. Still, they do emphasize actual accomplishments, as opposed to personalities or behaviors, and therefore may survive litigation.

While combined techniques may offer advantages, available research does not support a clear choice among methods.[16] Since each has its own strengths and weaknesses, selecting one to cure a problem likely will cause another; there is no fool-proof approach. Notice too that all three systems are backward-looking; because there is no systematic continuous improvement process, they may be self-defeating, perpetuating the organizational status quo. Ironically, the better traditional appraisals are done, the more likely the organization will remain the same(!). Hausser and Fay wistfully argue that the search for the perfect instrument--a goal that has eluded industrial psychologists for over 50 years--is now largely regarded as futile.[17] Instead, they suggest, efforts to improve the overall appraisal process likely will provide much larger returns than developing (and redeveloping) seemingly better rating forms every time a new high official takes office.

Paradoxically, then, the technique used is decidedly not the central issue in personnel appraisal since the type of tool does not seem to make much difference.[18] Summarizing a National Research Council study, Nigro and Nigro report that

The council found no convincing evidence to support arguments that distinguishing between behaviors and traits has much effect on rating outcomes. It found that psychologically, supervisors form generalized evaluations which strongly color memory for and evaluation of actual work behaviors. It also found that there is little evidence to suggest that rating systems based on highly job-specific dimensions produce results that are much different from those using global or general dimensions.[19]

That is, available evidence indicates that judgments about performance are not necessarily correlated with results precisely because these decisions rely on cognitive abilities, which are notoriously error-prone (see below).[20] Not surprisingly, the choice of a tool is less important than the fact the employees often have little confidence in the abilities of managers to effectively implement them.[21] The National Performance Review found, for instance, that "performance ratings are unevenly distributed by grade, gender, occupation, geographic location, ethnic group, and agency" (shoe size was not mentioned).[22]

Appraisal software programs, nonetheless, promise to enable managers to select predigested forms (or to design their own), walk them through form completion (including tips and hints, provision of preprogrammed phrases, and prompts for examples), and verify their work with arithmetical, logical consistency, and legal checks. However, in a balanced review of these programs, Grote notes that they run on algorithms with no knowledge of the organizational culture, job standards, or individual performance--problems likely to intensify in a virtual workplace.[23] Indeed, they make the process too easy; managers should devote real thought to appraisals, not merely point and click. And, the software contributes nothing to the most important part of service ratings: the manager-employee interview (discussed in a subsequent section).
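
As a toy illustration of what such arithmetical and logical-consistency checks amount to, the sketch below validates a completed rating form; the field names, rules, and thresholds are hypothetical and not based on any actual product reviewed by Grote.

```python
from typing import Dict, List

def consistency_checks(form: Dict) -> List[str]:
    """Run simple arithmetical and logical checks on a completed rating form.
    The rules are illustrative; a real product would encode many more."""
    issues = []
    scores = form["dimension_scores"]                      # e.g. {"quality": 4, ...}
    expected_overall = round(sum(scores.values()) / len(scores), 1)
    # Arithmetical check: overall score should track the dimension average.
    if abs(form["overall_score"] - expected_overall) > 0.05:
        issues.append(f"overall score should be about {expected_overall}")
    # Logical check: extreme ratings need a written justification.
    for dimension, score in scores.items():
        if score in (1, 5) and not form["narratives"].get(dimension):
            issues.append(f"extreme rating on '{dimension}' lacks a narrative example")
    return issues

form = {
    "dimension_scores": {"quality": 4, "timeliness": 5, "teamwork": 3},
    "overall_score": 4.0,
    "narratives": {"timeliness": "Completed the year-end close two weeks early."},
}
print(consistency_checks(form))   # [] -- no issues flagged for this example
```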

Raters

Since common appraisal methods are judgmental in character, the question becomes, "Who makes this judgment?" Traditionally there was one answer: the subordinate's immediate supervisor. However, other knowledgeable information sources include the ratee, peers, computers, and outsiders.

Based on the belief that the employee has important insights about how the job should be done, self-appraisals can provide valuable data, particularly when the supervisor and employee engage in joint goal setting. Yet these evaluations are subject to distortions including self-congratulation or, less likely, self-incrimination. It is well established, for instance, that many people attribute good performance to their own efforts and blame poor performance on other factors. These biases can be moderated if objective standards exist and the ratee is regularly provided genuine feedback. Still, because these evaluations tend to focus on personal growth and motivation, they are best used for developmental, rather than administrative, purposes.

As work in some organizations has changed from a stable set of tasks done by one person to a more fluid ensemble of changing requirements done by groups of employees, peer or team evaluation becomes appropriate. In a high trust agency culture, where co-workers have access to relevant information, such assessments can be accurate. When these conditions do not exist, supervisors likely will be reluctant to give up control. And, subordinates will often see these techniques as disruptive competition that can easily be sabotaged by lenient ratings or converted into "popularity contests." Thus, these reviews are often most useful when done anonymously and for developmental reasons.

The objective of electronic monitoring, third, is to increase productivity, improve quality, and reduce costs; it does so by continuously collecting performance data, pinpointing problems, and providing immediate feedback. When such monitoring provides objective performance appraisals, employee satisfaction and improved morale may result. Today computer-generated statistics are the basis for evaluations of millions of office workers engaged in clerical, repetitive tasks; the virtual worksite of the future is almost certainly going to expand the collection and use of such information. When implemented without reasonable safeguards (e.g., employee access to data, rights to challenge erroneous records, rating decisions made on the basis of non-electronic as well as electronic information), these software programs can create an "electronic sweatshop" environment damaging creativity, morale, and health. If employees feel helpless, manipulated and exploited, then most techniques eventually will be circumvented.[24]

Finally, multi-rater or 360-degree evaluation systems--those that gather information from subordinates, peers, and citizens--by definition provide more data than other approaches. More data may produce more reliable, but not necessarily more valid, information. The administratively complex nature of these systems is compounded by a lack of convergence between the different sources. That is, managers may be confronted with a host of seemingly conflicting opinions--all of which may be accurate from their respective viewpoints. Still, systems that assure respondent anonymity and encourage participant responsibility no doubt supply some useful feedback for both improving management processes and employee development. And, there is growing acknowledgment of the value of the technique; the first scholarly reference to it was in 1993, but today the term is commonly used in the field. In short, while one's immediate supervisor is apt to play an important role in the rating process, feedback from other sources is increasingly seen as a way to obtain a more holistic understanding of performance.
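
The aggregation step can be pictured with the short sketch below, which reports each source group separately rather than collapsing everything into a single score, so the lack of convergence between sources stays visible; the rating data and group labels are invented for illustration.

```python
from statistics import mean, stdev

# Illustrative anonymous ratings (1-5) on one dimension, grouped by source.
ratings = {
    "supervisor":   [4],
    "peers":        [3, 4, 2, 3],
    "subordinates": [5, 4, 5],
    "citizens":     [2, 3, 3, 2, 4],
}

# Per-source means and spreads preserve the signal that perspectives may differ.
for source, scores in ratings.items():
    spread = stdev(scores) if len(scores) > 1 else 0.0
    print(f"{source:12s} mean={mean(scores):.2f} spread={spread:.2f} n={len(scores)}")
```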

Rating Errors

The use of ratings assumes that evaluators are reasonably objective and precise. Regardless of the appraisal instrument used, though, a large number of well-known errors occur in the process based on (1) cognitive limitations, (2) intentional manipulation, and (3) organization influences. When they happen--and they are difficult to prevent--not only is the rater's judgment called into question, but also the resulting evaluation may leave the ratee unable to accurately judge his/her own performance.

When confronted with large amounts of information, people generally seek ways to simplify it. Cognitive information processing theory maintains that appraisal is a complex memory task involving data acquisition, storage, retrieval, and analysis. To process these data, subjective categories are employed, which in turn produce no less than five problems. Thus, compatibility ("similar to me" or liking) error is a potent one since both compatibility and ratings are person-focused. Indeed, most employees believe their supervisor's liking of them influences evaluations.[25]

The next mental shortcut is the spillover (halo or black mark) effect--i.e., if the ratee does one thing exceptionally well (halo) or poorly (black mark), then that unfairly reflects on everything else. The recency effect takes place when a major event occurs just prior to the time of the evaluation and overshadows all other incidents. Contrast error exists when people are rated relative to other people instead of against performance standards. Finally, actor/observer bias (partially alluded to earlier) occurs when subordinates as actors often point to external factors while supervisors as observers attribute weak performance to employees.

The second general source of rating problems is that appraisals in many organizations are adroitly seen as a political--not necessarily a rational--exercise; results are intentionally manipulated higher or lower than the employee deserves. The goal is not measurement accuracy, but rather management discretion and organizational effectiveness.

Accordingly, leniency or friendliness error (the "Santa Claus" effect) is the consequence of a desire to: maintain good working relationships, maximize the size of a merit raise, encourage a marginal employee, show empathy for someone with personal problems, or avoid confrontations (and appeals) with an aggressive worker.[26] Conversely, severity error (the "horns" effect) may be used as a way either to send a message to a good performer that some aspect of his/her work needs improvement or to shock an average employee into higher performance. Over 70 percent of managers in one survey reported that they deliberately inflated or deflated evaluations for such reasons.[27]

Note that the inherent conflict of interest present in supervisory evaluations is a powerful political reason likely to make the leniency effect prevail over other psychometric errors. That is, if all (or most) subordinate evaluations are inflated, then the supervisor may look like an effective manager; if the appraisals are not so inflated, then his/her management abilities may be called into question.[28] However, the employer has an obligation to conduct appraisals with due care. This duty may be violated (as a result of the Santa Claus effect) when a poor performer receives satisfactory ratings and subsequently is subjected to attempts at termination.

Finally, this leads to examination of a set of organizational influences that cause at least four problems. The first is insufficient management commitment to performance appraisal. In light of the difficulties with various evaluation schemes, much skepticism, futility, and even doubts about the possibility of performance appraisal exist.[29] Investing heavily in these systems, then, does not make a lot of sense for some administrators. The daily press of business makes it a peripheral, not central, responsibility; it is often isolated not only from getting the job done but also from organizational planning and budget strategies. There are few incentives--and sometimes genuine disincentives--to use appraisal as a management tool. Employee evaluations are done for the sake of evaluation--an irrelevant, once-a-year formality to complain about, complete, and forget in the service of administrative rules.

Such an attitude leads to the error of central tendency (if not leniency) where nearly all are rated satisfactorily--if for no other reason than higher or lower scores may require time-consuming documentation. This "error" is, in turn, reinforced by the no money effect--i.e., there frequently are insufficient funds to distribute and/or they are awarded on an across-the-board basis.
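
One practical response, sketched below, is to screen a rater's distribution of scores for the leniency and central-tendency patterns just described; the cutoff values are arbitrary illustrations, not validated standards, and real systems would follow any flag with human review rather than automatic correction.

```python
from statistics import mean, stdev
from typing import List

def distribution_flags(scores: List[float], scale_max: int = 5) -> List[str]:
    """Flag rating patterns consistent with leniency or central tendency.
    Thresholds here are illustrative, not validated cutoffs."""
    flags = []
    if mean(scores) > 0.85 * scale_max:
        flags.append("possible leniency: nearly everyone rated near the top")
    if len(scores) > 1 and stdev(scores) < 0.5:
        flags.append("possible central tendency: ratings barely differentiate employees")
    return flags

print(distribution_flags([4, 5, 5, 4, 5, 5]))   # leniency pattern
print(distribution_flags([3, 3, 3, 3, 3, 3]))   # central-tendency pattern
```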

Overall, cognitive, political, and organizational limitations help explain the reasons for rater error. While some of these constraints can be addressed in training, something more fundamental lies at the root of personnel appraisal difficulties: human nature. Its pertinent aspects are revealed by risk aversion, implicit personality theory, conflicting role expectations, and personal reluctance.

Since defending one's judgment in open court is not something most relish, it is natural that supervisors reduce risk by being aware of all possible pitfalls in the appraisal process. A paradox arises, however, in that playing it safe through leniency may invite a legal challenge on the grounds that appraisals did not differentiate employees by performance.[30]

Second, implicit personality theory suggests that people generally judge the "whole person" based on limited data (stereotyping based on first impressions or the halo effect); ratings then tend to justify these global opinions rather than accurately gauge performance. Conflicting role expectations, third, are inherent in the appraisal process as evaluators must reconcile being a helpful coach with acting as a critical judge. In playing these roles, administrators (as noted earlier) also evaluate themselves; human nature suggests that better-than-deserved ratings will occur, for one's own managerial skills may be called into question should employees receive poor evaluations.

Last, appraisal systems are complicated by the understandable distaste that people have when asked to formally evaluate others. Since there is no such thing as infallible judgment, when administrators must take responsibility for judging the worth of others, "it is dangerously close to a violation of the integrity of the person."[31] Most people, especially in light of all the other questions about the reliability and validity of personnel appraisal, are as reluctant to judge others as they are to be judged themselves. It is onerous, in other words, to "play God." Little wonder, then, that the sentiment expressed in the quotation at the outset of this article is shared by many: "appraisal is given by someone who does not want to give it to someone who does not want to get it."

To summarize, since evaluation in many jobs is not amenable to objective assessment and quantification, ratings typically incorporate non-performance factors--for all the reasons discussed above. When this occurs it leads to a violation of the most revered principle of this field of HRM: appraisals evaluate performance, not the person.[32] Verisimilitude trumps veracity.

Improving the Process

Designing an appraisal system requires not only establishing policies and procedures, but also obtaining the support of the entire workforce and its union(s). Top officials must publicly commit to the program by devoting sufficient resources to it and by modeling appropriate behavior. Managers, in turn, need to be convinced that the system is relevant and operational. Employees likewise should see it as in their interest to take it seriously. A profile (or "slice") taskforce, representing all of these groups from different parts of the department, can conduct a needs assessment by collecting agency archival and employee attitudinal data. It should then revise an existing system (or create a new one) based on the findings and test it on a trial basis. This could be done in jurisdictions that allow customization to agency needs (over half of state governments for example) or as part of a reinventing government laboratory experiment. It is possible to finesse and marginalize formal requirements entirely (see Exhibit 2); beating the system may be faster, more flexible, and just as effective as formally reforming it.

Exhibit 2. Beating the System

In one major unit of a large hospital, a charismatic department manager decided that whatever the administration of the hospital did, he was going to run his facilities department on the basis of TQM. Well in advance of the hospital's annual tedious performance appraisal drill, he gathered his troops together, reviewed the hospital's sorry form, and then told them that what it represented was the starting point for them to practice their kaizen--continuous improvement--skills. "What do we need to do, given the fact that this basic form is mandated, in order to complete it well enough to keep the personnel monkeys off our backs but also get some good out of the process for ourselves?" he asked his team. He funded a series of weekly pizza meetings for a task force of facilities employees who were charged with developing an answer to his question that everyone supported enthusiastically.

Source: Grote, D., The Complete Guide to Performance Appraisal (New York: AMACOM, 1999), 351.

The design chosen involves numerous key technical questions, many of which were discussed earlier. These include selection of the most useful tool(s), as well as raters, based on system objective, practicality, and cost. Training is needed in an effort to minimize the various kinds of errors previously examined. Yet, it is generally acknowledged that mere awareness of these problems is unlikely to affect behavior; instead, raters must engage in and receive feedback from role plays, simulations, and videotaped exercises. Evaluators also need training in interpersonal skills in order to effectively conduct appraisal interviews.

Monitoring performance, the period between plan approval and formal appraisal, includes frequent positive or corrective feedback based on performance not personality. When done conscientiously throughout the year, the actual evaluation will then simply confirm what has already been discussed.[33]

Finally, the evaluation culminates in the appraisal interview. In preparing for the meeting, the employee may complete a self-assessment, and managers should collect necessary information and complete, in draft form, the rating instrument. Although a collaborative problem-solving approach is effective, most managers use a one-way "tell and sell" technique where they inform subordinates how they were rated and then justify the decision.[34] No matter the approach, supervisors should use the event to support the policies and practices of the entire system and be trained in goal setting, communication skills, and positive reinforcement.

Summary and Conclusion

To distill the foregoing argument, the characteristics a personnel appraisal system should contain to satisfy both employers and employees--and to survive a court challenge--are specified below. As discussed, however, implementing this HRM function is fraught with difficulty. Readers are invited to evaluate the extent to which the following standards are met by agencies in their jurisdictions:

1. The rating instruments, which should strive for simplicity not complexity, are derived from job analysis.

2. Training is provided to all employees about the system and to managers in its use.

3. The appraisal is grounded in accurate job descriptions and the actual ratings are based on observable performance.

4. Evaluations are completed under standardized conditions and are free of adverse impact (a simple screening check is sketched after this list).

5. Preliminary results are shared with the ratee.

6. Some form of upper level review, including an appeal process, exists that prevents a single manager from controlling an employee's career.

7. Performance counseling and corrective guidance services exist.
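
The fourth standard above asks that evaluations be free of adverse impact. The article does not prescribe a test, but the four-fifths rule from the federal Uniform Guidelines on Employee Selection Procedures is one conventional screen; the sketch below applies it to hypothetical rating outcomes (the group names, counts, and 0.80 threshold shown are illustrative, and a flag signals only the need for closer examination).

```python
from typing import Dict

def four_fifths_check(favorable: Dict[str, int], totals: Dict[str, int]) -> Dict[str, float]:
    """Compare each group's rate of favorable ratings to the highest group's rate.
    A ratio below 0.80 is the conventional flag for possible adverse impact."""
    rates = {group: favorable[group] / totals[group] for group in totals}
    best = max(rates.values())
    return {group: rate / best for group, rate in rates.items()}

# Hypothetical counts of employees receiving "outstanding" ratings.
ratios = four_fifths_check(
    favorable={"group_a": 30, "group_b": 12},
    totals={"group_a": 100, "group_b": 60},
)
print(ratios)   # group_b ratio ~= 0.67 -> below 0.80, worth investigating
```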

While many systems may not compare favorably to such standards, recall that the crux of the appraisal problem is not system design. Instead, since evaluation is a matter of human judgment, the conundrum is how the plan and the information it generates is used.

The perennial, melancholy search for the best technique, nonetheless, relentlessly (sometimes shamelessly) continues. As we peer into the century ahead, personnel appraisal will become either more or less complex. Should the long-standing preference for person-centered evaluations persist, then organizational downsizing and workforce changes will likely complicate appraisals. The virtual workplace--unbound by time and space--is apt to exacerbate this situation.

Downsizing has been a one-two punch. Personnel offices have shrunk, placing more responsibilities on line managers; at the same time the number of supervisors has been reduced, requiring the remaining ones to evaluate more subordinates.[35] The potential for both system design and implementation problems, as a result, has increased.

Several changes in the composition of the workforce also imply a more challenging climate for appraisals. Thus, employees are becoming increasingly diverse, and evaluating people of all colors and cultures is surely more arduous than assessing a homogeneous staff. Also, the fastest growing part of the working population is contingent employees--temporaries, short-term contract workers, volunteers--who, by definition, present evaluation challenges.

Exhibit 3. Evaluating Organizations, Not Individuals

"Body swayed to the music. O brightening glance, how can we know the dancer from the dance?" --William Butler Yeats

Individual appraisal is a complex issue. Even when done with great care, it can be devastating to people and destructive to organizations. While it may be true that management practices are seldom discarded merely because they are dysfunctional, it is also true that the reinventing government movement provides an opportunity to re-examine orthodox approaches to appraisal.

The premise of organization-centered evaluation is that quality services are a function of the system in which they are produced. Systems consist of people, policies, technology, supplies, and a socio-political environment within which all operate. Note that these parameters are beyond appraisee control; indeed, the employees themselves are hired, tasked, and trained by the organization. A person-only assessment, stated differently, is deficient if the goal is to comprehend all factors affecting performance. In a well-designed management system, virtually all employees will perform properly; a weak system will frustrate even the finest people.

Traditional, person-centered appraisal methods are based on a faulty, unrealistic assumption: that individual employees are responsible for outcomes derived from a complex system. Since an organization is a group of people working to achieve a common goal, the managerial role is to foster that collaboration. If the result is inadequate, then it is management's responsibility--and no one else's.

From a systems perspective, the causes of good or bad performance are spread throughout the organization and its processes. Many results in the workplace are outside the power of employees traditionally made responsible for those outcomes. When over 90 percent of performance problems are the consequence of the management system,[A] holding low-level minions accountable is a way of evading responsibility; the cause of most performance problems lies not within the individual employee, but within the organization designed by its leaders.

Since employees have little authority over organizational systems, relevant appraisals should provide two kinds of feedback:

  • system performance data automatically generated from statistical process controls (i.e., evaluation is built into the work process itself; a minimal sketch follows this list), and
  • individual performance data--used primarily for developmental purposes--derived from anonymous multi-rater 360-degree evaluations (focusing on attributes such as teamwork, customer satisfaction, timeliness, communication skills, and attendance).
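
As a rough sketch of the first kind of feedback, the code below computes control limits from a baseline period and flags only observations that fall outside them, so attention goes to the process rather than to individuals; the metric, data, and three-sigma rule are illustrative assumptions rather than a prescription from the exhibit.

```python
from statistics import mean, stdev
from typing import List, Tuple

def control_limits(baseline: List[float]) -> Tuple[float, float, float]:
    """Center line and +/- 3-sigma limits estimated from a stable baseline period."""
    center = mean(baseline)
    sigma = stdev(baseline)
    return center - 3 * sigma, center, center + 3 * sigma

# Hypothetical baseline: weekly permit-processing times (days) for the whole work unit.
baseline_weeks = [12.1, 11.4, 13.0, 12.6, 11.9, 12.3, 12.8, 12.0]
lcl, center, ucl = control_limits(baseline_weeks)

# New weeks are judged against the process limits, not against individual employees.
for week, days in enumerate([12.4, 14.9, 11.8], start=1):
    verdict = "investigate the process" if not (lcl <= days <= ucl) else "common-cause variation"
    print(f"week {week}: {days:.1f} days -> {verdict}")
```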

The key is to listen to customers of the process and emphasize continuous improvement. By making the system as transparent as possible, the focus is on non-threatening analyses of work processes and people's contributions to those processes. Such an approach would be organizationally valid, socially acceptable, and administratively convenient--key criteria for any appraisal method. Importantly, it would change the process from an often adversarial one to a more constructive collaborative effort.

Reflecting American individualism,[B] the field of HRM has focused on people rather than systems. It is politically unlikely, therefore, that organizational appraisals will supplant individual ratings (indeed, when performance appraisals were abolished at one well-known federal government demonstration project in California, the project was terminated partly because productivity improved). Yet a number of public agencies (National Oceanic and Atmospheric Administration, Internal Revenue Service, Social Security Administration) and private companies (Motorola, Merrill Lynch, Procter and Gamble) have modified their approach to appraisals. To better reflect a systems perspective, they have incorporated teamwork (in addition to individual achievements), citizen/customer feedback (in addition to supervisory opinions), and process improvement (in addition to results) dimensions into their evaluations.

A more complete "reinvention" would be to clearly state a performance standard, and then assume that most employees will do the job for which they were hired. Greg Boudreaux, a manager at the National Rural Electric Cooperative, continues by saying that for the small number who do not do their jobs, "...investigate why. Some will need further training or management counseling. Some may be an actual problem. But deal with those problems on a case-by-case basis, and not through a generic, faulty performance appraisal system."[C]

Indeed, the approach described here is partly consistent with the most recent appraisal fad: performance management. This strategy emphasizes that managing performance (not merely doing an end-of-the-year evaluation) is key to organizational success. Thus, performance management is a continuing cycle of goal setting, coaching, development, and assessment. From a systems perspective, however, it exemplifies the "wrong-problem problem." In a triumph of hope over experience, it tries to solve the wrong problem precisely by focusing on the individual, not the organization.

[A] Deming, W.E., The New Economics (Cambridge, MA: MIT/CAES).

[B] This is an area where our myths may be more dangerous than our lies. The lone frontiersman and the outlaw gunslinger--largely products of Hollywood--were far less important in the American West than farmers raising barns together and shopkeepers settling in small towns. The myth also does not explain the wild popularity of team sports in contemporary life.

[C] "What TQM Says About Performance Appraisal," Compensation and Benefits Review, May/June 1994, 20-24.

Alternatively, should organizations begin to shift away from person-centered appraisal and toward organization- or process-centered appraisals, individual evaluations may be less complex in the years ahead--or perhaps abolished altogether (see Exhibit 3). Whether the appraisal function becomes more or less difficult in the 21st century, it is only worth doing if it is an integral part of the management system and if it helps both the organization and the individual develop to full potential.

Notes

[1] Longenecker, C.O. and S.J. Goff, "Why Performance Appraisals Still Fail," Compensation and Benefits Review, November-December 1990, 36-41, and Schellhardt, T.D., "Annual agony: it's time to evaluate your work and all involved are groaning," Wall Street Journal, November 19, 1996, A1, A5.

[2] Halachmi, A., "The practice of performance appraisal," in Rabin, J., et al. (eds.), Handbook of Public Personnel Administration (New York: Marcel Dekker, Inc., 1995), 321-355.

[3] Daley, D., Performance Appraisal in the Public Sector: Techniques and Applications (Westport, CT: Quorum/Greenwood, 1992).

[4] Nigro, L.G. and F.A. Nigro, The New Public Personnel Administration (Itasca, IL: F.E. Peacock Publishers, Inc., 1994), 113.

[5] Riley, D.D., Public Personnel Administration (New York: Harper Collins College Publishers, 1993), 115.

[6] Cox, R.W., J.J. Buck and B.N. Morgan, Public Administration in Theory and Practice (Englewood Cliffs, NJ: Prentice Hall, 1994), 70.

[7] It is neither feasible nor desirable, therefore, to discuss all of these instruments; to do so would be to encourage the notion that the problem of performance measurement is merely one of technique.

[8] Despite all of these problems, the technique has obvious intuitive appeal since traits may simply be a shorthand way of describing a person's behavior. This may explain why some psychologists contend that personality rating scales are not only reasonably valid and reliable, but also that they are more acceptable to evaluators (see Cascio, W.F., Applied Psychology in Human Resource Management, 5th ed. (Upper Saddle River, NJ: Prentice Hall, 1998)).

[9] Halachmi, op. cit., p. 326.

[10] Ibid., p. 330.

[11] Gomez-Mejia, L.R., D.B. Balkin, and R.L. Cardy, Managing Human Resources (Upper Saddle River, NJ: Prentice Hall, 1995).

[12] Wiersma, U. and G. Latham, "Practicality of behavioral observation scales, behavioral expectation scales, and trait scales," Personnel Psychology, Vol. 39, 1986, 619-628.

[13] Borman, W.C., "Job behavior, performance, and effectiveness," in M.D. Dunnette and L.M. Hough (eds.), Handbook of Industrial and Organizational Psychology, Vol. 2 (Palo Alto, CA: Consulting Psychologists Press, 1991), 271-326.

[14] Fondly known as "massive bowel obstruction," precisely because such a rational system could, in the view of critics, never work with human beings.

[15] Rogers, R. and J. Hunter, "Impact of Management by Objectives on Organizational Productivity," Journal of Applied Psychology, Vol. 76, 1991, 322-326.

[16] Wanguri, D.M., "A Review, An Integration, and a Critique of Cross-disciplinary Research on Performance Appraisals, Evaluations, and Feedback," Journal of Business Communications, Vol. 32, No. 3, 1995, 267-293, and Milkovich, G.T. and A.K. Wigdor, Pay for Performance: Evaluating Performance Appraisal and Merit Pay (Washington, DC: National Academy Press, 1991).

[17] Hauser, D. and C.H. Fay, "Managing and Assessing Employee Performance," in H. Risher and C.H. Fay (eds.), New Strategies for Public Pay (San Francisco, CA: Jossey-Bass), 185-206.

[18] Cardy, R.L. and G.H. Dobbins, Performance Appraisal: Alternative Perspectives (Cincinnati, OH: South-Western Publishing Co., 1994).

[19] Nigro and Nigro, op. cit., p. 135.

[20] Murphy, K.R. and J.N. Cleveland, Performance Appraisal: An Organizational Perspective (Boston: Allyn and Bacon, 1991).

[21] Daley, op. cit.

[22] National Performance Review, From Red Tape to Results: Creating a Government that Works Better and Costs Less (Washington, DC: U.S. Government Printing Office, 1993), 32.

[23] Grote, D., The Complete Guide to Performance Appraisal (New York: AMACOM, 1996).

[24] Early examples include (a) data entry personnel who, when evaluated by the number of keystrokes, pressed the space bar while making personal calls, and (b) telephone operators who, when expected to fulfill a quota in a given time period, would hang up on people with complex problems. The National Institute for Occupational Safety and Health estimates that two-thirds of all video display terminals are electronically monitored (Ambrose, M.L., G.S. Alder, and T.W. Noel, "Electronic Performance Monitoring: A Consideration of Rights," in Schminke, M. (ed.), Managerial Ethics: Moral Management of People and Processes (Mahwah, NJ: Lawrence Erlbaum Associates, Inc., 1998), 61-80).

[25] Several comprehensive studies have found that racial and sex discrimination, once common in evaluations, are no longer pervasive (Pulakos, E.D., et al., "Examination of Race and Sex Effects on Performance Ratings," Journal of Applied Psychology, Vol. 74, 1989, 770-780, and Waldman, D.A. and B.J. Avolio, "Race Effects in Performance Evaluations: Controlling for Ability, Education and Experience," Journal of Applied Psychology, Vol. 76, 1991, 897-911).

[26] Leniency (a.k.a. "grade inflation") in academe is "the refusal by faculty members to behave like adults, that is, like people with enough integrity to disappoint other people. It is as though some professors want to believe that everybody deserves to be first. Everybody doesn't" (Carter, S., Integrity (New York: Basic Books, 1996), 79).

[27] Longenecker, C.O. and D. Ludwig, "Ethical Dilemmas in Performance Appraisals Revisited," Journal of Business Ethics, Vol. 9, 1990, 961-969.

[28] The saying, "When you point your finger at me, remember that your other fingers are pointing back at you," is appropriate here.

[29] Nigro and Nigro, op. cit., pp. 114-116.

[30] Halachmi, op. cit., p. 325.

[31] McGregor, D., "An Uneasy Look at Performance Appraisal," Harvard Business Review, Vol. 35, 1957, 90.

[32] The pervasiveness of this problem accounts for the use of the term "personnel appraisal," not "performance appraisal," in this essay.

[33] In the private sector, those companies that emphasized frequent feedback outperformed those that did not in all financial and productivity measures (Campbell, R.B. and L.M. Garfinkel, "Strategies for Success in Measuring Performance," HR Magazine, June 1996, 98-104).

[34] Wexley, K., "Appraisal Interview," in Berk, R.A. (ed.), Performance Assessment (Baltimore: Johns Hopkins University Press, 1986), 167-185.

[35] U.S. Merit Systems Protection Board, Federal Supervisors and Strategic Human Resources Management (Washington, DC: The Board, 1998).

[*] Adapted from Evan Berman, James S. Bowman, Montgomery Van Wart, and Jon West, Human Resource Management: Processes, Problems, and Paradoxes (Thousand Oaks, CA: Sage, 2000). Copyright James S. Bowman.

James S. Bowman
Askew School of Public Administration and Politics
Florida State University
620 Bellamy Building
Tallahassee, FL 32306-2032

James S. Bowman is professor of public administration at Florida State University and editor of Public Integrity, a new quarterly journal sponsored by three leading professional associations. "Human Resource Management: Paradoxes, Processes, and Problems" (Sage, 2000) is his latest co-authored work. A past National Association of Schools of Public Affairs and Administration Fellow, as well as a Kellogg Foundation Fellow, Bowman serves on several editorial boards.
