
How to Measure the Relationship Between Training and Job Performance

Software companies could better predict the effect of training on their technical professionals' on-the-job performance and design training courses for future employees.

A strategic business challenge facing all software companies is how to train their employees to keep pace with the software industry’s ever-changing knowledge and development requirements. Organizations increasingly recognize that formal training is critical not only to the success of their software professionals but to their competitive position in the marketplace. One result is increasing pressure on training departments to deliver high-quality training and education. At least one study of software professionals found that training had become a top priority for their companies [6].

While the training of technical employees is not a new challenge, measuring that training for effectiveness and efficiency remains a daunting task. Today, the training function must focus on sustainable competitive advantage by strategically aligning itself with overall corporate business goals.

These goals often cut across departments. Senior managers involved in training question whether the training function derives its inputs from suppliers (recruitment) or from consumers (production) in improving its end deliverables. If they cannot discern a difference in participants’ performance within three to six months of the conclusion of training, the reasons must be explored and the issue escalated up through the ranks of corporate management. Failing to link training and development to some kind of improvement, training managers risk their own futures in their companies, as well as their companies’ business performance in the marketplace [3]. “Companies toss away millions of dollars and don’t get any real return on investment. Without linking your training and development to some kind of improvement within the department and company, you’re just spinning your wheels” [3].

In order to align training with business needs, managers in these companies must understand the relationships among three constituent functions: recruitment (supply side); training (enabling side); and production (demand side). Measuring these relationships serves several objectives: identifying the major parameters influencing the training and education process; optimizing the training and education function; and helping establish improvement initiatives affecting it. Here, we present a model developed at Infosys Technologies Ltd., a major software industry service provider based in Bangalore, India, whose deployment has increased the consistency and predictability of training effectiveness. Infosys employs approximately 10,000 professionals at its software development facilities in India.

In 2001, we conducted a study involving 1,596 Infosys trainees in the company’s Education and Research Department (E&R). At that time, E&R ran a rigorous 14-week foundation-level technical education program for all new hires involving a generic set of courses, a stream-specific set of courses, and a project simulation. We retrieved qualitative and quantitative data on the trainees from various sources within Infosys, including the human resources department and practice units, as well as E&R.


Comprehensive Model

The hallmark of an effective training program is the ability to produce graduates who later excel on the job. Figure 1 outlines the chronological sequence of events of interest we considered in the Infosys study. We devised the Training Effectiveness Relationship Measurement (TERM) model in Figure 2 for the study. The principal linkage it examines is between training performance and on-the-job performance. However, training performance is influenced by recruiting-related variables, along with other contextual variables concerning how the training is provided. The difficulty of directly measuring training results reminds us of Scottish mathematician and electrical engineer Lord Kelvin, who said: “When you can measure what you are speaking about, and express it in numbers, you know something about it, but when you cannot measure it, your knowledge is of a meager and unsatisfactory kind.” Any study is thus only as good as the metrics employed by the people conducting it. We therefore sought to identify the variables that affect training performance, as well as the variables that might be considered consequences of effective training.

On-the-job performance. We sought to identify the drivers of job performance within Infosys. We assessed job performance in two ways. First, we looked at the appraisals of on-the-job performance reported by supervisors or project managers for employees who underwent the foundation-level program conducted by E&R, labeling this measure the “appraisal score.” Second, we examined the compensation band, another indicator of employee performance, labeling it the “appraisal grade.”

Grade point average. To evaluate training performance at the end of the training curriculum, we recommend examining two types of grade point average (GPA) calculations. At Infosys, we computed a generic GPA (GGPA) to assess participants’ performance on certain generic courses (such as programming fundamentals, database management, and systems analysis and design) and a stream-specific GPA (SGPA) for stream-specific courses on such topics as Internet, mainframe, and open systems. Our standard practice was to look at only the cumulative GPA (CGPA) as a snapshot of overall performance. We based our decision to examine generic and stream-specific performance on a number of studies (such as [4]) in related fields, distinguishing between general computer self-efficacy and task-specific efficacy. General computer self-efficacy is defined in the literature as “an individual’s judgment of efficacy across multiple computer application domains” [4], whereas task-specific computer self-efficacy is defined as “perceptions of ability to perform specific computer-related tasks in the domain of general computing” [4].
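
To make these GPA measures concrete, the following minimal sketch (in Python) shows one way the generic, stream-specific, and cumulative GPAs might be computed from per-course grades. The course names, credit weights, and 4.0 scale are illustrative assumptions, not Infosys's actual grading scheme.

    # Minimal sketch: computing generic, stream-specific, and cumulative GPAs.
    # Course lists, credits, and the 4.0 scale are illustrative assumptions.

    def gpa(courses):
        """Credit-weighted GPA for a list of (grade_points, credits) pairs."""
        total_credits = sum(credits for _, credits in courses)
        return sum(points * credits for points, credits in courses) / total_credits

    generic = [(3.7, 3), (3.3, 3), (3.9, 2)]   # e.g., programming, databases, SA&D
    stream  = [(3.5, 4), (3.8, 4)]             # e.g., Internet- or mainframe-stream courses

    ggpa = gpa(generic)            # generic GPA (GGPA)
    sgpa = gpa(stream)             # stream-specific GPA (SGPA)
    cgpa = gpa(generic + stream)   # cumulative GPA (CGPA) over all courses

    print(f"GGPA={ggpa:.2f}  SGPA={sgpa:.2f}  CGPA={cgpa:.2f}")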


Participant-specific variables. The Infosys study turned up factors specific to individual participants, including age, relevant work experience, and formal education.

Instructor quality. The training literature generally acknowledges that one of the most important variables influencing training effectiveness is the instructor’s ability to impart information to students, or the quality of instruction. Based on participants’ evaluations of instructors, we computed a score for each of the study’s 1,596 participants reflecting the quality of instruction each of them received during the training period.


Results and Implications

The study’s results were expressed as correlations, simple metrics that examine the relationship between two variables. A value close to zero indicates the absence of any relationship, positive values signal a positive relationship, and negative values an inverse one.
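
As a point of reference, such a correlation can be computed directly from paired observations. The following minimal sketch (in Python, with made-up values) computes a Pearson correlation coefficient; the article does not specify which correlation measure was used, so treat this as one common choice rather than the study's actual method.

    # Minimal sketch: Pearson correlation between two paired variables.
    # The values are made up; the study's actual data and choice of
    # correlation coefficient are not reproduced here.
    import math

    def pearson(x, y):
        n = len(x)
        mean_x, mean_y = sum(x) / n, sum(y) / n
        cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
        std_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
        std_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
        return cov / (std_x * std_y)

    stream_gpa      = [3.1, 3.6, 2.8, 3.9, 3.4]   # hypothetical SGPAs
    appraisal_score = [72,  81,  65,  90,  78]    # hypothetical on-the-job scores

    print(f"r = {pearson(stream_gpa, appraisal_score):.2f}")   # near +1: strong positive link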

Table 1 reports the correlations between the two recruiting variables and the various GPAs earned during training. We assessed performance in recruitment primarily through technical and analytical dimensions. We assessed performance in training using CGPA, GGPA, and SGPA scores. The results suggest that technical skills (assessed during recruitment) are positively related to training performance, more so with stream-specific performance than generic performance. A surprising finding was that analytical performance in the recruitment stage is not related to training performance.

A principal study objective was assessing the relationship between the training performance of employees in Infosys E&R and their eventual performance on the job. We assessed performance in training using CGPA, GGPA, and SGPA and on-the-job performance using appraisal grade and appraisal score. Table 1 also lists the correlations between training and on-the-job performance variables.

The table clearly indicates that on-the-job performance, assessed through both metrics (appraisal grade and appraisal score), is positively associated with training performance. The link between stream-specific performance and on-the-job performance is stronger than the link between generic performance and on-the-job performance. This stronger link might arise because projects typically demand skill-specific capabilities immediately, whereas generic capabilities matter over the longer term.

One factor clearly influencing training effectiveness is the number of trainees in a particular class or section. Traditionally, Infosys E&R considered only batch size, or the total number of new trainees. The training literature suggests the size of a class or section is the factor most likely to matter in measuring effectiveness; even when batch enrollment was high, E&R had enough instructors to split a batch into multiple small sections. Individual participants could thus get the kind of attention and treatment that would otherwise be available only in a small batch. We therefore identified section size as the appropriate variable.

We examined all E&R batches for 2000, looking at the relationship between stream-specific GPA and section size. We observed that, for the same education-delivery mechanism, the stream-specific GPA fluctuated within a reasonable band until section size reached 65–75 trainees. Beyond that point, the stream GPA showed a sharp decline, suggesting that, in the context of similar programs, class sizes in excess of 65–75 trainees may not yield optimal on-the-job performance.
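
One way to reproduce this kind of analysis is to group sections into size bins and compare the average stream GPA across bins, watching for the point at which the average drops. The sketch below uses hypothetical (section_size, SGPA) records, not Infosys's data.

    # Minimal sketch: mean stream GPA by section-size bin.
    # The (section_size, SGPA) records below are hypothetical.
    from collections import defaultdict

    records = [(42, 3.5), (55, 3.4), (63, 3.5), (70, 3.3), (78, 3.0), (86, 2.8)]

    bins = defaultdict(list)
    for size, sgpa in records:
        lower = (size // 10) * 10          # 10-trainee bins: 40-49, 50-59, ...
        bins[lower].append(sgpa)

    for lower in sorted(bins):
        mean_sgpa = sum(bins[lower]) / len(bins[lower])
        print(f"sections of {lower}-{lower + 9} trainees: mean SGPA {mean_sgpa:.2f}")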

We computed instructor ability on a pilot basis to see if it had any bearing on training effectiveness. Table 2 lists the results from two of the largest batches, January 2000 (193 trainees) and June 2000 (385 trainees). Instructor ability has a positive correlation with performance in stream-specific courses; that is, the instructor’s ability is related to the GPAs participants earned in stream-specific courses. The surprising result, that there is no relationship between instructor ability and generic GPA, can be seen in the near-zero correlation between these variables.

This result should not suggest that instructor ability does not matter in generic courses. Instructor ability matters more for stream courses, a conclusion supported by the results of our pilot batches. It might be that generic courses need instructors with good class-delivery skills, whereas the stream courses need instructors capable of providing individual attention through office hours, lab sessions, and other options.

Our correlation analysis found moderately positive effects for both age and experience; the effect of experience was marginally greater than that of age. In other words, trainees who had relevant work experience or were older performed slightly better on the job than other trainees. Table 3 lists the correlations we obtained for these variables.

A notable finding is the strong correlation (0.47 out of 1.0) between generic GPA and stream GPA, the greatest of all the correlations we observed in the study. Implied is a strong relationship between performance in generic courses and performance in stream courses; that is, generic courses serve as a foundation for good performance in stream courses. It is not that generic courses are less important than stream courses, just that they reflect a different emphasis. That emphasis is learnability, or the trainees’ ability to learn how to use new technologies over a given amount of time. The training literature suggests that a key objective of training programs is to help participants develop mental models into which they can place learned concepts [2]. Therefore, by definition, their effects are observed over a longer period of time than the one we considered in the Infosys study.

The results we’ve discussed here support the TERM model described earlier. We found that the relationships posited by the TERM model were present in the Infosys training program.


Composite Index

We recognize that training departments are likely to have difficulty finding their way through all the metrics we have suggested. We thus propose a single comprehensive metric of training effectiveness: the Composite Index of Training Effectiveness (CITE). Why do high-tech training managers need such a composite index? A single measure of training effectiveness would help them quickly assess a particular training program. A single number also makes it relatively easy to track and monitor performance over batches, streams, time, and other dimensions and to report it to various stakeholders, as well as to higher management. But training managers must also realize that this comprehensive index reflects only the scope of the factors considered relevant to training effectiveness. Moreover, the index makes it possible to compute the effect of an increase or decrease in any of the input parameters on the composite index.

CITE is a function of four main training factors: participant performance in E&R; participant performance on the job; instructor evaluations; and course evaluations. We based the weights reflecting the importance of each of them on insights we derived from analyses and expert opinion. The analyses described earlier helped us determine the relative importance of some of the criteria, especially the importance of stream GPA in relation to generic GPA established early in the Infosys study. Any weighting scheme we chose thus had to reflect this relative importance. The analyses also considered the importance of appraisal scores, as well as appraisal grades.

We also weighted instructor ability. The expert opinion of instructors teaching courses in E&R provided valuable guidance. Figure 3 shows the weighting scheme we devised based on a consensus of these analyses and expert opinion. These weights might be different for other organizations.

CITE is further subdivided into two measures: one internal to E&R, or InCITE, and one external to E&R, or ExCITE. A further reason for this subdivision is that one measure can be computed for each batch as soon as it is completed, while the other requires input from the human resources department and can be computed only annually. One measure thus provides immediate feedback on effectiveness, while the other, though more comprehensive, becomes available only later.

We defined a range of 0 to 150 for the CITE measure in Infosys E&R. To demonstrate this new metric, we computed CITE for the January 2000 and June 2000 pilot batches. From January to June 2000, the InCITE score went from 113.77 to 119.59, while the ExCITE score went from 113.72 to 117.54. A useful benchmark for comparing the CITE number for any given period is a threshold value reflecting the level below which performance is considered sub-par. We calculated this value by taking a GPA of 3.0 as the lowest acceptable value and 4.0 as the lowest acceptable value for instructor and course evaluations. Using these values, we computed a threshold value for InCITE of 100.
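
As an illustration of how such a composite might be assembled, the sketch below combines the four factors as a weighted sum of normalized component scores and compares the result against a threshold. The weights and scores are hypothetical (chosen only so the maximum is 150, matching the range above); Infosys's actual weights are those shown in Figure 3 and are not reproduced here.

    # Minimal sketch: a weighted composite index of training effectiveness.
    # The weights and scores are hypothetical; Infosys's actual weights
    # appear in Figure 3, and its exact formula is not reproduced here.

    WEIGHTS = {
        "training_gpa":    40,   # participant performance in E&R (generic + stream GPA)
        "job_appraisal":   50,   # participant performance on the job
        "instructor_eval": 30,   # instructor evaluations
        "course_eval":     30,   # course evaluations
    }                            # weights sum to 150, the top of the CITE range

    def composite_index(scores, weights=WEIGHTS):
        """Weighted sum of component scores, each normalized to [0, 1]."""
        return sum(weights[k] * scores[k] for k in weights)

    batch = {
        "training_gpa":    3.4 / 4.0,   # normalized GPA
        "job_appraisal":   78 / 100,    # normalized appraisal score
        "instructor_eval": 4.3 / 5.0,   # normalized evaluation ratings
        "course_eval":     4.1 / 5.0,
    }

    THRESHOLD = 100                     # below this, performance is considered sub-par
    score = composite_index(batch)
    print(f"composite index = {score:.1f} ({'above' if score >= THRESHOLD else 'below'} threshold)")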


Lessons Learned

The study gave us clear direction on which factors proved important for measuring relationships in training effectiveness. Meanwhile, the TERM model and associated indices produced three main insights:

  • Levers of training and on-the-job performance. The model identifies the key levers, or drivers, of both training performance and on-the-job performance;
  • Measurable reference point. This overall index provides a way to understand the training effectiveness of an entire batch of trainees; and
  • Integrated measurement. This index captures the overall training effectiveness by taking into account both internal and external factors.

How did our study’s results help Infosys management? First, by adopting our model, the company was able to assess the overall quality of its training program. Next, based on the TERM model, specific project groups in the company were better able to structure their project-specific and product-specific training programs to yield the maximum return. Moreover, the correlation results helped company management prioritize improvement initiatives in the training function based on the effect they could be expected to have on training effectiveness.

One such initiative later implemented was called “bring practice into the classroom.” Since stream-specific training and on-the-job performance are strongly correlated, this initiative focused on enriching the stream-specific training program by incorporating typical real-life problems and issues, code samples demonstrating key concepts, and situations project team members would be likely to face while executing a project.

Our results also helped Infosys management prioritize the courseware revision of the foundation program courses. Moreover, they helped establish stronger inter-functional links. One such link, now employed by both E&R and production, is a comprehensive exam given during the training program. The questions used in this exam were derived from practitioner sources, with a focus on resolving problems routinely faced in the field, and were presented to participants who, after completing the training program, join the various related projects. A practitioner-based question bank allows participants to experience project environments at an early stage of the training program, rather than seeing them for the first time on the job. Meanwhile, we have found the TERM model is generic enough to be implemented at other software companies to help their managers evaluate their own training effectiveness.


Figures

F1 Figure 1. Selection-training-performance pathway.

F2 Figure 2. TERM model.

F3 Figure 3. Weights for components of training effectiveness.


Tables

T1 Table 1. Correlation of recruiting, job performance, and training.

T2 Table 2. Correlation of instruction quality and training performance.

T3 Table 3. Correlation of job performance and age and experience.

References

    1. Agarwal, R., Sambamurthy, V., and Stair, R. The evolving relationship between general and specific computer self-efficacy: An empirical assessment. Info. Syst. Res. 11, 4 (Dec. 2000), 418–430.

    2. Davis, S. and Bostrom, R. Training end users: An experimental investigation of the roles of the computer interface and training methods. MIS Quarterly 17, 1 (Mar. 1993), 61–85.

    3. Greengard, S. Web-based training yields maximum returns. Workforce 78, 2 (Feb. 1999), 95–97.

    4. Marakas, G., Yi, M., and Johnson, R. The multilevel and multifaceted character of computer self-efficacy: Toward clarification of the relationship between general and specific computer self-efficacy. Info. Syst. Res. 9, 2 (June 1998), 126–163.

    5. Piccoli, G., Ahmad, R., and Ives, B. Web-based virtual learning environments: A research framework and a preliminary assessment of effectiveness in basic IT skills training. MIS Quarterly 25, 4 (Dec. 2001), 401–426.

    6. Shah, J. Software training is the next big hurdle: SC managers wonder how much expensive schooling is necessary. Electronic Buyers News (Mar. 26, 2001), 3.
