The Uniform Guidelines on Employee Selection Procedures actually outline which types of test validation strategies are acceptable. They are:
Criterion-related Validity – The degree to which job performance can be linked to performance on a selection instrument (in other words, how well the selection tool predicts acceptable performance). Since this is proven empirically, stats will be necessary.
Content Validity – The degree to which the specific content/items on a selection tool can be linked to specific aspects of the job.
Construct Validity – Demonstrating that a selection procedure measures the amounts of identifiable qualities candidates have that have already been judged to be critical to acceptable performance.
Transported Validity and/or Validity Generalization – The degree to which an assessment used to screen a similar employee at a different organization can be found to correlate to your own position. Though not currently recognized by the Guidelines, the approach has been defended successfully in court. This will rely upon mathematical/statistical data analysis.
If you run into any organization using construct validity in the real world, please contact Renegade Psychology because TheAppliedPsychologist would love to bear witness to it. Since not everyone agrees on what construct validity is, such an event would be momentous. Most organizations rely upon the first two types of validity.
Criterion-related validity is generally viewed as the strongest validity evidence by the courts. To validate a selection instrument using this approach, organizations must show that high performers on their assessment tool tend to be high performers on the job. One way to accomplish this is to administer the assessment to future employees and track their performance ratings over time (predictive validity). Another is to administer the test to current employees; in theory, the highest performers on the job should have the highest scores on the selection tool (concurrent validity). It doesn’t even matter what the assessment is supposed to measure: as long as it predicts performance, it is defensible.
The procedure-performance relationship is found using statistical computations leading to a number called the correlation coefficient. This number, which ranges from -1 to 1, describes the strength and direction of the relationship between an item or procedure and performance. A correlation of 0 means there is no relationship between the item and employee performance in your sample. A coefficient of 1 means there is a perfect positive relationship between the item and job performance. A negative number means there is a negative correlation: in other words, the lower performers do better on the item/procedure than the high performers!
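For readers who want to see the arithmetic, here is a minimal sketch of how a correlation coefficient is computed. The function name `pearson_r` and the eight employees' scores and ratings are made up for illustration; a real study would use actual test scores and performance ratings, and far more people.

```python
from math import sqrt

def pearson_r(test_scores, performance_ratings):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(test_scores)
    if n != len(performance_ratings) or n < 2:
        raise ValueError("need two equal-length samples with at least 2 people")
    mean_x = sum(test_scores) / n
    mean_y = sum(performance_ratings) / n
    # Deviations of each person from the group average.
    dx = [x - mean_x for x in test_scores]
    dy = [y - mean_y for y in performance_ratings]
    cov = sum(a * b for a, b in zip(dx, dy))
    var_x = sum(a * a for a in dx)
    var_y = sum(b * b for b in dy)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when one variable never varies")
    return cov / sqrt(var_x * var_y)

# Hypothetical data: one test score and one supervisor rating per employee.
scores = [52, 61, 47, 70, 66, 58, 75, 49]
ratings = [3.1, 3.4, 2.9, 3.3, 4.0, 3.0, 3.8, 3.2]
print(f"r = {pearson_r(scores, ratings):.2f}")
```

A result near 0 says the test tells you little about performance in this sample, and a negative result says low scorers are outperforming high scorers.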
The numbers required to prove a relationship between test scores and job performance are much lower than most industry outsiders would expect: the courts have often accepted positive correlation coefficients of .2 or .3 as high enough to conclude a test is valid – so long as the finding is statistically significant (unlikely to be due to chance). These numbers indicate that people who perform well on an item are slightly more likely to perform better on the job.
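To see why "statistically significant" matters as much as the size of the coefficient, the sketch below estimates how likely a given correlation is to be a fluke for a given sample size. It uses the Fisher z transformation, which is a large-sample approximation chosen here for simplicity and not necessarily the test a court or consultant would use. The function name `correlation_p_value` is hypothetical.

```python
from math import atanh, sqrt
from statistics import NormalDist

def correlation_p_value(r, n):
    """Approximate two-sided p-value for a correlation r observed in n people.

    Uses the Fisher z transformation, a large-sample approximation.
    """
    if n < 4 or not -1 < r < 1:
        raise ValueError("need n >= 4 and -1 < r < 1")
    z = atanh(r) * sqrt(n - 3)
    return 2 * (1 - NormalDist().cdf(abs(z)))

# The same correlation of .2 at three hypothetical sample sizes.
for n in (30, 100, 200):
    print(n, round(correlation_p_value(0.2, n), 3))
```

The same coefficient looks far more convincing with a couple hundred people than with a few dozen, which is why sample size comes up next.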
Does that sound like a tiny improvement to you? It shouldn’t. Predicting and evaluating job performance is an inexact science at best. There are too many variables that can affect performance, not to mention performance ratings. It is impossible to separate these extraneous effects from actual human beings to get better correlations. [This is why Renegade Psychology always cautions practitioners to make sure clients understand that consultants are not miracle workers: a test will improve your odds that an employee will work out. However, it is also important to remind clients that over a period of years, even a slight improvement in each selected applicant can result in huge savings, a better work climate, and larger profits over time.] Though this might seem obvious, people are often shocked when they see how “low” the correlations for their test items are. Really? In a world where half of the medical patients who receive back surgery do not improve? You really expected an employment selection procedure to be more likely to predict future success than a medical procedure?
It takes roughly 64 subjects to have an 80% chance of detecting a real relationship in a criterion validity study, though the exact figure depends on how strong a correlation you expect. The more subjects you can submit to the procedure, the higher your confidence in whatever correlation you find. To be safe, you should strive for at least 200 candidates. Performance and selection outcomes must also be reliable (consistent): at least .6 (60%) for the performance ratings, and at least .7 (70%) for the selection items/tests/procedure. There are many other mathematical details behind a criterion-related validity study, but this overview should help non-practitioners recognize what they should look for in a criterion-related validity study.
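The sample-size guidance above can be sanity-checked with a rough power calculation. The sketch below uses the Fisher z approximation. The function name `required_sample_size` and its defaults (80% power, a 5% significance level, a one-tailed test) are assumptions made for illustration, not figures taken from the Guidelines.

```python
from math import atanh, ceil
from statistics import NormalDist

def required_sample_size(expected_r, power=0.80, alpha=0.05, one_tailed=True):
    """Approximate number of people needed to detect a correlation of expected_r.

    Fisher z approximation: n = ((z_alpha + z_power) / atanh(r))**2 + 3.
    """
    if not 0 < abs(expected_r) < 1:
        raise ValueError("expected_r must be non-zero and strictly between -1 and 1")
    norm = NormalDist()
    tail = alpha if one_tailed else alpha / 2
    z_alpha = norm.inv_cdf(1 - tail)   # cutoff for statistical significance
    z_power = norm.inv_cdf(power)      # cushion needed to reach the desired power
    return ceil(((z_alpha + z_power) / atanh(abs(expected_r))) ** 2 + 3)

# Hypothetical expected correlations in the range the courts have accepted.
for r in (0.2, 0.3):
    print(r, required_sample_size(r))
```

Under these assumptions, an expected correlation of about .3 lands in the same general neighborhood as the figure quoted above, while a weaker expected correlation of .2 calls for noticeably more people. That is one reason to aim well past the minimum.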
The big glaring hurdle to a successful criterion-related validity study in the real world is the inexact science of performance evaluation. Does a company even have performance reviews on file? If so, did they accurately gauge the performance of employees? Were they taken seriously? At most companies, the answer to these questions is no. Many managers don’t want to give harsh reviews, or jeopardize bonuses near Christmas, or bother with the paperwork. Even more often, performance evaluations are generic rather than job-specific, rendering them almost useless. There are many strategies professionals use to mitigate these conditions. The most common is to develop a new performance review system for the client, train the supervisors, and have them re-rate all their employees using the newly developed criteria. As you can probably surmise, this can be incredibly time-consuming – especially in environments where time spent doing anything but work is either money lost, a potential safety liability, or both. If these elements of a job prevent a criterion-related approach, you should probably settle for a different one.
Content validity is not quite as powerful as criterion-related validity, but is a common and legally-defensible approach for the many situations in which criterion-related studies are not feasible or cost-effective. The biggest limitation is that content validity can only (legally) be used to create job-specific procedures. In addition, content validation of a procedure absolutely must be preceded by a full-blown job analysis (JA). Once the JA is complete, your trained subject-matter experts (SMEs) will link the items and questions on your selection procedure directly to observed duties and skills used on the job. They will also be asked to rate each one on importance, frequency of use, and the level required to perform the job competently. Plenty of sample job rating scales and surveys may be found in books and on the interwebs. If you had only 7-10 experts participate in the initial job analysis, you should get some more to complete your job analysis survey for test validation purposes.
For testing purposes, it is important to measure only the KSAs that are important and critical to the job, with an emphasis on the critical ones. It is also crucial to determine the level of each skill needed by selected candidates on Day 1, rather than the level of each skill they should eventually reach.
Again, remember to only use content validity to link specific test items to SPECIFIC job duties. If this cannot be done, use criterion-related validity instead. Better yet, use the other assessment tools at your disposal that are staring you in the face. Content-validated tests must either closely resemble the job or must closely resemble output or products generated from the job.
Transported validity is a newer technique, but less powerful than the first two. Basically, it takes existing tests used to select employees for jobs similar to the positions your selection process intends to fill, combines criterion-related validity studies that were done on the existing tests, and mathematically generalizes those results to your organization’s relevant jobs. Sounds easy, right?
Trouble is, the Uniform Guidelines indicate that validity evidence must be ‘local and specific’ if there is local and specific adverse impact caused by your selection procedures. So even though transported validity can be useful, and can bolster any evidence your organization can provide that is specific and local, it doesn’t provide a strong legal defense on its own if adverse impact is found. It is usually best to ‘transport’ your results from your own prior studies to new locations, instead of relying on outside studies.
Transporting validity can only be done if 1) job analyses reveal the important work KSAs and duties to be the same across locations/organizations; 2) a fairness study was conducted in the original location (or will be conducted in the new location); 3) the context of each job is taken into account; and of course 4) the test is valid to begin with.
[TheAppliedPsychologist once worked with a company that used standardized verbal and math tests to select customer service employees. The validity data was supplied by a vendor, who obtained their data from clerical employees. No local validation study was performed, no local fairness study was performed, and it was clear that the compared jobs were very different. There was no evidence that the empirical portion of the transported validity study – the part where one is supposed to combine statistical analyses from different validity studies across locations and correct for artifacts – had ever been carried out. Basically, they had grabbed a couple of tests off the shelf, looked at their own KSAs, said, “Looks good!” and used the tests for years. I shouldn’t have to tell you that clerical positions aren’t the same as customer service jobs. Don’t do that.]
Renegade Psychology recommends transportability studies as a strategy of last resort. They are best only when transporting the results from an original study that took place within your organization for the same job or a similar job. If the validity data from the original job was gathered in a radically different environment or in a different organization, transported validity may only be slightly better than nothing.
SUMMARY
In order to ensure your fancy selection tool lowers your bad hire rate, and to avoid lawsuits, it’s a good idea to conduct a proper validity study every few years. The above guidelines should help you to identify the features of a legit validation study.