Wednesday, October 3, 2012

Test Construction and Validity

Theory Basis:  Psychometrics

In the August 2012 blog post "Evaluating Training – What it all about?" we looked at a method for determining whether completed training was transferred to practical use on the job. This post focuses on the concept of validity as it applies to instructor-developed tests prepared by internal talent.

Validity Defined:

Something is “valid” (has validity) when it can actually support the intended point or claim; it is acceptable as cogent, as in "a valid conclusion".

Narrowing Down the Topic

As in other posts, I began by mind mapping the topic. The map grew at a geometric rate, and I came to the realization that some of the topics on the mind map could yield a map of their own. I narrowed down the information to support the content of this blog and came up with the following:

To provide a frame of reference for the topic of this post, you can follow the topics in blue text. Beginning with the realm of educational evaluations, we are going to focus on evaluating the individual, specifically the student, and on the evaluation that takes place at the conclusion of the instruction (summative evaluation). From the types of summative evaluations, we’ll concentrate on internal, instructor-developed, criterion-referenced tests and on how to ensure their quality by making them valid.


The term criterion is commonly misunderstood. Many, if not most, criterion-referenced tests involve a cut-score, where the examinee passes if their score exceeds the cut-score and fails if it does not (often called a mastery test). The criterion is not the cut-score; the criterion is the domain of subject matter that the test is designed to assess.

Making the Connection to ISD

For us, the subject matter is the set of tasks derived in the analysis phase of your chosen design approach. It is the behavioral repertoire (the kinds of things each examinee can do). This repertoire is found in the action verbs of the objectives developed from the tasks identified in the analysis process.

A cogent case for our test’s validity rests on two qualities: content validity and construct validity.

Content validity is the extent to which the items on the test are representative of the domain or universe that they are supposed to represent (the criterion). To impart content validity to your test, ask only questions related to the objectives.

Construct validity is the extent to which the test measures the traits, attributes, or mental processes it should measure. This comes from the construction of the test items. To be valid, a test item’s action verb must be congruent with (matching) the verb in the learning objective. Beck (1) refers to this as “item-objective congruence”; he goes on to say it is “the most important item characteristic.” Graphically it could look like:
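
A congruence check of this kind can also be sketched in code. The example below is only an illustration: the objective IDs, item records, field names, and the function `check_items` are all hypothetical, and it assumes each objective has been reduced to a single action verb. It flags items that are not tied to a listed objective (a content validity problem) and items whose verb differs from the objective's verb (an item-objective congruence problem).

```python
# Hypothetical data: each objective ID mapped to its action verb.
objectives = {
    "OBJ-1": "list",
    "OBJ-2": "calculate",
}

# Hypothetical test items: the objective each one targets and the
# action the item asks the examinee to perform.
items = [
    {"id": "Q1", "objective": "OBJ-1", "verb": "list"},
    {"id": "Q2", "objective": "OBJ-2", "verb": "describe"},
    {"id": "Q3", "objective": "OBJ-9", "verb": "list"},
]


def check_items(items, objectives):
    """Return (item_id, problem) pairs for items that fail either check."""
    problems = []
    for item in items:
        objective_verb = objectives.get(item["objective"])
        if objective_verb is None:
            # The item does not come from the criterion at all.
            problems.append((item["id"], "no matching objective"))
        elif item["verb"].lower() != objective_verb.lower():
            # The item targets an objective but tests a different behavior.
            problems.append((item["id"], "verb mismatch"))
    return problems


for item_id, problem in check_items(items, objectives):
    print(item_id, problem)
```

Here Q2 would be flagged because "describe" is not the behavior named in its objective, and Q3 because it points at an objective that is not in the list. A simple string comparison like this cannot judge whether two different verbs call for the same behavior; that still takes a reviewer.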

How good is good enough?

Is it necessary to test on all the behaviors in a criterion? In general I would say “Yes”. For those of us who follow HM-FP-01, Section 3.2, Examination Preparation, Administration, and Control, the answer is “It depends”. Part 4, Developing Examinations, provides guidance: if you're developing items for an item bank, each learning objective will have 3 exam items. For individual exams, 80% of the learning objectives should be covered. Based on the above, a note in Step 8 indicates “All objectives should be adequately covered in the exam items.” (I guess you get to define "adequately" :) )

Setting the cut-score is mostly a case of judgment or negotiation. Your best bet is to follow your established standard. Some standards can be found in the training program descriptions (TPDs), others in company procedures and guides. If you would like to investigate this topic further, I suggest reading "Methods for Setting Cut Scores in Criterion-Referenced Achievement Tests: A comparative analysis of six recent methods with an application to tests of reading in EFL" by Felianka Kaftandjieva.
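
The two coverage figures above lend themselves to a quick check. The sketch below is not taken from HM-FP-01; it assumes the "3 exam items" and "80%" figures are minimums, and the objective IDs, item records, and function names are hypothetical.

```python
from collections import Counter

# Hypothetical objective IDs for one lesson.
OBJECTIVES = ["OBJ-1", "OBJ-2", "OBJ-3", "OBJ-4", "OBJ-5"]


def bank_shortfalls(objective_ids, bank_items, items_per_objective=3):
    """Map each under-supplied objective to the number of bank items it has."""
    counts = Counter(item["objective"] for item in bank_items)
    return {
        obj: counts[obj]
        for obj in objective_ids
        if counts[obj] < items_per_objective
    }


def meets_exam_coverage(objective_ids, exam_items, required_percent=80):
    """True when the exam touches at least required_percent of the objectives."""
    covered = {item["objective"] for item in exam_items} & set(objective_ids)
    # Integer arithmetic avoids floating-point surprises at the boundary.
    return 100 * len(covered) >= required_percent * len(objective_ids)


# Hypothetical item bank: three items for OBJ-1, two for OBJ-2, none for the rest.
bank = [{"objective": "OBJ-1"}] * 3 + [{"objective": "OBJ-2"}] * 2
# Hypothetical exam: one item for each of four of the five objectives.
exam = [{"objective": obj} for obj in ["OBJ-1", "OBJ-2", "OBJ-3", "OBJ-4"]]

print(bank_shortfalls(OBJECTIVES, bank))
print(meets_exam_coverage(OBJECTIVES, exam))
```

With this sample data the bank comes up short on every objective except OBJ-1, while the exam covers four of five objectives and so just meets the 80% figure. Neither check says anything about whether the coverage is "adequate"; that judgment stays with the exam developer.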


Next time you prepare a test, take the time to evaluate your questions for validity by ensuring that the questions come from the criterion and that there is congruence between each objective and its test item.




(1) Beck, R. A. (1982). Criterion-Referenced Measurement: The State of the Art. Baltimore, Maryland: Johns Hopkins University Press.

