Construct validity refers to the degree to which inferences can legitimately be made from the operationalizations in your study to the theoretical constructs on which those operationalizations were based. Like external validity, construct validity is related to generalizing. But where external validity involves generalizing from your study context to other people, places, or times, construct validity involves generalizing from your program or measures to the concept of your program or measures. You might think of construct validity as a "labeling" issue (Ramsey, 2012). When you implement a program that you call a "Head Start" program, is your label an accurate one? When you measure what you term "self-esteem," is that what you are really measuring?
Discussion
I would like to tell two major stories here. The first is the more straightforward one. I'll discuss several ways of thinking about the idea of construct validity, several metaphors that might give you a feel for the richness of this idea. Then, I'll discuss the major construct validity threats, the kinds of arguments your critics are likely to raise when you claim that your program or measure is valid. In most research methods texts, construct validity is presented in the section on measurement (Valli & Buese, 2007). And it is typically presented as one of many different types of validity (e.g., face validity, predictive validity, and concurrent validity) that you might want to be sure your measures have. I don't see it that way at all. I see construct validity as the overarching quality, with all of the other measurement validity labels falling beneath it. Nor do I see construct validity as limited to measurement. As I've already implied, I think it is as much a part of the independent variable (the program or treatment) as it is of the dependent variable. So, I'll try to make some sense of the various measurement validity types and to move you toward thinking of the validity of any operationalization as falling within the general category of construct validity, with a variety of subcategories and subtypes (Ramsey, 2012).
The second story I want to tell is more historical in nature. During World War II, the U.S. government involved hundreds (and perhaps thousands) of psychologists and psychology graduate students in the development of a wide array of measures relevant to the war effort. The military needed personality screening tests for prospective fighter pilots, personnel measures that would enable sensible assignment of people to jobs, psychophysical measures to test reaction times, and so on. After the war, these psychologists needed to find gainful employment outside of the military context, and it's not surprising that many of them moved into testing and measurement in a civilian context (Pittler & White, 1999).
The nomological network provided a theoretical basis for the idea of construct validity, but it didn't provide practicing researchers with a way to actually establish whether their measures had construct ...