Psychological evaluations occur in any setting where it is necessary to objectively evaluate human behavior. However, evaluations are often categorized into the domains described here.
Diagnostic Evaluations focus on assessing the normal and abnormal nature of a person's overall functioning. The specific nature of these evaluations varies depending on the issues to be addressed, but they involve at least the following components: a clinical interview, a mental status examination, and one or more psychological tests. These evaluations can be brief, with the psychological testing limited to one or two short questionnaires that focus on specific concerns, or they can be long, comprehensive evaluations consisting of lengthy objective and projective measures of personality. Sometimes measures found in other evaluations may be included for more breadth, such as intellectual assessment instruments or brief neuropsychological screening measures.
Forensic Evaluations are often very limited in scope, with the specific nature of the evaluation varying with the clinical and legal issues at hand. As with diagnostic evaluations, these include a clinical interview, but may or may not involve a mental status examination or psychological testing. When psychological testing is included, psychologists often take measures to evaluate the accuracy of the testing in order to account for possible deception. Because of their limited scope, these evaluations are often short. However, custody evaluations often prove to be the exception. Family courts and/or private attorneys often require detailed information regarding almost everyone involved in a custody dispute in order to come to equitable arrangements. These evaluations may involve testing step-parents and third parties who may be only minimally involved (e.g., a new companion of a parent, or grandparents who have involvement with the children). Thus, these evaluations can be lengthy and very detailed, involving interviews, testing, and periods of family observation.
Neuropsychological Evaluations are often the most focused, but lengthiest, of evaluations. Designed to assess the neuropsychological functioning of an individual, these evaluations consist of batteries of tests that can require up to 8 hours to complete. Comprehensive in depth and breadth, they focus on the process of neurological functioning and should not be confused with neurological tests, often performed with MRI and/or CAT scans, which focus on structure. Neuropsychological evaluations are performed by trained neuropsychologists and require extensive training for appropriate administration and interpretation. The two most commonly used batteries are the Halstead-Reitan Neuropsychological Battery and the Luria-Nebraska Neuropsychological Battery. There are cases where a complete battery is not necessary and brief screenings suffice; however, in such cases the screening is part of a more comprehensive evaluation focusing on another issue.
Educational Evaluations are often the most limited in scope, focusing only on the intellectual functioning of individuals, with limited forays into areas that may impact the educational requirements of the person. These evaluations are generally performed in schools and are often done to determine the services a child may need, the type of class in which he or she should be placed, and/or to determine the type of guidance the child may need in the future.
Vocational Evaluations focus on helping individuals make career choices, whether as adolescents and young adults trying to determine a possible career path or as adults considering a change. Such testing may occur in college counseling centers or in employment settings.
Components of an Evaluation
Clinical Interviews are the primary method of gathering psychological information, and no assessment is complete without one. Interviews vary from completely unstructured approaches, in which the clinician follows the story told by the individual, to structured interviews, in which specific questions are asked in a precise order. However, most interviews lie somewhere between these extremes. The core of a clinical interview is history gathering, focusing on the development of the person and on the development of the presenting problems. The depth to which this history is explored depends on the nature and context of the evaluation.
Mental Status Examination is often conducted as part of the clinical interview, or it may not even be directly addressed if the clinician is able to assess the mental status of a patient from observation alone. A mental status examination is a means of assessing the person's current thought processes, emotions, and interpersonal qualities. An individual's mental state can impact the rest of an evaluation and provides a clinician with a gauge to qualitatively assess and interpret data from other areas of an evaluation. The mental status examination can also provide clues to areas that may need to be addressed in follow-up sessions or outside referrals.
Objective Personality Tests are paper-and-pencil self-report inventories that consist of true-false or multiple-choice questions. They come in a variety of forms, from lengthy measures of global functioning, such as the Minnesota Multiphasic Personality Inventory (MMPI-2) and the Personality Assessment Inventory (PAI), to short measures focusing on specific concerns, such as the Beck Depression Inventory (BDI-II). Generally no more than one global measure is included in a clinical battery, and this measure often forms the core of such a battery. Furthermore, global measures are available in a number of different forms that can be used with different age groups.
These measures often have strong psychometric properties and are interpreted by comparing them to data gathered from a population sample that is considered the norm. The degree to which this comparative population is truly the norm has implications for accurate interpretation, and this is one of the reasons that measures such as these are continually revised. For example, the original MMPI was normed on a sample gathered in the late 1930s and early 1940s that was no longer reflective of the current United States census, resulting in a re-norming when the MMPI-2 was developed.
Projective Personality Tests are a diverse set of tools. The commonly used approaches include inkblot tests (e.g., the Rorschach), in which the individual must describe what is seen in a given inkblot; story-telling tests such as the Thematic Apperception Test (TAT), in which stories are told regarding a series of pictures; word association tests; and drawing tests such as the Draw-a-Person or the Kinetic Family Drawing. These measures are often used as supplements to objective tests, with the Rorschach being one of the most frequently administered measures.
These tests are controversial in their psychometric properties, and most clinicians define them as clinical tools rather than tests. Interpretations are often based on clinical judgement with only minimal objectivity. A notable exception is the Rorschach, for which admirable attempts have been made to make scoring and interpretation more objective. However, projective measures are commonly employed and, in the hands of a skilled clinician, are found to yield clinically relevant idiographic data that could not be assessed using other methods.
Aptitude Tests are measures specifically designed to assess an individual's cognitive and intellectual functioning. These tests can be divided into two sub-categories: intelligence tests and achievement tests. The former measure a person's intellectual functioning in terms of their ability to learn and provide information in the form of an IQ, while the latter measure what a person has learned and provide information in terms of grade equivalents. As with other psychological measures, these tests can be lengthy and comprehensive or short screening measures. Commonly used intelligence tests include the Wechsler Adult Intelligence Scale - III (WAIS-III) and the Wechsler Intelligence Scale for Children - III (WISC-III), while major achievement tests include the Wechsler Individual Achievement Test (WIAT) and the Wide Range Achievement Test (WRAT).
As the historical antecedents of modern-day testing, these tests are considered the hallmarks of psychometric strength. Yet controversies abound regarding their use: How does one define intelligence? Is IQ an accurate measure? Are intelligence tests fair to minorities?
Specialty Measures include any number of tests designed to address specific questions. These types of tests may compose a battery unto themselves (e.g., the special neuropsychological tests that make up the Halstead-Reitan) or may be specialized instruments that supplement other measures (e.g., measures designed to assess for deception in forensic evaluations or tests designed to answer a specific legal question).
Psychological Assessment is a science based on objectively measuring characteristics of human behavior. In order to do so, psychological measures must meet certain criteria to be considered objective measures. A complete discussion of psychometric theory is beyond the scope of this brief exploration, but the following characteristics are important to consider:
Norms are used as a reference against which psychological test data is interpreted. They consist of the test performance of a standardization sample that is reflective of the general population under consideration. Without normative data, the information obtained regarding a person is meaningless. Norms provide a means of assessing a person's relative standing in comparison to others. There are various types of norms that serve specific purposes, with the two most common types presented here:
Developmental norms are used to assess how far along the normal developmental path a person has progressed. They include comparisons based on IQ and on grade equivalence.
Within-Group norms are used to evaluate a person's performance in terms of a similar comparison group. Examples include comparing a child's performance to that of others his or her age, or comparing a patient with schizophrenia to other patients with schizophrenia.
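The idea of locating a person relative to a standardization sample can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not a published scoring procedure: the function name standard_scores and the sample values are hypothetical, and the T-score (mean 50, standard deviation 10) is simply one common standard-score convention.

```python
from statistics import mean, stdev


def standard_scores(raw_score, norm_sample):
    """Express a raw score relative to a normative (comparison) sample.

    Hypothetical helper for illustration; real tests publish their own
    norm tables and conversion rules.
    """
    mu = mean(norm_sample)
    sigma = stdev(norm_sample)  # sample standard deviation (n - 1)
    z = (raw_score - mu) / sigma
    # Percentage of the normative sample scoring strictly below this person.
    below = sum(1 for score in norm_sample if score < raw_score)
    return {
        "z": z,                      # distance from the mean in SD units
        "t_score": 50 + 10 * z,      # rescaled to mean 50, SD 10
        "percentile_rank": 100 * below / len(norm_sample),
    }


# Made-up raw scores from a small comparison group (assumption).
norm_sample = [12, 15, 9, 14, 11, 13, 10, 16, 12, 8]
result = standard_scores(16, norm_sample)
print(result)
```

The same raw score can yield a very different standing if the comparison group changes, which is why the choice of norm group matters for interpretation.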
Reliability refers to the consistency of scores obtained by a person when re-examined. The degree to which a test is reliable defines the precision with which it assesses the person. A test that is reliable does not necessarily measure what it is supposed to measure (that is validity, addressed next). For example, a person who consistently throws darts on the border of a dartboard is reliable in her performance; however, she is not valid, in the sense that she is not doing what is supposed to be done (hitting the scoring region of the board). There are different methods for assessing reliability, each of which is useful under different circumstances:
Test-Retest Reliability is determined through the administration of an identical test on different occasions and shows the extent to which scores on a test can be generalized over different administrations. Although generally useful, this type of reliability has limitations. Some behaviors fluctuate extensively, and it is necessary to take this into consideration when using test-retest reliability. More serious are issues related to the impact practice may have on a test. This type of reliability is useful for measures of stable personality traits, but not for measures of aptitude, where practice severely impacts performance on future administrations.
Alternate Form Reliability combines test-retest reliability with the administration of two different versions of a test. In using this method of reliability analysis, it is imperative that the tests truly be parallel versions.
Split-Half Reliability uses statistical procedures to determine reliability from the single administration of one form of a test. This is the most commonly used approach for determining reliability with aptitude tests and the most effective. It is also appropriate in cases where the trait assessed is likely to fluctuate extensively.
Validity addresses what a test actually measures and how well it does so, and tells the examiner what can be interpreted from test scores. Tests are validated with regard to a particular use; one cannot say that a particular test has "high" or "low" validity in general terms (although in common parlance an established test is often described as having high validity because it is generally understood what the test measures). Basically, the validity of any test is determined by comparing it to another test or some observable fact (i.e., validity is always based on external relationships). The types of validity are as follows:
Content Validity refers to the systematic determination of whether the content of a test measures the traits that it is designed to measure. This type of validity is built into the test when it is constructed, through the selection of appropriate items. Related to content validity is face validity, which refers to the degree to which a test superficially appears to measure the trait at hand. Although it is considered desirable for a test to have face validity, this may not always be the case. For example, on measures geared toward the assessment of malingering and deception, low face validity may aid in more effective detection.
Criterion Validity refers to the degree to which a test predicts the person's performance on future, specified activities. In such cases, the performance on the test is compared with performance on the predicted task.
Construct Validity refers to the degree to which the test measures the underlying theoretical construct. Such validation is often the core of theoretically derived tests such as the MCMI-III and the PAI in personality assessment, or the Woodcock-Johnson Revised Educational Battery in aptitude assessment.