Most frequent assessment-related questions and answers

  1. The tests that were used may be based on normative data or group characteristics that do not match. The makeup of the normative group (i.e., the group the test was standardized on) significantly affects how the test functions. For example, if Test 1 includes people with disabilities in its normative sample (on the idea that this strategy represents the full population), and Test 2 norms only on typically developing (neurotypical) individuals and excludes people with the target disorder, Test 2 will be more sensitive to the disorder, whereas Test 1 will be more likely to find the child with the disorder to be a member of the typically developing normative group.
  2. The tests that were used may not serve the same purpose. For example, one test might be a diagnostic test designed to identify or diagnose a disorder, whereas the other might have been designed to identify strengths and weaknesses or to rate the severity of the disorder.
  3. The tests may not share a similar DESIGN or measure the same skill. Returning to our previously discussed case study, the discrepancy in test scores could have arisen because Test 1 was designed to evaluate the severity of a disorder or identify strengths and weaknesses, while Test 2 was designed to evaluate the presence of a specific language impairment (SLI). Test 1’s normative sample may have included people with disabilities, making the test less sensitive to the disorder and more likely to identify a child with a disorder as a member of the typically developing normative group. Test 2’s normative sample consisted only of typically developing (neurotypical) individuals and excluded people with the target disorder, making the test more sensitive to the disorder. The toy simulation below makes this effect concrete.
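To make this concrete, here is a toy Python simulation with made-up score distributions (not data from any actual test). Including children with the disorder in the normative sample pulls the normative mean down and widens the spread, so the same child’s score looks closer to average:

```python
import random
import statistics

random.seed(42)

# Made-up score distributions (illustrative only): typically developing
# children centered at 100 (SD 15), children with the disorder at 70 (SD 15).
typical = [random.gauss(100, 15) for _ in range(950)]
impaired = [random.gauss(70, 15) for _ in range(50)]

# Test 1 norms on the full population, including children with the disorder.
# Test 2 norms only on typically developing children.
norms = {"Test 1": typical + impaired, "Test 2": typical}

child_score = 72  # a hypothetical child who has the disorder

for name, sample in norms.items():
    mean = statistics.mean(sample)
    sd = statistics.stdev(sample)
    print(f"{name}: norm mean = {mean:.1f}, SD = {sd:.1f}, "
          f"child z = {(child_score - mean) / sd:.2f}")
```

Even with only 5% of the sample impaired, Test 1 yields a milder z-score for the same child than Test 2 does, which is exactly why the mixed-norm test is more likely to place a child with the disorder inside the typically developing range.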

The way the law is worded matters. The critical wording of IDEA says that “…assessments and other evaluation materials used to assess a child… are used for purposes for which the assessments or measures are valid and reliable.”

What does this mean? We often hear specialists state that test XYZ is “a reliable and valid test, therefore it was used in this assessment.” This is the first misconception we would like to address. Just because a test is reliable and valid does not mean it is necessarily the right test to use for your student. Ask yourself: what is the purpose of the test, and why are you testing? Tests are reliable and valid only relative to a purpose. A test can be valid for one purpose and completely invalid for another. What IDEA actually points to is the purpose of the assessment: the test has to be validated for the purpose of the assessment!

Since most assessments are conducted for the purpose of identifying a disorder (e.g., an initial or eligibility review), the tests we use MUST be validated for the purpose of identifying the disorder the child is suspected of having. That means the tests we use must be diagnostic in nature.

How do we know if a test is valid for the purpose of identifying a disorder? How is this measured and displayed in test manuals? What interpretation do we want to draw from the test result? Since we are trying to figure out whether the student has a disability, this is a yes/no question: does the student have the disorder or not? It is not a question about strengths and weaknesses, and it is not a severity rating of the disorder. The statistical tool that provides the evidence that a test score interpretation is valid for the purpose of identifying a disorder is called “discriminant analysis.” This is the type of information we need to look for in test manuals.

There will be a distribution of scores for typically developing children and a separate distribution of scores for the clinical group. The curves might overlap a lot or only a little, but at some point along these distributions there is a score that maximally discriminates between the two groups, and that is what we call the cut score. Above that cut score, the test classifies individuals as neurotypical; below it, the test classifies individuals as having an impairment.
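As an illustration, one common way to locate such a maximally discriminating point is to pick the candidate score that maximizes Youden’s J (sensitivity + specificity − 1). The following is a minimal Python sketch with invented score lists, not the procedure prescribed by any particular test manual:

```python
# Hypothetical score samples for the two groups (invented for illustration).
typical_scores = [88, 92, 95, 99, 101, 104, 107, 110, 113, 118]
clinical_scores = [62, 68, 71, 75, 78, 81, 84, 88, 90, 94]

def sens_spec(cut, typical, clinical):
    """Classification rule: below the cut = impaired, at/above = typical."""
    sensitivity = sum(s < cut for s in clinical) / len(clinical)
    specificity = sum(s >= cut for s in typical) / len(typical)
    return sensitivity, specificity

# Scan every observed score as a candidate cut and keep the one that
# maximizes Youden's J = sensitivity + specificity - 1.
best_cut, best_j = None, -1.0
for cut in sorted(set(typical_scores + clinical_scores)):
    sens, spec = sens_spec(cut, typical_scores, clinical_scores)
    if sens + spec - 1 > best_j:
        best_cut, best_j = cut, sens + spec - 1

print(f"cut score = {best_cut}, Youden's J = {best_j:.2f}")
```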

So this is what we need: a test that statistically differentiates between the two groups.

Cut scores – It is very important to look at the test-specific cut score, which tells us how the two groups are differentiated. Not all tests use the same cut score.

Specificity/sensitivity – Sensitivity is calculated by dividing the number of truly impaired kids whom the test identifies as impaired by the total number of truly impaired kids; specificity is calculated by dividing the number of typically developing kids whom the test classifies as typical by the total number of typically developing kids. Acceptable sensitivity and specificity values are anywhere above .80.
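To put numbers on those ratios, here is a minimal sketch using invented validation-study counts (these figures are not from any actual test manual):

```python
# Invented validation-study counts (illustrative only).
impaired_flagged = 43   # truly impaired kids the test identifies as impaired
impaired_total = 50     # all kids who truly have the disorder
typical_cleared = 88    # typically developing kids classified as typical
typical_total = 100     # all typically developing kids

sensitivity = impaired_flagged / impaired_total   # 43/50 = 0.86
specificity = typical_cleared / typical_total     # 88/100 = 0.88

# Both values meet the commonly cited .80 benchmark.
assert sensitivity >= 0.80 and specificity >= 0.80
print(f"sensitivity = {sensitivity:.2f}, specificity = {specificity:.2f}")
```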

While some tests are built to detect severity, others are built to diagnose, that is, to differentiate between groups. If scores are higher on one of two tests, it could be because one test’s purpose is to evaluate the severity of the disorder whereas the other test’s purpose is diagnostic. Additionally, the normative sample the test was based on will impact the test’s sensitivity and specificity, so be sure to double-check that the sample makes sense for the type of test it is.

Please refer to each test manual to learn about specific test normative samples. We conduct nationwide standardization projects, and each normative sample usually consists of 1,000+ typically developing examinees, stratified to match the most recent U.S. Census data on gender, race/ethnicity, and region.

Standardized tests can be inaccurate when they are not used for the purpose they were designed for. When a test was not designed to identify a disorder, we cannot expect it to identify the disorder accurately.

Take a look at the manual for the discriminant analysis, which is reported as sensitivity/specificity (both of which should always be over .80).

Since identification of a disorder is a yes/no question, a line has to be drawn between typical performance and performance that is impacted by the disability. Cut scores represent that numerical boundary between what is considered neurotypical/typical and what is impacted by the disability. True diagnostic tests should use cut scores for identification purposes and report sensitivity and specificity as measures of the test’s accuracy. Standard scores and percentile ranks, by contrast, are measures of severity that rate a person’s performance. Identification of a disorder is not a continuum that rates a person’s performance: the disorder is either present or it isn’t.

Standardized tests are typically used for the purpose of identifying a disorder; they are meant to help us answer the yes/no question of whether a child has a disability. To identify a disorder, we need a good, accurate diagnostic standardized test. Once the disorder has been identified, the next step should be a strength-based assessment in order to help promote an enabling environment for the student and to focus on self-advocacy, self-awareness, problem-solving, and related skills.

The purpose of a diagnostic evaluation is to:

  1. compare student performance to a group of neurotypical students in the same age group;
  2. evaluate how the student functions in a neurotypical academic and social setting;
  3. determine eligibility;
  4. develop a profile of strengths and weaknesses; and
  5. determine or rule out a diagnosis.

The purpose of a strength-based evaluation is to:

  1. promote an enabling environment;
  2. focus on changing the environment, NOT the student;
  3. focus on self-esteem, autistic identity and autonomy;
  4. move the burden of change away from the student and foster acceptance and accommodation so that the student can integrate/participate as much as they wish; and
  5. focus on self-advocacy, self-awareness, problem-solving.

Frequently asked questions about the Video Assessment Tools

The 12-month membership includes unlimited use of the following tests:

- All Impact Rating Scales (social communication, language, and articulation)
- Articulation and Phonology Video Assessment Tool (VAT)
- Preschool Impact Rating Scale – available in May 2024
- Language Video Assessment Tool

Yes, please contact us to request a quote.


Our rating scales can be completed online. They include video-based questions as well as online rating forms that are scored automatically. Our articulation test can be administered from a computer, laptop, or tablet. The test is composed of short pre-recorded video segments containing 45-55 target words. Individuals are asked to label or name specific items in the videos. The clinician listens carefully to the production of each word and records any distortion, substitution, omission, or lisp of the targeted sounds. The clinician also notes any phonological processes, such as stopping, fronting, initial consonant deletion, or gliding. The assessment yields a raw score, standard score, percentile rank, interpretation value, and test-age equivalent. The clinician can complete the protocol online or print the PDF as a hardcopy and then transfer the results to the “Raw Score” page on the assessment website, where raw scores are converted automatically to standard scores and percentile ranks.
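For readers curious about what that automatic conversion involves, here is a minimal sketch of the standard psychometric formulas, using an invented normative mean and SD for one age band; the actual conversions come from each test’s norm tables, not from these numbers:

```python
import math

def to_standard_score(raw, norm_mean, norm_sd):
    """Map a raw score onto the standard-score scale (mean 100, SD 15)."""
    z = (raw - norm_mean) / norm_sd
    return 100 + 15 * z

def percentile_rank(standard_score):
    """Percentile rank for a standard score, assuming a normal distribution."""
    z = (standard_score - 100) / 15
    return 50 * (1 + math.erf(z / math.sqrt(2)))

# Invented normative values for one age band (not from any actual manual).
ss = to_standard_score(raw=38, norm_mean=42.0, norm_sd=5.0)
print(f"standard score = {ss:.0f}, percentile rank = {percentile_rank(ss):.0f}")
```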

All standardization project procedures were implemented in compliance with the Standards for Educational and Psychological Testing (American Educational Research Association, American Psychological Association, & National Council on Measurement in Education [AERA, APA, & NCME], 2014). Additionally, all standardization project procedures were reviewed and approved by IntegReview IRB (Advarra), an accredited and certified independent institutional review board that is organized and operates in compliance with U.S. federal regulations (including, but not limited to, 21 CFR Parts 50 and 56 and 45 CFR Part 46), applicable domestic and international guidelines (including, but not limited to, OHRP, FDA, EPA, ICH GCP as specific to IRB review, the Canadian Food and Drug Regulations, the Tri-Council Policy Statement 2, and CIOMS), and the ethical principles underlying the involvement of human subjects in research (including the Belmont Report, the Nuremberg Code, and the Declaration of Helsinki).

Our tests are developed by the Lavi Institute under the guidance of Adriana Lavi, PhD, CCC-SLP (author of the Clinical Assessment of Pragmatics (CAPs) test, the IMPACT rating scales, and many more).

If you have more questions or would like to connect with a representative, please Contact Us.
