Nelson Education


Glossary of Terms

Here are some of the more frequently used terms and vocabulary of assessment and testing:


Test Battery

A test battery is a collection of assessments assembled for a specific purpose, all of them standardized on the same population. It provides a wider and more detailed coverage of abilities or skills than can be achieved by a single assessment.


Criterion-Referenced Assessment

A criterion-referenced assessment measures what a child can do against a specified set of objectives or skills. Assessing whether a skill has been mastered will provide useful diagnostic information (see also Norm-Referenced Assessment).


Formative Assessment

An ongoing assessment which is used to highlight a particular child's strengths, needs and potential. Information gained from formative assessment can be used when discussing and devising the next steps for that child's development. A criterion-referenced assessment is often used for formative assessment (see also Summative Assessment).


Grade Equivalent Scores

Grade equivalent scores (GE) are useful primarily because of three characteristics: 1) they indicate the developmental level of the pupil's performance; 2) they may be averaged for purposes of making group comparisons; and 3) they are suitable for measuring growth.

The grade equivalent of a given raw score on any test indicates the grade level at which the typical pupil makes this raw score. The first digit represents the grade and the second digit the month within the grade in which the typical pupil makes the corresponding raw score. For example, if a pupil makes a grade equivalent of 57, this means that the raw score on the test is the same as that made by the typical or median pupil in grade five at the end of the seventh month.

Similarly, if the pupil makes a grade equivalent of 70, this means that the test performance equals that of a typical pupil just beginning grade seven.

The average yearly growth is 10 points, by definition. Just as talented pupils should be expected to gain more than 10 points in one year, it is unreasonable to expect pupils below average in ability to achieve a full year's growth in that time.

Grade equivalent scores have been criticized, not so much because of their characteristics, but because they may be misinterpreted or misused. The GE should be regarded as an estimate of where the pupil is along a developmental continuum, not of where he or she should be placed in the graded organization of the school.

Suppose, for example, that on a reading test, a grade five pupil makes a grade equivalent score of 73, and that this score ranks at approximately the 90th percentile in the grade five norms, meaning that 90 percent of the pupils scored lower than 73 and 10 percent scored 73 or higher. This pupil should be considered as being in the upper 10 percent of grade five. A grade equivalent of 73 does not indicate that the pupil is ready for grade seven work or that he or she should skip grade six.

A second possible misinterpretation stems from the fact that identical grade equivalents earned on different tests do not necessarily represent equally good performance. For comparing a pupil's present status in a group from test to test, the grade equivalent scores may be misleading, particularly if the pupil's performance is well above or well below average. For this type of comparison, percentile ranks should be used.

These limitations are true of grade equivalent scores on any test. It does not follow, of course, that grade equivalents should not be used at all. They are valuable indicators of pupil growth, but they should not be used to determine pupils' standings in their grade or their relative performance on different tests. The percentile norms are provided for the latter purposes.
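The digit convention described above (first digit the grade, second digit the month within that grade) can be sketched in Python; the function name is illustrative, not part of any scoring system:

```python
def decode_grade_equivalent(ge: int) -> tuple[int, int]:
    """Split a two-digit grade equivalent into (grade, month),
    following the convention above: the first digit is the grade,
    the second digit is the month within that grade."""
    return ge // 10, ge % 10

# A grade equivalent of 57 corresponds to grade 5, month 7;
# 70 corresponds to a pupil just beginning grade 7.
grade, month = decode_grade_equivalent(57)
```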



Miscue Analysis

A way of acquiring insight into children's reading strategies by studying the mistakes they make (their miscues) while reading aloud. By considering the pattern of the miscues made, the teacher may gain more insight into possible strategies for teaching.


Norm-Referenced Assessment

A method of assessment whereby pupils obtain standardized scores that allow their individual performance to be compared with that of their grade- and/or age-related peers. These scores are provided in norm tables, which take grade and/or age into account. Information obtained from norm-referenced assessment is particularly useful for comparing the performance of individuals with the national average: this allows standards to be monitored on a national basis.


Objective Assessment

An assessment whose precise answers have been agreed upon and which can be objectively and reliably marked using a scoring key or an automated scoring system.


Parallel Forms

Alternative assessment forms which differ in content but are of the same level of difficulty and provide equivalent standardized scores. These are particularly useful for retesting, where it may be inadvisable to use an identical assessment on the second occasion because of the effects of practice and memory. Parallel forms can also help prevent copying during assessment sessions.


Percentile Rank

This type of score indicates the percentage of children in a grade or an age group who obtained scores below a particular score. For example, a pupil with a percentile rank of 70 has a score which was as good as or better than that of 70 percent of the normative sample for his/her grade or age group. Note that a child's percentile on an assessment should not be confused with the term "percentage," which indicates the proportion of assessment items correctly answered.
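Under the "percentage scoring below" definition given above, a percentile rank can be computed from a normative sample like this (a minimal sketch; some tests instead count ties as half, so conventions vary):

```python
def percentile_rank(score: float, norm_sample: list[float]) -> float:
    """Percentage of the normative sample scoring strictly below
    `score` (one common convention, matching the definition above)."""
    below = sum(1 for s in norm_sample if s < score)
    return 100.0 * below / len(norm_sample)

# With a normative sample of the scores 1..100, a score of 71
# exceeds 70 of them, giving a percentile rank of 70.
norms = list(range(1, 101))
rank = percentile_rank(71, norms)
```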



Raw Score

A score on an assessment which is expressed simply as the total of the marks obtained on that test; that is, the number of items answered correctly.


Reliability

A test's reliability concerns the consistency with which it measures whatever it is supposed to be measuring. A reliable assessment is dependable and will yield similar results each time it is used. Perfect reliability is represented by a reliability coefficient of 1.0; in practice this is never achieved, although figures upwards of about 0.85 are commonly obtained.
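One common way to estimate such a coefficient is test-retest reliability: the Pearson correlation between scores from two administrations of the same test. A minimal sketch (function and variable names are illustrative):

```python
from statistics import mean, pstdev

def reliability_coefficient(first: list[float], second: list[float]) -> float:
    """Pearson correlation between two administrations of the same
    test to the same pupils: a test-retest reliability estimate."""
    mx, my = mean(first), mean(second)
    # Covariance of the paired scores divided by the product of
    # the standard deviations gives the correlation coefficient.
    cov = mean((a - mx) * (b - my) for a, b in zip(first, second))
    return cov / (pstdev(first) * pstdev(second))
```

Perfectly consistent rankings across the two sittings yield a coefficient of 1.0; real data falls below that.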


Standard Age Score

A standard score scale in which the mean score for each age group on an assessment is set at a fixed value, most often 100, with a standard deviation usually set at 16. For any age group, a given numerical value has the same meaning in terms of standing relative to the group. For example, an eight-year-old and a nine-year-old, each of whom has a standard age score of 105, have performed equally well in relation to the average for their respective age groups.
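The conversion behind such a scale can be sketched as follows; the age-group mean and standard deviation here are illustrative values, not published norms:

```python
def standard_age_score(raw: float, age_mean: float, age_sd: float,
                       scale_mean: float = 100.0, scale_sd: float = 16.0) -> float:
    """Convert a raw score to a standard age score: express the raw
    score as a z-score against the pupil's own age group, then map
    it onto a scale with the stated mean and standard deviation."""
    z = (raw - age_mean) / age_sd
    return scale_mean + scale_sd * z

# A raw score 5 points above an age-group mean of 50 (SD 16)
# becomes a standard age score of 105.
sas = standard_age_score(55, age_mean=50, age_sd=16)
```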



Standard Deviation

A way of expressing how spread out a normally-distributed sample of scores is. Nearly all of the scores in such a sample fall within the range mean +/- 3 standard deviations.
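Computing the mean, the standard deviation, and that "nearly all scores" range is straightforward; the score list below is purely illustrative:

```python
from statistics import mean, pstdev

scores = [88, 92, 100, 104, 96, 120, 80, 108, 112, 100]  # illustrative sample

m = mean(scores)        # centre of the distribution
sd = pstdev(scores)     # population standard deviation
low, high = m - 3 * sd, m + 3 * sd  # range containing nearly all scores
```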


Standard Error of Measurement (SEM)

The estimate of the 'error' associated with a pupil's obtained score when compared with their hypothetical 'true' score. The SEM, which varies from test to test, should be given in the test manual. The band of scores in which we can be fairly certain the 'true' score lies can be calculated from this figure. For example, we can be 95 percent certain that a pupil's true score lies in the range 'obtained score +/- 2 SEM', and over 99 percent certain that it lies in the range 'obtained score +/- 3 SEM'.
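The band calculation described above is simple arithmetic; a minimal sketch, with the SEM value taken as an illustrative figure from a hypothetical manual:

```python
def score_band(obtained: float, sem: float, multiplier: int = 2) -> tuple[float, float]:
    """Band in which the 'true' score is likely to lie:
    obtained score +/- multiplier x SEM. A multiplier of 2 gives
    roughly 95 percent confidence, 3 gives over 99 percent."""
    return obtained - multiplier * sem, obtained + multiplier * sem

# With an obtained score of 100 and an SEM of 3, we can be about
# 95 percent certain the true score lies between 94 and 106.
band_95 = score_band(100, sem=3)
```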



Standardized Test

A standardized test will have been administered to a representative sample of a defined population in order to calculate norms. Norms give information about the performance of this sample. By using the norms as reference points, teachers can compare the performance of their pupils with the standardization sample. A test can also have a standardized administration procedure, whereby strict instructions have to be followed by the administrator.


Stanine

A nine-step standard score system with a mean of 5 points and a standard deviation of 2 points, with all steps (except the extremes) being 0.5 standard deviation wide.
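Mapping a z-score onto that nine-step scale can be sketched as follows (a common formulation consistent with the definition above: steps 0.5 SD wide, clamped to 1 and 9 at the extremes):

```python
import math

def stanine(z: float) -> int:
    """Map a z-score to a stanine. With mean 5 and SD 2, each middle
    step is 0.5 SD wide; scores beyond the cutpoints are clamped
    into the extreme steps 1 and 9."""
    return max(1, min(9, math.floor(z * 2 + 5.5)))

# An average pupil (z = 0) falls in stanine 5; a pupil one SD
# above the mean (z = 1.0) falls in stanine 7.
```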


Summative Assessment

This is used to record the overall achievement of a pupil in a systematic way. It occurs at the end of a scheme of work or phase of education, and a norm-referenced assessment is often used for this final summing up of performance.


Validity

A valid assessment measures what it claims to measure. Evidence may be presented in various ways: satisfactory correlations with other assessments of the same abilities or skills, with teachers' estimates of their pupils' abilities, or with the pupils' subsequent achievements, such as their results in public examinations.