Standards & Benchmarks
Every measurement Amal produces is compared against a set of reference levels to produce a growth status. These reference levels are called benchmarks. They are country-specific, skill-specific, and grade-specific, and they determine whether a child’s score places them in the “meets,” “approaching,” “below,” “severe,” or “not assessed” category for each skill.
This page explains how the comparison works, what the five statuses mean, and how the platform handles uncertainty, provisional values, and benchmark updates over time.
The role of benchmarks
After a child completes an assessment and the platform produces an ability estimate for a skill, the measurement is not yet complete. A number by itself does not tell a teacher what to do next. What matters is whether that number is where it should be for a child of this grade, in this country, at this point in the year.
Benchmarks answer that question. They are the reference levels that convert an ability estimate into a growth status that a teacher can interpret: this child is meeting expectations, approaching them, below them, or at a level that suggests urgent attention.
The benchmark values are supplied by the project’s educational specialist partners. They are not invented by the platform; they are expert judgments about the levels of reading achievement expected of children in Arabic-speaking school systems.
The five growth statuses
| Status | What it means |
|---|---|
| Meets | The child’s score is at or above the target level for this skill, grade, and country |
| Approaching | The child’s score is above the lower threshold but has not yet reached the target level |
| Below | The child’s score is below the lower threshold |
| Severe | The child scored at zero, or was unable to read the first word (a specific rule configured per skill) |
| Not assessed | The score is missing, or no numeric benchmark has been configured for this skill and country yet |
A note on “Severe”
The severe status applies when a child’s score hits a specific floor condition, typically a zero score or an inability to produce the first word of a reading passage. Not every skill has a severe threshold configured; this depends on what the partner has specified for each skill and country. When no severe rule is configured for a skill, the lowest reachable status is “Below.”
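The table and the severe rule can be read as a small decision procedure. The following Python sketch is illustrative only: the `Benchmark` fields (`target`, `lower`, `has_severe_rule`), the `failed_first_word` flag, and the `derive_status` function are hypothetical names, not the platform's code. The table does not say how a score exactly equal to the lower threshold is classified, so the sketch assumes it counts as Approaching.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Status(Enum):
    MEETS = "meets"
    APPROACHING = "approaching"
    BELOW = "below"
    SEVERE = "severe"
    NOT_ASSESSED = "not_assessed"


@dataclass(frozen=True)
class Benchmark:
    # Hypothetical shape: cut scores for one country/skill/grade combination.
    target: Optional[float]        # "meets" threshold; None if not configured yet
    lower: Optional[float]         # lower threshold separating approaching from below
    has_severe_rule: bool = False  # whether the partner configured a severe rule


def derive_status(score: Optional[float], bench: Optional[Benchmark],
                  failed_first_word: bool = False) -> Status:
    # Missing score or missing numeric cut scores: never treated as a low score.
    if score is None or bench is None or bench.target is None or bench.lower is None:
        return Status.NOT_ASSESSED
    # Severe is only reachable when this skill has a severe rule configured.
    if bench.has_severe_rule and (score == 0 or failed_first_word):
        return Status.SEVERE
    if score >= bench.target:
        return Status.MEETS
    # Assumption: a score exactly at the lower threshold counts as approaching.
    if score >= bench.lower:
        return Status.APPROACHING
    return Status.BELOW
```

For example, with a target of 40 and a lower threshold of 20, a score of 30 yields Approaching, and a score of 0 yields Severe only if the skill has a severe rule; otherwise it yields Below.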
“Not assessed” is never zero
When a skill has not been measured in a session, or when the partner has not yet supplied numeric benchmarks for that skill-country-grade combination, the platform reports “Not assessed.” This status is excluded from any aggregation, rollup, or summary count. It is never treated as a low score, a zero, or evidence of difficulty.
A child who has not yet been assessed on a skill is simply not yet assessed. The platform does not fill in an assumption about their level.
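A minimal sketch of how a summary count might exclude Not assessed results, so that a missing measurement never inflates the Below or Severe counts. The `Status` enum and the `summarize` function are hypothetical names used for illustration, not the platform's API.

```python
from collections import Counter
from enum import Enum
from typing import Dict, Iterable


class Status(Enum):
    MEETS = "meets"
    APPROACHING = "approaching"
    BELOW = "below"
    SEVERE = "severe"
    NOT_ASSESSED = "not_assessed"


def summarize(statuses: Iterable[Status]) -> Dict[Status, int]:
    # Drop "not assessed" before counting: it is not a low score or a zero.
    counts = Counter(s for s in statuses if s is not Status.NOT_ASSESSED)
    # Report every assessed status, including those with a count of zero.
    return {s: counts.get(s, 0) for s in Status if s is not Status.NOT_ASSESSED}
```

Because Not assessed never enters the counts, a class with many unassessed children shows fewer results overall rather than more children in difficulty.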
How the benchmark is resolved
For any given measurement, the platform needs to find the right benchmark for that specific combination of country, skill, assessment type, and grade band. It walks through three steps:
1. Exact match: The platform first looks for a benchmark configured for this exact country, skill, assessment type, and grade band. If one exists, it is used.
2. Country default: If no exact match is found, the platform looks for a benchmark configured for the same country but not specific to this skill. Country-level defaults provide a fallback when a skill-specific benchmark has not yet been configured.
3. Global default: If no country-level default is found, the platform uses a shared global default that applies across all countries. This ensures that a benchmark is always available, so a measurement can always be interpreted rather than left unclassified.
When the resolution falls back to step 2 or step 3, this is recorded in the audit trail. The benchmark record used for the session is permanently stored alongside the score.
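The three-step resolution can be sketched as a lookup with two fallbacks. This is a hypothetical illustration, not the platform's implementation. The dictionary layout, the `BenchmarkRecord` and `Resolution` types, and the audit-log entry format are assumptions, and real country defaults may be keyed more finely (for example, by grade band) than the country-only key used here. The `provisional` flag mirrors the way global defaults are marked until the partner supplies final values.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

# Hypothetical key: (country, skill, assessment_type, grade_band)
Key = Tuple[str, str, str, str]


@dataclass(frozen=True)
class BenchmarkRecord:
    record_id: str
    target: float
    lower: float
    provisional: bool = False  # e.g. global defaults awaiting partner values


@dataclass(frozen=True)
class Resolution:
    record: BenchmarkRecord
    level: str  # "exact", "country_default", or "global_default"


def resolve_benchmark(key: Key,
                      exact: Dict[Key, BenchmarkRecord],
                      country_defaults: Dict[str, BenchmarkRecord],
                      global_default: BenchmarkRecord,
                      audit_log: List[dict]) -> Resolution:
    # Step 1: exact match on country, skill, assessment type, and grade band.
    if key in exact:
        return Resolution(exact[key], "exact")
    country = key[0]
    if country in country_defaults:
        # Step 2: country-level default, not specific to this skill.
        resolution = Resolution(country_defaults[country], "country_default")
    else:
        # Step 3: shared global default, so some benchmark is always available.
        resolution = Resolution(global_default, "global_default")
    # Only fallbacks (steps 2 and 3) are written to the audit trail.
    audit_log.append({"key": key, "level": resolution.level,
                      "record_id": resolution.record.record_id})
    return resolution
```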
Real benchmarks for Jordan and Palestine
Partner-supplied benchmark values for Jordan and Palestine are configured in the platform at setup. These values come directly from the partner workbook. They are never hand-edited in the platform’s documentation or code; if the partner updates the workbook, new values are extracted from it and a new benchmark configuration is created through the administrative interface.
Global defaults, which apply when no country-specific value is available for a given skill, are clearly marked as provisional until the partner supplies final values.
Benchmark versions are pinned per session
Benchmarks can be updated over time as partners refine their standards or as new data becomes available. The platform handles this carefully to ensure that updating benchmarks does not retroactively change the interpretation of assessments that have already been completed.
At the start of every assessment session, the platform records which benchmark version is active for each skill relevant to the session. All scoring and status derivation within that session uses those pinned versions. If an administrator updates a benchmark profile in the middle of the school year, those changes apply to new sessions going forward but do not touch any session that has already started or completed.
Every scored result permanently records the benchmark profile version it was compared against. This means old results can always be examined against the same standard that was in effect when they were produced.
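Version pinning amounts to taking a snapshot of the active versions when a session starts and scoring only against that snapshot. The sketch below assumes, hypothetically, that versions are tracked as integers per skill; `BenchmarkStore`, `start_session`, and `record_score` are illustrative names rather than the platform's actual components.

```python
from dataclasses import dataclass
from types import MappingProxyType
from typing import Dict, Iterable, Mapping


@dataclass
class BenchmarkStore:
    # Hypothetical: skill -> currently active benchmark profile version.
    active_versions: Dict[str, int]


@dataclass(frozen=True)
class Session:
    session_id: str
    pinned_versions: Mapping[str, int]  # read-only snapshot taken at session start


@dataclass(frozen=True)
class ScoredResult:
    session_id: str
    skill: str
    score: float
    benchmark_version: int  # stored permanently alongside the score


def start_session(session_id: str, skills: Iterable[str], store: BenchmarkStore) -> Session:
    # Copy the active versions now; later edits to the store cannot reach this session.
    snapshot = {skill: store.active_versions[skill] for skill in skills}
    return Session(session_id, MappingProxyType(snapshot))


def record_score(session: Session, skill: str, score: float) -> ScoredResult:
    # Scoring uses the pinned version, never whatever happens to be active now.
    return ScoredResult(session.session_id, skill, score, session.pinned_versions[skill])
```

If an administrator changes the active version for a skill after `start_session` has run, results recorded in that session still carry the earlier version, while sessions started afterwards pick up the new one.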
No overall number
The platform deliberately does not produce a single overall number or percentage representing a child’s Arabic literacy. Growth statuses are always per-skill or per-area.
When teachers view a child’s results, they see a structured map of statuses across the seven reading areas and their constituent skills. When school leaders view school-level data, they see distributions of statuses across skills and classrooms. These views show where different children and groups stand on specific aspects of reading, which is what instructional decisions require.
A single combined number would hide the pattern differences that matter most for deciding what kind of support to offer. Keeping the statuses disaggregated is not a limitation of the platform; it is a deliberate measurement principle.
How benchmark updates work in practice
When the partner supplies updated cut scores, a school administrator or platform administrator creates a new benchmark profile through the administrative interface. The new profile can be activated immediately or held in draft until a planned date. Activation is atomic: in a single operation, the old active profile is deactivated and the new one becomes active. No partial states are possible.
The complete history of benchmark profile activations, deactivations, and changes is recorded in a permanent log with the administrator’s identity and the timestamp of each action.
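One way to get all-or-nothing activation, sketched here with Python's built-in `sqlite3` module, is to perform the deactivation, the activation, and the log entries inside a single transaction. A partial unique index then forbids two active profiles for the same country. The schema, the per-country scoping of "active", and the function names are assumptions for illustration; this page does not describe the platform's actual storage.

```python
import sqlite3
from datetime import datetime, timezone

SCHEMA = """
CREATE TABLE profile (
    id INTEGER PRIMARY KEY,
    country TEXT NOT NULL,
    active INTEGER NOT NULL DEFAULT 0
);
-- At most one active profile per country (hypothetical scoping).
CREATE UNIQUE INDEX one_active_per_country ON profile(country) WHERE active = 1;
CREATE TABLE profile_log (
    profile_id INTEGER NOT NULL,
    action TEXT NOT NULL,
    admin_id TEXT NOT NULL,
    logged_at TEXT NOT NULL
);
"""


def open_db(path: str = ":memory:") -> sqlite3.Connection:
    # isolation_level=None: transactions are opened and closed explicitly below.
    conn = sqlite3.connect(path, isolation_level=None)
    conn.executescript(SCHEMA)
    return conn


def activate(conn: sqlite3.Connection, profile_id: int, admin_id: str) -> None:
    now = datetime.now(timezone.utc).isoformat()
    conn.execute("BEGIN IMMEDIATE")  # take the write lock before reading current state
    try:
        row = conn.execute("SELECT country FROM profile WHERE id = ?", (profile_id,)).fetchone()
        if row is None:
            raise ValueError(f"unknown profile {profile_id}")
        country = row[0]
        previous = [r[0] for r in conn.execute(
            "SELECT id FROM profile WHERE country = ? AND active = 1 AND id != ?",
            (country, profile_id))]
        # Deactivate first so the partial unique index is never violated.
        conn.execute("UPDATE profile SET active = 0 WHERE country = ? AND active = 1", (country,))
        conn.execute("UPDATE profile SET active = 1 WHERE id = ?", (profile_id,))
        for old_id in previous:
            conn.execute("INSERT INTO profile_log VALUES (?, ?, ?, ?)",
                         (old_id, "deactivate", admin_id, now))
        conn.execute("INSERT INTO profile_log VALUES (?, ?, ?, ?)",
                     (profile_id, "activate", admin_id, now))
        conn.execute("COMMIT")
    except Exception:
        conn.execute("ROLLBACK")  # no partial state survives a failure
        raise
```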
No code changes are required to update benchmarks. The partner-supplied values live in the platform’s configuration, not in the software itself.
What teachers see
Teachers never see the internal identifiers the platform uses to label benchmark sources. These labels are stripped from all teacher and school-leader responses before they leave the server. Teachers see growth status labels in Arabic only (the Arabic equivalents of Meets, Approaching, Below, Severe, and Not assessed). The benchmark reference behind each status is an administrative matter, not a teacher-facing one.
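A sketch of the server-side redaction step, assuming responses are built as nested dictionaries and lists before serialization. The field names in `INTERNAL_FIELDS` and the role names are hypothetical placeholders, not the platform's actual identifiers.

```python
from typing import Any

# Hypothetical names for internal benchmark-source labels.
INTERNAL_FIELDS = frozenset({"benchmark_source", "benchmark_record_id", "fallback_level"})
TEACHER_FACING_ROLES = frozenset({"teacher", "school_leader"})


def _strip_internal(value: Any) -> Any:
    # Recursively rebuild dicts and lists without internal keys; the input is not mutated.
    if isinstance(value, dict):
        return {k: _strip_internal(v) for k, v in value.items() if k not in INTERNAL_FIELDS}
    if isinstance(value, list):
        return [_strip_internal(v) for v in value]
    return value


def prepare_response(payload: Any, role: str) -> Any:
    # Applied as the last step before a response is serialized and sent.
    if role in TEACHER_FACING_ROLES:
        return _strip_internal(payload)
    return payload
```

Redacting at the final serialization step, rather than in each view, means a newly added teacher-facing endpoint cannot leak a source label by forgetting to filter it.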