How Status Is Decided

When a child completes an assessment, the raw results do not flow directly to the teacher. They pass through a rules-based interpretation pipeline that translates them into a small set of meaningful statuses, one for each skill and one for each area of reading. This page explains what that pipeline does, in plain language, and describes some important properties of how it handles uncertainty.

Why interpretation is needed

A child’s raw score on a set of items tells you something, but not enough by itself. It tells you how many items they got right. It does not tell you whether that reflects a consistent pattern, an off-day, a specific gap in one part of a skill, or a broader difficulty across several skills. Two children with the same raw score can have very different instructional needs.

The interpretation pipeline exists to translate evidence into something a teacher can act on. It considers not just how many answers were correct, but what kind of errors appeared, whether the pattern was consistent, how the skill relates to other skills in the same area, and how much comparable evidence was available to support a conclusion.

The five-stage pipeline

Stage 1: Classifying each response

Every individual response is first classified by what it reveals about the child’s reading behavior. This is not just “right” or “wrong.” The classification looks at patterns within the response: were errors consistent across items testing the same feature, or scattered? Was the child’s performance strong and stable, or variable across items of similar difficulty? Was there a response that looks like a guess rather than evidence of skill?

This first classification happens at the level of individual items, before any reasoning about the overall pattern for a skill.

Stage 2: Encoding the classification

The classification from Stage 1 is encoded into one of 12 internal interpretation codes. These codes are the system’s internal language for describing what a set of responses reveals. They are not shown to teachers; they are the input to the next stage.

Stage 3: Checking for consistency

Stage 3 looks across all the items answered for a given skill and asks whether the pattern holds. This is the confirmatory stage.

If the pattern across items is consistent, the system proceeds. If the pattern is contradictory (for example, the child performed well on some items but very poorly on others of similar difficulty), the system may withhold a commitment and instead signal that more data is needed before a status can be determined.

If the number of comparable items answered is too small to support a reliable conclusion, the system will not guess. It returns a “not enough data yet” signal and waits for more evidence.
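The Stage 3 check can be pictured with a minimal Python sketch. The threshold values, the proportion-based test for "contradictory", and the names `check_consistency` and `CheckResult` are assumptions made for illustration; the page does not state the platform's actual rules.

```python
from enum import Enum
from typing import Sequence


class CheckResult(Enum):
    CONSISTENT = "consistent"
    NOT_ENOUGH_DATA = "not enough data yet"


# Hypothetical thresholds, chosen only for illustration.
MIN_COMPARABLE_ITEMS = 4
MIXED_LOW, MIXED_HIGH = 0.4, 0.6


def check_consistency(correct: Sequence[bool]) -> CheckResult:
    """Decide whether responses on comparable items support a conclusion.

    `correct` holds one True/False per comparable item for a single skill.
    """
    # Too few comparable items: do not guess.
    if len(correct) < MIN_COMPARABLE_ITEMS:
        return CheckResult.NOT_ENOUGH_DATA

    proportion_correct = sum(correct) / len(correct)

    # A near-even split on items of similar difficulty is treated here as
    # contradictory evidence, so the commitment is withheld.
    if MIXED_LOW <= proportion_correct <= MIXED_HIGH:
        return CheckResult.NOT_ENOUGH_DATA

    # Consistently strong or consistently weak: the pattern holds.
    return CheckResult.CONSISTENT
```

In this sketch, a consistently weak pattern counts as consistent just as a consistently strong one does: the check asks whether the evidence agrees with itself, not whether the child did well.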

Stage 4: Synthesizing a skill status

When the Stage 3 check confirms that the evidence is consistent and sufficient, Stage 4 synthesizes it into one of nine skill statuses:

Status	What it means
Adequate	The child is on track for this skill
Monitoring	The skill is worth watching but is not yet a confirmed concern
Isolated weakness	A weakness in this skill that does not reflect a broader pattern
Monitoring at gateway	A gateway skill that is being watched
Weakness at gateway	A confirmed weakness in a gateway skill
Domain-related weakness	A weakness that fits a broader pattern within the area
Broad weakness	Part of a pattern that spans multiple areas
Contextual weakness	A weakness where contextual factors are relevant to interpretation
Contextual review needed	The pattern calls for teacher review before a status can be confirmed

Each status also carries a confidence level (high, medium, or low) and a decision-strength indicator. A low-confidence status is flagged for closer teacher attention: the evidence supports the conclusion, but not with full certainty.
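One way to picture a Stage 4 result is as a small immutable record. The sketch below uses the nine statuses and three confidence levels named on this page; the class names, the field names, and the string form of the decision-strength indicator are hypothetical, since the page does not define them.

```python
from dataclasses import dataclass
from enum import Enum


class SkillStatus(Enum):
    ADEQUATE = "Adequate"
    MONITORING = "Monitoring"
    ISOLATED_WEAKNESS = "Isolated weakness"
    MONITORING_AT_GATEWAY = "Monitoring at gateway"
    WEAKNESS_AT_GATEWAY = "Weakness at gateway"
    DOMAIN_RELATED_WEAKNESS = "Domain-related weakness"
    BROAD_WEAKNESS = "Broad weakness"
    CONTEXTUAL_WEAKNESS = "Contextual weakness"
    CONTEXTUAL_REVIEW_NEEDED = "Contextual review needed"


class Confidence(Enum):
    HIGH = "high"
    MEDIUM = "medium"
    LOW = "low"


@dataclass(frozen=True)
class SkillResult:
    skill: str
    status: SkillStatus
    confidence: Confidence
    # Hypothetical representation; the page does not say how the
    # decision-strength indicator is expressed.
    decision_strength: str

    @property
    def needs_closer_attention(self) -> bool:
        # Low-confidence statuses are flagged for closer teacher attention.
        return self.confidence is Confidence.LOW
```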

Stage 5: Combining skills into an area status and tiles

Stage 5 aggregates the individual skill statuses within each area into an area-level status. The aggregation uses a structured set of rules, not an average. A strong performance on most skills within an area does not cancel out a critical weakness in a gateway skill; the gateway weakness carries forward.
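The difference between rule-based aggregation and averaging can be shown with a short Python sketch. The area-level labels returned here and the ordering of the rules are hypothetical; only the principle that a gateway weakness carries forward comes from this page.

```python
from typing import Sequence

# Skill statuses as they appear in the Stage 4 table.
WEAKNESS_AT_GATEWAY = "Weakness at gateway"
ADEQUATE = "Adequate"


def aggregate_area(skill_statuses: Sequence[str]) -> str:
    """Return a hypothetical area-level label using ordered rules, not an average."""
    if not skill_statuses:
        return "not enough data yet"

    # Rule 1: a confirmed gateway weakness carries forward, no matter
    # how many other skills in the area are adequate.
    if WEAKNESS_AT_GATEWAY in skill_statuses:
        return "area weakness (gateway)"

    # Rule 2: every skill is on track.
    if all(status == ADEQUATE for status in skill_statuses):
        return "area on track"

    # Rule 3: anything else leaves the area with open concerns.
    return "area has open concerns"
```

For example, five adequate skills plus one gateway weakness yield "area weakness (gateway)" in this sketch, whereas a simple average would have reported a mostly strong area.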

The area status is then mapped to one of six tile statuses that appear on the teacher dashboard: Meets, Approaching, Below, Severe, Not Assessed, and Contextual. These tiles are the most visible layer of the interpretation, and they summarize the detailed picture beneath them.

“Not enough data yet”

The most important outcome to understand is the one that halts the pipeline entirely: “not enough data yet.”

When the system cannot reach a conclusion with confidence, whether because there were too few items, because the pattern was inconsistent, or because two equally weighted evidence sources pointed in opposite directions with no clear tiebreaker, it returns this signal. It does not guess. It does not produce a status that looks confident but rests on shaky evidence.

This signal stops everything downstream for that child and that skill. No profile is assigned for the child. No intervention is recommended. The teacher sees a clear indication that more assessment data is needed before the system can say anything reliable.

This is not a failure of the platform. It is the platform being honest about what the evidence does and does not support. In reading assessment, a false conclusion can lead to the wrong kind of support, which wastes time and can sometimes be counterproductive. Holding back until the evidence is solid is the more responsible choice.

The “not enough data yet” signal clears automatically when more evidence arrives in a later assessment session. There is no manual override at this stage; the pipeline re-runs on the new evidence and either reaches a conclusion or remains in the “not enough data yet” state.
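The halt-and-retry behavior can be sketched in Python as follows. This is an illustration under stated assumptions: the minimum item count, the stand-in status rule, the idea that evidence accumulates across sessions, and the names `evaluate` and `on_new_session` are all hypothetical.

```python
from dataclasses import dataclass
from typing import Sequence

NOT_ENOUGH_DATA = "not enough data yet"


@dataclass(frozen=True)
class Outcome:
    status: str
    # False means no profile is assigned and no intervention is recommended.
    downstream_allowed: bool


def evaluate(evidence: Sequence[bool], min_items: int = 4) -> Outcome:
    """Run a stand-in pipeline for one child and one skill."""
    if len(evidence) < min_items:
        # The signal stops everything downstream for this child and skill.
        return Outcome(status=NOT_ENOUGH_DATA, downstream_allowed=False)
    # Stand-in for Stages 1 to 4; the real rules are not described here.
    status = "Adequate" if all(evidence) else "Monitoring"
    return Outcome(status=status, downstream_allowed=True)


def on_new_session(previous: Sequence[bool], new: Sequence[bool]) -> Outcome:
    """Re-run automatically when a later session adds evidence.

    There is no override parameter: the only way out of the withheld
    state is for the re-run itself to reach a conclusion.
    """
    return evaluate(list(previous) + list(new))
```

Calling `on_new_session` with too little combined evidence simply returns the withheld outcome again, which mirrors the pipeline remaining in the “not enough data yet” state.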

Context flags

Teachers can attach structured context flags to a child’s record to note conditions that might affect how assessment results should be interpreted. Examples include a note that the child was working through a difficult period at home during a particular assessment window, or that the child was absent during a key session.

These flags are visible to other educators who review the child’s record. They do not modify the score, the ability estimate, or the status that the pipeline produces. They are there to inform the teacher’s own judgment when reviewing the platform’s conclusions, not to change those conclusions automatically.

Behavioral observations

During an assessment, the system can record behavioral observations: unusual distraction, apparent tiredness, technical disruptions, or other administration conditions that the teacher or the system flags. These are internal records. They are never shown to the child or to parents. They do not change any score or status. They are available to educators as context when reviewing results.

Reading comprehension and cross-area consistency

Reading comprehension is an area that depends on several others. A child cannot comprehend text they cannot decode. A child whose fluency is very low will struggle to hold meaning across a passage even if they can decode individual words.

Because of this dependency, the pipeline applies an additional consistency check for reading comprehension: if the decoding area or the listening area is still in a “not enough data yet” state, the comprehension area status also stays withheld. There is no point in declaring a comprehension conclusion when the inputs that comprehension depends on are still unclear.
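A minimal Python sketch of this dependency check follows. The area keys, the function name `comprehension_status`, and the choice to treat a missing prerequisite as unresolved are assumptions for illustration.

```python
from typing import Mapping

NOT_ENOUGH_DATA = "not enough data yet"

# Areas whose status must be resolved before comprehension is reported.
PREREQUISITE_AREAS = ("decoding", "listening")


def comprehension_status(area_statuses: Mapping[str, str], computed: str) -> str:
    """Withhold the comprehension status while any prerequisite is unresolved.

    `computed` is the status the pipeline would otherwise report
    for the comprehension area.
    """
    for area in PREREQUISITE_AREAS:
        # Assumption: a prerequisite with no status at all is treated
        # the same as one still waiting for data.
        if area_statuses.get(area, NOT_ENOUGH_DATA) == NOT_ENOUGH_DATA:
            return NOT_ENOUGH_DATA
    return computed
```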

Rules are versioned and frozen

Every set of interpretation rules the pipeline uses has a version number. When a new version of the rules is published, it takes effect for new computations going forward. Old decisions are never retroactively changed.

Every status record permanently carries the version of the rules that produced it. This means any past decision can be re-examined: an administrator can replay the original inputs through the original rule version and verify that the output matches. This makes the entire interpretation pipeline auditable and transparent over time.
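The replay idea can be sketched in Python. The two rule functions, the version strings, and the registry `RULE_VERSIONS` are invented for illustration; they stand in for real rule sets, which this page does not describe.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Tuple

Inputs = Tuple[bool, ...]
Rule = Callable[[Inputs], str]


# Two hypothetical rule versions that differ in one threshold.
def rules_v1(inputs: Inputs) -> str:
    return "Adequate" if sum(inputs) >= 3 else "Monitoring"


def rules_v2(inputs: Inputs) -> str:
    return "Adequate" if sum(inputs) >= 4 else "Monitoring"


# Published versions stay in the registry so old records can be replayed.
RULE_VERSIONS: Dict[str, Rule] = {"1.0": rules_v1, "2.0": rules_v2}


@dataclass(frozen=True)
class StatusRecord:
    inputs: Inputs
    rule_version: str  # carried permanently with the record
    status: str


def compute(inputs: Inputs, version: str) -> StatusRecord:
    """Produce a record stamped with the rule version that decided it."""
    return StatusRecord(inputs, version, RULE_VERSIONS[version](inputs))


def replay_matches(record: StatusRecord) -> bool:
    """Re-run the original inputs through the original rule version."""
    return RULE_VERSIONS[record.rule_version](record.inputs) == record.status
```

In this sketch, a record computed under version "1.0" keeps its status after "2.0" is published, and `replay_matches` still confirms it, because the replay uses the version stored on the record rather than the latest one.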

No AI in the interpretation pipeline

The interpretation pipeline is entirely rule-based. No language model or AI system is involved in classifying responses, checking patterns, synthesizing skill statuses, or aggregating area statuses. The rules are written by specialists in Arabic reading development, reviewed, versioned, and frozen before they take effect. They produce the same output for the same input, every time, regardless of when or where the computation runs.