
Quality & Determinism

Two properties underpin trust in a measurement system: that it does not make silent errors, and that it produces the same answer for the same inputs every time. Amal is designed around both. The quality-checks registry catches silent errors before they affect a decision. The append-only record model keeps history faithful. And the repeatable-by-design property means any past decision can be reconstructed and explained.

The quality-checks registry

Amal maintains a registry of 32 named system checks. Each check is a named, documented rule that must hold throughout the platform, covering areas from data sufficiency to language safety to over-testing protection. The registry is a versioned configuration table: every check has an identifier, a severity, a human-readable description in Arabic, and a reference to the CI command that enforces it in the build.
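As a rough illustration of what one registry row could look like, the sketch below models the four fields named above. The field names, the check identifiers, the CI command strings, the version label, and the severities assigned to each example check are all assumptions made for the example, not the platform's actual schema.

```python
from dataclasses import dataclass
from enum import Enum


class Severity(Enum):
    CRITICAL = "critical"
    HIGH = "high"
    MEDIUM = "medium"
    LOW = "low"


@dataclass(frozen=True)
class CheckEntry:
    check_id: str        # hypothetical identifier format
    severity: Severity
    description_ar: str  # human-readable description in Arabic
    ci_command: str      # hypothetical CI command string


# Hypothetical version label and entries; the severities are illustrative only.
REGISTRY_VERSION = "example-v1"
REGISTRY = {
    entry.check_id: entry
    for entry in (
        CheckEntry("data_sufficiency", Severity.CRITICAL,
                   "لا توجد بيانات كافية بعد", "ci run check-data-sufficiency"),
        CheckEntry("over_assessment_guard", Severity.MEDIUM,
                   "عدد كبير من القياسات المتكررة", "ci run check-over-assessment"),
    )
}


def lookup(check_id: str) -> CheckEntry:
    """Return the registry entry; an unknown identifier raises KeyError."""
    return REGISTRY[check_id]
```

Making the entry immutable (`frozen=True`) mirrors the idea that a registry row is configuration, changed only by publishing a new registry version.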

The registry does not contain the enforcement logic. Each check’s logic lives in the module that owns the rule: the skill-status engine enforces the data-sufficiency rules, the bundle catalog enforces anchor-weight integrity, the language-safety layer enforces the parent-report filter, and so on. The registry is a single place to discover what checks exist, what their severity is, and when they fired.

When an engine module detects a violation at runtime, it calls a central routing function with the check’s identifier. The routing function looks up the check’s severity in the registry and applies a fixed routing matrix:

| Severity | What happens |
| --- | --- |
| Critical | The failure is logged as an alert that is surfaced to the teacher, and a notification is sent to the engineering team |
| High | The failure is logged as an alert for the engineering team; it is not surfaced to the teacher |
| Medium | The failure is logged as an alert |
| Low | The failure is logged to a trace log only; no alert is written |

The routing is determined entirely by the registry. A call-site cannot override it. If a check’s severity changes, the routing changes platform-wide without any code edit in the calling module.
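The routing matrix can be sketched as a single function that takes only a check identifier and a detail string, so a call-site has no parameter with which to override severity. Everything named here (`route_check_failure`, `Sinks`, the example identifiers) is hypothetical, and the sketch assumes that only Critical alerts are teacher-visible, since the matrix does not say who sees Medium alerts.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict, List


class Severity(Enum):
    CRITICAL = "critical"
    HIGH = "high"
    MEDIUM = "medium"
    LOW = "low"


# Hypothetical identifiers; in the platform the severity comes from the registry.
REGISTRY: Dict[str, Severity] = {
    "example_critical": Severity.CRITICAL,
    "example_high": Severity.HIGH,
    "example_medium": Severity.MEDIUM,
    "example_low": Severity.LOW,
}


@dataclass
class Sinks:
    alert_log: List[dict] = field(default_factory=list)
    trace_log: List[str] = field(default_factory=list)
    engineering_notifications: List[str] = field(default_factory=list)


def route_check_failure(check_id: str, detail: str, sinks: Sinks) -> Severity:
    """Route a failure using the registry severity only."""
    severity = REGISTRY[check_id]
    if severity is Severity.LOW:
        # Low: trace log only, no alert row is written.
        sinks.trace_log.append(f"{check_id}: {detail}")
        return severity
    sinks.alert_log.append({
        "check_id": check_id,
        "severity": severity.value,
        "detail": detail,
        # Assumption: only Critical alerts are surfaced to the teacher.
        "teacher_visible": severity is Severity.CRITICAL,
    })
    if severity is Severity.CRITICAL:
        sinks.engineering_notifications.append(check_id)
    return severity
```

Because the severity is looked up inside the function, changing a registry entry changes the routing for every caller without touching the calling modules.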

Runtime alerts land in an append-only alert log. An administrator can query the log by check identifier, severity, and date range. This is an admin and engineering view; teachers do not see the raw alert log.

Selected checks and what they protect

Data sufficiency. When there is not enough evidence to make a confident decision about a student’s skill status, the platform records this explicitly as “not enough data yet” rather than forcing a judgment. A check enforces that this sentinel state is propagated to every consumer: a recommendation that would be based on insufficient evidence is blocked explicitly rather than silently suppressed.
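One way to picture the sentinel is as an explicit status value that downstream code must handle. The sketch below is only an illustration: the status names, the minimum observation count, the 0.7 cut-off, and the recommendation strings are invented for the example.

```python
from enum import Enum
from typing import Sequence


class SkillStatus(Enum):
    NOT_ENOUGH_DATA = "not_enough_data"  # the explicit sentinel state
    ON_TRACK = "on_track"
    NEEDS_SUPPORT = "needs_support"


MIN_OBSERVATIONS = 3  # hypothetical sufficiency threshold


class RecommendationBlocked(Exception):
    """Raised when a recommendation would rest on insufficient evidence."""


def skill_status(scores: Sequence[float]) -> SkillStatus:
    if len(scores) < MIN_OBSERVATIONS:
        return SkillStatus.NOT_ENOUGH_DATA
    mean = sum(scores) / len(scores)
    return SkillStatus.ON_TRACK if mean >= 0.7 else SkillStatus.NEEDS_SUPPORT


def recommend(status: SkillStatus) -> str:
    # The sentinel is propagated: the consumer blocks loudly instead of guessing.
    if status is SkillStatus.NOT_ENOUGH_DATA:
        raise RecommendationBlocked("not enough data yet")
    if status is SkillStatus.ON_TRACK:
        return "continue_current_plan"
    return "review_support_plan"
```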

Acute regression guard. When a student’s reading performance drops sharply, the platform raises this for the teacher’s review. A check enforces that this review trigger is an alert and never an automatic plan change. No support plan is altered automatically because of a single result, even a very negative one.

Bulk activation safety. When a teacher activates a support plan for a group of students at once, the platform checks that students whose data is incomplete are excluded from the batch and that students with active acute-regression alerts are also excluded. Both exclusions are required before a bulk activation can proceed.
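The two exclusions can be expressed as a filter over the batch. This is a minimal sketch with hypothetical field and reason names; it only shows the shape of the rule, not how the platform determines completeness or alert state.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Tuple


@dataclass(frozen=True)
class Student:
    student_id: str
    data_complete: bool                # hypothetical field
    has_active_regression_alert: bool  # hypothetical field


def split_bulk_activation(
    students: Iterable[Student],
) -> Tuple[List[str], Dict[str, List[str]]]:
    """Return (eligible ids, excluded ids mapped to exclusion reasons)."""
    eligible: List[str] = []
    excluded: Dict[str, List[str]] = {}
    for s in students:
        reasons = []
        if not s.data_complete:
            reasons.append("incomplete_data")
        if s.has_active_regression_alert:
            reasons.append("active_acute_regression_alert")
        if reasons:
            excluded[s.student_id] = reasons
        else:
            eligible.append(s.student_id)
    return eligible, excluded
```

Returning the exclusion reasons alongside the eligible list keeps the outcome explainable to the teacher who started the batch.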

Anchor weight integrity. Each support plan is built around one main focus skill and a small number of linked skills. A check enforces that the main skill’s share of instructional time stays larger than any linked skill’s share, so the plan’s focus does not drift.
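Read literally, the rule says the main skill's share must be strictly larger than every linked skill's share. A small sketch, assuming shares are expressed as fractions of instructional time (the function name is hypothetical):

```python
from typing import Sequence


def anchor_weight_ok(main_share: float, linked_shares: Sequence[float]) -> bool:
    """True when the main skill's share exceeds every linked skill's share."""
    return all(main_share > share for share in linked_shares)
```

Under this reading, a plan split 0.5 / 0.3 / 0.2 passes, while a plan where a linked skill ties the main skill at 0.4 / 0.4 / 0.2 fails.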

Parent-report safety. Any text that reaches a parent in a report passes through the language-safety filter. A check enforces that the filter ran and that the output contains no internal identifiers or numeric scores.

Over-assessment guard. Students who receive too many repeated measurement probes within a short window can experience assessment fatigue that affects results. A check tracks the count of probes per student over a rolling window and raises an alert when the threshold is crossed. The probe is always recorded; the guard is advisory, never a block.
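The advisory behaviour can be sketched as a function that always records the probe and then reports whether the rolling count is over the limit. The window length, the threshold, and the reading of "crossed" as "strictly exceeds" are assumptions for the example.

```python
from datetime import datetime, timedelta
from typing import List


def record_probe(
    history: List[datetime],
    probe_time: datetime,
    window: timedelta,
    threshold: int,
) -> bool:
    """Record the probe, then return True if an advisory alert should be raised."""
    # The probe is always recorded; the guard never blocks it.
    history.append(probe_time)
    cutoff = probe_time - window
    recent = sum(1 for t in history if cutoff < t <= probe_time)
    # Assumption: the alert fires when the count strictly exceeds the threshold.
    return recent > threshold
```

The return value only drives an alert; nothing in the function can reject the probe itself.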

Rule-version pinning. Every decision the platform makes is recorded alongside the version of the rules that drove it. This allows past decisions to be reviewed exactly as they were made, even after the rules change. A check enforces that every decision row carries a rule version and that the version matches an active or historically active version.
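A pinning check of this kind amounts to verifying that every decision row names a known rule version. In the sketch below, the row shape (`decision_id`, `rule_version`) and the version labels are hypothetical, and `known_versions` stands for the union of active and historically active versions.

```python
from typing import Iterable, List, Mapping, Set


def rule_version_violations(
    decision_rows: Iterable[Mapping[str, object]],
    known_versions: Set[str],
) -> List[object]:
    """Return ids of rows whose rule version is missing or unknown."""
    return [
        row["decision_id"]
        for row in decision_rows
        # A missing version yields None, which is never in known_versions.
        if row.get("rule_version") not in known_versions
    ]
```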

Append-only records

The tables that hold measurement results, decision events, and session activity are append-only. A record of a student’s skill-status decision, a recorded assessment event, a support plan assignment, a context flag, or an audit log entry is never overwritten or deleted in the normal course of platform operation.

If a correction is needed, a new record is added that supersedes the old one, and the old one remains. The history is always recoverable. A sequence of decisions for a student can be replayed from the raw event log regardless of what the current rule set says.
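The supersede-instead-of-overwrite pattern can be illustrated with an in-memory log that exposes only an append operation. The class, field names, and payload strings are hypothetical; a real implementation would sit on database tables.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Record:
    record_id: int
    student_id: str
    payload: str
    supersedes: Optional[int] = None  # id of the record this one corrects


class AppendOnlyLog:
    """Exposes append and read operations only; no update or delete."""

    def __init__(self) -> None:
        self._rows: List[Record] = []

    def append(self, student_id: str, payload: str,
               supersedes: Optional[int] = None) -> Record:
        record = Record(len(self._rows) + 1, student_id, payload, supersedes)
        self._rows.append(record)
        return record

    def history(self) -> Tuple[Record, ...]:
        # Full history, including superseded records.
        return tuple(self._rows)

    def current(self) -> List[Record]:
        # Records that no later record has superseded.
        superseded = {r.supersedes for r in self._rows if r.supersedes is not None}
        return [r for r in self._rows if r.record_id not in superseded]
```

A correction adds a second row that points at the first; `current()` hides the superseded row while `history()` still returns both.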

This design is enforced in the build: a lint rule checks that no code in the modules that own these tables calls an update or delete operation on them. The guard runs automatically on every change to the codebase.
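A lint rule of this kind could be approximated with Python's standard `ast` module, as in the sketch below. It is deliberately crude and is not the platform's actual lint rule: it flags any method call named `update` or `delete`, so it would also flag unrelated calls such as `dict.update`, and the forbidden names are an assumption.

```python
import ast
from typing import List, Tuple

FORBIDDEN_METHODS = {"update", "delete"}  # hypothetical operation names


def find_forbidden_calls(source: str) -> List[Tuple[int, str]]:
    """Return (line number, method name) for each forbidden method call."""
    hits = []
    for node in ast.walk(ast.parse(source)):
        if (
            isinstance(node, ast.Call)
            and isinstance(node.func, ast.Attribute)
            and node.func.attr in FORBIDDEN_METHODS
        ):
            hits.append((node.lineno, node.func.attr))
    return sorted(hits)
```

A CI step could run this over the modules that own the append-only tables and fail the build when the returned list is non-empty.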

Repeatable by design

Given the same inputs, the platform produces the same output. This holds for the reading-ability estimation, the skill-status decision, the profile assignment, and the support-plan recommendation. None of these steps involve randomness, sampling, or calls to a language model. Every transformation is a deterministic function of the data and the versioned rule set.

This property has a practical consequence: a decision made six months ago can be explained today. The rule version that was active at the time is on record. The input data is on record. Running the same inputs through the same rule version produces the same output. Answers such as “the model changed” or “we no longer know why that result was produced” never apply.
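Replaying a past decision then reduces to calling a pure function with the recorded inputs and the recorded rule version. The sketch below uses invented rule versions and a single made-up threshold purely to show the mechanism.

```python
from dataclasses import dataclass

# Hypothetical versioned rule sets; real rules are richer than one threshold.
RULE_VERSIONS = {
    "v1": {"support_below": 0.60},
    "v2": {"support_below": 0.70},
}


def decide(score: float, rule_version: str) -> str:
    """Pure function of the input and the pinned rule version."""
    rules = RULE_VERSIONS[rule_version]
    return "needs_support" if score < rules["support_below"] else "on_track"


@dataclass(frozen=True)
class DecisionRow:
    student_id: str
    score: float
    rule_version: str
    outcome: str


def replay(row: DecisionRow) -> str:
    # Uses the version stored on the row, not whichever version is current.
    return decide(row.score, row.rule_version)
```

A row recorded under `v1` replays to its original outcome even after `v2` becomes the current rule set, because the replay never consults the current version.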

The warm-up exercises share this property. A given grade level, template, and seed always produce the same exercise sequence. This means a warm-up session can be replicated exactly for debugging without needing to re-run it live.
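Seeded generation of this kind can be sketched with a private `random.Random` instance whose seed is derived from the three inputs. The template name, the item pool, and the way the seed string is built are assumptions for the example.

```python
import random
from typing import List

# Hypothetical template pool.
TEMPLATES = {
    "letter_sounds": ["ba", "ta", "sa", "ma", "la", "ra"],
}


def warmup_sequence(grade: int, template: str, seed: int, length: int = 4) -> List[str]:
    """Same (grade, template, seed) always yields the same sequence."""
    # A private generator avoids any dependence on global random state.
    rng = random.Random(f"{grade}:{template}:{seed}")
    pool = TEMPLATES[template]
    return [rng.choice(pool) for _ in range(length)]
```

Calling `warmup_sequence(2, "letter_sounds", 42)` twice returns identical lists, which is what makes a session reproducible for debugging.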

Language rules are also versioned in the same append-only fashion. The exact filter that was applied to a piece of text on any date can be reconstructed from the rule history.

What Wave 1 ships and what comes later

In Wave 1, the platform check notifier is a log-only implementation. When a critical check fires, the failure is logged and recorded in the alert table, but no network notification is sent to an external channel such as a messaging system. The integration point for that notification is built and ready; the actual connection is post-launch.

Two checks in the registry (the RTI fidelity gate and the fidelity-tracker cross-check) are registered and their routing is ready, but they are not yet connected to the modules that will trigger them. Those connections land when the fidelity-tracking work package ships.
