Progress Monitoring

When a teacher assigns a support plan to a child, a question opens immediately: is this plan actually helping? Answering it well requires more than a single check at the end of several weeks. It requires repeated, structured evidence, gathered under consistent conditions and evaluated honestly. It also requires safeguards that stop the platform from drawing conclusions too soon or from the wrong kind of data.

This page describes how Amal tracks whether support is working, the five cross-cutting safety rules that govern every monitoring decision, and the per-plan criteria that define what success looks like for each type of plan.

What progress monitoring covers

Progress monitoring handles evidence that is not about reading fluency. Comprehension, vocabulary, morphology, syntax, structured teacher observation, worksheets, and short in-session checks all belong here. Fluency and accuracy over time are handled separately through a dedicated fluency-check process (see Fluency Checks).

The separation matters because the two types of evidence need different collection methods and different analytical rules. Mixing them would blur what the evidence means.

How evidence is collected

A teacher records a non-fluency check by submitting it after a session. The platform accepts five categories of evidence:

Quick checks capture vocabulary or morphology understanding in a few items at the end of an activity.
Rubric observations give a structured score based on what the teacher watched during teaching.
Teacher observations record a disposition from a closed list, not open notes, because free text about a child is never stored.
Worksheet evidence carries the result of a paper or digital worksheet the teacher scored.
Platform-generated trends draw on patterns from the child’s practice-session responses.

Each category has a weight limit. No single source of evidence can tip the decision past a configured ceiling, no matter how many records it contributes. The weight cap is enforced before any calculation runs.
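The capping step can be sketched as follows. This is a minimal illustration, not the platform's implementation: the category keys, the ceiling values, and the function name `capped_weights` are all assumptions made for the example.

```python
from collections import defaultdict

# Hypothetical ceilings per evidence category (illustrative values only).
WEIGHT_CAPS = {
    "quick_check": 0.4,
    "rubric_observation": 0.4,
    "teacher_observation": 0.3,
    "worksheet": 0.3,
    "platform_trend": 0.2,
}


def capped_weights(records, caps=WEIGHT_CAPS):
    """Sum the weight contributed by each category, then clip to its ceiling.

    `records` is an iterable of (category, weight) pairs. Clipping happens
    here, before any downstream calculation sees the totals.
    """
    totals = defaultdict(float)
    for category, weight in records:
        totals[category] += weight
    return {category: min(total, caps[category]) for category, total in totals.items()}


# Two quick checks together exceed the ceiling, so their total is clipped.
records = [("quick_check", 0.3), ("quick_check", 0.3), ("worksheet", 0.1)]
print(capped_weights(records))
```

Adding more records from the same category does not raise its contribution once the ceiling is reached, which is the property the paragraph above describes.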

The five safety rules

Every monitoring decision passes through five rules before any verdict is produced. The rules run in order, and any rule that fires stops further processing. When a rule fires, the platform returns a signal rather than a conclusion: “not enough to decide yet” in most cases, or “immediate review needed” when the sharp-drop rule fires.

The first rule: thin or tied evidence. If the number of comparable evidence points falls below the minimum for the plan, or if two interpretations are equally well supported, the verdict is deferred. A quiet note appears for the teacher explaining that more data is needed. There is no red flag, no alert, and no implication of a problem. The platform is simply being honest.

The second rule: a sharp drop is a review prompt, not an automatic verdict. If a child’s performance drops by 20 percent or more between two comparable sessions, the platform raises an immediate-review signal for the teacher. This is a prompt to look, not a conclusion about what to do. The plan is not exited automatically. The child’s support continues unchanged until the teacher reviews the situation and decides. No automatic failure or automatic plan change based on a single measured drop is permitted anywhere in the platform.
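A sketch of the sharp-drop check is shown below. It assumes that “20 percent” means a relative drop from the earlier score, which the rule text does not state explicitly. The function name and the signal string are hypothetical.

```python
from typing import Optional

# "20 percent or more" from the rule text, read here as a relative drop (assumption).
SHARP_DROP_THRESHOLD = 0.20


def sharp_drop_signal(previous_score: float, current_score: float) -> Optional[str]:
    """Return a review prompt when the drop between two comparable sessions is sharp.

    The function only reports a signal. It never alters or exits the plan.
    """
    if previous_score <= 0:
        return None  # no meaningful baseline to measure a drop against
    relative_drop = (previous_score - current_score) / previous_score
    if relative_drop >= SHARP_DROP_THRESHOLD:
        return "immediate_review_needed"
    return None
```

The return value is a prompt for the teacher. Nothing in the function touches the plan itself, which mirrors the rule that support continues unchanged until the teacher decides.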

The third rule: benchmark comparisons only at review windows. The three review windows in the school year are the beginning, the middle, and the end. Comparing a child’s progress against a benchmark at some other point in the year is rejected. Mid-year drift in how the evidence looks is not a valid basis for a benchmark-referenced conclusion.
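One way to express this rule in code is to make the review window a required, closed value and reject anything else. The sketch below is illustrative only: the enum, the exception, and the function name are assumptions, not the platform's actual types.

```python
from enum import Enum
from typing import Optional


class ReviewWindow(Enum):
    BEGINNING = "beginning"
    MIDDLE = "middle"
    END = "end"


class BenchmarkComparisonRejected(Exception):
    """Raised when a benchmark comparison is requested outside a review window."""


def compare_to_benchmark(score: float, benchmark: float,
                         window: Optional[ReviewWindow]) -> float:
    """Return the gap to the benchmark, but only at a recognized review window."""
    if not isinstance(window, ReviewWindow):
        raise BenchmarkComparisonRejected(
            "benchmark comparisons are only valid at a review window"
        )
    return score - benchmark
```

Passing `None` (a session that is not at a review window) raises instead of returning a number, so a mid-year comparison cannot slip through silently.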

The fourth rule: only comparable sessions are compared. Two evidence records are comparable when they share the same tool, the same conditions, and the same session context. A quick check under normal conditions cannot be directly compared against one taken in a session where the child had significant support. Non-comparable records are kept in the record but excluded from the calculation. The evidence window the platform reasons over contains only points that meet the comparability test.
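The comparability test can be sketched as a filter over stored records. The field names below follow the three attributes named in the rule (tool, conditions, session context). The record type and the example values are assumptions.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class EvidenceRecord:
    tool: str
    conditions: str
    session_context: str
    score: float


def comparable_window(records: List[EvidenceRecord],
                      reference: EvidenceRecord) -> List[EvidenceRecord]:
    """Return the records that share tool, conditions, and context with `reference`.

    The input list is left untouched, so non-comparable records remain stored.
    """
    key = (reference.tool, reference.conditions, reference.session_context)
    return [r for r in records
            if (r.tool, r.conditions, r.session_context) == key]
```

Because the function returns a new list, excluded records stay in the history and are only left out of the calculation.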

The fifth rule: a single data point decides nothing. One evidence record, however clear it looks, is not enough for a verdict. A minimum window of comparable points must exist before the engine applies any rule logic. This protects against drawing a conclusion from what might be a good or bad day.
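The ordering behavior described at the start of this section (rules run in sequence, and the first one that fires ends processing) can be sketched as a small pipeline. Only two of the five rules are shown. The minimum window size of three, the signal strings, and all function names are assumptions for illustration.

```python
from typing import Callable, Optional, Sequence

Signal = Optional[str]
Rule = Callable[[Sequence[float]], Signal]

MIN_COMPARABLE_POINTS = 3  # hypothetical per-plan minimum


def thin_evidence_rule(scores: Sequence[float]) -> Signal:
    """Defer when the comparable window is too small to decide anything."""
    if len(scores) < MIN_COMPARABLE_POINTS:
        return "not_enough_to_decide_yet"
    return None


def sharp_drop_rule(scores: Sequence[float]) -> Signal:
    """Prompt a review when the last two comparable scores show a sharp drop."""
    if len(scores) < 2:
        return None
    previous, current = scores[-2], scores[-1]
    if previous > 0 and (previous - current) / previous >= 0.20:
        return "immediate_review_needed"
    return None


def run_safety_rules(scores: Sequence[float], rules: Sequence[Rule]) -> Signal:
    """Apply rules in order; the first one that fires ends processing."""
    for rule in rules:
        signal = rule(scores)
        if signal is not None:
            return signal
    return None  # no rule fired, so the recipe logic may proceed


RULES = [thin_evidence_rule, sharp_drop_rule]
```

A result of `None` means no safety rule fired and the per-plan logic is allowed to run. Any other value is returned to the teacher as a signal, not as a conclusion.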

Per-plan monitoring recipes and success criteria

On top of the five safety rules, the platform applies a monitoring recipe and a success criterion that are specific to each type of plan.

The recipe describes which evidence tools produce comparable records for this particular plan, the weight cap per tool, and the re-evaluation cadence. If the submitted evidence comes from a tool that is not in the recipe, that evidence is not used in the verdict for this plan.
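A recipe can be pictured as a small immutable record. The sketch below is an assumption about shape only: the field names, the use of days as the cadence unit, and the tool names are invented for the example.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class MonitoringRecipe:
    weight_caps: Dict[str, float]  # tool name -> ceiling; the keys are the recipe's tools
    reevaluation_days: int         # cadence; the unit (days) is an assumption


def usable_evidence(recipe: MonitoringRecipe,
                    evidence: List[Tuple[str, float]]) -> List[Tuple[str, float]]:
    """Keep only (tool, score) pairs whose tool appears in the recipe."""
    return [(tool, score) for tool, score in evidence if tool in recipe.weight_caps]


recipe = MonitoringRecipe(
    weight_caps={"quick_check": 0.4, "rubric_observation": 0.4},
    reevaluation_days=14,
)
submitted = [("quick_check", 0.8), ("worksheet", 0.6)]
print(usable_evidence(recipe, submitted))
```

Evidence from a tool outside the recipe is filtered out of the verdict input for this plan rather than being reweighted.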

The success criterion describes what “this plan worked” looks like in measurable terms. It is specific to the plan type: what counts as a positive response for one kind of support may differ from what counts for another. The platform is fail-closed on a missing recipe or criterion. If the catalog is incomplete, the engine raises an error rather than producing a default verdict.
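Fail-closed behavior means a missing entry raises an error instead of falling back to a default. A minimal sketch, assuming the catalogs are plain mappings keyed by plan type (the names `load_plan_rules` and `CatalogIncompleteError` are hypothetical):

```python
class CatalogIncompleteError(LookupError):
    """Raised when a plan type has no recipe or no success criterion."""


def load_plan_rules(plan_type, recipes, criteria):
    """Return (recipe, criterion) for a plan type, or raise if either is missing."""
    missing = [
        label
        for label, catalog in (("recipe", recipes), ("success criterion", criteria))
        if plan_type not in catalog
    ]
    if missing:
        # Fail closed: no default verdict is produced from a partial catalog.
        raise CatalogIncompleteError(
            f"plan type {plan_type!r} is missing: {', '.join(missing)}"
        )
    return recipes[plan_type], criteria[plan_type]
```

Using an exception rather than `dict.get` with a fallback value makes it impossible for the caller to proceed with a silently substituted rule.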

There are twelve monitoring recipes and nineteen success criteria, one per plan type.

The five verdict values

After the safety rules and the recipe logic run, the monitoring engine produces one of five closed results:

Verdict	What it means
Positive	Evidence consistently supports the conclusion that the plan is working
Partial	Some evidence is positive but the picture is mixed; continued monitoring is appropriate
No response	Evidence shows the plan is not producing a response, even though conditions were fair
Not enough to decide yet	The safety rules deferred a conclusion; more comparable evidence is needed
Immediate review needed	A sharp drop across two comparable sessions requires the teacher’s attention

None of these verdicts change the plan automatically. The teacher sees the verdict and decides what to do next.
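A closed result set maps naturally onto an enumeration. The sketch below uses the five verdict names from the table. The string values and the `parse_verdict` helper are assumptions.

```python
from enum import Enum


class Verdict(Enum):
    POSITIVE = "positive"
    PARTIAL = "partial"
    NO_RESPONSE = "no_response"
    NOT_ENOUGH_TO_DECIDE_YET = "not_enough_to_decide_yet"
    IMMEDIATE_REVIEW_NEEDED = "immediate_review_needed"


def parse_verdict(raw: str) -> Verdict:
    """Convert a stored string to a Verdict; anything outside the set raises ValueError."""
    return Verdict(raw)
```

Because the enumeration has no member for "plan changed" or "plan closed", a verdict value cannot by itself encode an automatic change to the plan.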

What does not happen

Several things that might seem natural are deliberately absent.

The monitoring engine never changes a child’s support plan or support level on its own. It never closes a plan or moves a child to more or less intensive support. Every consequential change goes through the teacher.

The engine never produces a conclusive verdict when the safety rules fire. A “not enough to decide yet” signal is not a failure of the system. It is the correct response when the evidence does not yet support a conclusion.

The engine never compares a child’s result against a benchmark unless the session falls within a recognized review window. Outside those windows, benchmark comparisons are rejected.

The monitoring process covers only evidence that has been formally recorded. Informal impressions, retrospective notes, and any input that cannot be tied to a specific session and a specific measurement tool are not part of the calculation.

Versioning and replay

Every monitoring verdict is stored with the exact version of every rule it used: the safety rule set, the per-plan recipe, and the success criterion. A future change to any of those rules does not retroactively change past verdicts. The record of what was decided, on what evidence, under what rules, stays permanent and unchanged. This is what makes historical decisions reviewable: given the same evidence and the same rule version, the platform would produce the same result.
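The replay property can be sketched with an immutable record that carries its rule versions, plus a lookup of engines by version. Everything named here (`VerdictRecord`, `replay`, the version strings, the toy engine) is hypothetical and only illustrates the idea.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Tuple

RuleVersions = Tuple[str, str, str]  # (safety rule set, recipe, success criterion)
Engine = Callable[[Tuple[float, ...]], str]


@dataclass(frozen=True)  # frozen: a stored verdict cannot be mutated later
class VerdictRecord:
    verdict: str
    evidence: Tuple[float, ...]
    versions: RuleVersions


def replay(record: VerdictRecord, engines: Dict[RuleVersions, Engine]) -> bool:
    """Re-run the stored evidence under the stored rule versions and compare."""
    engine = engines[record.versions]
    return engine(record.evidence) == record.verdict


# Toy engine standing in for one pinned rule version.
def engine_v1(evidence: Tuple[float, ...]) -> str:
    return "positive" if sum(evidence) / len(evidence) >= 70 else "partial"


engines = {("safety-1", "recipe-1", "criterion-1"): engine_v1}
record = VerdictRecord("positive", (72.0, 75.0), ("safety-1", "recipe-1", "criterion-1"))
print(replay(record, engines))
```

Registering a newer engine under a different version key leaves the old entry in place, so earlier records still replay against the rules they were decided under.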

How this connects to the rest of the platform

A sharp-drop signal blocks a child from being included in a group cluster for activation until the teacher has reviewed and resolved the situation. This is intentional: a child whose progress needs immediate attention should not be swept into a bulk group action.
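The blocking behavior amounts to a filter applied before a group action. A minimal sketch, assuming children are identified by opaque IDs and that unresolved sharp-drop signals are available as a collection of those IDs (both assumptions):

```python
from typing import Iterable, List


def eligible_for_group_activation(child_ids: Iterable[str],
                                  unresolved_sharp_drops: Iterable[str]) -> List[str]:
    """Return the children who may join a group cluster, preserving input order."""
    blocked = set(unresolved_sharp_drops)
    return [child_id for child_id in child_ids if child_id not in blocked]
```

Once the teacher resolves the review, the child's ID leaves the unresolved set and the child becomes eligible again on the next call.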

The monitoring verdict feeds the broader support-tier decision. A persistent “no response” verdict, combined with confirmation that the plan was actually delivered, contributes to a teacher’s case for adjusting the level of support. But the monitoring engine itself does not trigger that case. The fidelity check (whether the plan was delivered) and the tier-review process (whether the support level should change) are separate steps. See Support Tiers and RTI Decisions and Evidence and Review for those layers.