Content and Items

Every task that reaches a child comes from a bank of pre-authored, pre-validated items. The platform never writes or assembles a question on the fly. This page describes how items get into the bank, what checks they go through before they can be used, and what happens when a batch of new items is submitted.

Why items are authored offline

The choice to author items offline and import them, rather than generating them during a session, comes from the nature of Arabic reading measurement. A question that has a grammatical agreement error, uses vocabulary above the intended level, or contains a language pattern the child has not yet been taught can produce a misleading result. A question that looks fine in translation but contains a subtle Arabic-script issue can confuse a child at a moment when the goal is to observe their reading, not their ability to interpret a broken prompt.

Offline authoring means every item that enters the bank has been read by a human who knows Arabic literacy, checked by automated tools built for Arabic text, and cleared before it reaches a child. The bank holds only items that passed this process.

The import pipeline

When a batch of authored items is submitted to the platform, it runs through a five-stage validation pipeline. Items that fail a check are rejected with a specific reason; they do not enter the bank in a degraded or provisional form. The one exception, described under Stage 5, is an item whose difficulty cannot yet be scored.
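The reject-with-a-reason behaviour can be pictured with a minimal sketch. It assumes each stage is a function that returns either nothing (pass) or a reason string; the names `run_pipeline`, `needs_stem`, and `needs_sub_skill` are hypothetical and stand in for the real stages described below.

```python
from typing import Callable, Optional

# Assumption: a stage returns None when the item passes, or a reason string.
Stage = Callable[[dict], Optional[str]]


def run_pipeline(item: dict, stages: list[tuple[str, Stage]]) -> tuple[bool, Optional[str]]:
    """Run stages in order and stop at the first failure."""
    for name, check in stages:
        reason = check(item)
        if reason is not None:
            # Rejected outright: nothing is stored in a provisional form.
            return False, f"{name}: {reason}"
    return True, None


# Two toy stages, for illustration only.
def needs_stem(item: dict) -> Optional[str]:
    return None if item.get("stem") else "missing stem"


def needs_sub_skill(item: dict) -> Optional[str]:
    return None if item.get("sub_skill") else "missing sub_skill"


STAGES = [("stage_a", needs_stem), ("stage_b", needs_sub_skill)]
```

Calling `run_pipeline({"stem": "x"}, STAGES)` in this sketch returns a failure whose reason names the stage and the missing field, which is the shape of information the batch record later reports per rejected row.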

Stage 1: Schema and completeness. Each item must specify the sub-skill it targets, the task mode it belongs to, the scaffold level it is written for, and the grade it is appropriate for. Comprehension items additionally require a passage identifier and metadata about where in the passage the answer can be found and what type of reasoning it requires. Items missing required fields are rejected at this stage.
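As an illustration of the completeness check, the sketch below lists which fields are missing from an item. The field names, and the idea that comprehension items are recognised by `task_mode == "comprehension"`, are assumptions made for the example and are not taken from the platform's schema.

```python
# Hypothetical field names; the real schema may differ.
REQUIRED_FIELDS = ("sub_skill", "task_mode", "scaffold_level", "grade")
COMPREHENSION_FIELDS = ("passage_id", "answer_location", "reasoning_type")


def missing_fields(item: dict) -> list[str]:
    """Return the required fields that are absent or empty, in schema order."""
    required = list(REQUIRED_FIELDS)
    # Assumption: comprehension items are identified by their task mode.
    if item.get("task_mode") == "comprehension":
        required += COMPREHENSION_FIELDS
    return [name for name in required if item.get(name) in (None, "")]
```

An item is rejected at this stage whenever the returned list is non-empty, and the list itself can serve as the rejection reason.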

Stage 2: Item-type structural rules. A multiple-choice item must have exactly one correct answer, recorded in the single-answer field. An item that requires selecting multiple correct answers must list those answers in the multiple-answer field and leave the single-answer field empty. Each distractor option must be of a type that is compatible with the item’s question format. These rules prevent structural defects from entering the bank.
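The answer-field rules can be sketched as follows. The item-type labels and the field names `correct_answer` and `correct_answers` are hypothetical, and the distractor-compatibility rule is left out because the source does not say which distractor types go with which formats.

```python
from typing import Optional


def structural_error(item: dict) -> Optional[str]:
    """Return a rejection reason for an answer-field defect, or None if the item is sound."""
    kind = item.get("item_type")
    single = item.get("correct_answer")        # hypothetical single-answer field
    multi = item.get("correct_answers") or []  # hypothetical multiple-answer field

    if kind == "multiple_choice":
        if not single:
            return "multiple-choice item has no correct answer"
        if multi:
            return "multiple-choice item must not use the multiple-answer field"
    elif kind == "multi_select":
        if not multi:
            return "multi-select item lists no correct answers"
        if single:
            return "multi-select item must leave the single-answer field empty"
    return None
```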

Stage 3: Language safety. Every text surface that a child would see, including the question stem, the answer options, the distractor options, and any hint text, is scanned against the platform’s language-safety list. This list contains vocabulary and labels that must not appear in tasks shown to children: clinical or comparative terms, and labels borrowed from international testing frameworks not designed for this context. An item with a banned term is rejected, not silently modified.
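A rough sketch of the scan is shown below. The field names and the placeholder list contents are assumptions, and matching on whitespace-separated tokens is a simplification: a real Arabic scan would likely normalize text and handle attached prefixes and suffixes before comparing.

```python
from typing import Iterator

# Placeholder entries only; the platform's actual list is not reproduced here.
BANNED_TERMS = {"placeholder_term_a", "placeholder_term_b"}


def child_visible_texts(item: dict) -> Iterator[tuple[str, str]]:
    """Yield (surface name, text) for every text a child could see."""
    yield "stem", item.get("stem", "")
    for i, text in enumerate(item.get("options", [])):
        yield f"options[{i}]", text
    for i, text in enumerate(item.get("distractors", [])):
        yield f"distractors[{i}]", text
    yield "hint", item.get("hint", "")


def banned_hits(item: dict, banned: set[str] = BANNED_TERMS) -> list[tuple[str, str]]:
    """Return (surface, term) pairs for each banned term found; the item is never modified."""
    hits = []
    for surface, text in child_visible_texts(item):
        for term in sorted(banned & set(text.split())):
            hits.append((surface, term))
    return hits
```

Any non-empty result means rejection. The function only reports where the term appeared, which matches the rule that items are rejected rather than silently edited.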

Stage 4: Arabic agreement. The question stem is passed through an Arabic morphological validator that checks grammatical agreement: gender agreement between verbs, nouns, and pronouns; number agreement; and basic case. If the validator cannot confirm agreement, the item is rejected rather than allowed through on an assumption. An item with a known agreement violation would produce unreliable results for children working on exactly the skills being tested.
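The important design point here is that an inconclusive check is treated like a failure. A minimal sketch of that decision, assuming the validator reports one of three outcomes (the `Agreement` enum and its values are hypothetical, and the morphological analysis itself is not shown):

```python
from enum import Enum
from typing import Optional


class Agreement(Enum):
    CONFIRMED = "confirmed"
    VIOLATION = "violation"
    UNKNOWN = "unknown"  # the validator could not analyse the stem


def agreement_decision(result: Agreement) -> tuple[bool, Optional[str]]:
    """Accept only on positive confirmation; everything else is a rejection."""
    if result is Agreement.CONFIRMED:
        return True, None
    if result is Agreement.VIOLATION:
        return False, "agreement violation in stem"
    # Fail closed: no confirmation means no entry into the bank.
    return False, "agreement could not be confirmed"
```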

Stage 5: Difficulty estimate. A numeric difficulty estimate is computed for each item from five properties of its text and format. The estimate is not a guess; it is a weighted combination of the Arabic reading level of the vocabulary (using a licensed Arabic lexicon), the reading level of the average word in the stem, the cognitive demand of the question, the structural complexity of the sentences, and the answer-format properties that affect how hard the item is to answer correctly. The five components are summed with fixed weights, and the result determines which difficulty band the item belongs to. There are five bands: very easy, easy, medium, hard, and very hard. Items whose text cannot be scored because the required lexical data is unavailable are stored in the bank without a difficulty estimate and are excluded from sessions that request a specific difficulty band until the estimate can be computed.
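The weighted sum and banding can be sketched as below. The weights, the band cut-offs, the component names, and the assumption that each component is scaled to the range 0 to 1 are all illustrative; the source states only that there are five components, fixed weights, and five bands.

```python
from bisect import bisect_right
from typing import Optional

# Illustrative weights only; the platform's fixed weights are not given in this page.
WEIGHTS = {
    "vocabulary_level": 0.30,
    "word_level": 0.20,
    "cognitive_demand": 0.20,
    "sentence_complexity": 0.15,
    "answer_format": 0.15,
}
BANDS = ["very easy", "easy", "medium", "hard", "very hard"]
CUTOFFS = [0.2, 0.4, 0.6, 0.8]  # illustrative band boundaries


def difficulty(components: dict) -> Optional[float]:
    """Weighted sum of the five components, or None if any component is unavailable."""
    if any(components.get(name) is None for name in WEIGHTS):
        return None  # e.g. lexical data missing: the item is stored without an estimate
    return sum(WEIGHTS[name] * components[name] for name in WEIGHTS)


def band(score: Optional[float]) -> Optional[str]:
    """Map a score to one of the five bands; unscored items have no band."""
    if score is None:
        return None
    return BANDS[bisect_right(CUTOFFS, score)]
```

Keeping the per-component values alongside the final score is what makes the estimate auditable: anyone can recompute the sum from the stored components and the fixed weights.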

Expert sign-off before an item is used

Passing the automated pipeline marks an item as a draft, not as ready to use. Draft items are not served to children. A subject-matter expert reviews the draft and, when satisfied, approves the item for operational use. Only items marked operational appear in the delivery pool.

This two-stage path matters because the automated checks, thorough as they are, test structural and linguistic properties. A human reviewer checks pedagogical fit: whether the item actually tests the skill it claims to test, whether the wrong answers are reasonable distractors rather than obviously wrong options, and whether the language is clear for a child at the intended grade.

The review and approval are logged. The platform records who approved each item, when, and which batch it came from.
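The draft-to-operational path might be modelled as in the sketch below. The `ItemRecord` class, its field names, and the two status strings are hypothetical; only the rule that drafts are never served and that approvals record who, when, and which batch comes from the description above.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass
class ItemRecord:
    item_id: str
    batch_id: str                 # the batch that introduced the item
    status: str = "draft"         # items leave the pipeline as drafts
    approved_by: Optional[str] = None
    approved_at: Optional[datetime] = None


def approve(item: ItemRecord, reviewer: str, now: Optional[datetime] = None) -> ItemRecord:
    """Promote a draft to operational and log who approved it and when."""
    if item.status != "draft":
        raise ValueError(f"item {item.item_id} is not a draft")
    item.status = "operational"
    item.approved_by = reviewer
    item.approved_at = now or datetime.now(timezone.utc)
    return item


def delivery_pool(items: list[ItemRecord]) -> list[ItemRecord]:
    """Only operational items are eligible to be served."""
    return [item for item in items if item.status == "operational"]
```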

What the bank stores

For each operational item the bank holds:

The question stem in plain Arabic, in normalized Arabic, and in fully diacritized Arabic, so the delivery layer can present the form appropriate for the grade and task.
The correct answer or answers.
The distractor options with their types, so the question can be displayed in the right format.
The sub-skill, task mode, scaffold level, and grade.
The difficulty estimate and which band it belongs to.
The passage it belongs to, for comprehension items.
The difficulty components used to compute the estimate, so the calculation can be audited.
Any audio and image files associated with the item.

Dialect tags are optional. An item left untagged is treated as dialect-agnostic and is eligible for any session. An item tagged for a specific regional form is only served when the session’s language context matches.
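The eligibility rule is small enough to state as a function. This sketch assumes a single optional tag per item and treats "matches" as exact equality; the tag values used in the usage note are made up.

```python
from typing import Optional


def dialect_eligible(item_dialect: Optional[str], session_dialect: Optional[str]) -> bool:
    """Untagged items fit any session; tagged items need a matching session context."""
    if item_dialect is None:
        return True
    return item_dialect == session_dialect
```

With hypothetical tags, `dialect_eligible(None, "gulf")` is true, while `dialect_eligible("levantine", "gulf")` is false.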

Batches and idempotency

Items are submitted as batches. A batch is identified by a hash of the submitted file, which means submitting the same file twice produces one batch, not two. If a file was already processed, the second submission returns the result of the first run without re-running the pipeline.

The batch record captures the overall outcome: how many items were accepted, how many were rejected, and the specific rejection reason for each rejected row. Accepted items link back to the batch that introduced them.
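Content-hash idempotency can be sketched as follows. SHA-256 is an assumption (the source says only "a hash"), the in-memory dictionary stands in for whatever storage the platform uses, and `submit_batch` and its `run` callback are hypothetical names.

```python
import hashlib
from typing import Callable, Optional

# (row number, rejection reason or None if the row was accepted)
RowResult = tuple[int, Optional[str]]

_batches: dict[str, dict] = {}  # stand-in for persistent batch storage


def submit_batch(file_bytes: bytes, run: Callable[[bytes], list[RowResult]]) -> dict:
    """Process a file once; identical bytes return the stored result without re-running."""
    batch_id = hashlib.sha256(file_bytes).hexdigest()
    if batch_id in _batches:
        return _batches[batch_id]

    rows = run(file_bytes)
    rejections = {row: reason for row, reason in rows if reason is not None}
    record = {
        "batch_id": batch_id,
        "accepted": len(rows) - len(rejections),
        "rejected": len(rejections),
        "rejections": rejections,  # specific reason per rejected row
    }
    _batches[batch_id] = record
    return record
```

Because the identifier is derived from the file's bytes, any change to the file, even a single character, produces a different batch.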

What the platform does not do with items

The platform does not modify the text of an accepted item at serve time. It does not combine parts of different items. It does not fill in variables or substitute words to produce a variant of an item. It does not use a language model at any point in the item’s lifecycle, from authoring to serving. The text a child reads is exactly the text the author wrote and the reviewer approved.

Task Delivery: how the platform selects and serves items once they are in the bank.
Intervention Plans: how scaffold levels are set per skill within a plan, which determines which difficulty tier of item a child receives.
How Amal Works: where content and tasks sit in the full measurement-and-teaching cycle.