Smart Clustering
Smart Clustering answers a practical teaching question: which students in this
class can be taught together right now? The modules/cluster module groups a
class’s activatable students into clusters based on a 5-key match, hard-excludes
two safety partitions, and lets a teacher bulk-activate an entire cluster in a
single confirmation step.
The partition: three buckets, one pass
Every recompute makes a single deterministic pass over the class roster. Each student lands in exactly one of three buckets, with no overlap:
| Bucket | Trigger | What happens |
|---|---|---|
| individualReview | Student has an open acute_regression alert (SCS_03) | Never clustered, never bulk-activated |
| moreDataNeeded | Student is DATA_INCOMPLETE, do_not_decide_yet, or has no recommendation (SCS_02) | Never clustered, never bulk-activated |
| Clusterable | All others | Grouped by the 5-key match |
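The single pass can be sketched as follows. This is a minimal Python illustration, not the module's implementation: the `Student` record, its field names, and the rule that an open acute-regression alert takes precedence over missing data are assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Optional

# Statuses that route a student to moreDataNeeded (SCS_02).
MORE_DATA_STATUSES = {"DATA_INCOMPLETE", "do_not_decide_yet"}


@dataclass(frozen=True)
class Student:
    # Hypothetical record shape; the real roster row is not shown on this page.
    student_id: str
    has_open_acute_regression: bool
    recommendation_status: Optional[str]  # None = no recommendation


def partition(roster):
    """Place every student in exactly one of the three buckets."""
    buckets = {"individualReview": [], "moreDataNeeded": [], "clusterable": []}
    # Sorting by ID makes the output independent of roster order.
    for s in sorted(roster, key=lambda s: s.student_id):
        if s.has_open_acute_regression:  # assumed to take precedence
            buckets["individualReview"].append(s.student_id)
        elif (s.recommendation_status is None
              or s.recommendation_status in MORE_DATA_STATUSES):
            buckets["moreDataNeeded"].append(s.student_id)
        else:
            buckets["clusterable"].append(s.student_id)
    return buckets
```

Because each student hits exactly one branch of the `if`/`elif`/`else` chain, the buckets cannot overlap, and reordering the roster does not change the result.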
The two excluded partitions (individualReview and moreDataNeeded)
are hard exclusions with no override. A bulk-activate request that names
an excluded student returns 422 EXCLUDED_MEMBER and creates zero
assignments (all-or-nothing). This is a deliberate safety design: these
students need individual attention, not group assignment.
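A minimal sketch of the all-or-nothing exclusion check is shown here. The function name and the response body shape (an `error` code plus the offending `studentIds`) are assumptions for illustration; only the 422 status and the EXCLUDED_MEMBER code come from the description above.

```python
def reject_excluded_members(confirmed_ids, individual_review, more_data_needed):
    """Return a (status, body) rejection if any confirmed ID is excluded, else None."""
    excluded = set(individual_review) | set(more_data_needed)
    offending = sorted(set(confirmed_ids) & excluded)
    if offending:
        # All-or-nothing: the caller must create no assignments at all,
        # even for the members that were not excluded.
        return 422, {"error": "EXCLUDED_MEMBER", "studentIds": offending}
    return None
```

Running the check before any write is what keeps a partially valid request from creating a partial set of assignments.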
The 5-key cluster identity
Clusterable students are grouped into clusters whose identity is a canonical
clusterKey string formed from five dimensions:
- bundle_id: the recommended bundle
- anchor_skill_group: the primary anchor skill group within that bundle
- scaffold_tier: the resolved scaffold tier (with an optional MERGE_L1_L2 merge for teacher-led delivery modes, per SCS_04)
- compatible_delivery_mode: the delivery mode shared by cluster members
- sufficient_data: whether all members have enough evidence (DATA_INCOMPLETE students have already been excluded, so this is always true for clusterable rows)
Same 5-key values + same pinned smart_cluster_rule_set version = identical
membership every time. This determinism (V-6) is what makes recompute idempotent:
running it twice in a row produces the same clusters.
The cluster is addressed by its clusterKey. The opaque DB primary key is
not used for bulk-activate or preview operations.
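One way to picture the canonical key and the determinism it gives is the sketch below. The `|` separator, the dictionary row shape, and the function names are assumptions; the page only states that the key is built from the five dimensions. The MERGE_L1_L2 merge and rule-set version pinning are left out for brevity.

```python
from collections import defaultdict

# The five dimensions, in a fixed order so the key is canonical.
KEY_FIELDS = (
    "bundle_id",
    "anchor_skill_group",
    "scaffold_tier",
    "compatible_delivery_mode",
    "sufficient_data",
)


def cluster_key(row):
    """Build a canonical key string; the '|' separator is an assumption."""
    return "|".join(str(row[field]) for field in KEY_FIELDS)


def group_by_key(rows):
    """Group student IDs by clusterKey with order-independent output."""
    clusters = defaultdict(list)
    for row in rows:
        clusters[cluster_key(row)].append(row["student_id"])
    # Sort keys and members so equal inputs always serialize identically.
    return {key: sorted(ids) for key, ids in sorted(clusters.items())}
```

Since the result depends only on the five values of each row, calling `group_by_key` twice on the same rows, in any order, yields the same clusters.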
Minimum size gate
A cluster with fewer members than the minimum size (default: 2; configurable
per school to 3–5) is dropped. It does not appear as a 1-member cluster.
Students whose bundle/tier combination produced a below-minimum group are
returned in the moreDataNeeded list for presentation purposes, so the
teacher still sees them and can address them individually.
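The gate can be sketched as a filter over the grouped clusters. The function name and the choice to reject values outside 2 to 5 are assumptions based on the default and the configurable range stated above.

```python
def apply_min_size(clusters, min_size=2):
    """Split clusters into those kept and the students of dropped clusters."""
    if not 2 <= min_size <= 5:
        raise ValueError("min_size must be between 2 and 5")
    kept, below_minimum = {}, []
    for key, members in clusters.items():
        if len(members) >= min_size:
            kept[key] = members
        else:
            # Dropped entirely: never surfaced as an undersized cluster.
            below_minimum.extend(members)
    return kept, sorted(below_minimum)
```

The second return value is the list of students that would be surfaced alongside the moreDataNeeded entries.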
Bulk activation
When a teacher confirms a cluster, POST /api/cluster/bulk-activate creates
one bundle_assignment per confirmed member. The implementation reuses the
same single write path as individual bundle activation. No second writer
was created for clusters. The CI lint lint:bundle-assignment-single-write-path
verifies this on every build.
The request body requires confirmedMemberStudentIds. The teacher must
explicitly list the student IDs they intend to activate. This is the V-12
“teacher decides” principle encoded at the API level.
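A request-body check in this spirit might look like the sketch below. Only `confirmedMemberStudentIds` is confirmed by the text; carrying `clusterKey` in the body, the error messages, and the function name are assumptions for illustration.

```python
def validate_bulk_activate_body(body):
    """Return a list of validation errors; an empty list means the body is usable."""
    errors = []
    key = body.get("clusterKey")  # assumed location of the cluster address
    if not isinstance(key, str) or not key:
        errors.append("clusterKey is required")
    ids = body.get("confirmedMemberStudentIds")
    if not isinstance(ids, list) or not ids:
        # The teacher must name members explicitly; there is no "activate all".
        errors.append("confirmedMemberStudentIds must be a non-empty list")
    elif not all(isinstance(i, str) for i in ids):
        errors.append("confirmedMemberStudentIds must contain string IDs")
    elif len(set(ids)) != len(ids):
        errors.append("confirmedMemberStudentIds must not contain duplicates")
    return errors
```

Rejecting a missing or empty list keeps the decision with the teacher: the server never infers the member set on their behalf.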
Staleness guard
Between the moment a teacher previews a cluster and the moment they submit
bulk-activate, the cluster may have drifted (a student got a new recommendation,
an acute regression alert opened, etc.). Bulk-activate re-validates membership
against a fresh compute inside the activation transaction. A mismatch in either
direction returns 409 CLUSTER_STALE and no assignments are created. The teacher
re-runs recompute to get a fresh preview and then confirms again.
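The "mismatch in either direction" rule amounts to a set-equality test. The sketch below assumes the comparison is between the teacher's confirmed IDs and the freshly computed membership; the function name and the diagnostic fields in the response body are hypothetical.

```python
def check_staleness(confirmed_ids, fresh_member_ids):
    """Return a (status, body) rejection if membership drifted, else None."""
    confirmed, fresh = set(confirmed_ids), set(fresh_member_ids)
    if confirmed != fresh:
        return 409, {
            "error": "CLUSTER_STALE",
            # Confirmed by the teacher but no longer in the cluster.
            "noLongerMembers": sorted(confirmed - fresh),
            # In the fresh cluster but never seen by the teacher.
            "newMembers": sorted(fresh - confirmed),
        }
    return None
```

In a real service this check would run inside the same transaction that writes the assignments, so membership cannot drift between validation and write.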
Per-member adjustments at activation time
The memberAdjustments field on the bulk-activate body allows optional
per-member OVR_* overrides (scaffold tier, delivery, schedule) at the moment
of activation. Each adjustment must carry a reasonCode from the
ClusterOverrideReasonCode enum (a subset of the 9-code catalog). No free-text
fields exist.
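A validation pass over memberAdjustments could be sketched as below. The per-adjustment field names (`studentId`, `override`, `reasonCode`), the requirement that the student be a confirmed member, and the function name are assumptions; the members of ClusterOverrideReasonCode are not listed on this page, so the allowed codes are passed in by the caller.

```python
def validate_member_adjustments(adjustments, confirmed_ids, allowed_reason_codes):
    """Return a list of validation errors for the optional per-member overrides."""
    allowed_fields = {"studentId", "override", "reasonCode"}  # hypothetical names
    confirmed = set(confirmed_ids)
    errors = []
    for index, adj in enumerate(adjustments):
        extra = sorted(set(adj) - allowed_fields)
        if extra:
            # Unknown fields are rejected, so no free-text note can slip in.
            errors.append(f"[{index}] unexpected fields: {extra}")
        if adj.get("studentId") not in confirmed:
            errors.append(f"[{index}] studentId is not a confirmed member")
        if not str(adj.get("override", "")).startswith("OVR_"):
            errors.append(f"[{index}] override must be an OVR_* code")
        if adj.get("reasonCode") not in allowed_reason_codes:
            errors.append(f"[{index}] reasonCode is missing or not allowed")
    return errors
```

An allow-list of fields is one simple way to enforce the "no free-text" rule at the boundary rather than deeper in the write path.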
Recompute: on-demand and nightly
POST /api/cluster/recompute/{classId} is the idempotent function that computes
or refreshes clusters for a class. It is callable on-demand by a teacher (needing
VIEW_OWN_CLASSES) or admin (needing RUN_BACKOFFICE), and it is the same
function a nightly scheduler calls per active class. Running it multiple times
produces the same result given the same inputs.
Each recompute writes a new generation (append-only). Prior generations
are never mutated. They expire at createdAt + 60d.
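The append-only behaviour and the 60-day expiry can be illustrated with an immutable history. The `Generation` record, its fields, and the helper names are assumptions for this sketch; only the append-only rule and `createdAt + 60d` come from the text.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

RETENTION = timedelta(days=60)


@dataclass(frozen=True)  # frozen: a stored generation cannot be mutated
class Generation:
    number: int
    created_at: datetime
    cluster_keys: tuple

    @property
    def expires_at(self):
        return self.created_at + RETENTION


def append_generation(history, cluster_keys, now):
    """Return a new history tuple with one more generation at the end."""
    number = history[-1].number + 1 if history else 1
    return history + (Generation(number, now, tuple(sorted(cluster_keys))),)


def live_generations(history, now):
    """Generations that have not yet reached createdAt + 60d."""
    return tuple(g for g in history if now < g.expires_at)
```

Recomputing with unchanged inputs still appends a generation in this model; idempotence refers to the cluster contents, which are identical from one generation to the next.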
Scaffold Distribution Preview
GET /api/cluster/student/{id}/preview returns the Scaffold Distribution Preview
for a specific student: their current cluster assignment plus the per-skill ×
per-task-type tier breakdown across all bundles. This is the data source for the
teacher-facing scaffold distribution panel.
The tier distribution (Level_1/2/3 counts) is teacher/admin only. The
student-facing surface never sees tier labels or cluster identity.
Key endpoints
| Endpoint | Method | Permission | Purpose |
|---|---|---|---|
| /api/cluster/recompute/{classId} | POST | Teacher / Admin | On-demand recompute; returns clusters + 2 exclusion lists |
| /api/cluster/student/{id}/preview | GET | Teacher / Admin | Scaffold Distribution Preview for one student |
| /api/cluster/bulk-activate | POST | Teacher | Bulk-activate a confirmed cluster |
Connections
- Input: Intervention Bundles: each student’s bundle_recommendation row and scaffold tier are the inputs to cluster grouping.
- Safety: Progress Monitoring: the acute regression alert lifecycle produced by progress monitoring feeds the individualReview exclusion list.
- Workflow: Teacher Activation Workflow: the Smart Grouping Panel and Bulk Activation Confirmation steps in the 17-step teacher workflow correspond to the recompute and bulk-activate endpoints.
- Concepts: Intervention Design: how bundles, scaffold tiers, and smart clusters fit into the broader intervention design model.