Smart Clustering

Smart Clustering answers a practical teaching question: which students in this class can be taught together right now? The modules/cluster module groups a class’s activatable students into clusters based on a 5-key match, hard-excludes two safety partitions, and lets a teacher bulk-activate an entire cluster in a single confirmation step.

The partition: three buckets, one pass

Every recompute makes a single deterministic pass over the class roster. Each student lands in exactly one of three buckets, with no overlap:

Bucket	Trigger	What happens
`individualReview`	Student has an open `acute_regression` alert (SCS_03)	Never clustered, never bulk-activated
`moreDataNeeded`	Student is `DATA_INCOMPLETE`, `do_not_decide_yet`, or has no recommendation (SCS_02)	Never clustered, never bulk-activated
Clusterable	All others	Grouped by the 5-key match
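
As a rough illustration, the single-pass partition could be sketched as follows. The `Student` fields and the `partition` function are hypothetical names, and checking the acute regression alert before the data checks is an assumption; this section does not state which bucket wins when a student matches both triggers.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Student:
    # All field names here are illustrative, not the module's real schema.
    student_id: str
    has_open_acute_regression: bool  # open acute_regression alert (SCS_03)
    status: Optional[str]            # e.g. "DATA_INCOMPLETE", "do_not_decide_yet", or None
    has_recommendation: bool


def partition(roster):
    """One pass over the roster; each student lands in exactly one bucket."""
    individual_review, more_data_needed, clusterable = [], [], []
    for s in roster:
        if s.has_open_acute_regression:
            # Assumed precedence: the safety alert is checked first.
            individual_review.append(s.student_id)
        elif s.status in ("DATA_INCOMPLETE", "do_not_decide_yet") or not s.has_recommendation:
            more_data_needed.append(s.student_id)
        else:
            clusterable.append(s.student_id)
    return individual_review, more_data_needed, clusterable
```

Because the branches form an if/elif/else chain, no student can appear in two buckets.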

The two excluded partitions (individualReview and moreDataNeeded) are hard exclusions with no override. A bulk-activate request that names an excluded student returns 422 EXCLUDED_MEMBER and creates zero assignments (all-or-nothing). This is a deliberate safety design: these students need individual attention, not group assignment.
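
A minimal sketch of the all-or-nothing check, assuming the two exclusion lists are available as lists of student IDs. `ExcludedMemberError` and `validate_no_excluded` are hypothetical names; only the 422 status and the EXCLUDED_MEMBER code come from the behaviour described above.

```python
class ExcludedMemberError(Exception):
    """Hypothetical error type mapping to HTTP 422 EXCLUDED_MEMBER."""
    status = 422
    code = "EXCLUDED_MEMBER"

    def __init__(self, offenders):
        super().__init__(f"excluded students in request: {offenders}")
        self.offenders = offenders


def validate_no_excluded(confirmed_ids, individual_review, more_data_needed):
    """Reject the whole request if any confirmed ID is in an excluded partition."""
    excluded = set(individual_review) | set(more_data_needed)
    offenders = sorted(set(confirmed_ids) & excluded)
    if offenders:
        # Raised before any write happens, so zero assignments are created.
        raise ExcludedMemberError(offenders)
    return list(confirmed_ids)
```

Running the check before the first write is what keeps the request all-or-nothing: a single excluded ID aborts the batch.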

The 5-key cluster identity

Clusterable students are grouped into clusters, each identified by a canonical clusterKey string formed from five dimensions:

bundle_id: the recommended bundle
anchor_skill_group: the primary anchor skill group within that bundle
scaffold_tier: the resolved scaffold tier (with an optional MERGE_L1_L2 merge for teacher-led delivery modes, per SCS_04)
compatible_delivery_mode: the delivery mode shared by cluster members
sufficient_data: whether all members have enough evidence (DATA_INCOMPLETE students have already been excluded, so this is always true for clusterable rows)

Given the same 5-key values and the same pinned smart_cluster_rule_set version, membership is identical every time. This determinism (V-6) is what makes recompute idempotent: running it twice in a row produces the same clusters.
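
The grouping can be sketched as below. The canonical string format is not specified in this section, so the pipe-separated layout, the `Level_1_2` merged label, and the `teacher_led` mode value are all assumptions made for illustration, as are the function names.

```python
from collections import defaultdict

# Hypothetical value for teacher-led delivery modes; real values are not given here.
TEACHER_LED_MODES = frozenset({"teacher_led"})


def resolve_tier(tier, delivery_mode, merge_l1_l2):
    """Collapse Level_1 and Level_2 when the optional MERGE_L1_L2 merge applies."""
    if merge_l1_l2 and delivery_mode in TEACHER_LED_MODES and tier in ("Level_1", "Level_2"):
        return "Level_1_2"  # assumed label for the merged tier
    return tier


def cluster_key(rec, merge_l1_l2=False):
    """Build the key from the five dimensions in a fixed order (assumed format)."""
    tier = resolve_tier(rec["scaffold_tier"], rec["compatible_delivery_mode"], merge_l1_l2)
    parts = [
        rec["bundle_id"],
        rec["anchor_skill_group"],
        tier,
        rec["compatible_delivery_mode"],
        "true",  # sufficient_data is always true for clusterable rows
    ]
    return "|".join(parts)


def group_by_key(recs, merge_l1_l2=False):
    """Group student IDs by clusterKey; sorting makes the output order-independent."""
    clusters = defaultdict(list)
    for rec in recs:
        clusters[cluster_key(rec, merge_l1_l2)].append(rec["student_id"])
    return {key: sorted(ids) for key, ids in sorted(clusters.items())}
```

Sorting both keys and member IDs means the result does not depend on roster order, which is one simple way to get the repeatable output that V-6 requires.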

The cluster is addressed by its clusterKey. The opaque DB primary key is not used for bulk-activate or preview operations.

Minimum size gate

A cluster with fewer members than the minimum size (default: 2; configurable per school to 3–5) is dropped. It never appears as an undersized cluster (for example, a 1-member cluster). For presentation purposes, students whose bundle/tier combination produced a below-minimum group are returned in the moreDataNeeded list, so the teacher sees them and can address them individually.
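
A minimal sketch of the gate, assuming clusters arrive as a mapping from clusterKey to member IDs. The function name `apply_min_size` is hypothetical.

```python
def apply_min_size(clusters, more_data_needed, min_size=2):
    """Drop undersized clusters and surface their members in moreDataNeeded."""
    kept = {}
    surfaced = list(more_data_needed)  # copy so the caller's list is not mutated
    for key in sorted(clusters):       # sorted for a repeatable output order
        members = clusters[key]
        if len(members) >= min_size:
            kept[key] = members
        else:
            surfaced.extend(members)
    return kept, surfaced
```

With the default minimum of 2, a lone student is moved to the surfaced list; raising the minimum to 3 would also move a 2-member group.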

Bulk activation

When a teacher confirms a cluster, POST /api/cluster/bulk-activate creates one bundle_assignment per confirmed member. The implementation reuses the same single write path as individual bundle activation. No second writer was created for clusters. The CI lint lint:bundle-assignment-single-write-path verifies this on every build.

The request body requires confirmedMemberStudentIds. The teacher must explicitly list the student IDs they intend to activate. This is the V-12 “teacher decides” principle encoded at the API level.

Staleness guard

Between the moment a teacher previews a cluster and the moment they submit bulk-activate, the cluster may have drifted (a student got a new recommendation, an acute regression alert opened, etc.). Bulk-activate re-validates membership against a fresh compute inside the activation transaction. A mismatch in either direction returns 409 CLUSTER_STALE and no assignments are created. The teacher re-runs recompute to get a fresh preview and then confirms again.
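
The two-directional comparison might look like the sketch below. It assumes the check compares the teacher's confirmed IDs with the freshly computed membership for the same clusterKey and treats any difference as stale; `ClusterStaleError` and `check_not_stale` are hypothetical names.

```python
class ClusterStaleError(Exception):
    """Hypothetical error type mapping to HTTP 409 CLUSTER_STALE."""
    status = 409
    code = "CLUSTER_STALE"

    def __init__(self, left_cluster, joined_cluster):
        super().__init__("cluster membership changed since preview")
        self.left_cluster = left_cluster      # confirmed, but no longer members
        self.joined_cluster = joined_cluster  # members now, but not confirmed


def check_not_stale(confirmed_ids, fresh_member_ids):
    """Compare confirmed IDs with fresh membership in both directions."""
    confirmed, fresh = set(confirmed_ids), set(fresh_member_ids)
    left_cluster = sorted(confirmed - fresh)
    joined_cluster = sorted(fresh - confirmed)
    if left_cluster or joined_cluster:
        # Raised inside the transaction, so no assignments are created.
        raise ClusterStaleError(left_cluster, joined_cluster)
```

Carrying both difference sets on the error would let a client show the teacher exactly what changed before they recompute.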

Per-member adjustments at activation time

The memberAdjustments field on the bulk-activate body allows optional per-member OVR_* overrides (scaffold tier, delivery, schedule) at the moment of activation. Each adjustment must carry a reasonCode from the ClusterOverrideReasonCode enum (a subset of the 9-code catalog). No free-text fields exist.
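
A hedged sketch of how such adjustments could be validated. Only `memberAdjustments` and `reasonCode` are named in this section; the `studentId` and `override` field names, the three `OVR_*` identifiers, and the reason-code values passed in by the caller are placeholders, not the real enum members.

```python
# Placeholder override kinds; the section only says OVR_* covers
# scaffold tier, delivery, and schedule.
ALLOWED_OVERRIDES = frozenset({"OVR_SCAFFOLD_TIER", "OVR_DELIVERY", "OVR_SCHEDULE"})
ALLOWED_FIELDS = frozenset({"studentId", "override", "reasonCode"})


def validate_adjustments(adjustments, confirmed_ids, allowed_reason_codes):
    """Return a list of problems; an empty list means the adjustments are acceptable."""
    confirmed = set(confirmed_ids)
    errors = []
    for i, adj in enumerate(adjustments):
        extra = sorted(set(adj) - ALLOWED_FIELDS)
        if extra:
            # No free-text fields: anything outside the known keys is rejected.
            errors.append(f"adjustment {i}: unexpected fields {extra}")
        if adj.get("studentId") not in confirmed:
            errors.append(f"adjustment {i}: student is not a confirmed member")
        if adj.get("override") not in ALLOWED_OVERRIDES:
            errors.append(f"adjustment {i}: unknown override")
        if adj.get("reasonCode") not in allowed_reason_codes:
            errors.append(f"adjustment {i}: reasonCode missing or not allowed")
    return errors
```

Passing the allowed reason codes in as a parameter keeps the sketch independent of the actual ClusterOverrideReasonCode members, which are not listed here.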

Recompute: on-demand and nightly

POST /api/cluster/recompute/{classId} is the idempotent operation that computes or refreshes clusters for a class. A teacher (with VIEW_OWN_CLASSES) or an admin (with RUN_BACKOFFICE) can call it on demand, and the nightly scheduler calls the same operation for each active class. Given the same inputs, repeated runs produce the same clusters.

Each recompute writes a new generation (append-only). Prior generations are never mutated. They expire at createdAt + 60d.
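
The expiry rule is simple date arithmetic. The sketch below assumes UTC timestamps and treats a generation as expired once the current time reaches createdAt + 60d; whether the boundary is inclusive is an assumption, and the function names are hypothetical.

```python
from datetime import datetime, timedelta, timezone

RETENTION = timedelta(days=60)  # createdAt + 60d, as stated above


def expires_at(created_at):
    """Return the moment a generation expires."""
    return created_at + RETENTION


def is_expired(created_at, now):
    """True once 'now' has reached the expiry moment (inclusive boundary assumed)."""
    return now >= expires_at(created_at)


created = datetime(2024, 1, 1, tzinfo=timezone.utc)
print(expires_at(created).date())  # 2024-03-01 (2024 is a leap year)
```

Since generations are append-only, expiry never rewrites an existing row; it only determines when a row stops being kept.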

Scaffold Distribution Preview

GET /api/cluster/student/{id}/preview returns the Scaffold Distribution Preview for a specific student: their current cluster assignment plus the per-skill × per-task-type tier breakdown across all bundles. This is the data source for the teacher-facing scaffold distribution panel.

The tier distribution (Level_1/2/3 counts) is visible to teachers and admins only. The student-facing surface never shows tier labels or cluster identity.

Key endpoints

Endpoint	Method	Permission	Purpose
`/api/cluster/recompute/{classId}`	POST	Teacher / Admin	On-demand recompute; returns clusters + 2 exclusion lists
`/api/cluster/student/{id}/preview`	GET	Teacher / Admin	Scaffold Distribution Preview for one student
`/api/cluster/bulk-activate`	POST	Teacher	Bulk-activate a confirmed cluster

Connections

Input (Intervention Bundles): each student’s bundle_recommendation row and scaffold tier are the inputs to cluster grouping.
Safety (Progress Monitoring): the acute regression alert lifecycle produced by progress monitoring feeds the individualReview exclusion list.
Workflow (Teacher Activation Workflow): the Smart Grouping Panel and Bulk Activation Confirmation steps in the 17-step teacher workflow correspond to the recompute and bulk-activate endpoints.
Concepts (Intervention Design): how bundles, scaffold tiers, and smart clusters fit into the broader intervention design model.