The Records Project
Massachusetts  ·  Vol. I  ·  Est. 2026
For audit and replication

The classifier, in plain English.

Every disposition reading on this site is generated by a transparent regex pipeline. The patterns are listed below in plain English, with what they trigger on, what they mean for the requester, and what we already know they get wrong.

If you spot an error in our classification of a specific case, tell us with the case number and what we got wrong. Mistakes feed the next regex iteration.


The reading model

Every SOR (Supervisor of Records) order has the same structure: setup, agency argument, optional petitioner challenge, the Supervisor's reasoning, and a disposition in the last one or two paragraphs. The classifier reads the disposition tail (last 40% of the document, anchored on the final "Sincerely,") and matches against a curated list of operative phrases.
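The tail extraction described above can be sketched roughly as follows. This is a minimal illustration, not the code in regex_classifier_v3.py: the function name disposition_tail is hypothetical, and it assumes "anchored on the final Sincerely," means the text is cut at the last "Sincerely," before the 40% slice is taken.

```python
import re

TAIL_FRACTION = 0.40  # "last 40% of the document", per the description above


def disposition_tail(text: str, fraction: float = TAIL_FRACTION) -> str:
    """Return the slice of an order that outcome patterns would be matched against.

    Assumption: everything from the final "Sincerely," onward (signature block,
    cc lines) is dropped before the tail is measured.
    """
    matches = list(re.finditer(r"sincerely,", text, flags=re.IGNORECASE))
    body = text[: matches[-1].start()] if matches else text
    # Number of characters to keep from the end of the body.
    keep = round(len(body) * fraction)
    return body[len(body) - keep:]
```

If an order has no "Sincerely," at all, this sketch falls back to the last 40% of the whole text; whether the real pipeline does the same is not stated here.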

Document-type detection runs first on the opening paragraph: is this a regular requester appeal, an agency-side fee or time petition, or a reconsideration? Each type gets its own outcome enum set, because the same word ("granted") means opposite things across types.
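One way to picture the type-first design is a small lookup from document type to its allowed outcomes. The sketch below is illustrative only: the cue regexes are hypothetical stand-ins (the real opening-paragraph patterns are not listed on this page), and the outcome names are the ones used elsewhere on this page. Reconsideration outcomes are not enumerated here, so they are left out rather than guessed.

```python
import re
from enum import Enum


class DocType(Enum):
    REGULAR_APPEAL = "regular_appeal"
    AGENCY_PETITION = "agency_petition"
    RECONSIDERATION = "reconsideration"


# Hypothetical cues, checked in order; the real detection patterns may differ.
_CUES = [
    (DocType.RECONSIDERATION, re.compile(r"\breconsideration\b", re.I)),
    (DocType.AGENCY_PETITION,
     re.compile(r"\bpetition\w*\b[^.]{0,200}?\b(?:fee|extension)\b", re.I)),
]

# Each type gets its own outcome set, so "granted" is never read out of context.
OUTCOMES = {
    DocType.REGULAR_APPEAL: frozenset({
        "ORDERED_TO_PROVIDE", "UPHELD_WITHHOLDING",
        "AGENCY_RESPONDED_DURING_APPEAL", "PARTIAL", "CLOSED_OTHER",
        "DECLINED_TO_OPINE", "NONE",
    }),
    DocType.AGENCY_PETITION: frozenset({"GRANTED", "DENIED", "NONE"}),
}


def detect_doc_type(opening_paragraph: str) -> DocType:
    """Classify the opening paragraph; default to a regular requester appeal."""
    for doc_type, cue in _CUES:
        if cue.search(opening_paragraph):
            return doc_type
    return DocType.REGULAR_APPEAL
```

Defaulting to REGULAR_APPEAL is a design guess that mirrors the corpus, where regular appeals are the large majority.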

Source: regex_classifier_v3.py (the V4b lock as of 2026-05-03). 580 lines, ~80 patterns; it runs the full 30,800-order corpus in under two minutes and costs nothing to run.

Known limits

These are issues we have found and not yet fixed. Listing them here is part of the contract: a working classifier with documented errors is more useful than a black-box claiming perfect accuracy.

Material — affects headline numbers

"Ordered" is one bucket. It should be several.

The ORDERED_TO_PROVIDE outcome triggers on phrases like "is ordered to provide a response." Reading the underlying orders, that catches at least four meaningfully different things:

  • Released records — agency must turn over the documents.
  • Respond properly — agency must either release or identify an exemption. Often satisfied by a better-justified denial.
  • Clarify scope — agency must clarify whether additional records exist, or what the request actually covers.
  • Provide a fee estimate — agency must quote a price the requester can decide to pay or not.

In a working journalist's read of the docket, true orders to release records are rare. The bulk of "ordered" outcomes are procedural-cure orders telling the agency to respond better. Today's classifier collapses all four into one bucket, and the sub-flag that was supposed to catch the procedural-cure subset (merits_avoided) currently fires on only 3% of ORDERED_TO_PROVIDE outcomes, which is far too low.

Estimated impact: the headline rate of disclosure orders is materially overstated. The "actually released records" rate is being studied; expect it to land well below the current 52–58% post-reform band. Sub-categories will be exposed as drill-down filters on the stats page when the next iteration cycle lands.

Status: open. Surfaced 2026-05-04 from a sample read plus working-attorney intuition. Sub-classification queued in PLAN.md as the next iteration cycle.

Minor — known and quantified

10.1% of orders have no clear outcome

Across all 30,800 orders, 3,106 (10.1%) end up in a NONE bucket because the disposition phrasing eluded every pattern. These are queued for human review. Most are likely AGENCY_RESPONDED variants in older, more variable templates (especially 2018, where a uniform OCR fingerprint damaged 465 orders).

Status: open. Iteration cycle in progress.

Minor — known and quantified

PARTIAL outcomes are under-counted

Only 5 orders out of 30,800 are classified PARTIAL (granted in part, denied in part). Hand-reading suggests the true rate is much higher — orders that uphold withholding under one exemption while preserving the agency's right to charge under another are functionally partial wins, but the classifier currently calls them DENIED or UPHELD. Specific anchor phrases ("however, this determination does not preclude," "to the extent") need a dedicated regex pass.

Status: open. Iteration cycle queued.
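The dedicated pass for PARTIAL could start as a review filter rather than a relabelling step. The sketch below is an assumption about how that might look, not the queued implementation: it uses only the two anchor phrases named above, and the function name partial_candidate and the set of labels it inspects are hypothetical.

```python
import re

# Anchor phrases named in the text above; whitespace handling is an assumption.
PARTIAL_ANCHORS = [
    re.compile(r"however,\s+this\s+determination\s+does\s+not\s+preclude", re.I),
    re.compile(r"\bto\s+the\s+extent\b", re.I),
]

# Labels the text says currently absorb functionally partial outcomes.
AGENCY_WIN_LABELS = {"DENIED", "UPHELD", "UPHELD_WITHHOLDING"}


def partial_candidate(tail: str, current_label: str) -> bool:
    """Flag an agency-win order whose disposition tail contains a PARTIAL anchor.

    A True result means "send to human review", not "relabel as PARTIAL".
    """
    if current_label not in AGENCY_WIN_LABELS:
        return False
    return any(anchor.search(tail) for anchor in PARTIAL_ANCHORS)
```

A phrase as common as "to the extent" will over-fire, which is why this sketch only queues candidates for hand-reading.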

Outcomes — regular appeals

A regular appeal is filed by the requester after an agency response (or non-response). These are the bulk of the corpus — 27,362 of 30,217 portal-tagged cases. Each disposition resolves to one of the following outcomes.

Substantive — favors requester

ORDERED_TO_PROVIDE

The Supervisor orders the agency to provide a response. As noted in Known limits, this can mean either substantive disclosure or a procedural-cure order to re-justify withholding.
  • is hereby ordered to provide
  • is hereby ordered to make
  • is ordered to provide
  • is ordered to make said response
  • is ordered to make a response
  • shall provide … responsive records
  • is ordered to review … records … provide
  • is ordered to redact … (provide | disclose | release)
  • once the fees are paid … (custodian) … provide
  • failure to comply with this order may result in referral
  • has not met its burden … why … may not be redacted
In plain English: SOR is telling the agency to do something — release, redact-and-release, give a better written response, clarify scope, or quote a fee. The current classifier counts all of those equally as a "win." Sub-categories (release vs. respond-better vs. clarify vs. fee-quote) are coming in the next iteration.
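To show how a plain-English trigger phrase might become a regex, here is a minimal sketch. It assumes that "…" stands for a short gap that stays inside one sentence; the gap width, the helper compile_phrase, and the classify function are all hypothetical, and the sketch does not handle the "(a | b)" alternations. Only four of the phrases listed above are used.

```python
import re

# Assumption: "…" means up to 120 characters with no sentence-ending period.
GAP = r"[^.]{0,120}?"


def compile_phrase(phrase: str) -> re.Pattern:
    """Turn 'shall provide … responsive records' into a case-insensitive regex."""
    parts = [part.strip() for part in phrase.split("…")]
    # Escape each literal word and allow any run of whitespace between words.
    escaped = [r"\s+".join(re.escape(word) for word in part.split()) for part in parts]
    return re.compile(GAP.join(escaped), re.IGNORECASE)


PATTERNS = {
    "ORDERED_TO_PROVIDE": [
        compile_phrase(p)
        for p in (
            "is hereby ordered to provide",
            "is ordered to provide",
            "shall provide … responsive records",
            "is ordered to review … records … provide",
        )
    ],
}


def classify(tail: str) -> str:
    """Return the first outcome whose patterns match, or NONE if nothing does."""
    for outcome, patterns in PATTERNS.items():
        if any(p.search(tail) for p in patterns):
            return outcome
    return "NONE"
```

The NONE fallback is what produces the unclassified bucket described in Known limits: an order lands there when no phrase in any list matches its disposition tail.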

Substantive — favors agency

UPHELD_WITHHOLDING

The Supervisor finds the agency's withholding was proper under one or more exemptions.
  • the appeal is denied
  • properly withheld
  • upheld … withholding
  • has properly denied
  • acted properly in withholding
  • whereas … properly (denied | invoked | withheld)
  • public interest in … does not outweigh … privacy interest
  • has met its burden (in | to) withhold
  • i find … has met its burden … withhold
SOR sided with the agency. The records stay withheld.

Procedural close

AGENCY_RESPONDED_DURING_APPEAL

The agency provided a further response while the appeal was pending and the Supervisor closed the matter without ordering substantive disclosure. Common — about a third of all orders. The same matter often comes back as a new appeal.
  • i will now consider this … appeal closed
  • will now consider this … appeal closed
  • this administrative appeal is now closed
  • this appeal is now closed
  • considered closed
  • intends on providing a (further | supplemental | written | forthcoming) response … is ordered to provide … within (10 | ten) business days
SOR closed without reaching the merits. Whether the requester actually got the records they wanted is between them and the agency now.

Substantive — split

PARTIAL

The Supervisor grants disclosure of some records or fields and upholds withholding of others. As noted in Known limits, currently severely under-counted.
  • partial … disclosure
  • granted in part … denied in part
A nuanced ruling — typical when one exemption is rejected but a second is preserved.

Other procedural

CLOSED_OTHER

Procedural close on grounds other than agency-responded: parallel litigation, lack of jurisdiction (including entities not subject to the public records law), no duty to create records, insufficient specificity, unique-right-of-access (police records), and similar.
  • no duty to create … appeal closed
  • no authority to compel … create records
  • § 6A(d) unique right of access … (declin | appeal closed)
  • declin … review … unique right of access
  • 32.08(2)(b) (parallel litigation regulation cite)
  • parallel … litigation
  • 32.08(1)(f) (specificity regulation cite)
  • insufficient … specificity
  • lack of jurisdiction
  • not subject to the public records law
  • is dismissed
  • shares jurisdiction with the superior court
  • i decline to … intervene
A grab-bag of procedural exits. The matter is closed without a merits ruling.

Procedural — narrow

DECLINED_TO_OPINE

The Supervisor declines to render a determination. Different from CLOSED_OTHER in that no procedural exit was cited; the Supervisor simply found it unnecessary or improper to opine on the question.
  • unnecessary to opine
  • no need to opine
  • declines? to opine
  • decline to render a determination
  • declines? to provide a determination
SOR took a pass.

Other

RECORDS_DESTROYED · WITHDRAWN_BY_REQUESTER · IN_CAMERA_ORDERED · RECONSIDERATION_DENIED · REQUESTER_DECEASED

Rare outcomes, each with a small set of trigger phrases. Combined, they account for fewer than 100 cases across the corpus.
  • records … have been destroyed
  • records were destroyed
  • the requestor … has withdrawn
  • the request … has been withdrawn
  • withdrew (his | her | the | their) request
  • in[- ]camera review … (is | are) … order
  • is ordered … submit … in[- ]camera
  • i respectfully decline … reconsider
  • decline … reconsideration
  • reconsideration is denied
  • requestor … is deceased
  • requester … has (passed | died)
Edge-case dispositions. The requester withdrew, the records don't exist anymore, the Supervisor needs to see the records privately before ruling, the reconsideration was denied on a separate matter, or the requester died before resolution.

Outcomes — agency-side petitions

Fee petitions and time petitions are filed by the agency, asking the Supervisor's permission to charge for redaction time or to extend the response deadline. They use a different statutory basis than regular appeals — G.L. c. 66 §§ 10(c) and 10(d)(iv) — and they are tracked separately from regular appeals throughout this site.

Favors agency

GRANTED

The Supervisor grants the agency's petition. On a fee petition, this means the requester pays for the redaction work. On a time petition, the agency gets more time.
  • established good cause … (extension | fee)
  • established good cause to permit an extension
  • the … has met its burden … redaction
  • i find the … has met its burden
  • may assess a fee for … (segregation | redaction | commercial)
  • is granted an extension
  • i (approve | grant) … petition
Note the inversion: GRANTED on a petition is bad for the requester, whereas the word "granted" in a regular appeal is good for the requester. Same word, opposite consequence.
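The inversion is easiest to keep straight if "who benefits" is looked up by document type and outcome together, never by outcome alone. A minimal sketch, using the groupings from the headings on this page; the names Favors, FAVORS and who_benefits are hypothetical.

```python
from enum import Enum
from typing import Optional


class Favors(Enum):
    REQUESTER = "requester"
    AGENCY = "agency"


# Keyed by (document type, outcome) because the outcome word alone is ambiguous.
FAVORS = {
    ("agency_petition", "GRANTED"): Favors.AGENCY,
    ("agency_petition", "DENIED"): Favors.REQUESTER,
    ("regular_appeal", "ORDERED_TO_PROVIDE"): Favors.REQUESTER,
    ("regular_appeal", "UPHELD_WITHHOLDING"): Favors.AGENCY,
}


def who_benefits(doc_type: str, outcome: str) -> Optional[Favors]:
    """Return the favored side, or None for procedural closes and unknown pairs."""
    return FAVORS.get((doc_type, outcome))
```

Procedural outcomes such as AGENCY_RESPONDED_DURING_APPEAL deliberately map to None here, since the page treats them as closes without a merits ruling.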

Favors requester

DENIED

The Supervisor denies the agency's petition.
  • has not met its burden … petition
  • i (deny | decline to grant) … petition
  • the petition is denied
The agency cannot charge, or cannot extend, on the grounds it asked for.

What we strip before classifying

Some legal-standard recitations appear in nearly every order regardless of outcome. If we let those through, they cause false positives: the recital "may be properly withheld from disclosure" reads like UPHELD_WITHHOLDING even when the disposition is the opposite. The classifier strips these phrases from the disposition region before running the outcome regexes.
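The effect of stripping can be seen with the recital quoted above. This is a small sketch under assumptions: the names RECITALS, UPHELD_CUE and strip_recitals are hypothetical, and the single cue "properly withheld" stands in for the full list of withholding patterns.

```python
import re

# Boilerplate recital quoted in the text above; flexible whitespace is an assumption.
RECITALS = [
    re.compile(r"may\s+be\s+properly\s+withheld\s+from\s+disclosure", re.I),
]

# One withholding cue, used here only to demonstrate the false positive.
UPHELD_CUE = re.compile(r"properly\s+withheld", re.I)


def strip_recitals(tail: str) -> str:
    """Blank out known recitals so outcome patterns cannot match inside them."""
    for recital in RECITALS:
        tail = recital.sub(" ", tail)
    return tail


tail = (
    "Records may be properly withheld from disclosure only if an exemption applies. "
    "The Town is ordered to provide the records."
)
before = UPHELD_CUE.search(tail) is not None                # cue fires on the recital
after = UPHELD_CUE.search(strip_recitals(tail)) is not None  # cue no longer fires
```

Without the strip step, the order in this example would match a withholding cue even though its disposition orders the agency to provide records.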

Suffolk § 10A(a) recital

Standard framing language for attorney-client privilege analysis. Cites Suffolk Constr. Co. v. Div. of Capital Asset Mgmt. Appears in 41% of true UPHELD orders and 80% of false-positive cases.
in assessing whether a records custodian has properly withheld records based on the claim of attorney-client privilege

Case-law quote on Exemption (X)

Quoted prior case-law containing "may be properly withheld from disclosure under Exemption (X)." Narrative cite, not a disposition.
may be properly withheld from disclosure under exemption (X)

Suffolk three-prong burden recital

The standard "records custodian claiming the attorney-client privilege has the burden of not only proving…" recital. Appears in nearly every privilege case as mandatory framing.
records custodian claiming the attorney-client privilege has the burden of not only proving …

Each canonical phrase is paired with a fuzzy probe. When the fuzzy probe matches but the canonical phrase doesn't, we flag the case for human review — has the boilerplate drifted (a new way of writing the same thing), or is this a substantive change of law?
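The canonical-plus-fuzzy pairing can be sketched for the three-prong burden recital. The canonical regex below follows the phrase listed above; the fuzzy probe is a hypothetical example of a looser pattern, since the real probes are not shown on this page.

```python
import re

# Canonical recital, as listed above (whitespace made flexible).
CANONICAL = re.compile(
    r"records\s+custodian\s+claiming\s+the\s+attorney-client\s+privilege\s+"
    r"has\s+the\s+burden\s+of\s+not\s+only\s+proving",
    re.I,
)

# Hypothetical fuzzy probe: the privilege and "burden" within 80 characters.
FUZZY = re.compile(r"attorney[- ]client\s+privilege.{0,80}?\bburden\b", re.I | re.S)


def needs_review(text: str) -> bool:
    """True when the fuzzy probe matches but the canonical phrase does not."""
    return bool(FUZZY.search(text)) and not CANONICAL.search(text)
```

A True result does not say which case applies (drifted boilerplate or a substantive change); it only routes the order to a human reader.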


How accuracy is measured here

Each new pattern is tested three ways before being added: (1) does the smoke set still pass, (2) does accuracy on the holdout improve against hand-labels, and (3) does the corpus distribution shift in a defensible way. Pattern changes are version-controlled and the prior corpus distributions are kept on disk as diff baselines.
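The three checks could be wired together along these lines. This is a rough sketch with hypothetical names (accuracy, distribution_shift, passes_gates), and it assumes classifiers are plain functions from text to an outcome label. The third check is reported rather than automated, because "defensible" is a human judgment.

```python
from collections import Counter


def accuracy(predict, labelled):
    """Share of (text, label) pairs the classifier gets right."""
    return sum(predict(text) == label for text, label in labelled) / len(labelled)


def distribution_shift(before, after):
    """Per-outcome change in corpus share, in percentage points."""
    b, a = Counter(before), Counter(after)
    return {
        outcome: 100 * (a[outcome] / len(after) - b[outcome] / len(before))
        for outcome in sorted(set(b) | set(a))
    }


def passes_gates(old, new, smoke, holdout):
    """Gates 1 and 2: smoke set intact, holdout accuracy strictly improved."""
    smoke_ok = all(new(text) == label for text, label in smoke)
    improved = accuracy(new, holdout) > accuracy(old, holdout)
    return smoke_ok and improved
```

In use, distribution_shift would be run on the stored baseline and the new corpus labels, and the resulting per-outcome deltas read by a person before the pattern ships.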

When a finding like the "ordered to respond" issue surfaces, it is added to Known limits immediately and queued as the next iteration cycle. Patches do not silently change the headline numbers; they ship with a note explaining what changed and why.


Audit invitation

If you read an order on the underlying SOR docket and our classification looks wrong, send us the case number and what we got wrong. We will check, fix the pattern if the fix is a real one, and credit you in the changelog.

Report a misclassification →