StepInsight
Discovery Instrument · with Protiviti
ADM Transparency Readiness Diagnostic

Discovery Instrument — the questions, and how each is answered

The facilitated session we run to walk one citizen-facing automated decision through its seven surfaces (S1–S7). Each question is tagged by how its answer is established — confirmed with the client in the room, or verified in a technology audit — so the resulting readiness picture is evidence-backed, not self-attested.

StepInsight × Protiviti · 7 surfaces · 29 questions · Confirm + technology audit · Draft: shows the shape of the instrument
What this session does

We walk one already-chosen decision end to end. At each surface we ask how it actually works today, reflect the answer back, and tie it to the obligations that bite there. The output is a single picture: a heatmap of where the decision is compliant, exposed, or unknown — with the three highest-priority gaps and a straight answer on readiness for the 10 December 2026 ADM transparency deadline.

Because the obligation mesh is the rubric, every stage is scored against the law itself — not opinion. That is what makes the result defensible, and it is a live preview of the continuous-assurance "eval loop" without having to build it.

Before this session — the decision is already chosen

This instrument assumes a decision has been selected and access granted

Choosing which decision, confirming the entity type (non-corporate Commonwealth / corporate / state), and securing access happen in a separate partner-scoping step with Rich beforehand. Those scoping questions are not run here — they gate whether we run this. This instrument is the diagnostic itself.

Who needs to be in the room

Role | Why we need them | Surfaces
Decision / process owner | Holds the end-to-end picture | S1–S7
Data / data-matching owner | The Robodebt zone lives here | S1, S2
Model / scoring owner | Explainability + fairness answers | S3, S7
Legally-authorised decision-maker | The accountability anchor | S4, S5
Caseworker / operational lead | What actually happens vs. the policy | S4, S5, S6
Setting up for a clean, honest session

Pre-session checklist (30 seconds, the facilitator's job)

The opening move (warm, not a script)

"Before we dive in — so I don't misquote anyone later, a quick go-round: your name and what you do here. We're going to walk one decision end to end. At each step I'll ask how it actually works today, so we can lay it against the obligations that bite by the 10th of December. We're not auditing anyone today — we're mapping the surface so the gaps are obvious and fixable. Anything that's 'we're not sure' is exactly what we want to hear."

The walk — seven surfaces, the questions to ask
Confirm: Established with the client in the room (governance, intent, ownership, awareness, plans).
Audit: Cannot be taken on trust; verified in a follow-on technology audit.
Risk & compliance note — how answers become evidence

Most technical-control questions below cannot be scored "compliant" on the client's say-so. Self-attestation is precisely the failure mode the Robodebt Royal Commission identified — a control "treated as correct by default." So we split the work: in the room we confirm governance, intent and ownership directly with the people accountable; everything tagged Audit is flagged for verification in a follow-on technology audit — inspecting the system, logs, model artefacts, test evidence and the actual citizen-facing notices. A stage only scores green once the audit evidence backs the answer; until then it is provisional. The tags below show which is which — and they double as the scope for that technology audit.
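The scoring rule in this note can be sketched in a few lines. This is a minimal illustration, not the instrument's actual scoring logic: the `Answer` record, the `stage_status` function and the order of precedence among the statuses (exposed, then unknown, then provisional, then compliant) are assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Answer:
    question: str
    tags: frozenset                    # subset of {"Confirm", "Audit"}
    meets_obligation: Optional[bool]   # None means "we're not sure"
    audit_verified: bool = False       # set True only once audit evidence exists


def stage_status(answers):
    """Return a heatmap status for one surface (hypothetical precedence)."""
    if any(a.meets_obligation is False for a in answers):
        return "exposed"
    if any(a.meets_obligation is None for a in answers):
        return "unknown"
    # Every answer claims compliance; Audit-tagged ones still need evidence.
    pending = [a for a in answers if "Audit" in a.tags and not a.audit_verified]
    return "provisional" if pending else "compliant"


s1 = [
    Answer("Sources of citizen data", frozenset({"Confirm", "Audit"}), True),
    Answer("Lawful basis recorded at field level", frozenset({"Audit"}), True),
]
print(stage_status(s1))   # provisional until the audit evidence is in
```

The design point is that a "yes" in the room never moves an Audit-tagged question past provisional; only the `audit_verified` flag does.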

S1
Intake & data capture
Where the citizen data comes from
G6 · G7 · G8
Questions to ask the client
  1. Where does the citizen data feeding this decision come from: how many systems and third-party sources? Confirm + Audit
  2. Is the lawful basis and consent recorded at field level, or assumed? Audit
  3. Is the data trimmed to only what the decision actually needs? Audit
  4. What security baseline applies — PSPF, Essential Eight, certified hosting — and can the current assessment evidence be produced? Audit
Reflect back
Each source system by name; the security-baseline acronym expanded once; any count of systems repeated.
Audit verifies
Field-level provenance in the data schema · fields ingested vs. used · current PSPF/Essential Eight/IRAP assessment evidence.
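As an illustration of the "fields ingested vs. used" and field-level provenance checks, the audit comparison might look like the sketch below. The schema layout, the field names and the `minimisation_gaps` helper are all hypothetical; a real audit would read these from the client's own data schema.

```python
def minimisation_gaps(schema, used_fields):
    """Compare what is ingested with what the decision logic actually uses.

    schema: dict of field name -> metadata dict (hypothetical layout).
    used_fields: set of field names the decision logic reads.
    """
    ingested_not_used = sorted(f for f in schema if f not in used_fields)
    no_recorded_basis = sorted(
        f for f, meta in schema.items() if not meta.get("lawful_basis")
    )
    return {
        "ingested_not_used": ingested_not_used,
        "no_recorded_basis": no_recorded_basis,
    }


# Hypothetical schema for illustration only.
schema = {
    "date_of_birth": {"source": "system_a", "lawful_basis": "recorded"},
    "income": {"source": "system_b", "lawful_basis": "recorded"},
    "postcode": {"source": "system_a", "lawful_basis": None},
}
print(minimisation_gaps(schema, {"date_of_birth", "income"}))
```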
S2
Data matching: the Robodebt zone
Where averaging and proxy bias hide
G4 · G5 · G9
⚑ Heavy-load surface — protect this if time runs short
Questions to ask the client
  1. Is data matched, averaged or integrated across systems to build the case picture, including anything resembling income averaging? Confirm + Audit
  2. Is the matching logic documented and tested, or treated as correct by default? Audit
  3. Are low-confidence matches flagged for a human, or do they flow straight through? Audit
  4. Have the matching assumptions been screened for indirect discrimination through proxy variables? Audit
Reflect back
The phrase "income averaging" if it surfaces — it's the Robodebt failure point. Reflect any threshold/confidence number; name the proxy variables mentioned (postcode, age band).
Audit verifies
The actual matching logic + test evidence · the human-referral threshold in the system · any bias / proxy-variable screening artefacts.
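The human-referral check can be pictured as a simple threshold rule plus a scan of past matches for any that slipped through. This is a sketch under stated assumptions: the 0.90 threshold, the record fields and both function names are hypothetical, and the client's real threshold is exactly what the audit has to find in the system.

```python
REFERRAL_THRESHOLD = 0.90  # hypothetical value; the real one comes from the system


def route_match(confidence, threshold=REFERRAL_THRESHOLD):
    """Send low-confidence matches to a human instead of straight through."""
    if not 0.0 <= confidence <= 1.0:
        raise ValueError("confidence must be between 0 and 1")
    return "auto" if confidence >= threshold else "human_review"


def unreferred_low_confidence(matches, threshold=REFERRAL_THRESHOLD):
    """Return past matches that were below threshold but never referred."""
    return [
        m for m in matches
        if m["confidence"] < threshold and not m["referred_to_human"]
    ]


history = [
    {"id": 1, "confidence": 0.97, "referred_to_human": False},
    {"id": 2, "confidence": 0.62, "referred_to_human": True},
    {"id": 3, "confidence": 0.71, "referred_to_human": False},
]
print([m["id"] for m in unreferred_low_confidence(history)])
```

Any record returned by the second function is a match that "flowed straight through", which is the condition question 3 is probing for.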
S3
AI scoring / triage
Whether a flag can be explained
G2 · G9 · G10
Questions to ask the client
  1. Can you reconstruct, in plain language, why a given person was flagged or scored? Audit
  2. Does each output carry a reason code a caseworker — and a tribunal — could read? Audit
  3. Is the model a black box, or is explainability built in? Audit
  4. Has fairness been tested across the affected population against documented thresholds? Audit
Reflect back
The model / tool name; whether outputs carry a reason code; the fairness-testing cadence if named.
Audit verifies
Explainability tested on real cases · reason-code content in actual outputs · model architecture · documented fairness thresholds + results.
S4
The human decision: the Seam
Decide, or rubber-stamp?
G2 · G4 · G10
⚑ Heavy-load surface — the accountability anchor
Questions to ask the client
  1. In practice, does the human meaningfully decide, or effectively rubber-stamp the AI output? Confirm + Audit
  2. Are there defined review gates before a decision is finalised? Confirm
  3. When a human overrides the AI, is the override and its reason logged? Audit
  4. Under what documented legal authority and delegation is the final decision made? Confirm
  5. Is that authorised decision-maker the same person who actually reviews the AI output day to day? Confirm
Reflect back
The named decision-maker's role; the distinction between deciding and rubber-stamping ("so the caseworker can override, but most go through as scored — got it"); whether overrides are logged.
Audit verifies
The override / acceptance rate and time-on-task in the logs — the hard evidence of whether oversight is meaningful or nominal — and that override reasons are actually captured.
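To make the oversight evidence concrete, the sketch below derives an override rate, a median time-on-task and a count of overrides with no logged reason from a review log. The log fields (`ai_output`, `final`, `seconds_on_task`, `override_reason`) and the `oversight_summary` function are hypothetical; the client's logs will have their own shape.

```python
from statistics import median


def oversight_summary(review_log):
    """Summarise how often, and how carefully, humans depart from the AI output."""
    total = len(review_log)
    if total == 0:
        return None
    overrides = [r for r in review_log if r["final"] != r["ai_output"]]
    missing_reason = [r for r in overrides if not r.get("override_reason")]
    return {
        "override_rate": len(overrides) / total,
        "median_seconds_on_task": median(r["seconds_on_task"] for r in review_log),
        "overrides_missing_reason": len(missing_reason),
    }


# Hypothetical log entries for illustration only.
log = [
    {"ai_output": "flag", "final": "flag", "seconds_on_task": 5},
    {"ai_output": "flag", "final": "flag", "seconds_on_task": 7},
    {"ai_output": "flag", "final": "clear", "seconds_on_task": 120,
     "override_reason": "income evidence supplied"},
    {"ai_output": "clear", "final": "clear", "seconds_on_task": 6},
]
print(oversight_summary(log))
```

A very low override rate combined with a few seconds per case points toward nominal oversight, but neither number is conclusive on its own; both are inputs to the S4 judgement, not the judgement itself.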
S5
Notification, reasons & review
The 10 December 2026 deadline
G1 · G4 · G10
⚑ Heavy-load surface — the deadline lands here
Questions to ask the client
  1. Today, is AI involvement disclosed to the citizen — in the privacy policy and at the point of decision? Audit
  2. Are the reasons given adequate to support a merits review at the ART? Confirm + Audit
  3. Is the route to internal review and the ART clear to the citizen? Audit
  4. Does the client know whether this decision is caught by APP 1.7–1.9, and is there a plan to be compliant by 10 December 2026? Confirm
Reflect back
APP 1.7–1.9 expanded once ("the new ADM transparency rule"); ART expanded once ("the Administrative Review Tribunal"); the 10 Dec 2026 date repeated; disclosure live-today vs. planned.
Audit verifies
The actual privacy collection notice + decision letters (the documents, not the description) · whether reasons in a real sample would survive ART scrutiny · review-route wording.
S6
Logging & provenance
Can you reconstruct a past decision?
G4 · G7 · G8
Questions to ask the client
  1. Could you reproduce exactly what data and model version produced a specific decision made months ago? Audit
  2. Is the model version pinned and logged against each decision? Audit
  3. Are the complete decision records kept in an Archives-compliant way, or scattered across operational logs? Audit
Reflect back
The phrase "model version pinned"; where records actually live ("so it's in the case-management system and the model logs — two places").
Audit verifies
An actual reconstruction attempt on one past decision · model-version pinning in the logs · whether the full record set meets Archives Act retention.
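One way to picture "model version pinned and logged against each decision" is a record that stores the model version alongside a fingerprint of the exact inputs, so a later replay can be compared with it. This is a sketch only: the record layout and the function names are hypothetical, and it says nothing about Archives Act retention, which is a separate check.

```python
import hashlib
import json


def input_fingerprint(inputs):
    """Hash the inputs in a canonical form so key order does not matter."""
    canonical = json.dumps(inputs, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def make_record(decision_id, inputs, model_version, output):
    """Build the per-decision log entry (hypothetical layout)."""
    return {
        "decision_id": decision_id,
        "input_sha256": input_fingerprint(inputs),
        "model_version": model_version,
        "output": output,
    }


def reconstruction_matches(record, replay_inputs, replay_version, replay_output):
    """True only if the replay used the same data and model and got the same result."""
    return (
        record["input_sha256"] == input_fingerprint(replay_inputs)
        and record["model_version"] == replay_version
        and record["output"] == replay_output
    )


rec = make_record("D-001", {"income": 41000, "age_band": "25-34"}, "v1.4.2", "flag")
print(reconstruction_matches(
    rec, {"age_band": "25-34", "income": 41000}, "v1.4.2", "flag"))
```

If the model version was never logged, or the inputs cannot be retrieved in their original form, the comparison cannot even be attempted, which is itself the audit finding.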
S7
Monitoring & assurance
Protiviti's lane, with an AI lens
G2 · G3 · G9 · G10 · G11
Questions to ask the client
  1. How is the model currently monitored: continuously, or checked when someone remembers? Confirm + Audit
  2. Is drift and fairness re-tested between audits, or only point-in-time? Audit
  3. Does a rule or policy change — like the Dec-2026 date — trigger a re-assessment today? Confirm
  4. Is the assurance sample-based and manual, or moving toward continuous and evidence-rich? Confirm + Audit
  5. If the model or platform is vendor-supplied, do the contracts carry AI accountability terms (e.g. DTA AI Model Clauses) and audit / access rights? Confirm
Reflect back
The monitoring cadence ("reviewed annually unless something breaks — got it"); the point-in-time vs. continuous distinction — where Protiviti's value and the continuous-assurance product live (APRA, April 2026: point-in-time AI assurance is "no longer fit for purpose").
Audit verifies
The monitoring configuration + drift/fairness re-test artefacts · sample-vs-continuous assurance evidence · the supplier contract terms (Q5 — a document review).
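A fairness re-test between audits can be as simple as recomputing flag rates per group on recent decisions and comparing the ratio with a documented threshold. The sketch below assumes one such metric (lowest group flag rate divided by highest); the metric choice, the 0.8 threshold in the example and the field names are assumptions for illustration, not something the instrument prescribes.

```python
from collections import defaultdict


def flag_rates(records):
    """Share of decisions flagged, per group."""
    counts = defaultdict(lambda: [0, 0])  # group -> [flagged, total]
    for r in records:
        counts[r["group"]][0] += int(r["flagged"])
        counts[r["group"]][1] += 1
    return {g: flagged / total for g, (flagged, total) in counts.items()}


def rate_ratio(rates):
    """Lowest group rate divided by the highest (1.0 means identical rates)."""
    if not rates:
        raise ValueError("no records to assess")
    lowest, highest = min(rates.values()), max(rates.values())
    return 1.0 if highest == 0 else lowest / highest


def needs_reassessment(records, min_ratio):
    """True when the gap between groups breaches the documented threshold."""
    return rate_ratio(flag_rates(records)) < min_ratio


# Hypothetical recent decisions: group A flagged 2 of 4, group B flagged 1 of 4.
recent = (
    [{"group": "A", "flagged": f} for f in (True, True, False, False)]
    + [{"group": "B", "flagged": f} for f in (True, False, False, False)]
)
print(needs_reassessment(recent, min_ratio=0.8))
```

Run on a schedule, a check like this is the difference between point-in-time and continuous assurance: the same test, repeated between audits, with the result logged each time.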
Closing the session — soft check
  1. 2–3 minute recap. "Here's what I heard: the decision is [X]; the heaviest exposure looks like S2 / S4 / S5; the systems are [A, B, C]; the thing that surprised me was [Y]." (Reflecting the picture back is itself a clean second capture.)
  2. The check. "Any red flags from that? Anything that doesn't sound like how it actually works for you? Anything important we didn't get to?"
  3. Sit with the silence. The first five seconds of quiet is where the real corrections come out.
  4. Name the audit scope. "A handful of these we can only confirm by looking at the system itself — the logs, the model, the actual notices. I'll send a short list of what we'd need access to." (That list = every Audit question above.)
  5. What happens next. "From this we build the readiness heatmap — every stage scored against the obligations, the three priority gaps named, and a straight answer on readiness by 10 December. That's what goes in front of Lauren and Rita."
Coverage: the 29 questions map to the brief's diagnostic sections C–I, the accountability probe at S4 (brief B-Q9), and a procurement / supplier-accountability question added at S7 (obligation G11 — DTA AI Model Clauses), which the brief's mesh lists but the first draft omitted. Use-case selection and access (brief A, B, J) sit in the separate partner-scoping step and are not run here.

Evidence model: Confirm = established with the client in the room. Audit = verified in a follow-on technology audit before the stage can score compliant. Companion artefact: the readiness heatmap, PVT_adm_readiness-heatmap-mock.html.

Status flags current as at June 2026. Orientation only, not legal advice — confirm currency before any external use. Draft artefact showing the shape of the instrument; not a deployed engagement.