Structured output is where AI stops being a chatbot and starts becoming production software. The moment an application needs a valid JSON object, a function call, a database-ready record, or a workflow action, cost is no longer just “tokens in, tokens out.” You also pay for schema instructions, tool definitions, validation retries, repair prompts, and sometimes a stronger model that follows constraints more reliably.
The expensive mistake is pricing structured output like a normal completion. A user-facing answer might be 300 output tokens. A production extraction task can include 1,200 schema tokens, 800 tool-definition tokens, 2,500 document tokens, and one failed retry that repeats most of the prompt. That retry can double the bill before anyone notices.
This guide breaks down the real cost of JSON mode, schema-constrained responses, and tool/function calling in 2026. We’ll compare cheap-first-pass models against stronger schema-reliable models, run monthly math across practical production scenarios, and end with a routing strategy for teams building automation at scale.
💡 Key Takeaway: Structured output cost is driven by four variables: schema size, prompt context, output length, and retry rate. A cheap model with a 25% retry rate can cost more than a stronger model with a 3% retry rate on the same workflow.
What counts as structured output cost?
Structured output means the model response must conform to a machine-readable shape. The common patterns are:
- JSON mode — the model is instructed or constrained to return valid JSON.
- Schema-constrained output — the response must match a JSON Schema, Pydantic model, Zod schema, or equivalent.
- Tool/function calling — the model returns a function name plus arguments.
- Multi-step tool workflows — the model calls tools, receives results, then calls additional tools or produces final structured data.
- Repair loops — the system validates output, detects invalid JSON or bad fields, and asks the model to fix it.
The cost formula is simple:
Task cost = input tokens × input price + output tokens × output price + retry overhead
The hidden part is retry overhead. A structured task rarely fails with a blank response. It fails after you already paid for the prompt, schema, and partial output. Then your app sends a repair prompt that includes the original invalid output, validation error, and usually the schema again.
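As a minimal sketch, the formula above can be written as a small helper. The function name, its parameters, and the example prices are illustrative assumptions, not part of any provider SDK; prices are expressed in dollars per 1M tokens.

```python
def task_cost(input_tokens, output_tokens, input_price_per_m, output_price_per_m,
              retry_overhead=0.0):
    """Dollar cost of one task. Prices are dollars per 1M tokens (assumed unit)."""
    input_cost = input_tokens * input_price_per_m / 1_000_000
    output_cost = output_tokens * output_price_per_m / 1_000_000
    return input_cost + output_cost + retry_overhead


# Illustrative numbers: 3,500 input and 250 output tokens at $0.25 / $2.00 per 1M.
base = task_cost(3_500, 250, 0.25, 2.00)
print(f"${base:.6f}")  # $0.001375
```

Keeping retry overhead as a separate argument makes it easy to plug in a measured value later instead of guessing one up front.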
For pricing, this guide uses the model prices provided by AI Cost Check model data:
| Model | Provider | Input / 1M tokens | Output / 1M tokens | Context |
|---|---|---|---|---|
| GPT-5 nano | OpenAI | $0.05 | $0.40 | 128K |
| GPT-5 mini | OpenAI | $0.25 | $2.00 | 500K |
| GPT-5 | OpenAI | $1.25 | $10.00 | 1M |
| GPT-5.2 | OpenAI | $1.75 | $14.00 | 1M |
| Claude Haiku 4.5 | Anthropic | $1.00 | $5.00 | 200K |
| Claude Sonnet 4.6 | Anthropic | $3.00 | $15.00 | 1M |
| Gemini 2.5 Flash-Lite | Google | $0.10 | $0.40 | 1M |
| Gemini 3 Flash | Google | $0.50 | $3.00 | 1M |
| DeepSeek V4 Flash | DeepSeek | $0.14 | $0.28 | 1M |
| DeepSeek V3.2 | DeepSeek | $0.28 | $0.42 | 128K |
| Mistral Small 4 | Mistral AI | $0.15 | $0.60 | 128K |
| Mistral Large 3 | Mistral AI | $0.50 | $1.50 | 256K |
Those prices create a wide spread. For the same structured extraction task of roughly 4,000 tokens, the cheapest viable model costs well under a tenth of a cent per run, while premium models can cost more than a cent.
Take a task with 3,500 input tokens and 250 output tokens, before retries: GPT-5 costs $0.006875 per run and Gemini 2.5 Flash-Lite costs $0.000450. That is a 15.3x difference on base cost alone.
The five cost drivers most teams miss
1. Schema tokens are input tokens
A schema is not free. If you include a 1,000-token JSON Schema in every request, you pay for it every time unless your provider offers prompt caching and your implementation uses it correctly.
A small classification schema might be 150-300 tokens. A customer support ticket schema with nested categories, enums, confidence scores, extracted entities, and routing actions can easily reach 900-1,800 tokens. Tool definitions can add another 300-1,500 tokens depending on parameter descriptions.
For example, a document extraction request might contain:
| Component | Tokens |
|---|---|
| System instruction | 250 |
| JSON schema | 1,200 |
| User document | 2,500 |
| Few-shot examples | 800 |
| Output JSON | 350 |
| Total | 5,100 |
On GPT-5 mini, this costs:
- Input: 4,750 tokens × $0.25 / 1M = $0.0011875
- Output: 350 tokens × $2.00 / 1M = $0.0007000
- Base task: $0.0018875
At 1 million tasks/month, that schema-heavy workflow costs $1,887.50/month before retries.
2. Tool definitions are repeated context
Tool calling adds structure, but it also adds prompt bulk. Each tool needs a name, description, parameters, required fields, and sometimes constraints. If you expose 12 tools to the model for a task that needs only one, you pay for all 12 definitions.
A practical rule: expose the minimum tool set for the current state. If the workflow stage is “create invoice,” do not include refund, cancellation, analytics, and CRM enrichment tools in the same prompt.
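One way to apply that rule is to keep a per-stage allowlist and build the tool list from it on each request. The sketch below is only an illustration: the stage names, tool names, and the shape of the tool definitions are hypothetical and would need to match whatever your provider expects.

```python
# Hypothetical tool registry; real definitions would carry full parameter schemas.
ALL_TOOLS = {
    "create_invoice": {"name": "create_invoice", "description": "Create an invoice."},
    "issue_refund": {"name": "issue_refund", "description": "Refund a charge."},
    "lookup_charge": {"name": "lookup_charge", "description": "Find a charge by ID."},
    "cancel_subscription": {"name": "cancel_subscription", "description": "Cancel a plan."},
}

# Hypothetical mapping from workflow stage to the tools that stage may use.
TOOLS_BY_STAGE = {
    "create_invoice": ["create_invoice"],
    "refund": ["lookup_charge", "issue_refund"],
}


def tools_for_stage(stage):
    """Return only the tool definitions allowed in this stage (empty if unknown)."""
    return [ALL_TOOLS[name] for name in TOOLS_BY_STAGE.get(stage, [])]


print([tool["name"] for tool in tools_for_stage("create_invoice")])  # ['create_invoice']
```

An unknown stage returns no tools at all, which fails closed rather than exposing the full registry.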
3. Output verbosity is expensive
Output tokens are usually priced higher than input tokens. On GPT-5, output is $10 per 1M tokens, which is 8x the input price of $1.25 per 1M tokens. On GPT-5 mini, output is $2 per 1M tokens, also 8x the input price of $0.25.
Verbose structured output multiplies cost. A model returning long explanations inside JSON fields costs more and increases the chance of invalid escaping, truncated JSON, or downstream parsing issues.
Prefer this:
{"action":"refund","confidence":0.94,"reason_code":"duplicate_charge"}
Avoid this:
{
"action": "refund",
"confidence": 0.94,
"reason": "The customer appears to have been charged twice based on the provided transaction history, and therefore the appropriate customer support action is to issue a refund..."
}
The second form is not just longer. It is less deterministic, harder to validate, and more expensive to store.
⚠️ Warning: Long natural-language fields inside JSON are a cost and reliability trap. Use enums, booleans, IDs, numeric scores, and short reason codes for production automation.
4. Validation retries compound fast
Retries are the largest structured-output budget surprise. A retry usually includes:
- The original schema or a simplified version
- The invalid model output
- The validation error
- A repair instruction
- A corrected output
If the original task was 4,000 input tokens and 300 output tokens, a repair attempt might add 1,800 input tokens and 300 output tokens. At a 20% retry rate, the average cost per successful task rises by roughly 10-13%, depending on the model's input/output price ratio and the repair size.
If failures need a full re-run instead of a repair prompt, the cost increase is closer to the retry rate itself. A 25% full retry rate means 1.25 paid attempts per successful result.
5. Few-shot examples help reliability but increase input cost
Few-shot examples are useful for schema fidelity, especially when fields are ambiguous. But every example adds tokens. Three examples at 300 tokens each add 900 input tokens per request.
At small scale, that is trivial. At 10 million requests/month, 900 extra input tokens costs:
- DeepSeek V4 Flash: $1,260/month
- GPT-5 mini: $2,250/month
- Claude Sonnet 4.6: $27,000/month
Few-shot examples should be treated like production dependencies: measure whether they reduce retries enough to justify their cost.
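A rough way to frame that measurement is a break-even check: how many percentage points of retry rate do the examples need to remove to pay for their own tokens? The sketch below is a simplification under stated assumptions: the helper name is made up, the cost of one retry (`retry_cost`) is an assumed input, and it ignores that few-shot examples may also change the retry prompt itself.

```python
def few_shot_break_even(extra_input_tokens, input_price_per_m, retry_cost):
    """Retry-rate reduction (as a fraction) needed to offset the extra input tokens."""
    extra_cost = extra_input_tokens * input_price_per_m / 1_000_000
    return extra_cost / retry_cost


# Assumed inputs: 900 extra tokens at $0.25 per 1M, and a $0.000925 retry attempt.
needed = few_shot_break_even(900, 0.25, retry_cost=0.000925)
print(f"{needed:.1%}")  # 24.3%
```

This only covers cost. Examples that improve field accuracy without changing the retry rate may still be worth keeping for quality reasons.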
Cost-per-task comparison: same schema, different models
Let’s define a common structured-output task:
- System and task instructions: 300 input tokens
- JSON schema: 900 input tokens
- User content: 2,000 input tokens
- Tool definitions: 300 input tokens
- Output JSON: 250 output tokens
- Total input: 3,500 tokens
- Total output: 250 tokens
Base cost per task:
| Model | Input cost | Output cost | Base cost / task | Cost / 100K tasks |
|---|---|---|---|---|
| GPT-5 nano | $0.000175 | $0.000100 | $0.000275 | $27.50 |
| Gemini 2.5 Flash-Lite | $0.000350 | $0.000100 | $0.000450 | $45.00 |
| DeepSeek V4 Flash | $0.000490 | $0.000070 | $0.000560 | $56.00 |
| Mistral Small 4 | $0.000525 | $0.000150 | $0.000675 | $67.50 |
| GPT-5 mini | $0.000875 | $0.000500 | $0.001375 | $137.50 |
| Gemini 3 Flash | $0.001750 | $0.000750 | $0.002500 | $250.00 |
| GPT-5 | $0.004375 | $0.002500 | $0.006875 | $687.50 |
| GPT-5.2 | $0.006125 | $0.003500 | $0.009625 | $962.50 |
| Claude Sonnet 4.6 | $0.010500 | $0.003750 | $0.014250 | $1,425.00 |
The cheapest base cost is GPT-5 nano at $27.50 per 100K tasks, but base cost is not the whole decision. If a low-cost model generates invalid JSON, violates enum constraints, or chooses the wrong tool too often, retries erase the savings.
📊 Quick Math: A model with a $0.00056 base task cost and a 30% full retry rate averages $0.000728 per successful task. A model with a $0.001375 base cost and 3% retries averages $0.001416. The cheap model still wins on simple schemas, but the gap narrows from 2.46x to 1.95x.
Retry math: invalid JSON is not the only failure
Most structured-output failures are not syntax errors. Production validators fail outputs for stricter reasons:
- Missing required fields
- Extra fields when additionalProperties: false
- Wrong enum value
- String instead of number
- Null where a value is required
- Tool called with incomplete arguments
- Wrong function selected
- Date format mismatch
- Confidence score outside allowed range
- Output too verbose for downstream system limits
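Several of the failure types above can be caught with a few lines of strict local code. The following is a minimal sketch using only the standard library; the field names, the allowed priority values, and the error-code strings are assumptions chosen for illustration, and a real system would more likely use a schema library.

```python
import json

ALLOWED_PRIORITY = {"low", "medium", "high", "urgent"}  # assumed enum
REQUIRED = {"category": str, "priority": str, "confidence": float}  # assumed fields


def _is_number(value):
    # bool is a subclass of int in Python, so exclude it explicitly.
    return isinstance(value, (int, float)) and not isinstance(value, bool)


def validate_ticket(raw: str) -> list[str]:
    """Return a list of error codes; an empty list means the output is valid."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        return [f"invalid_json:{exc.msg}"]
    if not isinstance(data, dict):
        return ["not_an_object"]
    errors = []
    for name, expected in REQUIRED.items():
        if name not in data:
            errors.append(f"missing:{name}")
        elif data[name] is None:
            errors.append(f"null:{name}")
        elif expected is float and not _is_number(data[name]):
            errors.append(f"wrong_type:{name}")
        elif expected is not float and not isinstance(data[name], expected):
            errors.append(f"wrong_type:{name}")
    # Equivalent of additionalProperties: false.
    errors.extend(f"extra:{name}" for name in sorted(set(data) - set(REQUIRED)))
    priority = data.get("priority")
    if isinstance(priority, str) and priority not in ALLOWED_PRIORITY:
        errors.append("bad_enum:priority")
    confidence = data.get("confidence")
    if _is_number(confidence) and not 0.0 <= confidence <= 1.0:
        errors.append("out_of_range:confidence")
    return errors
```

Returning field-level error codes rather than a single pass/fail flag makes it possible to send a compact repair prompt and to log which fields fail most often.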
The right way to budget is expected cost per valid result:
Expected cost = base attempt cost + retry rate × retry attempt cost
If retries are full re-runs, retry attempt cost is roughly equal to base cost. If retries are repair prompts, retry attempt cost is usually 35-70% of base cost.
Assume the same base task above: 3,500 input tokens + 250 output tokens.
Repair prompt assumptions:
- Repair instruction + validation error + invalid output + compact schema: 1,700 input tokens
- Corrected JSON: 250 output tokens
Repair cost comparison:
| Model | Base task | Repair task | 5% retry | 15% retry | 30% retry |
|---|---|---|---|---|---|
| DeepSeek V4 Flash | $0.000560 | $0.000308 | $0.000575 | $0.000606 | $0.000652 |
| GPT-5 nano | $0.000275 | $0.000185 | $0.000284 | $0.000303 | $0.000331 |
| GPT-5 mini | $0.001375 | $0.000925 | $0.001421 | $0.001514 | $0.001653 |
| Gemini 3 Flash | $0.002500 | $0.001600 | $0.002580 | $0.002740 | $0.002980 |
| GPT-5 | $0.006875 | $0.004625 | $0.007106 | $0.007569 | $0.008263 |
| Claude Sonnet 4.6 | $0.014250 | $0.008850 | $0.014693 | $0.015578 | $0.016905 |
A 30% repair retry rate raises GPT-5 mini from $137.50 to $165.25 per 100K valid results. That is manageable. But full workflow retries are harsher.
If the model makes the wrong tool call and your application re-runs the full prompt, a 30% retry rate turns GPT-5 mini into $178.75 per 100K valid results. At 10 million tasks/month, that extra retry overhead is $4,125/month.
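The repair and full re-run figures above can be reproduced with the expected-cost formula. This is a sketch with hypothetical helper names; it assumes at most one retry per failed attempt and uses the GPT-5 mini prices from the pricing table.

```python
def attempt_cost(input_tokens, output_tokens, input_price_per_m, output_price_per_m):
    """Dollar cost of a single model call; prices are dollars per 1M tokens."""
    return (input_tokens * input_price_per_m + output_tokens * output_price_per_m) / 1_000_000


def expected_cost(base_cost, retry_rate, retry_cost):
    """Expected cost per valid result, assuming a single retry per failure."""
    return base_cost + retry_rate * retry_cost


base = attempt_cost(3_500, 250, 0.25, 2.00)    # base task
repair = attempt_cost(1_700, 250, 0.25, 2.00)  # compact repair prompt

repair_path = expected_cost(base, 0.30, repair)  # failures fixed by a repair prompt
full_rerun = expected_cost(base, 0.30, base)     # failures re-run the whole prompt

# Cost per 100K valid results: about 165.25 with repairs, 178.75 with full re-runs.
print(round(repair_path * 100_000, 2), round(full_rerun * 100_000, 2))
```

Swapping in your own measured retry rate and repair prompt size is usually more informative than comparing list prices alone.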
A full retry rate of 30% increases the monthly bill by 30% for the same number of valid structured outputs.
Scenario 1: support ticket classification
This is the classic structured-output workflow: classify incoming support tickets, extract entities, assign priority, and choose a routing queue.
Task profile
- Tickets per month: 500,000
- Input per ticket: 1,200 tokens
- Schema and instructions: 600 tokens
- Output: 120 tokens
- Total: 1,800 input + 120 output
- Retry style: repair prompt
- Repair prompt: 900 input + 120 output
Recommended output fields:
{
"category": "billing",
"priority": "high",
"sentiment": "negative",
"account_id_present": true,
"route": "billing_escalation",
"confidence": 0.91
}
Monthly cost estimate
| Model | Base cost / task | Retry assumption | Monthly cost |
|---|---|---|---|
| GPT-5 nano | $0.000138 | 12% repair | $74.58 |
| Gemini 2.5 Flash-Lite | $0.000228 | 12% repair | $122.28 |
| DeepSeek V4 Flash | $0.000286 | 10% repair | $150.78 |
| GPT-5 mini | $0.000690 | 4% repair | $354.30 |
| Claude Haiku 4.5 | $0.002400 | 4% repair | $1,230.00 |
Recommendation: use a cheap deterministic model for first pass. GPT-5 nano, Gemini 2.5 Flash-Lite, or DeepSeek V4 Flash are the right class of model. Use enums for category, priority, and route. Escalate only low-confidence or policy-sensitive tickets to GPT-5 mini or Claude Haiku 4.5.
This workflow should not use Claude Sonnet 4.6 or GPT-5 for every ticket. The output is short and deterministic. Premium reasoning is wasteful unless the classification drives regulated, financial, or legal actions.
Scenario 2: invoice and receipt extraction
Invoice extraction has more fields, more formatting constraints, and higher business impact. The model must extract vendor, invoice number, dates, line items, tax, totals, currency, and payment terms.
Task profile
- Documents per month: 100,000
- Document text: 3,800 tokens
- Schema and instructions: 1,400 tokens
- Output JSON: 700 tokens
- Total: 5,200 input + 700 output
- Repair prompt: 2,400 input + 700 output
Monthly cost estimate
| Model | Base cost / task | Retry assumption | Monthly cost |
|---|---|---|---|
| DeepSeek V4 Flash | $0.000924 | 22% repair | $104.10 |
| GPT-5 mini | $0.002700 | 8% repair | $286.00 |
| Mistral Large 3 | $0.003650 | 10% repair | $387.50 |
| Gemini 3 Flash | $0.004700 | 8% repair | $496.40 |
| GPT-5 | $0.013500 | 3% repair | $1,380.00 |
| Claude Sonnet 4.6 | $0.026100 | 3% repair | $2,663.10 |
Recommendation: start with GPT-5 mini for invoice extraction if line items matter. It costs about $286/month for 100,000 documents under the retry assumptions above and gives a better reliability-cost balance than using a premium model for every document. Use DeepSeek V4 Flash for simple receipts and low-risk vendor documents. Route exceptions to GPT-5 or Claude Sonnet 4.6.
For teams deciding between OpenAI and Anthropic on reliability-sensitive workloads, see GPT-5 vs Claude Sonnet 4.5 and GPT-5 vs Claude Opus 4.6 for broader model tradeoffs.
✅ TL;DR: For extraction with many fields, the cheapest model is not always the cheapest system. Use a mid-tier model for the first pass, keep output compact, and escalate validation failures to a stronger model instead of retrying the same weak prompt repeatedly.
Scenario 3: tool-calling automation for SaaS operations
Tool calling becomes expensive when the model needs to inspect state, select actions, call tools, read results, and produce a final record. This is common in internal ops automation: update CRM, create support cases, schedule follow-ups, enrich leads, or process subscription changes.
Task profile
- Workflows per month: 250,000
- System prompt: 400 tokens
- Tool definitions: 1,800 tokens
- User/task context: 2,200 tokens
- Tool result context: 1,500 tokens
- Final structured output: 350 tokens
- Total across workflow: 5,900 input + 350 output
- Repair or wrong-tool retry: full or near-full retry
- Average retry rate: depends heavily on model
Monthly cost estimate
| Model | Base cost / workflow | Retry assumption | Monthly cost |
|---|---|---|---|
| DeepSeek V4 Flash | $0.000924 | 25% near-full retry | $288.75 |
| GPT-5 mini | $0.002175 | 8% near-full retry | $587.25 |
| Mistral Large 3 | $0.003475 | 10% near-full retry | $955.63 |
| Gemini 3 Flash | $0.004000 | 10% near-full retry | $1,100.00 |
| GPT-5 | $0.010875 | 4% near-full retry | $2,827.50 |
| Claude Sonnet 4.6 | $0.022950 | 4% near-full retry | $5,967.00 |
Recommendation: use GPT-5 mini as the default controller for tool-calling automation. It is not the cheapest per token, but a lower wrong-tool rate matters more than saving fractions of a cent on the first attempt. For simple one-tool workflows, DeepSeek V4 Flash is the cost leader. For workflows involving customer money, account deletion, legal obligations, or multi-step ambiguity, route high-risk cases to GPT-5 or Claude Sonnet 4.6.
Tool calling also benefits from prompt architecture. Do not expose every tool. Split workflows into states:
- Classify intent
- Select allowed tool group
- Call tool with constrained arguments
- Validate result
- Generate final structured audit record
This reduces tool-definition tokens and lowers wrong-tool probability.
Scenario 4: high-volume product data normalization
Ecommerce and marketplace teams often normalize messy product titles, attributes, categories, and variants. This is structured output at scale: short inputs, short outputs, huge volume.
Task profile
- Products per month: 10 million
- Input title and attributes: 350 tokens
- Schema and taxonomy instructions: 500 tokens
- Output: 90 tokens
- Total: 850 input + 90 output
- Repair prompt: 450 input + 90 output
Monthly cost estimate
| Model | Base cost / task | Retry assumption | Monthly cost |
|---|---|---|---|
| GPT-5 nano | $0.0000785 | 15% repair | $872.75 |
| Gemini 2.5 Flash-Lite | $0.0001210 | 12% repair | $1,307.20 |
| DeepSeek V4 Flash | $0.0001442 | 12% repair | $1,547.84 |
| Mistral Small 4 | $0.0001815 | 10% repair | $1,936.50 |
| GPT-5 mini | $0.0003925 | 5% repair | $4,071.25 |
Recommendation: use the cheapest model that passes taxonomy validation. GPT-5 nano is the cost winner in this scenario at roughly $873/month for 10 million products with repair retries. Escalate only products with ambiguous categories, regulated items, or conflicting attributes.
At this scale, a 100-token increase in schema size matters. On 10 million tasks, 100 extra input tokens costs:
- GPT-5 nano: $50/month
- DeepSeek V4 Flash: $140/month
- GPT-5 mini: $250/month
- Claude Sonnet 4.6: $3,000/month
That is why large taxonomies should be retrieved dynamically instead of pasted into every prompt.
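Dynamic retrieval can start very simply. The sketch below picks a small subset of a taxonomy by keyword overlap before the prompt is built. The taxonomy contents, category IDs, and function name are made up for illustration, and plain substring matching is a deliberately naive stand-in for a real retriever or preliminary classifier.

```python
# Hypothetical taxonomy: category ID -> keywords that suggest it.
TAXONOMY = {
    "apparel.shoes": ["shoe", "sneaker", "boot"],
    "electronics.audio": ["headphone", "speaker", "earbud"],
    "home.kitchen": ["kettle", "blender", "skillet"],
}


def relevant_categories(title: str, limit: int = 2) -> list[str]:
    """Return up to `limit` category IDs whose keywords appear in the title."""
    text = title.lower()
    scored = []
    for category_id, keywords in TAXONOMY.items():
        # Naive substring match; it can over-match on short keywords.
        hits = sum(1 for keyword in keywords if keyword in text)
        if hits:
            scored.append((hits, category_id))
    # Most keyword hits first, then alphabetical for a stable order.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [category_id for _, category_id in scored[:limit]]


print(relevant_categories("Wireless Headphones with Speaker Dock"))  # ['electronics.audio']
```

Only the selected categories, with their allowed values, would then be included in the schema sent to the model. Products that match nothing can fall back to a broader prompt or an escalation path.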
JSON mode vs tool calling: which is cheaper?
JSON mode is cheaper when the application needs one final structured object. Tool calling is worth the overhead when the model must choose or execute actions.
Use JSON mode for:
- Classification
- Extraction
- Data normalization
- Summaries with fixed fields
- Scoring and ranking
- Validation reports
Use tool calling for:
- CRM updates
- Calendar scheduling
- Database writes
- Search and retrieval actions
- Multi-step agents
- Workflows that need external state
JSON mode usually has lower prompt overhead because it needs one schema and one output. Tool calling adds tool definitions and often additional model turns. A single tool call can turn one request into two or three billable model interactions.
A practical example:
| Workflow | Input tokens | Output tokens | GPT-5 mini cost |
|---|---|---|---|
| JSON classification | 1,800 | 120 | $0.000690 |
| One tool call + final JSON | 4,000 | 260 | $0.001520 |
| Three-step tool workflow | 8,500 | 650 | $0.003425 |
Tool calling costs 2.2x to 5x more in this example because the model is doing more work. That cost is justified when the workflow replaces human operations or prevents engineering complexity. It is wasteful when a simple JSON label would do.
💡 Key Takeaway: JSON mode is the default for structured data. Tool calling is for actions. If no external system needs to be queried or changed, skip tools and return compact JSON.
Why short deterministic answers beat verbose responses
Structured output should be optimized for machines, not readers. Short deterministic responses reduce cost, validation failures, storage size, latency, and parsing ambiguity.
Use enums instead of prose
Bad:
{"urgency":"This appears to be very important and should be handled soon."}
Good:
{"urgency":"high"}
Use reason codes instead of explanations
Bad:
{"reason":"The user is asking about a duplicate transaction and appears frustrated..."}
Good:
{"reason_code":"duplicate_charge"}
Use IDs instead of labels when possible
Bad:
{"category":"Enterprise Account Billing Issue"}
Good:
{"category_id":"billing.enterprise"}
Use nullable fields carefully
If a field can be missing, define whether it should be null, omitted, or set to a sentinel value. Inconsistent null handling is a common source of retries.
For production automation, the best schema is usually boring:
- Required fields are truly required
- Enums are short
- Descriptions are concise
- Output has no markdown
- Explanations are optional and capped
- Confidence scores are numeric
- Dates use ISO format
- Extra fields are disallowed
This makes cheaper models more viable because the task is less open-ended.
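As an illustration of such a schema, here is a sketch expressed as a Python dict using standard JSON Schema keywords. The field names and enum values are assumptions, and providers differ in which keywords their structured-output modes enforce, so check your provider's documentation before relying on any of them.

```python
import json

# Hypothetical ticket schema: short enums, numeric confidence, no extra fields.
TICKET_SCHEMA = {
    "type": "object",
    "additionalProperties": False,
    "required": ["category", "priority", "confidence", "created_on"],
    "properties": {
        "category": {"type": "string", "enum": ["billing", "technical", "account"]},
        "priority": {"type": "string", "enum": ["low", "medium", "high", "urgent"]},
        "confidence": {"type": "number", "minimum": 0, "maximum": 1},
        "created_on": {"type": "string", "format": "date"},
        # Optional and capped: explanations stay short or are omitted entirely.
        "note": {"type": "string", "maxLength": 200},
    },
}

# Compact serialization avoids paying for indentation and spaces in every request.
compact = json.dumps(TICKET_SCHEMA, separators=(",", ":"))
pretty = json.dumps(TICKET_SCHEMA, indent=2)
print(len(compact), len(pretty))
```

Character counts are only a proxy for tokens, but the relative difference between the compact and pretty-printed forms gives a quick sense of how much of a schema is whitespace.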
The recommended routing strategy for production teams
The best cost strategy is not “always use the cheapest model” or “always use the strongest model.” It is a routed pipeline that reserves expensive models for hard cases.
Tier 1: cheap first pass for deterministic tasks
Use GPT-5 nano, Gemini 2.5 Flash-Lite, DeepSeek V4 Flash, or Mistral Small 4 for:
- Short classification
- Product normalization
- Simple extraction
- Low-risk routing
- Bulk tagging
This tier should handle 70-90% of high-volume structured requests.
Tier 2: mid-tier model for schema-heavy extraction and tool control
Use GPT-5 mini, Gemini 3 Flash, Mistral Large 3, or Claude Haiku 4.5 for:
- Invoice extraction
- Multi-field records
- Tool arguments
- Workflows with moderate ambiguity
- Cases where retries are frequent on cheaper models
This tier is the default for many production automation teams because it balances reliability and cost.
Tier 3: premium escalation for high-risk or ambiguous tasks
Use GPT-5, GPT-5.2, or Claude Sonnet 4.6 (or compare other premium choices, such as GPT-5 vs Gemini 3 Pro) for:
- Financial approvals
- Legal or compliance workflows
- Customer-impacting account actions
- Ambiguous multi-step reasoning
- Repeated validation failures
- Low-confidence outputs
This tier should handle 1-10% of tasks, not the bulk path.
The production routing pattern
A robust structured-output pipeline looks like this:
- Run cheap or mid-tier first pass.
- Validate with strict local code.
- If invalid, attempt one compact repair.
- If still invalid, escalate to stronger model.
- If confidence is below threshold, escalate.
- Log schema failures by field.
- Shrink schema and prompts based on observed failure patterns.
Do not retry the same model three or four times with the same prompt. That produces predictable waste. One repair attempt is enough. After that, route up.
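The routing pattern above can be sketched as a single function. Everything here is a hypothetical shape, not a specific SDK: `cheap_model` and `strong_model` are assumed to be callables that take a prompt string and return raw text, and `validate` is assumed to return an empty list only when the text is a valid JSON object.

```python
import json


def run_routed(prompt, cheap_model, strong_model, validate, min_confidence=0.8):
    """Return (parsed_output_or_None, route_taken)."""
    raw = cheap_model(prompt)
    errors = validate(raw)
    route = "first_pass"
    if errors:
        # One compact repair attempt on the same model, then stop retrying it.
        repair_prompt = f"Fix these validation errors: {errors}\nInvalid output: {raw}"
        raw = cheap_model(repair_prompt)
        errors = validate(raw)
        route = "repair"
    if not errors:
        parsed = json.loads(raw)
        # A missing confidence field is treated as low confidence.
        if parsed.get("confidence", 0.0) >= min_confidence:
            return parsed, route
    # Still invalid, or valid but below the confidence threshold: escalate once.
    raw = strong_model(prompt)
    if validate(raw):
        return None, "human_review"
    return json.loads(raw), "escalated"
```

Returning the route alongside the result makes it straightforward to log how often each path is taken, which is the input needed to tune thresholds and shrink schemas later.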
⚠️ Warning: Repeating the same failed structured prompt is one of the fastest ways to inflate AI API bills. One repair retry, then escalate or send to a human review queue.
Practical cost controls for structured output
Minify schemas where safe
Long descriptions improve model behavior up to a point. After that, they become expensive comments. Keep field descriptions short and direct.
Instead of:
"description": "This field should contain the priority level of the ticket based on the user's emotional tone, business impact, urgency, and whether the issue prevents them from completing their intended workflow."
Use:
"description": "Priority: low, medium, high, or urgent."
Retrieve only relevant schema sections
If you have a large taxonomy, do not include the entire taxonomy in every request. Use retrieval or a preliminary classifier to select the relevant subset.
Cap output lengths
Set clear maximums for free-text fields:
- summary: max 200 characters
- reason_code: enum
- notes: optional, max 300 characters
- entities: max 10 items
Separate thinking from final output
If your workflow needs reasoning, keep the final response compact. Do not ask for a long explanation inside the JSON unless a human will read it.
Track cost per valid output, not cost per request
The metric that matters is cost per valid structured result. A cheap model with many failures looks good in request logs and bad in business metrics. Track:
- Base attempt cost
- Repair cost
- Escalation cost
- Validation failure rate
- Field-level failure rate
- Valid outputs per dollar
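A small in-memory ledger is enough to start tracking these numbers. The class and method names below are hypothetical; a production system would typically write the same fields to a metrics store rather than keep them in memory.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class OutputLedger:
    """Accumulates spend and validation outcomes across model attempts."""
    spend: float = 0.0
    attempts: int = 0
    valid: int = 0
    field_failures: Counter = field(default_factory=Counter)

    def record(self, cost, errors):
        """Record one attempt: its dollar cost and its list of validation errors."""
        self.spend += cost
        self.attempts += 1
        if errors:
            self.field_failures.update(errors)
        else:
            self.valid += 1

    def cost_per_valid(self):
        """Total spend divided by valid outputs (infinite if none are valid yet)."""
        return self.spend / self.valid if self.valid else float("inf")

    def failure_rate(self):
        """Share of attempts that failed validation."""
        return 1 - self.valid / self.attempts if self.attempts else 0.0


ledger = OutputLedger()
ledger.record(0.0010, ["missing:total"])  # failed first attempt
ledger.record(0.0005, [])                 # successful repair
print(ledger.cost_per_valid(), ledger.failure_rate())
```

Because failed attempts add to spend without adding to the valid count, the ledger reports cost per valid output directly instead of cost per request.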
Use AI Cost Check to model different input/output sizes and compare providers before committing to a workflow. For broader token budgeting concepts, see the token guide.
Frequently asked questions
How much does structured AI output cost?
A typical structured-output task costs between $0.0001 and $0.02 per valid result depending on model, schema size, output length, and retry rate. Simple classification on GPT-5 nano can be under $100 per 500,000 tasks, while schema-heavy extraction on Claude Sonnet 4.6 can exceed $2,000 per 100,000 documents.
Is JSON mode cheaper than tool calling?
Yes, JSON mode is cheaper for single-response structured data because it avoids tool definitions and extra model turns. Tool calling is worth the additional cost when the model must query external systems, update records, or choose actions. For classification, extraction, and normalization, use JSON mode first.
How much do validation retries add to AI API costs?
Validation retries commonly add 5-30% to structured-output costs. Repair retries are cheaper than full re-runs because they can use a compact prompt, but wrong-tool retries often repeat most of the workflow. Budget using cost per valid output, not cost per initial request.
Which model should I use for structured output in production?
Use GPT-5 nano, Gemini 2.5 Flash-Lite, or DeepSeek V4 Flash for simple high-volume tasks. Use GPT-5 mini for schema-heavy extraction and tool-calling controllers. Escalate high-risk, ambiguous, or repeatedly invalid cases to GPT-5 or Claude Sonnet 4.6.
How do I reduce structured output costs?
Reduce schema tokens, shorten outputs, use enums instead of prose, expose fewer tools, cap free-text fields, and allow only one repair retry before escalation. The biggest savings usually come from routing: cheap model first, strict validation, compact repair, then premium model only for failures.
Calculate your structured output costs
Structured output pricing becomes predictable once you model schema tokens, tool overhead, output size, and retries. The fastest way to get an accurate estimate is to run your own scenarios in AI Cost Check with your actual input and output token counts.
Recommended next steps:
- Compare model prices in the AI Cost Check calculator
- Review GPT-5 mini for mid-tier structured extraction
- Compare GPT-5 vs DeepSeek V3.2 for low-cost automation tradeoffs
- Read the token guide if your team is still estimating prompts by characters
For most production teams, the winning architecture is clear: compact schema, short deterministic JSON, one repair attempt, and routed escalation. That keeps automation reliable without letting validation retries quietly become the largest line item in your AI bill.
