
Generate Types from JSON Samples: Fast Scaffolding Without Bad Contracts

How to generate TypeScript, Go, Python, and other models from sample JSON without hard-coding accidental structure into your codebase.

last updated · June 2, 2026 by @vultio

Why generated types are useful in the first place

Generating types from JSON samples is one of the fastest ways to move from raw payloads to usable application models. It removes a lot of repetitive handwork and gives you a first draft that is usually better than starting from an empty file.

The catch is that generators only know what the sample shows. They cannot infer the full business contract, missing variants, or future API behavior from one convenient example.

The core risk: sample truth is not contract truth

If your sample omits nullable fields, variant response shapes, or optional keys that appear in production, the generated type will present an overly confident model. That becomes dangerous when developers start treating inferred code as canonical truth.
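To make the gap concrete, here is a minimal sketch of what a naive generator effectively does with one sample. The helper infer_shape and both payloads are hypothetical and made up for illustration; real generators are more sophisticated, but they face the same limit on what one sample can show.

```python
import json


def infer_shape(value):
    """Describe the observed shape of a parsed JSON value (hypothetical helper)."""
    if value is None:
        return "null"
    if isinstance(value, bool):  # check bool before int: bool is a subclass of int
        return "bool"
    if isinstance(value, int):
        return "int"
    if isinstance(value, float):
        return "float"
    if isinstance(value, str):
        return "str"
    if isinstance(value, list):
        # Naive on purpose: only the first item is inspected,
        # and an empty list carries no item information at all.
        return [infer_shape(value[0])] if value else ["unknown"]
    if isinstance(value, dict):
        return {key: infer_shape(item) for key, item in value.items()}
    raise TypeError(f"not a JSON value: {value!r}")


# Made-up payloads: the sample from the docs versus something seen in production.
docs_sample = json.loads('{"id": 7, "email": "a@example.com", "plan": "pro"}')
production_payload = json.loads('{"id": 8, "email": null}')

inferred = infer_shape(docs_sample)
observed = infer_shape(production_payload)

print(inferred)  # {'id': 'int', 'email': 'str', 'plan': 'str'}
print(observed)  # {'id': 'int', 'email': 'null'}
```

The inferred shape claims that email is always a string and plan is always present. The second payload contradicts both claims, and nothing in the first sample could have warned you.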

In other words: type generation is excellent for scaffolding but weak as a substitute for schema design, API docs, or real validation.

What makes a good input sample

Nested objects and arrays that reflect the real contract, not only the simplest payload.
Nullable and optional fields that appear in production often enough to matter.
Representative enum-like values and status fields.
At least one realistic example of repeated child items, not only a single array element.

Why one sample is rarely enough

Real APIs often have hidden variation: optional expansions, nullable fields, partial error objects, pagination wrappers, and status-dependent shapes. One “happy path” payload gives you speed, but almost never gives you a full picture.

If the endpoint can return draft, archived, failed, and partially populated states, your generated model should be informed by those realities rather than by the cleanest example in the docs.

A practical workflow that works well

  1. Clean and format the sample first. Broken JSON produces misleading output or no output at all.
  2. Generate the first draft. Use the tool to get basic shapes, nesting, and property names into code quickly.
  3. Refine the result manually. Rename weak models, extract reusable nested types, and decide which fields are actually optional.
  4. Compare against schema or docs. If OpenAPI, JSON Schema, or platform docs exist, use them as the real source of truth.
  5. Re-test with more than one sample. A second and third payload often reveal missing variants immediately.
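Step 1 can be sketched with the standard library alone. The function name normalize_sample is an assumption for illustration. It only checks that the text parses as JSON and pretty-prints it; it says nothing about whether the sample is representative.

```python
import json


def normalize_sample(raw_text):
    """Parse a pasted sample and return it pretty-printed, or fail with a location."""
    try:
        parsed = json.loads(raw_text)
    except json.JSONDecodeError as error:
        # Report where parsing stopped so the sample can be fixed by hand.
        raise ValueError(
            f"sample is not valid JSON (line {error.lineno}, "
            f"column {error.colno}): {error.msg}"
        ) from error
    # Key order is preserved; only whitespace and indentation change.
    return json.dumps(parsed, indent=2, ensure_ascii=False)


print(normalize_sample('{"b":1,"a":[1,2]}'))
```

Feeding the generator the normalized text rather than the raw paste means a stray trailing comma is caught before it can turn into a confusing or empty result.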

Generated models still need naming judgment

Generators are good at structure, but not at semantics. They may create vague names, awkward nested types, or duplicated helper objects that make perfect sense to the algorithm and very little sense to humans maintaining the code later.

The best teams treat generated output as scaffolding: useful enough to accelerate setup, but still open to refactoring into cleaner, domain-aware model names once the shape is visible.

Where generation shines most

Use case | Why it helps
Frontend API integration | You get interfaces and object shapes quickly enough to unblock component work.
Backoffice or scripting utilities | Generators remove boilerplate for one-off or internal data transforms.
Cross-language model bootstrapping | Useful when you need an immediate first draft in TypeScript, Go, Python, or Java.
Documentation support | Generated output can expose hidden complexity in the payload faster than prose alone.

Generation vs validation vs schema design

Layer | Primary job
Type generation | Bootstrap developer-facing models quickly from examples.
Runtime validation | Reject malformed or unsafe payloads during execution.
Schema design | Define the contract intentionally so teams know what is allowed over time.
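As a rough illustration of the runtime validation layer, the sketch below hand-checks a hypothetical order payload. The field names (id, status, note) and the allowed status values are assumptions made up for this example; in practice a schema or validation library would usually replace hand-written checks like these.

```python
def validate_order(payload):
    """Return a list of problems found in a hypothetical 'order' payload."""
    if not isinstance(payload, dict):
        return ["payload must be an object"]

    problems = []

    order_id = payload.get("id")
    # bool is a subclass of int in Python, so exclude it explicitly.
    if not isinstance(order_id, int) or isinstance(order_id, bool):
        problems.append("id must be an integer")

    # Tuple membership uses equality, so unhashable values do not raise here.
    if payload.get("status") not in ("draft", "archived", "failed"):
        problems.append("status must be one of draft, archived, failed")

    note = payload.get("note")
    if note is not None and not isinstance(note, str):
        problems.append("note must be a string or null when present")

    return problems


print(validate_order({"id": 1, "status": "draft"}))          # []
print(validate_order({"id": "1", "status": "?", "note": 5}))  # three problems
```

The validator runs against every payload at execution time, which is the part a generated type cannot do on its own.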

A strong review habit after generation

Check whether nullable fields were inferred correctly or accidentally made mandatory.
Look for repeated anonymous nested objects that deserve extraction into shared models.
Review arrays carefully: does the generated item type still make sense when the list is empty or partially populated?
Compare generated names against your project conventions before committing them into shared code.
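The array check in particular is easy to script. The sketch below assumes the array holds JSON objects and reports which keys appear in every item and which appear only in some; the helper name item_key_report and the sample items are hypothetical.

```python
def item_key_report(items):
    """Summarize which keys appear in every object of a list and which only in some."""
    if not items:
        # An empty list says nothing about the item type.
        return {"always": set(), "sometimes": set(), "empty": True}
    key_sets = [set(item) for item in items]
    always = set.intersection(*key_sets)
    sometimes = set.union(*key_sets) - always
    return {"always": always, "sometimes": sometimes, "empty": False}


# Made-up line items: the second one carries an extra key.
line_items = [
    {"sku": "A-1", "qty": 1},
    {"sku": "B-2", "qty": 2, "discount": 0.1},
]
report = item_key_report(line_items)
print(sorted(report["always"]))     # ['qty', 'sku']
print(sorted(report["sometimes"]))  # ['discount']
```

Any key listed under "sometimes" is a candidate for an optional field in the item type, and an "empty" result is a reminder that the generator had nothing to infer from.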

Common mistakes

Generating from a toy example

The output looks clean but misses real nullability, nested variants, and repeated edge cases.

Skipping human cleanup

Generated names and model boundaries are often technically valid but awkward for long-term use.

Confusing types with validation

An interface or struct does not guarantee runtime payload safety.
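A small Python sketch shows the gap. The User model and the payload are made up for illustration; the same point applies to a TypeScript interface, which is erased before the code runs.

```python
import json
from typing import TypedDict


class User(TypedDict):
    id: int
    email: str


def load_user(raw_text) -> User:
    # The return annotation is a promise to the type checker only.
    # Nothing here inspects the parsed value at runtime.
    return json.loads(raw_text)


user = load_user('{"id": "not-a-number", "email": null}')
print(type(user["id"]).__name__)  # str
print(user["email"])              # None
```

The code runs without error even though both fields contradict the declared model, so any failure surfaces later and further from the source.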

Locking in accidental field names

One messy sample can fossilize temporary or poorly named keys into your codebase.

What multiple samples reveal that one sample hides

A second or third payload often changes the model more than developers expect. You may discover that a field is sometimes a string and sometimes null, that an array can arrive empty, or that a nested object only appears for paid accounts, admin users, or failed states. Those differences are exactly the kind of edge cases a one-sample workflow tends to hide.
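One way to surface these differences is to summarize several samples side by side. The sketch below looks only at top-level keys, and the helper names and payloads are hypothetical; it is meant to show the idea, not to replace a real generator.

```python
def value_kind(value):
    """Coarse JSON kind of a parsed value."""
    if value is None:
        return "null"
    if isinstance(value, bool):
        return "bool"
    if isinstance(value, (int, float)):
        return "number"
    if isinstance(value, str):
        return "str"
    if isinstance(value, list):
        return "array"
    return "object"


def summarize_fields(samples):
    """For each top-level key, list the kinds seen and whether the key was ever missing."""
    all_keys = set()
    for sample in samples:
        all_keys.update(sample)
    summary = {}
    for key in sorted(all_keys):
        present = [sample[key] for sample in samples if key in sample]
        summary[key] = {
            "kinds": sorted({value_kind(value) for value in present}),
            "optional": len(present) < len(samples),
        }
    return summary


# Made-up payloads covering a few branches of one endpoint.
samples = [
    {"id": 1, "name": "Ada", "tags": ["a"]},
    {"id": 2, "name": None, "tags": []},
    {"id": 3, "name": "Lin", "tags": [], "billing": {"plan": "pro"}},
]
for key, info in summarize_fields(samples).items():
    print(key, info)
```

With these three samples, name shows up as both str and null, and billing is flagged as optional because it appears in only one payload. The first sample alone would have suggested neither.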

This is why representative sampling matters more than sample quantity for its own sake. You do not need twenty random payloads. You need a small set that covers the meaningful branches in the contract: success, partial success, failure, minimal payload, and maximal payload.

Where generated types can mislead teams

The danger is not just incorrect syntax. The deeper risk is false confidence. Once a generated interface or struct lands in the codebase, other developers may assume someone already thought through field optionality, lifecycle transitions, and backward compatibility. In reality, the model may simply reflect the luck of whatever sample happened to be pasted into the tool that day.

That is why generated output should be labeled mentally as provisional. It is a fast sketch of observed structure, not proof that the API team intended every property to be mandatory, stable, or globally available.

Naming is where human judgment matters most

A generator can tell you that there is an object nested inside another object. It cannot reliably tell you whether that object should be called UserProfile, AccountOwner, BillingContact, or something more domain-specific. Good names carry intent, and intent is rarely visible in raw JSON alone.

If you skip the naming pass, you often end up with code that is technically typed but semantically muddy. That slows onboarding, blurs domain boundaries, and makes future refactors harder because nobody is fully sure what the generated model was meant to represent.

A post-generation checklist worth keeping

  1. Compare the output against docs or schema, so you can spot missing variants immediately.
  2. Review optional and nullable fields line by line, because generators often overfit to whichever sample shape they saw.
  3. Rename ambiguous models before they spread into components, services, and public APIs.
  4. Add runtime validation wherever bad payloads would hurt, because static types alone do not protect execution paths.
  5. Regenerate only when needed, and re-review the diff rather than assuming fresh output is automatically safer output.
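For the last item, a plain text diff between the committed model and the regenerated one is often enough to focus the review. The sketch below uses Python's difflib; the function name, the file labels, and the model source are assumptions for illustration.

```python
import difflib


def generated_diff(previous_source, regenerated_source):
    """Return a unified diff between two versions of generated model code."""
    return "".join(
        difflib.unified_diff(
            previous_source.splitlines(keepends=True),
            regenerated_source.splitlines(keepends=True),
            fromfile="models.previous",      # hypothetical label
            tofile="models.regenerated",     # hypothetical label
        )
    )


# Made-up generated output before and after adding a second sample.
previous = "class User:\n    id: int\n    email: str\n"
regenerated = "class User:\n    id: int\n    email: str | None\n"

print(generated_diff(previous, regenerated))
```

An empty diff means regeneration changed nothing. A non-empty one points at exactly the fields whose optionality or type needs a human decision before the change is committed.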