# Simulacra Headless API

> Operating guide for the Simulacra Headless API.


## Canonical URLs

- [Interactive API docs](https://api.simulacra-data.com/__docs__/)
- [OpenAPI JSON, primary contract](https://api.simulacra-data.com/openapi.json)
- [Error catalog](https://api.simulacra-data.com/errors)
- [Full agent operating manual](https://api.simulacra-data.com/llms-full.txt)
- [Health check](https://api.simulacra-data.com/healthz)

Use HTTPS endpoints and `/openapi.json` as the machine-readable
contract for this API.


## Core Workflow

1. If credentials do not exist: `POST /v1/signup`, poll
   `/v1/signup/{request_id}`, then claim the one-time secret at
   `POST /v1/credential-claims`.
2. Mint an Auth0 machine-to-machine bearer token.
3. `POST /v1/datasets` with a seed file and an `Idempotency-Key`.
4. Poll `/v1/jobs/{job_id}` when the response is async.
5. `GET /v1/datasets/{dataset_id}/schema` before conditioning.
6. `POST /v1/datasets/{dataset_id}/generations` with an
   `Idempotency-Key`.
7. Poll `/v1/jobs/{job_id}` or `/v1/generations/{generation_id}`.
8. Download `artifact_url` after status is `ready` or `partial`.


## Retry And Polling

- Treat `202 Accepted` as progress, not failure.
- The examples poll every 2 seconds; production agents should use
  jittered exponential backoff with a deadline.
- Retry transient `429`, `500`, `502`, `503`, and `504` responses.
  For POST requests, retry only with the same `Idempotency-Key`.
- Do not retry `400`, `401`, `402`, `403`, or `404` without changing
  the request.
- When an error body includes `code`, fetch `/errors/{code}` before
  deciding whether the request is retry-safe.
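
The polling advice above can be sketched as a small delay generator plus a loop. This is a minimal illustration, not the API's own client: `backoff_delays`, `poll_until_terminal`, and their parameters are hypothetical names, the jitter strategy ("equal jitter") is one common choice among several, and the deadline here bounds total sleep time rather than wall-clock time.

```python
import random
import time

# Terminal job statuses named in this guide.
TERMINAL = {"completed", "failed", "expired", "cancelled"}


def backoff_delays(base=2.0, cap=30.0, deadline=600.0, rng=None):
    """Yield jittered, exponentially growing delays that fit inside the deadline."""
    rng = rng or random.Random()
    elapsed, attempt = 0.0, 0
    while True:
        ceiling = min(cap, base * 2 ** min(attempt, 16))
        # Equal jitter: half of the ceiling is fixed, half is random.
        delay = ceiling / 2 + rng.uniform(0, ceiling / 2)
        if elapsed + delay > deadline:
            return
        elapsed += delay
        attempt += 1
        yield delay


def poll_until_terminal(fetch_status, sleep=time.sleep, **backoff):
    """Call fetch_status() until it returns a terminal status or delays run out."""
    status = fetch_status()
    for delay in backoff_delays(**backoff):
        if status in TERMINAL:
            return status
        sleep(delay)
        status = fetch_status()
    if status in TERMINAL:
        return status
    raise TimeoutError(f"still {status!r} when the deadline passed")
```

In practice `fetch_status` would wrap a `GET /v1/jobs/{job_id}` call and return the body's `status` field; passing it in as a callable keeps the loop testable without network access.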


## Coding Agent Rules

- Treat `/openapi.json` as canonical.
- Prefer stable `operationId` values over path-derived names.
- Follow OpenAPI Links when choosing the next operation.
- Generate clients from `/openapi.json` when useful.
- Reuse helper patterns for token minting, idempotent POST retries,
  polling, schema-first conditions, downloads, and problem-code handling.
- Treat operations with `x-openai-isConsequential: true` as
  customer-affecting actions that require explicit user intent.
- Use idempotency keys on POST retries.
- Do not guess categorical levels or numeric ranges; fetch schema.
- Categorical condition values are desired outcome percentages, not
  internal model parameters.
- Numeric conditions require both `min` and `max`.
- Tight scenarios can produce fewer rows than requested even when
  quota remains; usage billing is counted on generated rows rather
  than requested rows.
- Preserve `X-Request-Id`, body `request_id`, `job_id`, `dataset_id`,
  and `generation_id` for support.
- Preserve problem `code` and `type`; they are stable diagnostics.
- Never log bearer tokens, client secrets, customer data, claim tokens,
  or artifact URLs.
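
The schema-first rules above can be checked locally before a generation request is sent. The sketch below assumes the schema and condition shapes used in the Python quickstart (a list of column objects with `name` and optional `levels`, and a `conditions` object with `categorical` and `numeric` keys). `check_conditions` is a hypothetical helper, and the `min <= max` check is an added assumption rather than a documented server rule.

```python
def check_conditions(schema_columns, conditions):
    """Return a list of problems; an empty list means nothing obvious is wrong."""
    by_name = {col["name"]: col for col in schema_columns}
    problems = []

    for column, shares in conditions.get("categorical", {}).items():
        if column not in by_name:
            problems.append(f"unknown column: {column}")
            continue
        known = set(by_name[column].get("levels") or [])
        for level in shares:
            if level not in known:
                problems.append(f"unknown level {level!r} for column {column}")

    for column, bounds in conditions.get("numeric", {}).items():
        if column not in by_name:
            problems.append(f"unknown column: {column}")
            continue
        if "min" not in bounds or "max" not in bounds:
            problems.append(f"column {column} needs both min and max")
        elif bounds["min"] > bounds["max"]:
            # Assumption: an inverted range is never intended.
            problems.append(f"column {column} has min greater than max")

    return problems
```

A non-empty result is a signal to re-fetch `/v1/datasets/{dataset_id}/schema` and rebuild the conditions rather than to submit and wait for a `400`.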


## Security And Retention Defaults

- Access is company-approved and authenticated with Auth0 M2M tokens.
- SLA, privacy, data-processing, and security-documentation terms are
  governed by each customer's enterprise agreement.
- Dataset and artifact retention are explicit API lifecycle concepts;
  do not assume indefinite storage.


## Overview And Examples

## What Is Simulacra?


Simulacra is a research simulation and what-if scenario modeling
platform for consumer and market research teams. It augments existing
studies with high-fidelity synthetic data so teams can expand sample
sizes, rebalance cohorts, explore low-incidence audiences, and build
scenario models from the data they already trust.

The differentiator is conditioning: instead of only asking for more
rows, you can ask what the full dataset should look like under a
specific desired outcome mix, such as a premium-heavy segment, a
younger target audience, or a high-intent buying scenario. Simulacra
then generates a coherent synthetic dataset around that scenario,
subject to feasibility under the trained model.

The Headless API exposes that workflow programmatically: upload a seed
dataset, wait for training, generate scenario-conditioned synthetic
rows, and download the result as Parquet, CSV, or Arrow.


## Security, Compliance, And Retention


This API serves approved company tenants, not anonymous public
traffic. Resource routes require Auth0 machine-to-machine bearer
tokens, operator actions are audited, and production infrastructure
is monitored with security alerts and health probes.

![SOC 2 audited controls](/assets/compliance/soc2.png)
![ISO/IEC 27001 certified controls](/assets/compliance/iso27001.png)

Simulacra maintains SOC 2 audited and ISO/IEC 27001 certified
controls for this API and the broader platform.


- **Authentication:** OAuth2 client-credentials through Auth0. Treat
  `client_secret` as a production secret and store it in your secret
  manager.
- **Authorization:** credentials are tenant-scoped and tied to a
  Simulacra-approved company. Sales and support users do not handle
  customer `client_secret` values.
- **Transport and storage:** API traffic uses TLS. Retained customer
  artifacts are encrypted at rest; standard managed mode uses
  Simulacra-managed controls, while enterprise storage mode can
  deliver artifacts through customer-controlled storage/key paths.
- **Default retention:** trained dataset artifacts and generated
  outputs default to 24-hour retention. Explicit extension is bounded
  by a 7-day maximum continuous dataset retention window. Managed
  artifact download URLs are short-lived, with a 15-minute default.
- **Delete semantics:** `DELETE /v1/datasets/{dataset_id}` removes
  active dataset access and associated retrievable dataset artifacts
  from the API surface; generation artifacts expire on their own
  retention windows.
- **Secrets and claims:** approved signup and rotation secrets are
  delivered through encrypted one-time credential claims. Claim
  tokens expire and cannot be reused after the secret is claimed.
- **Reliability:** long-running work is represented as async jobs.
  Preserve `X-Request-Id`, body `request_id`, `job_id`, `dataset_id`,
  and `generation_id` in your own support logs.
- **SLA and procurement:** these defaults are platform controls.
  Uptime commitments, support response targets, data-processing
  terms, and audit report access are governed by your enterprise
  agreement with Simulacra.


## End-To-End Setup Flow


Use this as the happy path for a first integration. The endpoint
pages below are detailed references; this flow shows how their
values connect.


1. Request access with `POST /v1/signup`; save the returned
   `request_id`. Re-submitting the same contact email returns
   the existing pending or approved request instead of creating
   a second queue entry.
2. Poll `GET /v1/signup/{request_id}` until the request is
   approved.
3. Exchange the approved `credential_claim_token` at
   `POST /v1/credential-claims`; store the returned
   `client_secret` immediately.
4. Mint an Auth0 bearer token with the client credentials.
5. Upload a seed dataset with `POST /v1/datasets`; save
   `dataset_id` and poll `job_id` when the response is async.
6. Inspect `GET /v1/datasets/{dataset_id}/schema`; use this
   cleaned schema, not your original headers, to build
   conditions.
7. Generate synthetic rows with
   `POST /v1/datasets/{dataset_id}/generations`; save
   `generation_id` and poll `job_id` when needed.
8. Fetch `GET /v1/generations/{generation_id}` until status is
   `ready` or `partial` and `artifact_url` is present. `partial`
   means the scenario was valid but fewer rows were feasible than
   requested; usage billing is counted on generated rows rather
   than requested rows.
9. Download the artifact. Managed-mode URLs point back to this
   API and still require the bearer token; enterprise URLs may be
   absolute customer-storage URLs.
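
Step 9 implies two URL shapes, which a client might handle as sketched below. This is a cautious illustration under stated assumptions: `artifact_request` is a hypothetical helper, and the choice to withhold the bearer token from hosts other than the API host is an assumption (this guide does not say whether enterprise storage URLs expect that header).

```python
from urllib.parse import urlsplit


def artifact_request(api_base, artifact_url, token):
    """Return (url, headers) for downloading an artifact."""
    if urlsplit(artifact_url).scheme:
        # Absolute URL, for example enterprise customer storage.
        url = artifact_url
    else:
        # Relative managed-mode URL: resolve it against the API base.
        url = api_base.rstrip("/") + "/" + artifact_url.lstrip("/")

    # Assumption: only send the bearer token back to the API host itself.
    same_host = urlsplit(url).netloc == urlsplit(api_base).netloc
    headers = {"Authorization": f"Bearer {token}"} if same_host else {}
    return url, headers
```

The returned pair can be passed straight to an HTTP client, for example `requests.get(url, headers=headers, timeout=120)`.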


## What's New


- 2026-05-05: response identifier fields such as `dataset_id`,
  `generation_id`, `job_id`, and `artifact_url` are JSON scalars
  as documented. If you tested against an earlier preview and
  added a workaround such as `response['dataset_id'][0]`, remove
  it before continuing.


## Versioning And Changelog Policy


The `/v1/*` contract is stable for production integrations.
Simulacra may add optional fields, new enum values, new endpoints,
or richer examples without a version bump. Breaking changes get
at least 30 days' notice or a future `/v2` surface.

Security, legal, or emergency reliability fixes may move faster
than the 30-day notice target, but should include direct customer
communication and a clear rollback or migration path when possible.

Customer-visible contract changes are listed in **What's New** above.
If your tooling consumes `/openapi.json`, diff the spec before
each deploy and treat unknown new fields as forward-compatible.
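
A pre-deploy spec diff does not need special tooling. The sketch below compares the operations in two already-parsed OpenAPI documents; `diff_operations` is a hypothetical helper that looks only at the `paths` object, so it does not detect field-level or enum changes.

```python
HTTP_METHODS = {"get", "put", "post", "delete", "patch", "head", "options", "trace"}


def operations(spec):
    """Collect (METHOD, path) pairs from an OpenAPI document's paths object."""
    return {
        (method.upper(), path)
        for path, item in spec.get("paths", {}).items()
        for method in item
        if method in HTTP_METHODS
    }


def diff_operations(old_spec, new_spec):
    """Report operations that appeared or disappeared between two specs."""
    old, new = operations(old_spec), operations(new_spec)
    return {"added": sorted(new - old), "removed": sorted(old - new)}
```

Under the policy above, entries under `added` are expected to be forward-compatible, while anything under `removed` deserves a look before deploying.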


## Client Integration


`/openapi.json` is the canonical machine-readable contract for
generated clients and API tooling. Direct HTTPS clients should
follow the examples below and the documented helper patterns for
Auth0 token minting, signup polling, one-time credential claims,
idempotent retries, async job polling, schema-first conditions,
artifact download, and problem-code classification.

Keep generated clients thin: preserve raw response fields, pass
through `X-Request-Id` and problem `code` values, and put polling,
retry, schema-resolution, and download behavior in a small helper
layer owned by your application.


## Common Mistakes To Avoid


- Do not build conditions from original column names. Training can
  rename columns into identifier-safe form; for example,
  `purchase_intent` may become `purchase.intent`, depending on the
  cleaning path. Always GET the schema first.
- Cleaning can drop low-signal columns, near-zero-variance columns,
  and rare categorical levels that are too sparse to model reliably.
  If a column or level is not in the schema response, do not
  condition on it.
- `credential_claim_token` is one-time-use. Do not close the
  response before storing the returned `client_secret` in your
  secret manager.
- Use an `Idempotency-Key` on every POST retry. Retrying without
  one can create duplicate work and duplicate usage charges.
- Tight scenarios can produce fewer rows than requested even when
  quota remains; Simulacra will never return more rows than
  `row_count`. Usage billing is counted on generated rows rather
  than requested rows.
- Do not put bearer tokens, client secrets, artifact URLs, or
  customer data in chat, browser consoles, notebooks shared with
  third parties, or application logs.
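
One way to keep secrets out of logs is to redact known sensitive fields before a response body is logged. The sketch below is illustrative only: `redact` and `SENSITIVE_KEYS` are hypothetical names, the key list is drawn from fields mentioned in this guide, and it does not cover customer data inside artifacts.

```python
SENSITIVE_KEYS = {
    "access_token",
    "authorization",
    "client_secret",
    "claim_token",
    "credential_claim_token",
    "artifact_url",
}


def redact(value):
    """Return a copy of a JSON-like value with sensitive fields masked."""
    if isinstance(value, dict):
        return {
            key: "[REDACTED]" if str(key).lower() in SENSITIVE_KEYS else redact(item)
            for key, item in value.items()
        }
    if isinstance(value, list):
        return [redact(item) for item in value]
    return value
```

Identifiers such as `request_id`, `job_id`, `dataset_id`, and `generation_id` pass through untouched, so support logs stay useful.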


## Retries, Quotas, And Billing


- `202 Accepted` is normal for dataset training and generation.
  Poll `/v1/jobs/{job_id}` every 2 seconds for quickstarts; production
  clients should use jittered exponential backoff with a deadline.
- Retry `POST /v1/datasets` and
  `POST /v1/datasets/{dataset_id}/generations` only with an
  `Idempotency-Key`. Reusing the same key makes the retry safe;
  changing the key creates new work.
- Retry transient `429`, `500`, `502`, `503`, and `504` responses
  with backoff. Do not retry `400`, `401`, `402`, `403`, or `404`
  without changing the request.
- `402` means the request exceeds your company's active row
  subscription or request cap.
- Per-request row caps are safety limits. Tight scenarios can still
  produce fewer rows than requested even when quota remains;
  usage billing is counted on generated rows rather than requested
  rows.
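
The retry rules above fit in a few lines once the HTTP call is passed in as a function. In this sketch, `post_idempotent` and its `send` callback are hypothetical: `send(key)` is assumed to perform the POST with the given `Idempotency-Key` and return `(status_code, body)`. The delay doubles without jitter to keep the example short; production code would add jitter and a deadline.

```python
import time
import uuid

TRANSIENT = {429, 500, 502, 503, 504}


def post_idempotent(send, max_attempts=4, base_delay=1.0, sleep=time.sleep):
    """Retry a POST on transient statuses, reusing one Idempotency-Key."""
    key = str(uuid.uuid4())  # generated once, reused for every retry
    for attempt in range(max_attempts):
        status, body = send(key)
        last_attempt = attempt == max_attempts - 1
        # Non-transient statuses (including 400-404) are returned unchanged.
        if status not in TRANSIENT or last_attempt:
            return status, body, key
        sleep(base_delay * 2 ** attempt)
```

Returning the key alongside the response lets the caller persist it, so a retry after a process restart can still reuse the same key instead of creating new work.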


## Error Catalog


Problem responses include `type`, `title`, `status`, and `detail`.
When the API can classify the failure, the body also includes a stable
`code` such as `simio_unknown_condition_column`, and `type` points to
`https://api.simulacra-data.com/errors/{code}`. Open that URL for
the cause, fix, retryability, and support guidance. Preserve
`X-Request-Id` and response-body `request_id` values when contacting
support.

Error codes are part of the v1 contract. New codes may be added;
renames or removals require a migration window.

- Catalog index: `/errors`
- Example: `/errors/simio_unknown_condition_column`
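
A client can turn a problem body into the small record it needs for logging and catalog lookup. The sketch below is an assumption-labelled illustration: `summarize_problem` is a hypothetical helper, and it builds the catalog URL from `code` using the URL pattern described above rather than trusting `type` blindly.

```python
ERROR_BASE = "https://api.simulacra-data.com/errors"


def summarize_problem(problem):
    """Extract the stable diagnostics from a problem response body."""
    code = problem.get("code")
    return {
        "code": code,
        "type": problem.get("type"),
        "status": problem.get("status"),
        "request_id": problem.get("request_id"),
        # Only classified failures carry a code, so the URL may be absent.
        "catalog_url": f"{ERROR_BASE}/{code}" if code else None,
    }
```

For a body whose `code` is `simio_unknown_condition_column`, `catalog_url` is the catalog entry shown in the example bullet above.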


## Troubleshooting


- **401 Unauthorized:** mint a fresh Auth0 token and verify the
  audience is `https://api.simulacra-data.com`. The Authorize
  panel and the `Authorization: Bearer …` header expect the JWT
  returned by Auth0 (begins with `eyJ…`), NOT your `client_secret`.
  See *Authorize The Interactive Panel* below for the exchange.
  If this happens during upload, reselect the seed file before
  retrying.
- **400 request body is empty:** set `Content-Type: application/json`
  for JSON endpoints and use multipart form-data only for dataset
  uploads.
- **400 unknown condition column or level:** call
  `/v1/datasets/{dataset_id}/schema` and rebuild conditions from the
  cleaned schema. Original seed names may have been normalized.
- **202 keeps polling:** keep polling until `completed`, `failed`,
  `expired`, or `cancelled`; use a deadline and preserve `job_id`.
- **partial generation:** the scenario was feasible only for a subset
  of the requested rows. Inspect `rows_generated` before using the
  artifact downstream.
- **404 on copied IDs:** identifier fields are JSON strings. If your
  client still indexes `[0]`, it may be sending a one-character ID.


## Request Access


API access is request-and-approve. Before any of the credentials in
the Quickstart will work, you need an approved tenant.

1. `POST /v1/signup` with your `company_name` and `contact_email`.
   No login is required for this access request; leave the
   Authorization field blank. The response includes a `request_id`.
2. `GET /v1/signup/{request_id}` returns `pending`, `approved`, or
   `declined`. This check is also unauthenticated because
   credentials do not exist until the request is approved.
3. Once Simulacra approves, the status response includes a
   `client_id` and a one-time `credential_claim_token`.
4. `POST /v1/credential-claims` with that token returns the
   `client_secret` exactly once, plus the Auth0 token URL and
   audience. Store the secret in your secret manager immediately.


```sh
RESP=$(curl -sS -X POST "https://api.simulacra-data.com/v1/signup" \
  -H "content-type: application/json" \
  -d '{
    "company_name": "Acme Research",
    "contact_email": "data-science@acme.example"
  }')
# 202 Accepted for a new request, or 200 OK if this email already
# has a pending or approved request. Both shapes include request_id.
# Save request_id; the polling URL needs it verbatim. Valid format
# is ^req_[A-Za-z0-9]{1,64}$ (no dots, dashes, or whitespace).
REQUEST_ID=$(echo "${RESP}" | jq -r .request_id)

curl -sS "https://api.simulacra-data.com/v1/signup/${REQUEST_ID}"
# -> pending until an operator approves; then status flips to
#    "approved" and includes `client_id` plus a one-time
#    `credential_claim_token`.

APPROVAL=$(curl -sS "https://api.simulacra-data.com/v1/signup/${REQUEST_ID}")
CLAIM_TOKEN=$(echo "${APPROVAL}" | jq -r .credential_claim_token)

curl -sS -X POST "https://api.simulacra-data.com/v1/credential-claims" \
  -H "content-type: application/json" \
  -d "$(jq -nc --arg token "${CLAIM_TOKEN}" '{claim_token: $token}')"
# -> returns client_id, client_secret, token_url, audience, grant_type.
#    The claim token is one-time-use; put client_secret in your
#    secret manager, not in source code or chat.
```
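
Because the polling URL needs `request_id` verbatim, a client may want to validate it before building the URL. The sketch below applies the format given in the shell comments; `signup_status_url` is a hypothetical helper name.

```python
import re

# Format stated in this guide: req_ followed by 1 to 64 alphanumerics.
REQUEST_ID_PATTERN = re.compile(r"req_[A-Za-z0-9]{1,64}")


def signup_status_url(api_base, request_id):
    """Build the signup polling URL, rejecting malformed request IDs."""
    if not REQUEST_ID_PATTERN.fullmatch(request_id):
        raise ValueError(f"malformed request_id: {request_id!r}")
    return f"{api_base.rstrip('/')}/v1/signup/{request_id}"
```

Using `fullmatch` rather than `match` also rejects IDs with trailing whitespace or a stray newline, which is a common result of copying values out of shell output.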


If you are testing this from the interactive docs panel below, click
**TRY** on `POST /v1/signup`, fill the `company_name` and
`contact_email` fields, and **leave the Authorization field blank**.
This is the access-request step before credentials exist.


## Authorize The Interactive Panel


The interactive docs authenticate with the **HTTP Bearer** field
inside the **AUTHENTICATION** panel. That field expects an Auth0 JWT
access token, not your `client_secret`. JWTs always start with `eyJ`
and contain two dots; your `client_secret` has no fixed prefix and
is a single high-entropy ~64-character string with no dots. Paste
the wrong one and every protected call returns 401.
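
The shape difference described above is easy to check before a value is placed in an `Authorization` header. This is a rough sanity check only, assuming the two properties stated in this section; `looks_like_jwt` is a hypothetical helper and does not verify the token's signature, audience, or expiry.

```python
def looks_like_jwt(value):
    """Return True when value has the JWT shape described in this guide."""
    return value.startswith("eyJ") and value.count(".") == 2


def bearer_header(value):
    """Build the Authorization header, refusing values that are not JWT-shaped."""
    if not looks_like_jwt(value):
        raise ValueError("expected an Auth0 JWT access token, not a client_secret")
    return {"Authorization": f"Bearer {value}"}
```

A guard like this turns a confusing run of 401 responses into one clear local error.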


Click **AUTHENTICATION** in the left navigation. If you already
have a JWT, paste it into the HTTP Bearer field. If you only have
your `client_id` and `client_secret`, use the *Exchange credentials
and fill HTTP Bearer* form in that same panel. It exchanges the
credentials server-side and loads the JWT into HTTP Bearer for you.
Try-It on protected endpoints then succeeds. The token is valid for
~24 hours; rerun the form when it expires.


If you prefer the command line:


```sh
ACCESS_TOKEN=$(curl -sS -X POST "https://simulacra-data.us.auth0.com/oauth/token" \
  -H "content-type: application/json" \
  -d '{
    "client_id":     "YOUR_CLIENT_ID",
    "client_secret": "YOUR_CLIENT_SECRET",
    "audience":      "https://api.simulacra-data.com",
    "grant_type":    "client_credentials"
  }' | jq -r .access_token)

echo "$ACCESS_TOKEN"
```


### Rotating or replacing your client_secret


If you still have a working `client_secret` (or a valid bearer
token minted from it), use `POST /v1/credential-rotations`. It
rotates the secret at Auth0 and returns a one-time
`credential_claim_token` you redeem at `POST /v1/credential-claims`
for the new `client_secret`. The endpoint is authenticated; the
rate limit is three rotations per 24 hours per client.

In the interactive docs, `POST /v1/credential-rotations` gets a
per-tab `Idempotency-Key` automatically. If the browser reloads
or the response disappears before you redeem the claim token, retry
the same operation in that tab; the API returns the same
`credential_claim_token` without rotating Auth0 again.

If you have lost the `client_secret` entirely and cannot
authenticate, email
[support@simulacra-data.com](mailto:support@simulacra-data.com)
and reference your `client_id`. Simulacra will rotate the secret
operator-side and deliver the new `credential_claim_token` over a
secure channel. Resubmitting the signup form does NOT re-issue a
`credential_claim_token` once your initial credential has been
claimed.


## Quickstart


Pick the language tab that matches your stack. Each script below is a
complete, runnable end-to-end flow: it mints a bearer token, uploads a
toy seed dataset, polls until training finishes, fetches the trained
schema, generates scenario-conditioned synthetic rows, downloads the
artifact, and reads it back. The flow is identical across languages
(only the syntax changes), and the conditioning request body is the
same JSON structure everywhere.

The scenario examples are intentionally built from normal client-side
objects: pandas data frames, R data frames, Julia DataFrames, or a
shell variable for SPSS automation. Your HTTP client serializes those
objects to JSON; you should not be hand-maintaining JSON files in a
production integration.

Always fetch `/v1/datasets/{dataset_id}/schema` before building
conditions. The trained schema is the customer-facing contract
after cleaning: names may be normalized, columns may be dropped,
and rare categorical levels may be removed.
The examples below resolve condition columns from the returned
schema before submitting the generation request.

All four scripts read these environment variables:


```sh
export SIMIO_CLIENT_ID="..."
export SIMIO_CLIENT_SECRET="..."
export SIMIO_AUTH0_DOMAIN="simulacra-data.us.auth0.com"
export SIMIO_AUTH0_AUDIENCE="https://api.simulacra-data.com"
export SIMIO_API_BASE="https://api.simulacra-data.com"
```


Jump to: [Python](#quickstart-python) · [R](#quickstart-r) ·
[Julia](#quickstart-julia) · [SPSS](#quickstart-spss)


<a id="quickstart-python"></a>

### Python


Uses `requests` for HTTP, `pandas` + `numpy` for seed/scenario
construction, and `pyarrow` (an optional `pandas` dependency) for
Parquet I/O.


```python

import os, time, requests

import numpy as np

import pandas as pd


AUTH0_DOMAIN   = os.environ["SIMIO_AUTH0_DOMAIN"]

AUTH0_AUDIENCE = os.environ["SIMIO_AUTH0_AUDIENCE"]

CLIENT_ID      = os.environ["SIMIO_CLIENT_ID"]

CLIENT_SECRET  = os.environ["SIMIO_CLIENT_SECRET"]

API_BASE       = os.environ["SIMIO_API_BASE"]


# 1. Mint a bearer token.

token = requests.post(

    f"https://{AUTH0_DOMAIN}/oauth/token",

    json={

        "client_id":     CLIENT_ID,

        "client_secret": CLIENT_SECRET,

        "audience":      AUTH0_AUDIENCE,

        "grant_type":    "client_credentials",

    },

    timeout=30,

).json()["access_token"]

auth = {"Authorization": f"Bearer {token}"}


# 2. Build a toy seed dataset and write it to seed.csv.

rng = np.random.default_rng(7)

n = 800

df = pd.DataFrame({

    "age":     rng.integers(18, 66, n),

    "segment": rng.choice(["Value", "Mainstream", "Premium"],

                          n, p=[0.35, 0.45, 0.20]),

    "channel": rng.choice(["Retail", "Online", "Club"],

                          n, p=[0.50, 0.35, 0.15]),

    "region":  rng.choice(["Northeast", "South", "Midwest", "West"], n),

})

intent  = 42.0

intent += (df["segment"] == "Premium") * 18

intent += (df["channel"] == "Online")  *  8

intent += ((df["age"] - 35) / 3).clip(-8, 8)

intent += rng.normal(0, 10, n)

df["purchase_intent"] = intent.round().clip(0, 100).astype(int)

df.to_csv("seed.csv", index=False)


# Helpers: 202 Accepted is normal for training and generation.

# This quickstart uses a simple 2s poll loop; production clients

# should add jittered exponential backoff and a deadline appropriate

# for their workload.

def poll_job(job_id, field, deadline_seconds=600):

    deadline = time.time() + deadline_seconds

    while time.time() < deadline:

        body = requests.get(f"{API_BASE}/v1/jobs/{job_id}",

                             headers=auth, timeout=30).json()

        if body["status"] in ("failed", "expired", "cancelled"):

            raise RuntimeError(f"job {job_id} {body['status']}: {body}")

        if body["status"] == "completed":

            value = body.get(field)

            if value:

                return value

            raise RuntimeError(f"job {job_id} completed without {field}: {body}")

        time.sleep(2)

    raise TimeoutError(f"job {job_id} did not complete before deadline")


def wait_for_dataset(upload):

    if upload.get("status") == "ready" and upload.get("dataset_id"):

        return upload["dataset_id"]

    return poll_job(upload["job_id"], "dataset_id")


def wait_for_generation(initial):

    generation_id = initial.get("generation_id")

    if initial.get("status") == "processing" and initial.get("job_id"):

        generation_id = poll_job(initial["job_id"], "generation_id")

    if not generation_id:

        raise RuntimeError(f"generation response lacked generation_id: {initial}")

    deadline = time.time() + 600

    while time.time() < deadline:

        meta = requests.get(

            f"{API_BASE}/v1/generations/{generation_id}",

            headers=auth, timeout=30,

        ).json()

        if meta.get("status") in ("ready", "partial") and meta.get("artifact_url"):

            return meta

        if meta.get("status") == "failed":

            raise RuntimeError(f"generation failed: {meta}")

        time.sleep(2)

    raise TimeoutError(f"generation {generation_id} did not produce an artifact")


# 3. Upload seed + train. Returns either the dataset_id directly

#    (synchronous) or a 202 with job_id (asynchronous).

with open("seed.csv", "rb") as fh:

    upload = requests.post(

        f"{API_BASE}/v1/datasets",

        headers={**auth,

                 "Idempotency-Key": f"dataset-{int(time.time())}"},

        files={"seed_file": ("seed.csv", fh, "text/csv")},

        data={"display_name": "Quickstart seed", "wait_seconds": "10"},

        timeout=60,

    ).json()

dataset_id = wait_for_dataset(upload)


# 4. Inspect the trained schema. Use the returned column names,

#    levels, and ranges to build conditions; do not guess from the

#    original seed headers.

schema = requests.get(

    f"{API_BASE}/v1/datasets/{dataset_id}/schema",

    headers=auth, timeout=30,

).json()

schema_columns = schema.get("schema", [])

schema_by_name = {col["name"]: col for col in schema_columns}


def resolve_column(*candidates):

    for name in candidates:

        if name in schema_by_name:

            return name

    normalized = {name.replace("_", ".").lower(): name for name in schema_by_name}

    for name in candidates:

        hit = normalized.get(name.replace("_", ".").lower())

        if hit:

            return hit

    raise KeyError(

        f"None of {candidates} is in the cleaned schema. Available: {list(schema_by_name)}"

    )


def require_levels(column, levels):

    available = set(schema_by_name[column].get("levels") or [])

    missing = set(levels) - available

    if missing:

        raise KeyError(

            f"Schema column {column!r} is missing levels {sorted(missing)}. "

            f"Available levels: {sorted(available)}"

        )


segment_col = resolve_column("segment")

channel_col = resolve_column("channel")

age_col = resolve_column("age")

intent_col = resolve_column("purchase_intent", "purchase.intent")

require_levels(segment_col, ["Premium", "Mainstream", "Value"])

require_levels(channel_col, ["Online", "Retail", "Club"])


# 5. Define a scenario as ordinary data frames, then convert to the

#    API condition object. Categorical target_share values are desired

#    outcome percentages, subject to feasibility jitter.

categorical_targets = pd.DataFrame([

    {"column": segment_col, "level": "Premium",    "target_share": 0.55},

    {"column": segment_col, "level": "Mainstream", "target_share": 0.35},

    {"column": segment_col, "level": "Value",      "target_share": 0.10},

    {"column": channel_col, "level": "Online",     "target_share": 0.70},

    {"column": channel_col, "level": "Retail",     "target_share": 0.20},

    {"column": channel_col, "level": "Club",       "target_share": 0.10},

])

numeric_ranges = pd.DataFrame([

    {"column": age_col,    "min": 25, "max": 44},

    {"column": intent_col, "min": 70, "max": 100},

])


def categorical_conditions(targets):

    return {

        column: dict(zip(group["level"], group["target_share"]))

        for column, group in targets.groupby("column", sort=False)

    }


def numeric_conditions(ranges):

    return {

        row.column: {"min": row.min, "max": row.max}

        for row in ranges.itertuples(index=False)

    }


scenario = {

    "row_count":     5000,

    "output_format": "parquet",

    "seed":          20260430,

    "wait_seconds":  20,

    "conditions": {

        "categorical": categorical_conditions(categorical_targets),

        "numeric":     numeric_conditions(numeric_ranges),

    },

}


# 6. Generate the scenario-conditioned synthetic artifact.

gen = requests.post(

    f"{API_BASE}/v1/datasets/{dataset_id}/generations",

    headers={**auth,

             "content-type":    "application/json",

             "Idempotency-Key": f"generation-{int(time.time())}"},

    json=scenario, timeout=60,

).json()

meta = wait_for_generation(gen)


# 7. Download the artifact. artifact_url is absolute for enterprise

#    tenants and relative (managed-mode relay) otherwise.

artifact_url = meta["artifact_url"]

if not artifact_url.startswith("http"):

    artifact_url = f"{API_BASE}{artifact_url}"

art = requests.get(artifact_url, headers=auth, timeout=120)

art.raise_for_status()

with open("synthetic.parquet", "wb") as fh:

    fh.write(art.content)


# 8. Read the result.

synth = pd.read_parquet("synthetic.parquet")

print(len(synth), "rows")

print(synth.head())

```


<a id="quickstart-r"></a>

### R


Uses `httr2` for HTTP and `arrow` for Parquet I/O.


```r

library(httr2)

library(arrow)


AUTH0_DOMAIN   <- Sys.getenv("SIMIO_AUTH0_DOMAIN")

AUTH0_AUDIENCE <- Sys.getenv("SIMIO_AUTH0_AUDIENCE")

CLIENT_ID      <- Sys.getenv("SIMIO_CLIENT_ID")

CLIENT_SECRET  <- Sys.getenv("SIMIO_CLIENT_SECRET")

API_BASE       <- Sys.getenv("SIMIO_API_BASE")


# 1. Mint a bearer token.

token <- request(paste0("https://", AUTH0_DOMAIN, "/oauth/token")) |>

  req_method("POST") |>

  req_body_json(list(

    client_id     = CLIENT_ID,

    client_secret = CLIENT_SECRET,

    audience      = AUTH0_AUDIENCE,

    grant_type    = "client_credentials"

  )) |>

  req_perform() |> resp_body_json()

TOKEN <- token$access_token

auth <- function(req) req_auth_bearer_token(req, TOKEN)


# 2. Build a toy seed dataset and write it to seed.csv.

set.seed(7); n <- 800

seed <- data.frame(

  age     = sample(18:65, n, replace = TRUE),

  segment = sample(c("Value", "Mainstream", "Premium"), n,

                   replace = TRUE, prob = c(0.35, 0.45, 0.20)),

  channel = sample(c("Retail", "Online", "Club"), n,

                   replace = TRUE, prob = c(0.50, 0.35, 0.15)),

  region  = sample(c("Northeast", "South", "Midwest", "West"), n,

                   replace = TRUE)

)

intent <- 42 +

  (seed$segment == "Premium") * 18 +

  (seed$channel == "Online")  *  8 +

  pmin(pmax((seed$age - 35) / 3, -8), 8) +

  rnorm(n, 0, 10)

seed$purchase_intent <- pmax(pmin(round(intent), 100), 0)

write.csv(seed, "seed.csv", row.names = FALSE)


# Helper: a 202 response is normal for async work. Poll the job

# until it is completed. Production clients should add jittered

# exponential backoff and a deadline appropriate for their workload.

poll_job <- function(job_id, field) {

  deadline <- Sys.time() + 600

  repeat {

    if (Sys.time() > deadline) stop("job ", job_id, " timed out")

    body <- request(paste0(API_BASE, "/v1/jobs/", job_id)) |>

      auth() |> req_perform() |> resp_body_json()

    if (body$status %in% c("failed", "expired", "cancelled"))

      stop("job ", job_id, " ", body$status)

    if (identical(body$status, "completed")) {

      value <- body[[field]]

      if (!is.null(value) && nzchar(value)) return(value)

      stop("job ", job_id, " completed without ", field)

    }

    Sys.sleep(2)

  }

}


wait_for_generation <- function(initial) {

  generation_id <- initial$generation_id

  if (identical(initial$status, "processing") && !is.null(initial$job_id))

    generation_id <- poll_job(initial$job_id, "generation_id")

  if (is.null(generation_id) || !nzchar(generation_id))

    stop("generation response lacked generation_id")

  deadline <- Sys.time() + 600

  repeat {

    if (Sys.time() > deadline) stop("generation ", generation_id, " timed out")

    meta <- request(paste0(API_BASE, "/v1/generations/", generation_id)) |>

      auth() |> req_perform() |> resp_body_json()

    if (meta$status %in% c("ready", "partial") &&

        !is.null(meta$artifact_url) && nzchar(meta$artifact_url))

      return(meta)

    if (identical(meta$status, "failed")) stop("generation failed")

    Sys.sleep(2)

  }

}


# 3. Upload seed + train.

upload <- request(paste0(API_BASE, "/v1/datasets")) |>

  req_method("POST") |> auth() |>

  req_headers(`Idempotency-Key` =

                paste0("dataset-", as.integer(Sys.time()))) |>

  req_body_multipart(

    seed_file    = curl::form_file("seed.csv", type = "text/csv"),

    display_name = "Quickstart seed",

    wait_seconds = "10"

  ) |>

  req_perform() |> resp_body_json()

dataset_id <- if (identical(upload$status, "ready") &&

                  !is.null(upload$dataset_id) && nzchar(upload$dataset_id))

  upload$dataset_id else poll_job(upload$job_id, "dataset_id")


# 4. Inspect the trained schema.

schema <- request(paste0(API_BASE, "/v1/datasets/", dataset_id,

                         "/schema")) |>

  auth() |> req_perform() |> resp_body_json()

schema_cols <- schema$schema

schema_names <- vapply(schema_cols, `[[`, character(1), "name")

resolve_column <- function(...) {

  candidates <- c(...)

  hit <- candidates[candidates %in% schema_names]

  if (length(hit)) return(hit[[1]])

  normalized <- setNames(schema_names, tolower(gsub("_", ".", schema_names)))

  for (candidate in candidates) {

    key <- tolower(gsub("_", ".", candidate))

    # `[[` errors on a missing name; `[` returns NA, which we can test.

    hit <- unname(normalized[key])

    if (!is.na(hit)) return(hit)

  }

  stop("None of ", paste(candidates, collapse = ", "),

       " is in the cleaned schema. Available: ",

       paste(schema_names, collapse = ", "))

}

require_levels <- function(column, levels) {

  col <- schema_cols[[match(column, schema_names)]]

  available <- if (is.null(col$levels)) character() else unlist(col$levels)

  missing <- setdiff(levels, available)

  if (length(missing)) stop("Schema column ", column,

                           " is missing levels: ",

                           paste(missing, collapse = ", "))

}

segment_col <- resolve_column("segment")

channel_col <- resolve_column("channel")

age_col     <- resolve_column("age")

intent_col  <- resolve_column("purchase_intent", "purchase.intent")

require_levels(segment_col, c("Premium", "Mainstream", "Value"))

require_levels(channel_col, c("Online", "Retail", "Club"))


# 5. Define a scenario as ordinary data frames, then convert to the

#    API condition object. target_share values are target shares

#    written as fractions (0.55 = 55%), subject to feasibility jitter.

categorical_targets <- data.frame(

  column = c(segment_col, segment_col, segment_col,

             channel_col, channel_col, channel_col),

  level = c("Premium", "Mainstream", "Value",

            "Online", "Retail", "Club"),

  target_share = c(0.55, 0.35, 0.10, 0.70, 0.20, 0.10)

)

numeric_ranges <- data.frame(

  column = c(age_col, intent_col),

  min    = c(25, 70),

  max    = c(44, 100)

)


categorical_conditions <- lapply(

  split(categorical_targets, categorical_targets$column),

  function(x) as.list(setNames(x$target_share, x$level))

)

numeric_conditions <- setNames(

  lapply(seq_len(nrow(numeric_ranges)), function(i) {

    list(min = numeric_ranges$min[[i]], max = numeric_ranges$max[[i]])

  }),

  numeric_ranges$column

)


scenario <- list(

  row_count     = 5000L,

  output_format = "parquet",

  seed          = 20260430L,

  wait_seconds  = 20L,

  conditions = list(

    categorical = categorical_conditions,

    numeric     = numeric_conditions

  )

)


# 6. Generate the scenario-conditioned synthetic artifact.

gen <- request(paste0(API_BASE, "/v1/datasets/", dataset_id,

                      "/generations")) |>

  req_method("POST") |> auth() |>

  req_headers(`Idempotency-Key` =

                paste0("generation-", as.integer(Sys.time()))) |>

  req_body_json(scenario) |>

  req_perform() |> resp_body_json()

meta <- wait_for_generation(gen)


# 7. Download the artifact.

artifact_url <- if (startsWith(meta$artifact_url, "http"))

  meta$artifact_url else paste0(API_BASE, meta$artifact_url)

request(artifact_url) |> auth() |>

  req_perform(path = "synthetic.parquet")


# 8. Read the result.

synth <- read_parquet("synthetic.parquet")

cat(nrow(synth), "rows\n")

head(synth)

```


<a id="quickstart-julia"></a>

### Julia


Uses `HTTP.jl` + `JSON3` for HTTP, `DataFrames` + `CSV` + `StatsBase`

for the seed, and `Parquet2` for the result.


```julia

using HTTP, JSON3, DataFrames, CSV, Random, StatsBase, Parquet2


AUTH0_DOMAIN   = ENV["SIMIO_AUTH0_DOMAIN"]

AUTH0_AUDIENCE = ENV["SIMIO_AUTH0_AUDIENCE"]

CLIENT_ID      = ENV["SIMIO_CLIENT_ID"]

CLIENT_SECRET  = ENV["SIMIO_CLIENT_SECRET"]

API_BASE       = ENV["SIMIO_API_BASE"]


# 1. Mint a bearer token.

token_resp = HTTP.post(

    "https://$AUTH0_DOMAIN/oauth/token",

    ["content-type" => "application/json"],

    JSON3.write(Dict(

        "client_id"     => CLIENT_ID,

        "client_secret" => CLIENT_SECRET,

        "audience"      => AUTH0_AUDIENCE,

        "grant_type"    => "client_credentials",

    )),

)

TOKEN = JSON3.read(token_resp.body).access_token

auth() = ["Authorization" => "Bearer $TOKEN"]


# 2. Build a toy seed dataset and write it to seed.csv.

Random.seed!(7)

n = 800

df = DataFrame(

    age     = rand(18:65, n),

    segment = sample(["Value", "Mainstream", "Premium"],

                     StatsBase.Weights([0.35, 0.45, 0.20]), n),

    channel = sample(["Retail", "Online", "Club"],

                     StatsBase.Weights([0.50, 0.35, 0.15]), n),

    region  = rand(["Northeast", "South", "Midwest", "West"], n),

)

intent = 42.0 .+

         (df.segment .== "Premium") .* 18 .+

         (df.channel .== "Online")  .*  8 .+

         clamp.((df.age .- 35) ./ 3, -8, 8) .+

         randn(n) .* 10

df.purchase_intent = clamp.(round.(Int, intent), 0, 100)

CSV.write("seed.csv", df)


# Helper: a 202 response is normal for async work. Poll the job

# until it is completed. Production clients should add jittered

# exponential backoff and a deadline appropriate for their workload.

function poll_job(job_id, field)

    deadline = time() + 600

    while true

        time() > deadline && error("job $job_id timed out")

        r = HTTP.get("$API_BASE/v1/jobs/$job_id", auth())

        body = JSON3.read(r.body)

        body.status in ("failed", "expired", "cancelled") &&

            error("job $job_id $(body.status)")

        if body.status == "completed"

            v = get(body, Symbol(field), nothing)

            v === nothing && error("job $job_id completed without $field")

            return string(v)

        end

        sleep(2)

    end

end


function wait_for_generation(initial)

    generation_id = get(initial, :generation_id, nothing)

    if get(initial, :status, "") == "processing" &&

       get(initial, :job_id, nothing) !== nothing

        generation_id = poll_job(initial.job_id, "generation_id")

    end

    generation_id === nothing && error("generation response lacked generation_id")

    deadline = time() + 600

    while true

        time() > deadline && error("generation $generation_id timed out")

        meta = JSON3.read(HTTP.get(

            "$API_BASE/v1/generations/$generation_id", auth()).body)

        if meta.status in ("ready", "partial") &&

           get(meta, :artifact_url, nothing) !== nothing

            return meta

        end

        meta.status == "failed" && error("generation failed")

        sleep(2)

    end

end


# 3. Upload seed + train.

upload = HTTP.post(

    "$API_BASE/v1/datasets",

    vcat(auth(), ["Idempotency-Key" => "dataset-$(round(Int, time()))"]),

    HTTP.Form(Dict(

        "seed_file"    => HTTP.Multipart("seed.csv", open("seed.csv"),

                                         "text/csv"),

        "display_name" => "Quickstart seed",

        "wait_seconds" => "10",

    )),

)

upload_body = JSON3.read(upload.body)

dataset_id  = get(upload_body, :dataset_id, nothing)

dataset_id  = get(upload_body, :status, "") == "ready" &&

    !isnothing(dataset_id) ? string(dataset_id) :

    poll_job(upload_body.job_id, "dataset_id")


# 4. Inspect the trained schema.

schema = JSON3.read(HTTP.get(

    "$API_BASE/v1/datasets/$dataset_id/schema", auth()).body)

schema_cols = collect(schema.schema)

schema_names = [String(col.name) for col in schema_cols]

function resolve_column(candidates...)

    for candidate in candidates

        string(candidate) in schema_names && return string(candidate)

    end

    normalized = Dict(lowercase(replace(name, "_" => ".")) => name

                      for name in schema_names)

    for candidate in candidates

        key = lowercase(replace(string(candidate), "_" => "."))

        haskey(normalized, key) && return normalized[key]

    end

    error("None of $(candidates) is in the cleaned schema. Available: $schema_names")

end

function require_levels(column, levels)

    idx = findfirst(==(column), schema_names)

    raw_levels = hasproperty(schema_cols[idx], :levels) ? schema_cols[idx].levels : String[]

    available = Set(string.(raw_levels))

    # Named `absent` to avoid shadowing Julia's built-in `missing`.

    absent = setdiff(Set(levels), available)

    !isempty(absent) && error("Schema column $column is missing levels: $(join(absent, ", "))")

end

segment_col = resolve_column("segment")

channel_col = resolve_column("channel")

age_col = resolve_column("age")

intent_col = resolve_column("purchase_intent", "purchase.intent")

require_levels(segment_col, ["Premium", "Mainstream", "Value"])

require_levels(channel_col, ["Online", "Retail", "Club"])


# 5. Define a scenario as ordinary DataFrames, then convert to the

#    API condition object. target_share values are target shares

#    written as fractions (0.55 = 55%), subject to feasibility jitter.

categorical_targets = DataFrame(

    column = [segment_col, segment_col, segment_col,

              channel_col, channel_col, channel_col],

    level = ["Premium", "Mainstream", "Value",

             "Online", "Retail", "Club"],

    target_share = [0.55, 0.35, 0.10, 0.70, 0.20, 0.10],

)

numeric_ranges = DataFrame(

    column = [age_col, intent_col],

    min = [25, 70],

    max = [44, 100],

)


function categorical_conditions(targets)

    Dict(

        col => Dict(row.level => row.target_share

                    for row in eachrow(targets[targets.column .== col, :]))

        for col in unique(targets.column)

    )

end


numeric_conditions = Dict(

    row.column => Dict("min" => row.min, "max" => row.max)

    for row in eachrow(numeric_ranges)

)


scenario = Dict(

    "row_count"     => 5000,

    "output_format" => "parquet",

    "seed"          => 20260430,

    "wait_seconds"  => 20,

    "conditions"    => Dict(

        "categorical" => categorical_conditions(categorical_targets),

        "numeric"     => numeric_conditions,

    ),

)


# 6. Generate the scenario-conditioned synthetic artifact.

gen = HTTP.post(

    "$API_BASE/v1/datasets/$dataset_id/generations",

    vcat(auth(), ["content-type"    => "application/json",

                  "Idempotency-Key" => "generation-$(round(Int, time()))"]),

    JSON3.write(scenario),

)

gen_body = JSON3.read(gen.body)

meta = wait_for_generation(gen_body)


# 7. Download the artifact.

artifact_url = startswith(meta.artifact_url, "http") ?

    String(meta.artifact_url) : "$API_BASE$(meta.artifact_url)"

open("synthetic.parquet", "w") do io

    write(io, HTTP.get(artifact_url, auth()).body)

end


# 8. Read the result.

synth = DataFrame(Parquet2.Dataset("synthetic.parquet"))

println(nrow(synth), " rows")

first(synth, 5)

```
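
The polling helpers in these quickstarts sleep a fixed 2 seconds, and their comments recommend jittered exponential backoff with a deadline for production clients. The following is a minimal Python sketch of that pattern, not part of the API or of any official client. The names `backoff_delays` and `poll_until`, the 2-second base, the doubling factor, and the 30-second cap are all assumptions chosen for illustration.

```python
import random
import time


def backoff_delays(base=2.0, factor=2.0, cap=30.0, rng=random.random):
    """Yield full-jitter delays: rng() * min(cap, base * factor**attempt)."""
    attempt = 0
    while True:
        # Clamp the exponent so a very long loop cannot overflow a float.
        ceiling = min(cap, base * factor ** min(attempt, 32))
        yield rng() * ceiling
        attempt += 1


def poll_until(fetch, is_done, timeout=600.0,
               sleep=time.sleep, clock=time.monotonic):
    """Call fetch() until is_done(result) is true or the deadline passes."""
    deadline = clock() + timeout
    for delay in backoff_delays():
        result = fetch()
        if is_done(result):
            return result
        remaining = deadline - clock()
        if remaining <= 0:
            raise TimeoutError("polling deadline exceeded")
        # Never sleep past the deadline.
        sleep(min(delay, remaining))
```

Here `fetch` would wrap a `GET /v1/jobs/{job_id}` call and `is_done` would test the returned status. Failure states such as `failed`, `expired`, or `cancelled` still need their own check inside `fetch` or `is_done`, as in the helpers above.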


<a id="quickstart-spss"></a>

### SPSS


SPSS does not have a native HTTP client, so the realistic flow is to

drive the API from a small shell wrapper and then read the result with

SPSS syntax. SPSS users typically already have survey data they want to

use as a seed instead of synthesizing one — `seed_file` accepts CSV,

Parquet, Excel, and SAV directly.


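Requires `bash`, `curl`, and `jq` (see [jq install docs](https://jqlang.org/download/))

on the box that runs the shell script; the polling helpers use the

`SECONDS` variable, which `bash` provides and plain POSIX `sh` does not.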

The shell wrapper resolves cleaned column names client-side; the API

also validates submitted columns and levels against the cleaned schema

and returns a 400 with a schema hint if they do not match.


```sh

# 1. Mint a bearer token.

TOKEN=$(curl -sS -X POST "https://${SIMIO_AUTH0_DOMAIN}/oauth/token" \

  -H "content-type: application/json" \

  -d "{

    \"client_id\":\"${SIMIO_CLIENT_ID}\",

    \"client_secret\":\"${SIMIO_CLIENT_SECRET}\",

    \"audience\":\"${SIMIO_AUTH0_AUDIENCE}\",

    \"grant_type\":\"client_credentials\"

  }" | jq -r ".access_token")

AUTH="Authorization: Bearer ${TOKEN}"


# Helper: poll a 202 job until it completes. Production clients

# should add jittered exponential backoff and a deadline matched

# to their workload.

poll_job_field() {

  job_id="$1"; field="$2"; deadline=$((SECONDS + 600))

  while [ "${SECONDS}" -lt "${deadline}" ]; do

    sleep 2

    job=$(curl -sS -H "${AUTH}" "${SIMIO_API_BASE}/v1/jobs/${job_id}")

    status=$(echo "${job}" | jq -r ".status // empty")

    if [ "${status}" = "failed" ] || [ "${status}" = "expired" ] || [ "${status}" = "cancelled" ]; then

      echo "${job}" >&2; exit 1

    fi

    if [ "${status}" = "completed" ]; then

      value=$(echo "${job}" | jq -r ".${field} // empty")

      if [ -n "${value}" ]; then printf "%s" "${value}"; return 0; fi

      echo "${job}" >&2; echo "job completed without ${field}" >&2; exit 1

    fi

  done

  echo "job ${job_id} timed out" >&2; exit 1

}


wait_generation_artifact() {

  generation_id="$1"; deadline=$((SECONDS + 600))

  while [ "${SECONDS}" -lt "${deadline}" ]; do

    meta=$(curl -sS -H "${AUTH}" "${SIMIO_API_BASE}/v1/generations/${generation_id}")

    status=$(echo "${meta}" | jq -r ".status // empty")

    artifact=$(echo "${meta}" | jq -r ".artifact_url // empty")

    if { [ "${status}" = "ready" ] || [ "${status}" = "partial" ]; } && [ -n "${artifact}" ]; then

      printf "%s" "${meta}"; return 0

    fi

    if [ "${status}" = "failed" ]; then echo "${meta}" >&2; exit 1; fi

    sleep 2

  done

  echo "generation ${generation_id} timed out" >&2; exit 1

}


# 3. Upload seed + train. Use your existing SAV/CSV file here; we use

#    survey.sav as an example.

UPLOAD=$(curl -sS -X POST "${SIMIO_API_BASE}/v1/datasets" \

  -H "${AUTH}" -H "Idempotency-Key: dataset-$(date +%s)" \

  -F "seed_file=@./survey.sav;type=application/x-spss-sav" \

  -F "display_name=SPSS quickstart seed" \

  -F "wait_seconds=10")

DATASET_ID=$(echo "${UPLOAD}" | jq -r ".dataset_id // empty")

JOB_ID=$(echo "${UPLOAD}" | jq -r ".job_id // empty")

if [ -z "${DATASET_ID}" ]; then

  # `exit 1` inside $(...) only leaves the subshell, so propagate it here.

  DATASET_ID=$(poll_job_field "${JOB_ID}" dataset_id) || exit 1

fi


# 4. Inspect the trained schema. Use these cleaned names in

#    conditions; do not assume original seed headers survived.

SCHEMA=$(curl -sS -H "${AUTH}" \

  "${SIMIO_API_BASE}/v1/datasets/${DATASET_ID}/schema")

echo "${SCHEMA}" | jq .

resolve_col() {

  a="$1"; b="${2:-$1}"

  echo "${SCHEMA}" | jq -r --arg a "${a}" --arg b "${b}" \

    '.schema[] | select(.name == $a or .name == $b) | .name' | head -n 1

}

SEGMENT_COL=$(resolve_col segment)

CHANNEL_COL=$(resolve_col channel)

AGE_COL=$(resolve_col age)

INTENT_COL=$(resolve_col purchase_intent purchase.intent)

if [ -z "${SEGMENT_COL}" ] || [ -z "${CHANNEL_COL}" ] || [ -z "${AGE_COL}" ] || [ -z "${INTENT_COL}" ]; then

  echo "Expected quickstart columns were not all retained after cleaning" >&2; exit 1

fi


# 5. Build the scenario request in a shell variable. Categorical

#    values are target shares written as fractions. Use output_format=csv

#    so the artifact lands in a format SPSS reads directly.

SCENARIO=$(jq -cn --arg segment "${SEGMENT_COL}" --arg channel "${CHANNEL_COL}" \

  --arg age "${AGE_COL}" --arg intent "${INTENT_COL}" '{

  row_count: 5000,

  output_format: "csv",

  seed: 20260430,

  wait_seconds: 20,

  conditions: {

    categorical: {

      ($segment): {Premium: 0.55, Mainstream: 0.35, Value: 0.10},

      ($channel): {Online: 0.70, Retail: 0.20, Club: 0.10}

    },

    numeric: {

      ($age): {min: 25, max: 44},

      ($intent): {min: 70, max: 100}

    }

  }

}')


GEN=$(curl -sS -X POST "${SIMIO_API_BASE}/v1/datasets/${DATASET_ID}/generations" \

  -H "${AUTH}" -H "content-type: application/json" \

  -H "Idempotency-Key: generation-$(date +%s)" \

  --data "${SCENARIO}")

GENERATION_ID=$(echo "${GEN}" | jq -r ".generation_id // empty")

GEN_JOB_ID=$(echo "${GEN}" | jq -r ".job_id // empty")

GEN_STATUS=$(echo "${GEN}" | jq -r ".status // empty")

if [ "${GEN_STATUS}" = "processing" ] && [ -n "${GEN_JOB_ID}" ]; then

  GENERATION_ID=$(poll_job_field "${GEN_JOB_ID}" generation_id) || exit 1

fi


# 6. Download the artifact.

GEN_META=$(wait_generation_artifact "${GENERATION_ID}") || exit 1

ARTIFACT_URL=$(echo "${GEN_META}" | jq -r ".artifact_url")

case "${ARTIFACT_URL}" in

  http*) curl -sS -L "${ARTIFACT_URL}" -o synthetic.csv ;;

  *)     curl -sS -L -H "${AUTH}" \

          "${SIMIO_API_BASE}${ARTIFACT_URL}" -o synthetic.csv ;;

esac

```


Then read the result in SPSS. `GET DATA` assigns variables by position,

so adjust the variable list to match the column order, names, and types

in `synthetic.csv`:


```spss

GET DATA

  /TYPE=TXT

  /FILE='synthetic.csv'

  /ENCODING='UTF8'

  /DELCASE=LINE

  /DELIMITERS=","

  /QUALIFIER='"'

  /ARRANGEMENT=DELIMITED

  /FIRSTCASE=2

  /IMPORTCASE=ALL

  /VARIABLES=

    age F8.0

    segment A20

    channel A20

    region A20

    purchase_intent F8.0.

CACHE.

EXECUTE.

```


## Conditioning Reference


- Categorical values under `conditions.categorical` are target shares

  written as fractions: `0.70` means a 70% target share for that level,

  subject to feasibility jitter.

- Numeric ranges under `conditions.numeric` are bilateral: include both

  `min` and `max`.

- Always derive column names and levels from

  `/v1/datasets/{dataset_id}/schema`. Do not guess from memory or

  prompt context.

- Tight scenarios can produce fewer rows than requested even when

  quota remains; usage is billed on generated rows rather

  than requested rows.

- Send an `Idempotency-Key` header on every `POST /v1/datasets` and

  `POST /v1/datasets/{dataset_id}/generations` so retries are safe.
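
The rules above can be checked client-side before a generation request is sent. The sketch below is a hedged Python illustration, assuming the schema response carries a list of columns with `name` and optional `levels`, as the quickstarts read it. `build_conditions` is a hypothetical helper, and the check that shares fall in `[0, 1]` is an assumption drawn from the example values rather than a documented constraint.

```python
def build_conditions(schema_columns, categorical=None, numeric=None):
    """Return a conditions dict after checking it against schema columns."""
    by_name = {col["name"]: col for col in schema_columns}
    conditions = {"categorical": {}, "numeric": {}}

    for column, shares in (categorical or {}).items():
        if column not in by_name:
            raise ValueError(f"column not in schema: {column}")
        known = set(by_name[column].get("levels") or [])
        unknown = sorted(set(shares) - known)
        if unknown:
            raise ValueError(f"{column} lacks levels: {', '.join(unknown)}")
        for level, share in shares.items():
            # Assumption: shares are fractions in [0, 1], as in the examples.
            if not 0.0 <= share <= 1.0:
                raise ValueError(f"{column}.{level} share must be in [0, 1]")
        conditions["categorical"][column] = dict(shares)

    for column, bounds in (numeric or {}).items():
        if column not in by_name:
            raise ValueError(f"column not in schema: {column}")
        # Numeric ranges are bilateral: both ends are required.
        if "min" not in bounds or "max" not in bounds:
            raise ValueError(f"{column} needs both min and max")
        if bounds["min"] > bounds["max"]:
            raise ValueError(f"{column} has min greater than max")
        conditions["numeric"][column] = {
            "min": bounds["min"],
            "max": bounds["max"],
        }

    return conditions


if __name__ == "__main__":
    # Toy schema shaped like the quickstart seed; not real API output.
    columns = [
        {"name": "segment", "levels": ["Premium", "Mainstream", "Value"]},
        {"name": "age"},
    ]
    print(build_conditions(
        columns,
        categorical={"segment": {"Premium": 0.55, "Mainstream": 0.35,
                                 "Value": 0.10}},
        numeric={"age": {"min": 25, "max": 44}},
    ))
```

The API still validates submitted columns and levels itself; a local check like this only surfaces the mismatch before a request is spent.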


## Notes For Coding Agents


Agents should start at `/llms.txt`, use `/openapi.json` as the

canonical contract, and fetch `/v1/datasets/{dataset_id}/schema` before

constructing conditions. Do not infer private columns, dataset IDs,

or credential values from examples. Preserve `X-Request-Id` and any

response-body `request_id` in logs for support, but never log bearer

tokens or client secrets.
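
One way to follow the logging guidance is to separate what is kept from what is masked before anything reaches a log line. The Python sketch below is illustrative only: the helper names, the `[REDACTED]` marker, and the exact sets of sensitive headers and fields are assumptions, with `client_secret` and `access_token` taken from the token-minting step in the quickstarts.

```python
SENSITIVE_HEADERS = {"authorization", "proxy-authorization", "cookie"}
SENSITIVE_FIELDS = {"client_secret", "access_token"}
REDACTED = "[REDACTED]"


def redact_headers(headers):
    """Copy headers, masking values that carry credentials."""
    return {
        name: REDACTED if name.lower() in SENSITIVE_HEADERS else value
        for name, value in headers.items()
    }


def redact_body(value):
    """Recursively mask secret-bearing fields in a decoded JSON body."""
    if isinstance(value, dict):
        return {
            key: REDACTED if key in SENSITIVE_FIELDS else redact_body(item)
            for key, item in value.items()
        }
    if isinstance(value, list):
        return [redact_body(item) for item in value]
    return value


def support_ids(response_headers, response_body):
    """Collect the identifiers worth keeping for support requests."""
    lowered = {name.lower(): value for name, value in response_headers.items()}
    ids = {}
    if lowered.get("x-request-id"):
        ids["x_request_id"] = lowered["x-request-id"]
    if isinstance(response_body, dict) and response_body.get("request_id"):
        ids["request_id"] = response_body["request_id"]
    return ids
```

Logging `support_ids(...)` alongside `redact_headers(...)` keeps the support trail intact without writing bearer tokens or client secrets to disk.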


## Client Operating Rules


- Treat `202 Accepted` as normal; poll `/v1/jobs/{job_id}`.

- Store `X-Request-Id` and any response-body `request_id` for support.

- Use `/v1/datasets/{dataset_id}/schema` before conditional generation.

- Tight scenarios can produce fewer rows than requested even when

  quota remains; usage is billed on generated rows rather

  than requested rows.

- Do not put client secrets in browsers, shared notebooks, or logs.
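
Combined with the status-code guidance in Retry And Polling, these rules reduce to a small decision table. The Python sketch below is one hedged way to encode it. The action labels and the function names `next_action` and `error_catalog_path` are hypothetical; only the status codes and the `/errors/{code}` lookup come from this guide.

```python
RETRYABLE = frozenset({429, 500, 502, 503, 504})
NEEDS_CHANGE = frozenset({400, 401, 402, 403, 404})


def next_action(status_code):
    """Map an HTTP status to a coarse client action."""
    if status_code == 202:
        return "poll"              # normal async progress, not a failure
    if 200 <= status_code < 300:
        return "done"
    if status_code in RETRYABLE:
        return "retry_same_key"    # POST retries reuse the Idempotency-Key
    if status_code in NEEDS_CHANGE:
        return "change_request"    # retrying unchanged will not help
    return "inspect"               # not covered by the guide; decide manually


def error_catalog_path(body):
    """Return the error catalog path when the error body carries a code."""
    code = body.get("code") if isinstance(body, dict) else None
    return f"/errors/{code}" if code else None
```

A client would call `next_action` on each response, and for error responses fetch `error_catalog_path(body)` before deciding whether a retry is safe.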