API Reference
REST API for the same workflows as the app: create projects, upload documents, define agents, launch extraction, and read results with source citations.
Examples use Python 3.9+ with requests,
but the API is plain JSON over HTTPS — use any client you like.
Table of contents
- Base URL
- OpenAPI schema
- Authentication
- Endpoints
- Schema-driven extract (convenience endpoint)
- Step-by-step full example
- Webhooks
- Iterating
- Reading results
- Limits & gotchas
- FAQ
  - What do per_document_mode and per_tag_mode on a project actually do?
  - Can I turn per_document_mode on or off on an existing project via the API?
  - Can I create or list dimension templates via the API?
  - The project status endpoint reports parsing_state: stuck. What does that mean?
  - How do I tell, strictly, that a run finished successfully?
  - What do the resolution_status values on a result mean?
  - What do the extraction_status values on an agent or result mean?
  - What do the four pipeline stages do?
  - What value_type values are supported end-to-end today?
- Support
Base URL
https://api.parsewise.ai/api/v1
OpenAPI schema
The schema is the source of truth for request/response shapes, field names, and enum values.
- YAML: https://api.parsewise.ai/api/v1/schema/
- JSON: https://api.parsewise.ai/api/v1/schema/?format=json
Authentication
Every request must include your API key in the X-API-Key header.
```python
import os
import requests

resp = requests.get(
    "https://api.parsewise.ai/api/v1/projects/",
    headers={"X-API-Key": os.environ["PARSEWISE_API_KEY"]},
)
resp.raise_for_status()
print(resp.json())
```
- Keys are prefixed pw_live_ and scoped to one organisation.
- Manage keys (create, rotate, revoke) on the Developer page.
- Treat keys as secrets — never commit them or ship them to clients.
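If you call the API from several places, a small helper keeps the header in one spot and fails fast when the environment variable is unset or holds the wrong kind of key. A minimal sketch: `auth_headers` is a hypothetical helper name, and the prefix check only mirrors the documented pw_live_ prefix; the server remains the authority on whether a key is valid.

```python
import os

# Documented key prefix. Checking it locally only catches obvious
# copy/paste mistakes; it does not prove the key is valid server-side.
KEY_PREFIX = "pw_live_"


def auth_headers(api_key=None):
    """Build the X-API-Key header, reading PARSEWISE_API_KEY by default."""
    key = api_key if api_key is not None else os.environ.get("PARSEWISE_API_KEY", "")
    if not key.startswith(KEY_PREFIX):
        raise ValueError("API key is missing or does not start with 'pw_live_'")
    return {"X-API-Key": key}
```

Pass the result as `headers=auth_headers()` to any `requests` call, or set it once on a `requests.Session` with `session.headers.update(auth_headers())`.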
Endpoints
All paths are relative to the base URL above. Tables are generated from the OpenAPI schema — see it for full request and response shapes.
Projects
| Operation | Path | Method |
|---|---|---|
| List projects | /projects/ | GET |
| Get a project | /projects/{id}/ | GET |
| Get output schema | /projects/{project_id}/schema/ | GET |
| Get project processing status | /projects/{project_id}/status/ | GET |
| Create a project | /projects/ | POST |
| Partially update a project | /projects/{id}/ | PATCH |
| Update a project | /projects/{id}/ | PUT |
| Set output schema | /projects/{project_id}/schema/ | PUT |
| Delete a project | /projects/{id}/ | DELETE |
Documents
| Operation | Path | Method |
|---|---|---|
| List documents | /projects/{project_id}/documents/ | GET |
| Get a document | /projects/{project_id}/documents/{document_id}/ | GET |
| Get a document page | /projects/{project_id}/documents/{document_id}/pages/{page_number}/ | GET |
| Get a page image | /projects/{project_id}/documents/{document_id}/pages/{page_number}/image/ | GET |
| Upload documents | /projects/{project_id}/documents/ | POST |
| Delete a document | /projects/{project_id}/documents/{document_id}/ | DELETE |
Agents
| Operation | Path | Method |
|---|---|---|
| List agents | /projects/{project_id}/agents/ | GET |
| Get agent processing status | /projects/{project_id}/agents/status/ | GET |
| Get an agent | /projects/{project_id}/agents/{agent_id}/ | GET |
| Create an agent | /projects/{project_id}/agents/ | POST |
| Launch the agent pipeline | /projects/{project_id}/agents/launch/ | POST |
| Partially update an agent | /projects/{project_id}/agents/{agent_id}/ | PATCH |
| Update an agent | /projects/{project_id}/agents/{agent_id}/ | PUT |
| Delete an agent | /projects/{project_id}/agents/{agent_id}/ | DELETE |
Results
| Operation | Path | Method |
|---|---|---|
| List extraction results | /projects/{project_id}/results/ | GET |
| Get results in schema format | /projects/{project_id}/results/schema/ | GET |
| Get a single extraction result | /projects/{project_id}/results/{resolution_result_id}/ | GET |
| List extractions for a resolution result | /projects/{project_id}/results/{resolution_result_id}/extractions/ | GET |
Dimensions
| Operation | Path | Method |
|---|---|---|
| List dimensions | /projects/{project_id}/dimensions/ | GET |
| Get a dimension | /projects/{project_id}/dimensions/{dimension_id}/ | GET |
| Create a dimension | /projects/{project_id}/dimensions/ | POST |
| Partially update a dimension | /projects/{project_id}/dimensions/{dimension_id}/ | PATCH |
| Delete a dimension | /projects/{project_id}/dimensions/{dimension_id}/ | DELETE |
Extract
| Operation | Path | Method |
|---|---|---|
| Extract structured data from documents | /extract/ | POST |
Extractions
| Operation | Path | Method |
|---|---|---|
| Get bounding boxes for an extraction | /projects/{project_id}/extractions/{extraction_id}/bounding-boxes/ | GET |
File Edit
| Operation | Path | Method |
|---|---|---|
| Download the edited file | /projects/{project_id}/file-edit/download/ | GET |
| Get file edit job status | /projects/{project_id}/file-edit/status/ | GET |
| projects_file_edit_create | /projects/{project_id}/file-edit/ | POST |
| Delete the file edit job | /projects/{project_id}/file-edit/status/ | DELETE |
Webhooks
| Operation | Path | Method |
|---|---|---|
| List webhook subscriptions | /webhooks/ | GET |
| List available webhook event types | /webhooks/events/ | GET |
| Retrieve a webhook subscription | /webhooks/{id}/ | GET |
| Create a webhook subscription | /webhooks/ | POST |
| Send a synthetic test event | /webhooks/{id}/test/ | POST |
| Update a webhook subscription | /webhooks/{id}/ | PATCH |
| Delete a webhook subscription | /webhooks/{id}/ | DELETE |
Schema-driven extract (convenience endpoint)
If you already know the output shape you want, POST /extract/ collapses
the entire create → upload → configure → launch flow into a single
multipart request. You supply the files and a JSON Schema; Parsewise
creates a project, auto-generates agents from the schema, and runs the
full pipeline in the background.
When to use it: you have a target JSON Schema and want results shaped to it without hand-tuning individual agents first. Use the step-by-step flow below when you need per-agent control.
Request
```shell
curl -X POST \
  -H "X-API-Key: $PARSEWISE_API_KEY" \
  -F 'files=@report.pdf' \
  -F 'files=@accounts.xlsx' \
  -F 'schema={"type":"object","properties":{"revenue":{"type":"number"},"ceo":{"type":"string"}}}' \
  -F 'project_name=API Test Project' \
  "https://api.parsewise.ai/api/v1/extract/"
```
| Field | Type | Required | Description |
|---|---|---|---|
| files | file(s) | yes | One or more document files (repeat the field for multiple). |
| schema | JSON string | yes | A valid JSON Schema (Draft 2020-12) describing the desired output. |
| project_name | string | no | Name for the auto-created project. Defaults to "API Extraction". |
Response (202 Accepted)
```json
{
  "project_id": "<uuid>",
  "status_url": "/api/v1/projects/<uuid>/status/",
  "results_url": "/api/v1/projects/<uuid>/results/schema/"
}
```
Poll and read results
Poll GET /projects/{project_id}/status/ until pipeline_running is
false and schema_status is "success", then fetch:
GET /projects/{project_id}/results/schema/
The response body is a JSON object shaped to the schema you submitted, with values populated from the documents.
Enriched results
Append ?enrich=true to get per-field metadata alongside each value.
For every scalar leaf in the output, two sibling keys are added:
| Sibling key | Description |
|---|---|
| <field>_consistency | Resolution status for the field: one of Resolved, Requires attention, Not resolved, No result, or Ignored. |
| <field>_parsewise_url | Deep link into the Parsewise UI for the underlying resolution result. |
GET /projects/{project_id}/results/schema/?enrich=true
Example (plain vs enriched):
```jsonc
// Plain (?enrich omitted or false)
{ "revenue": 42000000, "ceo": "Jane Doe" }

// Enriched (?enrich=true)
{
  "revenue": 42000000,
  "revenue_consistency": "Resolved",
  "revenue_parsewise_url": "https://app.parsewise.ai/projects/<project-uuid>/agents/<agent-uuid>/<result-uuid>",
  "ceo": "Jane Doe",
  "ceo_consistency": "Resolved",
  "ceo_parsewise_url": "https://app.parsewise.ai/projects/<project-uuid>/agents/<agent-uuid>/<result-uuid>"
}
```
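Enriched output makes it easy to send anything that isn't Resolved to a human reviewer. A minimal sketch, assuming the sibling keys sit in the same object as the field they annotate (as in the example) and that nested objects and arrays follow the same pattern; `fields_needing_review` is a hypothetical helper, not part of the API.

```python
RESOLVED = "Resolved"
METADATA_SUFFIXES = ("_consistency", "_parsewise_url")


def fields_needing_review(node, path=""):
    """Yield (path, status, url) for every enriched leaf whose status isn't Resolved.

    Walks nested objects and arrays recursively; leaves without a
    <field>_consistency sibling are ignored.
    """
    if isinstance(node, list):
        for i, item in enumerate(node):
            yield from fields_needing_review(item, f"{path}[{i}]")
        return
    if not isinstance(node, dict):
        return
    for key, value in node.items():
        if key.endswith(METADATA_SUFFIXES):
            continue  # metadata key; read alongside the field it describes
        child = f"{path}.{key}" if path else key
        if isinstance(value, (dict, list)):
            yield from fields_needing_review(value, child)
            continue
        status = node.get(f"{key}_consistency")
        if status is not None and status != RESOLVED:
            yield child, status, node.get(f"{key}_parsewise_url")
```

Each yielded URL opens the underlying resolution result in the Parsewise UI, which is where a reviewer would resolve the flagged value.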
Minimal Python example
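```python
import json
import os
import time

import requests

BASE = "https://api.parsewise.ai/api/v1"
H = {"X-API-Key": os.environ["PARSEWISE_API_KEY"]}

schema = {
    "type": "object",
    "properties": {
        "revenue": {"type": "number"},
        "ceo": {"type": "string"},
    },
}

# Upload inside a with-block so the file handle is closed afterwards.
with open("report.pdf", "rb") as f:
    resp = requests.post(
        f"{BASE}/extract/",
        headers=H,
        files=[("files", f)],
        data={"schema": json.dumps(schema), "project_name": "API Test Project"},
    )
resp.raise_for_status()
project_id = resp.json()["project_id"]

# Poll until the pipeline has stopped and the schema-shaped output is ready,
# giving up after 30 minutes instead of looping forever.
deadline = time.time() + 30 * 60
while time.time() < deadline:
    status_resp = requests.get(f"{BASE}/projects/{project_id}/status/", headers=H)
    status_resp.raise_for_status()
    status = status_resp.json()
    if not status["pipeline_running"] and status.get("schema_status") == "success":
        break
    time.sleep(30)
else:
    raise TimeoutError("Extraction did not finish in time")

results = requests.get(f"{BASE}/projects/{project_id}/results/schema/", headers=H)
results.raise_for_status()
print(json.dumps(results.json(), indent=2))
```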
Step-by-step full example
Shared setup
```python
import os
import time
import requests

API_KEY = os.environ["PARSEWISE_API_KEY"]
BASE_URL = "https://api.parsewise.ai/api/v1"
HEADERS = {"X-API-Key": API_KEY}
```
1. Create a project
```python
resp = requests.post(
    f"{BASE_URL}/projects/",
    headers=HEADERS,
    json={"name": "Q4 leases", "description": "Lease extraction"},
)
resp.raise_for_status()
project_id = resp.json()["id"]
```
2. Upload a document
```python
with open("lease.pdf", "rb") as f:
    resp = requests.post(
        f"{BASE_URL}/projects/{project_id}/documents/",
        headers=HEADERS,
        files={"file": f},
    )
resp.raise_for_status()
```
3. Create an agent
```python
resp = requests.post(
    f"{BASE_URL}/projects/{project_id}/agents/",
    headers=HEADERS,
    json={
        "name": "Annual rent (USD)",
        "extraction_instructions": "Extract the annual rent in USD as a number.",
        "value_type": "number",
        "unit": "USD",
    },
)
resp.raise_for_status()
agent_id = resp.json()["id"]
```
4. Launch extraction
```python
resp = requests.post(
    f"{BASE_URL}/projects/{project_id}/agents/launch/",
    headers=HEADERS,
)
resp.raise_for_status()  # returns 202 Accepted with no body
```
5. Poll for progress
Poll with exponential backoff (e.g. 2s → 4s → 8s, capped at ~30s). Stop only when both:
- pipeline_running is false, and
- every agent’s extraction_status is Processed.
```python
def wait_for_run(project_id: str, max_wait_seconds: int = 1800) -> dict:
    delay = 2
    deadline = time.time() + max_wait_seconds
    while time.time() < deadline:
        resp = requests.get(
            f"{BASE_URL}/projects/{project_id}/agents/status/",
            headers=HEADERS,
        )
        resp.raise_for_status()
        status = resp.json()
        running = status.get("pipeline_running", False)
        all_processed = all(
            a.get("extraction_status") == "Processed"
            for a in status.get("agents", [])
        )
        if not running and all_processed:
            return status
        time.sleep(delay)
        delay = min(delay * 2, 30)
    raise TimeoutError("Run did not finish in time")


wait_for_run(project_id)
```
See the FAQ for why pipeline_running=false alone is not enough.
6. Read results
List rows (paginated):
```python
resp = requests.get(
    f"{BASE_URL}/projects/{project_id}/results/",
    headers=HEADERS,
)
resp.raise_for_status()
rows = resp.json()["results"]
for row in rows:
    print(row["agent_name"], "→", row["resolution_result"]["value"])
```
Fetch full detail for a single row (includes document-level citations):
```python
resolution_result_id = rows[0]["resolution_result"]["id"]
resp = requests.get(
    f"{BASE_URL}/projects/{project_id}/results/{resolution_result_id}/",
    headers=HEADERS,
)
resp.raise_for_status()
detail = resp.json()
for source in detail.get("sources", []):
    print(source["document_name"], "p.", source["page_number"])
```
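The list call above reads only the first page. To walk every page, here is a sketch that assumes the response follows the common Django REST Framework shape, with a `next` URL (or null) alongside `results`; confirm this against the OpenAPI schema. `iter_results` is hypothetical, and the fetch function is passed in so the pagination logic stays independent of any HTTP client.

```python
def iter_results(first_page_url, get_json):
    """Yield every row from a paginated list endpoint.

    Assumes pages shaped like {"results": [...], "next": <URL or None>}.
    `get_json` is any callable that fetches a URL and returns the decoded body.
    """
    url = first_page_url
    while url:
        page = get_json(url)
        yield from page.get("results", [])
        url = page.get("next")
```

In practice, `get_json` can be a three-line wrapper that calls `requests.get(url, headers=HEADERS)`, then `raise_for_status()`, and returns `.json()`.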
Webhooks
Instead of polling agents/status/ to find out when an extraction run
finishes, you can register a webhook and let Parsewise call you. We POST
a JSON event to your URL when an asynchronous pipeline reaches a
terminal state.
Manage webhooks (create, test, delete) on the
Developer page or via
/api/v1/webhooks/ — the OpenAPI schema linked above is the source of
truth for request/response shapes.
Event types
The current event registry is exposed at
GET /api/v1/webhooks/events/. As of this writing it contains:
| Event | When it fires |
|---|---|
| agent.completion.succeeded | The agent extraction pipeline for a project finished successfully. |
| agent.completion.failed | The agent extraction pipeline for a project failed. |
| webhook.ping | A synthetic event you can fire from POST /api/v1/webhooks/<id>/test/. Useful for verifying connectivity without launching a real pipeline. |
Envelope
Every webhook body is a JSON envelope of the same shape — only the
data field varies per event:
```jsonc
{
  "id": "f7d6...",                    // unique delivery id (also Parsewise-Delivery-Id header)
  "event": "agent.completion.succeeded",
  "occurred_at": "2026-05-01T17:23:11.123456+00:00",
  "subscription_id": "b1...",
  "data": { /* event-specific; see below */ }
}
```
Per-event data payloads
agent.completion.succeeded:
```json
{
  "project_id": "5fa1...",
  "run_number": 4,
  "agents": [
    { "id": "8c2e...", "name": "Annual rent (USD)", "extraction_status": "Processed" },
    { "id": "9d11...", "name": "Term length (months)", "extraction_status": "No Result" }
  ]
}
```
Each entry in agents reflects that agent’s terminal extraction_status
for the run (same values as on GET /agents/status/ — typically
Processed or No Result).
agent.completion.failed:
```json
{
  "project_id": "5fa1...",
  "run_number": 4,
  "failure_reason": "internal_error"
}
```
failure_reason is currently always internal_error (an unexpected
pipeline failure). The schema reserves additional values
(validation_failed, extraction_failed, expired) for future use —
handle unknown values gracefully.
webhook.ping (synthetic test event):
```json
{ "message": "This is a test delivery from Parsewise." }
```
Request headers
Every delivery includes:
- Content-Type: application/json
- Parsewise-Delivery-Id: <uuid> (unique per attempt; use it for idempotency).
- Parsewise-Event: <event_name> (a convenience header; the same value also appears in the body).
Plus any custom headers you registered on the subscription.
Retries
Failed deliveries are retried with exponential backoff and jitter. We
treat 5xx, 408, 429, connection errors, and read timeouts as
retryable. 2xx is success; other 4xx responses stop retrying. After
the final attempt, the delivery row is marked failed and shows up in
your delivery history.
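In practice, the status code your endpoint returns decides what happens next: 2xx ends the delivery, 408, 429, or any 5xx schedules another attempt, and any other 4xx stops retrying for good. The rules restated as code, purely as an illustration of the documented behaviour rather than Parsewise's actual implementation; `delivery_outcome` is hypothetical, and connection errors and read timeouts (also retried) have no status code, so they don't appear here.

```python
def delivery_outcome(status_code):
    """Map the HTTP status your endpoint returns to the documented retry behaviour."""
    if 200 <= status_code < 300:
        return "success"
    if status_code in (408, 429) or 500 <= status_code < 600:
        return "retry"
    if 400 <= status_code < 500:
        return "stop"
    return "unspecified"  # e.g. 3xx, which the rules above don't cover
```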
Authenticating webhooks
Configure an Authorization header (or any custom request header) on
the subscription. Parsewise stores the value encrypted at rest and
sends it on every delivery, so you can verify it on receipt the way
you would any inbound request. Retryable failures (see Retries)
mean the same delivery can land more than once, so make your handler
idempotent on Parsewise-Delivery-Id.
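A framework-agnostic sketch of a receiver that combines the authentication and idempotency advice above. Assumptions: you registered an `Authorization` header on the subscription (the value below is a placeholder), your web framework gives you the headers as a plain dict with these exact key spellings (real frameworks usually match header names case-insensitively), and `seen_ids` stands in for a durable store such as a table with a unique constraint. `handle_delivery` is hypothetical.

```python
import hmac
import json

# Placeholder: use the exact header value you registered on the subscription.
EXPECTED_AUTH = "Bearer change-me"


def handle_delivery(headers, body, seen_ids, expected_auth=EXPECTED_AUTH):
    """Process one webhook delivery and return the HTTP status code to send back."""
    supplied = headers.get("Authorization", "")
    # Constant-time comparison, so the check doesn't leak the secret via timing.
    if not hmac.compare_digest(supplied.encode(), expected_auth.encode()):
        return 401
    delivery_id = headers.get("Parsewise-Delivery-Id")
    if delivery_id is None:
        return 400  # not a well-formed delivery; a 4xx stops retries
    if delivery_id in seen_ids:
        return 200  # already handled: acknowledge so it isn't retried again
    event = json.loads(body)
    data = event.get("data", {})
    kind = event.get("event")
    if kind == "agent.completion.succeeded":
        print("run", data.get("run_number"), "finished for project", data.get("project_id"))
    elif kind == "agent.completion.failed":
        # Only internal_error is sent today, but tolerate unknown reasons.
        print("run failed:", data.get("failure_reason", "unknown"))
    # Anything else (e.g. webhook.ping) only needs acknowledging.
    seen_ids.add(delivery_id)
    return 200
```

Recording the id only after the event is handled means a crash mid-handler leads to a retry rather than a lost event.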
Testing locally
POST /api/v1/webhooks/<id>/test/ synchronously delivers one
webhook.ping event and returns the HTTP status, duration, and any
error from the destination. No retries — the response reflects the
single attempt. Use it to verify your endpoint is reachable and that
TLS is set up correctly before any real event fires.
Iterating
Common follow-ups:
- New documents arrive → upload them, then re-launch.
- An agent is wrong → PATCH the agent, then re-launch.
- A new column is needed → create another agent, then re-launch.
Updating an agent is a single PATCH:
```python
resp = requests.patch(
    f"{BASE_URL}/projects/{project_id}/agents/{agent_id}/",
    headers=HEADERS,
    json={"extraction_instructions": "new task text"},
)
resp.raise_for_status()
```
The body is PatchedV1AgentRequest — every field is optional. Follow
the update with a launch to recompute.
Launch cost model
Launches are incremental, not a full recompute:
- Parsing runs only on documents still in Pending.
- Extraction is keyed per agent/document. Already-resolved pairs are skipped; only new documents and invalidated agents do work.
- Agent edits invalidate that agent’s data. Changing any of extraction_instructions, value_type, examples, unit, resolution_instructions, inconsistency_instructions, enable_complex_calculations_in_resolution, or enable_web_search clears the agent’s extractions on save, so the next launch re-runs that agent across every document. Other agents are untouched.
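Before re-launching on a large project, it can help to know whether a PATCH will force a full re-run of that agent. A sketch that checks the PATCH body against the field list above; `triggers_full_rerun` is hypothetical, and it conservatively treats any listed key as a change, since the list above doesn't say whether re-sending an unchanged value also clears extractions.

```python
# Fields listed above whose change clears an agent's extractions on save.
INVALIDATING_FIELDS = frozenset({
    "extraction_instructions",
    "value_type",
    "examples",
    "unit",
    "resolution_instructions",
    "inconsistency_instructions",
    "enable_complex_calculations_in_resolution",
    "enable_web_search",
})


def triggers_full_rerun(patch_body):
    """True if this PATCH body will make the next launch re-run the agent on every document."""
    return any(field in INVALIDATING_FIELDS for field in patch_body)
```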
Reading results
Two endpoints return results (see the schema for the full field set):
- GET /projects/{project_id}/results/: paginated list. Each row has agent_name, value_type, extraction_status, resolution_result (resolved value + metadata), and dimension_instances.
- GET /projects/{project_id}/results/{resolution_result_id}/: full detail for one row, including document-level citations in sources[].
When consuming responses:
- Parse resolution_result.value by the agent’s value_type (string or number today; see the FAQ).
- Gate on resolution_result.resolution_status for high-confidence values only.
- Use resolution_result.references for inline citations ([document_name, page_number]; page_number is None for web sources). The detail endpoint’s sources[] gives document-level citations.
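The three rules above can be combined into one pass over the rows. A sketch under two assumptions: `references` is a list of `[document_name, page_number]` pairs as described, and number values may arrive as either JSON numbers or numeric strings, so they are coerced with `float`. Both helper names are hypothetical. The result is a list rather than a dict because an agent with dimensions can produce several rows.

```python
def parse_value(raw, value_type):
    """Coerce a resolved value according to the agent's value_type."""
    if raw is None:
        return None
    if value_type == "number":
        return float(raw)
    return str(raw)


def high_confidence(rows):
    """Return (agent_name, value, citations) for rows whose status is Resolved."""
    out = []
    for row in rows:
        rr = row.get("resolution_result") or {}
        if rr.get("resolution_status") != "Resolved":
            continue  # skip Requires attention, Not resolved, No result, Ignored
        citations = [
            f"{doc} p. {page}" if page is not None else f"{doc} (web)"
            for doc, page in rr.get("references", [])
        ]
        value = parse_value(rr.get("value"), row.get("value_type"))
        out.append((row["agent_name"], value, citations))
    return out
```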
Limits & gotchas
- Launch is project-scoped. There is no “only these documents” or “only this agent” option. Prove an agent on a small test project before pointing it at thousands of production documents.
- Don’t stack launches. A second launch issued while a run is in progress is queued silently behind it — wait for the current one to finish.
- per_document_mode / per_tag_mode are set at project creation. They can’t be toggled on an existing project via PATCH. See the FAQ below.
- Dimension templates aren’t yet creatable via the API. See the FAQ.
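Because a second launch queues silently, a cheap guard is to check pipeline_running on agents/status/ before launching. A sketch: `launch_if_idle` is hypothetical, `session` is anything with requests-style `get`/`post` methods (for example a `requests.Session` with the X-API-Key header set), and the check-then-launch is not atomic, so it narrows the window but cannot stop two clients racing each other.

```python
def launch_if_idle(session, base_url, project_id):
    """POST a launch only when no run is in progress; return True if launched."""
    status_resp = session.get(f"{base_url}/projects/{project_id}/agents/status/")
    status_resp.raise_for_status()
    if status_resp.json().get("pipeline_running", False):
        return False  # a run is active; a new launch would queue behind it
    launch_resp = session.post(f"{base_url}/projects/{project_id}/agents/launch/")
    launch_resp.raise_for_status()
    return True
```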
FAQ
What do per_document_mode and per_tag_mode on a project actually do?
They’re mutually exclusive “row shape” modes for the project. With
per_document_mode on, a system-managed Document dimension is
attached to every agent, producing one result row per uploaded
document. per_tag_mode does the same but keyed by document tags.
Both default to off. Leave them off unless you need that row shape —
custom dimensions on individual agents are usually more flexible.
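Since the mode can only be chosen at creation, it's worth building the create body in one place and rejecting the mutually exclusive combination before it reaches the API. A sketch assuming the create endpoint accepts per_document_mode and per_tag_mode as top-level booleans (verify the request shape in the OpenAPI schema); `project_payload` is hypothetical. Pass its result as `json=` to the create call from step 1.

```python
def project_payload(name, description="", per_document_mode=False, per_tag_mode=False):
    """Build a project-create body, refusing both row-shape modes at once."""
    if per_document_mode and per_tag_mode:
        raise ValueError("per_document_mode and per_tag_mode are mutually exclusive")
    return {
        "name": name,
        "description": description,
        "per_document_mode": per_document_mode,
        "per_tag_mode": per_tag_mode,
    }
```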
Can I turn per_document_mode on or off on an existing project via the API?
No — set the mode at project creation. PATCH /projects/{id}/
doesn’t run the required cleanup (wiping stale results, attaching or
removing the system Document dimension across every agent), so
flipping it that way leaves the project inconsistent.
To change the mode on a project that already has data, recreate the project and re-upload, or toggle it from the Agents page in the UI (which calls an internal endpoint that performs the cleanup).
Can I create or list dimension templates via the API?
No. The v1 agent payload accepts dimension_template_id, but v1 has no
endpoint to create or list templates.
- For row-per-document or row-per-tag output, use per_document_mode or per_tag_mode at project creation instead.
- For true custom dimensions (per clause, per party, per region), the workflow isn’t available in v1 yet. Unknown dimension_template_id values are silently skipped. Contact support@parsewise.ai if you’re blocked.
The project status endpoint reports parsing_state: stuck. What does that mean?
At least one document has had a parsing run in flight for over an hour.
A background monitor auto-retries parsing once around 15 minutes in;
stuck means that retry didn’t clear it. There’s no v1 endpoint to
force another retry — delete and re-upload the affected documents, or
contact support.
How do I tell, strictly, that a run finished successfully?
Check two signals together: pipeline_running=false on agents/status/
and every agent reporting extraction_status=Processed.
pipeline_running also flips to false on validation failures and
cancelled runs, so on its own it doesn’t prove success. If any
agent is still Pending while pipeline_running is false, the run failed;
don’t treat the results as final.
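The same rules as a function, which you could call inside the wait_for_run loop from step 5 to stop early on a failed run instead of waiting out the timeout. A sketch: `classify_run` and its return labels are hypothetical, an empty agent list counts as succeeded (matching the `all(...)` check in step 5), and reporting a run that ends with some agents at No Result as "incomplete" rather than "failed" is a judgment call.

```python
def classify_run(status):
    """Classify an agents/status/ response using the strict rules above.

    Returns "running", "succeeded", "failed" (stopped with agents still
    Pending), or "incomplete" (stopped, nothing Pending, but not every
    agent Processed, e.g. some report No Result).
    """
    if status.get("pipeline_running", False):
        return "running"
    statuses = [a.get("extraction_status") for a in status.get("agents", [])]
    if all(s == "Processed" for s in statuses):
        return "succeeded"
    if any(s == "Pending" for s in statuses):
        return "failed"
    return "incomplete"
```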
What do the resolution_status values on a result mean?
- Resolved: sources agreed (or disagreements were reconciled) and the resolver produced a single canonical value. Safe to consume.
- Requires attention: inconsistencies couldn’t be auto-resolved. A value is still set, but exclude these rows when you need high-confidence values only.
- Not resolved: the resolver hasn’t run yet. This usually means the pipeline is still in progress; poll again.
- No result: extraction produced no usable candidates (value absent, low confidence, or no matching dimension instances).
- Ignored: a user or resolution rule excluded every candidate. Treat as intentional.
For high-confidence pipelines, gate on resolution_status == "Resolved".
What do the extraction_status values on an agent or result mean?
- Pending: not yet run, or in flight.
- Processed: finished successfully.
- No Result: finished but produced no candidates. On an agent, this usually means the extraction task doesn’t match the documents; iterate and re-launch.
What do the four pipeline stages do?
- parsing: extract text and layout from uploaded documents, page by page.
- extraction: each agent runs its instructions against the relevant pages to find candidate values.
- web_search: supplements candidates with public web data, only for agents with web search enabled.
- resolution: consolidates per-page, per-source candidates into one final value per result cell, flagging inconsistencies.
What value_type values are supported end-to-end today?
Two, for new agents:
- string (default): free text. Use for dates, booleans, codes, names, clauses, or anything else non-numeric. Specify the format in the extraction task (e.g. “ISO 8601 date YYYY-MM-DD”).
- number: numeric values you’ll aggregate or compare. Pair with unit (e.g. USD, %, days).
The schema also lists bool, date, and datetime for backwards
compatibility, but they aren’t fully wired end-to-end — stick to
string or number.
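To keep bool, date, and datetime out of new agents, here is a sketch of a payload builder that accepts only the two supported types. `agent_payload` is hypothetical; its field names match the create-agent call in step 3, and unit is included only when given.

```python
SUPPORTED_VALUE_TYPES = {"string", "number"}


def agent_payload(name, extraction_instructions, value_type="string", unit=None):
    """Build a create-agent body restricted to the end-to-end value types."""
    if value_type not in SUPPORTED_VALUE_TYPES:
        raise ValueError(f"use 'string' or 'number', not {value_type!r}")
    body = {
        "name": name,
        "extraction_instructions": extraction_instructions,
        "value_type": value_type,
    }
    if unit is not None:
        body["unit"] = unit
    return body
```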
Support
- Bugs, questions, feature requests: support@parsewise.ai
- Enterprise / custom integrations: sales@parsewise.ai