API Reference
REST API for the same workflows as the app: create projects, upload documents, define agents, launch extraction, and read results with source citations.
Examples use Python 3.9+ with requests,
but the API is plain JSON over HTTPS — use any client you like.
Table of contents
- Base URL
- OpenAPI schema
- Authentication
- Endpoints
- Full example
- Iterating
- Reading results
- Limits & gotchas
- FAQ
  - What do `per_document_mode` and `per_tag_mode` on a project actually do?
  - Can I turn `per_document_mode` on or off on an existing project via the API?
  - Can I create or list dimension templates via the API?
  - The project status endpoint reports `parsing_state: stuck`. What does that mean?
  - How do I tell, strictly, that a run finished successfully?
  - What do the `resolution_status` values on a result mean?
  - What do the `extraction_status` values on an agent or result mean?
  - What do the four pipeline stages do?
  - What `value_type` values are supported end-to-end today?
- Support
Base URL
https://api.parsewise.ai/api/v1
OpenAPI schema
The schema is the source of truth for request/response shapes, field names, and enum values. It’s public and requires no API key.
- YAML: https://api.parsewise.ai/api/v1/schema/
- JSON: https://api.parsewise.ai/api/v1/schema/?format=json
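Because the schema is machine-readable, you can build a quick endpoint inventory from it. A minimal sketch — `fetch_schema` and `endpoint_summary` are our own helper names, not part of the API:

```python
import requests

SCHEMA_URL = "https://api.parsewise.ai/api/v1/schema/"

def fetch_schema() -> dict:
    """Download the public OpenAPI schema (no API key required)."""
    resp = requests.get(SCHEMA_URL, params={"format": "json"}, timeout=30)
    resp.raise_for_status()
    return resp.json()

def endpoint_summary(schema: dict) -> list:
    """Flatten an OpenAPI document into 'METHOD /path' lines."""
    methods = {"get", "post", "put", "patch", "delete"}
    return [
        f"{m.upper()} {path}"
        for path, ops in schema.get("paths", {}).items()
        for m in ops
        if m in methods
    ]
```

Usage: `for line in endpoint_summary(fetch_schema()): print(line)`.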
Authentication
Every request must include your API key in the X-API-Key header.
```python
import os
import requests

resp = requests.get(
    "https://api.parsewise.ai/api/v1/projects/",
    headers={"X-API-Key": os.environ["PARSEWISE_API_KEY"]},
)
resp.raise_for_status()
print(resp.json())
```
- Keys are prefixed `pw_live_` and scoped to one organisation.
- Manage keys (create, rotate, revoke) on the Developer page.
- Treat keys as secrets — never commit them or ship them to clients.
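If you make many calls, configuring a `requests.Session` once keeps the header out of every call site. A small sketch; `make_session` is our own helper name:

```python
import os
import requests

def make_session(api_key: str) -> requests.Session:
    """A Session that sends the X-API-Key header on every request."""
    session = requests.Session()
    session.headers["X-API-Key"] = api_key
    return session

# Read the key from the environment so it never lands in source control.
session = make_session(os.environ.get("PARSEWISE_API_KEY", ""))
# session.get("https://api.parsewise.ai/api/v1/projects/") now authenticates automatically.
```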
Endpoints
All paths are relative to the base URL above. Tables are generated from the OpenAPI schema — see it for full request and response shapes.
Projects
| Method | Path | Description |
|---|---|---|
| GET | `/projects/` | List projects |
| GET | `/projects/{id}/` | Get a project |
| GET | `/projects/{project_id}/status/` | Get project processing status |
| POST | `/projects/` | Create a project |
| PATCH | `/projects/{id}/` | Partially update a project |
| PUT | `/projects/{id}/` | Update a project |
| DELETE | `/projects/{id}/` | Delete a project |
Documents
| Method | Path | Description |
|---|---|---|
| GET | `/projects/{project_id}/documents/` | List documents |
| GET | `/projects/{project_id}/documents/{document_id}/` | Get a document |
| GET | `/projects/{project_id}/documents/{document_id}/pages/{page_number}/` | Get a document page |
| GET | `/projects/{project_id}/documents/{document_id}/pages/{page_number}/image/` | Get a page image |
| POST | `/projects/{project_id}/documents/` | Upload documents |
| DELETE | `/projects/{project_id}/documents/{document_id}/` | Delete a document |
Agents
| Method | Path | Description |
|---|---|---|
| GET | `/projects/{project_id}/agents/` | List agents |
| GET | `/projects/{project_id}/agents/status/` | Get agent processing status |
| GET | `/projects/{project_id}/agents/{agent_id}/` | Get an agent |
| POST | `/projects/{project_id}/agents/` | Create an agent |
| POST | `/projects/{project_id}/agents/launch/` | Launch the agent pipeline |
| PATCH | `/projects/{project_id}/agents/{agent_id}/` | Partially update an agent |
| PUT | `/projects/{project_id}/agents/{agent_id}/` | Update an agent |
| DELETE | `/projects/{project_id}/agents/{agent_id}/` | Delete an agent |
Results
| Method | Path | Description |
|---|---|---|
| GET | `/projects/{project_id}/results/` | List extraction results |
| GET | `/projects/{project_id}/results/{resolution_result_id}/` | Get a single extraction result |
| GET | `/projects/{project_id}/results/{resolution_result_id}/extractions/` | List extractions for a resolution result |
Extractions
| Method | Path | Description |
|---|---|---|
| GET | `/projects/{project_id}/extractions/{extraction_id}/bounding-boxes/` | Get bounding boxes for an extraction |
Full example
Shared setup
```python
import os
import time
import requests

API_KEY = os.environ["PARSEWISE_API_KEY"]
BASE_URL = "https://api.parsewise.ai/api/v1"
HEADERS = {"X-API-Key": API_KEY}
```
1. Create a project
```python
resp = requests.post(
    f"{BASE_URL}/projects/",
    headers=HEADERS,
    json={"name": "Q4 leases", "description": "Lease extraction"},
)
resp.raise_for_status()
project_id = resp.json()["id"]
```
2. Upload a document
```python
with open("lease.pdf", "rb") as f:
    resp = requests.post(
        f"{BASE_URL}/projects/{project_id}/documents/",
        headers=HEADERS,
        files={"file": f},
    )
resp.raise_for_status()
```
3. Create an agent
```python
resp = requests.post(
    f"{BASE_URL}/projects/{project_id}/agents/",
    headers=HEADERS,
    json={
        "name": "Annual rent (USD)",
        "extraction_instructions": "Extract the annual rent in USD as a number.",
        "value_type": "number",
        "unit": "USD",
    },
)
resp.raise_for_status()
agent_id = resp.json()["id"]
```
4. Launch extraction
```python
resp = requests.post(
    f"{BASE_URL}/projects/{project_id}/agents/launch/",
    headers=HEADERS,
)
resp.raise_for_status()  # returns 202 Accepted with no body
```
5. Poll for progress
Poll with exponential backoff (e.g. 2s → 4s → 8s, capped at ~30s). Stop only when both:

- `pipeline_running` is `false`, and
- every agent’s `extraction_status` is `Processed`.
```python
def wait_for_run(project_id: str, max_wait_seconds: int = 1800) -> dict:
    delay = 2
    deadline = time.time() + max_wait_seconds
    while time.time() < deadline:
        resp = requests.get(
            f"{BASE_URL}/projects/{project_id}/agents/status/",
            headers=HEADERS,
        )
        resp.raise_for_status()
        status = resp.json()
        running = status.get("pipeline_running", False)
        all_processed = all(
            a.get("extraction_status") == "Processed"
            for a in status.get("agents", [])
        )
        if not running and all_processed:
            return status
        time.sleep(delay)
        delay = min(delay * 2, 30)
    raise TimeoutError("Run did not finish in time")

wait_for_run(project_id)
```
See the FAQ for why pipeline_running=false alone is not enough.
6. Read results
List rows (paginated):
```python
resp = requests.get(
    f"{BASE_URL}/projects/{project_id}/results/",
    headers=HEADERS,
)
resp.raise_for_status()
rows = resp.json()["results"]
for row in rows:
    print(row["agent_name"], "→", row["resolution_result"]["value"])
```
Fetch full detail for a single row (includes document-level citations):
```python
resolution_result_id = rows[0]["resolution_result"]["id"]
resp = requests.get(
    f"{BASE_URL}/projects/{project_id}/results/{resolution_result_id}/",
    headers=HEADERS,
)
resp.raise_for_status()
detail = resp.json()
for source in detail.get("sources", []):
    print(source["document_name"], "p.", source["page_number"])
```
Iterating
Common follow-ups:
- New documents arrive → upload them, then re-launch.
- An agent is wrong → `PATCH` the agent, then re-launch.
- A new column is needed → create another agent, then re-launch.
Updating an agent is a single PATCH:
```python
resp = requests.patch(
    f"{BASE_URL}/projects/{project_id}/agents/{agent_id}/",
    headers=HEADERS,
    json={"extraction_instructions": "new task text"},
)
resp.raise_for_status()
```
The body is `PatchedV1AgentRequest` — every field is optional. Follow the update with a launch to recompute.
Launch cost model
Launches are incremental, not a full recompute:
- Parsing runs only on documents still in `Pending`.
- Extraction is keyed per agent/document. Already-resolved pairs are skipped; only new documents and invalidated agents do work.
- Agent edits invalidate that agent’s data. Changing any of `extraction_instructions`, `value_type`, `examples`, `unit`, `resolution_instructions`, `inconsistency_instructions`, `enable_complex_calculations_in_resolution`, or `enable_web_search` clears the agent’s extractions on save, so the next launch re-runs that agent across every document. Other agents are untouched.
Reading results
Two endpoints return results (see the schema for the full field set):
- `GET /projects/{project_id}/results/` — paginated list. Each row has `agent_name`, `value_type`, `extraction_status`, `resolution_result` (resolved value + metadata), and `dimension_instances`.
- `GET /projects/{project_id}/results/{resolution_result_id}/` — full detail for one row, including document-level citations in `sources[]`.
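If the result list grows past one page, you can walk it with a small generator. This sketch assumes DRF-style pagination (a `results` array plus a `next` URL), which matches the `results` key used in the example above — verify the exact pagination fields against the schema. `iter_result_rows` is our own helper name:

```python
def iter_result_rows(session, url: str):
    """Yield result rows across all pages.

    Assumes DRF-style pagination: {"results": [...], "next": url-or-null}.
    `session` is anything with a requests-like .get() — e.g. a
    requests.Session with the X-API-Key header already set.
    """
    while url:
        resp = session.get(url)
        resp.raise_for_status()
        page = resp.json()
        yield from page.get("results", [])
        url = page.get("next")  # None on the last page ends the loop
```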
When consuming responses:
- Parse `resolution_result.value` by the agent’s `value_type` (`string` or `number` today — see the FAQ).
- Gate on `resolution_result.resolution_status` for high-confidence values only.
- Use `resolution_result.references` for inline citations (`[document_name, page_number]`; `page_number` is `None` for web sources). The detail endpoint’s `sources[]` gives document-level citations.
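The rules above can be wrapped in a couple of small helpers. A sketch using the field names shown in this section; `parse_value` and `is_high_confidence` are our own names, not part of any client library:

```python
def parse_value(value_type: str, raw):
    """Coerce a resolution_result.value according to the agent's value_type."""
    if raw is None:
        return None
    if value_type == "number":
        return float(raw)
    return str(raw)  # "string" (and legacy types): keep as text

def is_high_confidence(row: dict) -> bool:
    """Keep only rows the resolver marked as fully resolved."""
    return row.get("resolution_result", {}).get("resolution_status") == "Resolved"
```

Usage: `values = [parse_value(r["value_type"], r["resolution_result"]["value"]) for r in rows if is_high_confidence(r)]`.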
Limits & gotchas
- Launch is project-scoped. There is no “only these documents” or “only this agent” option. Prove an agent on a small test project before pointing it at thousands of production documents.
- Don’t stack launches. A second launch issued while a run is in progress is queued silently behind it — wait for the current one to finish.
- `per_document_mode` / `per_tag_mode` are set at project creation. They can’t be toggled on an existing project via `PATCH`. See the FAQ below.
- Dimension templates aren’t yet creatable via the API. See the FAQ.
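A simple guard against stacked launches is to check `pipeline_running` before posting a launch. This sketch takes the status fetch and the launch call as plain callables so it stays testable; `launch_if_idle` is our own helper name:

```python
def launch_if_idle(get_status, do_launch) -> bool:
    """Issue a launch only when no run is in progress.

    get_status() should return the agents/status/ payload;
    do_launch() should POST to .../agents/launch/.
    Returns True if a launch was actually issued.
    """
    if get_status().get("pipeline_running", False):
        return False  # a run is already in flight; poll until it finishes
    do_launch()
    return True
```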
FAQ
What do per_document_mode and per_tag_mode on a project actually do?
They’re mutually exclusive “row shape” modes for the project. With
per_document_mode on, a system-managed Document dimension is
attached to every agent, producing one result row per uploaded
document. per_tag_mode does the same but keyed by document tags.
Both default to off. Leave them off unless you need that row shape —
custom dimensions on individual agents are usually more flexible.
Can I turn per_document_mode on or off on an existing project via the API?
No — set the mode at project creation. PATCH /projects/{id}/
doesn’t run the required cleanup (wiping stale results, attaching or
removing the system Document dimension across every agent), so
flipping it that way leaves the project inconsistent.
To change the mode on a project that already has data, recreate the project and re-upload, or toggle it from the Agents page in the UI (which calls an internal endpoint that performs the cleanup).
Can I create or list dimension templates via the API?
No. The v1 agent payload accepts dimension_template_id, but v1 has no
endpoint to create or list templates.
- For row-per-document or row-per-tag output, use `per_document_mode` or `per_tag_mode` at project creation instead.
- For true custom dimensions (per clause, per party, per region), the workflow isn’t available in v1 yet. Unknown `dimension_template_id` values are silently skipped. Contact support@parsewise.ai if you’re blocked.
The project status endpoint reports parsing_state: stuck. What does that mean?
At least one document has had a parsing run in flight for over an hour.
A background monitor auto-retries parsing once around 15 minutes in;
stuck means that retry didn’t clear it. There’s no v1 endpoint to
force another retry — delete and re-upload the affected documents, or
contact support.
How do I tell, strictly, that a run finished successfully?
Check two signals together: `pipeline_running` is `false` on `agents/status/` and every agent reports `extraction_status` of `Processed`. `pipeline_running` also flips to `false` on validation failures and cancelled runs, so on its own it doesn’t prove success. If any agent is still `Pending` while `pipeline_running` is `false`, the run failed — don’t treat the results as final.
What do the resolution_status values on a result mean?
- `Resolved` — sources agreed (or disagreements were reconciled) and the resolver produced a single canonical value. Safe to consume.
- `Requires attention` — inconsistencies couldn’t be auto-resolved. A value is still set; gate on this for high-confidence only.
- `Not resolved` — the resolver hasn’t run yet. Usually means the pipeline is still in progress; poll again.
- `No result` — extraction produced no usable candidates (value absent, low confidence, or no matching dimension instances).
- `Ignored` — a user or resolution rule excluded every candidate. Treat as intentional.
For high-confidence pipelines, gate on resolution_status == "Resolved".
What do the extraction_status values on an agent or result mean?
- `Pending` — not yet run, or in flight.
- `Processed` — finished successfully.
- `No Result` — finished but produced no candidates. On an agent, this usually means the extraction task doesn’t match the documents — iterate and re-launch.
What do the four pipeline stages do?
- `parsing` — extract text and layout from uploaded documents, page by page.
- `extraction` — each agent runs its instructions against the relevant pages to find candidate values.
- `web_search` — supplements candidates with public web data, only for agents with web search enabled.
- `resolution` — consolidates per-page, per-source candidates into one final value per result cell, flagging inconsistencies.
What value_type values are supported end-to-end today?
Two, for new agents:
- `string` (default) — free text. Use for dates, booleans, codes, names, clauses — anything non-numeric. Specify the format in the extraction task (e.g. “ISO 8601 date `YYYY-MM-DD`”).
- `number` — numeric values you’ll aggregate or compare. Pair with `unit` (e.g. `USD`, `%`, `days`).
The schema also lists `bool`, `date`, and `datetime` for backwards compatibility, but they aren’t fully wired end-to-end — stick to `string` or `number`.
Support
- Bugs, questions, feature requests: support@parsewise.ai
- Enterprise / custom integrations: sales@parsewise.ai