Getting Started

Your first extraction in Parsewise, either from the app or via Python.

Table of contents

Prerequisites
Concepts
In the app
With the API
- Schema-driven shortcut
Tips for good results
Next steps

Prerequisites

A Parsewise account. Sign up at parsewise.ai if you don’t have one yet.
A source document or two to experiment with (PDF, Word, Excel, PowerPoint, or image).
For the API path: an API key from the Developer page in the app (app.parsewise.ai/developer).

Concepts

Concept	What it is
Project	A workspace holding documents, agents, and results for one subject area (e.g. “Q4 leases”).
Document	A source file you upload. Parsed on upload so agents can read it.
Agent	A reusable definition of what to extract and how. One agent produces one column of values.
Dimension	Optional attachment that splits an agent’s output into multiple rows (e.g. one per clause, one per year).
Result	The resolved extracted values, with citations back to the source pages.

In the app

1. Create a project

Go to app.parsewise.ai and click New Project.
Choose Blank Project, give it a name (e.g. “Lease tests”), and click Create Project.

2. Upload documents

On the Documents page, click Upload Files (or drag files onto the page).
Wait for each document’s status to reach Processed.

3. Create an extraction agent

Two ways to do this:

With Navi (recommended for new users):

Open Navi from the sidebar.
Type something like: “Create an agent that extracts the annual rent in USD.”
Review the proposed configuration and click Create & Launch.

Manually on the Agents page:

Go to Agents and click Create → Manually.
Fill in:
- Name — e.g. Annual rent (USD)
- Cell type — number
- Unit — USD
- Extraction task — “Extract the total annual rent in USD. Return a plain number with no currency symbol or thousands separator.”
Click Save, then Launch All.

4. View results

Go to the Results page.
Switch between Table and By Agent using the toggle in the header.
Click any value to open the Entity Details page, where you can see the underlying sources and the document pages they came from, and override the resolved value if needed.
Click the Download Excel button to export.

5. Iterate

Not quite right? Edit the agent’s extraction task and click Launch All again. Destructive changes, such as editing the extraction task, re-run that agent against every document.
New documents arrived? Upload them and click Launch All. Only new or invalidated work is done.

With the API

API examples hit https://api.parsewise.ai/api/v1, with your key in the X-API-Key header. Keys are scoped to one organisation.

export PARSEWISE_API_KEY=pw_live_...
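
A missing or empty key is an easy mistake to make when running the examples for the first time. As a minimal sketch, a small helper can read the variable and fail early with a readable message. The name auth_headers is hypothetical and not part of any Parsewise client; only the X-API-Key header and the PARSEWISE_API_KEY variable come from this page.

```python
import os


def auth_headers(env_var: str = "PARSEWISE_API_KEY") -> dict:
    """Build the X-API-Key header from an environment variable.

    Hypothetical helper for illustration; raises early if the key is missing.
    """
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(f"{env_var} is not set; export it before running the examples")
    return {"X-API-Key": key}


# Usage (assumes the variable was exported as shown above):
# H = auth_headers()
```

The returned dictionary can stand in for H in the examples that follow.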

End-to-end hello world — create a project, upload a document, launch an agent, and print results:

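import os
import time

import requests

BASE = "https://api.parsewise.ai/api/v1"
H = {"X-API-Key": os.environ["PARSEWISE_API_KEY"]}

# 1. Create a project and keep its ID for the calls below.
resp = requests.post(f"{BASE}/projects/", headers=H, json={"name": "Demo"})
resp.raise_for_status()
pid = resp.json()["id"]

# 2. Upload a document as a multipart form upload.
with open("lease.pdf", "rb") as f:
    requests.post(f"{BASE}/projects/{pid}/documents/", headers=H,
                  files={"file": f}).raise_for_status()

# 3. Define one agent (one agent produces one column of values).
requests.post(f"{BASE}/projects/{pid}/agents/", headers=H, json={
    "name": "Annual rent (USD)",
    "extraction_instructions": "Extract the annual rent in USD as a number.",
    "value_type": "number",
    "unit": "USD",
}).raise_for_status()

# 4. Launch every agent in the project.
requests.post(f"{BASE}/projects/{pid}/agents/launch/",
              headers=H).raise_for_status()


# 5. Poll until the pipeline has finished.
def pipeline_running():
    r = requests.get(f"{BASE}/projects/{pid}/agents/status/", headers=H)
    r.raise_for_status()
    return r.json()["pipeline_running"]


while pipeline_running():
    time.sleep(30)

# 6. Print each resolved value next to the agent that produced it.
resp = requests.get(f"{BASE}/projects/{pid}/results/", headers=H)
resp.raise_for_status()
for row in resp.json()["results"]:
    print(row["agent_name"], "→", row["resolution_result"]["value"])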
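
Once the results are back, it is often handier to group them than to print them. The sketch below collects resolved values per agent from a payload shaped like the one the loop above reads (a results list whose rows carry agent_name and resolution_result). The sample payload is made up for illustration, and the possibility of an empty resolution_result is an assumption, not documented behaviour.

```python
def values_by_agent(payload: dict) -> dict:
    """Group resolved values by agent name.

    Reads the same fields as the hello-world loop: "results",
    "agent_name", and "resolution_result" -> "value".
    """
    grouped = {}
    for row in payload.get("results", []):
        # Assumption: a row might have no resolution yet; store None in that case.
        resolution = row.get("resolution_result") or {}
        grouped.setdefault(row["agent_name"], []).append(resolution.get("value"))
    return grouped


# Hypothetical payload, invented for this example.
sample = {
    "results": [
        {"agent_name": "Annual rent (USD)", "resolution_result": {"value": 120000}},
        {"agent_name": "Annual rent (USD)", "resolution_result": None},
    ]
}
print(values_by_agent(sample))
```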

Schema-driven shortcut

If you already have a target JSON Schema, POST /extract/ does everything in one call — upload files, supply the schema, and get back a project ID to poll:

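import json
import os
import time

import requests

BASE = "https://api.parsewise.ai/api/v1"
H = {"X-API-Key": os.environ["PARSEWISE_API_KEY"]}

# Target shape of the extracted data, expressed as a JSON Schema.
schema = {
    "type": "object",
    "properties": {
        "revenue": {"type": "number"},
        "ceo": {"type": "string"},
    },
}

# Send the file and the schema in one multipart request.
# The with-block closes the file handle once the upload is done.
with open("report.pdf", "rb") as f:
    resp = requests.post(
        f"{BASE}/extract/",
        headers=H,
        files=[("files", f)],
        data={"schema": json.dumps(schema)},
    )
resp.raise_for_status()
project_id = resp.json()["project_id"]

# Poll until the pipeline has stopped and the schema run succeeded.
# This loop exits only on success, so add a timeout before running it unattended.
while True:
    r = requests.get(f"{BASE}/projects/{project_id}/status/", headers=H)
    r.raise_for_status()
    s = r.json()
    if not s["pipeline_running"] and s.get("schema_status") == "success":
        break
    time.sleep(30)

# Fetch the results in the shape of the schema.
result = requests.get(f"{BASE}/projects/{project_id}/results/schema/", headers=H)
result.raise_for_status()
print(result.json())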

Append ?enrich=true to the results/schema/ call to get per-field consistency status and deep links back into the Parsewise UI alongside each value.
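
As a small illustration of the query parameter, the sketch below builds the results URL with and without enrichment using only the standard library. The helper name schema_results_url is hypothetical; the path and the enrich=true parameter are the ones described above.

```python
from urllib.parse import urlencode

BASE = "https://api.parsewise.ai/api/v1"


def schema_results_url(project_id: str, enrich: bool = False) -> str:
    """Return the results/schema/ URL, optionally with ?enrich=true appended."""
    url = f"{BASE}/projects/{project_id}/results/schema/"
    if enrich:
        url += "?" + urlencode({"enrich": "true"})
    return url


print(schema_results_url("PROJECT_ID", enrich=True))
```

With requests, passing params={"enrich": "true"} to requests.get on the plain URL should produce the same query string.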

See the API Reference for the full walkthrough, enriched results, robust polling with exponential backoff, per-document citations, error handling, and the FAQ.
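
For orientation, polling with exponential backoff can be sketched generically as below. This is not the version from the API Reference: the function name poll_until, its parameters, and the default delays are all assumptions chosen for illustration. The check callable would wrap whichever status request you are polling.

```python
import time


def poll_until(check, *, initial=2.0, factor=2.0, max_delay=60.0,
               timeout=1800.0, sleep=time.sleep, clock=time.monotonic):
    """Call check() until it returns a truthy value, waiting longer each time.

    Waits start at `initial` seconds, grow by `factor` after every attempt,
    and are capped at `max_delay`. Raises TimeoutError once the next wait
    would pass the overall `timeout`.
    """
    deadline = clock() + timeout
    delay = initial
    while True:
        result = check()
        if result:
            return result
        if clock() + delay > deadline:
            raise TimeoutError("pipeline did not finish before the timeout")
        sleep(delay)
        delay = min(delay * factor, max_delay)


# Usage sketch (status_finished is a hypothetical function that performs the
# status request and returns True once pipeline_running is false):
# poll_until(status_finished)
```

Growing the delay keeps short jobs responsive while avoiding a steady stream of status requests during long ones, and the timeout prevents a stalled run from blocking a script forever.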

Tips for good results

Be specific in extraction tasks. Describe the value first, then where to find it. Include the expected format (e.g. “ISO 8601 date YYYY-MM-DD”, “number without thousands separators”).
Start narrow. Prove an agent out on one well-understood document before pointing it at thousands.
Read the citations, not just the values. A citation confirms the agent found the right span, which matters even when the value happens to look correct.
Pick the right value type. Use number when you’ll aggregate or compare; otherwise use string and specify the format in the task.
One concept per agent. If a name needs “and”, split it into two agents so each value has its own column and its own citations.
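
If an extraction task asks for a specific format, the returned values can be checked against that format after the fact. The sketch below covers the two formats mentioned in the first tip. The function names are hypothetical, and the checks run on plain strings on your side; they are not a Parsewise feature.

```python
import re
from datetime import date


def is_iso_date(value: str) -> bool:
    """True if value is a real calendar date written as YYYY-MM-DD."""
    if not re.fullmatch(r"\d{4}-\d{2}-\d{2}", value, flags=re.ASCII):
        return False
    try:
        date.fromisoformat(value)  # rejects impossible dates such as 2024-02-30
    except ValueError:
        return False
    return True


def is_plain_number(value: str) -> bool:
    """True if value is a number with no currency symbol or thousands separator."""
    return re.fullmatch(r"-?\d+(\.\d+)?", value, flags=re.ASCII) is not None


for candidate in ["2024-01-31", "31/01/2024"]:
    print(candidate, is_iso_date(candidate))
for candidate in ["1200000", "1,200,000"]:
    print(candidate, is_plain_number(candidate))
```

Values that fail such a check are good candidates for reading the citation and tightening the extraction task.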

More guidance in Platform → Agent design best practices.

Next steps

Read the API Reference for the full endpoint list and FAQ.
Explore platform concepts to understand dimensions, per-document mode, and Navi in depth.
Stuck? Email support@parsewise.ai.