Getting Started

Your first extraction in Parsewise, either from the app or via Python.

Prerequisites

  • A Parsewise account. Sign up at parsewise.ai if you don’t have one yet.
  • A source document or two to experiment with (PDF, Word, Excel, PowerPoint, or image).
  • For the API path: an API key from the Developer page in the app (app.parsewise.ai/developer). API access is an entitlement — if you don’t see the Developer page, email support@parsewise.ai.

Concepts

Concept | What it is
Project | A workspace holding documents, agents, and results for one subject area (e.g. “Q4 leases”).
Document | A source file you upload. It is parsed on upload so agents can read it.
Agent | A reusable definition of what to extract and how. One agent produces one column of values.
Dimension | An optional attachment that splits an agent’s output into multiple rows (e.g. one per clause, one per year).
Result | The resolved extracted values, with citations back to the source pages.
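The relationships in the table can be pictured as a small data model. This is a hypothetical sketch for intuition only: the `Result` class, its fields, and `to_table` are invented here and are not part of the Parsewise API. It shows one agent filling one column, and a dimension (here, a year) splitting that column into several rows. The sample values and citations are made up.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Result:
    # Hypothetical shape, for intuition only (not the Parsewise API).
    agent_name: str            # one agent -> one column
    row_key: Optional[str]     # dimension value (e.g. a year), or None
    value: str                 # the resolved value
    citations: list = field(default_factory=list)  # source pages


def to_table(results):
    """Pivot results into {row_key: {agent_name: value}}."""
    table = {}
    for r in results:
        table.setdefault(r.row_key, {})[r.agent_name] = r.value
    return table


# Made-up example: a "year" dimension splits one agent into two rows.
rows = to_table([
    Result("Annual rent (USD)", "2024", "120000", ["lease.pdf, p. 3"]),
    Result("Annual rent (USD)", "2025", "126000", ["lease.pdf, p. 3"]),
])
print(rows)
# {'2024': {'Annual rent (USD)': '120000'}, '2025': {'Annual rent (USD)': '126000'}}
```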

In the app

1. Create a project

  1. Go to app.parsewise.ai and click New Project.
  2. Choose Blank Project, give it a name (e.g. “Lease tests”), and click Create Project.

2. Upload documents

  1. On the Documents page, click Upload Files (or drag files onto the page).
  2. Wait for each document’s status to reach Processed.
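Before a bulk upload, it can help to set aside files the parser is unlikely to accept. A minimal local sketch, assuming the extension list below corresponds to the formats named under Prerequisites (PDF, Word, Excel, PowerPoint, image); the exact list Parsewise accepts is not stated here, and `split_uploads` is a hypothetical helper.

```python
from pathlib import Path

# Assumed extensions for the formats listed under Prerequisites.
# Check the app for the authoritative list.
SUPPORTED = {".pdf", ".doc", ".docx", ".xls", ".xlsx",
             ".ppt", ".pptx", ".png", ".jpg", ".jpeg"}


def split_uploads(paths):
    """Return (to_upload, skipped) based on each file's extension."""
    to_upload, skipped = [], []
    for p in map(Path, paths):
        # Compare case-insensitively so "REPORT.PDF" is accepted too.
        (to_upload if p.suffix.lower() in SUPPORTED else skipped).append(p)
    return to_upload, skipped


ok, skipped = split_uploads(["lease.pdf", "Rent Roll.XLSX", "notes.txt"])
print([p.name for p in ok], [p.name for p in skipped])
# ['lease.pdf', 'Rent Roll.XLSX'] ['notes.txt']
```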

3. Create an extraction agent

Two ways to do this:

With Navi (recommended for new users):

  1. Open Navi from the sidebar.
  2. Type something like: “Create an agent that extracts the annual rent in USD.”
  3. Review the proposed configuration and click Create & Launch.

Manually on the Agents page:

  1. Go to Agents and click Create → Manually.
  2. Fill in:
    • Name: e.g. Annual rent (USD)
    • Cell type: number
    • Unit: USD
    • Extraction task: “Extract the total annual rent in USD. Return a plain number with no currency symbol or thousands separator.”
  3. Click Save, then Launch All.
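If you later script the same agent, the form fields appear to line up with the JSON keys used in the API hello world further down (Cell type with `value_type`, Extraction task with `extraction_instructions`). That mapping is inferred from the example rather than documented here, and `agent_payload` is a hypothetical helper; treating `unit` as optional is also an assumption.

```python
def agent_payload(name, cell_type, extraction_task, unit=None):
    """Build a JSON body for POST {BASE}/projects/{pid}/agents/.

    Assumed mapping from the app form: Name -> name,
    Cell type -> value_type, Extraction task -> extraction_instructions,
    Unit -> unit.
    """
    body = {
        "name": name,
        "value_type": cell_type,
        "extraction_instructions": extraction_task,
    }
    if unit:  # assumption: leave unit out when the value has none
        body["unit"] = unit
    return body


payload = agent_payload(
    "Annual rent (USD)",
    "number",
    "Extract the total annual rent in USD. Return a plain number "
    "with no currency symbol or thousands separator.",
    unit="USD",
)
# Then: requests.post(f"{BASE}/projects/{pid}/agents/", headers=H, json=payload)
```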

4. View results

  1. Go to the Results page.
  2. Switch between Table and By Agent using the toggle in the header.
  3. Click any value to open the Entity Details page, where you can see the underlying sources, the document pages they came from, and override the resolved value if needed.
  4. Click the Download Excel button to export.
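Download Excel covers exports in the app. When you pull results through the API instead, a rough equivalent is to flatten the response yourself. A minimal sketch, assuming only the response fields that the hello world under With the API reads (`results`, `agent_name`, `resolution_result.value`); `results_to_csv` is a hypothetical helper and the sample row is made up.

```python
import csv
import io


def results_to_csv(results):
    """Write agent name / value pairs from a results list to CSV text."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(["agent", "value"])
    for row in results:
        # Same fields the hello-world loop prints.
        writer.writerow([row["agent_name"], row["resolution_result"]["value"]])
    return buf.getvalue()


sample = [{"agent_name": "Annual rent (USD)",
           "resolution_result": {"value": 120000}}]
print(results_to_csv(sample), end="")
```

Citations and dimension rows are left out because their response shape is not shown on this page.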

5. Iterate

  • Not quite right? Edit the agent’s extraction task and click Launch All again. Destructive changes re-extract that agent against every document.
  • New documents arrived? Upload them and click Launch All. Only new or invalidated work is done.
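The Launch All behaviour above amounts to incremental recomputation: only (agent, document) pairs that are new, or whose agent changed destructively, are queued. The sketch below models that idea with a hypothetical version number per agent that a destructive edit bumps. It is a conceptual illustration under that assumption, not how Parsewise tracks work internally.

```python
def pending_work(agents, documents, done):
    """Return sorted (agent, document) pairs that still need extraction.

    agents: {agent_name: version}; a destructive edit bumps the version
    documents: list of document names
    done: set of (agent_name, version, document) already extracted
    """
    return sorted(
        (agent, doc)
        for agent, version in agents.items()
        for doc in documents
        if (agent, version, doc) not in done
    )


done = {("Annual rent (USD)", 1, "lease.pdf")}
docs = ["lease.pdf", "lease2.pdf"]

# A new document arrives: only that document is queued.
print(pending_work({"Annual rent (USD)": 1}, docs, done))
# [('Annual rent (USD)', 'lease2.pdf')]

# The task is edited destructively (version 2): every document is queued.
print(pending_work({"Annual rent (USD)": 2}, docs, done))
# [('Annual rent (USD)', 'lease.pdf'), ('Annual rent (USD)', 'lease2.pdf')]
```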

With the API

API examples hit https://api.parsewise.ai/api/v1, with your key in the X-API-Key header. Keys are scoped to one organisation.

export PARSEWISE_API_KEY=pw_live_...
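The hello world below reads the key with `os.environ["PARSEWISE_API_KEY"]`, which fails with a bare `KeyError` when the variable is unset. If you build on it, a small helper can fail earlier with a clearer message. `auth_headers` is a hypothetical name, not part of any Parsewise SDK.

```python
import os


def auth_headers(env=os.environ):
    """Build the X-API-Key header, failing early if the key is missing."""
    key = env.get("PARSEWISE_API_KEY", "").strip()
    if not key:
        raise RuntimeError(
            "Set PARSEWISE_API_KEY (create one at app.parsewise.ai/developer)."
        )
    return {"X-API-Key": key}
```

Usage: `H = auth_headers()`, then pass `headers=H` to each request.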

End-to-end hello world — create a project, upload a document, launch an agent, and print results:

import os
import time

import requests

BASE = "https://api.parsewise.ai/api/v1"
H = {"X-API-Key": os.environ["PARSEWISE_API_KEY"]}

# Create a project and keep its ID for the calls below.
r = requests.post(f"{BASE}/projects/", headers=H, json={"name": "Demo"})
r.raise_for_status()
pid = r.json()["id"]

# Upload a source document as a multipart file.
with open("lease.pdf", "rb") as f:
    requests.post(f"{BASE}/projects/{pid}/documents/", headers=H,
                  files={"file": f}).raise_for_status()

# Define an agent: one agent produces one column of values.
requests.post(f"{BASE}/projects/{pid}/agents/", headers=H, json={
    "name": "Annual rent (USD)",
    "extraction_instructions": "Extract the annual rent in USD as a number.",
    "value_type": "number",
    "unit": "USD",
}).raise_for_status()

# Launch the project's agents.
requests.post(f"{BASE}/projects/{pid}/agents/launch/",
              headers=H).raise_for_status()

# Poll every 5 seconds until the pipeline stops running.
while True:
    s = requests.get(f"{BASE}/projects/{pid}/agents/status/", headers=H)
    s.raise_for_status()
    if not s.json()["pipeline_running"]:
        break
    time.sleep(5)

# Print each agent's resolved value.
r = requests.get(f"{BASE}/projects/{pid}/results/", headers=H)
r.raise_for_status()
for row in r.json()["results"]:
    print(f'{row["agent_name"]}: {row["resolution_result"]["value"]}')

See the API Reference for the full walkthrough, robust polling with exponential backoff, per-document citations, error handling, and the FAQ.
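The hello world polls at a fixed 5-second interval with no upper bound. A generic exponential-backoff loop is sketched below; it is not the API Reference's code, and the default delays and timeout are arbitrary choices rather than documented limits. `wait_with_backoff` is a hypothetical helper that takes any "is it still running?" callable, so it can be tested without network access.

```python
import time


def wait_with_backoff(is_running, initial=2.0, factor=2.0, max_delay=60.0,
                      timeout=1800.0, sleep=time.sleep):
    """Call is_running() until it returns False, growing the wait each time.

    Delays go initial, initial*factor, ... capped at max_delay.
    Raises TimeoutError if the next wait would push the total past `timeout`.
    """
    delay, waited = initial, 0.0
    while is_running():
        if waited + delay > timeout:
            raise TimeoutError(f"still running after {waited:.0f}s")
        sleep(delay)
        waited += delay
        delay = min(delay * factor, max_delay)


# Wiring it to the status endpoint used in the hello world:
# wait_with_backoff(lambda: requests.get(
#     f"{BASE}/projects/{pid}/agents/status/", headers=H
# ).json()["pipeline_running"])
```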


Tips for good results

  • Be specific in extraction tasks. Describe the value first, then where to find it. Include the expected format (e.g. “ISO 8601 date YYYY-MM-DD”, “number without thousands separators”).
  • Start narrow. Prove an agent out on one well-understood document before pointing it at thousands.
  • Read the citations, not just the values. A citation confirms the agent found the right span, which matters even when the value happens to look correct.
  • Pick the right value type. Use number when you’ll aggregate or compare; otherwise use string and specify the format in the task.
  • One concept per agent. If a name needs “and”, split it into two agents so each value has its own column and its own citations.
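When a task asks for a specific format, it is worth checking returned values against that format before using them downstream. A minimal local sketch for the two formats named in the first tip; the helper names are hypothetical, the checks run on your side rather than in Parsewise, and values are converted with `str()` in case the API returns numbers.

```python
import re
from datetime import date


def is_iso_date(value) -> bool:
    """True for a real calendar date written exactly as YYYY-MM-DD."""
    s = str(value)
    # The regex enforces the dashed form; fromisoformat rejects dates
    # that do not exist, such as 2023-02-29.
    if not re.fullmatch(r"[0-9]{4}-[0-9]{2}-[0-9]{2}", s):
        return False
    try:
        date.fromisoformat(s)
    except ValueError:
        return False
    return True


def is_plain_number(value) -> bool:
    """True for an optional sign, digits, optional decimals; no separators."""
    return re.fullmatch(r"-?[0-9]+(\.[0-9]+)?", str(value)) is not None
```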

More guidance in Platform → Agent design best practices.


Next steps