Extract structured data from documents

Upload one or more documents together with a target JSON schema. A project is created automatically, and the full pipeline (text extraction → agent generation → mapping → extraction) runs in the background. The request returns 202 with the new project's project_id. Poll GET /projects/{project_id}/status/ until pipeline_running is false and schema_status is success, then fetch the results from GET /projects/{project_id}/results/schema/.
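The polling flow described above can be sketched as a small loop. This is a minimal sketch under several assumptions:

- The status endpoint returns a JSON object with the pipeline_running and schema_status fields named above.
- The project endpoints share the https://api.parsewise.ai/api/v1 base used by /extract/.
- This page does not document failure values, so the loop stops only on success or on a timeout.

The helper names get_json and wait_for_results are hypothetical. The fetch function is passed in as a callable so the loop can be exercised without network access.

```python
import json
import time
import urllib.request
from typing import Any, Callable

# Assumption: project endpoints live under the same base URL as /extract/.
BASE_URL = "https://api.parsewise.ai/api/v1"


def get_json(url: str, api_key: str) -> Any:
    """Hypothetical helper: GET a URL with the X-API-Key header and decode the JSON body."""
    req = urllib.request.Request(url, headers={"X-API-Key": api_key})
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


def wait_for_results(
    project_id: str,
    fetch: Callable[[str], Any],
    interval: float = 5.0,
    timeout: float = 600.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Any:
    """Poll the status endpoint until the pipeline is done, then fetch schema results."""
    status_url = f"{BASE_URL}/projects/{project_id}/status/"
    results_url = f"{BASE_URL}/projects/{project_id}/results/schema/"
    waited = 0.0  # counts sleep time only, not request latency
    while True:
        status = fetch(status_url)
        # Done only when the pipeline has stopped AND the schema step succeeded.
        if not status["pipeline_running"] and status["schema_status"] == "success":
            return fetch(results_url)
        if waited >= timeout:
            raise TimeoutError(f"project {project_id} not finished after {timeout}s")
        sleep(interval)
        waited += interval
```

In real use, the fetch callable would wrap the request helper, for example `wait_for_results(project_id, fetch=lambda url: get_json(url, API_KEY))`.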


HTTP request

POST https://api.parsewise.ai/api/v1/extract/

Request Header

| Name | Required | Type | Description |
|---|---|---|---|
| X-API-Key | Yes | string | API key with the pw_live_ prefix. See Authentication. |
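Below is a small sketch of building this header on the client side. It assumes the key is kept in the PARSEWISE_API_KEY environment variable. The function name api_key_headers is hypothetical. The prefix check is only a local sanity check and does not replace server-side authentication.

```python
import os
from typing import Dict, Mapping, Optional

KEY_PREFIX = "pw_live_"


def api_key_headers(env: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """Build the X-API-Key header from PARSEWISE_API_KEY (hypothetical helper)."""
    env = os.environ if env is None else env
    key = env.get("PARSEWISE_API_KEY", "")
    # Reject missing keys or keys without the documented pw_live_ prefix early.
    if not key.startswith(KEY_PREFIX):
        raise ValueError("PARSEWISE_API_KEY must be set to a key starting with 'pw_live_'")
    return {"X-API-Key": key}
```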

Request Body

Supported content types: multipart/form-data.

| Name | Required | Type | Description |
|---|---|---|---|
| files | Yes | array<string (binary)> | One or more document files to process. |
| schema | Yes | any | A valid JSON Schema (Draft 2020-12) describing the desired output shape. |
| project_name | No | string | Optional name for the created project. Defaults to “API Extraction”. |
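When you send several documents, the files field is repeated once per file. The schema, like any multipart form value, travels as text. The sketch below builds the data and files arguments in the shape that requests.post accepts for multipart uploads. It assumes the schema is sent as a JSON-encoded string. build_extract_form is a hypothetical helper, and the invoice schema is purely illustrative.

```python
import json
import mimetypes
from typing import BinaryIO, Dict, List, Optional, Tuple, Union

# Illustrative Draft 2020-12 schema; these field names are invented for the example.
INVOICE_SCHEMA = {
    "$schema": "https://json-schema.org/draft/2020-12/schema",
    "type": "object",
    "properties": {
        "invoice_number": {"type": "string"},
        "total": {"type": "number"},
    },
    "required": ["invoice_number"],
}

Content = Union[bytes, BinaryIO]
FilePart = Tuple[str, Tuple[str, Content, str]]


def build_extract_form(
    documents: List[Tuple[str, Content]],
    schema: dict,
    project_name: Optional[str] = None,
) -> Tuple[Dict[str, str], List[FilePart]]:
    """Return (data, files) for requests.post(url, data=data, files=files)."""
    # Assumption: the schema form field carries the JSON Schema as a JSON string.
    data = {"schema": json.dumps(schema)}
    if project_name is not None:
        # When omitted, the server falls back to its default name.
        data["project_name"] = project_name
    files: List[FilePart] = []
    for filename, content in documents:
        content_type = mimetypes.guess_type(filename)[0] or "application/octet-stream"
        # Repeating the "files" key produces one multipart part per document.
        files.append(("files", (filename, content, content_type)))
    return data, files
```

Passing a list of tuples rather than a dict is what lets the same field name appear more than once.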

Responses

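| Status | Type | Description |
|---|---|---|
| 202 | V1ExtractResponse | Accepted. The project was created and the pipeline is running in the background. |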

Security

  • ApiKeyAuth (apiKey): sent in the X-API-Key header. API key with the pw_live_ prefix.

Python example

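import json
import os

import requests

API_KEY = os.environ["PARSEWISE_API_KEY"]
BASE_URL = "https://api.parsewise.ai/api/v1"

# Target shape of the extracted data, as a Draft 2020-12 JSON Schema.
schema = {
    "$schema": "https://json-schema.org/draft/2020-12/schema",
    "type": "object",
    "properties": {"title": {"type": "string"}},
}

# Send files as multipart/form-data; repeat the "files" field once per file.
# The request is made inside the with block so the file is still open when it is read.
with open("example.pdf", "rb") as f:
    resp = requests.post(
        f"{BASE_URL}/extract/",
        headers={"X-API-Key": API_KEY},
        files=[("files", ("example.pdf", f, "application/pdf"))],
        # Required schema field, sent as a JSON-encoded string (assumed encoding).
        data={"schema": json.dumps(schema)},
    )

resp.raise_for_status()
# A 202 response carries project_id, status_url and results_url.
print(resp.json())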

Definitions

V1ExtractRequestRequest

| Name | Required | Type | Description |
|---|---|---|---|
| files | Yes | array<string (binary)> | One or more document files to process. |
| schema | Yes | any | A valid JSON Schema (Draft 2020-12) describing the desired output shape. |
| project_name | No | string | Optional name for the created project. Defaults to “API Extraction”. |

V1ExtractResponse

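| Name | Required | Type | Description |
|---|---|---|---|
| project_id | Yes | string (uuid) | ID of the project created for this extraction. |
| status_url | Yes | string | URL to poll for the pipeline status. |
| results_url | Yes | string | URL from which to fetch the extraction results. |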
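Client code can load the 202 body into a typed object before polling. This is a minimal sketch. ExtractResponse is a hypothetical client-side class, not part of any published Parsewise SDK. It only checks that the three required fields are present and that project_id parses as a UUID.

```python
import uuid
from dataclasses import dataclass
from typing import Any, Mapping


@dataclass(frozen=True)
class ExtractResponse:
    """Hypothetical typed view of a V1ExtractResponse body."""

    project_id: uuid.UUID
    status_url: str
    results_url: str

    @classmethod
    def from_json(cls, body: Mapping[str, Any]) -> "ExtractResponse":
        # All three fields are required; a missing key raises KeyError.
        return cls(
            project_id=uuid.UUID(body["project_id"]),  # ValueError if not a UUID
            status_url=body["status_url"],
            results_url=body["results_url"],
        )
```

With the Python example above, `ExtractResponse.from_json(resp.json())` would yield the project ID to poll.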