Platform
Concepts, agent design, plans, and security.
Core concepts
Projects
A project is a workspace containing related documents and extraction agents. Each project has:
- A collection of uploaded documents
- Extraction agents configured for that project
- Dimensions that categorise extracted data
- Results views showing all extracted information
- Navi, the AI assistant for interacting with the data
Scope one project per subject area (e.g. “Q4 leases”, “2024 expense receipts”).
Documents
Documents are the source files you upload. Supported formats:
- PDF: most common; best results with text-based PDFs
- Microsoft Word (.docx, .doc)
- Microsoft Excel (.xlsx)
- PowerPoint (.pptx, .ppt)
- Images (PNG, JPG, JPEG): text is extracted automatically
- Additional file types: enterprise customers can enable more file types; email support@parsewise.ai.
Parsewise parses each document on upload. Status values:
- Pending — waiting to be processed
- Processed — ready for extraction
- Error — something went wrong; re-upload or contact support
Extraction agents
Agents are AI-powered extractors that find specific information in your documents. Each agent:
- Has a name describing what it extracts (e.g. “Annual rent (USD)”).
- Has extraction instructions defining exactly what to look for.
- Has a value type: string or number.
- Has a unit (if number): e.g. $, %, km, kg, days.
- Has optional resolution instructions for combining or choosing from multiple extracted values. Defaults:
  - For numbers: select the most relevant value, preferring higher precision and more common values.
  - For text: combine values into a concise, non-overlapping summary.
- Has optional inconsistency instructions to control when values are flagged as contradictory.
- Can have dimensions to segment extractions (e.g. by Company, by Year).
- Runs when you launch the project’s extraction pipeline.
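The attributes above can be pictured as a small record. The sketch below is illustrative only: value_type and unit mirror the attributes listed here, but the class name AgentSpec, the other field names, and the validate helper are assumptions made for this example, not the Parsewise schema.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class AgentSpec:
    """Hypothetical in-memory description of an extraction agent."""
    name: str
    extraction_instructions: str
    value_type: str = "string"            # "string" or "number"
    unit: Optional[str] = None            # only meaningful for numbers
    resolution_instructions: Optional[str] = None
    inconsistency_instructions: Optional[str] = None
    dimensions: List[str] = field(default_factory=list)

    def validate(self) -> List[str]:
        """Return a list of problems; an empty list means the spec looks consistent."""
        problems = []
        if self.value_type not in ("string", "number"):
            problems.append("value_type must be 'string' or 'number'")
        if self.unit is not None and self.value_type != "number":
            problems.append("unit is only meaningful when value_type is 'number'")
        return problems


rent = AgentSpec(
    name="Annual rent (USD)",
    extraction_instructions="The yearly base rent stated in the lease, excluding service charges.",
    value_type="number",
    unit="$",
    dimensions=["Company", "Year"],
)
print(rent.validate())
```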
Dimensions
Dimensions are categories that slice extracted data into meaningful groups. Examples:
- Company: Acme Corp, Global Industries
- Year: 2023, 2024
- Quarter: Q1, Q2, Q3, Q4
- Document: automatic in per-document mode; links each row to its source file
Dimensions can be static (predefined values the agent looks for) or dynamic (values discovered automatically during extraction).
Extractions and results
An extraction is a single value pulled from a document by an agent, with a link to the source page. Multiple extractions across documents are then consolidated by the resolver into a final result — one canonical value per dimension combination, with citations back to every contributing source.
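To make the consolidation step concrete, here is a minimal sketch that groups extractions by dimension combination and keeps every source as a citation. It is not the Parsewise resolver: it simply picks the most common value as a stand-in for the "most relevant value" heuristic, and the dictionary keys (value, dimensions, document, page) and file names are hypothetical.

```python
from collections import Counter, defaultdict


def resolve(extractions):
    """Consolidate extractions into one result per dimension combination."""
    groups = defaultdict(list)
    for e in extractions:
        # A hashable key such as (("Year", "2023"),) identifies one result cell.
        key = tuple(sorted(e["dimensions"].items()))
        groups[key].append(e)

    results = {}
    for key, items in groups.items():
        # Stand-in heuristic: most common value; ties go to the first value seen.
        value = Counter(e["value"] for e in items).most_common(1)[0][0]
        results[key] = {
            "value": value,
            "citations": [(e["document"], e["page"]) for e in items],
        }
    return results


extractions = [
    {"value": 120000, "dimensions": {"Year": "2023"}, "document": "lease.pdf", "page": 3},
    {"value": 120000, "dimensions": {"Year": "2023"}, "document": "amendment.pdf", "page": 1},
    {"value": 125000, "dimensions": {"Year": "2023"}, "document": "summary.docx", "page": 2},
    {"value": 130000, "dimensions": {"Year": "2024"}, "document": "amendment.pdf", "page": 1},
]
results = resolve(extractions)
for key, result in results.items():
    print(key, result["value"], len(result["citations"]))
```

Note that the 2023 result still cites all three sources, including the one whose value was not chosen, which is what lets a reviewer see the disagreement.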
Per-Document Mode
Controls how results are organised across documents.
ON:
- One row per document; a Document dimension is added to every agent.
- Example: extract “Invoice Total” from 50 invoices → 50 separate values.
- Use for batches of independent documents (invoices, receipts, forms).
OFF (default):
- Results are consolidated across all documents into one answer per cell.
- Custom dimensions provide structure (Year, Company, Category, etc.).
- Example: extract “Revenue” with a “Year” dimension → one value per year.
- Use for interrelated documents (amendments, decisions, filings).
Set the mode at project creation — switching later clears existing results, and the API doesn’t support flipping it on an existing project. See API → FAQ.
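The difference between the two modes can be sketched as a change in how result cells are keyed. The function and field names below are hypothetical and only model the behaviour described above: with the mode on, a Document dimension is added to every key, so independent files never merge.

```python
def cell_keys(extractions, per_document_mode=False):
    """Return the set of result cells the given extractions would fall into."""
    keys = set()
    for e in extractions:
        dims = dict(e["dimensions"])
        if per_document_mode:
            # Per-document mode: the source file becomes part of the key.
            dims["Document"] = e["document"]
        keys.add(tuple(sorted(dims.items())))
    return keys


# 50 invoices, one "Invoice Total" extraction each, no custom dimensions.
invoices = [{"document": f"invoice_{i:02d}.pdf", "dimensions": {}} for i in range(50)]

print(len(cell_keys(invoices, per_document_mode=True)))   # one cell per invoice
print(len(cell_keys(invoices, per_document_mode=False)))  # everything consolidated
```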
Project Context
Project Context is global background information that applies across your entire project. Open the Project Context menu on the Documents or Agents page. It has four configurable fields:
- Project Context (general) — document type, language, currency, date formats, numbering conventions, domain abbreviations.
- Text Extraction Instructions — translation preferences, image handling, formatting preservation, technical terms.
- Document Summary Instructions — focus areas, key information to include, structure preferences.
- Agent Context — instructions applied during entity extraction, resolution, and inconsistency checking. Useful for aliases and domain-specific interpretation rules.
You can copy all project context as JSON and paste it into another project for easy sharing.
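As a rough picture of what such a JSON copy might contain, the sketch below models the four fields as a dictionary and round-trips it through JSON. The key names and the sample text are assumptions for illustration; the JSON that the app actually produces may use different keys and structure.

```python
import json

# Hypothetical key names, one per configurable field described above.
project_context = {
    "project_context": "Commercial leases in English; amounts in USD; dates written DD/MM/YYYY.",
    "text_extraction_instructions": "Preserve table layout; keep technical terms untranslated.",
    "document_summary_instructions": "Lead with parties, term, and rent.",
    "agent_context": "Treat 'Acme' and 'Acme Corp' as the same company.",
}

exported = json.dumps(project_context, indent=2)   # what you would copy
imported = json.loads(exported)                    # what another project would receive
print(exported)
```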
Web search on agents
Agents can optionally run targeted web searches after extracting from
your documents — useful for public data like credit ratings, sanctions
lists, filings, benchmarks, and exchange rates. Off by default. See
enable_web_search below for details.
Navi
Navi is the in-app AI assistant. It can:
- Answer questions about documents with inline citations.
- Query extracted data (“show me all revenue values for 2024”).
- Create and update agents from natural-language descriptions.
- Run Python for calculations over extracted data.
- Search the web when enabled.
Agent design best practices
Naming agents
Agent names appear everywhere — in results columns, citations, Navi answers — so treat them like column headers in a spreadsheet.
- Use a short noun phrase describing the value, not the question. Prefer “Counterparty name” over “Who is the counterparty?”; prefer “Notice period (days)” over “How many days is the notice period?”.
- Include units or format hints when they disambiguate: “Start date (ISO 8601)”, “Purchase price (USD)”, “Tenure (months)”.
- Keep it under ~40 characters so it renders cleanly in table headers.
- Avoid project-specific jargon that a new teammate wouldn’t recognise — the agent often outlives the reason it was created.
- One concept per agent. If the name needs “and”, split it into two agents so each value has its own column and its own citations.
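Some of these naming rules are mechanical enough to check before creating an agent. The helper below is a hypothetical lint, not a Parsewise feature; it covers only the length, question, and "and" rules, and treats 40 characters as a hard limit even though the guidance is approximate.

```python
import re


def lint_agent_name(name, max_len=40):
    """Return warnings for an agent name; an empty list means no rule was tripped."""
    warnings = []
    if len(name) > max_len:
        warnings.append(f"longer than {max_len} characters")
    if name.strip().endswith("?"):
        warnings.append("phrased as a question; use a noun phrase")
    if re.search(r"\band\b", name, flags=re.IGNORECASE):
        warnings.append("contains 'and'; consider splitting into two agents")
    return warnings


for candidate in ["Notice period (days)", "Who is the counterparty?", "Start date and end date"]:
    print(candidate, "->", lint_agent_name(candidate))
```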
Writing good extraction tasks
The extraction task (the agent’s extraction instructions) is the prompt the agent uses to find and normalise a value on each page.
- Describe the value first, then where to find it. “The effective date of the agreement, typically in the preamble or on the signature page” beats “Look at the first page”.
- Specify the output format explicitly. Dates as ISO 8601, currencies as ISO 4217 codes, amounts as numbers without thousands separators, booleans as true/false.
- Pick the right value_type. Use string for text (including dates, booleans, codes, names); use number for values you’ll aggregate or compare, paired with a unit.
- Call out what not to extract. “Ignore illustrative examples in appendices”, “Do not extract dates from the document metadata”.
- Give disambiguation rules. “If multiple effective dates are present, prefer the one in the signature block over the preamble.”
- Prefer citations-first reasoning. “Return the clause verbatim in the citation, then normalise” pushes the agent to ground its answer.
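When reviewing results, the output formats named above can be spot-checked with a few small validators. These helpers are hypothetical and run outside Parsewise on exported values; the currency check only tests the three-uppercase-letter shape and does not consult the ISO 4217 code list.

```python
import re
from datetime import date


def is_iso_date(text):
    """True if text parses as an ISO 8601 calendar date such as 2024-01-31."""
    try:
        date.fromisoformat(text)
    except ValueError:
        return False
    return True


def is_plain_number(text):
    """True for digits with an optional sign and decimal part, no thousands separators."""
    return re.fullmatch(r"-?\d+(\.\d+)?", text) is not None


def is_currency_code_shape(text):
    """True if text has the three-uppercase-letter shape of an ISO 4217 code."""
    return re.fullmatch(r"[A-Z]{3}", text) is not None


def is_boolean_literal(text):
    """True only for the lowercase literals true and false."""
    return text in ("true", "false")


print(is_iso_date("2024-01-31"), is_iso_date("31/01/2024"))
print(is_plain_number("1250000.00"), is_plain_number("1,250,000.00"))
```

A value that fails one of these checks usually points to an extraction task that left the format implicit.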
Iterating on agents
Agent design is an empirical loop:
- Start narrow. Write the minimal extraction task against one well-understood document.
- Read every citation, not just the values. A citation tells you whether the agent found the right span — the value can look right for the wrong reason.
- Tighten the task for the specific failure mode. Avoid generic “be accurate” wording; describe the concrete distinction the agent missed.
- Expand to the full set once stable, and stop when iteration stops changing results.
How many values an agent produces
By default, an agent produces one value per project, aggregating evidence across every document. That’s right for “single answer” fields (a single counterparty name, a single effective date).
To get more than one value, attach a dimension:
- Custom dimensions: one value per instance (e.g. per clause, per party, per region).
- System Document dimension: one value per uploaded document, attached automatically when per_document_mode is on.
Rules of thumb:
- No dimension when the whole project contributes to one answer.
- Custom dimension when the value repeats per entity inside the documents.
- per_document_mode at project creation when each document is an independent subject.
- Keep dimensions stable across agents that describe the same underlying entity, so their results join cleanly.
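Because a result is one value per dimension combination, attaching dimensions multiplies the number of cells an agent can fill. The sketch below enumerates those combinations for predefined dimension values. It is an assumption-laden upper bound: the function name is hypothetical, and combinations with no supporting evidence in the documents may simply have no value.

```python
from itertools import product


def result_cells(dimensions):
    """Enumerate every dimension combination for a dict of {dimension name: [values]}."""
    names = sorted(dimensions)
    # With no dimensions, product() yields a single empty combination:
    # the one project-wide value.
    return [tuple(zip(names, combo)) for combo in product(*(dimensions[n] for n in names))]


no_dimension = result_cells({})
by_company_and_year = result_cells({
    "Company": ["Acme Corp", "Global Industries"],
    "Year": ["2023", "2024"],
})
print(len(no_dimension), len(by_company_and_year))
```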
Per-agent capability flags
Two optional booleans on the agent payload change how extraction runs.
Both default to false. Turn them on per agent only when you need them
— they cost extra latency (and, for web search, external API usage).
enable_web_search
- What it does. After extracting from documents, the agent runs targeted web searches to supplement document-derived values with public web data. Web sources come back as citations alongside document citations.
- When to turn it on. The value exists partly outside your documents — credit ratings, sanctions list status, public filings, benchmarks, exchange rates.
- When to leave it off. The value exists entirely inside your uploaded documents, or the data is private and wouldn’t appear publicly.
- Cost / latency. Adds a web_search pipeline stage and external API calls; run time per agent is materially longer.
enable_complex_calculations_in_resolution
- What it does. Enables a Python-based resolver stage that can combine per-document extractions using arbitrary arithmetic (weighted averages, conditional sums, growth rates) instead of the default “pick the most relevant value” resolver.
- When to turn it on. Your agent needs to aggregate across documents — totals, averages, weighted blends, year-over-year deltas.
- When to leave it off. You want a single canonical value per cell and the default resolver’s heuristics are fine.
- Cost / latency. Adds a resolution-stage compute step per result.
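For a sense of the arithmetic this flag is meant for, the sketch below computes a weighted average and a year-over-year growth rate from per-document values. It illustrates the calculations only; the function names and sample figures are made up, and this is not the code the resolver stage runs.

```python
def weighted_average(pairs):
    """Average of values weighted by their weights, given (value, weight) pairs."""
    total_weight = sum(weight for _, weight in pairs)
    if total_weight == 0:
        raise ValueError("weights sum to zero")
    return sum(value * weight for value, weight in pairs) / total_weight


def yoy_growth(previous, current):
    """Year-over-year growth as a fraction of the previous value."""
    if previous == 0:
        raise ValueError("previous value is zero")
    return (current - previous) / previous


# Interest rate (%) weighted by principal, one pair per loan document.
blended_rate = weighted_average([(4.0, 100), (6.0, 300)])
growth = yoy_growth(200, 250)
print(blended_rate, growth)
```

The default resolver would instead return one of the extracted rates, which is why aggregate fields like these need the flag.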
Both flags are destructive — flipping either on or off clears the agent’s existing extractions, so the next launch re-extracts that agent across every document.
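Since a flag flip discards existing extractions, it can be worth comparing payloads before saving an edit. The helper below is a hypothetical client-side check; only the two flag names come from this page, and a missing flag is treated as false, matching the documented default.

```python
CAPABILITY_FLAGS = (
    "enable_web_search",
    "enable_complex_calculations_in_resolution",
)


def flipped_flags(old_payload, new_payload):
    """Return the capability flags whose value differs between two agent payloads."""
    return [
        flag
        for flag in CAPABILITY_FLAGS
        # Absent flags count as False, the documented default.
        if bool(old_payload.get(flag, False)) != bool(new_payload.get(flag, False))
    ]


before = {"name": "Credit rating"}
after = {"name": "Credit rating", "enable_web_search": True}
print(flipped_flags(before, after))
```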
When to launch
Launching consumes compute and time, so batch changes intentionally:
- Re-launch after any agent edit and after any document upload.
- Prove new agents on a small project first — launches are project-scoped.
Plans
- Free ($0/month) — 25 chat messages, 50 document pages, 10 extraction agents, 1 user. Perfect for trying Parsewise.
- Growth ($249/month) — 200 chat messages, 750 document pages, 15 extraction agents, 2 users. Usage limits renew monthly.
- Enterprise (custom pricing) — unlimited usage, additional file types, custom integrations, API access, SSO, dedicated support, and custom security/compliance options.
API access is an entitlement the Parsewise team turns on per organisation. Enterprise customers typically have it enabled; Free and Growth plans don’t include API access by default.
To upgrade or discuss Enterprise options, email sales@parsewise.ai.
Security & compliance
Parsewise takes data protection seriously:
- Encryption at rest and in transit
- Role-based access controls
- SSO support on Enterprise plans
- Custom data residency and compliance options for Enterprise
Security certifications, trust center documentation, and compliance artefacts (NDAs, DPAs) are available from the Security page in the app. For security questions or to request documentation, email support@parsewise.ai.
Support
- Product questions, bugs, feature requests: support@parsewise.ai
- Larger Excel exports (beyond 10,000 cells per table): support@parsewise.ai
- Enterprise features, API access, custom integrations: sales@parsewise.ai
- In the app: ask Navi how to do something — it has the platform documentation built in.