Finance Doc Parser
Verified by Dryade
Description
Extract structured data from financial documents: invoices, balance sheets, income statements, and bank statements
Details
Finance Document Parser
Tier: Starter | Type: Tool | Category: Finance | Version: 1.0.0
Extract structured, machine-readable data from common financial documents including invoices, balance sheets, income statements, and bank statements. Reduces manual data entry from hours to seconds.
1. Overview
- Plugin Name: Finance Document Parser
- Slug: finance-doc-parser
- Required Tier: starter
- Plugin Type: tool (REST API endpoints)
- Category: Finance
- Author: Dryade
- License: DSUL
What It Does
Parses financial documents and extracts all key fields into structured JSON. Supports four document types commonly processed by finance teams: invoices, balance sheets, income statements, and bank statements.
Key Capabilities
- Invoice parsing: vendor details, line items, amounts, VAT, payment terms
- Balance sheet extraction: assets, liabilities, equity with full breakdown
- Income statement parsing: revenue, expenses, margins, period data
- Bank statement processing: transaction list, categorization, running balances
- Batch processing for multiple documents at once
2. User Stories
Primary User Stories
US-1: Automate Invoice Data Entry
As a bookkeeper, I want to extract invoice data automatically so that I can eliminate manual data entry and reduce errors.
Acceptance Criteria:
- [ ] Upload invoice content and receive structured JSON with all fields
- [ ] Vendor name, amounts, line items, and dates correctly extracted
- [ ] VAT/tax amounts identified separately
US-2: Build Financial Models from Statements
As a financial analyst, I want to parse balance sheets and income statements so that I can feed clean data into my models without manual transcription.
Acceptance Criteria:
- [ ] Balance sheet returns assets/liabilities/equity breakdown
- [ ] Income statement returns revenue/expenses/margins
- [ ] Data is consistent and machine-readable
Edge Cases
- Unsupported document type: Returns clear error with list of supported types
- Malformed document content: Returns success=false with descriptive error message
- Empty content: Handled gracefully; in the default mock mode the plugin returns demo data regardless of content
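The first and third edge cases could be handled along these lines. This is a minimal sketch, not the plugin's actual code: `check_request` is a hypothetical helper, the `error` and `supported_types` keys are assumptions, and only the `invoice` identifier is confirmed by this document (the other three are inferred from the demo file names).

```python
from typing import Optional

# Assumed identifiers: only "invoice" appears in the request example;
# the other three are inferred from the demo file names.
SUPPORTED_TYPES = ("invoice", "balance_sheet", "income_statement", "bank_statement")


def check_request(document_type: str, content: str) -> Optional[dict]:
    """Return an error payload for a rejected request, or None if it may proceed.

    Hypothetical helper; the "error" and "supported_types" keys are assumptions.
    """
    if document_type not in SUPPORTED_TYPES:
        return {
            "success": False,
            "error": f"Unsupported document type: {document_type!r}",
            "supported_types": list(SUPPORTED_TYPES),
        }
    # Empty content is not an error: mock mode falls back to demo data.
    return None
```

Malformed content is not covered here, since detecting it depends on the real-mode parser.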
3. Architecture
Component Diagram
+------------------+     +------------------+     +------------------+
|  Plugin Router   | --> |   Parse Logic    | --> |  Data Provider   |
|  /finance-doc-   |     |   routes.py      |     |  (mock / real)   |
|  parser/*        |     +------------------+     +------------------+
+------------------+                                       |
                                                     +-----v------+
                                                     | Demo Data  |
                                                     | data/*.json|
                                                     +------------+
Components
| Component | File | Responsibility |
|-----------|------|----------------|
| Router | routes.py | API endpoints, request validation |
| Plugin | plugin.py | Plugin lifecycle, config management |
| Data | data/ | Demo datasets (5 JSON files) |
Dependencies
- Internal: core.plugins.PluginProtocol, core.plugin_config_store.PluginConfigStore
- External: None (standard library only in mock mode)
- Plugin: None
4. API Spec
REST Endpoints
| Method | Path | Description | Auth |
|--------|------|-------------|------|
| POST | /api/plugins/finance-doc-parser/parse | Parse a single document | Yes |
| POST | /api/plugins/finance-doc-parser/batch | Parse multiple documents | Yes |
| GET | /api/plugins/finance-doc-parser/supported-types | List supported document types | No |
| GET | /api/plugins/finance-doc-parser/status | Health check | No |
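As an illustration of calling these endpoints, the sketch below builds the requests with the standard library only. The base URL and the bearer-token header are assumptions (the table only states that authentication is required, not which scheme), and both function names are hypothetical.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000"  # assumption: replace with your Dryade instance
PLUGIN_PREFIX = "/api/plugins/finance-doc-parser"


def build_parse_request(document_type: str, content: str, token: str) -> urllib.request.Request:
    """Build (but do not send) a POST request for the /parse endpoint."""
    body = json.dumps({"document_type": document_type, "content": content}).encode("utf-8")
    return urllib.request.Request(
        url=f"{BASE_URL}{PLUGIN_PREFIX}/parse",
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json",
            # Assumption: bearer-token auth; the document does not name the scheme.
            "Authorization": f"Bearer {token}",
        },
    )


def build_status_request() -> urllib.request.Request:
    """Build a GET request for the unauthenticated health check."""
    return urllib.request.Request(url=f"{BASE_URL}{PLUGIN_PREFIX}/status", method="GET")
```

Passing either request to `urllib.request.urlopen` sends it; building and sending are kept separate so the request can be inspected first.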
Request/Response Examples
Parse Document
// Request
{
  "document_type": "invoice",
  "content": "<invoice text or JSON>"
}

// Response
{
  "success": true,
  "document_type": "invoice",
  "extracted_fields": {
    "vendor": {"name": "Dupont Technologies SAS"},
    "total_amount": 30216.00,
    "currency": "EUR"
  },
  "summary": "Invoice from Dupont Technologies SAS for EUR 30,216.00",
  "confidence": 0.95
}
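On the client side, a successful response like the one above can be given a typed shape. The sketch below uses a standard-library dataclass, not the plugin's own Pydantic models; `ParseResult` is a hypothetical name, and the fallback defaults for missing keys are assumptions.

```python
import json
from dataclasses import dataclass
from typing import Any


@dataclass
class ParseResult:
    """Typed view of the /parse response fields shown in the example."""
    success: bool
    document_type: str
    extracted_fields: dict[str, Any]
    summary: str
    confidence: float

    @classmethod
    def from_json(cls, raw: str) -> "ParseResult":
        data = json.loads(raw)
        return cls(
            success=data["success"],
            document_type=data["document_type"],
            # Defaults below are assumptions for responses that omit a field.
            extracted_fields=data.get("extracted_fields", {}),
            summary=data.get("summary", ""),
            confidence=data.get("confidence", 0.0),
        )


raw = """{
  "success": true,
  "document_type": "invoice",
  "extracted_fields": {
    "vendor": {"name": "Dupont Technologies SAS"},
    "total_amount": 30216.00,
    "currency": "EUR"
  },
  "summary": "Invoice from Dupont Technologies SAS for EUR 30,216.00",
  "confidence": 0.95
}"""
result = ParseResult.from_json(raw)
```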
5. Data Flow
Processing Pipeline
1. User submits document via POST /parse or /batch
2. Router validates request against Pydantic models
3. Mock mode loads pre-parsed data from data/ directory
4. Summarizer generates human-readable summary
5. Response returned with extracted fields and confidence score
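Steps 2 to 5 of the pipeline can be sketched in a few lines for mock mode. This is an illustration only: `parse_document` and `summarize` are hypothetical names, an in-memory dictionary stands in for the files under data/, a plain membership check stands in for the Pydantic validation, and the `error` key is an assumption.

```python
# Stand-in for a file under data/; values are taken from the response example.
MOCK_DATA = {
    "invoice": {
        "vendor": {"name": "Dupont Technologies SAS"},
        "total_amount": 30216.00,
        "currency": "EUR",
    }
}


def summarize(document_type: str, fields: dict) -> str:
    """Produce a one-line human-readable summary (invoice wording only)."""
    if document_type == "invoice":
        return (
            f"Invoice from {fields['vendor']['name']} "
            f"for {fields['currency']} {fields['total_amount']:,.2f}"
        )
    return f"Parsed {document_type}"


def parse_document(request: dict) -> dict:
    """Hypothetical handler mirroring steps 2-5 of the pipeline in mock mode."""
    document_type = request.get("document_type")
    # Step 2: validate (the plugin uses Pydantic models; a plain check here).
    if document_type not in MOCK_DATA:
        return {"success": False, "error": f"Unsupported document type: {document_type!r}"}
    # Step 3: mock mode ignores the submitted content and loads pre-parsed data.
    fields = MOCK_DATA[document_type]
    # Steps 4-5: summarize and return with a confidence score.
    return {
        "success": True,
        "document_type": document_type,
        "extracted_fields": fields,
        "summary": summarize(document_type, fields),
        "confidence": 0.95,  # value copied from the example, not computed
    }
```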
Demo Data Description
The data/ directory contains:
- sample_invoice.json: French vendor invoice with 3 line items (EUR 30,216)
- sample_balance_sheet.json: Q3 2025 balance sheet (EUR 5.4M total assets)
- sample_income_statement.json: Q3 2025 P&L (EUR 2.5M revenue, 9.8% net margin)
- sample_bank_statement.json: October 2025 bank statement (15 transactions)
- sample_invoice_batch.json: 3 invoices for batch processing demo
Total: 5 demo files covering all supported document types.
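A loader for these files might look like the following sketch. The file names come from the list above; the dictionary keys and the `load_demo` function are hypothetical, and the internal structure of each JSON file is not assumed.

```python
import json
from pathlib import Path

# File names as listed above; the mapping keys are assumed identifiers.
DEMO_FILES = {
    "invoice": "sample_invoice.json",
    "balance_sheet": "sample_balance_sheet.json",
    "income_statement": "sample_income_statement.json",
    "bank_statement": "sample_bank_statement.json",
    "invoice_batch": "sample_invoice_batch.json",
}


def load_demo(data_dir: Path, key: str):
    """Read one demo dataset from data_dir; raises KeyError for unknown keys."""
    path = data_dir / DEMO_FILES[key]
    with path.open(encoding="utf-8") as fh:
        return json.load(fh)
```

For example, `load_demo(Path("data"), "invoice")` would return the parsed contents of sample_invoice.json when run from the plugin directory.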
6. Security Considerations
Data Handling
- PII: No - demo data uses fictional companies only
- Encryption: N/A in mock mode; real mode should use HTTPS for API calls
- Data Retention: No data persisted; stateless request/response
External API Keys
No external API keys required in mock mode.
Isolation
- Plugin runs in sandboxed context via core plugin loader
- No direct database access -- uses core API only
- No file writes outside plugin directory
7. Test Plan
Test Classes
| Class | Tests | Coverage Target |
|-------|-------|----------------|
| TestPluginAttributes | 6 | 100% manifest fields |
| TestPluginRouter | 5 | All routes |
| TestPluginConfig | 2 | Config validation |
| TestDemoData | 8 | All data files |
Running Tests
cd dryade-plugins
python -m pytest starter/finance_doc_parser/tests/ -x -v --tb=short
8. Deployment Notes
Requirements
No additional packages required beyond Dryade core.
Configuration
Default plugin configuration (set via plugin settings UI or API):
{
  "data_source": "mock"
}
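Configuration handling could be sketched as a merge of user overrides onto this default, followed by a check of `data_source`. The `resolve_config` function is hypothetical, and the `real` value is taken from the architecture diagram's "(mock / real)" label; real-mode parsing is still listed on the roadmap.

```python
from typing import Optional

DEFAULT_CONFIG = {"data_source": "mock"}
# "real" comes from the architecture diagram; it is not yet implemented.
ALLOWED_DATA_SOURCES = {"mock", "real"}


def resolve_config(overrides: Optional[dict] = None) -> dict:
    """Merge user overrides onto the defaults and validate data_source."""
    config = {**DEFAULT_CONFIG, **(overrides or {})}
    if config["data_source"] not in ALLOWED_DATA_SOURCES:
        raise ValueError(f"data_source must be one of {sorted(ALLOWED_DATA_SOURCES)}")
    return config
```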
Compatibility
- Min Dryade Version: 1.0.0
- Python: >=3.11
9. User Guide
Getting Started
- Ensure your Dryade instance has a starter tier license or higher
- Install the plugin via the marketplace or dryade-pm push
- Use the /parse endpoint to extract data from a financial document
- Use /batch for processing multiple documents at once
Common Workflows
Workflow 1: Parse a Single Invoice
- POST to /parse with document_type: "invoice" and invoice content
- Receive structured JSON with vendor, line items, amounts
- Feed extracted data into your accounting system
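The last step depends on the accounting system in use. As one possible illustration, the sketch below flattens the three fields confirmed by the response example (vendor name, total amount, currency) into a CSV line; `invoice_to_csv_row` is a hypothetical helper, and the column order is an assumption.

```python
import csv
import io


def invoice_to_csv_row(extracted_fields: dict) -> str:
    """Flatten the fields shown in the response example into one CSV line."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow([
        extracted_fields["vendor"]["name"],
        f"{extracted_fields['total_amount']:.2f}",
        extracted_fields["currency"],
    ])
    return buffer.getvalue()
```

Using the `csv` module rather than string joins means vendor names containing commas or quotes are escaped correctly.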
Workflow 2: Batch Process Bank Statements
- POST to /batch with array of bank statement documents
- Receive per-document results with transaction lists
- Reconcile against internal records
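The batch request body is not specified in this document, so the sketch below rests on assumptions throughout: a `documents` key wrapping the array, a `bank_statement` type identifier, and per-document results that each carry a `success` flag like the single-document response. Both function names are hypothetical.

```python
def build_batch_payload(contents: list[str], document_type: str = "bank_statement") -> dict:
    """Wrap several documents in one body; the "documents" key is an assumption."""
    return {
        "documents": [
            {"document_type": document_type, "content": content} for content in contents
        ]
    }


def split_results(results: list[dict]) -> tuple[list[dict], list[dict]]:
    """Separate per-document results into parsed and failed lists."""
    parsed = [r for r in results if r.get("success")]
    failed = [r for r in results if not r.get("success")]
    return parsed, failed
```

Splitting the results first lets reconciliation run on the parsed statements while the failed ones are queued for manual review.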
10. Screenshots
Screenshots will be added after UI integration.
11. Changelog
1.0.0 (2026-03-05)
- Initial release
- Invoice, balance sheet, income statement, and bank statement parsers
- Mock data with 5 sample documents
- Batch processing endpoint
- Supported types discovery endpoint
Future Roadmap
- [ ] Real-mode parsing with LLM extraction
- [ ] PDF document support
- [ ] Multi-currency normalization
- [ ] Historical document comparison
Requires starter tier subscription