Agentic Data Layer

Profile. Inspect. Extract. Validate. Cite.
ADX gives your AI agents structured document tools.

Documents are not text blobs. Give your agents document tools.


How It Works

profile → inspect → extract → validate → cite

Upload a document in any supported format. ADX wraps best-in-class parsers and exposes a canonical, citeable, inspectable document model to your agents.

Without ADX

Agent: reads invoice.pdf as raw text
Agent: tries regex on flattened string
Agent: misses table spanning pages 2-3
Agent: extracts wrong total
Agent: no way to verify the answer

Raw text dumps lose structure, tables, and provenance. The agent hallucinates fields it cannot cite.

With ADX

python

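from adx import ADX
dn = ADX()
doc_id = dn.upload("invoice.pdf")                   # register the file with ADX
profile = dn.profile(doc_id)                        # metadata and detected type
extraction = dn.extract(doc_id, schema="invoice")   # fields with citations
result = dn.validate(doc_id, extraction.id)         # rule-based checks on the extraction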

json

{"field": "total", "value": 4250.00,
 "confidence": 0.95,
 "citation": {"page": 3, "bbox": [72, 680, 540, 700]}}

Every field is extracted with a citation. Every value is validated.
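
An agent consuming this output can refuse to use a field that lacks a usable citation. The following is a minimal sketch that works on the JSON record shown above; the `is_trustworthy` helper, the 0.9 threshold, the 1-based page number, and the `[x0, y0, x1, y1]` reading of `bbox` are all assumptions made for illustration, not part of ADX.

```python
import json

# The field record from the example above, copied verbatim.
raw = '''{"field": "total", "value": 4250.00,
 "confidence": 0.95,
 "citation": {"page": 3, "bbox": [72, 680, 540, 700]}}'''


def is_trustworthy(field: dict, min_confidence: float = 0.9) -> bool:
    """Accept a field only if it is confident enough and carries a usable citation.

    Hypothetical helper: the threshold and the sanity checks are illustrative.
    """
    citation = field.get("citation") or {}
    bbox = citation.get("bbox", [])
    # Assumes page numbers are 1-based.
    has_page = isinstance(citation.get("page"), int) and citation["page"] >= 1
    # Assumes bbox is [x0, y0, x1, y1]; the source does not state the order.
    has_box = len(bbox) == 4 and bbox[0] < bbox[2] and bbox[1] < bbox[3]
    return field.get("confidence", 0.0) >= min_confidence and has_page and has_box


field = json.loads(raw)
print(field["field"], field["value"], is_trustworthy(field))
```

A field that fails the check can be sent back for re-extraction or flagged for human review instead of being passed downstream.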

What ADX Does

ADX is not a parser. It is an orchestration layer that:

Wraps best-in-class parsers (such as PyMuPDF, openpyxl, and the standard-library csv module) behind a uniform interface
Builds a canonical DocumentGraph from any supported file format
Exposes structured, read-only agent tools for inspection and search
Extracts fields using built-in schemas with field-level citations
Validates results with rule-based checks (types, arithmetic, citations)

Quick Start

bash

pip install adx

# Start the server
adx serve

# Upload a document, then profile it
# (replace {id} with the ID of the uploaded file)
curl -X POST http://localhost:8000/v1/files -F "file=@invoice.pdf"
curl http://localhost:8000/v1/files/{id}/profile

Or use the Python SDK directly:

python

from adx import ADX

dn = ADX()
doc_id = dn.upload("invoice.pdf")
print(dn.profile(doc_id))
print(dn.structure(doc_id))
extraction = dn.extract(doc_id, schema="invoice")
print(dn.validate(doc_id, extraction.id))

Agent Tools

Nine read-only, deterministic tools that return structured JSON:

profile_document

File metadata, page count, detected type, confidence scores, and recommended next tools.

list_structure

Headings, sections, table locations, and page-level outline for navigation.

search_document

Full-text search with page/cell locations and surrounding context snippets.
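
To make "page locations and surrounding context snippets" concrete, here is a small sketch of a search over per-page text. It is not how ADX implements `search_document`; the `search_pages` function, its result keys, and the sample pages are hypothetical.

```python
def search_pages(pages: list[str], query: str, context: int = 20) -> list[dict]:
    """Case-insensitive substring search over per-page text.

    Page numbers are 1-based; `context` is the number of characters kept on
    each side of a match. Hypothetical helper, not ADX's implementation.
    """
    needle = query.lower()
    if not needle:
        return []  # an empty query would otherwise match at every position
    hits = []
    for page_number, text in enumerate(pages, start=1):
        haystack = text.lower()
        start = haystack.find(needle)
        while start != -1:
            end = start + len(needle)
            hits.append({
                "page": page_number,
                "offset": start,
                "snippet": text[max(0, start - context):end + context],
            })
            start = haystack.find(needle, end)
    return hits


# Made-up sample text standing in for two parsed pages.
pages = ["Invoice 1042\nBill to: Acme", "Subtotal 4000.00\nTotal due 4250.00"]
for hit in search_pages(pages, "total"):
    print(hit["page"], hit["offset"], repr(hit["snippet"]))
```

Note that a plain substring search also matches "total" inside "Subtotal", which is one reason location and context are worth returning alongside each hit.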

get_page / get_table

Retrieve a specific page's text blocks or a table's rows with bounding boxes.

list_sheets / read_range

Navigate spreadsheet sheets and read cell ranges with formulas and computed values.

find_cells / inspect_formula

Search cells by value or pattern. Trace formula dependencies and precedents.
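
Tracing precedents amounts to walking a graph whose edges are the cell references inside each formula. The sketch below illustrates the idea on a plain dictionary of formulas. It is an assumption-laden toy, not ADX's `inspect_formula`: the helper names are hypothetical, and the regular expression handles only simple A1-style references (no sheet prefixes, ranges are reduced to their endpoints, and a function name such as LOG10 would be mistaken for a cell).

```python
import re

# Matches simple A1-style references such as B2, $C$10 or AA7.
CELL_REF = re.compile(r"\$?([A-Z]{1,3})\$?(\d+)")


def direct_precedents(formula: str) -> list[str]:
    """Return the distinct cells a formula reads, in order of appearance."""
    seen: list[str] = []
    for col, row in CELL_REF.findall(formula):
        ref = f"{col}{row}"
        if ref not in seen:
            seen.append(ref)
    return seen


def all_precedents(cell: str, formulas: dict[str, str]) -> set[str]:
    """Follow references transitively; `formulas` maps a cell to its formula text."""
    found: set[str] = set()
    stack = [cell]
    while stack:
        current = stack.pop()
        for ref in direct_precedents(formulas.get(current, "")):
            if ref not in found:  # also protects against circular references
                found.add(ref)
                stack.append(ref)
    return found


# Made-up sheet: two line totals and a grand total.
sheet = {"D2": "=B2*C2", "D3": "=B3*C3", "D4": "=D2+D3"}
print(sorted(all_precedents("D4", sheet)))
```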

Schema Extraction

Built-in schemas for invoices, contracts, and financial models. Field-level citations included.

Validation Engine

Rule-based checks: required fields, type coercion, arithmetic verification, citation provenance.
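
As an illustration of what such rules can look like, the sketch below checks required fields, coerces values to decimals, verifies one arithmetic identity, and requires a citation on every field. The field names (`subtotal`, `tax`, `total`), the record shape, and the sample amounts are assumptions chosen for the example; ADX's actual invoice schema and rule set may differ.

```python
from decimal import Decimal, InvalidOperation


def validate_invoice(fields: dict) -> list[str]:
    """Return a list of problems; an empty list means every check passed.

    Hypothetical rule set, not ADX's validation engine.
    """
    problems: list[str] = []
    amounts: dict[str, Decimal] = {}

    for name in ("subtotal", "tax", "total"):
        entry = fields.get(name)
        if entry is None:
            problems.append(f"{name}: required field is missing")
            continue
        try:
            # Type coercion: go through str() so floats do not carry binary noise.
            amounts[name] = Decimal(str(entry["value"]))
        except (InvalidOperation, KeyError):
            problems.append(f"{name}: value is not numeric")
        if not entry.get("citation"):
            problems.append(f"{name}: no citation")

    # Arithmetic verification, only when all three amounts parsed.
    if len(amounts) == 3 and amounts["subtotal"] + amounts["tax"] != amounts["total"]:
        problems.append("total: does not equal subtotal + tax")
    return problems


# Made-up amounts; only the total matches the example output shown earlier.
cite = {"page": 3, "bbox": [72, 680, 540, 700]}
invoice = {
    "subtotal": {"value": 4000.00, "citation": cite},
    "tax": {"value": 250.00, "citation": cite},
    "total": {"value": 4250.00, "citation": cite},
}
print(validate_invoice(invoice))
```

Returning a list of problems rather than raising on the first failure lets an agent see every issue in one pass.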

Integrations

Python SDK

Upload, inspect, extract, and validate documents with a single client object.

python

from adx import ADX

dn = ADX()
doc_id = dn.upload("report.pdf")
profile = dn.profile(doc_id)
tables = dn.structure(doc_id)["tables"]        # table locations from the outline
data = dn.get_table(doc_id, tables[0]["id"])   # first table; assumes at least one exists

REST API

Full HTTP API for language-agnostic integration. Start the server with adx serve.

bash

# Upload
curl -X POST localhost:8000/v1/files -F "file=@model.xlsx"

# Extract with schema (replace {id} with the ID of the uploaded file)
curl -X POST localhost:8000/v1/files/{id}/extract \
  -H "Content-Type: application/json" \
  -d '{"schema": "financial_model"}'
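
The same calls can be made from any HTTP client. Below is a sketch using only the Python standard library that prepares the profile and extract requests shown in the curl examples. The helper names are hypothetical, `FILE_ID` is a placeholder, and the assumption that responses are JSON is not confirmed by this page.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000"  # address used in the curl examples


def profile_request(file_id: str) -> urllib.request.Request:
    """Prepare GET /v1/files/{id}/profile."""
    return urllib.request.Request(f"{BASE_URL}/v1/files/{file_id}/profile", method="GET")


def extract_request(file_id: str, schema: str) -> urllib.request.Request:
    """Prepare POST /v1/files/{id}/extract with a JSON body naming the schema."""
    body = json.dumps({"schema": schema}).encode("utf-8")
    return urllib.request.Request(
        f"{BASE_URL}/v1/files/{file_id}/extract",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request: urllib.request.Request):
    """Send a prepared request; assumes a running server that replies with JSON."""
    with urllib.request.urlopen(request) as response:
        return json.load(response)


req = extract_request("FILE_ID", "financial_model")  # FILE_ID is a placeholder
print(req.get_method(), req.full_url)
```

Separating request construction from sending keeps the URL and payload logic testable without a server; call `send(req)` once `adx serve` is running.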

CLI

Command-line tools for scripting and CI pipelines.

bash

adx upload invoice.pdf
adx profile <id>
adx extract <id> --schema invoice
adx validate <id> --extraction <eid>
adx export <id> --format markdown

Supported Formats

Format	Parser	Features
PDF	PyMuPDF	Text blocks, tables, bounding boxes, section detection
Excel (.xlsx)	openpyxl	Sheets, formulas, named ranges, hidden cells, merged cells
Excel (.xls)	xlrd	Legacy Excel format support
DOCX	python-docx	Paragraphs, headings, tables, images, metadata
PPTX	python-pptx	Slides, text shapes, tables, layout extraction
RTF	striprtf	Text extraction with formatting stripped
CSV	stdlib csv	Dialect sniffing, encoding detection, header inference
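
One way to picture the "uniform interface" over these parsers is a dispatch table keyed by file extension. The mapping below is copied from the table above; the `parser_for` function and the choice to dispatch on extension are assumptions for illustration, since ADX may detect the file type differently.

```python
from pathlib import Path

# Extension -> parser name, taken from the Supported Formats table.
PARSERS = {
    ".pdf": "PyMuPDF",
    ".xlsx": "openpyxl",
    ".xls": "xlrd",
    ".docx": "python-docx",
    ".pptx": "python-pptx",
    ".rtf": "striprtf",
    ".csv": "stdlib csv",
}


def parser_for(filename: str) -> str:
    """Return the parser name for a file, based only on its extension.

    Hypothetical helper; raises ValueError for formats not in the table.
    """
    suffix = Path(filename).suffix.lower()
    try:
        return PARSERS[suffix]
    except KeyError:
        raise ValueError(f"unsupported format: {filename}") from None


for name in ("invoice.pdf", "model.xlsx", "ledger.CSV"):
    print(name, "->", parser_for(name))
```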

Learn More

Quickstart for a five-minute walkthrough.
Agent Tools for the full tool reference.
PDF Guide for PDF-specific features.
Excel Guide for spreadsheet-specific features.
API Reference for every endpoint.
Examples for real-world extraction workflows.
