Alpha release. Install with pip install adx. APIs may change before v1.0.
Agentic Data Layer

Profile. Inspect. Extract. Validate. Cite.
ADX gives your AI agents structured document tools.

Documents are not text blobs. Give your agents document tools.

How It Works

profile
inspect
extract
validate
cite

Upload a document in any supported format. ADX wraps best-in-class parsers and exposes a canonical, citeable, inspectable document model to your agents.

AI Agent: Claude, GPT, or any MCP client
ADX: profile_document, search_document, extract (schema), validate + cite, read_range, formulas
Parsers: PyMuPDF, openpyxl, csv, custom

Without ADX

Agent: reads invoice.pdf as raw text
Agent: tries regex on flattened string
Agent: misses table spanning pages 2-3
Agent: extracts wrong total
Agent: no way to verify the answer

Raw text dumps lose structure, tables, and provenance. The agent hallucinates fields it cannot cite.

With ADX

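python
from adx import ADX

dn = ADX()                                         # single client object
doc_id = dn.upload("invoice.pdf")                  # upload and keep the document ID
profile = dn.profile(doc_id)                       # file metadata and detected type
extraction = dn.extract(doc_id, schema="invoice")  # built-in invoice schema
result = dn.validate(doc_id, extraction.id)        # rule-based checks on the extraction

json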
{"field": "total", "value": 4250.00,
 "confidence": 0.95,
 "citation": {"page": 3, "bbox": [72, 680, 540, 700]}}

Every field is extracted with a citation. Every value is validated.
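
As a rough illustration of why the citation matters downstream, here is a minimal sketch of a caller gating on such a record before trusting it. The `is_usable` helper, the 0.9 threshold, and the reading of `bbox` as `[x0, y0, x1, y1]` are assumptions made for this example, not part of the documented ADX API.

```python
import json

# Field record copied from the example output above.
RECORD = json.loads("""
{"field": "total", "value": 4250.00,
 "confidence": 0.95,
 "citation": {"page": 3, "bbox": [72, 680, 540, 700]}}
""")


def is_usable(record: dict, min_confidence: float = 0.9) -> bool:
    """Accept a field only if it is confident enough and carries a citation.

    Hypothetical helper: the threshold and the bbox layout
    ([x0, y0, x1, y1]) are assumptions, not ADX guarantees.
    """
    citation = record.get("citation") or {}
    bbox = citation.get("bbox", [])
    # A usable region has four coordinates and a non-empty area.
    has_region = len(bbox) == 4 and bbox[0] < bbox[2] and bbox[1] < bbox[3]
    return (
        record.get("confidence", 0.0) >= min_confidence
        and isinstance(citation.get("page"), int)
        and has_region
    )


if __name__ == "__main__":
    print(is_usable(RECORD))
```

A field that fails this kind of gate can be routed to a human or re-extracted instead of being passed on as fact.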

What ADX Does

ADX is not a parser. It is an orchestration layer that:

  1. Wraps best-in-class parsers (PyMuPDF, openpyxl, csv) behind a uniform interface
  2. Builds a canonical DocumentGraph from any supported file format
  3. Exposes structured, read-only agent tools for inspection and search
  4. Extracts fields using built-in schemas with field-level citations
  5. Validates results with rule-based checks (types, arithmetic, citations)
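
To make the first two points concrete, the sketch below shows one way a uniform parser interface and a canonical model could fit together. Every name in it (`Node`, `Graph`, `Parser`, `CsvParser`, `parse_file`) is hypothetical and chosen for illustration; it is not ADX's DocumentGraph or its internal interface.

```python
import csv
import io
from dataclasses import dataclass, field
from pathlib import PurePath
from typing import Protocol


@dataclass
class Node:
    """One citeable unit of a document (hypothetical shape)."""
    kind: str       # e.g. "row", "cell", "text_block"
    text: str
    location: dict  # where the unit came from, e.g. {"row": 2}


@dataclass
class Graph:
    """Stand-in for a canonical document model; not ADX's DocumentGraph."""
    source_format: str
    nodes: list[Node] = field(default_factory=list)


class Parser(Protocol):
    """Uniform interface that every format-specific wrapper would satisfy."""

    def parse(self, data: bytes) -> Graph: ...


class CsvParser:
    """Wraps the stdlib csv module behind the Parser interface."""

    def parse(self, data: bytes) -> Graph:
        graph = Graph(source_format="csv")
        reader = csv.reader(io.StringIO(data.decode("utf-8")))
        for row_number, row in enumerate(reader, start=1):
            graph.nodes.append(
                Node(kind="row", text=",".join(row), location={"row": row_number})
            )
        return graph


# Registry keyed by file extension; other formats would register here.
PARSERS: dict[str, Parser] = {".csv": CsvParser()}


def parse_file(name: str, data: bytes) -> Graph:
    """Pick a parser by extension and return the canonical model."""
    return PARSERS[PurePath(name).suffix.lower()].parse(data)
```

Because callers only see `Graph`, tools built on top of it do not need to know which parser produced the nodes.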

Quick Start

bash
pip install adx

# Start the server
adx serve

# Upload and profile a document
curl -X POST http://localhost:8000/v1/files -F "file=@invoice.pdf"
curl http://localhost:8000/v1/files/{id}/profile

Or use the Python SDK directly:

python
from adx import ADX

dn = ADX()
doc_id = dn.upload("invoice.pdf")
print(dn.profile(doc_id))
print(dn.structure(doc_id))
extraction = dn.extract(doc_id, schema="invoice")
print(dn.validate(doc_id, extraction.id))

Agent Tools

Nine read-only, deterministic tools that return structured JSON:

profile_document

File metadata, page count, detected type, confidence scores, and recommended next tools.

list_structure

Headings, sections, table locations, and page-level outline for navigation.

search_document

Full-text search with page/cell locations and surrounding context snippets.
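
The idea of a located hit with a context snippet can be sketched in a few lines. This is an illustrative stand-in, assuming pages are available as plain strings; the `search_pages` function and its result keys are hypothetical and do not describe the tool's actual output format.

```python
import re


def search_pages(pages: list[str], query: str, context: int = 30) -> list[dict]:
    """Case-insensitive literal search over per-page text.

    Returns one hit per occurrence with a 1-based page number, the
    character offset on that page, and a snippet of surrounding text.
    """
    if not query:
        return []
    pattern = re.compile(re.escape(query), re.IGNORECASE)
    hits = []
    for page_number, text in enumerate(pages, start=1):
        for match in pattern.finditer(text):
            hits.append({
                "page": page_number,
                "offset": match.start(),
                "snippet": text[max(0, match.start() - context):match.end() + context],
            })
    return hits
```

Note that a plain substring search also matches "total" inside "Subtotal", which is one reason the location and snippet are worth returning alongside the hit.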

get_page / get_table

Retrieve a specific page's text blocks or a table's rows with bounding boxes.

list_sheets / read_range

Navigate spreadsheet sheets and read cell ranges with formulas and computed values.

find_cells / inspect_formula

Search cells by value or pattern. Trace formula dependencies and precedents.
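
Tracing precedents amounts to pulling cell references out of a formula and following them. The sketch below does this with a simplified regular expression over formula strings; it is an illustration only, and the `direct_precedents` and `trace_precedents` helpers are hypothetical. It does not expand ranges, and it would misread a reference that appears inside a quoted string literal.

```python
import re

# A1-style references with an optional sheet prefix and optional range end,
# e.g. B2, $C$4, Sheet1!A1, 'Q1 Data'!B2:B10. Function names such as
# LOG10( are excluded by the trailing lookahead.
CELL_REF = re.compile(
    r"(?<![A-Za-z0-9_$'!])"
    r"(?:(?:'[^']+'|[A-Za-z_][A-Za-z0-9_]*)!)?"
    r"\$?[A-Z]{1,3}\$?[0-9]+"
    r"(?::\$?[A-Z]{1,3}\$?[0-9]+)?"
    r"(?![A-Za-z0-9_(])"
)


def direct_precedents(formula: str) -> list[str]:
    """References used directly by a formula, in order, without duplicates."""
    refs = (m.group(0).replace("$", "") for m in CELL_REF.finditer(formula))
    return list(dict.fromkeys(refs))


def trace_precedents(formulas: dict[str, str], cell: str) -> set[str]:
    """Every reference `cell` depends on, directly or indirectly.

    Ranges are treated as leaves and are not expanded into single cells.
    """
    seen: set[str] = set()
    stack = [cell]
    while stack:
        current = stack.pop()
        for ref in direct_precedents(formulas.get(current, "")):
            if ref not in seen:  # also guards against circular references
                seen.add(ref)
                stack.append(ref)
    return seen
```

For a sheet where D5 is `=D3+D4`, D4 is `=D3*$B$1`, and D3 is `=SUM(B2:B4)`, tracing D5 yields D3, D4, B1, and the range B2:B4.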

Schema Extraction

Built-in schemas for invoices, contracts, and financial models. Field-level citations included.

Validation Engine

Rule-based checks: required fields, type coercion, arithmetic verification, citation provenance.
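
The four kinds of check listed above can be sketched as plain functions over extracted fields. This is a minimal illustration, not ADX's validation engine: the `validate_invoice` function, the `subtotal` and `tax` field names, and the rule that subtotal plus tax must equal total are all assumptions made for the example.

```python
from decimal import Decimal, InvalidOperation

AMOUNT_FIELDS = ("subtotal", "tax", "total")  # hypothetical schema fields


def validate_invoice(fields: dict) -> list[str]:
    """Return human-readable problems; an empty list means every check passed."""
    problems: list[str] = []
    amounts: dict[str, Decimal] = {}

    for name in AMOUNT_FIELDS:
        # Required fields
        if name not in fields:
            problems.append(f"missing required field: {name}")
            continue
        # Type coercion: accept numbers and strings such as "4,000.00"
        raw = fields[name].get("value")
        try:
            amounts[name] = Decimal(str(raw).replace(",", ""))
        except InvalidOperation:
            problems.append(f"not a number: {name}={raw!r}")

    # Arithmetic verification, only when all amounts were coerced
    if len(amounts) == len(AMOUNT_FIELDS):
        if amounts["subtotal"] + amounts["tax"] != amounts["total"]:
            problems.append("arithmetic: subtotal + tax != total")

    # Citation provenance: every field must point at a page
    for name, record in fields.items():
        page = (record.get("citation") or {}).get("page")
        if not isinstance(page, int):
            problems.append(f"missing citation: {name}")

    return problems
```

Using `Decimal` rather than `float` keeps the arithmetic check exact for currency amounts.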

Integrations

Python SDK

Upload, inspect, extract, and validate documents with a single client object.

python
from adx import ADX

dn = ADX()
doc_id = dn.upload("report.pdf")
profile = dn.profile(doc_id)
tables = dn.structure(doc_id)["tables"]
data = dn.get_table(doc_id, tables[0]["id"])

REST API

Full HTTP API for language-agnostic integration. Start with adx serve.

bash
# Upload
curl -X POST http://localhost:8000/v1/files -F "file=@model.xlsx"

# Extract with schema
curl -X POST http://localhost:8000/v1/files/{id}/extract \
  -H "Content-Type: application/json" \
  -d '{"schema": "financial_model"}'
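
For callers who prefer not to shell out to curl, the same extract request can be built with the Python standard library. This sketch mirrors the URL, header, and body of the curl command above; the `build_extract_request` helper and the `abc123` file ID are placeholders, and the shape of the response is not specified here, so it is left unparsed.

```python
import json
import urllib.request


def build_extract_request(
    file_id: str,
    schema: str,
    base_url: str = "http://localhost:8000",
) -> urllib.request.Request:
    """Build (but do not send) the POST request for schema extraction."""
    body = json.dumps({"schema": schema}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/v1/files/{file_id}/extract",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


if __name__ == "__main__":
    # "abc123" is a placeholder; use the ID returned by the upload call.
    request = build_extract_request("abc123", "financial_model")
    # Sending requires a running server started with `adx serve`.
    with urllib.request.urlopen(request) as response:
        print(response.read().decode("utf-8"))
```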

CLI

Command-line tools for scripting and CI pipelines.

bash
adx upload invoice.pdf
adx profile <id>
adx extract <id> --schema invoice
adx validate <id> --extraction <eid>
adx export <id> --format markdown

Supported Formats

| Format | Parser | Features |
|---|---|---|
| PDF | PyMuPDF | Text blocks, tables, bounding boxes, section detection |
| Excel (.xlsx) | openpyxl | Sheets, formulas, named ranges, hidden cells, merged cells |
| Excel (.xls) | xlrd | Legacy Excel format support |
| DOCX | python-docx | Paragraphs, headings, tables, images, metadata |
| PPTX | python-pptx | Slides, text shapes, tables, layout extraction |
| RTF | striprtf | Text extraction with formatting stripped |
| CSV | stdlib csv | Dialect sniffing, encoding detection, header inference |
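
For the CSV row, dialect sniffing and header inference are both available in the standard library through `csv.Sniffer`. The sketch below shows the general technique; the `decode` and `load_csv` helpers are hypothetical, and the UTF-8 then Latin-1 fallback is a simple stand-in for real encoding detection, not a description of what ADX does.

```python
import csv
import io


def decode(data: bytes) -> str:
    """Try UTF-8 (with or without BOM) first, then fall back to Latin-1."""
    try:
        return data.decode("utf-8-sig")
    except UnicodeDecodeError:
        return data.decode("latin-1")


def load_csv(text: str, sample_size: int = 4096) -> dict:
    """Sniff the delimiter, infer whether row one is a header, and read rows.

    csv.Sniffer raises csv.Error when it cannot determine a delimiter.
    """
    sample = text[:sample_size]
    sniffer = csv.Sniffer()
    dialect = sniffer.sniff(sample, delimiters=",;\t|")
    has_header = sniffer.has_header(sample)
    rows = list(csv.reader(io.StringIO(text), dialect))
    header = rows[0] if has_header and rows else None
    return {
        "delimiter": dialect.delimiter,
        "header": header,
        "rows": rows[1:] if header is not None else rows,
    }
```

Both sniffing steps are heuristics that work from a sample, so short or irregular files can be misdetected and may need an explicit delimiter.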
