Alpha release. Install with pip install adx. APIs may change before v1.0.

Chunking

ADX provides document chunking for retrieval-augmented generation (RAG) pipelines. Chunks preserve citations back to source pages and sections.

Quick Start

python
from adx import ADX

dn = ADX()
graph = dn.upload("report.pdf")
chunks = dn.chunk(graph.document.id, strategy="section_aware")

for chunk in chunks:
    print(chunk.text[:100], f"({chunk.token_count} tokens)")

Strategies

section_aware (default)

Splits at section boundaries detected from document headings. Chunks respect the logical structure of the document.

python
chunks = dn.chunk(file_id, strategy="section_aware", max_chunk_size=1000, overlap=200)
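
To make the idea of heading-based splitting concrete, here is a minimal sketch that is not ADX's implementation. It assumes plain text with Markdown-style `#` headings, and the names `split_by_headings` and the returned dictionary shape are hypothetical, chosen only for this illustration.

```python
import re

# Assumption: headings look like "# Title" or "## Subtitle" (Markdown style).
HEADING = re.compile(r"^(#{1,6})\s+(.*)$")

def split_by_headings(text):
    """Split text into one section per heading, tracking the heading path."""
    sections = []
    path = []   # current heading hierarchy, e.g. ["Intro", "Scope"]
    body = []   # lines collected for the section being built

    def flush():
        # Emit the collected lines as a section, skipping empty ones.
        content = "\n".join(body).strip()
        if content:
            sections.append({"section_path": list(path), "text": content})
        body.clear()

    for line in text.splitlines():
        match = HEADING.match(line)
        if match:
            flush()
            level = len(match.group(1))
            # Drop headings at the same or deeper level, then push the new one.
            del path[level - 1:]
            path.append(match.group(2).strip())
        else:
            body.append(line)
    flush()
    return sections

doc = "# Intro\nHello.\n## Scope\nDetails here.\n# Results\nNumbers."
sections = split_by_headings(doc)
for s in sections:
    print(" > ".join(s["section_path"]), "|", s["text"])
```

The sketch ignores `max_chunk_size` and `overlap`. How ADX applies those limits within a section is not described here.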

fixed_size

Splits text into fixed-size chunks by token count with configurable overlap.

python
chunks = dn.chunk(file_id, strategy="fixed_size", max_chunk_size=500, overlap=100)
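
The interaction between `max_chunk_size` and `overlap` can be sketched with a sliding window over a token list. This is an illustrative sketch, not ADX's code: the function `fixed_size_windows` is hypothetical, and it assumes each window starts `max_chunk_size - overlap` tokens after the previous one.

```python
def fixed_size_windows(tokens, max_chunk_size=500, overlap=100):
    """Return overlapping windows of at most max_chunk_size tokens."""
    if max_chunk_size <= 0:
        raise ValueError("max_chunk_size must be positive")
    if not 0 <= overlap < max_chunk_size:
        raise ValueError("overlap must be in [0, max_chunk_size)")

    stride = max_chunk_size - overlap  # distance between window starts
    windows = []
    start = 0
    while start < len(tokens):
        windows.append(tokens[start:start + max_chunk_size])
        if start + max_chunk_size >= len(tokens):
            break  # this window already reaches the end of the input
        start += stride
    return windows

tokens = [f"t{i}" for i in range(1200)]
windows = fixed_size_windows(tokens, max_chunk_size=500, overlap=100)
print([len(w) for w in windows])  # [500, 500, 400]
```

With 1,200 tokens, a size of 500, and an overlap of 100, the windows start at tokens 0, 400, and 800, so the last 100 tokens of each window reappear at the start of the next.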

table_only

Extracts only table content as chunks. Useful when you need just the structured data.

python
chunks = dn.chunk(file_id, strategy="table_only")

Parameters

| Parameter | Type | Default | Description |
|---|---|---|---|
| strategy | str | "section_aware" | Chunking strategy |
| max_chunk_size | int | 1000 | Maximum tokens per chunk |
| overlap | int | 200 | Token overlap between chunks |

Chunk Object

Each chunk includes:

| Field | Description |
|---|---|
| id | Unique chunk identifier |
| text | Chunk text content |
| chunk_type | Type: section, fixed, table |
| section_path | Heading hierarchy path (section_aware only) |
| page_numbers | Source page numbers |
| sheet_name | Source sheet (spreadsheets only) |
| token_count | Estimated token count |
| citations | Source citations with page/bbox references |
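
One way these fields might be used downstream is to attach a readable source label to each chunk before it goes into a prompt. The sketch below uses a stand-in dataclass, `ChunkSketch`, rather than the real ADX chunk class. The field types and the helper `source_label` are assumptions for illustration, since the list above gives only field names and descriptions.

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class ChunkSketch:
    """Illustrative stand-in for an ADX chunk; field types are assumptions."""
    id: str
    text: str
    chunk_type: str  # "section", "fixed", or "table"
    page_numbers: list = field(default_factory=list)
    section_path: list = field(default_factory=list)
    sheet_name: Optional[str] = None
    token_count: int = 0
    citations: list = field(default_factory=list)

def source_label(chunk):
    """Build a short human-readable source string for a prompt or log line."""
    parts = []
    if chunk.section_path:
        parts.append(" > ".join(chunk.section_path))
    if chunk.sheet_name:
        parts.append(f"sheet {chunk.sheet_name}")
    if chunk.page_numbers:
        pages = ", ".join(str(p) for p in chunk.page_numbers)
        parts.append(f"p. {pages}")
    return "; ".join(parts) or "unknown source"

chunk = ChunkSketch(
    id="c1",
    text="Revenue grew in Q3.",
    chunk_type="section",
    page_numbers=[4, 5],
    section_path=["Results", "Revenue"],
    token_count=5,
)
label = source_label(chunk)
print(f"[{label}] {chunk.text}")
```

Real chunks returned by `dn.chunk(...)` expose the same field names, so a helper like this should work on them directly, provided the actual types match the assumptions above.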

REST API

bash
curl -X POST http://localhost:8000/v1/files/{file_id}/chunks \
  -H "Content-Type: application/json" \
  -d '{"strategy": "section_aware", "max_chunk_size": 1000, "overlap": 200}'
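
The same request can be built from Python with only the standard library. This is a sketch under assumptions: `build_chunk_request` is a hypothetical helper, the base URL is the local one from the curl example, and the response format is not covered, so the example constructs the request without sending it.

```python
import json
import urllib.request

def build_chunk_request(file_id, base_url="http://localhost:8000", **options):
    """Build (but do not send) the POST request shown in the curl example."""
    url = f"{base_url}/v1/files/{file_id}/chunks"
    body = json.dumps(options).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

request = build_chunk_request(
    "abc123", strategy="section_aware", max_chunk_size=1000, overlap=200
)
print(request.get_method(), request.full_url)

# To send it against a running server (response shape is an unknown here):
# with urllib.request.urlopen(request) as response:
#     payload = json.loads(response.read())
```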

MCP Tool

The adx_chunk tool is available via MCP:

json
{
  "file_id": "abc123",
  "strategy": "section_aware",
  "max_chunk_size": 1000
}
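
MCP clients normally wrap these arguments for you, but it can help to see where they sit in the underlying message. MCP tool invocations are JSON-RPC 2.0 requests with the method `tools/call`. The sketch below builds such a message by hand. The helper `mcp_tool_call` and the request id are illustrative, and how the message is transported (stdio, HTTP) depends on your client.

```python
import json

def mcp_tool_call(name, arguments, request_id=1):
    """Wrap tool arguments in a JSON-RPC 2.0 `tools/call` request."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    }

message = mcp_tool_call(
    "adx_chunk",
    {"file_id": "abc123", "strategy": "section_aware", "max_chunk_size": 1000},
)
print(json.dumps(message, indent=2))
```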
