Send any file and a JSON schema. We map the file into the shape you asked for — even if the file and the schema look nothing alike. No chunking, no prompt engineering, no post-processing.
// we handle OCR, vision and metadata fusion, and schema validation. you write the schema. you send the files.
what you can send
- docx
- doc
- pptx
- xlsx
- csv
- html
- txt
- md
- jpg
- png
- heic
- webp
- tiff
- bmp
- gif
- eml
- msg
// no audio, no video — yet
01 playground
Try it now.
Pick a preset schema, drop in a sample file, and inspect the JSON we hand back. The playground uses the same extraction backend as production.
// presets for now. with API access you'd send your own schema in the request body.
sandboxed · no signup · nothing stored
02 the engine
Six lenses on the file. One JSON in your shape.
Most extractors pick a horse — pure OCR, pure vision, or one giant LLM call — and lose what the others would have caught. We run six lanes against the same file in a single round trip, weigh them against each other, and only commit a value once they agree. The answer comes back in your schema, with your field names.
// no single method gets it right. we run six and let them argue.
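One way to picture "let them argue" is a per-field vote. This is a minimal sketch that assumes plain majority voting; the actual weighting between lanes is not described here, and `pick_value` and `min_votes` are hypothetical names used only for illustration.

```python
from collections import Counter
from typing import Hashable, Mapping, Optional


def pick_value(lane_values: Mapping[str, Optional[Hashable]],
               min_votes: int = 2) -> Optional[Hashable]:
    """Return the value most lanes agree on, or None when agreement is too weak.

    lane_values maps a lane name (e.g. "ocr") to that lane's candidate for one
    field. A lane that found nothing reports None and does not vote.
    """
    counts = Counter(v for v in lane_values.values() if v is not None)
    if not counts:
        return None
    ranked = counts.most_common(2)
    top_value, top_votes = ranked[0]
    # A tie between the two leading candidates means the lanes disagree.
    tied = len(ranked) > 1 and ranked[1][1] == top_votes
    if top_votes < min_votes or tied:
        return None
    return top_value


# Hypothetical candidates for one field: one lane misreads a zero as the letter O.
lanes = {
    "metadata": None,
    "ocr": "INV-2026-0418",
    "layout": "INV-2026-0418",
    "reasoning": "INV-2026-O418",
}
invoice_no = pick_value(lanes)
```

Returning `None` on a tie, rather than picking the first candidate, keeps the sketch in line with the idea that a value is only committed once the lanes agree.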
// the three colours in the stack
- traditional
- ai
- fusion
// your schema
- vendor · string
- invoice_no · string
- issued_at · date
- due_at · date
- total_eur · number
// your file
atlas_invoice.pdf
scanned · OCR
// six lenses on the same file
- metadata · traditional · embedded text, EXIF, dates
- ocr · traditional · printed + handwritten characters
- layout · traditional · columns, table cells, key/value
- vision · ai · logos, signatures, charts
- reasoning · ai · schema-aware llm pass
- consensus · fusion · cross-checks every field
// merged answer — your shape
{
  "vendor": "Atlas Logistics GmbH",
  "invoice_no": "INV-2026-0418",
  "issued_at": "2026-04-12",
  "due_at": "2026-05-12",
  "total_eur": 4280.50
}
03 security
Built for the files you can’t afford to leak.
Zero retention. Files and responses are deleted the moment we hand back your JSON, traffic is encrypted end-to-end, and your data never trains a model.
// where it runs is covered further down: your hardware or our European cloud.
04 the wire
One endpoint. Boring on purpose.
Multipart POST. The schema is a JSON form field, the files are file fields, the response is your data — validated, in your shape, in the same round trip. No clever protocol to learn.
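The same request shape can be sketched in Python with only the standard library. This is an illustration under assumptions, not an official client: the endpoint URL, the `schema` and `files` field names, and the bearer header are taken from the curl call on this page, while `build_extract_request` and the `schema.json` part filename are hypothetical.

```python
import json
import mimetypes
import urllib.request
import uuid

# Endpoint as shown in the curl example on this page.
EXTRACT_URL = "https://api.arkintel.com/v1/extract"


def build_extract_request(schema: dict, filename: str, file_bytes: bytes,
                          api_key: str) -> urllib.request.Request:
    """Build (but do not send) a multipart/form-data POST carrying schema + file."""
    boundary = uuid.uuid4().hex
    file_type = mimetypes.guess_type(filename)[0] or "application/octet-stream"

    def part(name: str, part_filename: str, content_type: str, payload: bytes) -> bytes:
        # One multipart section: boundary line, headers, blank line, payload.
        header = (
            f"--{boundary}\r\n"
            f'Content-Disposition: form-data; name="{name}"; filename="{part_filename}"\r\n'
            f"Content-Type: {content_type}\r\n\r\n"
        )
        return header.encode("utf-8") + payload + b"\r\n"

    body = (
        part("schema", "schema.json", "application/json",
             json.dumps(schema).encode("utf-8"))
        + part("files", filename, file_type, file_bytes)
        + f"--{boundary}--\r\n".encode("utf-8")  # closing boundary
    )
    return urllib.request.Request(
        EXTRACT_URL,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": f"multipart/form-data; boundary={boundary}",
        },
    )
```

Sending it would then be a single call, for example `urllib.request.urlopen(request)` followed by `json.load(...)` on the response, assuming the body comes back as JSON as described above.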
// what you don’t have to do
- install an SDK
- wire up webhooks
- speak a streaming protocol
- presign upload URLs
- track temporary file IDs
- poll, batch, or coordinate jobs
// the whole integration fits on a postcard.
curl -X POST https://api.arkintel.com/v1/extract \
  -H "Authorization: Bearer $ARKINTEL_API_KEY" \
  -F 'schema=@invoice.schema.json;type=application/json' \
  -F "files=@invoice.pdf"
06 where it runs · sovereign by design
Your hardware,
or our European cloud.
Two places, same software. European jurisdiction, open-weight models, default-deny egress at the network — either way.
Self-hosted. Inside your perimeter.
Deploy the full stack inside your own data centre, your private cloud account, or an air-gapped rack. We install it, you run it — with our team on the other end of a support channel. Default-deny egress at the network layer, not the application. Air-gap optional, with signed offline update bundles when you choose it.
- On-prem, private cloud, or air-gapped
- Default-deny egress · enforced at network layer
- Open weights — yours to keep
Arkintel-managed. EU-resident. Operated by us.
Deployed on Hetzner’s EU regions — Falkenstein, Helsinki, Nuremberg — a German-headquartered provider, operated end-to-end by Arkintel as a Dutch entity. Outside the reach of the US CLOUD Act and any foreign subpoena. Dedicated tenancy, EU-only egress, same APIs and audit story as the self-hosted path.
- EU jurisdiction · no US exposure
- Hetzner EU regions · operated by Arkintel
- Dedicated tenancy · EU-only egress
Want the best frontier models too? Add Privacy Gate to your self-hosted stack
Need the deployment story in detail? Read the self-hosted deep-dive
03 the app suite
Production AI apps,
live today.
Five live products — secure, team-ready, and fully traceable. We can host them for you or you can self-host. Select an app below to see how it works.
Chat
Private chat. Configured exactly to your liking.
A self-hosted chat platform that runs entirely on your own infrastructure. It is designed so that data and sensitive information stay inside your environment, which makes it a fit for sensitive or regulated industries.
- Runs entirely on your own infrastructure
- Configured exactly to your specific requirements
- No data leakage — safe for sensitive industries
- Everything auditable, nothing egressing without a rule
you
Compare quantum and classical encryption — one paragraph, plain English.
Auto
→ picked Llama
— all-rounder, runs on your GPUs
// taking it further
Wondering how the apps fit together? See the whole suite & customer builds
Need to deploy this inside your perimeter? How self-hosted works
Public sector, healthcare, legal, finance? See the regulated industries page
Want the best frontier models too? Add Privacy Gate to your self-hosted stack
—the rest of the suite
Extract is usually one app in a pipeline.
The structured JSON Extract returns is the input — to a Knowledge index, a Chat conversation, or a Transcribe report. The four siblings are how it gets used.
05 — ship it
Ship your own schema in production.
The schemas in this playground are examples for testing. With production access you pass your own schema in every request — no whitelisting, no waiting on us. Drop us a line and we’ll get you a key.
// typical reply within one business day.
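For illustration, the invoice example from the engine section could be written as a schema and checked client-side like this. The sketch assumes JSON Schema conventions (dates as strings with `"format": "date"`, all five fields required); the exact dialect the API accepts is not stated on this page, and `INVOICE_SCHEMA` and `shape_problems` are hypothetical names.

```python
from datetime import date

# Hypothetical schema for the invoice fields listed in the engine section.
INVOICE_SCHEMA = {
    "type": "object",
    "properties": {
        "vendor": {"type": "string"},
        "invoice_no": {"type": "string"},
        "issued_at": {"type": "string", "format": "date"},
        "due_at": {"type": "string", "format": "date"},
        "total_eur": {"type": "number"},
    },
    "required": ["vendor", "invoice_no", "issued_at", "due_at", "total_eur"],
}


def shape_problems(data: dict, schema: dict) -> list[str]:
    """Return human-readable mismatches between data and a flat object schema.

    Only covers the string, date-formatted string, and number cases used above;
    it is not a general JSON Schema validator.
    """
    problems = []
    for name in schema.get("required", []):
        if name not in data:
            problems.append(f"{name}: missing")
    for name, spec in schema["properties"].items():
        if name not in data:
            continue
        value = data[name]
        if spec["type"] == "string":
            if not isinstance(value, str):
                problems.append(f"{name}: expected string")
            elif spec.get("format") == "date":
                try:
                    date.fromisoformat(value)
                except ValueError:
                    problems.append(f"{name}: expected ISO date")
        elif spec["type"] == "number":
            # bool is a subclass of int in Python, so exclude it explicitly.
            if isinstance(value, bool) or not isinstance(value, (int, float)):
                problems.append(f"{name}: expected number")
    return problems


# The merged answer shown in the engine section.
sample = {
    "vendor": "Atlas Logistics GmbH",
    "invoice_no": "INV-2026-0418",
    "issued_at": "2026-04-12",
    "due_at": "2026-05-12",
    "total_eur": 4280.50,
}
problems = shape_problems(sample, INVOICE_SCHEMA)
```

The response is described as already validated server-side, so a check like this is optional; it mainly serves as a guard in tests or when a schema changes.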