extract.arkintel.com

Files in. Structured data out.

Send a supported file and a JSON schema. We map the file into the shape you asked for, even if the file and the schema look nothing alike. No chunking, no prompt engineering, no post-processing.

// we handle OCR, vision and metadata fusion, and schema validation. you write the schema. you send the files.

// diagram: 01 SEND (schema.json plus your PNG, DOCX, or PDF via POST /v1/extract) → 02 EXTRACT (six lenses on the same file: metadata, OCR, layout, vision, reasoning, consensus; AI + traditional, arbitrated, one round trip) → JSON that matches your schema, validated and typed, e.g. {invoice_no: "INV-2026-0042", issue_date: "2026-04-21", total: 1400.00, currency: "EUR"}

what you can send

documents
  • pdf
  • docx
  • doc
  • pptx
  • xlsx
  • csv
  • html
  • txt
  • md
images
  • jpg
  • png
  • heic
  • webp
  • tiff
  • bmp
  • gif
mail
  • eml
  • msg

// no audio, no video — yet

01 playground

Try it now.

Pick a preset schema, drop in a sample file, and inspect the JSON we hand back. The playground uses the same extraction backend as production.


// presets for now. with API access you'd send your own schema in the request body.

sandboxed · no signup · nothing stored

02 the engine

Six lenses on the file. One JSON in your shape.

Most extractors pick a horse — pure OCR, pure vision, or one giant LLM call — and lose what the others would have caught. We run six lanes against the same file in a single round trip, weigh them against each other, and only commit a value once they agree. The answer comes back in your schema, with your field names.

// no single method gets it right. we run six and let them argue.

// the three colours in the stack

  • traditional
  • ai
  • fusion
POST /v1/extract
multipart/form-data

// your schema

    vendor      string
    invoice_no  string
    issued_at   date
    due_at      date
    total_eur   number

// your file

atlas_invoice.pdf

scanned · OCR

// six lenses on the same file

  • metadata · traditional · embedded text, EXIF, dates
  • ocr · traditional · printed + handwritten characters
  • layout · traditional · columns, table cells, key/value
  • vision · ai · logos, signatures, charts
  • reasoning · ai · schema-aware llm pass
  • consensus · fusion · cross-checks every field
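
// one way to picture the consensus lane

The page does not describe how arbitration actually works, so the following is only an illustration of the idea of committing a value once lanes agree: a per-field majority vote. The `consensus` function, the `min_agree` threshold, and the sample lane outputs are all hypothetical and are not the Extract engine's real algorithm.

```python
from collections import Counter
from typing import Any, Dict, Optional


def consensus(candidates: Dict[str, Dict[str, Any]], min_agree: int = 2) -> Dict[str, Optional[Any]]:
    """Per-field majority vote over lane outputs (hypothetical, illustration only).

    `candidates` maps a lane name to the field values that lane proposed.
    A field is committed only if at least `min_agree` lanes propose the same
    value; otherwise it is left as None.
    """
    fields = sorted({name for output in candidates.values() for name in output})
    merged: Dict[str, Optional[Any]] = {}
    for name in fields:
        # Count identical proposals, ignoring lanes that had nothing to say.
        votes = Counter(
            output[name] for output in candidates.values() if output.get(name) is not None
        )
        if not votes:
            merged[name] = None
            continue
        value, count = votes.most_common(1)[0]
        merged[name] = value if count >= min_agree else None
    return merged


# Made-up lane outputs: the vision lane misreads a zero as the letter O.
lanes = {
    "ocr": {"invoice_no": "INV-2026-0418", "total_eur": 4280.50},
    "layout": {"invoice_no": "INV-2026-0418", "total_eur": 4280.50},
    "vision": {"invoice_no": "INV-2026-O418"},
}
result = consensus(lanes)
```

A real arbiter would presumably weigh lanes differently per field type and normalise values before comparing them; a plain vote is just the simplest version of the idea.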

// merged answer — your shape

200 OK
{
  "vendor":     "Atlas Logistics GmbH",
  "invoice_no": "INV-2026-0418",
  "issued_at":  "2026-04-12",
  "due_at":     "2026-05-12",
  "total_eur":  4280.50
}
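
// checking the response on your side

The service validates before it answers, but a client may still want a cheap sanity check. A minimal sketch, assuming the field/type pairs shown in the example schema (`string`, `date`, `number`) and ISO 8601 dates as in the sample response. The `SCHEMA` dict and the `check` helper are hypothetical and not part of any Arkintel API; the schema format the endpoint actually accepts may differ from this simplified one.

```python
import json
from datetime import date

# Simplified field -> type map mirroring the example schema (assumed format).
SCHEMA = {
    "vendor": "string",
    "invoice_no": "string",
    "issued_at": "date",
    "due_at": "date",
    "total_eur": "number",
}


def _is_iso_date(value):
    """True if value is a string that parses as an ISO calendar date."""
    if not isinstance(value, str):
        return False
    try:
        date.fromisoformat(value)
        return True
    except ValueError:
        return False


CHECKS = {
    "string": lambda v: isinstance(v, str),
    # bool is a subclass of int in Python, so exclude it explicitly.
    "number": lambda v: isinstance(v, (int, float)) and not isinstance(v, bool),
    "date": _is_iso_date,
}


def check(payload, schema=SCHEMA):
    """Return a list of problems; an empty list means the payload fits the schema."""
    problems = [f"missing field: {name}" for name in schema if name not in payload]
    problems += [f"unexpected field: {name}" for name in payload if name not in schema]
    problems += [
        f"{name}: expected {kind}"
        for name, kind in schema.items()
        if name in payload and not CHECKS[kind](payload[name])
    ]
    return problems


# The sample 200 OK body from above.
body = """{
  "vendor": "Atlas Logistics GmbH",
  "invoice_no": "INV-2026-0418",
  "issued_at": "2026-04-12",
  "due_at": "2026-05-12",
  "total_eur": 4280.50
}"""
payload = json.loads(body)
problems = check(payload)
```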

03 security

Built for the files you can’t afford to leak.

Zero retention. Files and responses are deleted the moment we hand back your JSON, traffic is encrypted end-to-end, and your data never trains a model.

// where it runs is covered further down: your hardware or our European cloud.

04 the wire

One endpoint. Boring on purpose.

Multipart POST. The schema is a JSON form field, the files are file fields, the response is your data — validated, in your shape, in the same round trip. No clever protocol to learn.

// what you don’t have to do

  • install an SDK
  • wire up webhooks
  • speak a streaming protocol
  • presign upload URLs
  • track temporary file IDs
  • poll, batch, or coordinate jobs

// the whole integration fits on a postcard.

extract.sh
curl -X POST https://api.arkintel.com/v1/extract \
  -H "Authorization: Bearer $ARKINTEL_API_KEY" \
  -F 'schema=@invoice.schema.json;type=application/json' \
  -F "files=@invoice.pdf"
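
// the same request without curl

For callers who would rather not shell out, here is a rough standard-library Python equivalent of the curl call. The URL, the `schema` and `files` field names, and the bearer header come from the example above; the `build_extract_request` helper, the placeholder schema contents, and the `application/octet-stream` file type are assumptions. The sketch only builds the request, and the sending step is left commented out because the response format beyond the sample JSON is not documented here.

```python
import json
import os
import urllib.request
import uuid

API_URL = "https://api.arkintel.com/v1/extract"  # endpoint from the curl example


def build_extract_request(schema, filename, file_bytes, api_key):
    """Build a multipart/form-data POST with a `schema` part and a `files` part."""
    boundary = uuid.uuid4().hex

    def part(name, part_filename, content_type, payload):
        # One multipart section: headers, blank line, raw bytes, trailing CRLF.
        head = (
            f"--{boundary}\r\n"
            f'Content-Disposition: form-data; name="{name}"; filename="{part_filename}"\r\n'
            f"Content-Type: {content_type}\r\n\r\n"
        )
        return head.encode("utf-8") + payload + b"\r\n"

    body = (
        part("schema", "invoice.schema.json", "application/json",
             json.dumps(schema).encode("utf-8"))
        # File content type is an assumption; curl may guess a more specific one.
        + part("files", filename, "application/octet-stream", file_bytes)
        + f"--{boundary}--\r\n".encode("utf-8")
    )
    return urllib.request.Request(
        API_URL,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": f"multipart/form-data; boundary={boundary}",
        },
    )


# Placeholder schema and file bytes, for illustration only.
example_schema = {"type": "object", "properties": {"invoice_no": {"type": "string"}}}
req = build_extract_request(
    example_schema,
    "invoice.pdf",
    b"%PDF-1.7 placeholder bytes",
    os.environ.get("ARKINTEL_API_KEY", "YOUR_KEY"),
)

# To actually send it (requires a valid key and network access):
# with urllib.request.urlopen(req) as resp:
#     data = json.load(resp)
```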

06 where it runs · sovereign by design

Your hardware, or our European cloud.

Two places, same software. European jurisdiction, open-weight models, default-deny egress at the network — either way.

Self-hosted. Inside your perimeter.

Deploy the full stack inside your own data centre, your private cloud account, or an air-gapped rack. We install it, you run it — with our team on the other end of a support channel. Default-deny egress at the network layer, not the application. Air-gap optional, with signed offline update bundles when you choose it.

  • On-prem, private cloud, or air-gapped
  • Default-deny egress · enforced at network layer
  • Open weights — yours to keep

Arkintel-managed. EU-resident. Operated by us.

Deployed on Hetzner’s EU regions — Falkenstein, Helsinki, Nuremberg — a German-headquartered provider, operated end-to-end by Arkintel as a Dutch entity. Outside the reach of the US CLOUD Act and any foreign subpoena. Dedicated tenancy, EU-only egress, same APIs and audit story as the self-hosted path.

  • EU jurisdiction · no US exposure
  • Hetzner EU regions · operated by Arkintel
  • Dedicated tenancy · EU-only egress

Want the best frontier models too? Add Privacy Gate to your self-hosted stack

Need the deployment story in detail? Read the self-hosted deep-dive

03 the app suite

Production AI apps, live today.

Five live products — secure, team-ready, and fully traceable. We can host them for you or you can self-host. Select an app below to see how it works.

Interface · multi-model · open demo

Chat

Private chat. Configured exactly to your liking.

A self-hosted chat platform that runs entirely on your own premises. It is designed so that data and other sensitive information cannot leak out, which makes it a good fit for sensitive or regulated industries.

  • Runs entirely on your own infrastructure
  • Configured exactly to your specific requirements
  • No data leakage — safe for sensitive industries
  • Everything auditable, nothing egressing without a rule
runs · self-hosted · EU cloud

// demo: with the model set to Auto, the app picked Llama (all-rounder, runs on your GPUs) for a prompt comparing quantum and classical encryption, and cited nist.gov, wikipedia.org, and arxiv.org as sources.

// taking it further

Wondering how the apps fit together? See the whole suite & customer builds

Need to deploy this inside your perimeter? How self-hosted works

Public sector, healthcare, legal, finance? See the regulated industries page


the rest of the suite

Extract is usually one app in a pipeline.

The structured JSON Extract returns is the input — to a Knowledge index, a Chat conversation, or a Transcribe report. The four siblings are how it gets used.

05 — ship it

Ship your own schema in production.

The schemas in this playground are examples for testing. With production access you pass your own schema in every request — no whitelisting, no waiting on us. Drop us a line and we’ll get you a key.

// typical reply within one business day.

// contact


Email us — humans, not a ticket queue.