How DocTranslate AI works. Format by format.
This page explains how the document workflow moves from upload to translation, rebuilding, and PDF verification.
PDF files can be certified and verified. DOCX, TXT, and MD are translation-only outputs.
The stack, in plain text.
Each part handles a specific step in the document flow.
- pdfjs-dist: PDF parser.
Used to read PDF text in the browser for translation and verification.
- pdf-lib: PDF builder.
Used to rebuild translated PDFs locally and embed certification metadata.
- JSZip + OpenXML: DOCX rebuilder.
Used to rebuild DOCX files by replacing text in supported document parts.
- Supabase Edge Function: translator proxy.
Receives text chunks, applies verification and rate limits, and forwards translation requests.
- Third-party LLM API: translator.
Produces the translated text from the extracted content.
- crypto.subtle: hasher.
Computes SHA-256 hashes for PDF certification and verification.
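The hashing step can be sketched with the standard Web Crypto API. This is a minimal illustration, not the project's actual code: the helper name sha256Hex is hypothetical, and the page does not say exactly which text is hashed or whether it is normalized first, so the sketch simply hashes whatever string it is given.

```typescript
// Hypothetical helper (not from the project): hash a string with SHA-256
// using the Web Crypto API and return the digest as lowercase hex.
async function sha256Hex(text: string): Promise<string> {
  // crypto.subtle.digest works on bytes, so encode the text as UTF-8 first.
  const bytes = new TextEncoder().encode(text);
  const digest = await globalThis.crypto.subtle.digest("SHA-256", bytes);
  // Convert each byte of the digest to a two-character hex pair.
  return Array.from(new Uint8Array(digest))
    .map((b) => b.toString(16).padStart(2, "0"))
    .join("");
}
```

Because the digest is computed in the browser, the text does not need to leave the device for this step.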
The pipeline, expanded.
This is the document flow from upload to download.
- Parse the file in your browser.
PDF, DOCX, TXT, and MD are read locally first. The app extracts the text needed for translation and rebuilds the output by format.
- Detect source language when needed.
If source language is set to auto, the app samples the extracted text and requests a language label before translation starts.
- Chunk extracted text for translation.
Extracted text is split into smaller parts so translation requests stay within processing limits.
- Translate through the edge function.
Text chunks are sent to the translation function, which applies verification, rate limits, and forwards the request to the translation provider.
- Rebuild the output by format.
PDFs are rebuilt with layout-aware positioning. DOCX, TXT, and MD are rebuilt as translated files in their original format.
- Add certification only for PDFs.
Only PDFs continue into certification, metadata, and the appended certificate page.
- Verify only certified PDFs later.
The /verify page recomputes the PDF text hash locally and compares it with the value stored in the file.
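The chunking step in the list above can be sketched as follows. This is one plausible approach under stated assumptions, not the app's actual algorithm: the function name chunkText, the paragraph-based splitting, and the maxChars limit are all hypothetical, since the page only says that text is split into smaller parts to stay within processing limits.

```typescript
// Hypothetical chunker (assumption: chunks are sized by character count).
// Packs paragraphs into chunks of at most maxChars characters each.
function chunkText(text: string, maxChars: number): string[] {
  if (maxChars <= 0) throw new RangeError("maxChars must be positive");
  const chunks: string[] = [];
  let current = "";
  for (const paragraph of text.split(/\n{2,}/)) {
    const piece = paragraph.trim();
    if (piece.length === 0) continue;
    if (piece.length > maxChars) {
      // A single oversized paragraph is flushed separately and hard-split.
      if (current) {
        chunks.push(current);
        current = "";
      }
      for (let i = 0; i < piece.length; i += maxChars) {
        chunks.push(piece.slice(i, i + maxChars));
      }
      continue;
    }
    // Try to append the paragraph to the chunk being built.
    const candidate = current ? current + "\n\n" + piece : piece;
    if (candidate.length > maxChars) {
      chunks.push(current);
      current = piece;
    } else {
      current = candidate;
    }
  }
  if (current) chunks.push(current);
  return chunks;
}
```

Splitting on paragraph boundaries where possible keeps sentences intact, which tends to give a translation model better context than cutting at arbitrary offsets.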
What stays local, what goes online.
The browser-first architecture is useful, but it is not the same thing as a fully offline workflow.
Parsing and rebuilding start in the browser, which keeps the original file local during the first stage of the process.
Translation still requires extracted text to move through verification, rate-limiting, and model-inference steps provided by external services. That is why this product should not be described as private by design in an absolute sense.
If the document is sensitive enough that external processing is unacceptable, the safer choice is not to use this workflow for that file.
What this project does not do.
The scope is intentionally limited.
- No accounts or history.
There is no login system and no stored translation history.
- No personal data collection.
The tool does not require account creation or profile data to translate a document.
- No certification for every format.
Certification and verification are available for PDFs only.
- No translation memory or glossary support.
Each translation is processed independently, without stored terminology or custom vocabulary.
- No human review.
The output is generated automatically and is not reviewed by a person.
- Language support is still limited.
Only a limited set of languages is publicly available in the current phase.
Workflow FAQ.
Short answers to the questions people often infer from the browser-first model.
Does DocTranslate AI upload the original file to a server?
Not as the first step. The file is parsed in the browser first, so the original file stays local at that stage. Translation still requires the extracted text to pass through external services.
What happens differently for PDFs?
PDF is the only format that continues into certification and later verification. DOCX, TXT, and MD stop at translation and rebuild.
Does verification prove legal authenticity?
No. Verification proves integrity only. It checks whether the current PDF still matches the certified state encoded into that file by this product.
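The integrity-only nature of the check can be illustrated with a small comparison sketch. The names VerifyResult and compareCertifiedHash are hypothetical, and the assumption that hashes are stored as hex strings is not confirmed by this page. The point is that the result only says whether two hash values agree, nothing more.

```typescript
// Hypothetical result type: the check can only report hash agreement.
type VerifyResult = "match" | "mismatch" | "not-certified";

// Hypothetical comparison (assumes hashes are hex strings).
// storedHash is the value read from the PDF, if any;
// recomputedHash is the value computed locally from the current PDF text.
function compareCertifiedHash(
  storedHash: string | undefined,
  recomputedHash: string
): VerifyResult {
  // A file with no stored hash was never certified, so there is nothing to compare.
  if (!storedHash) return "not-certified";
  // Normalize case and whitespace so formatting differences do not cause false mismatches.
  const stored = storedHash.trim().toLowerCase();
  const recomputed = recomputedHash.trim().toLowerCase();
  return stored === recomputed ? "match" : "mismatch";
}
```

A "match" means the text is unchanged since certification. It says nothing about who created the document or whether its contents are legally valid.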
Need the public policy layer? Read the documents.
The privacy page explains handling. The terms page explains responsibility. The about page explains the scope of the project in plain language.