We are looking for an experienced NLP / AI Engineer to design and build a fully offline intelligence system using a private library of 200+ historical and classical text volumes.
The role covers the full pipeline: high-precision OCR, structured text extraction, LLM fine-tuning, citation-accurate RAG, and a standalone desktop application.
- Responsibilities
• Process large collections of PDF volumes (thousands of pages) with high-accuracy OCR.
• Clean and normalize text by removing footnotes, editor notes, and modern annotations while preserving core content.
• Structure the extracted text and attach rich metadata (author, title, volume, page, date) to each passage.
• Fine-tune open-source LLMs (LLaMA, Mistral, or similar) for historical linguistic patterns.
• Build a citation-strict RAG system that attaches a precise source reference to every response.
• Develop a fully offline desktop interface with automated ingestion pipelines.
• Iteratively refine model performance through continuous testing and collaboration.
- Requirements
• Proven experience in NLP and LLM fine-tuning (LoRA / QLoRA, quantization).
• Strong background in RAG systems and vector databases.
• Hands-on experience with high-precision OCR for complex documents.
• Proficiency in Python for local deployment (PyQt, Streamlit, or similar).
- Application
• Please submit a brief description of your technical approach and relevant experience.