Teaching Machines to See
I am the founder of Darmis AI. I specialize in training neural networks that map unstructured physical documents to clean, schema-validated structured outputs: turning warped pixels into computational variables, one line of code at a time.
Shlomo Stept
Applied ML Architect & Founder
Form-Native OCR Sandbox
Case Studies
InvoiceParser-v4
A specialized convolutional parser for skewed receipts, multi-page billing sheets, and other complex financial layouts. It extracts nested values with strict compliance checks, replacing brittle, template-based scripts.
BiometricMatch-v2
A high-precision verification CNN trained to match scanned passport credentials against live security camera feeds. It applies spatial transformations to realign warped, high-contrast print anomalies in real time.
Where I've Built
Founder & Chief Architect
Building Darmis AI, a form-native document intelligence platform that extracts clean, nested structured values directly from complex visual forms without template fragility. Designing end-to-end neural vision pipelines and structural tokenization.
Applied ML Research Engineer
Researched and trained deep convolutional neural networks (CNNs) and transformer models optimized for visual parsing tasks. Refined affine transformation matrices and bounding box matching, improving model inference throughput by 22%.
Software Engineer (ML Core)
Engineered high-performance pixel-processing algorithms and structural file tokenization libraries. Built baseline synthetic document datasets that supported stable training runs for early document OCR experiments.
System Expertise
Structured Document AI
Extracting nested, schema-compliant JSON objects from complex paper invoice forms and ID formats.
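As a rough illustration of what "schema-compliant" can mean in practice, the sketch below validates a raw extraction result before accepting it. It uses only the Python standard library. The field names (`invoice_number`, `line_items`, `total`) and the `parse_invoice` helper are hypothetical and chosen for illustration; they are not taken from any Darmis AI schema.

```python
from dataclasses import dataclass
from decimal import Decimal, InvalidOperation


@dataclass
class LineItem:
    description: str
    amount: Decimal


@dataclass
class Invoice:
    invoice_number: str
    line_items: list
    total: Decimal


def parse_invoice(raw: dict) -> Invoice:
    """Validate a raw extraction result; return a typed Invoice or raise ValueError."""
    try:
        items = [
            LineItem(str(item["description"]), Decimal(str(item["amount"])))
            for item in raw["line_items"]
        ]
        total = Decimal(str(raw["total"]))
        number = str(raw["invoice_number"])
    except (KeyError, TypeError, InvalidOperation) as exc:
        # Missing keys, wrong container types, and non-numeric amounts all end up here.
        raise ValueError(f"invalid invoice payload: {exc!r}") from exc

    # Cross-field check (an assumed rule): line items must add up to the stated total.
    if sum((item.amount for item in items), Decimal("0")) != total:
        raise ValueError("line items do not sum to total")
    return Invoice(number, items, total)


# Hypothetical payload, shaped like the output of an extraction model.
raw = {
    "invoice_number": "INV-001",
    "line_items": [
        {"description": "Paper", "amount": "12.50"},
        {"description": "Toner", "amount": "7.50"},
    ],
    "total": "20.00",
}
invoice = parse_invoice(raw)
```

Rejecting a payload outright, rather than passing partial values downstream, is one possible design choice; a production pipeline might instead flag individual fields for review.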
Vision Transformers
Fine-tuning vision transformer backbones to parse character fragments under high-noise inputs.
Affine Alignment Math
Mapping perspective distortions onto regular coordinate grids to correct skewed scans.
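To make the alignment idea concrete, here is a minimal sketch of estimating an affine correction from point correspondences with NumPy. The helpers `estimate_affine` and `apply_affine` and the sample coordinates are hypothetical, written for illustration only. An affine map covers rotation, scale, shear, and translation; full perspective correction would need a homography, which this sketch does not attempt.

```python
import numpy as np


def estimate_affine(src, dst):
    """Least-squares 2x3 affine matrix that maps src points onto dst points."""
    src = np.asarray(src, dtype=float)
    dst = np.asarray(dst, dtype=float)
    # Homogeneous source coordinates: each row is [x, y, 1].
    ones = np.ones((src.shape[0], 1))
    A = np.hstack([src, ones])
    # Solve A @ M.T ~= dst for the 3x2 matrix M.T.
    M_T, *_ = np.linalg.lstsq(A, dst, rcond=None)
    return M_T.T  # shape (2, 3): [linear part | translation]


def apply_affine(M, pts):
    """Apply a 2x3 affine matrix to an array of (x, y) points."""
    pts = np.asarray(pts, dtype=float)
    return pts @ M[:, :2].T + M[:, 2]


# Hypothetical example: three corners of a skewed scan and where they should land.
skewed = [(10.0, 20.0), (110.0, 30.0), (5.0, 120.0)]
target = [(0.0, 0.0), (100.0, 0.0), (0.0, 100.0)]

M = estimate_affine(skewed, target)
corrected = apply_affine(M, skewed)
```

Three non-collinear correspondences determine the affine map exactly; with more than three, the same least-squares call averages out detection noise.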
Published Writings
Teaching Neural Nets to Read Warped Receipts
Applying convolutional mesh grids and perspective transform matrices to fix skewed camera inputs.
Structured Outputs: Beyond Regex in Visual Parsers
Why standard character matching falls short on unstructured forms and how native JSON encoders stabilize layouts.
Optimizing Vision Transformers for Mobile Browsers
How to structure and pack model weights to run real-time boarding pass scanning at 30 fps client-side.