Through research initiatives and enterprise solutions, we make real-world impact. See how we turn complex projects into measurable success.
Working with leading European institutions is central to what we do.
We push the boundaries of AI and data science on a daily basis, growing these multi-year projects into advancements in critical fields, such as healthcare.
Audiocracker
Music generation
ProSTRAT-AI
MULti-Tumour /MULTIR/
CopyDat - SyntDataHub
By training deep neural networks on datasets, we mastered the attention needed to isolate vocals and instruments. Our proprietary models are built on our own novel architecture.
Project code: NPOO.C1.1.2.R2-I3.02.0494
Co-financed by the EU: 124.278,38 €
Total value of the project: 146.209,76 €
Project timeframe: 01.12.2022. – 01.12.2024.
CONTACT
Dražen Horvat, drazen.horvat@atmc.ai
Beyond client work and EU consortia, we invest in self-directed research that pushes boundaries and builds capabilities. These projects tackle hard problems we believe matter—and often become the foundation for our commercial products.
Forms are deceptively complex. A checkbox relies on the question next to it, and a crossed-out entry shifts the meaning entirely. Current AI struggles to interpret these spatial nuances, particularly when forms mix printed structure with human input. This challenge is significantly harder for underserved languages that lack robust digital resources.
Our research bridges this gap by developing advanced capabilities to interpret optical marks, handwriting, and layout context.
We are building solutions that read forms as humans do, focusing initially on Croatian and Urdu to ensure accurate, context-aware data extraction where standard tools fail.
PyTorch
PyTorch Geometric
PaddlePaddle
Vision-Language Models
DocLayoutYOLO
LayoutLMv3

# layout_engine
document {
detect( <checkboxes>)
parse( <handwriting>)
relate( <fields → context>)
output( <structured_form>)
}
Detection and classification of optical marks (checkboxes, circles, crosses, corrections)
Handwritten text recognition (HTR) adapted for Croatian and Urdu
Graph neural networks (GNN) for understanding spatial-semantic relationships
Synthetic data generation augmented with real-world edge cases
User-defined extraction schemas for custom form structures
Production-ready module integrable with document processing platforms
Specialized ML models for optical mark detection and HTR
GNN architecture for understanding relationships between form elements
Annotated datasets for Croatian and Urdu forms (valuable research resource)
Evaluation benchmarks for PaddlePaddle, DocLayoutYOLO, VLMs, LayoutLMv3
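The idea of relating a mark to its surrounding context can be illustrated with a much simpler stand-in than a graph neural network. The sketch below is a minimal nearest-neighbour heuristic, not the method used in this research: it attaches each detected mark to the closest label by centre distance. The `Box` type, the `link_marks_to_labels` function, and all coordinates are hypothetical and exist only for illustration.

```python
from dataclasses import dataclass
from math import hypot


@dataclass(frozen=True)
class Box:
    """Axis-aligned bounding box of a detected form element (hypothetical type)."""
    name: str
    x: float  # left edge
    y: float  # top edge
    w: float  # width
    h: float  # height

    @property
    def center(self):
        return (self.x + self.w / 2, self.y + self.h / 2)


def link_marks_to_labels(marks, labels):
    """Attach each optical mark to the nearest label by centre distance."""
    links = {}
    for mark in marks:
        mx, my = mark.center
        nearest = min(
            labels,
            key=lambda lb: hypot(lb.center[0] - mx, lb.center[1] - my),
        )
        links[mark.name] = nearest.name
    return links


# Made-up layout: two checkboxes, each printed to the left of its label.
labels = [Box("label_yes", 40, 10, 60, 12), Box("label_no", 40, 40, 60, 12)]
marks = [Box("checkbox_1", 20, 10, 12, 12), Box("checkbox_2", 20, 40, 12, 12)]

print(link_marks_to_labels(marks, labels))
# → {'checkbox_1': 'label_yes', 'checkbox_2': 'label_no'}
```

A distance rule like this is likely to break down on dense or multi-column layouts, which is the kind of case where learned spatial-semantic relationships are meant to help.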

# query_bridge
nlq {
interpret( <<user_prompt>>)
detect( <<intention>>)
find_relevant( <<schema_relations>>)
generate( <<sql_query>>)
}
Querying a database in plain language sounds simple, but the reality is complex. The gap between a human question and executable SQL is filled with ambiguity and context that standard models often miss. To get a correct answer, a system needs to understand the underlying business logic and data structure, not just match keywords.
Our research bridges this gap. We are developing systems that translate natural language into SQL based on true intent, combining deep schema understanding with multi-stage validation.
The goal is to deliver the exact data the user needs, eliminating the hallucinations and errors common in generative AI.
Research Focus
Intent disambiguation from natural language queries
Schema-aware query generation
Business context integration through knowledge graphs
Multi-stage validation against actual database structures
Hallucination prevention through constraint verification
Privacy preservation
Key Challenges We're Solving
Mapping ambiguous language to precise query logic
Handling federated queries across multiple data sources
Understanding business terminology and domain-specific meanings
Validating generated SQL before execution
Supporting complex joins, aggregations, and nested queries
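One way to picture validating generated SQL before execution is to compile the candidate query against an empty copy of the schema. The following is a minimal sketch under stated assumptions, not the validation pipeline described here: it uses Python's built-in `sqlite3` module, a made-up two-table schema, and a hypothetical `validate_sql` helper. A query that references a table or column missing from the schema fails at compile time, before it touches real data.

```python
import sqlite3

# Hypothetical schema, used only for illustration.
SCHEMA = """
CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, country TEXT);
CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, total REAL);
"""


def validate_sql(candidate, schema=SCHEMA):
    """Check a generated query against an empty in-memory copy of the schema.

    Returns (is_valid, message).
    """
    # Assumption for this sketch: only read-only SELECT statements are allowed.
    if not candidate.lstrip().lower().startswith("select"):
        return False, "only SELECT statements are accepted"

    conn = sqlite3.connect(":memory:")
    try:
        conn.executescript(schema)
        # EXPLAIN makes SQLite compile the statement without running the
        # query itself, so unknown tables or columns surface as errors.
        conn.execute("EXPLAIN " + candidate)
        return True, "ok"
    except (sqlite3.Error, sqlite3.Warning) as exc:
        return False, str(exc)
    finally:
        conn.close()


print(validate_sql("SELECT name FROM customers WHERE country = 'HR'"))
# → (True, 'ok')

# A hallucinated column is rejected before any real database is queried.
print(validate_sql("SELECT email FROM customers"))
```

This only covers structural checks. Whether the query matches the user's intent or the business meaning of a term is a separate problem that a compile step cannot detect.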
Visual data is everywhere: documents, product images, medical scans, even audio spectrograms. But extracting real value requires more than just detection; it demands a system that understands context, relationships, and meaning. Simple object recognition isn't enough to solve complex problems.
Our research combines traditional computer vision with modern vision-language models (VLMs) to bridge this gap.
From fine-grained object detection to visual question answering, we develop the core capabilities that power intelligent products across our diverse portfolio.
PyTorch
YOLO variants
LLaVA
Hugging Face Transformers
LayoutLM
# vision_core
nlq {
detect( <<objects>>)
embed( <<features>>)
reason( <<relationships>>)
answer( <<visual_query>>)
}
Object detection and tracking
Image segmentation and classification
Optical character recognition (OCR)
Visual question answering (VQA)
Fine-tuning vision-language models for domain-specific tasks
Feature extraction from complex documents (tables, forms, diagrams)
Hawk-a-Doc: Document layout detection, form understanding
Stem&Jam: Audio spectrogram analysis for source separation
Healthcare projects: Medical image segmentation and analysis
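Object detectors typically emit several overlapping boxes for the same object, and a common post-processing step is greedy non-maximum suppression (NMS) based on intersection over union (IoU). The sketch below is a generic, dependency-free illustration of that step, not code from the projects listed above; the function names, the `(x1, y1, x2, y2)` box format, and the sample detections are assumptions.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    # Clamp at zero so non-overlapping boxes yield an empty intersection.
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def non_max_suppression(detections, iou_threshold=0.5):
    """Greedy NMS over (box, score) pairs; returns the kept detections."""
    # Visit detections from highest to lowest confidence.
    ordered = sorted(detections, key=lambda d: d[1], reverse=True)
    kept = []
    for box, score in ordered:
        # Keep a box only if it does not overlap too much with any kept box.
        if all(iou(box, kept_box) <= iou_threshold for kept_box, _ in kept):
            kept.append((box, score))
    return kept


# Made-up detections: the first two boxes cover the same object.
detections = [
    ((10, 10, 50, 50), 0.9),
    ((12, 12, 52, 52), 0.8),
    ((100, 100, 140, 140), 0.7),
]

for box, score in non_max_suppression(detections):
    print(box, score)
# → (10, 10, 50, 50) 0.9
# → (100, 100, 140, 140) 0.7
```

The threshold of 0.5 is a conventional starting point rather than a recommendation; crowded scenes often call for a different value or a different suppression scheme.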
Thanks to our research that pushes the boundaries of what's possible, the work we do with enterprise clients delivers concrete business value.
We implement data and AI solutions with organizations across industries to solve the most pressing challenges of today.