AI-Native Analytics Platform
A governed data platform designed for AI reasoning — Apache Iceberg lakehouse, dbt semantic layer, and a Claude-powered analyst that answers business questions with explainable, trusted metrics.
Overview
AI analysts that read raw CSV files are not AI analysts. They are sophisticated pattern-matchers operating on inconsistent, ungoverned, untrustworthy data, and their results reflect that: hallucinated metrics, contradictory answers, and guesses about business context the model cannot possibly have.
This project demonstrates the alternative: an AI-native analytics platform built on a governed data foundation. Apache Iceberg for the lakehouse layer, dbt for transformation and metric governance, a semantic layer as the contract between data and AI, and Claude as the reasoning engine. The result is an AI analyst that gives explainable, consistent, trustworthy answers — because the data beneath it deserves that trust.
Problem & Challenges
The core problem is not AI capability. Modern LLMs can reason over data fluently. The problem is data quality, consistency, and context. Without a governed data foundation, AI analytics produces confident-sounding answers that are wrong in ways that are hard to detect.
- AI reading CSV files directly has no business context — it does not know that 'revenue' means net collected, not gross billed, or that sessions shorter than 5 minutes are test records
- Inconsistent source data produces inconsistent answers — the same question asked twice returns different numbers because the underlying data has no single source of truth
- Metric definitions live in analyst heads, not in code — making it impossible for an AI to apply them consistently
- Without a semantic layer, AI must guess at column meanings, table relationships, and business logic — and guesses compound into hallucinations
- Trust collapses quickly — one wrong AI-generated number that reaches a business decision destroys confidence in the entire system
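The first point can be made concrete with a small sketch. The rows and field names (`billed`, `collected`, `status`, `session_minutes`) are hypothetical; the point is only that plausible readings of "revenue" over the same rows disagree with each other and with the business definition stated above (net collected, excluding test sessions and cancelled transactions).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Row:
    """Hypothetical raw record; field names are illustrative, not the platform's schema."""

    billed: float           # gross amount invoiced
    collected: float        # net amount actually received
    status: str             # e.g. "complete" or "cancelled"
    session_minutes: float  # length of the session that produced the order


RAW = [
    Row(100.0, 90.0, "complete", 12.0),
    Row(50.0, 50.0, "cancelled", 8.0),
    Row(30.0, 30.0, "complete", 2.0),  # under 5 minutes: a test record
]


def revenue_gross(rows):
    # Guess 1: "revenue" means everything billed.
    return sum(r.billed for r in rows)


def revenue_collected(rows):
    # Guess 2: "revenue" means cash collected, but edge cases are not filtered.
    return sum(r.collected for r in rows)


def revenue_governed(rows):
    # The business definition: net collected, excluding cancelled orders
    # and test sessions shorter than 5 minutes.
    return sum(
        r.collected
        for r in rows
        if r.status != "cancelled" and r.session_minutes >= 5
    )


for fn in (revenue_gross, revenue_collected, revenue_governed):
    print(fn.__name__, fn(RAW))
```

Each function is a defensible reading of the raw columns, which is why an AI working from files alone can return any of them with equal confidence.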
Architecture
The platform is a modern AI-native analytics stack built in layers, each adding the governance, trust, and context that make AI reasoning reliable.
Data Flow
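The layered flow can be sketched as a chain of plain Python functions, each adding one kind of governance. This is an illustration under assumptions: in the platform these layers are Apache Iceberg tables transformed by dbt models, while the layer names (`staging`, `mart`), the field names, and the rules below are hypothetical stand-ins for that work.

```python
from typing import Iterable

# Hypothetical raw feed: strings, a duplicate, inconsistent casing, and a test record.
RAW_ROWS = [
    {"order_id": "A1", "collected": "90.00", "status": "complete", "session_minutes": "12"},
    {"order_id": "A1", "collected": "90.00", "status": "complete", "session_minutes": "12"},
    {"order_id": "B2", "collected": "30.00", "status": "complete", "session_minutes": "2"},
    {"order_id": "C3", "collected": "45.50", "status": "COMPLETE", "session_minutes": "7"},
]


def staging(rows: Iterable[dict]) -> list:
    """Cast types, normalise values, and keep only the first row per order_id."""
    seen = set()
    out = []
    for r in rows:
        if r["order_id"] in seen:
            continue
        seen.add(r["order_id"])
        out.append({
            "order_id": r["order_id"],
            "collected": float(r["collected"]),
            "status": r["status"].strip().lower(),
            "session_minutes": float(r["session_minutes"]),
        })
    return out


def mart(rows: list) -> list:
    """Apply business rules once: drop test sessions and cancelled orders."""
    return [r for r in rows if r["session_minutes"] >= 5 and r["status"] != "cancelled"]


def net_revenue(rows: list) -> float:
    """Metric layer: one canonical aggregation over the mart."""
    return round(sum(r["collected"] for r in rows), 2)


clean = mart(staging(RAW_ROWS))
print(len(clean), net_revenue(clean))
```

Because every consumer reads from the output of `mart`, the cleaning and business rules are applied in one place rather than re-implemented per query.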
Semantic Layer
The semantic layer is the architectural contract between raw data and AI reasoning. It translates physical table structures into business concepts that both humans and AI understand and trust.
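A minimal sketch of that contract, assuming a hypothetical `Metric` declaration and a hypothetical `fct_orders` table: each business concept is declared once with its description, aggregation, filters, and allowed dimensions, and is compiled into the same SQL every time. In the platform this role belongs to the dbt semantic layer; the Python below only illustrates the idea and is not its API.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Metric:
    """Hypothetical declaration of one business metric in the semantic layer."""

    name: str
    description: str
    table: str
    expression: str
    filters: tuple = ()     # business rules applied on every query
    dimensions: tuple = ()  # the only columns a caller may group by

    def to_sql(self, group_by: Optional[str] = None) -> str:
        """Compile the definition into SQL; identical inputs give identical SQL."""
        if group_by is not None and group_by not in self.dimensions:
            raise ValueError(f"{group_by!r} is not a governed dimension of {self.name}")
        columns = [f"SUM({self.expression}) AS {self.name}"]
        if group_by is not None:
            columns.insert(0, group_by)
        sql = f"SELECT {', '.join(columns)} FROM {self.table}"
        if self.filters:
            sql += " WHERE " + " AND ".join(f"({rule})" for rule in self.filters)
        if group_by is not None:
            sql += f" GROUP BY {group_by}"
        return sql


NET_REVENUE = Metric(
    name="net_revenue",
    description="Net collected revenue, not gross billed; excludes test sessions "
    "shorter than 5 minutes and cancelled orders.",
    table="fct_orders",
    expression="collected_amount",
    filters=("session_minutes >= 5", "status <> 'cancelled'"),
    dimensions=("region", "order_month"),
)

SEMANTIC_LAYER = {NET_REVENUE.name: NET_REVENUE}

print(SEMANTIC_LAYER["net_revenue"].to_sql(group_by="region"))
```

Restricting `group_by` to declared dimensions is a design choice in this sketch: a caller, human or AI, can slice a metric only along columns the definition has approved.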
AI Layer
The AI layer is the interface between business questions and governed data. It does not replace analysts — it makes their work accessible to anyone in the organisation.
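One way to keep the model on governed data is to expose metrics to it as a tool rather than as raw tables. The sketch below assumes a hypothetical tool named `query_metric` and two hypothetical metrics; the `name` / `description` / `input_schema` shape follows Anthropic's tool-use format, but the API call itself is omitted. The `enum` tells the model which values are valid, and the resolver enforces that server-side so an unknown metric is rejected instead of improvised.

```python
import json

# Hypothetical governed metrics, as the semantic layer might expose them.
GOVERNED_METRICS = {
    "net_revenue": {
        "description": "Net collected revenue, excluding test sessions and cancelled orders.",
        "sql": "SELECT SUM(collected_amount) FROM fct_orders "
        "WHERE session_minutes >= 5 AND status <> 'cancelled'",
    },
    "active_customers": {
        "description": "Distinct customers with at least one non-test order.",
        "sql": "SELECT COUNT(DISTINCT customer_id) FROM fct_orders WHERE session_minutes >= 5",
    },
}


def build_metric_tool(metrics: dict) -> dict:
    """Describe the governed metrics as a single tool the model may call."""
    return {
        "name": "query_metric",
        "description": "Return a governed business metric. Only the listed metrics exist.",
        "input_schema": {
            "type": "object",
            "properties": {
                "metric": {
                    "type": "string",
                    "enum": sorted(metrics),
                    "description": "; ".join(
                        f"{name}: {spec['description']}" for name, spec in sorted(metrics.items())
                    ),
                }
            },
            "required": ["metric"],
        },
    }


def resolve_tool_call(metrics: dict, tool_input: dict) -> dict:
    """Map a tool call to its governed SQL, refusing anything undefined."""
    name = tool_input.get("metric")
    if name not in metrics:
        # No guessing: an unknown metric is an error returned to the model.
        return {"error": f"Unknown metric {name!r}. Available: {sorted(metrics)}"}
    spec = metrics[name]
    return {"metric": name, "definition": spec["description"], "sql": spec["sql"]}


print(json.dumps(build_metric_tool(GOVERNED_METRICS), indent=2))
print(resolve_tool_call(GOVERNED_METRICS, {"metric": "gross_revenue"}))
```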
Without DWH vs With DWH
Without a governed data warehouse (DWH):
- Inconsistent answers — the same metric returns different values on different days
- Duplicated logic — revenue calculation differs between CSV files with no reconciliation
- Hallucinations — AI infers column meanings that don't match business definitions
- Missing business context — test records, cancelled transactions, and edge cases are not filtered
- No explainability — the AI cannot show its working because the working is guesswork
With a governed DWH:
- Trusted metrics — every number comes from a single canonical definition enforced as code
- Reusable definitions — business logic is written once, tested continuously, used everywhere
- Explainable answers — the AI shows which metric definition it used and why
- Scalable analytics — adding a new data source extends the platform without rebuilding the AI layer
- Consistent reasoning — the same question always produces the same answer from the same governed data
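The "explainable" and "consistent" properties can be sketched together: the answer is computed only from a governed query and returned with the definition and SQL that produced it. This uses Python's built-in `sqlite3` as a stand-in for the lakehouse; the `Answer` envelope, the `fct_orders` table, and its rows are hypothetical.

```python
import sqlite3
from dataclasses import dataclass


@dataclass(frozen=True)
class Answer:
    """Hypothetical answer envelope: the number plus the evidence behind it."""

    metric: str
    value: float
    definition: str
    sql: str

    def explain(self) -> str:
        return f"{self.metric} = {self.value} (definition: {self.definition}; query: {self.sql})"


NET_REVENUE_DEF = "Net collected revenue, excluding test sessions (< 5 minutes) and cancelled orders."
NET_REVENUE_SQL = (
    "SELECT COALESCE(SUM(collected_amount), 0) FROM fct_orders "
    "WHERE session_minutes >= 5 AND status <> 'cancelled'"
)


def answer_net_revenue(conn: sqlite3.Connection) -> Answer:
    """Answer from the governed definition only, and record how the number was produced."""
    (value,) = conn.execute(NET_REVENUE_SQL).fetchone()
    return Answer("net_revenue", float(value), NET_REVENUE_DEF, NET_REVENUE_SQL)


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE fct_orders (order_id TEXT, collected_amount REAL, status TEXT, session_minutes REAL)"
)
conn.executemany(
    "INSERT INTO fct_orders VALUES (?, ?, ?, ?)",
    [
        ("A1", 90.0, "complete", 12),
        ("B2", 30.0, "complete", 2),   # test session, excluded
        ("C3", 45.5, "complete", 7),
        ("D4", 50.0, "cancelled", 9),  # cancelled, excluded
    ],
)

first, second = answer_net_revenue(conn), answer_net_revenue(conn)
assert first == second  # same question, same governed data, same answer
print(first.explain())
```

Returning the definition and query alongside the value is what lets a reader check the AI's working instead of taking the number on faith.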
Technologies
Results
- AI analyst answers business questions with consistent, explainable, governed metrics
- Zero hallucinated metric definitions: business logic is defined once in dbt, and dbt contracts enforce the model schemas those definitions depend on
- Natural language interface reduces time-to-insight from hours to seconds
- Semantic layer serves as a single source of truth for both AI and human analysts
- Platform architecture is extensible — new data sources add to the semantic layer without disrupting existing AI queries
Lessons Learned
Future Vision
- Autonomous analytics agents that proactively surface insights without being asked — monitoring KPIs, detecting anomalies, and generating executive summaries on a schedule
- Multi-agent architectures where specialised AI analysts handle different business domains — revenue, operations, customer — and collaborate to answer cross-domain questions
- Real-time AI analytics over streaming data — extending the governed semantic layer to Kafka event streams for sub-second AI-powered operational intelligence
- Self-updating semantic layers where the AI proposes new metric definitions based on observed query patterns, subject to human review and governance approval