DENIS IL.
Flagship Project · AI Workflow
Active Dev

AI-Native Analytics Platform

A governed data platform designed for AI reasoning — Apache Iceberg lakehouse, dbt semantic layer, and a Claude-powered analyst that answers business questions with explainable, trusted metrics.

Apache Iceberg · dbt · Semantic Layer · Claude · S3
01

Overview

AI analysts that read raw CSV files are not AI analysts. They are sophisticated pattern-matchers operating on inconsistent, ungoverned, untrustworthy data. The results they produce reflect that — hallucinated metrics, contradictory answers, business context the model cannot possibly have.

This project demonstrates the alternative: an AI-native analytics platform built on a governed data foundation. Apache Iceberg for the lakehouse layer, dbt for transformation and metric governance, a semantic layer as the contract between data and AI, and Claude as the reasoning engine. The result is an AI analyst that gives explainable, consistent, trustworthy answers — because the data beneath it deserves that trust.

02

Problem & Challenges

The core problem is not AI capability. Modern LLMs can reason over data fluently. The problem is data quality, consistency, and context. Without a governed data foundation, AI analytics produces confident-sounding answers that are wrong in ways that are hard to detect.

  • AI reading CSV files directly has no business context — it does not know that 'revenue' means net collected, not gross billed, or that sessions shorter than 5 minutes are test records
  • Inconsistent source data produces inconsistent answers — the same question asked twice returns different numbers because the underlying data has no single source of truth
  • Metric definitions live in analyst heads, not in code — making it impossible for an AI to apply them consistently
  • Without a semantic layer, AI must guess at column meanings, table relationships, and business logic — and guesses compound into hallucinations
  • Trust collapses quickly — one wrong AI-generated number that reaches a business decision destroys confidence in the entire system
03

Architecture

A modern AI-native analytics stack built in layers, each layer adding governance, trust, and context that makes AI reasoning reliable.

  • SOURCES: Raw Data (Operational Systems · Event Streams · APIs)
  • INGEST: S3 / Object Store (scalable storage · decoupled from compute)
  • WAREHOUSE: Apache Iceberg (ACID transactions · schema evolution · time travel)
  • TRANSFORM: dbt (Medallion layers · metric contracts · tests)
  • SEMANTIC: Semantic Layer (business context · governed metrics · NL mappings)
  • REPORTING: AI Analyst (Claude · governed queries · explainable answers)
04

Data Flow

Ingestion
Raw operational data lands in S3 via streaming or batch pipelines. Apache Iceberg provides ACID semantics, schema evolution, and time-travel queries over the raw layer.
Transformation
dbt models transform raw Iceberg tables through staging (cleaning, typing) to mart (business logic, aggregations). Every transformation is version-controlled and testable.
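The staging-to-mart split can be sketched in plain Python. This is only an illustration of the two responsibilities, not dbt code: the row shape, the `status` values, and the "paid orders only" rule are assumptions made up for the example.

```python
from collections import defaultdict
from datetime import date
from decimal import Decimal

# Hypothetical raw rows as they might arrive from a source system (all strings).
RAW_ORDERS = [
    {"order_id": "1", "order_date": "2024-03-01", "amount": " 120.50 ", "status": "PAID"},
    {"order_id": "2", "order_date": "2024-03-01", "amount": "80.00", "status": "cancelled"},
    {"order_id": "3", "order_date": "2024-03-02", "amount": "40.25", "status": "paid"},
]


def stage_orders(raw_rows):
    """Staging: clean and type each row; no business logic yet."""
    return [
        {
            "order_id": int(r["order_id"]),
            "order_date": date.fromisoformat(r["order_date"]),
            "amount": Decimal(r["amount"].strip()),
            "status": r["status"].strip().lower(),
        }
        for r in raw_rows
    ]


def mart_daily_revenue(staged_rows):
    """Mart: apply the (assumed) business rule and aggregate by day."""
    totals = defaultdict(Decimal)  # Decimal() starts each day at 0
    for r in staged_rows:
        if r["status"] == "paid":  # assumed rule: only paid orders count
            totals[r["order_date"]] += r["amount"]
    return dict(totals)


daily_revenue = mart_daily_revenue(stage_orders(RAW_ORDERS))
```

Keeping typing in one function and business rules in another mirrors the reason for the layered models: a change to the revenue rule touches only the mart step.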
Metric Governance
dbt metric definitions codify business rules as data contracts. 'Revenue' has exactly one definition, tested on every run, consumed by every downstream system identically.
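The "exactly one definition, tested on every run" idea can be sketched as a small registry. This is a Python stand-in rather than dbt's own metric syntax. The `Metric` and `MetricRegistry` names, and the reading of "net collected" as collected minus refunded, are assumptions for illustration.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Metric:
    name: str
    description: str
    compute: Callable[[list], float]


class MetricRegistry:
    """Holds at most one definition per metric name."""

    def __init__(self):
        self._metrics = {}

    def register(self, metric: Metric) -> None:
        key = metric.name.lower()
        if key in self._metrics:
            raise ValueError(f"metric {metric.name!r} already has a definition")
        self._metrics[key] = metric

    def get(self, name: str) -> Metric:
        return self._metrics[name.lower()]


def net_collected(rows):
    # Assumed rule for the example: revenue is collected minus refunded.
    return sum(r["collected"] - r["refunded"] for r in rows)


registry = MetricRegistry()
registry.register(Metric("revenue", "Net collected, not gross billed", net_collected))


def test_revenue_definition():
    # A fixture-based check that would run alongside every build.
    fixture = [
        {"collected": 100.0, "refunded": 10.0},
        {"collected": 50.0, "refunded": 0.0},
    ]
    assert registry.get("Revenue").compute(fixture) == 140.0
```

A second `register` call for "revenue" raises an error, which is the property that matters here: a competing definition cannot slip in silently.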
Semantic Mapping
The semantic layer maps business vocabulary to governed data assets. 'Last month's active customers' resolves to a specific mart table, specific columns, specific filter logic — not a best guess.
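A minimal sketch of that resolution step, assuming a hand-written mapping. The table name `mart_customer_activity`, its columns, and the test-account filter are hypothetical; only the phrase being resolved comes from the text above.

```python
from datetime import date, timedelta

# Hypothetical semantic mapping: business concept -> governed data asset.
SEMANTIC_MAP = {
    "active customers": {
        "table": "mart_customer_activity",
        "measure": "COUNT(DISTINCT customer_id)",
        "date_column": "activity_date",
        "filter": "is_test_account = FALSE",
    }
}


def last_month_range(today: date):
    """Return the first and last day of the calendar month before `today`."""
    first_of_this_month = today.replace(day=1)
    last_of_previous = first_of_this_month - timedelta(days=1)
    return last_of_previous.replace(day=1), last_of_previous


def resolve(concept: str, today: date) -> dict:
    """Resolve 'last month's <concept>' to a table, measure, filter and date range."""
    entry = SEMANTIC_MAP[concept.lower()]  # unknown concepts fail loudly (KeyError)
    start, end = last_month_range(today)
    return {**entry, "start": start, "end": end}


resolved = resolve("active customers", date(2024, 3, 15))
```

An unknown concept raises a `KeyError` instead of falling back to a guess, which is the behaviour the paragraph argues for.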
AI Reasoning
Claude queries the semantic layer, not raw tables. Every answer is grounded in a governed metric definition. Reasoning steps are visible — the AI explains what data it used and why.
05

Semantic Layer

The semantic layer is the architectural contract between raw data and AI reasoning. It translates physical table structures into business concepts that both humans and AI understand and trust.

Business Concept Mapping
Physical columns like 'trx_amt_net_usd' become governed concepts like 'Net Revenue'. The AI queries business concepts, not raw columns — eliminating the most common source of LLM hallucinations in analytics contexts.
Metric Governance
Every metric has exactly one definition, tested continuously. 'Monthly Active Users' means the same thing to the dashboard, the AI analyst, and the executive report — because it is defined once as code and consumed everywhere identically.
Natural Language Grounding
The semantic layer maps business vocabulary to governed data assets. When a user asks 'how did revenue perform last quarter?', the AI resolves 'revenue' to a specific, tested dbt metric with known caveats — not a probabilistic column-name guess.
Context Injection
Before the AI answers any question, it receives the relevant semantic layer definitions: business rules, filter logic, and known data quality caveats. With this context the model applies governed definitions instead of inferring them, which makes its answers consistent and explainable rather than opaque and probabilistic.
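One way to picture context injection is as plain prompt assembly. The sketch below stops before any model call. The definition fields, the alias matching, and the example rules and caveats are all assumptions for illustration, not the project's actual definitions.

```python
# Hypothetical semantic layer entries; the rules and caveats are invented examples.
DEFINITIONS = [
    {
        "name": "Net Revenue",
        "aliases": ("revenue",),
        "rule": "Net collected amount, not gross billed",
        "filters": "exclude test records",
        "caveats": "most recent days may be incomplete",
    },
    {
        "name": "Monthly Active Users",
        "aliases": ("active users", "mau"),
        "rule": "Distinct users with at least one session in the month",
        "filters": "exclude sessions shorter than 5 minutes",
        "caveats": "none recorded",
    },
]


def select_definitions(question, definitions):
    """Naive relevance check: keep definitions whose alias appears in the question."""
    q = question.lower()
    return [d for d in definitions if any(alias in q for alias in d["aliases"])]


def build_prompt(question, definitions):
    """Prepend the governed definitions to the user's question."""
    lines = ["Answer using only the governed definitions below."]
    for d in select_definitions(question, definitions):
        lines.append(f"Metric: {d['name']}")
        lines.append(f"  Rule: {d['rule']}")
        lines.append(f"  Filters: {d['filters']}")
        lines.append(f"  Caveats: {d['caveats']}")
    lines.append(f"Question: {question}")
    return "\n".join(lines)


prompt = build_prompt("How did revenue perform last quarter?", DEFINITIONS)
```

Substring matching on aliases is deliberately simple. A production system would more likely resolve concepts through the semantic layer itself.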
06

AI Layer

The AI layer is the interface between business questions and governed data. It does not replace analysts — it makes their work accessible to anyone in the organisation.

Semantic Context Injection
Before answering any question, Claude receives the semantic layer definitions for the relevant metrics: business rules, column meanings, and known caveats. The model no longer has to guess what a column or metric means, which removes a major source of hallucinations.
Governed Query Generation
SQL is generated against the semantic layer, not raw tables. Metric definitions are enforced automatically. No analyst intervention required to ensure consistency.
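A hedged sketch of governed query generation: the SQL is compiled from a metric definition, so the filter is always applied. It runs against an in-memory SQLite table purely so the example is self-contained. The `mart_payments` table, its columns other than `trx_amt_net_usd`, and the `is_test` filter are assumptions.

```python
import sqlite3

# Hypothetical governed definition; identifiers come from the definition, never from user text.
NET_REVENUE = {
    "alias": "net_revenue",
    "table": "mart_payments",
    "expression": "SUM(trx_amt_net_usd)",
    "date_column": "paid_on",
    "filters": ("is_test = 0",),
}


def compile_metric_sql(metric):
    """Build the query from the definition; the date bounds stay as bind parameters."""
    conditions = [
        f"{metric['date_column']} >= ?",
        f"{metric['date_column']} < ?",
        *metric["filters"],
    ]
    return (
        f"SELECT {metric['expression']} AS {metric['alias']} "
        f"FROM {metric['table']} WHERE " + " AND ".join(conditions)
    )


def run_metric(conn, metric, start, end):
    sql = compile_metric_sql(metric)
    value = conn.execute(sql, (start, end)).fetchone()[0]
    return sql, value


def demo():
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE mart_payments (paid_on TEXT, trx_amt_net_usd REAL, is_test INTEGER)"
    )
    conn.executemany(
        "INSERT INTO mart_payments VALUES (?, ?, ?)",
        [
            ("2024-01-05", 100.0, 0),
            ("2024-01-20", 50.0, 0),
            ("2024-01-21", 999.0, 1),  # test record, excluded by the governed filter
            ("2024-02-01", 25.0, 0),   # outside the requested range
        ],
    )
    return run_metric(conn, NET_REVENUE, "2024-01-01", "2024-02-01")
```

Because the caller supplies only a metric and a date range, there is no code path where the test-record filter is forgotten.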
Explainable Answers
Every AI-generated answer includes the metric definition used, the data range queried, and any caveats. Business users understand not just the answer but how it was produced.
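The three parts of an explainable answer can be carried in a small record, sketched below. The `ExplainedAnswer` class and its field names are illustrative assumptions, and the values used in the usage line are placeholders, not project results.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class ExplainedAnswer:
    value: str
    metric_name: str
    metric_definition: str
    period_start: date
    period_end: date
    caveats: list = field(default_factory=list)

    def render(self) -> str:
        """Format the answer with the definition, data range and caveats attached."""
        lines = [
            f"Answer: {self.value}",
            f"Metric: {self.metric_name} ({self.metric_definition})",
            f"Data range: {self.period_start.isoformat()} to {self.period_end.isoformat()}",
        ]
        if self.caveats:
            lines.append("Caveats: " + "; ".join(self.caveats))
        else:
            lines.append("Caveats: none recorded")
        return "\n".join(lines)


# Placeholder values purely for illustration.
example = ExplainedAnswer(
    value="150.00 USD",
    metric_name="Net Revenue",
    metric_definition="net collected, not gross billed",
    period_start=date(2024, 1, 1),
    period_end=date(2024, 1, 31),
    caveats=["test records excluded"],
)
text = example.render()
```

Making the definition and date range required fields means an answer cannot be constructed without them.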
Anomaly Detection
The AI layer monitors KPIs on a schedule — detecting deviations, generating hypotheses for root causes, and escalating to humans only when confidence is low.
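The detection and escalation part can be sketched with a simple z-score check. This is one possible technique, not necessarily the one used in the project. The threshold, the minimum history length, and the rule that too little history counts as low confidence are assumptions. Hypothesis generation for root causes is not covered.

```python
from statistics import mean, stdev


def check_kpi(history, latest, z_threshold=3.0, min_points=8):
    """Compare the latest KPI value with its history using a z-score.

    Returns a dict with a status of 'ok', 'anomaly', or 'escalate'.
    'escalate' is used when there is too little history to judge.
    """
    if len(history) < min_points:
        return {"status": "escalate", "z": None, "reason": "too little history to judge"}

    mu, sigma = mean(history), stdev(history)
    if sigma == 0:
        # Perfectly flat history: any change at all is a deviation.
        z = 0.0 if latest == mu else float("inf")
    else:
        z = (latest - mu) / sigma

    status = "anomaly" if abs(z) >= z_threshold else "ok"
    return {
        "status": status,
        "z": z,
        "reason": f"z-score {z:.2f} against {len(history)} points",
    }


# Hypothetical KPI history for illustration.
HISTORY = [100, 102, 98, 101, 99, 100, 103, 97]
normal = check_kpi(HISTORY, 100)
spike = check_kpi(HISTORY, 150)
unsure = check_kpi(HISTORY[:3], 150)
```

A scheduled job would call `check_kpi` per metric and route `escalate` results to a person, leaving `anomaly` results for automated follow-up.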
07

Without DWH vs With DWH

Without DWH — AI Guesses
AI reads raw files directly. No context, no governance, no trust.
  • Inconsistent answers — the same metric returns different values on different days
  • Duplicated logic — revenue calculation differs between CSV files with no reconciliation
  • Hallucinations — AI infers column meanings that don't match business definitions
  • Missing business context — test records, cancelled transactions, and edge cases are not filtered
  • No explainability — the AI cannot show its working because the working is guesswork
With DWH — AI Reasons
AI operates on a governed semantic layer. Every answer is grounded, explainable, and consistent.
  • Trusted metrics — every number comes from a single canonical definition enforced as code
  • Reusable definitions — business logic is written once, tested continuously, used everywhere
  • Explainable answers — the AI shows which metric definition it used and why
  • Scalable analytics — adding a new data source extends the platform without rebuilding the AI layer
  • Consistent reasoning — the same question always produces the same answer from the same governed data
08

Technologies

Apache Iceberg · dbt · Semantic Layer · Claude · S3
09

Results

  • AI analyst answers business questions with consistent, explainable, governed metrics
  • Zero hallucinated metric definitions — business logic enforced by dbt contracts
  • Natural language interface reduces time-to-insight from hours to seconds
  • Semantic layer serves as a single source of truth for both AI and human analysts
  • Platform architecture is extensible — new data sources add to the semantic layer without disrupting existing AI queries
10

Lessons Learned

The AI is only as good as the data
The most impactful improvements to AI answer quality came from improving data governance, not from prompt engineering. Clean, governed data with clear business context produces better AI reasoning than any prompt trick.
The semantic layer is the AI's operating manual
Without a semantic layer, AI analytics is a probabilistic exercise. With one, it becomes a deterministic lookup of governed business logic. The investment in semantic layer design pays back on every query.
Explainability is not optional
Business users will not trust AI answers they cannot verify. Designing explainability into the AI layer from the start — showing sources, metric definitions, and reasoning steps — was the difference between adoption and rejection.
Governance scales the platform
Every metric definition added to the semantic layer is immediately available to the AI analyst. The governance investment compounds — the platform becomes smarter as the semantic layer grows, with no additional AI work required.
11

Future Vision

  • Autonomous analytics agents that proactively surface insights without being asked — monitoring KPIs, detecting anomalies, and generating executive summaries on a schedule
  • Multi-agent architectures where specialised AI analysts handle different business domains — revenue, operations, customer — and collaborate to answer cross-domain questions
  • Real-time AI analytics over streaming data — extending the governed semantic layer to Kafka event streams for sub-second AI-powered operational intelligence
  • Self-updating semantic layers where the AI proposes new metric definitions based on observed query patterns, subject to human review and governance approval