DENIS IL.
Flagship ProjectData Platform
Active Dev

Gaming Club Data Platform

End-to-end data infrastructure for gaming venue operations — automated ingestion pipelines, dbt transformations, and decision-ready analytics over a centralised PostgreSQL warehouse.

PostgreSQLClickHouseAirflowdbtMetabasePython
01

Overview

Gaming clubs are complex operational businesses. Revenue comes from multiple streams: gaming sessions (seat rentals by the hour), monthly subscriptions, food & beverage sales, and merchandise. Each stream has its own pricing logic, customer lifecycle, and reporting requirements.

This project built an end-to-end analytics platform for a live gaming club operation — automated nightly ingestion from four operational systems into a central PostgreSQL data warehouse, dbt transformation layers with consistent metric definitions enforced as code, and a Metabase reporting layer delivering trusted business metrics daily.

02

Business Context

Understanding what a gaming club actually operates helps frame what the data platform needed to solve.

  • Gaming sessions are the core revenue unit — customers book seats by the hour using gaming PCs, consoles, or VR stations. Session data captures equipment, duration, and seat utilisation.
  • Subscriptions provide recurring revenue — monthly plans grant members priority access and discounted hourly rates. Tracking activations, renewals, and churn is critical for MRR forecasting.
  • Food & beverage sales happen concurrently with sessions — POS transactions link to active sessions and customer profiles, enabling per-visit spend analysis.
  • CRM interactions record every customer touchpoint — registration, tier changes, loyalty points, and support interactions feed into customer lifetime value and segmentation models.
  • Revenue operations require cross-stream reconciliation — a single customer visit generates events in the booking system, POS, CRM, and subscription service simultaneously.
03

Data Challenges

Before the platform existed, operational data lived in four disconnected systems with no shared data model, no agreed metric definitions, and no automated reporting.

  • Fragmented operational systems — POS, booking, CRM, and subscription service had separate databases with no shared customer identifiers or event schema.
  • Inconsistent metrics — revenue figures differed between systems because each calculated totals differently: gross vs. net, session vs. payment date, subscription period vs. collection date.
  • Manual reporting — weekly management reports were assembled by hand from CSV exports across all four systems, taking several hours and introducing copy-paste errors.
  • Duplicated business logic — discount calculations, subscription status rules, and revenue attribution logic were reimplemented independently in each reporting tool.
  • Reporting delays — by the time weekly reports reached management, the data was 5–7 days old, making fast operational decisions impossible.
04

Architecture

A layered data platform built around a central PostgreSQL data warehouse, with Airflow orchestrating nightly ingestion and dbt owning all transformation and metric definition logic.

SOURCES
Operational Systems
POS · Booking System · CRM · Subscription Service
INGEST
Ingestion Layer
Airflow DAGs · Python extractors · nightly schedule
WAREHOUSE
Raw Data
Insert-only · Full history · No transformations
WAREHOUSE
PostgreSQL DWH
Raw → Staging → Mart layer separation
TRANSFORM
DBT Models
Business logic · metric definitions · data contracts
SEMANTIC
Business Metrics
Aggregated KPIs · revenue attribution · cohort metrics
REPORTING
Dashboards
Metabase · operational reporting · management views
05

Data Flow

Payments
Transaction records extracted nightly, reconciled against session data, and classified by payment method and type.
Gaming Sessions
Session start/end events with equipment allocation, duration tracking, and hourly utilization metrics.
Subscriptions
Activation, renewal, and expiration events with revenue attribution and churn signal tracking.
CRM Data
Customer profile updates, segment classification, and acquisition channel attribution.
Operational Metrics
Hourly capacity utilization, peak period detection, and equipment availability signals.
06

Analytics Layer

The analytics layer standardises all business metrics into a single governed definition layer, ensuring every dashboard and every stakeholder works from the same numbers.

KPI Standardisation
Every metric — revenue, active sessions, churn rate, subscription MRR — has one canonical definition enforced as dbt code. No more contested numbers from different spreadsheets.
Centralised Metrics Store
All KPIs are computed once in the mart layer and consumed by downstream dashboards. Changing a metric definition in one place updates all reports automatically.
Reusable Business Logic
Revenue attribution rules, session categorisation logic, and subscription status calculations are written once as dbt models and referenced everywhere — eliminating the duplication that caused inconsistencies.
Cross-Stream Reconciliation
The analytics layer joins events across POS, booking, CRM, and subscriptions on a unified customer key, making per-visit and per-customer economics visible for the first time.
07

Automation Layer

1
Extraction DAGs
Separate Airflow DAGs for each operational system run extraction jobs that pull incremental data via Python scripts and load it into raw tables. DAGs run sequentially to respect system dependencies.
2
Transformation Pipeline
After extraction, a dbt run DAG executes the full transformation stack: raw → staging → mart. Failed model runs trigger alerts before any downstream reports are refreshed.
3
Data Quality Checks
dbt tests run after each transformation layer — not null constraints, referential integrity, row count validations, and business rule assertions. Any failure halts the pipeline.
4
Scheduled Reporting
Metabase sends the executive daily summary automatically at 8am. If the nightly pipeline fails, the report is withheld and an alert is sent — stale data never reaches management silently.
5
Pipeline Monitoring
Airflow provides full DAG execution history, retry logic for transient failures, and alerting on task-level timeouts — giving full operational visibility into the data pipeline.
08

DWH Modeling

The warehouse follows a three-layer medallion architecture enforced through dbt project structure. Each layer has a distinct contract: raw data is never modified, staging is where cleaning happens, marts are where business logic lives.

rawRaw Layer

Exact copies of source system data. Insert-only, no transformations, full history preserved. Serves as an audit log and re-processing baseline.

raw_pos_transactionsDirect copy of POS transaction records with original field names and types
raw_bookingsSession booking events exactly as received from the booking system API
raw_subscriptionsSubscription lifecycle events: activations, renewals, cancellations, expirations
raw_crm_profilesCustomer profile snapshots from CRM with full field history
stgStaging Layer

Cleaned, typed, and deduplicated source data. One staging model per source table. Business logic is explicitly forbidden at this layer.

stg_paymentsTyped payment records with nulls handled, currency normalised, duplicates removed
stg_sessionsNormalised session events with duration computed, equipment IDs standardised
stg_subscriptionsUnified subscription events with consistent status codes and date types
stg_customersMerged customer profiles with segment tags and acquisition channel attribution
martMart Layer

Business-facing tables optimised for reporting queries. Aggregated KPIs, revenue attribution logic, and cohort metrics all live here.

mart_revenueDaily and monthly revenue broken down by category, payment method, and source system
mart_sessionsHourly utilisation metrics, equipment allocation rates, peak period flags
mart_subscriptionsActive subscriber counts, churn rates, renewal pipeline, and MRR tracking
mart_customersCustomer segments, LTV estimates, acquisition cohorts, and engagement scores
09

Reporting Layer

Executive Daily Summary
Single-page management dashboard covering yesterday's revenue, active sessions, and subscriber count versus prior periods. Delivered as a scheduled Metabase email at 8am.
Revenue Analytics
Payment method mix, revenue breakdown by category (sessions vs. subscriptions vs. merchandise), trend lines, and daily deviation from weekly averages.
Session Utilisation
Equipment allocation by hour-of-day, seat utilisation heatmap, peak period identification, and capacity headroom analysis for operational planning.
Subscription Health
Active subscriber counts, renewal pipeline for the next 30 days, churn signals from declining engagement, and MRR trend with cohort breakdown.
10

Technologies

PostgreSQLClickHouseAirflowdbtMetabasePython
11

Results

  • Automated nightly pipeline replaced manual weekly CSV reporting cycle
  • Consistent metric definitions enforced across all dbt models and downstream dashboards
  • Single management dashboard covering revenue, sessions, and subscriptions
  • Data discrepancies between systems identified and resolved through reconciliation logic
  • Reporting cycle shortened from several days to overnight
  • Duplicated business logic eliminated — one definition, used everywhere
12

Lessons Learned

Data quality precedes modeling
Source system inconsistencies required more attention than the pipeline itself. Invest in profiling and reconciliation before writing transformation logic — bad source data doesn't get fixed by good SQL.
Metrics are a business conversation
Defining what 'revenue' or 'active session' means requires stakeholder alignment, not just technical decisions. Contested definitions produce contested dashboards and eroded trust.
Metric governance scales the platform
Centralising metric definitions in dbt eliminated the duplication that caused inconsistencies. When the revenue calculation changed, updating one model fixed every downstream report simultaneously.
Automation compounds in value
Each additional data source added disproportionate value once the ingestion pattern was established. Pipeline infrastructure is a multiplier — the second source costs a fraction of the first.
Scalability is a design choice made early
Choosing a medallion architecture from day one meant the platform could absorb new data sources without restructuring. Short-term overhead paid back within the first quarter.
Early mart delivery accelerates feedback
Delivering mart-layer tables early gave stakeholders something concrete to react to, which improved requirements quality for subsequent iterations considerably.
13

Future Improvements

high
ClickHouse Migration
Move analytical query workloads from PostgreSQL to ClickHouse. Sub-second aggregations on large time-series datasets will unlock real-time operational dashboards that PostgreSQL cannot serve at scale.
high
Real-time Streaming Pipeline
Add a Kafka-based streaming layer alongside the nightly batch pipeline. Priority data streams: payment events and session open/close events. Enables live utilisation monitoring and instant revenue visibility.
medium
AI Analytics Layer
Expose the dbt semantic layer via a natural language interface powered by Claude. Management self-service queries — 'what drove revenue growth last week?' — without writing SQL or relying on predefined reports.
medium
Semantic Layer API
Formalise metric definitions as a programmatic API using dbt metrics or Cube.dev. Enables consistent metric consumption across Metabase, future AI agents, and any downstream application.