An analytics solution for a jewelry and accessories e-commerce platform, built from scratch. It enabled the Chief Product Officer to shape the product, understand and optimize traffic funnels, and discover new business opportunities.

Demo Dashboard (metrics shown are adjusted for demo purposes)

Stakeholders had two main goals: understanding the overall performance of the e-commerce store, and mapping the shopper journey to identify which funnel steps need attention.

Traffic channels

The next step is extending the funnel to include marketing data and transforming it into end-to-end analytics.

Implementation details

The system processes raw API interaction data, transforms it into analytical tables, and serves business intelligence dashboards through Metabase. The architecture supports real-time data ingestion, hourly ETL updates, and historical data tracking, enabling comprehensive analysis of user behavior, catalog interactions, product views, and conversion funnels.

Data Ingestion

Raw API calls are captured in real time and stored in the pd.api_calls table, which holds request/response JSON payloads, user sessions, endpoints, and timestamps. Data is written in batches with millisecond-precision timestamps.
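A minimal sketch of the ingestion path, assuming the clickhouse-driver Python client; the column names here are illustrative assumptions, not the actual pd.api_calls schema:

    # Ingestion sketch: batch-insert captured API calls into pd.api_calls.
    # Assumes clickhouse-driver; column names are illustrative.
    from datetime import datetime
    from clickhouse_driver import Client

    client = Client(host='localhost')

    def flush_batch(batch):
        """Write one batch of captured API calls in a single insert."""
        client.execute(
            'INSERT INTO pd.api_calls '
            '(ts, session_id, endpoint, request_json, response_json) VALUES',
            batch,
        )

    batch = [
        (datetime.utcnow(), 'sess-42', '/api/v1/catalog',
         '{"filters": {"metal": "gold"}}', '{"items": 24}'),
    ]
    flush_batch(batch)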

ETL Processing

Python-based ETL scripts, organized in jobs.py, process the raw data into analytical tables (a sketch of one such job follows the list):

  • sessions: Aggregated session data with device information, user types, and behavioral metrics
  • catalog_sessions: Enriched catalog browsing data including search queries, filters (material, section, metal, price ranges), pagination, and sorting preferences
  • product_sessions: Product-level analytics tracking views, favorites, cart additions, and purchase events
  • snapshots: Versioned snapshots of production database entities (cart, items, favorites, stock) with change tracking
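As a sketch of one such job (the function name, the columns beyond those named above, and the hourly-window approach are all assumptions), rebuilding dwh.sessions for a given hour might look like this:

    # ETL sketch: aggregate raw API calls into dwh.sessions.
    # Assumes clickhouse-driver; column names are illustrative.
    from datetime import datetime, timedelta
    from clickhouse_driver import Client

    client = Client(host='localhost')

    def build_sessions(window_start: datetime) -> None:
        """Aggregate one hourly window of raw calls into session rows."""
        window_end = window_start + timedelta(hours=1)
        client.execute(
            '''
            INSERT INTO dwh.sessions
            SELECT
                session_id,
                any(device)    AS device,
                any(user_type) AS user_type,
                min(ts)        AS session_start,
                max(ts)        AS session_end,
                count()        AS api_calls
            FROM pd.api_calls
            WHERE ts >= %(start)s AND ts < %(end)s
            GROUP BY session_id
            ''',
            {'start': window_start, 'end': window_end},
        )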

Data Warehouse Schema

The ClickHouse database is organized into multiple schemas:

  • pd: Raw source data from API calls
  • dwh: Analytical tables optimized for BI tools and ad-hoc analysis
  • history: Historical snapshots with versioning and deletion tracking
  • mysql: Database links for seamless data migration from production MySQL
  • etl: Temporary tables for ETL processing
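For illustration, DDL for this layout could be applied through the same Python client; the engine choices and columns below are assumptions, not the production schema:

    # Schema sketch: a raw table and a history table with versioning.
    # Engines and columns are illustrative assumptions.
    from clickhouse_driver import Client

    client = Client(host='localhost')

    client.execute('''
        CREATE TABLE IF NOT EXISTS pd.api_calls (
            ts            DateTime64(3),
            session_id    String,
            endpoint      String,
            request_json  String,
            response_json String
        ) ENGINE = MergeTree ORDER BY (endpoint, ts)
    ''')

    client.execute('''
        CREATE TABLE IF NOT EXISTS history.cart (
            snapshot_ts DateTime,
            cart_id     UInt64,
            payload     String,
            is_deleted  UInt8
        ) ENGINE = ReplacingMergeTree(snapshot_ts) ORDER BY cart_id
    ''')

ReplacingMergeTree keyed on a version column is one common way to keep the latest snapshot per entity while retaining deletion flags.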

Scheduled Jobs

main.py orchestrates scheduled ETL jobs running hourly, with systemd service integration for production deployment. Jobs are designed to support point-in-time reprocessing for specific dates.
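A sketch of what that orchestration could look like, assuming the schedule library and hypothetical job entry points in jobs.py; the --date flag illustrates point-in-time reprocessing:

    # Orchestration sketch for main.py: hourly runs under systemd,
    # plus optional point-in-time reprocessing for a specific date.
    # The schedule library and jobs.* entry points are assumptions.
    import argparse
    import time
    from datetime import datetime
    from typing import Optional

    import schedule

    import jobs  # hypothetical module holding the ETL jobs described above

    def run_all(for_date: Optional[datetime] = None) -> None:
        """Run every ETL job, optionally against a historical date."""
        target = for_date or datetime.utcnow()
        jobs.build_sessions(target)
        # ... catalog_sessions, product_sessions, snapshots jobs follow

    if __name__ == '__main__':
        parser = argparse.ArgumentParser()
        parser.add_argument('--date', help='reprocess a specific date, YYYY-MM-DD')
        args = parser.parse_args()

        if args.date:
            run_all(datetime.strptime(args.date, '%Y-%m-%d'))
        else:
            schedule.every().hour.do(run_all)
            while True:  # systemd keeps this loop alive as a service
                schedule.run_pending()
                time.sleep(30)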

Tech Stack

  • Language: Python
  • Database: ClickHouse
  • BI Tool: Metabase
  • Orchestration: systemd service for scheduled jobs
  • Data Sources: production MySQL database, API call logs