Brand Monitoring AI is a real-time analytics tool designed for small companies looking to identify and engage with relevant discussions about their products. The system monitors Reddit enabling founders to discover opportunities to share their product with interested communities. It provides real-time analytics about the volume of relevant Reddit posts and the percentage of posts that match the search criteria.

Brand Monitoring AI

This project was developed as the final project for DE Zoomcamp 2023, inspired by a tweet from @AdriaanvRossum.

Features

  • Real-time monitoring of Reddit posts matching brand/product search phrases
  • Automated data pipeline from Reddit to analytics dashboard
  • Real-time analytics showing post volume and relevance metrics
  • Partitioned and clustered data storage for efficient querying
  • Scalable architecture supporting multiple data sources (currently Reddit, extensible to others)

Implementation details

Tech stack

Python-based Kafka producer that scrapes Reddit posts and publishes them to Kafka topics. PySpark consumer that processes messages from Kafka, applies transformations, and loads data into BigQuery. The consumer handles data partitioning and clustering for optimal query performance. Data is stored in BigQuery tables partitioned by date and clustered by subreddit for efficient analytics queries. Looker Studio dashboards for real-time analytics visualization showing post volume trends and relevance metrics.

Tech Stack

  • Languages: Python, HCL (Terraform)
  • Streaming: Apache Kafka (Confluent Cloud)
  • Processing: PySpark
  • Storage: Google BigQuery
  • Infrastructure: Terraform, Google Cloud Platform
  • Visualization: Looker Studio

View on GitHub