Customer Satisfaction Prediction & Forecasting Dashboard
Table of Contents:
Overview
Problem Statement
Core Components
Data Processing Pipeline
Predictive Model: Identifying Satisfaction Drivers
Forecasting Model: Predicting Future CSAT Trends
Dashboard: Exploring Trends, Drivers, and Scenarios
Infrastructure: Modular and Reproducible
Potential Improvements & Next Steps
Key Learnings
Keywords
Overview
This project aimed to build a scalable, data-driven framework for understanding and influencing customer satisfaction. By combining predictive modeling, time series forecasting, and dashboard visualizations, the solution provides deeper insights into sentiment drivers and empowers teams to simulate the impact of operational decisions. The approach enables organizations to shift from reactive survey analysis to proactive customer experience management.
Problem Statement
The project was initiated to address two persistent issues:
Traditional NPS (Net Promoter Score) and CSAT (Customer Satisfaction) surveys capture feedback from only a small subset of customers, leaving most sentiment unmeasured.
There is an increasing need to connect customer experience metrics with core business outcomes such as customer retention, revenue growth, and cost efficiency.
The project’s goals were to:
Identify key operational drivers influencing customer satisfaction across journeys
Predict customer sentiment at scale by using operational and contextual data to extend analysis beyond survey respondents
Forecast future sentiment based on KPI changes and operational scenarios
Enable strategic and personalized interventions based on predictive insights
Build a repeatable, scalable solution for continuous CX monitoring
To achieve this, the solution combines two modeling components: a classification model that predicts individual CSAT outcomes and a forecasting model that projects satisfaction trends over time. These outputs are integrated into a dynamic dashboard that allows users to explore trends, analyze top drivers, and simulate the impact of potential interventions.
Core Components
Data Processing Pipeline
All data flows through a structured three-stage pipeline:
Uploaded: Raw source files are uploaded manually
Loaded: Processed using the open-source Data Load Tool (DLT), which automates schema creation, enforces data normalization, and integrates with orchestration tools like Dagster
Base: Final structured datasets used for modeling
DLT helped streamline schema management and incremental loading, but proved somewhat cumbersome for ad-hoc testing or fast CSV inspection, making it more useful for standardized production workflows than exploratory analysis.
The base table for the predictive model is structured with each row representing a customer journey and each column capturing a specific operational driver. For forecasting, the base table aggregates CSAT scores monthly by customer segment.
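As a minimal sketch of the two base-table shapes described above - the column names and values here are illustrative, not the actual schema - the monthly forecasting table can be derived from the journey-level table with a simple pandas aggregation:

```python
import pandas as pd

# Hypothetical sample of the journey-level base table: one row per customer
# journey, one column per operational driver, plus the surveyed CSAT label.
journeys = pd.DataFrame({
    "journey_id":    [1, 2, 3, 4],
    "segment":       ["retail", "retail", "business", "business"],
    "month":         ["2024-01", "2024-01", "2024-02", "2024-02"],
    "wait_time_min": [3.0, 12.5, 6.0, 1.5],
    "n_contacts":    [1, 3, 2, 1],
    "csat":          [5, 2, 4, 5],  # 1-5 survey score
})

# Forecasting base table: average CSAT per month and customer segment.
monthly_csat = (
    journeys.groupby(["segment", "month"], as_index=False)["csat"].mean()
)
print(monthly_csat)
```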
Predictive Model: Identifying Satisfaction Drivers
To predict customer satisfaction and identify key operational drivers, we used a Random Forest classifier. This model was chosen for its strong baseline performance, robustness to overfitting, and ability to handle mixed feature types. It also provides native feature importance outputs, which were helpful for stakeholder interpretation.
Feature selection was handled using Recursive Feature Elimination (RFE) to reduce noise and improve model focus. We explored SHapley Additive exPlanations (SHAP) values to explain individual predictions, though the five-class CSAT output limited interpretability across classes.
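The modeling approach above can be sketched with scikit-learn; this uses synthetic data in place of the real base table, and the feature counts are arbitrary choices for illustration:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import RFE

# Synthetic stand-in for the journey-level base table: 10 operational
# drivers and a 5-class CSAT target on the 1-5 scale.
X, y = make_classification(
    n_samples=400, n_features=10, n_informative=5,
    n_classes=5, n_clusters_per_class=1, random_state=0,
)
y = y + 1  # shift labels from 0-4 to the 1-5 CSAT scale

# Recursive Feature Elimination keeps the 5 strongest drivers.
rf = RandomForestClassifier(n_estimators=100, random_state=0)
selector = RFE(rf, n_features_to_select=5).fit(X, y)
kept = np.flatnonzero(selector.support_)

# Refit on the selected drivers; feature_importances_ provides the
# native driver ranking used for stakeholder interpretation.
model = RandomForestClassifier(n_estimators=100, random_state=0).fit(X[:, kept], y)
ranking = np.argsort(model.feature_importances_)[::-1]
```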
Forecasting Model: Predicting Future CSAT Trends
To forecast satisfaction trends, we used Facebook Prophet, which offered reliable handling of trend and seasonality, robustness to missing data, and fast training. Input data was monthly CSAT scores per segment.
The model used expanding window cross-validation to reflect real-time forecasting dynamics and was evaluated using Mean Absolute Percentage Error (MAPE). We tested mean and median imputation strategies to address data sparsity.
This setup gave operational and CX teams a way to anticipate performance gaps before they occurred.
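The expanding-window evaluation can be sketched as follows. To keep the example dependency-free, a trailing-mean forecaster stands in for Prophet and the monthly CSAT values are made up; the cross-validation loop and MAPE computation are the same either way:

```python
import numpy as np

# Hypothetical monthly CSAT series for one customer segment.
csat = np.array([4.1, 4.0, 4.2, 4.3, 4.1, 4.4, 4.2, 4.5, 4.3, 4.4, 4.6, 4.5])

# Expanding-window cross-validation: train on months [0..t), predict month t,
# then grow the window by one month - mirroring how forecasts are produced live.
errors = []
for t in range(6, len(csat)):       # first 6 months seed the training window
    forecast = csat[:t].mean()      # stand-in one-step forecast
    errors.append(abs(csat[t] - forecast) / abs(csat[t]))

mape = 100 * np.mean(errors)        # Mean Absolute Percentage Error (%)
print(f"Expanding-window MAPE: {mape:.2f}%")
```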
Dashboard: Exploring Trends, Drivers, and Scenarios
The models were integrated into an interactive dashboard built using Dash and Plotly, designed for analysts and operational managers. It contains three tabs:
Journey Performance: Visualizes historical and forecasted CSAT scores
Operational Driver Analysis: Highlights key drivers for each journey
Scenario Simulation: Allows users to define custom driver inputs and view updated CSAT forecasts
User interactions (like dropdowns or sliders) dynamically update plots and summaries using Dash callbacks, making the dashboard highly interactive and intuitive for non-technical users.
An example of the dashboard functionality is the scenario simulation tool. This feature allows users to assess the impact of proposed operational changes on future customer satisfaction. Users can select specific operational drivers - such as wait times or service delays - and input their target values over a chosen horizon (e.g., the next 3 or 6 months). Once submitted, the inputs are sent to the backend through a REST API.
The backend simulation workflow follows these steps:
Compute Baseline CSAT: Use actual driver values from the training dataset to predict CSAT using the Random Forest model and calculate the average predicted score.
Create Simulated Training Set: Replace original driver values in the training set with user-defined targets to generate a modified dataset.
Compute Simulated CSAT: Apply the predictive model to this simulated dataset and calculate the average predicted CSAT.
Estimate Simulation Effect: Measure the change in satisfaction by comparing the simulated CSAT with the baseline.
Forecast CSAT: Generate the default future CSAT trend using the forecasting model (Prophet).
Generate Scenario Forecast: Apply the simulation effect to the forecasted values to simulate what CSAT would look like if the driver targets are met.
The dashboard then visualizes both the baseline and scenario forecasts side-by-side, allowing users to clearly compare projected satisfaction under current conditions versus potential improvements.
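The six-step workflow above can be sketched end to end; the driver names, synthetic training data, and the hard-coded stand-in for the Prophet forecast are all assumptions made for illustration:

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)

# Hypothetical training set: two operational drivers and a 1-5 CSAT label,
# with longer waits and delays assumed to lower satisfaction.
train = pd.DataFrame({
    "wait_time_min":      rng.gamma(2.0, 4.0, 300),
    "service_delay_days": rng.exponential(2.0, 300),
})
score = 5 - (train["wait_time_min"] / 10 + train["service_delay_days"] / 4)
train["csat"] = score.clip(1, 5).round().astype(int)

features = ["wait_time_min", "service_delay_days"]
model = RandomForestClassifier(n_estimators=100, random_state=0)
model.fit(train[features], train["csat"])

# Steps 1-3: baseline CSAT vs. CSAT on a simulated training set where the
# user's target (cut waits to 5 minutes) overwrites the original driver.
baseline_csat = model.predict(train[features]).mean()
simulated = train.assign(wait_time_min=5.0)
simulated_csat = model.predict(simulated[features]).mean()

# Steps 4-6: shift the default forecast by the estimated simulation effect.
effect = simulated_csat - baseline_csat
default_forecast = np.array([4.0, 4.05, 4.1])   # stand-in for Prophet output
scenario_forecast = default_forecast + effect
```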
Infrastructure: Modular and Reproducible
We adopted a clean modular architecture to support flexible development and future automation:
Dagster orchestrates all jobs (data prep, modeling, evaluation) and tracks dependencies between assets
Docker standardizes the runtime environment across machines
PostgreSQL stores pipeline metadata and run history
Local File System stores datasets, model outputs, and scenario configurations
REST API bridges backend outputs with the dashboard interface
While the original scope included automated monthly refreshes, these were descoped due to integration constraints. However, Dagster's support for schedules and sensors opens the door for full automation in future phases.
Early design work also explored production deployment options. The original goal was to host the solution on both the client's on-premise infrastructure and cloud services such as Google Cloud Platform (GCP), using Google Cloud Storage (GCS) for scalable file handling. Initial experimentation was conducted on virtual machines (VMs) to evaluate environment compatibility, access management, and data security constraints. While full deployment was paused due to shifting priorities, this groundwork enables faster onboarding if the production rollout is resumed.
Potential Improvements & Next Steps
The project faced two core bottlenecks that significantly limited value creation. The first was data acquisition: many of the required inputs were missing, inconsistently populated, or not accompanied by clear definitions - making them difficult to use confidently. The second was limited involvement from journey and data teams, who had constrained availability and ownership, making it challenging to gather clarifications, validate features, or resolve blockers.
Looking ahead, a key next step is to transition this solution into a production-grade system hosted on the organization’s on-premise infrastructure, enabling automated data refresh, continuous monitoring, and sustainable operational impact.
The detailed improvements below outline the actions needed to address current limitations across data, code & infrastructure, and strategy.
Data:
Coverage & Availability:
Improve the availability and consistency of key drivers by addressing gaps in features and time horizons
Strengthen access to operational drivers by enabling collaboration across teams to locate, compute, and validate journey-specific metrics
Integrate web and app interaction data, which is critical for tracking drop-offs, delays, and engagement; this was previously excluded due to API limitations
Expand CSAT collection across underrepresented journeys and segments
Structure & Documentation:
Standardize dataset formats by ensuring consistent keys, mapping across sources, and documented levels of granularity
Mandate metadata for every dataset by including a complete data dictionary with column definitions, timestamp logic, business rules, and units
Code & Infrastructure:
Automation, Orchestration & Deployment:
Automate data integration from source systems by replacing manual file uploads with scheduled API pulls, aligned to monthly or real-time data refresh needs
Enable Dagster schedules and sensors to orchestrate pipeline runs using timed jobs or event triggers, reducing manual effort
Deploy the solution on the client’s on-premise infrastructure to enable secure, production-grade operation with automated data refresh and scheduled execution
Code & Model Training:
Explore additional time series models - such as Autoregressive Integrated Moving Average (ARIMA) and other classical statistical approaches - to compare against Prophet, and consider combining multiple models into an ensemble to improve forecast accuracy and robustness
Add an intermediate simulation model that uses training data only, similar to the forecasting and prediction models
Use synthetic data generation techniques, such as Gaussian copulas, to simulate realistic changes in operational drivers - ensuring that interdependencies between features are preserved when testing “what-if” scenarios within the dashboard’s simulation functionality
Explore ordinal classification models (e.g., mord) for the prediction model instead of the typical multi-class classification model used - given the target variable is a 1-5 CSAT score
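The Gaussian-copula idea mentioned above can be sketched with NumPy and SciPy; the two drivers, their distributions, and the dependence between them are invented for illustration:

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)

# Hypothetical correlated drivers: wait time partly drives service delay.
n = 500
wait = rng.gamma(2.0, 3.0, n)
delay = 0.5 * wait + rng.normal(0.0, 1.0, n)
X = np.column_stack([wait, delay])

# Step 1: map each driver to normal scores through its empirical CDF,
# then estimate the correlation structure in the normal space.
ranks = np.argsort(np.argsort(X, axis=0), axis=0)
Z = norm.ppf((ranks + 0.5) / n)
corr = np.corrcoef(Z, rowvar=False)

# Step 2: sample correlated normals and map back through the empirical
# quantiles, so synthetic "what-if" scenarios keep the interdependencies
# between drivers instead of varying each one in isolation.
m = 1000
Zs = rng.multivariate_normal(np.zeros(2), corr, size=m)
Us = norm.cdf(Zs)
synthetic = np.column_stack([np.quantile(X[:, j], Us[:, j]) for j in range(2)])
```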
Strategy, Business & Project Management:
Strategic Alignment & Business Impact:
Link journey-level CSAT improvements to overall NPS outcomes to ensure localized gains in customer sentiment meaningfully contribute to broader CX performance
Connect sentiment to financial metrics to tie predicted satisfaction to customer lifetime value, churn risk, and revenue potential to drive business prioritization
Extend the solution to adjacent journeys with similar data structures or customer touchpoints to maximize the value of shared modeling and dashboarding logic
Vision & Ownership:
Develop a long-term CX data strategy to address current fragmentation, and establish a centralized vision for access, consistency, and governance across journeys
Establish formal ownership for data acquisition and data quality by ensuring such projects are co-led by journey and data teams, with shared accountability for delivery, granularity, and CX relevance
Key Learnings
My learning throughout this project was broad rather than deep. I gained exposure to every stage of the workflow - from data ingestion and pipeline orchestration to modeling and dashboard deployment - developing a well-rounded understanding of how the full solution comes together. As the only client-facing member of the team, I led stakeholder management and helped translate business needs into technical priorities, ensuring alignment throughout the project. This experience gave me both a high-level view of technical delivery and a strong foundation in managing cross-functional collaboration.
One key takeaway was the importance of aligning early with end users to ensure the technical solution matches their needs. A more structured approach to requirement gathering and engagement upfront would have helped us avoid downstream challenges in dashboard tooling and design. This experience emphasized that early alignment, even if it adds upfront complexity, can prevent misalignment later in the project.
I also saw the critical role of clear and proactive communication. While we maintained regular updates, we learned that technical progress must be communicated in a simple, structured, and frequent way to ensure it resonates with all stakeholders. Clear narratives are essential, especially when translating complex work to non-technical audiences.
From a delivery perspective, I gained a deeper appreciation for project planning and cross-functional coordination. Ensuring that data teams remain actively involved throughout the project is essential, especially since understanding raw data, its structure, and its limitations often proves more challenging than the modeling itself. I also saw the impact of small planning choices, such as building in buffer time within workplans to account for inevitable delays. Even straightforward tasks benefit from generous time allocations to provide flexibility later.
This project also highlighted the importance of understanding infrastructure constraints. Technical decisions related to deployment, orchestration, and security often required careful coordination with IT teams, particularly in environments with on-premise requirements. These considerations are critical for delivering solutions that are not only functional but also deployable.
Finally, I learned the importance of understanding context early. Inheriting the project midstream from a previous team introduced complexity, especially as we had to recalibrate expectations around what could be delivered within the new scope. It underscored the value of knowing which elements can be simplified or deferred, and which ones must be understood in detail from day one.
Overall, this experience deepened my understanding of end-to-end delivery and strengthened my ability to balance technical, strategic, and stakeholder needs. It also reinforced the mindset of actively seeking responsibilities that lead to growth, while staying focused on delivering outcomes that are practical and well-aligned with user expectations.
Keywords
Tools & Technologies: Python, Pandas, NumPy, Scikit-learn, Facebook Prophet, Dash, Plotly, PowerBI, Poetry, DLT (Data Load Tool), Dagster, Docker, Git, GitHub, VS Code, PostgreSQL, REST API, Google Cloud Platform (GCP), Google Cloud Storage (GCS)
Tags: Customer Satisfaction (CSAT), Net Promoter Score (NPS), Predictive Modeling, Time Series Forecasting, Machine Learning, Supervised Learning, Feature Importance, Model Interpretability, Data Pipeline, Dashboarding, Scenario Analysis, CX Strategy, Operational Drivers, Imbalanced Dataset, On-Premise Servers