Parottasalna

AI, Backend Engineering & Architecture Guides

E-Commerce Customer 360, Churn & Revenue Analytics

1. Business Problem (Industry Scenario)

You are hired as a Data Scientist for an online marketplace similar to Amazon/Flipkart.

The company has 3 major issues

  1. Customers are silently churning
  2. Marketing budget is wasted on wrong users
  3. Revenue forecasting is inaccurate

Management Questions

  • Who are our most valuable customers?
  • Which customers will churn next month?
  • What will be next 30 days revenue?
  • Which payment type leads to higher spend?
  • Are delivery delays causing churn?

2. Data Source (REAL)

We will use Brazilian Olist E-commerce Dataset
https://www.kaggle.com/datasets/olistbr/brazilian-ecommerce

Sample Database Look,

TablePurpose
customersuser info
orderspurchase header
order_itemsproduct level
paymentspayment type
reviewssatisfaction
productscategory

3. WHAT YOU MUST BUILD

You must deliver 5 layers

A. SQL Analytics Layer

You must write SQL to produce

  1. RFM Table
  2. Cohort Table
  3. Churn Label
  4. AOV & LTV
  5. Payment analysis

Example Expectations

  • Customers who ordered > 3 times
  • Revenue by state
  • Late delivery impact

B. Pandas + Statistics Layer

You must

  • Handle missing values
  • Remove outliers
  • Feature engineering
  • Statistical tests

Hypothesis Examples

  1. Do credit card users spend more than Boleto?
  2. Does bad review → churn?
  3. Is delivery delay correlated with rating?

Techniques:

  • T-test
  • Chi-square
  • Correlation
  • CLT concepts

C. PySpark Layer

You must convert above to scalable pipeline

  • CSV → Parquet
  • Window functions
  • Aggregations
  • Feature store

D. Airflow Layer (Optional)

ingest → clean → transform → train → report

4. What Is Expected From YOU

1. SQL Skills

You should produce

  • JOINs
  • GROUP BY
  • Window functions
  • CTE
  • Performance thinking

2. Pandas Skills

  • EDA notebook
  • Clean dataset
  • Visuals
  • Stats reasoning

3. PySpark Skills

  • DataFrame API
  • Partitioning
  • Optimization

4. Airflow Skills

  • DAG
  • Operators
  • Monitoring

5. Final Deliverables

You must submit

  1. SQL scripts
  2. Pandas notebook
  3. PySpark job
  4. Airflow DAG
  5. Report PDF

Discover more from Parottasalna

Subscribe now to keep reading and get access to the full archive.

Continue reading