Predicting BMW Sales Classification (2010–2024) Using Logistic Regression

Overview

To understand which vehicle attributes most influence whether BMW sales fall into high or low performance categories, a Logistic Regression classifier was trained on the dataset’s numerical and categorical variables using a Pipeline structure.

The pipeline standardized the numeric features and one-hot encoded the categorical ones, so that all predictors enter the model on comparable scales and no feature dominates simply because of its units.

Model Framework

Pipeline Components

  1. Preprocessing (ColumnTransformer)

    • Numerical features: Scaled using StandardScaler()

    • Categorical features: Encoded using OneHotEncoder()

  2. Estimator: LogisticRegression(solver="lbfgs", max_iter=1000)

  3. Cross-Validation Accuracy: 0.9938 ± 0.00067
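The components listed above can be sketched roughly as follows. This is a minimal illustration, not the author's original code: the data is synthetic, the column names and the labelling rule are assumptions made for the example, and the 5-fold setting is assumed because the number of folds is not stated.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Synthetic stand-in data: the real dataset is not reproduced here, and the
# column names and the labelling rule below are assumptions for illustration.
rng = np.random.default_rng(0)
n = 500
df = pd.DataFrame({
    "Price_USD": rng.uniform(30_000, 120_000, n),
    "Engine_Size": rng.uniform(1.5, 5.0, n),
    "Region": rng.choice(["Europe", "Asia", "Africa"], n),
    "Fuel_Type": rng.choice(["Petrol", "Diesel", "Hybrid", "Electric"], n),
})
# Hypothetical label: cheaper cars are labelled "High" sellers.
y = np.where(df["Price_USD"] < 70_000, "High", "Low")

numeric_cols = ["Price_USD", "Engine_Size"]
categorical_cols = ["Region", "Fuel_Type"]

# Step 1: scale numeric columns, one-hot encode categorical columns.
preprocess = ColumnTransformer([
    ("num", StandardScaler(), numeric_cols),
    ("cat", OneHotEncoder(handle_unknown="ignore"), categorical_cols),
])

# Step 2: chain preprocessing and the estimator so that scaling and encoding
# are re-fitted inside every cross-validation fold.
model = Pipeline([
    ("preprocess", preprocess),
    ("clf", LogisticRegression(solver="lbfgs", max_iter=1000)),
])

# Step 3: cross-validated accuracy (5 folds is an assumption).
scores = cross_val_score(model, df, y, cv=5, scoring="accuracy")
print(f"CV accuracy: {scores.mean():.4f} ± {scores.std():.5f}")
```

Wrapping the preprocessing inside the Pipeline means the scaler and encoder only ever see the training part of each fold, which keeps the cross-validation estimate honest.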

Model Performance

Metric       High    Low     Average
Precision    0.98    1.00    0.99
Recall       1.00    0.99    0.99
F1-Score     0.99    0.99    0.99

Overall accuracy: 0.99 (99%)

Confusion Matrix

\[ \begin{bmatrix} 3049 & 0 \\ 58 & 6893 \end{bmatrix} \]

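  • Only 58 misclassifications out of 10,000 test records (accuracy 0.9942).

  • All 58 errors are “Low” records predicted as “High”, which is why “High” precision is 0.98 rather than 1.00. No “High” record was missed (recall 1.00), and every “Low” prediction was correct (precision 1.00).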
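The per-class metrics can be recomputed directly from the confusion matrix. The sketch below assumes rows are actual classes and columns are predicted classes, in the order (High, Low); that orientation is an assumption, chosen because it reproduces the reported precision and recall. The F1 for “Low” computed this way is about 0.996, slightly above the tabulated 0.99.

```python
import numpy as np

# Rows are actual classes, columns are predicted classes, in the order
# (High, Low). This orientation is an assumption, chosen because it
# reproduces the reported per-class precision and recall.
cm = np.array([[3049, 0],
               [58, 6893]])

labels = ["High", "Low"]
true_pos = np.diag(cm)
precision = true_pos / cm.sum(axis=0)  # correct / all predicted as the class
recall = true_pos / cm.sum(axis=1)     # correct / all actual members of the class
f1 = 2 * precision * recall / (precision + recall)
accuracy = true_pos.sum() / cm.sum()

for name, p, r, f in zip(labels, precision, recall, f1):
    print(f"{name}: precision={p:.2f} recall={r:.2f} f1={f:.3f}")
print(f"accuracy={accuracy:.4f}")
```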

Interpretation

  1. Model Strength:
    The logistic regression model achieved near-perfect test performance (accuracy 0.99). Because logistic regression is a linear classifier, this indicates that the High and Low sales categories are almost linearly separable in features such as Price_USD, Engine_Size, Sales_Volume, and Mileage.

  2. High Recall:
    The model identified every high-sales case in the test set (recall 1.00 for “High”), so no high-sales record was missed. The trade-off is a small number of low-sales records flagged as “High”.

  3. Feature Insight:

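    • Sales_Volume and Price_USD were the dominant predictors. If the High/Low label was itself defined from Sales_Volume, this dominance is expected, and the near-perfect accuracy largely reflects that definition.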

    • Region and Fuel_Type added contextual strength, aligning with global sales diversity.

  4. Strategic Use:

    • BMW or dealers can use such models to forecast sales potential for new models.

    • Governments or industry analysts can use the classification as one input when studying luxury car demand and the effect of policy on premium automotive markets.

Policy and Industry Implications

  • Market Forecasting: Predictive analytics helps manufacturers align production with demand surges or slumps.

  • Sustainability Goals: Including Fuel_Type and Engine_Size as predictors shows how fuel technology and engine size relate to sales category, which can inform analysis of green technology adoption patterns.

  • Revenue Planning: Accurate classification supports pricing and dealership inventory management.

Acknowledgments

  • Dataset: BMW Sales Data (2010–2024), analyzed within the DatalytIQs Academy Analytics Framework.

  • Tools & Libraries: Python (scikit-learn, pandas, numpy, matplotlib).

  • Contributors:

    • Collins Odhiambo Owino — Lead Analyst & Author, DatalytIQs Academy

    • Kaggle Automotive Datasets — Data structuring and reference

    • BMW Group Annual Reports (2010–2024) — Validation context for market trends

Author’s Note

Written by Collins Odhiambo Owino
Founder & Lead Researcher, DatalytIQs Academy
Empowering learners and professionals in Mathematics, Economics, and Finance through data-driven insights.
