Overview
To understand which vehicle attributes most influence whether BMW sales fall into high or low performance categories, a Logistic Regression classifier was trained on the dataset’s numerical and categorical variables using a Pipeline structure.
The pipeline scaled numeric features and encoded categorical ones automatically, so every predictor entered the model on a comparable scale and the same preprocessing was applied during both training and evaluation.
Model Framework
Pipeline Components
- Preprocessing (ColumnTransformer):
  - Numerical features: scaled using StandardScaler()
  - Categorical features: encoded using OneHotEncoder()
- Estimator: LogisticRegression() (solver: lbfgs, max_iter=1000)
- Cross-Validation Accuracy: 0.9938 ± 0.00067
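A minimal sketch of this setup in scikit-learn is shown below. The column names are hypothetical stand-ins for the features named in this post, the data is randomly generated rather than taken from the BMW dataset, and the High/Low labelling rule is invented so the example has a target to learn. The handle_unknown="ignore" option and cv=5 are also assumptions, since the post does not state them.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Hypothetical column names modelled on the features mentioned in this post.
NUMERIC_COLS = ["Price_USD", "Engine_Size", "Mileage", "Sales_Volume"]
CATEGORICAL_COLS = ["Region", "Fuel_Type"]


def build_pipeline() -> Pipeline:
    """Scale numeric columns, one-hot encode categorical ones, then fit logistic regression."""
    preprocess = ColumnTransformer(
        transformers=[
            ("num", StandardScaler(), NUMERIC_COLS),
            # Ignore categories that appear in a validation fold but not in its training fold.
            ("cat", OneHotEncoder(handle_unknown="ignore"), CATEGORICAL_COLS),
        ]
    )
    return Pipeline(
        steps=[
            ("preprocess", preprocess),
            ("model", LogisticRegression(solver="lbfgs", max_iter=1000)),
        ]
    )


def make_synthetic_data(n: int = 2000, seed: int = 0) -> tuple[pd.DataFrame, pd.Series]:
    """Random stand-in for the BMW dataset; these values are NOT real sales data."""
    rng = np.random.default_rng(seed)
    X = pd.DataFrame(
        {
            "Price_USD": rng.uniform(30_000, 120_000, n),
            "Engine_Size": rng.uniform(1.5, 5.0, n),
            "Mileage": rng.uniform(0, 200_000, n),
            "Sales_Volume": rng.integers(100, 10_000, n),
            "Region": rng.choice(["Asia", "Europe", "North America"], n),
            "Fuel_Type": rng.choice(["Petrol", "Diesel", "Hybrid", "Electric"], n),
        }
    )
    # Hypothetical labelling rule so the example has a target to learn.
    y = pd.Series(np.where(X["Sales_Volume"] > 5_000, "High", "Low"), name="Sales_Classification")
    return X, y


X, y = make_synthetic_data()
# Accuracy on each of five stratified folds, summarised as mean ± standard deviation.
scores = cross_val_score(build_pipeline(), X, y, cv=5, scoring="accuracy")
print(f"Cross-validation accuracy: {scores.mean():.4f} ± {scores.std():.5f}")
```

Because the scaler and encoder sit inside the pipeline, cross_val_score refits them on each training fold, so the held-out fold never influences the preprocessing statistics.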
Model Performance
| Metric | High | Low | Average |
|---|---|---|---|
| Precision | 0.98 | 1.00 | 0.99 |
| Recall | 1.00 | 0.99 | 0.99 |
| F1-Score | 0.99 | 0.99 | 0.99 |

Overall accuracy: 0.99 (99%)
Confusion Matrix
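- Only 58 of the 10,000 test records were misclassified, an error rate of 0.58%.
- The errors fall essentially all on Low records predicted as High. High-class recall and Low-class precision are both 1.00, so the model produced no meaningful false negatives for the “High” class; its false positives for “High” account for the 0.98 High-class precision.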
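To show which confusion-matrix cell drives each metric, the sketch below uses invented toy labels (not the post's test set) that reproduce the same pattern as the table: every High record is caught, and the only errors are Low records predicted as High. It also prints the "macro avg" row from classification_report, since scikit-learn reports both a macro and a weighted average and the post does not say which one its Average column uses.

```python
import numpy as np
from sklearn.metrics import classification_report, confusion_matrix

# Toy labels invented for illustration only (NOT the post's 10,000 test records):
# 6 High and 6 Low records; the only errors are two Low records predicted as High.
y_true = np.array(["High"] * 6 + ["Low"] * 6)
y_pred = np.array(["High"] * 6 + ["High"] * 2 + ["Low"] * 4)

labels = ["High", "Low"]
cm = confusion_matrix(y_true, y_pred, labels=labels)
# Rows are true classes and columns are predicted classes, in the order given by `labels`.
tp_high, fn_high = cm[0]  # true High predicted High / true High predicted Low
fp_high, tn_high = cm[1]  # true Low predicted High / true Low predicted Low

precision_high = tp_high / (tp_high + fp_high)  # lowered by Low records predicted as High
recall_high = tp_high / (tp_high + fn_high)     # lowered by High records predicted as Low
accuracy = np.trace(cm) / cm.sum()

print(cm)
print(f"High precision = {precision_high:.2f}, High recall = {recall_high:.2f}, accuracy = {accuracy:.2f}")

# output_dict=True exposes per-class rows plus "macro avg" and "weighted avg".
report = classification_report(y_true, y_pred, labels=labels, output_dict=True)
print(report["macro avg"])
```

With these toy counts, High recall and Low precision are both 1.00 while High precision drops to 0.75, an exaggerated version of the 1.00 versus 0.98 split in the table.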
Interpretation
- Model Strength: The logistic regression model achieved near-perfect performance, indicating strong predictive relationships between key features (e.g., Price_USD, Engine_Size, Sales_Volume, Mileage) and the sales category.
- High Recall: The model identifies essentially all high-sales cases (High-class recall of 1.00), minimizing missed opportunities when predicting premium market segments.
- Feature Insight:
  - Sales_Volume and Price_USD were the dominant predictors.
  - Region and Fuel_Type added contextual strength, reflecting the diversity of global sales.
- Strategic Use:
  - BMW or its dealers could use such models to forecast the sales potential of new models.
  - Governments or industry analysts could use them to study the price sensitivity of luxury car demand and the effects of policy on premium automotive markets.
Policy and Industry Implications
- Market Forecasting: Predictive analytics helps manufacturers align production with demand surges or slumps.
- Sustainability Goals: Including Fuel_Type and Engine_Size in the prediction enhances understanding of green technology adoption patterns.
- Revenue Planning: Accurate classification supports pricing and dealership inventory management.
Acknowledgments
- Dataset: BMW Sales Data (2010–2024), analyzed within the DatalytIQs Academy Analytics Framework.
- Tools & Libraries: Python (scikit-learn, pandas, numpy, matplotlib).
- Contributors:
  - Collins Odhiambo Owino: Lead Analyst & Author, DatalytIQs Academy
  - Kaggle Automotive Datasets: Data structuring and reference
  - BMW Group Annual Reports (2010–2024): Validation context for market trends
Author’s Note
Written by Collins Odhiambo Owino
Founder & Lead Researcher, DatalytIQs Academy
Empowering learners and professionals in Mathematics, Economics, and Finance through data-driven insights.