Smart Grid Load Forecasting using Machine Learning

- Developed a machine learning model to predict future electricity load for the Netherlands in a smart grid context
- Used the ENTSO-E dataset: 201,604 records at 15-minute intervals (2014–2020) covering load, solar generation, and wind generation
- Preprocessed data with timestamp conversion, chronological indexing, and time-based interpolation to handle missing values
- Engineered temporal features (hour, day, month, year, holidays, weekends), synthetic temperature features, and lagged load/temperature features for time-series forecasting
- Baseline Model: Linear Regression achieved R² = 0.9447, MAE = 334.71 MW, RMSE = 457.78 MW
- Advanced Model: XGBoost achieved R² = 0.9818, MAE = 178.36 MW, RMSE = 262.63 MW, reducing error by ~47% (MAE) and ~43% (RMSE) relative to the baseline
- Feature importance analysis highlighted lagged load values, holidays, and cyclical time features as key drivers of electricity demand
- Demonstrated the effectiveness of feature engineering and non-linear models in capturing complex load patterns
- Gained hands-on experience with the complete machine learning pipeline: data preprocessing, feature engineering, baseline model selection, metric evaluation, advanced model experimentation, iterative evaluation, and handling of overfitting and underfitting
Key Insights:
- Lagged load values, holidays, and cyclical time features significantly influence electricity demand
- Feature engineering plays a critical role in improving predictive accuracy for time-series data
- Non-linear models like XGBoost outperform traditional linear regression in capturing complex load patterns
- External real-world factors (e.g., temperature, holidays) must be considered for robust smart grid forecasting
- Data preprocessing and handling missing values are essential for reliable machine learning models
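
The preprocessing and feature-engineering steps listed above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the function name `build_features`, the column names `timestamp` and `load`, the lag choices (1, 4, and 96 steps, i.e. 15 minutes, 1 hour, and 1 day), and the `holidays` parameter are all assumptions made for the example. The synthetic temperature features are left out.

```python
import numpy as np
import pandas as pd


def build_features(df, lags=(1, 4, 96), holidays=()):
    """Hypothetical helper: turn raw 'timestamp'/'load' rows into model features.

    `holidays` is an iterable of 'YYYY-MM-DD' strings (an assumption of this sketch).
    """
    out = df.copy()

    # Timestamp conversion and chronological indexing.
    out["timestamp"] = pd.to_datetime(out["timestamp"])
    out = out.set_index("timestamp").sort_index()

    # Time-based interpolation: gaps are filled in proportion to elapsed time.
    out["load"] = out["load"].interpolate(method="time")

    # Calendar features.
    idx = out.index
    out["hour"] = idx.hour
    out["dayofweek"] = idx.dayofweek
    out["month"] = idx.month
    out["year"] = idx.year
    out["is_weekend"] = (idx.dayofweek >= 5).astype(int)
    out["is_holiday"] = idx.strftime("%Y-%m-%d").isin(list(holidays)).astype(int)

    # Cyclical encoding so that 23:45 and 00:00 end up close together.
    hour_frac = idx.hour.to_numpy() + idx.minute.to_numpy() / 60.0
    out["hour_sin"] = np.sin(2 * np.pi * hour_frac / 24.0)
    out["hour_cos"] = np.cos(2 * np.pi * hour_frac / 24.0)

    # Lagged load: the value observed `lag` steps earlier.
    for lag in lags:
        out[f"load_lag_{lag}"] = out["load"].shift(lag)

    # Rows at the start of the series have no lag history and are dropped.
    return out.dropna()
```

Because every lag looks only backwards in time, the resulting frame can be split chronologically into training and test periods without leaking future load values into the features.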
Technologies Used:
Python
scikit-learn
XGBoost
pandas
matplotlib
Machine Learning
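
For reference, the reported error metrics and the relative improvement figures can be reproduced with a few lines of plain Python. The helper names below are illustrative only; in practice `sklearn.metrics` provides `mean_absolute_error`, `mean_squared_error`, and `r2_score` for the same purpose. The last lines recompute the ~47% and ~43% figures from the MAE and RMSE values reported above.

```python
import math


def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Root mean squared error."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def relative_reduction(baseline_error, model_error):
    """Fraction by which the model lowers the baseline error."""
    return (baseline_error - model_error) / baseline_error


# Reported values in MW: Linear Regression baseline vs. XGBoost.
mae_gain = relative_reduction(334.71, 178.36)
rmse_gain = relative_reduction(457.78, 262.63)
print(f"MAE reduction:  {mae_gain:.1%}")   # 46.7%
print(f"RMSE reduction: {rmse_gain:.1%}")  # 42.6%
```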