Smart Grid Load Forecasting using Machine Learning

  • Developed a machine learning model to predict future electricity load for the Netherlands in a smart grid context
  • Used the ENTSO-E dataset with 201,604 entries at 15-minute intervals (2014–2020), including load, solar, and wind generation data
  • Preprocessed data with timestamp conversion, chronological indexing, and time-based interpolation to handle missing values
  • Engineered temporal features (hour, day, month, year, holidays, weekends), synthetic temperature features, and lagged load/temperature features for time-series forecasting
  • Baseline Model: Linear Regression achieved R² = 0.9447, MAE = 334.71 MW, RMSE = 457.78 MW
  • Advanced Model: XGBoost achieved R² = 0.9818, MAE = 178.36 MW, RMSE = 262.63 MW, reducing error by ~47% (MAE) and ~43% (RMSE) relative to the baseline
  • Feature importance analysis highlighted lagged load values, holidays, and cyclical time features as key drivers of electricity demand
  • Demonstrated the effectiveness of feature engineering and non-linear models in capturing complex load patterns
  • Gained hands-on experience with a complete machine learning pipeline: data preprocessing, feature engineering, baseline model selection, metric evaluation, advanced model experimentation, iterative evaluation, and handling of overfitting/underfitting

Key Takeaways:

  • Lagged load values, holidays, and cyclical time features significantly influence electricity demand
  • Feature engineering plays a critical role in improving predictive accuracy for time-series data
  • Non-linear models like XGBoost outperform traditional linear regression in capturing complex load patterns
  • External real-world factors (e.g., temperature, holidays) must be considered for robust smart grid forecasting
  • Data preprocessing and handling missing values are essential for reliable machine learning models
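
The relative error reduction quoted for XGBoost over the Linear Regression baseline can be checked directly from the reported metrics. The short sketch below only reuses the MAE and RMSE values stated in this summary; the helper name `relative_reduction` is illustrative and not part of the original project.

```python
# Metrics reported in the summary (all in MW).
baseline = {"MAE": 334.71, "RMSE": 457.78}   # Linear Regression
advanced = {"MAE": 178.36, "RMSE": 262.63}   # XGBoost


def relative_reduction(before: float, after: float) -> float:
    """Percentage by which an error metric dropped from `before` to `after`."""
    return (before - after) / before * 100.0


# Compute the reduction for each metric shared by both models.
reductions = {
    name: relative_reduction(baseline[name], advanced[name])
    for name in baseline
}

for name, pct in reductions.items():
    print(f"{name} reduction: {pct:.1f}%")
```

Rounded to whole percentages, these values match the ~47% (MAE) and ~43% (RMSE) figures quoted for the advanced model.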

Technologies Used:

Python
scikit-learn
pandas
matplotlib
XGBoost
Machine Learning
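
As an illustration of how the preprocessing and feature engineering steps described above might look with pandas, here is a minimal sketch. It is not the project's actual code: the column names `timestamp` and `load`, the function name `build_features`, the specific lags (1 step and 96 steps, i.e. 15 minutes and 24 hours), and the use of NumPy are all assumptions made for the example. Holiday and synthetic temperature features are omitted.

```python
import numpy as np
import pandas as pd


def build_features(raw: pd.DataFrame) -> pd.DataFrame:
    """Index by time, fill gaps, and add temporal and lagged features.

    Assumes `raw` has a `timestamp` column and a `load` column (hypothetical names).
    """
    df = raw.copy()

    # Timestamp conversion and chronological indexing.
    df["timestamp"] = pd.to_datetime(df["timestamp"], utc=True)
    df = df.set_index("timestamp").sort_index()

    # Time-based interpolation of missing load values.
    df["load"] = df["load"].interpolate(method="time")

    # Calendar features derived from the index.
    hour = df.index.hour.to_numpy()
    dayofweek = df.index.dayofweek.to_numpy()
    df["hour"] = hour
    df["dayofweek"] = dayofweek
    df["month"] = df.index.month.to_numpy()
    df["year"] = df.index.year.to_numpy()
    df["is_weekend"] = (dayofweek >= 5).astype(int)

    # Cyclical encoding so that 23:00 and 00:00 end up close together.
    df["hour_sin"] = np.sin(2 * np.pi * hour / 24)
    df["hour_cos"] = np.cos(2 * np.pi * hour / 24)

    # Lagged load: previous 15-minute step and the same time one day earlier.
    df["load_lag_1"] = df["load"].shift(1)
    df["load_lag_96"] = df["load"].shift(96)

    # Rows without a full lag history cannot be used for training.
    return df.dropna()


# Small synthetic example: 200 quarter-hour readings with one missing value.
timestamps = pd.date_range("2020-01-01", periods=200, freq="15min")
raw = pd.DataFrame({
    "timestamp": timestamps.astype(str),
    "load": np.linspace(10000.0, 12000.0, 200),
})
raw.loc[50, "load"] = np.nan

features = build_features(raw)
print(features.shape)
```

Shifting by 96 rows equals a 24-hour lag only when the series has no missing timestamps, so a real pipeline would need to confirm a regular 15-minute index first.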