
Long term prediction

Pony Biam! edited this page Jun 7, 2020 · 10 revisions

A simple long-term predictive LightGBM model can be found in this notebook. The model was trained on one year of data (2016) to predict the following year (2017).
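The train/predict split by calendar year can be sketched as follows. This is a minimal illustration and not the notebook's code: the function name `split_by_year`, the column names `timestamp` and `meter_reading`, and the sample values are assumptions.

```python
import pandas as pd


def split_by_year(df, train_year=2016, test_year=2017, time_col="timestamp"):
    """Return (train, test) frames selected by the calendar year of `time_col`."""
    years = pd.to_datetime(df[time_col]).dt.year
    train = df[years == train_year]
    test = df[years == test_year]
    return train, test


# Tiny illustrative frame (made-up values, not project data).
readings = pd.DataFrame({
    "timestamp": ["2016-01-01 00:00:00", "2016-12-31 23:00:00", "2017-01-01 00:00:00"],
    "meter_reading": [10.0, 12.5, 11.0],
})
train, test = split_by_year(readings)
```

Splitting on the year keeps every 2017 timestamp out of training, which is what makes the prediction long term.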



Simple feature engineering was performed based on the exploratory data analysis (EDA). The EDA of meter readings showed the following:

  • Healthcare, Food sales and services, and Utility usages show the highest meter-reading values.
  • The hotwater meter shows the highest meter-reading values.
  • Monthly behaviour (meter-reading median) shows higher readings in the warm season.
  • Hourly behaviour (meter-reading median) shows higher values from 6:00 to 19:00.
  • Weekday behaviour: readings drop during weekends.

The following sections list the features that were selected, transformed, and created.


The following features were selected from each data set:

  • Building metadata
    • Building ID*
    • Site ID*
    • Primary space usage
    • Building size (sqm)
  • Weather data
    • Timestamp*
    • Site ID*
    • Air temperature
  • Meter reading data
    • Timestamp*
    • Building ID*
    • meter
    • meter reading (target)
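One plausible way to combine the three data sets is to join them on Building ID and on Site ID plus Timestamp. The sketch below assumes that join, and all column names (`building_id`, `site_id`, `sqm`, `airTemperature`, and so on) and values are made up for illustration.

```python
import pandas as pd

# Made-up miniature tables; column names are assumptions.
meta = pd.DataFrame({
    "building_id": ["b1", "b2"],
    "site_id": ["s1", "s2"],
    "primaryspaceusage": ["Healthcare", "Office"],
    "sqm": [1200.0, 800.0],
})
weather = pd.DataFrame({
    "timestamp": pd.to_datetime(["2016-01-01 00:00", "2016-01-01 00:00"]),
    "site_id": ["s1", "s2"],
    "airTemperature": [3.5, 7.0],
})
readings = pd.DataFrame({
    "timestamp": pd.to_datetime(["2016-01-01 00:00"] * 3),
    "building_id": ["b1", "b1", "b2"],
    "meter": ["electricity", "hotwater", "electricity"],
    "meter_reading": [50.0, 20.0, 30.0],
})

# Attach building attributes first, then the site weather at the same timestamp.
merged = (
    readings
    .merge(meta, on="building_id", how="left")
    .merge(weather, on=["timestamp", "site_id"], how="left")
)
```

Left joins keep every meter reading even when metadata or weather is missing for a given row.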


The following features were transformed:

  • primaryspaceusage categories (16) were reduced to four levels: healthcare, food sales and services, utility and other
  • meter categories (8) were preserved
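The category reduction can be sketched with pandas as below. This is an assumption-laden illustration: the helper name `reduce_usage` is hypothetical, and the exact category labels in the data may be spelled differently from the ones used here.

```python
import pandas as pd

# Labels kept as their own level (spelling assumed); everything else becomes "other".
KEPT_USAGES = {"healthcare", "food sales and services", "utility"}


def reduce_usage(usage: pd.Series) -> pd.Series:
    """Collapse primary space usage to the kept levels plus 'other'."""
    normalized = usage.str.strip().str.lower()
    # Values not in KEPT_USAGES (including missing ones) are replaced by "other".
    return normalized.where(normalized.isin(KEPT_USAGES), other="other")


usage = pd.Series(["Healthcare", "Office", "Utility", "Food sales and services", "Education"])
reduced = reduce_usage(usage)
```

Grouping the low-consumption usages into a single level keeps the categorical feature small while preserving the three usages that stood out in the EDA.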


The following features were created:

  • month
  • day of the week
  • hour of the day
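The three temporal features can be derived from the timestamp roughly as follows. The function name `add_time_features` and the column names are assumptions for illustration.

```python
import pandas as pd


def add_time_features(df, time_col="timestamp"):
    """Return a copy of `df` with month, day-of-week and hour columns added."""
    out = df.copy()
    ts = pd.to_datetime(out[time_col])
    out["month"] = ts.dt.month          # 1 (January) to 12 (December)
    out["dayofweek"] = ts.dt.dayofweek  # 0 (Monday) to 6 (Sunday)
    out["hour"] = ts.dt.hour            # 0 to 23
    return out


# Made-up single reading on a Saturday afternoon.
sample = pd.DataFrame({"timestamp": ["2016-07-02 14:00:00"], "meter_reading": [42.0]})
features = add_time_features(sample)
```

These columns let the model pick up the seasonal, hourly and weekend patterns noted in the EDA.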

Final features

  • Timestamp*
  • Site ID
  • Building ID
  • Month
  • Hour
  • Day of the week
  • Usage (4 levels: healthcare, food, utility, other)
  • Building size (sqm)
  • Air temperature
  • Meter (8 levels)
  • Meter reading / target


The parameters for this model were not systematically tuned; they were adjusted manually to perform better than the defaults.

  • "objective": "regression"
  • "metric": "rmse"
  • "random_state": 55
  • "learning_rate": 0.01 (default 0.1)
  • "max_bin": 761 (default 255)
  • "num_leaves": 2197 (default 31)
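The parameters listed above could be passed to LightGBM roughly as in the sketch below. The helper `train_model` and the value of `num_boost_round` are assumptions, since the page does not state how many boosting rounds were used.

```python
# Parameters as listed on this page.
PARAMS = {
    "objective": "regression",
    "metric": "rmse",
    "random_state": 55,
    "learning_rate": 0.01,  # default 0.1
    "max_bin": 761,         # default 255
    "num_leaves": 2197,     # default 31
}


def train_model(X_train, y_train, categorical_features="auto", num_boost_round=100):
    """Fit a LightGBM regressor with PARAMS (num_boost_round is an assumed value)."""
    import lightgbm as lgb  # imported here so PARAMS can be inspected without LightGBM

    train_set = lgb.Dataset(
        X_train, label=y_train, categorical_feature=categorical_features
    )
    return lgb.train(PARAMS, train_set, num_boost_round=num_boost_round)
```

A call such as `train_model(X_2016, y_2016)` followed by `model.predict(X_2017)` would mirror the 2016 to 2017 setup, where `X_2016`, `y_2016` and `X_2017` are hypothetical frames holding the final features.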


As expected, this model performed poorly. It can serve as a baseline for more complex models.

Figure 1: Actual and predicted meter_reading values from the long-term model vs. timestamp.

Figure 2: meter_reading values predicted by the long-term model vs. actual values.

all            55322.5199   4.954    507.2326     -12.0286      0.7159
electricity    3176.3816    4.7      2315.3038    -2311.0688    -158.8472
water          3615.7081    6.248    926.2795     -800.8299     -6.9786
chilledwater   110294.371   4.4007   238.3562     12.6059       0.7745
hotwater       69321.352    5.4857   167.1094     12.054        0.594
gas            3326.863     6.5313   595.6342     -544.586      -1.5012
steam          70529.2322   4.6466   14962.0129   -1114.1194    -2090.692
solar          3295.8814    7.1066   13486.3753   -13482.7474   -3114.5355
irrigation     3419.1858    7.7316   1413.337     -1272.5739    -4.1593

Table 1: Metrics for the long-term model, calculated for all meters together and for each meter individually.