Long term prediction

Pony Biam! edited this page Jun 7, 2020 · 10 revisions

A simple long-term predictive LightGBM model can be found in this notebook. The model was trained on one year of data (2016) in order to predict the following year (2017).

longterm-split
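The split described above can be sketched with pandas as follows. This is a minimal illustration on a toy frame, not the notebook's code; the column names `timestamp` and `meter_reading` are assumptions.

```python
import pandas as pd

# Toy data; the column names are assumed for illustration.
df = pd.DataFrame({
    "timestamp": pd.to_datetime([
        "2016-01-01 00:00", "2016-12-31 23:00",
        "2017-01-01 00:00", "2017-06-15 12:00",
    ]),
    "meter_reading": [10.0, 12.0, 11.0, 15.0],
})

# Train on 2016 and hold out 2017 as the period to predict.
train = df[df["timestamp"].dt.year == 2016]
test = df[df["timestamp"].dt.year == 2017]
```

Splitting by calendar year keeps every training timestamp strictly earlier than every test timestamp, which is what a long-term forecast needs.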

Features

Simple feature engineering was performed based on the exploratory data analysis (EDA) of the meter readings, which showed that:

  • Healthcare, Food sales and services, and Utility usages show the highest meter-reading values.
  • The hotwater meter shows the highest meter-reading values.
  • Monthly behaviour (meter-reading median) shows higher readings in the warm season.
  • Hourly behaviour (meter-reading median) shows higher values from 6:00 to 19:00.
  • Weekday behaviour: readings are lower during weekends.
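Median profiles like the ones summarised above could be computed with a pandas `groupby`. The sketch below uses made-up readings and assumed column names (`timestamp`, `meter_reading`); it only illustrates the aggregation, not the actual EDA.

```python
import pandas as pd

# Toy readings; values and column names are illustrative only.
df = pd.DataFrame({
    "timestamp": pd.to_datetime([
        "2016-01-04 03:00", "2016-01-04 12:00",  # Monday
        "2016-01-09 12:00",                      # Saturday
        "2016-07-06 12:00",                      # Wednesday
    ]),
    "meter_reading": [5.0, 20.0, 8.0, 30.0],
})

# Median meter reading per month, per hour of the day and per weekday
# (pandas numbers weekdays Monday=0 ... Sunday=6).
by_month = df.groupby(df["timestamp"].dt.month)["meter_reading"].median()
by_hour = df.groupby(df["timestamp"].dt.hour)["meter_reading"].median()
by_weekday = df.groupby(df["timestamp"].dt.dayofweek)["meter_reading"].median()
```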

The following sections list the features that were selected, transformed and created.

Selection

The following features were selected from each data set:

  • Building metadata
    • Building ID*
    • Site ID*
    • Primary space usage
    • Building size (sqm)
  • Weather data
    • Timestamp*
    • Site ID*
    • Air temperature
  • Meter reading data
    • Timestamp*
    • Building ID*
    • meter
    • meter reading (target)
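One way to combine the three data sets is to join them on their shared columns. The sketch below assumes that the starred fields are the join keys and uses hypothetical column names (`building_id`, `site_id`, `sqm`, `airTemperature`, and so on); the real files may name them differently.

```python
import pandas as pd

# Toy versions of the three data sets; all column names are assumptions.
meta = pd.DataFrame({
    "building_id": ["b1", "b2"],
    "site_id": ["s1", "s1"],
    "primaryspaceusage": ["Healthcare", "Office"],
    "sqm": [1200.0, 800.0],
})
weather = pd.DataFrame({
    "timestamp": pd.to_datetime(["2016-01-01 00:00", "2016-01-01 01:00"]),
    "site_id": ["s1", "s1"],
    "airTemperature": [3.5, 3.1],
})
readings = pd.DataFrame({
    "timestamp": pd.to_datetime(
        ["2016-01-01 00:00", "2016-01-01 01:00", "2016-01-01 00:00"]
    ),
    "building_id": ["b1", "b1", "b2"],
    "meter": ["electricity", "electricity", "water"],
    "meter_reading": [42.0, 40.5, 3.2],
})

# Attach building metadata by building, then weather by site and timestamp.
df = (
    readings
    .merge(meta, on="building_id", how="left")
    .merge(weather, on=["site_id", "timestamp"], how="left")
)
```

Left joins keep every meter reading even when metadata or weather is missing for that row, at the cost of leaving gaps that have to be handled later.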

Transformation

The following features were transformed:

  • primaryspaceusage categories (16) were reduced to four: healthcare, food sales and services, utility and other
  • meter categories (8) were preserved
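The category reduction can be sketched as a mapping with a fallback to "other". The raw labels below are written as they appear on this page and may not match the exact strings in the data; the short level names are also assumptions.

```python
import pandas as pd

# Raw label -> reduced level; anything not listed becomes "other".
# The raw label strings are assumed, not taken from the data files.
KEPT = {
    "Healthcare": "healthcare",
    "Food sales and services": "food",
    "Utility": "utility",
}

usage = pd.Series(
    ["Healthcare", "Office", "Utility", "Food sales and services", "Education"]
)
reduced = usage.map(KEPT).fillna("other").astype("category")
```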

Creation

The following features were created:

  • month
  • day of the week
  • hour of the day
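These calendar features can be derived from the timestamp with the pandas `.dt` accessor. A minimal sketch, assuming a datetime column named `timestamp`:

```python
import pandas as pd

df = pd.DataFrame({
    "timestamp": pd.to_datetime(["2016-01-02 07:00", "2016-07-04 18:00"]),
})

df["month"] = df["timestamp"].dt.month          # 1..12
df["dayofweek"] = df["timestamp"].dt.dayofweek  # Monday=0 ... Sunday=6
df["hour"] = df["timestamp"].dt.hour            # 0..23
```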

Final features

  • Timestamp*
  • Site ID
  • Building ID
  • Month
  • Hour
  • Day of the week
  • Usage (4 levels: healthcare, food, utility, other)
  • Building size (sqm)
  • Air temperature
  • Meter (8 levels)
  • Meter reading / target

Parameters

The parameters for this model were not systematically tuned; they were adjusted manually to perform better than the defaults.

  • "objective": "regression"
  • "metric": "rmse"
  • "random_state": 55
  • "learning_rate": 0.01 (default 0.1)
  • "max_bin": 761 (default 255)
  • "num_leaves": 2197 (default 31)
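As a rough sketch, the parameters listed above could be passed to LightGBM's training API like this. The helper `train_model` and the `num_boost_round` value are hypothetical; the page does not say how many boosting rounds were used.

```python
# Parameters as listed on this page.
params = {
    "objective": "regression",
    "metric": "rmse",
    "random_state": 55,
    "learning_rate": 0.01,
    "max_bin": 761,
    "num_leaves": 2197,
}


def train_model(X_train, y_train, num_boost_round=100):
    """Fit a LightGBM regressor (hypothetical helper; the round count is assumed)."""
    # Imported here so the parameter dict can be inspected without LightGBM installed.
    import lightgbm as lgb

    train_set = lgb.Dataset(X_train, label=y_train)
    return lgb.train(params, train_set, num_boost_round=num_boost_round)
```

A low learning rate combined with a large `num_leaves` gives each tree a lot of capacity while taking small steps, so more boosting rounds are generally needed than with the defaults.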

Results

Performance, as expected, was poor for this model: R2 is negative for six of the eight individual meter types (Table 1). It can be used as a baseline for more complex models.

longterm-plot1
Figure 1: real and predicted meter_reading values (long-term model) vs. timestamp.

longterm-plot2
Figure 2: meter_reading predicted by the long-term model vs. real values.

Meter         RMSE         RMSLE    CVRMSE       MBE           R2
all           55322.5199   4.954    507.2326     -12.0286      0.7159
electricity   3176.3816    4.7      2315.3038    -2311.0688    -158.8472
water         3615.7081    6.248    926.2795     -800.8299     -6.9786
chilledwater  110294.371   4.4007   238.3562     12.6059       0.7745
hotwater      69321.352    5.4857   167.1094     12.054        0.594
gas           3326.863     6.5313   595.6342     -544.586      -1.5012
steam         70529.2322   4.6466   14962.0129   -1114.1194    -2090.692
solar         3295.8814    7.1066   13486.3753   -13482.7474   -3114.5355
irrigation    3419.1858    7.7316   1413.337     -1272.5739    -4.1593

Table 1: metrics for the long-term model, calculated for all meters together and for each meter separately.
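The page does not state how the metrics in Table 1 were computed, so the sketch below uses common textbook definitions as an assumption. In particular, the sign and normalisation of MBE vary between conventions, and the function name `regression_metrics` is hypothetical.

```python
import numpy as np


def regression_metrics(y_true, y_pred):
    """Common definitions of the five metrics (assumed, not confirmed by the page)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    err = y_pred - y_true

    rmse = np.sqrt(np.mean(err ** 2))
    # RMSLE needs non-negative inputs, so negative predictions are clipped to 0.
    log_err = np.log1p(np.clip(y_pred, 0, None)) - np.log1p(y_true)
    rmsle = np.sqrt(np.mean(log_err ** 2))
    # CVRMSE: RMSE as a percentage of the mean observed value.
    cvrmse = 100.0 * rmse / np.mean(y_true)
    # MBE here is mean(prediction - observation); other conventions flip the
    # sign or normalise by the mean observation.
    mbe = np.mean(err)
    # R2 compares the squared error with that of always predicting the mean.
    r2 = 1.0 - np.sum(err ** 2) / np.sum((y_true - np.mean(y_true)) ** 2)

    return {"RMSE": rmse, "RMSLE": rmsle, "CVRMSE": cvrmse, "MBE": mbe, "R2": r2}


m = regression_metrics([2.0, 4.0], [3.0, 5.0])
```

Under this definition R2 becomes negative whenever the squared error of the model exceeds that of a constant prediction equal to the mean of the observed values.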