Long-term prediction

A simple long-term predictive LightGBM model can be found in this notebook. The model was trained on one year of data (2016) in order to predict the following year (2017).

longterm-split

Features
Parameters
Results

Features

A simple feature engineering step was performed based on the exploratory data analysis (EDA). The EDA of the meter readings showed that:

Healthcare, Food sales and services, and Utility usages show the highest meter reading values.
Hotwater meters show the highest meter reading values.
Monthly behaviour (meter-reading median): readings are higher in the warm season.
Hourly behaviour (meter-reading median): readings are higher from 06:00 to 19:00.
Weekday behaviour: readings are lower during weekends.
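The median profiles listed above can be reproduced along these lines. This is a minimal sketch, not the notebook's code: the helper name `median_profile` and the column names `timestamp` and `meter_reading` are assumptions, and the three rows are toy values used only to show the call.

```python
import pandas as pd

def median_profile(df, by, timestamp_col="timestamp", target_col="meter_reading"):
    """Median meter reading grouped by month, hour, or day of the week."""
    ts = pd.to_datetime(df[timestamp_col])
    keys = {
        "month": ts.dt.month,
        "hour": ts.dt.hour,
        "dayofweek": ts.dt.dayofweek,  # Monday=0 ... Sunday=6
    }
    if by not in keys:
        raise ValueError(f"by must be one of {sorted(keys)}")
    return df[target_col].groupby(keys[by]).median()

# Toy data (hypothetical values, for illustration only).
toy = pd.DataFrame({
    "timestamp": ["2016-01-01 06:00", "2016-01-02 06:00", "2016-01-01 23:00"],
    "meter_reading": [10.0, 20.0, 2.0],
})
hourly = median_profile(toy, by="hour")
```

Calling the helper with `by="month"` or `by="dayofweek"` gives the other two profiles.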

The following sections list the features that were selected, transformed and created.

Selection

The following features were selected from each data set:

Building metadata
- Building ID*
- Site ID*
- Primary space usage
- Building size (sqm)
Weather data
- Timestamp*
- Site ID*
- Air temperature
Meter reading data
- Timestamp*
- Building ID*
- meter
- meter reading (target)
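
As a rough sketch of how the selected columns could be combined into one table, assuming the asterisked columns act as join keys: the column names (`building_id`, `site_id`, `primaryspaceusage`, `sqm`, `airTemperature`, `meter_reading`) and the long format of the readings table are assumptions, and the frames hold toy values only.

```python
import pandas as pd

# Toy stand-ins for the three data sets (hypothetical values and column names).
metadata = pd.DataFrame({
    "building_id": ["b1", "b2"],
    "site_id": ["s1", "s1"],
    "primaryspaceusage": ["Healthcare", "Office"],
    "sqm": [1200.0, 800.0],
    "yearbuilt": [1990, 2005],  # extra column, not selected
})
weather = pd.DataFrame({
    "timestamp": pd.to_datetime(["2016-01-01 00:00", "2016-01-01 01:00"]),
    "site_id": ["s1", "s1"],
    "airTemperature": [3.5, 3.1],
    "windSpeed": [2.0, 2.4],  # extra column, not selected
})
readings = pd.DataFrame({
    "timestamp": pd.to_datetime(
        ["2016-01-01 00:00", "2016-01-01 01:00", "2016-01-01 00:00"]
    ),
    "building_id": ["b1", "b1", "b2"],
    "meter": ["electricity", "electricity", "hotwater"],
    "meter_reading": [10.0, 12.0, 5.0],
})

def select_and_join(readings, metadata, weather):
    """Keep only the selected columns and join them onto the meter readings."""
    meta_cols = ["building_id", "site_id", "primaryspaceusage", "sqm"]
    weather_cols = ["timestamp", "site_id", "airTemperature"]
    # Building metadata joins on the building ID ...
    df = readings.merge(metadata[meta_cols], on="building_id", how="left")
    # ... and weather joins on timestamp plus site ID.
    return df.merge(weather[weather_cols], on=["timestamp", "site_id"], how="left")

data = select_and_join(readings, metadata, weather)
```

Left joins keep every meter reading even when metadata or weather values are missing for it.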

Transformation

The following features were transformed:

primaryspaceusage categories (16) were reduced to four: healthcare, food sales and services, utility and other
meter categories (8) were preserved
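
The category reduction could look roughly like the sketch below. The exact label spellings in the raw data and the short level names are assumptions, and `reduce_usage` is a hypothetical helper rather than a function from the notebook.

```python
# Usages kept as their own level; everything else collapses to "other".
# Keys are lower-cased labels (assumed spellings).
KEPT_USAGES = {
    "healthcare": "healthcare",
    "food sales and services": "food",
    "utility": "utility",
}

def reduce_usage(primaryspaceusage):
    """Map a raw primaryspaceusage label to one of four levels."""
    if not isinstance(primaryspaceusage, str):
        return "other"  # missing or non-string labels fall into "other"
    return KEPT_USAGES.get(primaryspaceusage.strip().lower(), "other")

levels = [reduce_usage(u) for u in ["Healthcare", "Office", "Utility", None]]
```

With pandas, the same function can be applied to a column, for example `df["primaryspaceusage"].map(reduce_usage)`.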

Creation

The following features were created:

month
day of the week
hour of the day
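
A minimal sketch of deriving these three features from the timestamp with pandas. The column names (`timestamp`, `month`, `dayofweek`, `hour`) and the helper `add_time_features` are assumptions made for illustration.

```python
import pandas as pd

def add_time_features(df, timestamp_col="timestamp"):
    """Return a copy of df with month, day-of-week and hour columns added."""
    out = df.copy()
    ts = pd.to_datetime(out[timestamp_col])
    out["month"] = ts.dt.month          # 1 ... 12
    out["dayofweek"] = ts.dt.dayofweek  # Monday=0 ... Sunday=6
    out["hour"] = ts.dt.hour            # 0 ... 23
    return out

example = add_time_features(pd.DataFrame({"timestamp": ["2016-07-02 14:00:00"]}))
```

For the example row (a Saturday afternoon in July 2016) this gives month 7, day of the week 5 and hour 14.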

Final features

Timestamp*
Site ID
Building ID
Month
Hour
Day of the week
Usage (4 levels: healthcare, food, utility, other)
Building size (sqm)
Air temperature
Meter (8 levels)
Meter reading / target

Parameters

The parameters for this model were not systematically tuned; they were adjusted by hand until the model performed better than with the default values.

"objective": "regression"
"metric": "rmse"
"random_state": 55
"learning_rate": 0.01 (default 0.1)
"max_bin": 761 (default 255)
"num_leaves": 2197 (default 31)
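
The parameters above could be passed to LightGBM roughly as follows. This is a sketch under assumptions, not the notebook's training code: the function name, the `meter_reading` target column and the number of boosting rounds are not given in this document. `lightgbm` is imported inside the function so the parameter dictionary can be reused without the library installed.

```python
PARAMS = {
    "objective": "regression",
    "metric": "rmse",
    "random_state": 55,
    "learning_rate": 0.01,  # LightGBM default: 0.1
    "max_bin": 761,         # LightGBM default: 255
    "num_leaves": 2197,     # LightGBM default: 31
}

def train_longterm_model(train_df, feature_cols, target_col="meter_reading",
                         categorical_cols=None, num_boost_round=100):
    """Fit a LightGBM regressor on the training year (for example, 2016 rows).

    num_boost_round=100 is an assumed value. Categorical columns should be
    integer-encoded or use the pandas 'category' dtype.
    """
    import lightgbm as lgb  # lazy import: only needed when training

    train_set = lgb.Dataset(
        train_df[feature_cols],
        label=train_df[target_col],
        categorical_feature=categorical_cols or "auto",
    )
    return lgb.train(PARAMS, train_set, num_boost_round=num_boost_round)
```

The returned booster can then be applied to the 2017 rows with `model.predict(test_df[feature_cols])`.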

Results

As expected, this model performed poorly: R2 is negative for six of the eight meter types (Table 1). It can be used as a baseline for more complex models.

longterm-plot1
Figure 1: actual and predicted (long-term model) meter_reading values vs. timestamp.

longterm-plot2
Figure 2: meter_reading values predicted by the long-term model vs. actual values.

meter/metric	RMSE	RMSLE	CVRMSE	MBE	R2
all	55322.5199	4.954	507.2326	-12.0286	0.7159
electricity	3176.3816	4.7	2315.3038	-2311.0688	-158.8472
water	3615.7081	6.248	926.2795	-800.8299	-6.9786
chilledwater	110294.371	4.4007	238.3562	12.6059	0.7745
hotwater	69321.352	5.4857	167.1094	12.054	0.594
gas	3326.863	6.5313	595.6342	-544.586	-1.5012
steam	70529.2322	4.6466	14962.0129	-1114.1194	-2090.692
solar	3295.8814	7.1066	13486.3753	-13482.7474	-3114.5355
irrigation	3419.1858	7.7316	1413.337	-1272.5739	-4.1593

Table 1: metrics for the long-term model, calculated for all meters together and for each meter separately.
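
For reference, the five metrics can be computed as in the sketch below. The formulas are common textbook definitions and are an assumption here: this document does not state them, and the sign and normalisation conventions (for MBE and CVRMSE in particular) may differ from those used to produce Table 1.

```python
import math

def regression_metrics(y_true, y_pred):
    """Return RMSE, RMSLE, CVRMSE (%), MBE and R2 for two equal-length sequences."""
    n = len(y_true)
    if n == 0 or n != len(y_pred):
        raise ValueError("y_true and y_pred must be non-empty and of equal length")
    nan = float("nan")
    mean_true = sum(y_true) / n
    errors = [p - t for t, p in zip(y_true, y_pred)]  # predicted minus actual
    ss_res = sum(e * e for e in errors)
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    rmse = math.sqrt(ss_res / n)
    # Negative values are clipped to 0 before the log transform (assumption).
    log_sq = [
        (math.log1p(max(p, 0.0)) - math.log1p(max(t, 0.0))) ** 2
        for t, p in zip(y_true, y_pred)
    ]
    return {
        "RMSE": rmse,
        "RMSLE": math.sqrt(sum(log_sq) / n),
        "CVRMSE": 100.0 * rmse / mean_true if mean_true != 0 else nan,
        "MBE": sum(errors) / n,
        "R2": 1.0 - ss_res / ss_tot if ss_tot != 0 else nan,
    }

metrics = regression_metrics([1.0, 2.0, 3.0], [2.0, 3.0, 4.0])
```

In this toy case every prediction is off by 1, which yields an R2 of -0.5. A negative R2, as in several rows of Table 1, means the model's squared error is larger than that of always predicting the mean of the actual values.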
