hotel-booking-demand

Forecasting hotel demand was a small datathon within the commercial team at Elder Research. It was a challenging problem with a fun solution worth sharing. All the work was done within a 3-hour timeframe.

Background

Forecasting with "small data" is a common and surprisingly difficult task. With so few data points, it is easy to overfit and hard to separate the signal from the noise. Standard methods such as ARIMA and Prophet are readily available, but on datasets this small they are often outperformed on a holdout set by simply predicting the historical mean.
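To make the "outperformed by the mean" comparison concrete, here is a minimal sketch of a mean baseline scored with mean absolute error. The numbers are made up for illustration, and the metric is my assumption; the datathon's actual scoring metric is not stated here.

```python
def mean_absolute_error(actual, predicted):
    """Average absolute difference between actual and predicted values."""
    errors = [abs(a - p) for a, p in zip(actual, predicted)]
    return sum(errors) / len(errors)


def mean_forecast(history, horizon):
    """Predict the historical mean for every step of the horizon."""
    level = sum(history) / len(history)
    return [level] * horizon


# Made-up values, not taken from the hotel dataset.
history = [10, 12, 11, 13, 9, 11]
holdout = [12, 10, 11]

baseline = mean_forecast(history, len(holdout))
baseline_mae = mean_absolute_error(holdout, baseline)
```

Any candidate model can be scored with the same function on the same holdout; if it cannot beat `baseline_mae`, the extra complexity is not paying for itself.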

The dataset contains booking information for a city hotel and a resort hotel, including information on when the booking was made, length of stay, number of adults, and more. There is one row per booking in the data.

To forecast demand, the data has to be reshaped to one row per date per hotel type. This was done in the notebooks/hotel_cleaning.ipynb notebook. After that step, the demand over time looked like this:

August 2017 was removed to be used as an evaluation dataset.
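The reshaping and holdout steps can be sketched roughly as follows. This is not the code from the notebook: the record layout (hotel, arrival date, nights) is hypothetical, and I am assuming that demand means the number of bookings occupying a room on a given night.

```python
from collections import Counter
from datetime import date, timedelta

# Hypothetical booking records: (hotel, arrival date, nights stayed).
bookings = [
    ("City Hotel", date(2017, 7, 30), 3),
    ("City Hotel", date(2017, 7, 31), 1),
    ("Resort Hotel", date(2017, 8, 1), 2),
]


def daily_demand(bookings):
    """Count bookings per (hotel, date), one count per night of each stay."""
    demand = Counter()
    for hotel, arrival, nights in bookings:
        for offset in range(nights):
            demand[(hotel, arrival + timedelta(days=offset))] += 1
    return demand


def split_holdout(demand, start, end):
    """Split into training rows and rows whose date falls in [start, end]."""
    train = {k: v for k, v in demand.items() if not (start <= k[1] <= end)}
    test = {k: v for k, v in demand.items() if start <= k[1] <= end}
    return train, test


demand = daily_demand(bookings)
# Hold out August 2017 as the evaluation set.
train, test = split_holdout(demand, date(2017, 8, 1), date(2017, 8, 31))
```

Each key in `demand` is one (hotel, date) pair, which is the one-row-per-date-per-hotel shape described above.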

Solution

Modeling Vacancies

The two primary challenges in modeling this dataset were:

Small size: the data covers fewer than two full yearly periods.
A hard upper bound (the hotel's capacity) that differs for each hotel and that the data often runs up against.

Some observations that were relevant to the model type:

There is a strong yearly seasonality that the solution should take into account, despite not having two full periods.
There is little autocorrelation.
Both hotels have similar patterns, making a global model preferable.
The test period (August) will likely be near capacity, so it will be important to take that into account.
There appears to be a small weekday effect.

Based on this, I chose to model the vacancies instead of the demand, since this gives the data a much more common structure: right-skewed and zero-inflated. It also makes the two hotels' series structurally similar, even though their capacities differ.
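As a minimal sketch of that transformation: vacancies are capacity minus demand, and a vacancy forecast converts back to demand the same way. The demand values below are made up, and taking capacity to be the highest demand observed is an assumption for this example, not necessarily how capacity was determined in the notebooks.

```python
def to_vacancies(demand, capacity):
    """Vacancies are the rooms left empty: capacity minus demand."""
    return [capacity - d for d in demand]


def to_demand(vacancies, capacity):
    """Invert the transformation to get demand back from vacancies."""
    return [capacity - v for v in vacancies]


# Made-up daily demand for one hotel.
city_demand = [180, 200, 200, 150]

# Assumption: capacity is estimated as the highest demand observed.
city_capacity = max(city_demand)

city_vacancies = to_vacancies(city_demand, city_capacity)
```

Days at capacity become zeros, and both hotels end up on a scale that starts at zero regardless of how many rooms each has.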

LightGBM is a flexible modeling framework that supports different loss functions and typically does well in forecasting competitions. It can't just be plugged in, though: features have to be hand-crafted to extract the signal from the dataset.
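The choice of loss function matters here because vacancies can never be negative. The sketch below illustrates the Tweedie deviance, the loss used for this model, in plain Python. It does not call LightGBM. As far as I know, LightGBM's `tweedie` objective uses a log link, so predictions are the exponential of a raw score, and its `tweedie_variance_power` parameter defaults to 1.5. The capacity and raw scores are made up.

```python
import math


def tweedie_deviance(y, mu, power=1.5):
    """Unit Tweedie deviance for 1 < power < 2, with y >= 0 and mu > 0."""
    a = y ** (2 - power) / ((1 - power) * (2 - power))
    b = y * mu ** (1 - power) / (1 - power)
    c = mu ** (2 - power) / (2 - power)
    return 2 * (a - b + c)


def predict_vacancies(raw_score):
    """Log link: the prediction is exp(raw score), so it is always positive."""
    return math.exp(raw_score)


capacity = 200                 # made-up capacity
raw_scores = [-5.0, 0.0, 3.0]  # made-up raw model outputs

vacancies = [predict_vacancies(s) for s in raw_scores]
implied_demand = [capacity - v for v in vacancies]
```

The deviance stays finite when the observed value is exactly zero, which is why this loss copes with zero-inflated targets, and because every predicted vacancy count is positive, the implied demand can never exceed capacity.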

Feature Engineering

Autocorrelation is low on this dataset, while correlation to the previous year is high, and there is weekly seasonality. Therefore, I chose to encode the following four features:

Vacancies last year, same day
Vacancies last year, week centered on same day
Vacancies last year, 2-week period centered on same day
Day of week
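The four features above could be computed along these lines. This is a sketch under assumptions the list does not settle: "last year" is taken as 365 days earlier, the centered windows are averaged (plus or minus 3 days for the week, plus or minus 7 days for the 2-week period), and the feature names are hypothetical. The vacancy values are made up.

```python
from datetime import date, timedelta


def window_mean(series, center, half_width):
    """Mean of the values within half_width days of center, skipping gaps."""
    days = (center + timedelta(days=k) for k in range(-half_width, half_width + 1))
    values = [series[d] for d in days if d in series]
    return sum(values) / len(values) if values else None


def build_features(series, day):
    """Build the four features for one target day of one hotel."""
    anchor = day - timedelta(days=365)  # assumption: "last year" = 365 days back
    return {
        "vac_last_year_day": series.get(anchor),
        "vac_last_year_week": window_mean(series, anchor, 3),
        "vac_last_year_2week": window_mean(series, anchor, 7),
        "day_of_week": day.weekday(),  # Monday = 0
    }


# Made-up vacancies around 2016-08-01: the squared day offset from that date.
start = date(2016, 8, 1)
series = {start + timedelta(days=k): k * k for k in range(-10, 11)}

features = build_features(series, date(2017, 8, 1))
```

The same-day value is the noisiest of the three lags and the 2-week mean the smoothest, which gives the model room to balance the exact day against its neighborhood.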

LightGBM was used with the default settings, except that the objective was set to tweedie, which suits right-skewed, zero-inflated data. This setup makes it impossible to predict demand above the capacity of the hotel: the Tweedie objective will not let a predicted vacancy count go below zero, and demand is capacity minus vacancies. The features capture the yearly seasonality well by balancing what was seen on the exact day last year against what happened around that day last year. The resulting predictions are shown below:

The feature importances for the model:

Conclusion

Small data forecasting challenges require careful handling. Plugging the data directly into a forecasting algorithm will often lead to overfitting. In this case, by using the Tweedie loss function on hotel vacancies instead of demand, and by encoding the clearly strong yearly seasonality into features, LightGBM was able to produce sensible predictions that ended up winning the datathon.
