diff --git a/docs/_toc.yml b/docs/_toc.yml
index b049af511..6b4a4102b 100644
--- a/docs/_toc.yml
+++ b/docs/_toc.yml
@@ -32,6 +32,8 @@ parts:
         title: Object Detection
       - file: source/usecases/emotion-analysis.rst
         title: Emotion Analysis
+      - file: source/usecases/homesale-forecast.rst
+        title: Home Sale Forecasting
 #      - file: source/usecases/privategpt.rst
 #        title: PrivateGPT
 
diff --git a/docs/source/usecases/homesale-forecast.rst b/docs/source/usecases/homesale-forecast.rst
new file mode 100644
index 000000000..5f1f2937e
--- /dev/null
+++ b/docs/source/usecases/homesale-forecast.rst
@@ -0,0 +1,133 @@
+.. _homesale-forecasting:
+
+Home Sale Forecasting
+=====================
+
+.. raw:: html
+
+
+
+
+
+
+
+    Run on Google Colab
+
+    View source on GitHub
+
+    Download notebook
+
+
+
+Introduction
+------------
+
+This tutorial shows how to use :ref:`forecasting models` in EvaDB to predict home sale prices. EvaDB makes it easy to do time series predictions using its built-in Auto Forecast function.
+
+.. include:: ../shared/evadb.rst
+
+.. include:: ../shared/postgresql.rst
+
+We will assume that the input data is loaded into a ``PostgreSQL`` database.
+To load the home sales data into your database, see the complete `home sale forecasting notebook on Colab `_.
+
+Preview the Home Sales Data
+-------------------------------------------
+
+We use the `raw_sales.csv of the House Property Sales Time Series `_ in this use case. The data contains five columns: postcode, price, bedrooms, datesold, and propertytype.
+
+.. code-block:: sql
+
+   SELECT * FROM postgres_data.home_sales LIMIT 3;
+
+This query previews the data in the ``home_sales`` table:
+
+.. code-block::
+
+   +---------------------+------------------+---------------------+---------------------+-------------------------+
+   | home_sales.postcode | home_sales.price | home_sales.bedrooms | home_sales.datesold | home_sales.propertytype |
+   |---------------------|------------------|---------------------|---------------------|-------------------------|
+   | 2607                | 525000           | 4                   | 2007-02-07          | house                   |
+   | 2906                | 290000           | 3                   | 2007-02-27          | house                   |
+   | 2905                | 328000           | 3                   | 2007-03-07          | house                   |
+   +---------------------+------------------+---------------------+---------------------+-------------------------+
+
+Train a Home Sale Forecasting Model
+-----------------------------------
+
+Next, let's train a time-series forecasting model from the ``home_sales`` table using EvaDB's ``CREATE FUNCTION`` query.
+In particular, we are interested in the price of properties that have three bedrooms and are in the postcode 2607 area.
+
+.. code-block:: sql
+
+   CREATE FUNCTION IF NOT EXISTS HomeSaleForecast FROM
+     (
+       SELECT propertytype, datesold, price
+       FROM postgres_data.home_sales
+       WHERE bedrooms = 3 AND postcode = 2607
+     )
+   TYPE Forecasting
+   PREDICT 'price'
+   TIME 'datesold'
+   ID 'propertytype'
+   FREQUENCY 'W';
+
+The ``home_sales`` dataset has two different property types, houses and units, and the price gap between them is large.
+We'd like to ask EvaDB to analyze the price of houses and units independently.
+To do so, we specify the ``propertytype`` column as the ``ID`` of the time series data, which represents an identifier for the series.
+Here is the query's output ``DataFrame``:
+
+.. note::
+
+   See the :ref:`forecast` page for all configurable parameters of the forecast model.
+
+.. code-block::
+
+   +----------------------------------------------+
+   | Function HomeSaleForecast successfully added |
+   +----------------------------------------------+
+
+Predict the Home Price using the Trained Model
+----------------------------------------------
+
+Next, we use the trained ``HomeSaleForecast`` function to predict the home sale price for the next 3 weeks.
+
+.. code-block:: sql
+
+   SELECT HomeSaleForecast(3);
+
+The input to the trained model is the horizon, that is, the number of steps (weeks in this case) to forecast into the future. Here is the query's output ``DataFrame``:
+
+.. code-block::
+
+   +-------------------------------+---------------------------+------------------------+
+   | homesaleforecast.propertytype | homesaleforecast.datesold | homesaleforecast.price |
+   +-------------------------------+---------------------------+------------------------+
+   | house                         | 2019-07-21                | 766572                 |
+   | house                         | 2019-07-28                | 766572                 |
+   | house                         | 2019-08-04                | 766572                 |
+   | unit                          | 2018-12-23                | 417229                 |
+   | unit                          | 2018-12-30                | 409601                 |
+   | unit                          | 2019-01-06                | 402112                 |
+   +-------------------------------+---------------------------+------------------------+
+
+We can further use ``ORDER BY`` to find out which of the next 12 weeks has the lowest predicted price.
+
+.. code-block:: sql
+
+   SELECT *
+   FROM (SELECT HomeSaleForecast(12)) AS HomeSale
+   ORDER BY price
+   LIMIT 1;
+
+Here is the query's output:
+
+.. code-block::
+
+   +-----------------------+-------------------+----------------+
+   | HomeSale.propertytype | HomeSale.datesold | HomeSale.price |
+   +-----------------------+-------------------+----------------+
+   | unit                  | 2019-03-10        | 340584         |
+   +-----------------------+-------------------+----------------+
+
+.. include:: ../shared/footer.rst
diff --git a/script/test/test.sh b/script/test/test.sh
index 75e7584eb..dd3c0b5a4 100644
--- a/script/test/test.sh
+++ b/script/test/test.sh
@@ -88,7 +88,7 @@ long_integration_test() {
 }
 
 notebook_test() {
-  PYTHONPATH=./ python -m pytest --durations=5 --nbmake --overwrite "./tutorials" --capture=sys --tb=short -v --log-level=WARNING --nbmake-timeout=3000 --ignore="tutorials/08-chatgpt.ipynb" --ignore="tutorials/14-food-review-tone-analysis-and-response.ipynb" --ignore="tutorials/15-AI-powered-join.ipynb"
+  PYTHONPATH=./ python -m pytest --durations=5 --nbmake --overwrite "./tutorials" --capture=sys --tb=short -v --log-level=WARNING --nbmake-timeout=3000 --ignore="tutorials/08-chatgpt.ipynb" --ignore="tutorials/14-food-review-tone-analysis-and-response.ipynb" --ignore="tutorials/15-AI-powered-join.ipynb" --ignore="tutorials/16-homesale-forecasting.ipynb"
   code=$?
   print_error_code $code "NOTEBOOK TEST"
 }