From c47fe4d16d6287888f148575ce8168163c8a5ea1 Mon Sep 17 00:00:00 2001
From: miguelgfierro
Date: Thu, 16 Apr 2020 11:47:29 +0100
Subject: [PATCH 01/61] rename and refactor :boom:

---
 {notebooks => scenarios}/00_quick_start/README.md                | 0
 {notebooks => scenarios}/00_quick_start/als_movielens.ipynb      | 0
 {notebooks => scenarios}/00_quick_start/dkn_synthetic.ipynb      | 0
 {notebooks => scenarios}/00_quick_start/fastai_movielens.ipynb   | 0
 {notebooks => scenarios}/00_quick_start/lightgbm_tinycriteo.ipynb | 0
 {notebooks => scenarios}/00_quick_start/ncf_movielens.ipynb      | 0
 {notebooks => scenarios}/00_quick_start/rbm_movielens.ipynb      | 0
 {notebooks => scenarios}/00_quick_start/rlrmc_movielens.ipynb    | 0
 {notebooks => scenarios}/00_quick_start/sar_movielens.ipynb      | 0
 .../00_quick_start/sar_movielens_with_azureml.ipynb              | 0
 .../00_quick_start/sar_movieratings_with_azureml_designer.ipynb  | 0
 .../00_quick_start/sequential_recsys_amazondataset.ipynb         | 0
 {notebooks => scenarios}/00_quick_start/wide_deep_movielens.ipynb | 0
 {notebooks => scenarios}/00_quick_start/xdeepfm_criteo.ipynb     | 0
 {notebooks => scenarios}/01_prepare_data/README.md               | 0
 {notebooks => scenarios}/01_prepare_data/data_split.ipynb        | 0
 {notebooks => scenarios}/01_prepare_data/data_transform.ipynb    | 0
 .../01_prepare_data/wikidata_knowledge_graph.ipynb               | 0
 {notebooks => scenarios}/02_model/README.md                      | 0
 {notebooks => scenarios}/02_model/als_deep_dive.ipynb            | 0
 {notebooks => scenarios}/02_model/baseline_deep_dive.ipynb       | 0
 {notebooks => scenarios}/02_model/cornac_bpr_deep_dive.ipynb     | 0
 {notebooks => scenarios}/02_model/fm_deep_dive.ipynb             | 0
 {notebooks => scenarios}/02_model/mmlspark_lightgbm_criteo.ipynb | 0
 {notebooks => scenarios}/02_model/ncf_deep_dive.ipynb            | 0
 {notebooks => scenarios}/02_model/rbm_deep_dive.ipynb            | 0
 {notebooks => scenarios}/02_model/sar_deep_dive.ipynb            | 0
 {notebooks => scenarios}/02_model/surprise_svd_deep_dive.ipynb   | 0
 {notebooks => scenarios}/02_model/vowpal_wabbit_deep_dive.ipynb  | 0
 {notebooks => scenarios}/03_evaluate/README.md                   | 0
 {notebooks => scenarios}/03_evaluate/evaluation.ipynb            | 0
 {notebooks => scenarios}/04_model_select_and_optimize/README.md  | 0
 .../azureml_hyperdrive_surprise_svd.ipynb                        | 0
 .../azureml_hyperdrive_wide_and_deep.ipynb                       | 0
 .../04_model_select_and_optimize/nni_surprise_svd.ipynb          | 0
 .../04_model_select_and_optimize/tuning_spark_als.ipynb          | 0
 {notebooks => scenarios}/05_operationalize/README.md             | 0
 .../05_operationalize/aks_locust_load_test.ipynb                 | 0
 {notebooks => scenarios}/05_operationalize/als_movie_o16n.ipynb  | 0
 .../05_operationalize/lightgbm_criteo_o16n.ipynb                 | 0
 {notebooks => scenarios}/README.md                               | 0
 {notebooks => scenarios}/run_notebook_on_azureml.ipynb           | 0
 {notebooks => scenarios}/template.ipynb                          | 0
 {scripts => tools}/__init__.py                                   | 0
 {scripts => tools}/databricks_install.py                         | 0
 {docker => tools/docker}/Dockerfile                              | 0
 {docker => tools/docker}/README.md                               | 0
 {scripts => tools}/generate_conda_file.py                        | 0
 {scripts => tools}/generate_requirements_txt.py                  | 0
 49 files changed, 0 insertions(+), 0 deletions(-)
 rename {notebooks => scenarios}/00_quick_start/README.md (100%)
 rename {notebooks => scenarios}/00_quick_start/als_movielens.ipynb (100%)
 rename {notebooks => scenarios}/00_quick_start/dkn_synthetic.ipynb (100%)
 rename {notebooks => scenarios}/00_quick_start/fastai_movielens.ipynb (100%)
 rename {notebooks => scenarios}/00_quick_start/lightgbm_tinycriteo.ipynb (100%)
 rename {notebooks => scenarios}/00_quick_start/ncf_movielens.ipynb (100%)
 rename {notebooks => scenarios}/00_quick_start/rbm_movielens.ipynb (100%)
 rename {notebooks => scenarios}/00_quick_start/rlrmc_movielens.ipynb (100%)
 rename {notebooks => scenarios}/00_quick_start/sar_movielens.ipynb (100%)
 rename {notebooks => scenarios}/00_quick_start/sar_movielens_with_azureml.ipynb (100%)
 rename {notebooks => scenarios}/00_quick_start/sar_movieratings_with_azureml_designer.ipynb (100%)
 rename {notebooks => scenarios}/00_quick_start/sequential_recsys_amazondataset.ipynb (100%)
 rename {notebooks => scenarios}/00_quick_start/wide_deep_movielens.ipynb (100%)
 rename {notebooks => scenarios}/00_quick_start/xdeepfm_criteo.ipynb (100%)
 rename {notebooks => scenarios}/01_prepare_data/README.md (100%)
 rename {notebooks => scenarios}/01_prepare_data/data_split.ipynb (100%)
 rename {notebooks => scenarios}/01_prepare_data/data_transform.ipynb (100%)
 rename {notebooks => scenarios}/01_prepare_data/wikidata_knowledge_graph.ipynb (100%)
 rename {notebooks => scenarios}/02_model/README.md (100%)
 rename {notebooks => scenarios}/02_model/als_deep_dive.ipynb (100%)
 rename {notebooks => scenarios}/02_model/baseline_deep_dive.ipynb (100%)
 rename {notebooks => scenarios}/02_model/cornac_bpr_deep_dive.ipynb (100%)
 rename {notebooks => scenarios}/02_model/fm_deep_dive.ipynb (100%)
 rename {notebooks => scenarios}/02_model/mmlspark_lightgbm_criteo.ipynb (100%)
 rename {notebooks => scenarios}/02_model/ncf_deep_dive.ipynb (100%)
 rename {notebooks => scenarios}/02_model/rbm_deep_dive.ipynb (100%)
 rename {notebooks => scenarios}/02_model/sar_deep_dive.ipynb (100%)
 rename {notebooks => scenarios}/02_model/surprise_svd_deep_dive.ipynb (100%)
 rename {notebooks => scenarios}/02_model/vowpal_wabbit_deep_dive.ipynb (100%)
 rename {notebooks => scenarios}/03_evaluate/README.md (100%)
 rename {notebooks => scenarios}/03_evaluate/evaluation.ipynb (100%)
 rename {notebooks => scenarios}/04_model_select_and_optimize/README.md (100%)
 rename {notebooks => scenarios}/04_model_select_and_optimize/azureml_hyperdrive_surprise_svd.ipynb (100%)
 rename {notebooks => scenarios}/04_model_select_and_optimize/azureml_hyperdrive_wide_and_deep.ipynb (100%)
 rename {notebooks => scenarios}/04_model_select_and_optimize/nni_surprise_svd.ipynb (100%)
 rename {notebooks => scenarios}/04_model_select_and_optimize/tuning_spark_als.ipynb (100%)
 rename {notebooks => scenarios}/05_operationalize/README.md (100%)
 rename {notebooks => scenarios}/05_operationalize/aks_locust_load_test.ipynb (100%)
 rename {notebooks => scenarios}/05_operationalize/als_movie_o16n.ipynb (100%)
 rename {notebooks => scenarios}/05_operationalize/lightgbm_criteo_o16n.ipynb (100%)
 rename {notebooks => scenarios}/README.md (100%)
 rename {notebooks => scenarios}/run_notebook_on_azureml.ipynb (100%)
 rename {notebooks => scenarios}/template.ipynb (100%)
 rename {scripts => tools}/__init__.py (100%)
 rename {scripts => tools}/databricks_install.py (100%)
 rename {docker => tools/docker}/Dockerfile (100%)
 rename {docker => tools/docker}/README.md (100%)
 rename {scripts => tools}/generate_conda_file.py (100%)
 rename {scripts => tools}/generate_requirements_txt.py (100%)

diff --git a/notebooks/00_quick_start/README.md b/scenarios/00_quick_start/README.md
similarity index 100%
rename from notebooks/00_quick_start/README.md
rename to scenarios/00_quick_start/README.md
diff --git a/notebooks/00_quick_start/als_movielens.ipynb b/scenarios/00_quick_start/als_movielens.ipynb
similarity index 100%
rename from notebooks/00_quick_start/als_movielens.ipynb
rename to scenarios/00_quick_start/als_movielens.ipynb
diff --git a/notebooks/00_quick_start/dkn_synthetic.ipynb b/scenarios/00_quick_start/dkn_synthetic.ipynb
similarity index 100%
rename from notebooks/00_quick_start/dkn_synthetic.ipynb
rename to scenarios/00_quick_start/dkn_synthetic.ipynb
diff --git a/notebooks/00_quick_start/fastai_movielens.ipynb b/scenarios/00_quick_start/fastai_movielens.ipynb
similarity index 100%
rename from notebooks/00_quick_start/fastai_movielens.ipynb
rename to scenarios/00_quick_start/fastai_movielens.ipynb
diff --git a/notebooks/00_quick_start/lightgbm_tinycriteo.ipynb b/scenarios/00_quick_start/lightgbm_tinycriteo.ipynb
similarity index 100%
rename from notebooks/00_quick_start/lightgbm_tinycriteo.ipynb
rename to scenarios/00_quick_start/lightgbm_tinycriteo.ipynb
diff --git a/notebooks/00_quick_start/ncf_movielens.ipynb b/scenarios/00_quick_start/ncf_movielens.ipynb
similarity index 100%
rename from notebooks/00_quick_start/ncf_movielens.ipynb
rename to scenarios/00_quick_start/ncf_movielens.ipynb
diff --git a/notebooks/00_quick_start/rbm_movielens.ipynb b/scenarios/00_quick_start/rbm_movielens.ipynb
similarity index 100%
rename from notebooks/00_quick_start/rbm_movielens.ipynb
rename to scenarios/00_quick_start/rbm_movielens.ipynb
diff --git a/notebooks/00_quick_start/rlrmc_movielens.ipynb b/scenarios/00_quick_start/rlrmc_movielens.ipynb
similarity index 100%
rename from notebooks/00_quick_start/rlrmc_movielens.ipynb
rename to scenarios/00_quick_start/rlrmc_movielens.ipynb
diff --git a/notebooks/00_quick_start/sar_movielens.ipynb b/scenarios/00_quick_start/sar_movielens.ipynb
similarity index 100%
rename from notebooks/00_quick_start/sar_movielens.ipynb
rename to scenarios/00_quick_start/sar_movielens.ipynb
diff --git a/notebooks/00_quick_start/sar_movielens_with_azureml.ipynb b/scenarios/00_quick_start/sar_movielens_with_azureml.ipynb
similarity index 100%
rename from notebooks/00_quick_start/sar_movielens_with_azureml.ipynb
rename to scenarios/00_quick_start/sar_movielens_with_azureml.ipynb
diff --git a/notebooks/00_quick_start/sar_movieratings_with_azureml_designer.ipynb b/scenarios/00_quick_start/sar_movieratings_with_azureml_designer.ipynb
similarity index 100%
rename from notebooks/00_quick_start/sar_movieratings_with_azureml_designer.ipynb
rename to scenarios/00_quick_start/sar_movieratings_with_azureml_designer.ipynb
diff --git a/notebooks/00_quick_start/sequential_recsys_amazondataset.ipynb b/scenarios/00_quick_start/sequential_recsys_amazondataset.ipynb
similarity index 100%
rename from notebooks/00_quick_start/sequential_recsys_amazondataset.ipynb
rename to scenarios/00_quick_start/sequential_recsys_amazondataset.ipynb
diff --git a/notebooks/00_quick_start/wide_deep_movielens.ipynb b/scenarios/00_quick_start/wide_deep_movielens.ipynb
similarity index 100%
rename from notebooks/00_quick_start/wide_deep_movielens.ipynb
rename to scenarios/00_quick_start/wide_deep_movielens.ipynb
diff --git a/notebooks/00_quick_start/xdeepfm_criteo.ipynb b/scenarios/00_quick_start/xdeepfm_criteo.ipynb
similarity index 100%
rename from notebooks/00_quick_start/xdeepfm_criteo.ipynb
rename to scenarios/00_quick_start/xdeepfm_criteo.ipynb
diff --git a/notebooks/01_prepare_data/README.md b/scenarios/01_prepare_data/README.md
similarity index 100%
rename from notebooks/01_prepare_data/README.md
rename to scenarios/01_prepare_data/README.md
diff --git a/notebooks/01_prepare_data/data_split.ipynb b/scenarios/01_prepare_data/data_split.ipynb
similarity index 100%
rename from notebooks/01_prepare_data/data_split.ipynb
rename to scenarios/01_prepare_data/data_split.ipynb
diff --git a/notebooks/01_prepare_data/data_transform.ipynb b/scenarios/01_prepare_data/data_transform.ipynb
similarity index 100%
rename from notebooks/01_prepare_data/data_transform.ipynb
rename to scenarios/01_prepare_data/data_transform.ipynb
diff --git a/notebooks/01_prepare_data/wikidata_knowledge_graph.ipynb b/scenarios/01_prepare_data/wikidata_knowledge_graph.ipynb
similarity index 100%
rename from notebooks/01_prepare_data/wikidata_knowledge_graph.ipynb
rename to scenarios/01_prepare_data/wikidata_knowledge_graph.ipynb
diff --git a/notebooks/02_model/README.md b/scenarios/02_model/README.md
similarity index 100%
rename from notebooks/02_model/README.md
rename to scenarios/02_model/README.md
diff --git a/notebooks/02_model/als_deep_dive.ipynb b/scenarios/02_model/als_deep_dive.ipynb
similarity index 100%
rename from notebooks/02_model/als_deep_dive.ipynb
rename to scenarios/02_model/als_deep_dive.ipynb
diff --git a/notebooks/02_model/baseline_deep_dive.ipynb b/scenarios/02_model/baseline_deep_dive.ipynb
similarity index 100%
rename from notebooks/02_model/baseline_deep_dive.ipynb
rename to scenarios/02_model/baseline_deep_dive.ipynb
diff --git a/notebooks/02_model/cornac_bpr_deep_dive.ipynb b/scenarios/02_model/cornac_bpr_deep_dive.ipynb
similarity index 100%
rename from notebooks/02_model/cornac_bpr_deep_dive.ipynb
rename to scenarios/02_model/cornac_bpr_deep_dive.ipynb
diff --git a/notebooks/02_model/fm_deep_dive.ipynb b/scenarios/02_model/fm_deep_dive.ipynb
similarity index 100%
rename from notebooks/02_model/fm_deep_dive.ipynb
rename to scenarios/02_model/fm_deep_dive.ipynb
diff --git a/notebooks/02_model/mmlspark_lightgbm_criteo.ipynb b/scenarios/02_model/mmlspark_lightgbm_criteo.ipynb
similarity index 100%
rename from notebooks/02_model/mmlspark_lightgbm_criteo.ipynb
rename to scenarios/02_model/mmlspark_lightgbm_criteo.ipynb
diff --git a/notebooks/02_model/ncf_deep_dive.ipynb b/scenarios/02_model/ncf_deep_dive.ipynb
similarity index 100%
rename from notebooks/02_model/ncf_deep_dive.ipynb
rename to scenarios/02_model/ncf_deep_dive.ipynb
diff --git a/notebooks/02_model/rbm_deep_dive.ipynb b/scenarios/02_model/rbm_deep_dive.ipynb
similarity index 100%
rename from notebooks/02_model/rbm_deep_dive.ipynb
rename to scenarios/02_model/rbm_deep_dive.ipynb
diff --git a/notebooks/02_model/sar_deep_dive.ipynb b/scenarios/02_model/sar_deep_dive.ipynb
similarity index 100%
rename from notebooks/02_model/sar_deep_dive.ipynb
rename to scenarios/02_model/sar_deep_dive.ipynb
diff --git a/notebooks/02_model/surprise_svd_deep_dive.ipynb b/scenarios/02_model/surprise_svd_deep_dive.ipynb
similarity index 100%
rename from notebooks/02_model/surprise_svd_deep_dive.ipynb
rename to scenarios/02_model/surprise_svd_deep_dive.ipynb
diff --git a/notebooks/02_model/vowpal_wabbit_deep_dive.ipynb b/scenarios/02_model/vowpal_wabbit_deep_dive.ipynb
similarity index 100%
rename from notebooks/02_model/vowpal_wabbit_deep_dive.ipynb
rename to scenarios/02_model/vowpal_wabbit_deep_dive.ipynb
diff --git a/notebooks/03_evaluate/README.md b/scenarios/03_evaluate/README.md
similarity index 100%
rename from notebooks/03_evaluate/README.md
rename to scenarios/03_evaluate/README.md
diff --git a/notebooks/03_evaluate/evaluation.ipynb b/scenarios/03_evaluate/evaluation.ipynb
similarity index 100%
rename from notebooks/03_evaluate/evaluation.ipynb
rename to scenarios/03_evaluate/evaluation.ipynb
diff --git a/notebooks/04_model_select_and_optimize/README.md b/scenarios/04_model_select_and_optimize/README.md
similarity index 100%
rename from notebooks/04_model_select_and_optimize/README.md
rename to scenarios/04_model_select_and_optimize/README.md
diff --git a/notebooks/04_model_select_and_optimize/azureml_hyperdrive_surprise_svd.ipynb b/scenarios/04_model_select_and_optimize/azureml_hyperdrive_surprise_svd.ipynb
similarity index 100%
rename from notebooks/04_model_select_and_optimize/azureml_hyperdrive_surprise_svd.ipynb
rename to scenarios/04_model_select_and_optimize/azureml_hyperdrive_surprise_svd.ipynb
diff --git a/notebooks/04_model_select_and_optimize/azureml_hyperdrive_wide_and_deep.ipynb b/scenarios/04_model_select_and_optimize/azureml_hyperdrive_wide_and_deep.ipynb
similarity index 100%
rename from notebooks/04_model_select_and_optimize/azureml_hyperdrive_wide_and_deep.ipynb
rename to scenarios/04_model_select_and_optimize/azureml_hyperdrive_wide_and_deep.ipynb
diff --git a/notebooks/04_model_select_and_optimize/nni_surprise_svd.ipynb b/scenarios/04_model_select_and_optimize/nni_surprise_svd.ipynb
similarity index 100%
rename from notebooks/04_model_select_and_optimize/nni_surprise_svd.ipynb
rename to scenarios/04_model_select_and_optimize/nni_surprise_svd.ipynb
diff --git a/notebooks/04_model_select_and_optimize/tuning_spark_als.ipynb b/scenarios/04_model_select_and_optimize/tuning_spark_als.ipynb
similarity index 100%
rename from notebooks/04_model_select_and_optimize/tuning_spark_als.ipynb
rename to scenarios/04_model_select_and_optimize/tuning_spark_als.ipynb
diff --git a/notebooks/05_operationalize/README.md b/scenarios/05_operationalize/README.md
similarity index 100%
rename from notebooks/05_operationalize/README.md
rename to scenarios/05_operationalize/README.md
diff --git a/notebooks/05_operationalize/aks_locust_load_test.ipynb b/scenarios/05_operationalize/aks_locust_load_test.ipynb
similarity index 100%
rename from notebooks/05_operationalize/aks_locust_load_test.ipynb
rename to scenarios/05_operationalize/aks_locust_load_test.ipynb
diff --git a/notebooks/05_operationalize/als_movie_o16n.ipynb b/scenarios/05_operationalize/als_movie_o16n.ipynb
similarity index 100%
rename from notebooks/05_operationalize/als_movie_o16n.ipynb
rename to scenarios/05_operationalize/als_movie_o16n.ipynb
diff --git a/notebooks/05_operationalize/lightgbm_criteo_o16n.ipynb b/scenarios/05_operationalize/lightgbm_criteo_o16n.ipynb
similarity index 100%
rename from notebooks/05_operationalize/lightgbm_criteo_o16n.ipynb
rename to scenarios/05_operationalize/lightgbm_criteo_o16n.ipynb
diff --git a/notebooks/README.md b/scenarios/README.md
similarity index 100%
rename from notebooks/README.md
rename to scenarios/README.md
diff --git a/notebooks/run_notebook_on_azureml.ipynb b/scenarios/run_notebook_on_azureml.ipynb
similarity index 100%
rename from notebooks/run_notebook_on_azureml.ipynb
rename to scenarios/run_notebook_on_azureml.ipynb
diff --git a/notebooks/template.ipynb b/scenarios/template.ipynb
similarity index 100%
rename from notebooks/template.ipynb
rename to scenarios/template.ipynb
diff --git a/scripts/__init__.py b/tools/__init__.py
similarity index 100%
rename from scripts/__init__.py
rename to tools/__init__.py
diff --git a/scripts/databricks_install.py b/tools/databricks_install.py
similarity index 100%
rename from scripts/databricks_install.py
rename to tools/databricks_install.py
diff --git a/docker/Dockerfile b/tools/docker/Dockerfile
similarity index 100%
rename from docker/Dockerfile
rename to tools/docker/Dockerfile
diff --git a/docker/README.md b/tools/docker/README.md
similarity index 100%
rename from docker/README.md
rename to tools/docker/README.md
diff --git a/scripts/generate_conda_file.py b/tools/generate_conda_file.py
similarity index 100%
rename from scripts/generate_conda_file.py
rename to tools/generate_conda_file.py
diff --git a/scripts/generate_requirements_txt.py b/tools/generate_requirements_txt.py
similarity index 100%
rename from scripts/generate_requirements_txt.py
rename to tools/generate_requirements_txt.py

From 5551d4a7bb8245e5a0555fba0587d7092000b787 Mon Sep 17 00:00:00 2001
From: miguelgfierro
Date: Thu, 16 Apr 2020 12:04:12 +0100
Subject: [PATCH 02/61] rename and refactor :boom:

---
 .../{02_model => 02_model_collaborative_filtering}/README.md | 0
 .../als_deep_dive.ipynb                                      | 0
 .../baseline_deep_dive.ipynb                                 | 0
 .../cornac_bpr_deep_dive.ipynb                               | 0
 .../fm_deep_dive.ipynb                                       | 0
 .../rbm_deep_dive.ipynb                                      | 0
 .../sar_deep_dive.ipynb                                      | 0
 .../surprise_svd_deep_dive.ipynb                             | 0
 .../mmlspark_lightgbm_criteo.ipynb                           | 0
 .../vowpal_wabbit_deep_dive.ipynb                            | 0
 scenarios/{02_model => 02_model_hybrid}/ncf_deep_dive.ipynb  | 0
 {benchmarks => scenarios/06_benchmarks}/README.md            | 0
 {benchmarks => scenarios/06_benchmarks}/benchmark_utils.py   | 0
 {benchmarks => scenarios/06_benchmarks}/movielens.ipynb      | 0
 scenarios/COLD_START.md                                      | 1 +
 scenarios/SCENARIO_ADS.md                                    | 1 +
 scenarios/SCENARIO_ENTERTAINMENT.md                          | 1 +
 scenarios/SCENARIO_FOOD_AND_RESTAURANTS.md                   | 1 +
 scenarios/SCENARIO_NEWS.md                                   | 1 +
 scenarios/SCENARIO_RETAIL.md                                 | 1 +
 scenarios/SCENARIO_TRAVEL.md                                 | 1 +
 21 files changed, 7 insertions(+)
 rename scenarios/{02_model => 02_model_collaborative_filtering}/README.md (100%)
 rename scenarios/{02_model => 02_model_collaborative_filtering}/als_deep_dive.ipynb (100%)
 rename scenarios/{02_model => 02_model_collaborative_filtering}/baseline_deep_dive.ipynb (100%)
 rename scenarios/{02_model => 02_model_collaborative_filtering}/cornac_bpr_deep_dive.ipynb (100%)
 rename scenarios/{02_model => 02_model_collaborative_filtering}/fm_deep_dive.ipynb (100%)
 rename scenarios/{02_model => 02_model_collaborative_filtering}/rbm_deep_dive.ipynb (100%)
 rename scenarios/{02_model => 02_model_collaborative_filtering}/sar_deep_dive.ipynb (100%)
 rename scenarios/{02_model => 02_model_collaborative_filtering}/surprise_svd_deep_dive.ipynb (100%)
 rename scenarios/{02_model => 02_model_content_based_filtering}/mmlspark_lightgbm_criteo.ipynb (100%)
 rename scenarios/{02_model => 02_model_content_based_filtering}/vowpal_wabbit_deep_dive.ipynb (100%)
 rename scenarios/{02_model => 02_model_hybrid}/ncf_deep_dive.ipynb (100%)
 rename {benchmarks => scenarios/06_benchmarks}/README.md (100%)
 rename {benchmarks => scenarios/06_benchmarks}/benchmark_utils.py (100%)
 rename {benchmarks => scenarios/06_benchmarks}/movielens.ipynb (100%)
 create mode 100644 scenarios/COLD_START.md
 create mode 100644 scenarios/SCENARIO_ADS.md
 create mode 100644 scenarios/SCENARIO_ENTERTAINMENT.md
 create mode 100644 scenarios/SCENARIO_FOOD_AND_RESTAURANTS.md
 create mode 100644 scenarios/SCENARIO_NEWS.md
 create mode 100644 scenarios/SCENARIO_RETAIL.md
 create mode 100644 scenarios/SCENARIO_TRAVEL.md

diff --git a/scenarios/02_model/README.md b/scenarios/02_model_collaborative_filtering/README.md
similarity index 100%
rename from scenarios/02_model/README.md
rename to scenarios/02_model_collaborative_filtering/README.md
diff --git a/scenarios/02_model/als_deep_dive.ipynb b/scenarios/02_model_collaborative_filtering/als_deep_dive.ipynb
similarity index 100%
rename from scenarios/02_model/als_deep_dive.ipynb
rename to scenarios/02_model_collaborative_filtering/als_deep_dive.ipynb
diff --git a/scenarios/02_model/baseline_deep_dive.ipynb b/scenarios/02_model_collaborative_filtering/baseline_deep_dive.ipynb
similarity index 100%
rename from scenarios/02_model/baseline_deep_dive.ipynb
rename to scenarios/02_model_collaborative_filtering/baseline_deep_dive.ipynb
diff --git a/scenarios/02_model/cornac_bpr_deep_dive.ipynb b/scenarios/02_model_collaborative_filtering/cornac_bpr_deep_dive.ipynb
similarity index 100%
rename from scenarios/02_model/cornac_bpr_deep_dive.ipynb
rename to scenarios/02_model_collaborative_filtering/cornac_bpr_deep_dive.ipynb
diff --git a/scenarios/02_model/fm_deep_dive.ipynb b/scenarios/02_model_collaborative_filtering/fm_deep_dive.ipynb
similarity index 100%
rename from scenarios/02_model/fm_deep_dive.ipynb
rename to scenarios/02_model_collaborative_filtering/fm_deep_dive.ipynb
diff --git a/scenarios/02_model/rbm_deep_dive.ipynb b/scenarios/02_model_collaborative_filtering/rbm_deep_dive.ipynb
similarity index 100%
rename from scenarios/02_model/rbm_deep_dive.ipynb
rename to scenarios/02_model_collaborative_filtering/rbm_deep_dive.ipynb
diff --git a/scenarios/02_model/sar_deep_dive.ipynb b/scenarios/02_model_collaborative_filtering/sar_deep_dive.ipynb
similarity index 100%
rename from scenarios/02_model/sar_deep_dive.ipynb
rename to scenarios/02_model_collaborative_filtering/sar_deep_dive.ipynb
diff --git a/scenarios/02_model/surprise_svd_deep_dive.ipynb b/scenarios/02_model_collaborative_filtering/surprise_svd_deep_dive.ipynb
similarity index 100%
rename from scenarios/02_model/surprise_svd_deep_dive.ipynb
rename to scenarios/02_model_collaborative_filtering/surprise_svd_deep_dive.ipynb
diff --git a/scenarios/02_model/mmlspark_lightgbm_criteo.ipynb b/scenarios/02_model_content_based_filtering/mmlspark_lightgbm_criteo.ipynb
similarity index 100%
rename from scenarios/02_model/mmlspark_lightgbm_criteo.ipynb
rename to scenarios/02_model_content_based_filtering/mmlspark_lightgbm_criteo.ipynb
diff --git a/scenarios/02_model/vowpal_wabbit_deep_dive.ipynb b/scenarios/02_model_content_based_filtering/vowpal_wabbit_deep_dive.ipynb
similarity index 100%
rename from scenarios/02_model/vowpal_wabbit_deep_dive.ipynb
rename to scenarios/02_model_content_based_filtering/vowpal_wabbit_deep_dive.ipynb
diff --git a/scenarios/02_model/ncf_deep_dive.ipynb b/scenarios/02_model_hybrid/ncf_deep_dive.ipynb
similarity index 100%
rename from scenarios/02_model/ncf_deep_dive.ipynb
rename to scenarios/02_model_hybrid/ncf_deep_dive.ipynb
diff --git a/benchmarks/README.md b/scenarios/06_benchmarks/README.md
similarity index 100%
rename from benchmarks/README.md
rename to scenarios/06_benchmarks/README.md
diff --git a/benchmarks/benchmark_utils.py b/scenarios/06_benchmarks/benchmark_utils.py
similarity index 100%
rename from benchmarks/benchmark_utils.py
rename to scenarios/06_benchmarks/benchmark_utils.py
diff --git a/benchmarks/movielens.ipynb b/scenarios/06_benchmarks/movielens.ipynb
similarity index 100%
rename from benchmarks/movielens.ipynb
rename to scenarios/06_benchmarks/movielens.ipynb
diff --git a/scenarios/COLD_START.md b/scenarios/COLD_START.md
new file mode 100644
index 0000000000..9c0bc8770e
--- /dev/null
+++ b/scenarios/COLD_START.md
@@ -0,0 +1 @@
+# Managing Cold Start Scenarios in Recommendation Systems
diff --git a/scenarios/SCENARIO_ADS.md b/scenarios/SCENARIO_ADS.md
new file mode 100644
index 0000000000..f57b6bfe9f
--- /dev/null
+++ b/scenarios/SCENARIO_ADS.md
@@ -0,0 +1 @@
+# Recommendation systems for Advertisement
diff --git a/scenarios/SCENARIO_ENTERTAINMENT.md b/scenarios/SCENARIO_ENTERTAINMENT.md
new file mode 100644
index 0000000000..bcc561a72e
--- /dev/null
+++ b/scenarios/SCENARIO_ENTERTAINMENT.md
@@ -0,0 +1 @@
+# Recommendation systems for Entertainment
diff --git a/scenarios/SCENARIO_FOOD_AND_RESTAURANTS.md b/scenarios/SCENARIO_FOOD_AND_RESTAURANTS.md
new file mode 100644
index 0000000000..9cfe816539
--- /dev/null
+++ b/scenarios/SCENARIO_FOOD_AND_RESTAURANTS.md
@@ -0,0 +1 @@
+# Recommendation systems for Food and Restaurants
diff --git a/scenarios/SCENARIO_NEWS.md b/scenarios/SCENARIO_NEWS.md
new file mode 100644
index 0000000000..42008b9c88
--- /dev/null
+++ b/scenarios/SCENARIO_NEWS.md
@@ -0,0 +1 @@
+# Recommendation systems for News
diff --git a/scenarios/SCENARIO_RETAIL.md b/scenarios/SCENARIO_RETAIL.md
new file mode 100644
index 0000000000..84ad705729
--- /dev/null
+++ b/scenarios/SCENARIO_RETAIL.md
@@ -0,0 +1 @@
+# Recommendation systems for Retail
diff --git a/scenarios/SCENARIO_TRAVEL.md b/scenarios/SCENARIO_TRAVEL.md
new file mode 100644
index 0000000000..a6e5a3b76a
--- /dev/null
+++ b/scenarios/SCENARIO_TRAVEL.md
@@ -0,0 +1 @@
+# Recommendation systems for Travel

From f6b04531d8d5ca88fd865fc68132fc75a52f5ff3 Mon Sep 17 00:00:00 2001
From: miguelgfierro
Date: Mon, 27 Apr 2020 12:37:26 +0100
Subject: [PATCH 03/61] refact

---
 {scenarios => examples}/00_quick_start/README.md                 | 0
 {scenarios => examples}/00_quick_start/als_movielens.ipynb       | 0
 {scenarios => examples}/00_quick_start/dkn_synthetic.ipynb       | 0
 {scenarios => examples}/00_quick_start/fastai_movielens.ipynb    | 0
 {scenarios => examples}/00_quick_start/lightgbm_tinycriteo.ipynb | 0
 {scenarios => examples}/00_quick_start/ncf_movielens.ipynb       | 0
 {scenarios => examples}/00_quick_start/rbm_movielens.ipynb       | 0
 {scenarios => examples}/00_quick_start/rlrmc_movielens.ipynb     | 0
 {scenarios => examples}/00_quick_start/sar_movielens.ipynb       | 0
 .../00_quick_start/sar_movielens_with_azureml.ipynb              | 0
 .../00_quick_start/sar_movieratings_with_azureml_designer.ipynb  | 0
 .../00_quick_start/sequential_recsys_amazondataset.ipynb         | 0
 {scenarios => examples}/00_quick_start/wide_deep_movielens.ipynb | 0
 {scenarios => examples}/00_quick_start/xdeepfm_criteo.ipynb      | 0
 {scenarios => examples}/01_prepare_data/README.md                | 0
 {scenarios => examples}/01_prepare_data/data_split.ipynb         | 0
 {scenarios => examples}/01_prepare_data/data_transform.ipynb     | 0
 .../01_prepare_data/wikidata_knowledge_graph.ipynb               | 0
 .../02_model_collaborative_filtering/README.md                   | 0
 .../02_model_collaborative_filtering/als_deep_dive.ipynb         | 0
 .../02_model_collaborative_filtering/baseline_deep_dive.ipynb    | 0
 .../02_model_collaborative_filtering/cornac_bpr_deep_dive.ipynb  | 0
 .../02_model_collaborative_filtering/fm_deep_dive.ipynb          | 0
 .../02_model_collaborative_filtering/rbm_deep_dive.ipynb         | 0
 .../02_model_collaborative_filtering/sar_deep_dive.ipynb         | 0
 .../02_model_collaborative_filtering/surprise_svd_deep_dive.ipynb | 0
 .../mmlspark_lightgbm_criteo.ipynb                               | 0
 .../vowpal_wabbit_deep_dive.ipynb                                | 0
 {scenarios => examples}/02_model_hybrid/ncf_deep_dive.ipynb      | 0
 {scenarios => examples}/03_evaluate/README.md                    | 0
 {scenarios => examples}/03_evaluate/evaluation.ipynb             | 0
 {scenarios => examples}/04_model_select_and_optimize/README.md   | 0
 .../azureml_hyperdrive_surprise_svd.ipynb                        | 0
 .../azureml_hyperdrive_wide_and_deep.ipynb                       | 0
 .../04_model_select_and_optimize/nni_surprise_svd.ipynb          | 0
 .../04_model_select_and_optimize/tuning_spark_als.ipynb          | 0
 {scenarios => examples}/05_operationalize/README.md              | 0
 .../05_operationalize/aks_locust_load_test.ipynb                 | 0
 {scenarios => examples}/05_operationalize/als_movie_o16n.ipynb   | 0
 .../05_operationalize/lightgbm_criteo_o16n.ipynb                 | 0
 {scenarios => examples}/06_benchmarks/README.md                  | 0
 {scenarios => examples}/06_benchmarks/benchmark_utils.py         | 0
 {scenarios => examples}/06_benchmarks/movielens.ipynb            | 0
 {scenarios => examples}/COLD_START.md                            | 0
 {scenarios => examples}/README.md                                | 0
 {scenarios => examples}/SCENARIO_ADS.md                          | 0
 {scenarios => examples}/SCENARIO_ENTERTAINMENT.md                | 0
 {scenarios => examples}/SCENARIO_FOOD_AND_RESTAURANTS.md         | 0
 {scenarios => examples}/SCENARIO_NEWS.md                         | 0
 {scenarios => examples}/SCENARIO_RETAIL.md                       | 0
 {scenarios => examples}/SCENARIO_TRAVEL.md                       | 0
 {scenarios => examples}/run_notebook_on_azureml.ipynb            | 0
 {scenarios => examples}/template.ipynb                           | 0
 53 files changed, 0 insertions(+), 0 deletions(-)
 rename {scenarios => examples}/00_quick_start/README.md (100%)
 rename {scenarios => examples}/00_quick_start/als_movielens.ipynb (100%)
 rename {scenarios => examples}/00_quick_start/dkn_synthetic.ipynb (100%)
 rename {scenarios => examples}/00_quick_start/fastai_movielens.ipynb (100%)
 rename {scenarios => examples}/00_quick_start/lightgbm_tinycriteo.ipynb (100%)
 rename {scenarios => examples}/00_quick_start/ncf_movielens.ipynb (100%)
 rename {scenarios => examples}/00_quick_start/rbm_movielens.ipynb (100%)
 rename {scenarios => examples}/00_quick_start/rlrmc_movielens.ipynb (100%)
 rename {scenarios => examples}/00_quick_start/sar_movielens.ipynb (100%)
 rename {scenarios => examples}/00_quick_start/sar_movielens_with_azureml.ipynb (100%)
 rename {scenarios => examples}/00_quick_start/sar_movieratings_with_azureml_designer.ipynb (100%)
 rename {scenarios => examples}/00_quick_start/sequential_recsys_amazondataset.ipynb (100%)
 rename {scenarios => examples}/00_quick_start/wide_deep_movielens.ipynb (100%)
 rename {scenarios => examples}/00_quick_start/xdeepfm_criteo.ipynb (100%)
 rename {scenarios => examples}/01_prepare_data/README.md (100%)
 rename {scenarios => examples}/01_prepare_data/data_split.ipynb (100%)
 rename {scenarios => examples}/01_prepare_data/data_transform.ipynb (100%)
 rename {scenarios => examples}/01_prepare_data/wikidata_knowledge_graph.ipynb (100%)
 rename {scenarios => examples}/02_model_collaborative_filtering/README.md (100%)
 rename {scenarios => examples}/02_model_collaborative_filtering/als_deep_dive.ipynb (100%)
 rename {scenarios => examples}/02_model_collaborative_filtering/baseline_deep_dive.ipynb (100%)
 rename {scenarios => examples}/02_model_collaborative_filtering/cornac_bpr_deep_dive.ipynb (100%)
 rename {scenarios => examples}/02_model_collaborative_filtering/fm_deep_dive.ipynb (100%)
 rename {scenarios => examples}/02_model_collaborative_filtering/rbm_deep_dive.ipynb (100%)
 rename {scenarios => examples}/02_model_collaborative_filtering/sar_deep_dive.ipynb (100%)
 rename {scenarios => examples}/02_model_collaborative_filtering/surprise_svd_deep_dive.ipynb (100%)
 rename {scenarios => examples}/02_model_content_based_filtering/mmlspark_lightgbm_criteo.ipynb (100%)
 rename {scenarios => examples}/02_model_content_based_filtering/vowpal_wabbit_deep_dive.ipynb (100%)
 rename {scenarios => examples}/02_model_hybrid/ncf_deep_dive.ipynb (100%)
 rename {scenarios => examples}/03_evaluate/README.md (100%)
 rename {scenarios => examples}/03_evaluate/evaluation.ipynb (100%)
 rename {scenarios => examples}/04_model_select_and_optimize/README.md (100%)
 rename {scenarios => examples}/04_model_select_and_optimize/azureml_hyperdrive_surprise_svd.ipynb (100%)
 rename {scenarios => examples}/04_model_select_and_optimize/azureml_hyperdrive_wide_and_deep.ipynb (100%)
 rename {scenarios => examples}/04_model_select_and_optimize/nni_surprise_svd.ipynb (100%)
 rename {scenarios => examples}/04_model_select_and_optimize/tuning_spark_als.ipynb (100%)
 rename {scenarios => examples}/05_operationalize/README.md (100%)
 rename {scenarios => examples}/05_operationalize/aks_locust_load_test.ipynb (100%)
 rename {scenarios => examples}/05_operationalize/als_movie_o16n.ipynb (100%)
 rename {scenarios => examples}/05_operationalize/lightgbm_criteo_o16n.ipynb (100%)
 rename {scenarios => examples}/06_benchmarks/README.md (100%)
 rename {scenarios => examples}/06_benchmarks/benchmark_utils.py (100%)
 rename {scenarios => examples}/06_benchmarks/movielens.ipynb (100%)
 rename {scenarios => examples}/COLD_START.md (100%)
 rename {scenarios => examples}/README.md (100%)
 rename {scenarios => examples}/SCENARIO_ADS.md (100%)
 rename {scenarios => examples}/SCENARIO_ENTERTAINMENT.md (100%)
 rename {scenarios => examples}/SCENARIO_FOOD_AND_RESTAURANTS.md (100%)
 rename {scenarios => examples}/SCENARIO_NEWS.md (100%)
 rename {scenarios => examples}/SCENARIO_RETAIL.md (100%)
 rename {scenarios => examples}/SCENARIO_TRAVEL.md (100%)
 rename {scenarios => examples}/run_notebook_on_azureml.ipynb (100%)
 rename {scenarios => examples}/template.ipynb (100%)

diff --git a/scenarios/00_quick_start/README.md b/examples/00_quick_start/README.md
similarity index 100%
rename from scenarios/00_quick_start/README.md
rename to examples/00_quick_start/README.md
diff --git a/scenarios/00_quick_start/als_movielens.ipynb b/examples/00_quick_start/als_movielens.ipynb
similarity index 100%
rename from scenarios/00_quick_start/als_movielens.ipynb
rename to examples/00_quick_start/als_movielens.ipynb
diff --git a/scenarios/00_quick_start/dkn_synthetic.ipynb b/examples/00_quick_start/dkn_synthetic.ipynb
similarity index 100%
rename from scenarios/00_quick_start/dkn_synthetic.ipynb
rename to examples/00_quick_start/dkn_synthetic.ipynb
diff --git a/scenarios/00_quick_start/fastai_movielens.ipynb b/examples/00_quick_start/fastai_movielens.ipynb
similarity index 100%
rename from scenarios/00_quick_start/fastai_movielens.ipynb
rename to examples/00_quick_start/fastai_movielens.ipynb
diff --git a/scenarios/00_quick_start/lightgbm_tinycriteo.ipynb b/examples/00_quick_start/lightgbm_tinycriteo.ipynb
similarity index 100%
rename from scenarios/00_quick_start/lightgbm_tinycriteo.ipynb
rename to examples/00_quick_start/lightgbm_tinycriteo.ipynb
diff --git a/scenarios/00_quick_start/ncf_movielens.ipynb b/examples/00_quick_start/ncf_movielens.ipynb
similarity index 100%
rename from scenarios/00_quick_start/ncf_movielens.ipynb
rename to examples/00_quick_start/ncf_movielens.ipynb
diff --git a/scenarios/00_quick_start/rbm_movielens.ipynb b/examples/00_quick_start/rbm_movielens.ipynb
similarity index 100%
rename from scenarios/00_quick_start/rbm_movielens.ipynb
rename to examples/00_quick_start/rbm_movielens.ipynb
diff --git a/scenarios/00_quick_start/rlrmc_movielens.ipynb b/examples/00_quick_start/rlrmc_movielens.ipynb
similarity index 100%
rename from scenarios/00_quick_start/rlrmc_movielens.ipynb
rename to examples/00_quick_start/rlrmc_movielens.ipynb
diff --git a/scenarios/00_quick_start/sar_movielens.ipynb b/examples/00_quick_start/sar_movielens.ipynb
similarity index 100%
rename from scenarios/00_quick_start/sar_movielens.ipynb
rename to examples/00_quick_start/sar_movielens.ipynb
diff --git a/scenarios/00_quick_start/sar_movielens_with_azureml.ipynb b/examples/00_quick_start/sar_movielens_with_azureml.ipynb
similarity index 100%
rename from scenarios/00_quick_start/sar_movielens_with_azureml.ipynb
rename to examples/00_quick_start/sar_movielens_with_azureml.ipynb
diff --git a/scenarios/00_quick_start/sar_movieratings_with_azureml_designer.ipynb b/examples/00_quick_start/sar_movieratings_with_azureml_designer.ipynb
similarity index 100%
rename from scenarios/00_quick_start/sar_movieratings_with_azureml_designer.ipynb
rename to examples/00_quick_start/sar_movieratings_with_azureml_designer.ipynb
diff --git a/scenarios/00_quick_start/sequential_recsys_amazondataset.ipynb b/examples/00_quick_start/sequential_recsys_amazondataset.ipynb
similarity index 100%
rename from scenarios/00_quick_start/sequential_recsys_amazondataset.ipynb
rename to examples/00_quick_start/sequential_recsys_amazondataset.ipynb
diff --git a/scenarios/00_quick_start/wide_deep_movielens.ipynb b/examples/00_quick_start/wide_deep_movielens.ipynb
similarity index 100%
rename from scenarios/00_quick_start/wide_deep_movielens.ipynb
rename to examples/00_quick_start/wide_deep_movielens.ipynb
diff --git a/scenarios/00_quick_start/xdeepfm_criteo.ipynb b/examples/00_quick_start/xdeepfm_criteo.ipynb
similarity index 100%
rename from scenarios/00_quick_start/xdeepfm_criteo.ipynb
rename to examples/00_quick_start/xdeepfm_criteo.ipynb
diff --git a/scenarios/01_prepare_data/README.md b/examples/01_prepare_data/README.md
similarity index 100%
rename from scenarios/01_prepare_data/README.md
rename to examples/01_prepare_data/README.md
diff --git a/scenarios/01_prepare_data/data_split.ipynb b/examples/01_prepare_data/data_split.ipynb
similarity index 100%
rename from scenarios/01_prepare_data/data_split.ipynb
rename to examples/01_prepare_data/data_split.ipynb
diff --git a/scenarios/01_prepare_data/data_transform.ipynb b/examples/01_prepare_data/data_transform.ipynb
similarity index 100%
rename from scenarios/01_prepare_data/data_transform.ipynb
rename to examples/01_prepare_data/data_transform.ipynb
diff --git a/scenarios/01_prepare_data/wikidata_knowledge_graph.ipynb b/examples/01_prepare_data/wikidata_knowledge_graph.ipynb
similarity index 100%
rename from
scenarios/01_prepare_data/wikidata_knowledge_graph.ipynb rename to examples/01_prepare_data/wikidata_knowledge_graph.ipynb diff --git a/scenarios/02_model_collaborative_filtering/README.md b/examples/02_model_collaborative_filtering/README.md similarity index 100% rename from scenarios/02_model_collaborative_filtering/README.md rename to examples/02_model_collaborative_filtering/README.md diff --git a/scenarios/02_model_collaborative_filtering/als_deep_dive.ipynb b/examples/02_model_collaborative_filtering/als_deep_dive.ipynb similarity index 100% rename from scenarios/02_model_collaborative_filtering/als_deep_dive.ipynb rename to examples/02_model_collaborative_filtering/als_deep_dive.ipynb diff --git a/scenarios/02_model_collaborative_filtering/baseline_deep_dive.ipynb b/examples/02_model_collaborative_filtering/baseline_deep_dive.ipynb similarity index 100% rename from scenarios/02_model_collaborative_filtering/baseline_deep_dive.ipynb rename to examples/02_model_collaborative_filtering/baseline_deep_dive.ipynb diff --git a/scenarios/02_model_collaborative_filtering/cornac_bpr_deep_dive.ipynb b/examples/02_model_collaborative_filtering/cornac_bpr_deep_dive.ipynb similarity index 100% rename from scenarios/02_model_collaborative_filtering/cornac_bpr_deep_dive.ipynb rename to examples/02_model_collaborative_filtering/cornac_bpr_deep_dive.ipynb diff --git a/scenarios/02_model_collaborative_filtering/fm_deep_dive.ipynb b/examples/02_model_collaborative_filtering/fm_deep_dive.ipynb similarity index 100% rename from scenarios/02_model_collaborative_filtering/fm_deep_dive.ipynb rename to examples/02_model_collaborative_filtering/fm_deep_dive.ipynb diff --git a/scenarios/02_model_collaborative_filtering/rbm_deep_dive.ipynb b/examples/02_model_collaborative_filtering/rbm_deep_dive.ipynb similarity index 100% rename from scenarios/02_model_collaborative_filtering/rbm_deep_dive.ipynb rename to examples/02_model_collaborative_filtering/rbm_deep_dive.ipynb diff --git 
a/scenarios/02_model_collaborative_filtering/sar_deep_dive.ipynb b/examples/02_model_collaborative_filtering/sar_deep_dive.ipynb similarity index 100% rename from scenarios/02_model_collaborative_filtering/sar_deep_dive.ipynb rename to examples/02_model_collaborative_filtering/sar_deep_dive.ipynb diff --git a/scenarios/02_model_collaborative_filtering/surprise_svd_deep_dive.ipynb b/examples/02_model_collaborative_filtering/surprise_svd_deep_dive.ipynb similarity index 100% rename from scenarios/02_model_collaborative_filtering/surprise_svd_deep_dive.ipynb rename to examples/02_model_collaborative_filtering/surprise_svd_deep_dive.ipynb diff --git a/scenarios/02_model_content_based_filtering/mmlspark_lightgbm_criteo.ipynb b/examples/02_model_content_based_filtering/mmlspark_lightgbm_criteo.ipynb similarity index 100% rename from scenarios/02_model_content_based_filtering/mmlspark_lightgbm_criteo.ipynb rename to examples/02_model_content_based_filtering/mmlspark_lightgbm_criteo.ipynb diff --git a/scenarios/02_model_content_based_filtering/vowpal_wabbit_deep_dive.ipynb b/examples/02_model_content_based_filtering/vowpal_wabbit_deep_dive.ipynb similarity index 100% rename from scenarios/02_model_content_based_filtering/vowpal_wabbit_deep_dive.ipynb rename to examples/02_model_content_based_filtering/vowpal_wabbit_deep_dive.ipynb diff --git a/scenarios/02_model_hybrid/ncf_deep_dive.ipynb b/examples/02_model_hybrid/ncf_deep_dive.ipynb similarity index 100% rename from scenarios/02_model_hybrid/ncf_deep_dive.ipynb rename to examples/02_model_hybrid/ncf_deep_dive.ipynb diff --git a/scenarios/03_evaluate/README.md b/examples/03_evaluate/README.md similarity index 100% rename from scenarios/03_evaluate/README.md rename to examples/03_evaluate/README.md diff --git a/scenarios/03_evaluate/evaluation.ipynb b/examples/03_evaluate/evaluation.ipynb similarity index 100% rename from scenarios/03_evaluate/evaluation.ipynb rename to examples/03_evaluate/evaluation.ipynb diff --git 
a/scenarios/04_model_select_and_optimize/README.md b/examples/04_model_select_and_optimize/README.md similarity index 100% rename from scenarios/04_model_select_and_optimize/README.md rename to examples/04_model_select_and_optimize/README.md diff --git a/scenarios/04_model_select_and_optimize/azureml_hyperdrive_surprise_svd.ipynb b/examples/04_model_select_and_optimize/azureml_hyperdrive_surprise_svd.ipynb similarity index 100% rename from scenarios/04_model_select_and_optimize/azureml_hyperdrive_surprise_svd.ipynb rename to examples/04_model_select_and_optimize/azureml_hyperdrive_surprise_svd.ipynb diff --git a/scenarios/04_model_select_and_optimize/azureml_hyperdrive_wide_and_deep.ipynb b/examples/04_model_select_and_optimize/azureml_hyperdrive_wide_and_deep.ipynb similarity index 100% rename from scenarios/04_model_select_and_optimize/azureml_hyperdrive_wide_and_deep.ipynb rename to examples/04_model_select_and_optimize/azureml_hyperdrive_wide_and_deep.ipynb diff --git a/scenarios/04_model_select_and_optimize/nni_surprise_svd.ipynb b/examples/04_model_select_and_optimize/nni_surprise_svd.ipynb similarity index 100% rename from scenarios/04_model_select_and_optimize/nni_surprise_svd.ipynb rename to examples/04_model_select_and_optimize/nni_surprise_svd.ipynb diff --git a/scenarios/04_model_select_and_optimize/tuning_spark_als.ipynb b/examples/04_model_select_and_optimize/tuning_spark_als.ipynb similarity index 100% rename from scenarios/04_model_select_and_optimize/tuning_spark_als.ipynb rename to examples/04_model_select_and_optimize/tuning_spark_als.ipynb diff --git a/scenarios/05_operationalize/README.md b/examples/05_operationalize/README.md similarity index 100% rename from scenarios/05_operationalize/README.md rename to examples/05_operationalize/README.md diff --git a/scenarios/05_operationalize/aks_locust_load_test.ipynb b/examples/05_operationalize/aks_locust_load_test.ipynb similarity index 100% rename from 
scenarios/05_operationalize/aks_locust_load_test.ipynb rename to examples/05_operationalize/aks_locust_load_test.ipynb diff --git a/scenarios/05_operationalize/als_movie_o16n.ipynb b/examples/05_operationalize/als_movie_o16n.ipynb similarity index 100% rename from scenarios/05_operationalize/als_movie_o16n.ipynb rename to examples/05_operationalize/als_movie_o16n.ipynb diff --git a/scenarios/05_operationalize/lightgbm_criteo_o16n.ipynb b/examples/05_operationalize/lightgbm_criteo_o16n.ipynb similarity index 100% rename from scenarios/05_operationalize/lightgbm_criteo_o16n.ipynb rename to examples/05_operationalize/lightgbm_criteo_o16n.ipynb diff --git a/scenarios/06_benchmarks/README.md b/examples/06_benchmarks/README.md similarity index 100% rename from scenarios/06_benchmarks/README.md rename to examples/06_benchmarks/README.md diff --git a/scenarios/06_benchmarks/benchmark_utils.py b/examples/06_benchmarks/benchmark_utils.py similarity index 100% rename from scenarios/06_benchmarks/benchmark_utils.py rename to examples/06_benchmarks/benchmark_utils.py diff --git a/scenarios/06_benchmarks/movielens.ipynb b/examples/06_benchmarks/movielens.ipynb similarity index 100% rename from scenarios/06_benchmarks/movielens.ipynb rename to examples/06_benchmarks/movielens.ipynb diff --git a/scenarios/COLD_START.md b/examples/COLD_START.md similarity index 100% rename from scenarios/COLD_START.md rename to examples/COLD_START.md diff --git a/scenarios/README.md b/examples/README.md similarity index 100% rename from scenarios/README.md rename to examples/README.md diff --git a/scenarios/SCENARIO_ADS.md b/examples/SCENARIO_ADS.md similarity index 100% rename from scenarios/SCENARIO_ADS.md rename to examples/SCENARIO_ADS.md diff --git a/scenarios/SCENARIO_ENTERTAINMENT.md b/examples/SCENARIO_ENTERTAINMENT.md similarity index 100% rename from scenarios/SCENARIO_ENTERTAINMENT.md rename to examples/SCENARIO_ENTERTAINMENT.md diff --git a/scenarios/SCENARIO_FOOD_AND_RESTAURANTS.md 
b/examples/SCENARIO_FOOD_AND_RESTAURANTS.md similarity index 100% rename from scenarios/SCENARIO_FOOD_AND_RESTAURANTS.md rename to examples/SCENARIO_FOOD_AND_RESTAURANTS.md diff --git a/scenarios/SCENARIO_NEWS.md b/examples/SCENARIO_NEWS.md similarity index 100% rename from scenarios/SCENARIO_NEWS.md rename to examples/SCENARIO_NEWS.md diff --git a/scenarios/SCENARIO_RETAIL.md b/examples/SCENARIO_RETAIL.md similarity index 100% rename from scenarios/SCENARIO_RETAIL.md rename to examples/SCENARIO_RETAIL.md diff --git a/scenarios/SCENARIO_TRAVEL.md b/examples/SCENARIO_TRAVEL.md similarity index 100% rename from scenarios/SCENARIO_TRAVEL.md rename to examples/SCENARIO_TRAVEL.md diff --git a/scenarios/run_notebook_on_azureml.ipynb b/examples/run_notebook_on_azureml.ipynb similarity index 100% rename from scenarios/run_notebook_on_azureml.ipynb rename to examples/run_notebook_on_azureml.ipynb diff --git a/scenarios/template.ipynb b/examples/template.ipynb similarity index 100% rename from scenarios/template.ipynb rename to examples/template.ipynb From 6db4148fca7211270c653bafcff083ce825d0fb6 Mon Sep 17 00:00:00 2001 From: miguelgfierro Date: Mon, 27 Apr 2020 12:42:10 +0100 Subject: [PATCH 04/61] scenarios --- {examples => scenarios}/COLD_START.md | 0 scenarios/README.md | 12 ++++++++++++ examples/SCENARIO_ADS.md => scenarios/ads/README.md | 0 .../entertainment/README.md | 0 .../food_and_restaurants/README.md | 0 .../SCENARIO_NEWS.md => scenarios/news/README.md | 0 .../SCENARIO_RETAIL.md => scenarios/retail/README.md | 0 .../SCENARIO_TRAVEL.md => scenarios/travel/README.md | 0 8 files changed, 12 insertions(+) rename {examples => scenarios}/COLD_START.md (100%) create mode 100644 scenarios/README.md rename examples/SCENARIO_ADS.md => scenarios/ads/README.md (100%) rename examples/SCENARIO_ENTERTAINMENT.md => scenarios/entertainment/README.md (100%) rename examples/SCENARIO_FOOD_AND_RESTAURANTS.md => scenarios/food_and_restaurants/README.md (100%) rename 
examples/SCENARIO_NEWS.md => scenarios/news/README.md (100%) rename examples/SCENARIO_RETAIL.md => scenarios/retail/README.md (100%) rename examples/SCENARIO_TRAVEL.md => scenarios/travel/README.md (100%) diff --git a/examples/COLD_START.md b/scenarios/COLD_START.md similarity index 100% rename from examples/COLD_START.md rename to scenarios/COLD_START.md diff --git a/scenarios/README.md b/scenarios/README.md new file mode 100644 index 0000000000..92dea9b14d --- /dev/null +++ b/scenarios/README.md @@ -0,0 +1,12 @@ +# Recommendation System Scenarios + +This section lists a number of business scenarios that are common in Recommendation Systems. + +The list of scenarios is: + +* Ads +* Entertainment +* Food and restaurants +* News +* Retail +* Travel diff --git a/examples/SCENARIO_ADS.md b/scenarios/ads/README.md similarity index 100% rename from examples/SCENARIO_ADS.md rename to scenarios/ads/README.md diff --git a/examples/SCENARIO_ENTERTAINMENT.md b/scenarios/entertainment/README.md similarity index 100% rename from examples/SCENARIO_ENTERTAINMENT.md rename to scenarios/entertainment/README.md diff --git a/examples/SCENARIO_FOOD_AND_RESTAURANTS.md b/scenarios/food_and_restaurants/README.md similarity index 100% rename from examples/SCENARIO_FOOD_AND_RESTAURANTS.md rename to scenarios/food_and_restaurants/README.md diff --git a/examples/SCENARIO_NEWS.md b/scenarios/news/README.md similarity index 100% rename from examples/SCENARIO_NEWS.md rename to scenarios/news/README.md diff --git a/examples/SCENARIO_RETAIL.md b/scenarios/retail/README.md similarity index 100% rename from examples/SCENARIO_RETAIL.md rename to scenarios/retail/README.md diff --git a/examples/SCENARIO_TRAVEL.md b/scenarios/travel/README.md similarity index 100% rename from examples/SCENARIO_TRAVEL.md rename to scenarios/travel/README.md From 485c1f1e71aab31b6f2ddddf852b806ba1a2ecc0 Mon Sep 17 00:00:00 2001 From: miguelgfierro Date: Mon, 27 Apr 2020 12:49:00 +0100 Subject: [PATCH
05/61] retail --- scenarios/retail/README.md | 18 ++++++++++++++++++ 1 file changed, 18 insertions(+) diff --git a/scenarios/retail/README.md b/scenarios/retail/README.md index 84ad705729..97eff606e0 100644 --- a/scenarios/retail/README.md +++ b/scenarios/retail/README.md @@ -1 +1,19 @@ # Recommendation systems for Retail + +An increasing number of online companies are utilizing recommendation systems to increase user interaction and enrich shopping potential. Use cases of recommendation systems have been expanding rapidly across many aspects of eCommerce and online media over the last 4-5 years, and we expect this trend to continue. + +Companies across many different areas of enterprise are beginning to implement recommendation systems in an attempt to enhance their customers’ online purchasing experience, increase sales and retain customers. Business owners are recognizing potential in the fact that recommendation systems allow the collection of a huge amount of information relating to users’ behavior and their transactions within an enterprise. This information can then be systematically stored within user profiles to be used for future interactions. + +## Types of Recommendation Systems for Retail + +Typically recommendation systems in retail can be divided into three categories: + +* Collaborative filtering: This type of recommendation system makes predictions of what might interest a person based on the taste of many other users. It assumes that if person X likes Snickers, and person Y likes Snickers and Milky Way, then person X might like Milky Way as well. + +* Content-based filtering: This type of recommendation system focuses on the products themselves and recommends other products that have similar attributes. Content-based filtering relies on the characteristics of the products themselves, so it doesn’t rely on other users to interact with the products before making a recommendation.
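The collaborative filtering idea described in this patch (the Snickers/Milky Way example) can be sketched as item-to-item cosine similarity over user co-occurrence. This is a toy illustration with made-up names, not code from the Recommenders repo:

```python
from math import sqrt

# Toy implicit-feedback data: which users interacted with which items.
# User and item names are illustrative only.
interactions = {
    "user_x": {"snickers"},
    "user_y": {"snickers", "milky_way"},
    "user_z": {"snickers", "milky_way", "twix"},
}

def item_cosine_similarity(item_a, item_b, interactions):
    """Cosine similarity between two items over the users who touched them."""
    users_a = {u for u, items in interactions.items() if item_a in items}
    users_b = {u for u, items in interactions.items() if item_b in items}
    if not users_a or not users_b:
        return 0.0
    return len(users_a & users_b) / sqrt(len(users_a) * len(users_b))

def recommend(user, interactions, k=1):
    """Score unseen items by their total similarity to the user's items."""
    seen = interactions[user]
    all_items = set().union(*interactions.values())
    scores = {
        cand: sum(item_cosine_similarity(cand, s, interactions) for s in seen)
        for cand in all_items - seen
    }
    return sorted(scores, key=scores.get, reverse=True)[:k]

# user_x likes Snickers; Milky Way co-occurs with Snickers most often.
print(recommend("user_x", interactions))
```

With this data the top recommendation for `user_x` is `milky_way`, matching the intuition in the text above.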
+ +* Hybrid filtering: This type of recommendation system can implement a combination of any two of the above systems. + + + +sources: [1](https://emerj.com/ai-sector-overviews/use-cases-recommendation-systems/), \ No newline at end of file From 02bec05b63af54386d483157e2651f5594afebc8 Mon Sep 17 00:00:00 2001 From: miguelgfierro Date: Mon, 27 Apr 2020 13:02:01 +0100 Subject: [PATCH 06/61] retail --- scenarios/retail/README.md | 27 ++++++++++++++++++++++++++- 1 file changed, 26 insertions(+), 1 deletion(-) diff --git a/scenarios/retail/README.md b/scenarios/retail/README.md index 97eff606e0..e298c223a9 100644 --- a/scenarios/retail/README.md +++ b/scenarios/retail/README.md @@ -4,6 +4,25 @@ An increasing number of online companies are utilizing recommendation systems to Companies across many different areas of enterprise are beginning to implement recommendation systems in an attempt to enhance their customers’ online purchasing experience, increase sales and retain customers. Business owners are recognizing potential in the fact that recommendation systems allow the collection of a huge amount of information relating to users’ behavior and their transactions within an enterprise. This information can then be systematically stored within user profiles to be used for future interactions. +## Typical Business Scenarios in Recommendation Systems for Retail + +The most common scenarios companies use are: + +* Others you may like (also called similar items): The "Others you may like" recommendation predicts the next product that a user is most likely to engage with or purchase. The prediction is based on both the entire shopping or viewing history of the user and the candidate product's relevance to a current specified product. + +* "Frequently bought together" (shopping cart expansion): The "Frequently bought together" recommendation predicts items frequently bought together for a specific product within the same shopping session.
If a list of products is being viewed, then it predicts items frequently bought with that product list. This recommendation is useful when the user has indicated an intent to purchase a particular product (or list of products) already, and you are looking to recommend complements (as opposed to substitutes). This recommendation is commonly displayed on the "add to cart" page, or on the "shopping cart" or "registry" pages (for shopping cart expansion). + +* Recommended for you: The "Recommended for you" recommendation predicts the next product that a user is most likely to engage with or purchase, based on the shopping or viewing history of that user. This recommendation is typically used on the home page. + + +## Business success metrics + +Below are some of the various potential benefits of recommendation systems in business, and the metrics that typically are used: + + + + + ## Types of Recommendation Systems for Retail Typically recommendation systems in retail can be divided into three categories: @@ -15,5 +34,11 @@ Typically recommendation systems in retail can be divided into three categories: * Hybrid filtering: This type of recommendation system can implement a combination of any two of the above systems. +## Challenges in Recommendation systems for Retail + +* Cold start: Personalized recommender systems take advantage of users’ past history to make predictions. The cold start problem concerns the personalized recommendations for users with no or little past history (new users). Providing recommendations to users with little past history becomes a difficult problem for CF models because their learning and predictive ability is limited. Multiple research efforts have been conducted in this direction using hybrid models. These models use auxiliary information (multimodal information, side information, etc.) to overcome the cold start problem.
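A common mitigation for the cold start challenge described in this patch is to back off to a non-personalized ranking (e.g. global popularity) when a user has too little history for a CF model. The sketch below is hypothetical: the threshold, names, and fallback policy are illustrative, not taken from the repo:

```python
from collections import Counter

MIN_HISTORY = 3  # illustrative threshold, not a value from the repo

def recommend_with_fallback(user_history, personalized_model, all_interactions, k=2):
    """Use the personalized model only when the user has enough history;
    otherwise back off to globally popular items (a cold-start fallback)."""
    if len(user_history) >= MIN_HISTORY:
        return personalized_model(user_history)[:k]
    # Cold start: rank items by global popularity, excluding already-seen ones.
    popularity = Counter(
        item for items in all_interactions.values() for item in items
    )
    return [item for item, _ in popularity.most_common()
            if item not in user_history][:k]

# Interaction logs for existing users (illustrative data).
logs = {"u1": ["a", "b"], "u2": ["a", "c"], "u3": ["a", "b"]}
new_user_history = ["c"]  # too short for personalization
print(recommend_with_fallback(new_user_history, lambda h: [], logs))
```

In practice the fallback could also use side information (item content, user demographics), which is the hybrid-model direction the text mentions.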
+ + +## References and resources -sources: [1](https://emerj.com/ai-sector-overviews/use-cases-recommendation-systems/), \ No newline at end of file +sources: [1](https://emerj.com/ai-sector-overviews/use-cases-recommendation-systems/), [2](https://cloud.google.com/recommendations-ai/docs/placements), [3](https://www.researchgate.net/post/Can_anyone_explain_what_is_cold_start_problem_in_recommender_system) \ No newline at end of file From 9c359834333b9c22cf9e3b74dc19f4e413a4fb8a Mon Sep 17 00:00:00 2001 From: miguelgfierro Date: Mon, 27 Apr 2020 13:11:14 +0100 Subject: [PATCH 07/61] retail --- scenarios/retail/README.md | 44 ++++++++++++++++++++++++++++++------ 1 file changed, 37 insertions(+), 7 deletions(-) diff --git a/scenarios/retail/README.md b/scenarios/retail/README.md index e298c223a9..f08603ce77 100644 --- a/scenarios/retail/README.md +++ b/scenarios/retail/README.md @@ -15,29 +15,59 @@ The most common scenarios companies use are: * Recommended for you: The "Recommended for you" recommendation predicts the next product that a user is most likely to engage with or purchase, based on the shopping or viewing history of that user. This recommendation is typically used on the home page. -## Business success metrics +## Types of Recommendation Systems for Retail -Below are some of the various potential benefits of recommendation systems in business, and the metrics that typically are used: +Typically recommendation systems in retail can be divided into three categories: +* Collaborative filtering: This type of recommendation system makes predictions of what might interest a person based on the taste of many other users. It assumes that if person X likes Snickers, and person Y likes Snickers and Milky Way, then person X might like Milky Way as well. +* Content-based filtering: This type of recommendation system focuses on the products themselves and recommends other products that have similar attributes.
Content-based filtering relies on the characteristics of the products themselves, so it doesn’t rely on other users to interact with the products before making a recommendation. +* Hybrid filtering: This type of recommendation system can implement a combination of any two of the above systems. -## Types of Recommendation Systems for Retail +## Measuring Recommendation performance -Typically recommendation systems in retail can be divided into three categories: +### Machine learning metrics (offline metrics) -* Collaborative filtering: This type of recommendation system makes predictions of what might interest a person based on the taste of many other users. It assumes that if person X likes Snickers, and person Y likes Snickers and Milky Way, then person X might like Milky Way as well. +In Recommenders, offline metrics implementations for Python are found in [python_evaluation.py](https://github.com/microsoft/recommenders/blob/master/reco_utils/evaluation/python_evaluation.py) and those for PySpark are found in [spark_evaluation.py](https://github.com/microsoft/recommenders/blob/master/reco_utils/evaluation/spark_evaluation.py). -* Content-based filtering: This type of recommendation system focuses on the products themselves and recommends other products that have similar attributes. Content-based filtering relies on the characteristics of the products themselves, so it doesn’t rely on other users to interact with the products before making a recommendation. +Currently available metrics include: -* Hybrid filtering: This type of recommendation system can implement a combination of any two of the above systems.
+ +- Root Mean Squared Error +- Mean Absolute Error +- R2 +- Explained Variance +- Precision at K +- Recall at K +- Normalized Discounted Cumulative Gain at K +- Mean Average Precision at K +- Area Under Curve +- Logistic Loss + +### Business success metrics (online metrics) + +Below are some of the various potential benefits of recommendation systems in business, and the metrics that typically are used: + +* Click-through rate (CTR): Optimizing for CTR emphasizes engagement; you should optimize for CTR when you want to maximize the likelihood that the user interacts with the recommendation. +* Revenue per order: The revenue per order optimization objective is the default optimization objective for the "Frequently bought together" recommendation model type. This optimization objective cannot be specified for any other recommendation model type. + +* Conversion rate: Optimizing for conversion rate maximizes the likelihood that the user purchases the recommended item; if you want to increase the number of purchases per session, optimize for conversion rate. + +### Relationship between online and offline metrics + +### A/B testing + +### Advanced A/B testing: online learning with VW ## Challenges in Recommendation systems for Retail * Cold start: Personalized recommender systems take advantage of users’ past history to make predictions. The cold start problem concerns the personalized recommendations for users with no or little past history (new users). Providing recommendations to users with little past history becomes a difficult problem for CF models because their learning and predictive ability is limited. Multiple research efforts have been conducted in this direction using hybrid models. These models use auxiliary information (multimodal information, side information, etc.) to overcome the cold start problem.
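The ranking metrics added in this patch (Precision at K, Recall at K) can be sketched in a few lines. This is an illustrative definition for intuition only; the repo's actual implementations live in `python_evaluation.py` and `spark_evaluation.py`:

```python
def precision_at_k(recommended, relevant, k):
    """Fraction of the top-k recommended items that are relevant."""
    top_k = recommended[:k]
    return sum(1 for item in top_k if item in relevant) / k

def recall_at_k(recommended, relevant, k):
    """Fraction of all relevant items that appear in the top-k."""
    top_k = recommended[:k]
    return sum(1 for item in top_k if item in relevant) / len(relevant)

recommended = ["a", "b", "c", "d"]   # model ranking, best first (toy data)
relevant = {"b", "d", "e"}           # ground-truth items the user liked

print(precision_at_k(recommended, relevant, 2))
print(recall_at_k(recommended, relevant, 2))
```

With the toy data above, only "b" is a hit in the top 2, so precision@2 is 0.5 and recall@2 is 1/3.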
+ +* Long tail products: + +## ## References and resources From 3e3756c63eede791f0ece338c8e91e2216f58d59 Mon Sep 17 00:00:00 2001 From: miguelgfierro Date: Mon, 27 Apr 2020 13:21:24 +0100 Subject: [PATCH 08/61] retail --- scenarios/retail/README.md | 8 +++++++- 1 file changed, 7 insertions(+), 1 deletion(-) diff --git a/scenarios/retail/README.md b/scenarios/retail/README.md index f08603ce77..49e9fd40e6 100644 --- a/scenarios/retail/README.md +++ b/scenarios/retail/README.md @@ -67,7 +67,13 @@ Below are some of the various potential benefits of recommendation systems in bu * Long tail products: -## +## Building end-to-end recommendation scenarios with Microsoft Recommenders + +In the repository, we have the following examples that can be used in retail: + + + + ## References and resources From 7864b8e437b3fd46429fb4c9ac5d539e2be295be Mon Sep 17 00:00:00 2001 From: miguelgfierro Date: Mon, 27 Apr 2020 13:29:04 +0100 Subject: [PATCH 09/61] retail --- scenarios/retail/README.md | 18 ++++++++++++++++++ 1 file changed, 18 insertions(+) diff --git a/scenarios/retail/README.md b/scenarios/retail/README.md index 49e9fd40e6..6fd2534268 100644 --- a/scenarios/retail/README.md +++ b/scenarios/retail/README.md @@ -67,11 +67,29 @@ Below are some of the various potential benefits of recommendation systems in bu * Long tail products: +## Data types + + * Explicit interactions: + + * Implicit interactions: + + * Knowledge graph data: + + * User features: + + * Item features: + + Thoughts about data size...
+ +## Building end-to-end recommendation scenarios with Microsoft Recommenders In the repository, we have the following examples that can be used in retail: +| Scenario | Description | Algorithm | Implementation | +|----------|-------------|-----------|----------------| +| Collaborative Filtering with explicit interactions in Spark environment | Matrix factorization algorithm for explicit feedback in large datasets, optimized by Spark MLlib for scalability and distributed computing capability | Alternating Least Squares (ALS) | [pyspark notebook using Movielens dataset](https://github.com/microsoft/recommenders/blob/staging/notebooks/00_quick_start/als_movielens.ipynb) | +| Content-Based Filtering for content recommendation in Spark environment | Gradient Boosting Tree algorithm for fast training and low memory usage in content-based problems | LightGBM/MMLSpark | [spark notebook using Criteo dataset](https://github.com/microsoft/recommenders/blob/staging/notebooks/02_model/mmlspark_lightgbm_criteo.ipynb) | From 44772c7e243fd9ec266f18b932258bca9bc196d6 Mon Sep 17 00:00:00 2001 From: miguelgfierro Date: Thu, 21 May 2020 13:35:25 +0100 Subject: [PATCH 10/61] comments @yueguoguo --- scenarios/COLD_START.md | 1 - scenarios/README.md | 64 +++++++++++++++++++++++++++++++++++++- scenarios/retail/README.md | 43 +++++++++++++++--------------- 3 files changed, 68 insertions(+), 40 deletions(-) delete mode 100644 scenarios/COLD_START.md diff --git a/scenarios/COLD_START.md b/scenarios/COLD_START.md deleted file mode 100644 index 9c0bc8770e..0000000000 --- a/scenarios/COLD_START.md +++ /dev/null @@ -1 +0,0 @@ -# Managing Cold Start Scenarios in Recommendation Systems diff --git a/scenarios/README.md b/scenarios/README.md index 92dea9b14d..c5b4619cc7 100644 --- a/scenarios/README.md +++ b/scenarios/README.md @@ -1,6 +1,6 @@ # Recommendation System Scenarios -This section lists a number of business scenarios that are common in Recommendation Systems.
+This section lists a number of business scenarios that are common in Recommendation Systems (RS). The list of scenarios is: @@ -10,3 +10,65 @@ The list of scenarios is: * News * Retail * Travel + +## Types of Recommendation Systems + +Typically, recommendation systems can be divided into three categories: + +* Collaborative filtering: This type of recommendation system makes predictions of what might interest a person based on the taste of many other users. It assumes that if person X likes Snickers, and person Y likes Snickers and Milky Way, then person X might like Milky Way as well. + +* Content-based filtering: This type of recommendation system focuses on the products themselves and recommends other products that have similar attributes. Content-based filtering relies on the characteristics of the products themselves, so it doesn’t rely on other users to interact with the products before making a recommendation. + +* Hybrid filtering: This type of recommendation system can implement a combination of any two of the above systems. + +## Data in Recommendation Systems + +### Data types + +* Explicit interactions: + +* Implicit interactions: + +* Knowledge graph data: + +* User features: + +* Item features: + +### Considerations about data size + +The size of the data is important when designing the system... + + +## Metrics + +In RS, there are two types of metrics: offline and online metrics. + +### Machine learning metrics (offline metrics) + +In Recommenders, offline metrics implementations for Python are found in [python_evaluation.py](https://github.com/microsoft/recommenders/blob/master/reco_utils/evaluation/python_evaluation.py) and those for PySpark are found in [spark_evaluation.py](https://github.com/microsoft/recommenders/blob/master/reco_utils/evaluation/spark_evaluation.py).
+ +Currently available metrics include: + +- Root Mean Squared Error +- Mean Absolute Error +- R2 +- Explained Variance +- Precision at K +- Recall at K +- Normalized Discounted Cumulative Gain at K +- Mean Average Precision at K +- Area Under Curve +- Logistic Loss + + +### Business success metrics (online metrics) + +Online metrics are specific to the business scenario. More details can be found in each scenario folder. + +## Managing Cold Start Scenarios in Recommendation Systems + +.... + + + diff --git a/scenarios/retail/README.md b/scenarios/retail/README.md index 6fd2534268..85a6793837 100644 --- a/scenarios/retail/README.md +++ b/scenarios/retail/README.md @@ -15,35 +15,11 @@ The most common scenarios companies use are: * Recommended for you: The "Recommended for you" recommendation predicts the next product that a user is most likely to engage with or purchase, based on the shopping or viewing history of that user. This recommendation is typically used on the home page. -## Types of Recommendation Systems for Retail - -Typically recommendation systems in retail can be divided into three categories: - -* Collaborative filtering: This type of recommendation system makes predictions of what might interest a person based on the taste of many other users. It assumes that if person X likes Snickers, and person Y likes Snickers and Milky Way, then person X might like Milky Way as well. - -* Content-based filtering: This type of recommendation system focuses on the products themselves and recommends other products that have similar attributes. Content-based filtering relies on the characteristics of the products themselves, so it doesn’t rely on other users to interact with the products before making a recommendation. - -* Hybrid filtering: This type of recommendation system can implement a combination fo any two of the above systems.
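The ranking metrics in the list above, such as Precision at K and Recall at K, reduce to a few lines of plain Python. The following is a simplified sketch for intuition, not the reco_utils implementation:

```python
def precision_at_k(recommended, relevant, k):
    """Fraction of the top-k recommended items that are relevant."""
    top_k = recommended[:k]
    hits = sum(1 for item in top_k if item in relevant)
    return hits / k

def recall_at_k(recommended, relevant, k):
    """Fraction of the relevant items that appear in the top-k."""
    top_k = recommended[:k]
    hits = sum(1 for item in top_k if item in relevant)
    return hits / len(relevant)

recommended = ["A", "B", "C", "D", "E"]  # ranked model output
relevant = {"B", "E", "F"}               # items the user actually interacted with
print(precision_at_k(recommended, relevant, 3))  # 1 hit ("B") in top 3 -> 0.333...
print(recall_at_k(recommended, relevant, 3))     # 1 of 3 relevant -> 0.333...
```

The reco_utils versions operate on dataframes of many users at once; the per-user logic is the same.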
- - ## Measuring Recommendation performance ### Machine learning metrics (offline metrics) -In Recommenders, offine metrics implementation for python are found on [python_evaluation.py](https://github.com/microsoft/recommenders/blob/master/reco_utils/evaluation/python_evaluation.py) and those for PySpark are found on [spark_evaluation.py](https://github.com/microsoft/recommenders/blob/master/reco_utils/evaluation/spark_evaluation.py). - -Currently available metrics include: - -- Root Mean Squared Error -- Mean Absolute Error -- R2 -- Explained Variance -- Precision at K -- Recall at K -- Normalized Discounted Cumulative Gain at K -- Mean Average Precision at K -- Area Under Curve -- Logistic Loss +Please [see the main metrics description]() for understanding machine learning metrics. ### Business success metrics (online metrics) @@ -55,7 +31,10 @@ Below are some of the various potential benefits of recommendation systems in bu * Conversion rate: Optimizing for conversion rate maximizes the likelihood that the user purchases the recommended item; if you want to increase the number of purchases per session, optimize for conversion rate. -### Relationship between online and offline metrics +### Relationship between online and offline metrics in retail + +There is some literature about the relationship between offline and online metrics... + ### A/B testing @@ -67,19 +46,7 @@ Below are some of the various potential benefits of recommendation systems in bu * Long tail products: -## Data types - - * Explicit interactions: - - * Implicit interactions: - - * Knowledge graph data: - - * User features: - - * Item features: - Thoughts about data size... 
## Building end 2 end recommendation scenarios with Microsoft Recommenders From 5db328f7d28e37fc97497a4d567660d5538ab73d Mon Sep 17 00:00:00 2001 From: miguelgfierro Date: Thu, 4 Jun 2020 12:37:26 +0100 Subject: [PATCH 11/61] advance --- scenarios/README.md | 23 +++++++---------------- scenarios/retail/README.md | 11 +++++++++-- 2 files changed, 16 insertions(+), 18 deletions(-) diff --git a/scenarios/README.md b/scenarios/README.md index c5b4619cc7..af98cd720e 100644 --- a/scenarios/README.md +++ b/scenarios/README.md @@ -1,25 +1,16 @@ # Recommendation System Scenarios -On this section there is listed a number of business scenarios that are common in Recommendation Systems (RS). +On this section there is listed a number of business scenarios that are common in Recommendation Systems. The list of scenarios are: -* Ads -* Entertainment -* Food and restaurants -* News -* Retail -* Travel +* [Ads](ads) +* [Entertainment](entertainment) +* [Food and restaurants](food_and_restaurants) +* [News and document]() +* [Retail](retail) +* [Travel](travel) -## Types of Recommendation Systems - -Typically recommendation systems in retail can be divided into three categories: - -* Collaborative filtering: This type of recommendation system makes predictions of what might interest a person based on the taste of many other users. It assumes that if person X likes Snickers, and person Y likes Snickers and Milky Way, then person X might like Milky Way as well. - -* Content-based filtering: This type of recommendation system focuses on the products themselves and recommends other products that have similar attributes. Content-based filtering relies on the characteristics of the products themselves, so it doesn’t rely on other users to interact with the products before making a recommendation. - -* Hybrid filtering: This type of recommendation system can implement a combination fo any two of the above systems. 
## Data in Recommendation Systems diff --git a/scenarios/retail/README.md b/scenarios/retail/README.md index 85a6793837..9aae527ab4 100644 --- a/scenarios/retail/README.md +++ b/scenarios/retail/README.md @@ -1,6 +1,6 @@ # Recommendation systems for Retail -An increasing number of online companies are utilizing recommendation systems to increase user interaction and enrich shopping potential. Use cases of recommendation systems have been expanding rapidly across many aspects of eCommerce and online media over the last 4-5 years, and we expect this trend to continue. +An increasing number of online companies are utilizing recommendation systems (RS) to increase user interaction and enrich shopping potential. Use cases of recommendation systems have been expanding rapidly across many aspects of eCommerce and online media over the last 4-5 years, and we expect this trend to continue. Companies across many different areas of enterprise are beginning to implement recommendation systems in an attempt to enhance their customer’s online purchasing experience, increase sales and retain customers. Business owners are recognizing potential in the fact that recommendation systems allow the collection of a huge amount of information relating to user’s behavior and their transactions within an enterprise. This information can then be systematically stored within user profiles to be used for future interactions. @@ -14,12 +14,19 @@ The most common scenarios companies use are: * Recommended for you: The "Recommended for you" recommendation predicts the next product that a user is most likely to engage with or purchase, based on the shopping or viewing history of that user. This recommendation is typically used on the home page. +From a technical perspective, RS can be grouped in three categories: + +* Collaborative filtering: This type of recommendation system makes predictions of what might interest a person based on the taste of many other users. 
It assumes that if person X likes Snickers, and person Y likes Snickers and Milky Way, then person X might like Milky Way as well. See the [list of examples in Recommenders repository](../../examples/02_model_collaborative_filtering). + +* Content-based filtering: This type of recommendation system focuses on the products themselves and recommends other products that have similar attributes. Content-based filtering relies on the characteristics of the products themselves, so it doesn’t rely on other users to interact with the products before making a recommendation. See the [list of examples in Recommenders repository](../../examples/02_model_content_based_filtering). + +* Hybrid filtering: This type of recommendation system can implement a combination of any two of the above systems. See the [list of examples in Recommenders repository](../../examples/02_model_hybrid). ## Measuring Recommendation performance ### Machine learning metrics (offline metrics) -Please [see the main metrics description]() for understanding machine learning metrics. +Offline metrics in RS are based on rating, ranking, classification or diversity.
For learning more about offline metrics, see the [definitions available in Recommenders repository](../../examples/03_evaluate) ### Business success metrics (online metrics) From 8eb19faa980c0cb903fb9cc0e0a75dc53009021e Mon Sep 17 00:00:00 2001 From: miguelgfierro Date: Thu, 4 Jun 2020 12:57:07 +0100 Subject: [PATCH 12/61] advance --- scenarios/README.md | 49 -------------------------------------- scenarios/retail/README.md | 25 +++++++++++++++++++ 2 files changed, 25 insertions(+), 49 deletions(-) diff --git a/scenarios/README.md b/scenarios/README.md index af98cd720e..e3d4c463e5 100644 --- a/scenarios/README.md +++ b/scenarios/README.md @@ -12,54 +12,5 @@ The list of scenarios are: * [Travel](travel) -## Data in Recommendation Systems - -### Data types - -* Explicit interactions: - -* Implicit interactions: - -* Knowledge graph data: - -* User features: - -* Item features: - -### Considerations about data size - -The size of the data is important when designing the system... - - -## Metrics - -In RS, there are two types of metrics: offline and online metrics. - -### Machine learning metrics (offline metrics) - -In Recommenders, offine metrics implementation for python are found on [python_evaluation.py](https://github.com/microsoft/recommenders/blob/master/reco_utils/evaluation/python_evaluation.py) and those for PySpark are found on [spark_evaluation.py](https://github.com/microsoft/recommenders/blob/master/reco_utils/evaluation/spark_evaluation.py). - -Currently available metrics include: - -- Root Mean Squared Error -- Mean Absolute Error -- R2 -- Explained Variance -- Precision at K -- Recall at K -- Normalized Discounted Cumulative Gain at K -- Mean Average Precision at K -- Area Under Curve -- Logistic Loss - - -### Business success metrics (online metrics) - -Online metrics are specific on the business scenario. More details can be found on each scenario folder. - -## Managing Cold Start Scenarios in Recommendation Systems - -.... 
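The rating metrics in the list above (Root Mean Squared Error, Mean Absolute Error) are also compact. Again, this is a minimal sketch rather than the reco_utils version:

```python
import math

def rmse(actual, predicted):
    """Root Mean Squared Error between true and predicted ratings."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))

def mae(actual, predicted):
    """Mean Absolute Error between true and predicted ratings."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

actual = [4.0, 3.0, 5.0, 1.0]     # observed ratings
predicted = [3.5, 3.0, 4.0, 2.0]  # model predictions for the same (user, item) pairs
print(rmse(actual, predicted))  # -> 0.75
print(mae(actual, predicted))   # -> 0.625
```

RMSE penalizes large errors more heavily than MAE, which is why the two are usually reported together.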
- diff --git a/scenarios/retail/README.md b/scenarios/retail/README.md index 9aae527ab4..39ef11b3e3 100644 --- a/scenarios/retail/README.md +++ b/scenarios/retail/README.md @@ -22,6 +22,31 @@ From a technical perspective, RS can be grouped in three categories: * Hybrid filtering: This type of recommendation system can implement a combination fo any two of the above systems. See the [list of examples in Recommenders repository](../../examples/02_model_hybrid). +## Data in Recommendation Systems + +### Data types + +In RS for retail, there are typically the following types of data: + +* Explicit interactions: When a user explicitly rates an item, typically between 1-5, the user is indicating how much they like the item. In retail, this kind of data is not very common. + +* Implicit interactions: Implicit interactions are views or clicks that show a certain interest of the user in a specific item. This kind of data is more common, but it doesn't define the intention of the user as clearly as explicit data. + +* User features: These include all the information that defines the user; some examples are name, address, email, demographics, etc. + +* Item features: These include information about the item; some examples are SKU, description, brand, price, etc. + +* Knowledge graph data: ... + +### Considerations about data size + +The size of the data is important when designing the system... + +### Cold start scenarios + +....
+ + ## Measuring Recommendation performance ### Machine learning metrics (offline metrics) From f3ddcaec936816ec596ba940ed995f512839ad4d Mon Sep 17 00:00:00 2001 From: miguelgfierro Date: Thu, 4 Jun 2020 13:28:06 +0100 Subject: [PATCH 13/61] advance --- scenarios/retail/README.md | 24 +++++++++++++----------- 1 file changed, 13 insertions(+), 11 deletions(-) diff --git a/scenarios/retail/README.md b/scenarios/retail/README.md index 39ef11b3e3..e109ecc3ff 100644 --- a/scenarios/retail/README.md +++ b/scenarios/retail/README.md @@ -14,7 +14,7 @@ The most common scenarios companies use are: * Recommended for you: The "Recommended for you" recommendation predicts the next product that a user is most likely to engage with or purchase, based on the shopping or viewing history of that user. This recommendation is typically used on the home page. -From a technical perspective, RS can be grouped in three categories: +From a technical perspective, RS can be grouped in these categories [1]: * Collaborative filtering: This type of recommendation system makes predictions of what might interest a person based on the taste of many other users. It assumes that if person X likes Snickers, and person Y likes Snickers and Milky Way, then person X might like Milky Way as well. See the [list of examples in Recommenders repository](../../examples/02_model_collaborative_filtering). @@ -22,6 +22,8 @@ From a technical perspective, RS can be grouped in three categories: * Hybrid filtering: This type of recommendation system can implement a combination fo any two of the above systems. See the [list of examples in Recommenders repository](../../examples/02_model_hybrid). +* Knowledge-base: ... + ## Data in Recommendation Systems ### Data types @@ -44,8 +46,11 @@ The size of the data is important when designing the system... ### Cold start scenarios -.... +Personalized recommender systems take advantage of users past history to make predictions. 
The cold start problem concerns personalized recommendations for users with little or no past history (new users). Providing recommendations to users with a small past history becomes a difficult problem for CF models because their learning and predictive ability is limited. Multiple research studies have been conducted in this direction using hybrid models. These models use auxiliary information (multimodal information, side information, etc.) to overcome the cold start problem. + +### Long tail products + +Typically, the distribution of item interactions in retail follows a long tail [1,2]. ## Measuring Recommendation performance @@ -72,19 +77,12 @@ There is some literature about the relationship between offline and online metri ### Advanced A/B testing: online learning with VW -## Challenges in Recommendation systems for Retail - -* Cold start: Personalized recommender systems take advantage of users past history to make predictions. The cold start problem concerns the personalized recommendations for users with no or few past history (new users). Providing recommendations to users with small past history becomes a difficult problem for CF models because their learning and predictive ability is limited. Multiple research have been conducted in this direction using hybrid models. These models use auxiliary information (multimodal information, side information, etc.) to overcome the cold start problem. - -* Long tail products: +...
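The long tail distribution mentioned above can be inspected directly from an interaction log. In this sketch the item IDs and counts are made up for illustration; it measures what share of interactions the most popular items capture:

```python
from collections import Counter

# Hypothetical interaction log (item IDs and counts are made up for illustration).
interactions = (["item_a"] * 50 + ["item_b"] * 25 + ["item_c"] * 10 +
                ["item_d"] * 5 + ["item_e"] * 5 + ["item_f"] * 3 + ["item_g"] * 2)

counts = Counter(interactions)
total = sum(counts.values())
# Head = the top 20% most popular items; the long tail is everything else.
n_head = max(1, int(len(counts) * 0.2))
head_items = [item for item, _ in counts.most_common(n_head)]
head_share = sum(counts[i] for i in head_items) / total
print(f"{n_head} head item(s) capture {head_share:.0%} of interactions")
```

A heavily skewed head share like this is the signature of the long tail described in [1,2].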
- - -## Building end 2 end recommendation scenarios with Microsoft Recommenders +## Examples of end 2 end recommendation scenarios with Microsoft Recommenders In the repository we have the following examples that can be used in retail - | Scenario | Description | Algorithm | Implementation | |----------|-------------|-----------|----------------| | Collaborative Filtering with explicit interactions in Spark environment | Matrix factorization algorithm for explicit feedback in large datasets, optimized by Spark MLLib for scalability and distributed computing capability | Alternating Least Squares (ALS) | [pyspark notebook using Movielens dataset](https://github.com/microsoft/recommenders/blob/staging/notebooks/00_quick_start/als_movielens.ipynb) | @@ -94,4 +92,8 @@ In the repository we have the following examples that can be used in retail ## References and resources +[1] Aggarwal, Charu C. Recommender systems. Vol. 1. Cham: Springer International Publishing, 2016. +[2]. Park, Yoon-Joo, and Alexander Tuzhilin. "The long tail of recommender systems and how to leverage it." In Proceedings of the 2008 ACM conference on Recommender systems, pp. 11-18. 2008. [Link to paper](http://people.stern.nyu.edu/atuzhili/pdf/Park-Tuzhilin-RecSys08-final.pdf). +[3]. Armstrong, Robert. "The long tail: Why the future of business is selling less of more." Canadian Journal of Communication 33, no. 1 (2008). [Link to paper](https://www.cjc-online.ca/index.php/journal/article/view/1946/3141). 
+ sources: [1](https://emerj.com/ai-sector-overviews/use-cases-recommendation-systems/), [2](https://cloud.google.com/recommendations-ai/docs/placements), [3](https://www.researchgate.net/post/Can_anyone_explain_what_is_cold_start_problem_in_recommender_system) \ No newline at end of file From f01dcb6b9189a65d244573fdff25cb4acd2d0d62 Mon Sep 17 00:00:00 2001 From: miguelgfierro Date: Thu, 4 Jun 2020 14:28:57 +0100 Subject: [PATCH 14/61] review --- scenarios/retail/README.md | 19 ++++++++++--------- 1 file changed, 10 insertions(+), 9 deletions(-) diff --git a/scenarios/retail/README.md b/scenarios/retail/README.md index e109ecc3ff..0e19d6ac3b 100644 --- a/scenarios/retail/README.md +++ b/scenarios/retail/README.md @@ -14,15 +14,6 @@ The most common scenarios companies use are: * Recommended for you: The "Recommended for you" recommendation predicts the next product that a user is most likely to engage with or purchase, based on the shopping or viewing history of that user. This recommendation is typically used on the home page. -From a technical perspective, RS can be grouped in these categories [1]: - -* Collaborative filtering: This type of recommendation system makes predictions of what might interest a person based on the taste of many other users. It assumes that if person X likes Snickers, and person Y likes Snickers and Milky Way, then person X might like Milky Way as well. See the [list of examples in Recommenders repository](../../examples/02_model_collaborative_filtering). - -* Content-based filtering: This type of recommendation system focuses on the products themselves and recommends other products that have similar attributes. Content-based filtering relies on the characteristics of the products themselves, so it doesn’t rely on other users to interact with the products before making a recommendation. See the [list of examples in Recommenders repository](../../examples/02_model_content_based_filtering). 
- -* Hybrid filtering: This type of recommendation system can implement a combination fo any two of the above systems. See the [list of examples in Recommenders repository](../../examples/02_model_hybrid). - -* Knowledge-base: ... ## Data in Recommendation Systems @@ -81,6 +72,16 @@ There is some literature about the relationship between offline and online metri ## Examples of end 2 end recommendation scenarios with Microsoft Recommenders +From a technical perspective, RS can be grouped into these categories [1]: + +* Collaborative filtering: This type of recommendation system makes predictions of what might interest a person based on the taste of many other users. It assumes that if person X likes Snickers, and person Y likes Snickers and Milky Way, then person X might like Milky Way as well. See the [list of examples in Recommenders repository](../../examples/02_model_collaborative_filtering). + +* Content-based filtering: This type of recommendation system focuses on the products themselves and recommends other products that have similar attributes. Content-based filtering relies on the characteristics of the products themselves, so it doesn’t rely on other users to interact with the products before making a recommendation. See the [list of examples in Recommenders repository](../../examples/02_model_content_based_filtering). + +* Hybrid filtering: This type of recommendation system can implement a combination of any two of the above systems. See the [list of examples in Recommenders repository](../../examples/02_model_hybrid). + +* Knowledge-base: ...
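The collaborative filtering intuition above (person X likes Snickers; person Y likes Snickers and Milky Way) can be sketched as a tiny item co-occurrence recommender. This illustrates the principle only and is not one of the algorithms in the repository:

```python
from collections import defaultdict

# Hypothetical purchase histories, following the Snickers / Milky Way example.
baskets = [
    {"snickers", "milky_way"},
    {"snickers", "milky_way"},
    {"snickers", "twix"},
    {"milky_way"},
]

# Count how often each pair of items was bought by the same user.
co_counts = defaultdict(int)
for basket in baskets:
    for a in basket:
        for b in basket:
            if a != b:
                co_counts[(a, b)] += 1

def recommend(item, k=2):
    """Items most often co-purchased with `item`, best first."""
    scores = {b: c for (a, b), c in co_counts.items() if a == item}
    return sorted(scores, key=scores.get, reverse=True)[:k]

print(recommend("snickers"))  # -> ['milky_way', 'twix']
```

Real collaborative filtering models (ALS, NCF, SAR, etc.) generalize this co-occurrence idea with weighting, decay and latent factors.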
+ In the repository we have the following examples that can be used in retail From 1e78d527b2754b45198e6b3a7a72e3a9c2bd7d0c Mon Sep 17 00:00:00 2001 From: miguelgfierro Date: Thu, 11 Jun 2020 12:20:59 +0100 Subject: [PATCH 15/61] scenarios --- scenarios/retail/README.md | 17 +++++++++++------ 1 file changed, 11 insertions(+), 6 deletions(-) diff --git a/scenarios/retail/README.md b/scenarios/retail/README.md index 0e19d6ac3b..3f94095bca 100644 --- a/scenarios/retail/README.md +++ b/scenarios/retail/README.md @@ -1,18 +1,22 @@ # Recommendation systems for Retail -An increasing number of online companies are utilizing recommendation systems (RS) to increase user interaction and enrich shopping potential. Use cases of recommendation systems have been expanding rapidly across many aspects of eCommerce and online media over the last 4-5 years, and we expect this trend to continue. +Retail is one of the areas where recommendation systems have been most successful. According to [McKinsey](https://www.mckinsey.com/industries/retail/our-insights/how-retailers-can-keep-up-with-consumers#), 35% of what consumers purchase on Amazon and 75% of what they watch on Netflix come from product recommendations. -Companies across many different areas of enterprise are beginning to implement recommendation systems in an attempt to enhance their customer’s online purchasing experience, increase sales and retain customers. Business owners are recognizing potential in the fact that recommendation systems allow the collection of a huge amount of information relating to user’s behavior and their transactions within an enterprise. This information can then be systematically stored within user profiles to be used for future interactions.
+An increasing number of online retailers are utilizing recommendation systems to increase revenue, improve customer engagement and satisfaction, increase time on the page, enhance customer’s purchasing experience, gain understanding about customers, expand the shopping cart, etc. ## Typical Business Scenarios in Recommendation Systems for Retail -The most common scenarios companies use are: +The most common scenarios retailers use are: -* Others you may like (also called similar items): The "Others you may like" recommendation predicts the next product that a user is most likely to engage with or purchase. The prediction is based on both the entire shopping or viewing history of the user and the candidate product's relevance to a current specified product. +* Personalized recommendation: This scenario predicts which products or set of products a user is most likely to engage with or purchase, based on the shopping or viewing history of that user. This scenario is commonly shown on the home page or in a personalized newsletter. -* Frequently bought together"(shopping cart expansion): The "Frequently bought together" recommendation predicts items frequently bought together for a specific product within the same shopping session. If a list of products is being viewed, then it predicts items frequently bought with that product list. This recommendation is useful when the user has indicated an intent to purchase a particular product (or list of products) already, and you are looking to recommend complements (as opposed to substitutes). This recommendation is commonly displayed on the "add to cart" page, or on the "shopping cart" or "registry" pages (for shopping cart expansion). +* You might also like: This recommendation scenario is similar to the personalized recommendation as it predicts the next product a user is likely to engage with or purchase. 
However, the starting point is typically a product page, so in addition to considering the entire shopping or viewing history of the user, the relevance of the specified product in relation to other items is used to recommend additional products. + +* Frequently bought together: This scenario is the machine learning solution for up-selling and cross-selling. Frequently bought together predicts which product or set of products are complementary or usually bought together with a specified product, as opposed to substituting it. Normally, this scenario is displayed on the shopping cart page, just before buying. + +* Similar alternatives: This scenario covers down-selling or out-of-stock alternatives, and its objective is to avoid losing a sale. Similar alternatives predicts other products with similar features, like price, type, brand or visual appearance. + +* Recommendations of a product subset: In certain situations, the retailer would like to recommend products from a subset; they could be products for sale, products with a high margin or products that have a low number of units left. This scenario can be used to delimit the outputs of all previous recommendation scenarios. ## Data in Recommendation Systems @@ -97,4 +101,5 @@ In the repository we have the following examples that can be used in retail [2]. Park, Yoon-Joo, and Alexander Tuzhilin. "The long tail of recommender systems and how to leverage it." In Proceedings of the 2008 ACM conference on Recommender systems, pp. 11-18. 2008. [Link to paper](http://people.stern.nyu.edu/atuzhili/pdf/Park-Tuzhilin-RecSys08-final.pdf). [3]. Armstrong, Robert. "The long tail: Why the future of business is selling less of more." Canadian Journal of Communication 33, no.
1 (2008). [Link to paper](https://www.cjc-online.ca/index.php/journal/article/view/1946/3141). + sources: [1](https://emerj.com/ai-sector-overviews/use-cases-recommendation-systems/), [2](https://cloud.google.com/recommendations-ai/docs/placements), [3](https://www.researchgate.net/post/Can_anyone_explain_what_is_cold_start_problem_in_recommender_system) \ No newline at end of file From 60d9587fa4e4c9585223b4009e4298b48f6e449c Mon Sep 17 00:00:00 2001 From: miguelgfierro Date: Thu, 11 Jun 2020 12:28:58 +0100 Subject: [PATCH 16/61] structure change --- scenarios/retail/README2.md | 118 ++++++++++++++++++++++++++++++++++++ 1 file changed, 118 insertions(+) create mode 100644 scenarios/retail/README2.md diff --git a/scenarios/retail/README2.md b/scenarios/retail/README2.md new file mode 100644 index 0000000000..ffa9f4eea7 --- /dev/null +++ b/scenarios/retail/README2.md @@ -0,0 +1,118 @@ +# Recommendation systems for Retail + +Retail is one of the areas where recommendation systems have been most successful. According to [McKinsey](https://www.mckinsey.com/industries/retail/our-insights/how-retailers-can-keep-up-with-consumers#), 35% of what consumers purchase on Amazon and 75% of what they watch on Netflix come from product recommendations. + +An increasing number of online retailers are utilizing recommendation systems to increase revenue, improve customer engagement and satisfaction, increase time on the page, enhance customer’s purchasing experience, gain understanding about customers, expand the shopping cart, etc. + +Next, we will list the most common scenarios retailers use. + +## Personalized recommendation +
+ +The kind of data + +## You might also like + +This recommendation scenario is similar to the personalized recommendation as it predicts the next product a user is likely to engage with or purchase. However, the starting point is typically a product page, so in addition to considering the entire shopping or viewing history of the user, the relevance of the specified product in relation to other items is used to recommend additional products. + +## Frequently bought together + +This scenario is the machine learning solution for up-selling and cross-selling. Frequently bought together predicts which product or set of products are complementary or usually bought together with a specified product, as opposed as substituting it. Normally, this scenario is displayed in the shopping cart page, just before buying. + +## Similar alternatives + +This scenario covers down-selling or out of stock alternatives and its objective is to avoid loosing a sale. Similar alternatives predicts other products with similar features, like price, type, brand or visual appearance. + +## Recommendations of product subset + +In certain situations, the retailer would like to recommend products from a subset, they could be products for sale, products with a high margin or products that have a low number of units left. This scenario can be used to delimit the outputs of all previous recommendation scenarios. + + + + + +## Data in Recommendation Systems + +### Data types + +In RS for retail there are typically the following types of data + +* Explicit interactions: When a user explicitly rate an item, typically between 1-5, the user is giving a value on the likeliness of the item. In retail, this kind of data is not very common. + +* Implicit interactions: Implicit interactions are views or clicks that show a certain interest of the user about a specific items. These kind of data is more common but it doesn't define the intention of the user as clearly as the explicit data. 
+ +* User features: These include all the information that defines the user; some examples are name, address, email, demographics, etc. + +* Item features: These include information about the item; some examples are SKU, description, brand, price, etc. + +* Knowledge graph data: ... + +### Considerations about data size + +The size of the data is important when designing the system... + +### Cold start scenarios + +Personalized recommender systems take advantage of users' past history to make predictions. The cold start problem concerns personalized recommendations for users with little or no past history (new users). Providing recommendations to users with a small past history becomes a difficult problem for CF models because their learning and predictive ability is limited. Multiple research studies have been conducted in this direction using hybrid models. These models use auxiliary information (multimodal information, side information, etc.) to overcome the cold start problem. + +### Long tail products + +Typically, the distribution of item interactions in retail follows a long tail [1,2]. + +## Measuring Recommendation performance + +### Machine learning metrics (offline metrics) + +Offline metrics in RS are based on rating, ranking, classification or diversity. For learning more about offline metrics, see the [definitions available in Recommenders repository](../../examples/03_evaluate). + +### Business success metrics (online metrics) + +Below are some of the various potential benefits of recommendation systems in business, and the metrics that typically are used: + +* Click-through rate (CTR): Optimizing for CTR emphasizes engagement; you should optimize for CTR when you want to maximize the likelihood that the user interacts with the recommendation. + +* Revenue per order: The revenue per order optimization objective is the default optimization objective for the "Frequently bought together" recommendation model type.
This optimization objective cannot be specified for any other recommendation model type. + +* Conversion rate: Optimizing for conversion rate maximizes the likelihood that the user purchases the recommended item; if you want to increase the number of purchases per session, optimize for conversion rate. + +### Relationship between online and offline metrics in retail + +There is some literature about the relationship between offline and online metrics... + + +### A/B testing + +### Advanced A/B testing: online learning with VW + +... + +## Examples of end 2 end recommendation scenarios with Microsoft Recommenders + +From a technical perspective, RS can be grouped into these categories [1]: + +* Collaborative filtering: This type of recommendation system makes predictions of what might interest a person based on the taste of many other users. It assumes that if person X likes Snickers, and person Y likes Snickers and Milky Way, then person X might like Milky Way as well. See the [list of examples in Recommenders repository](../../examples/02_model_collaborative_filtering). + +* Content-based filtering: This type of recommendation system focuses on the products themselves and recommends other products that have similar attributes. Content-based filtering relies on the characteristics of the products themselves, so it doesn’t rely on other users to interact with the products before making a recommendation. See the [list of examples in Recommenders repository](../../examples/02_model_content_based_filtering). + +* Hybrid filtering: This type of recommendation system can implement a combination of any two of the above systems. See the [list of examples in Recommenders repository](../../examples/02_model_hybrid). + +* Knowledge-base: ...
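The ALS algorithm that appears in the examples table factorizes the user-item rating matrix into low-rank user and item factors. The sketch below illustrates the matrix factorization idea with plain SGD instead of the alternating least-squares solves used by Spark MLlib, so it is an approximation of the concept, not the library implementation:

```python
import random

def factorize(ratings, n_users, n_items, k=2, lr=0.01, reg=0.05, epochs=1000):
    """Learn user/item factors so that dot(U[u], V[i]) approximates each rating."""
    random.seed(0)  # deterministic toy initialization
    U = [[random.gauss(0, 0.1) for _ in range(k)] for _ in range(n_users)]
    V = [[random.gauss(0, 0.1) for _ in range(k)] for _ in range(n_items)]
    for _ in range(epochs):
        for u, i, r in ratings:
            pred = sum(U[u][f] * V[i][f] for f in range(k))
            err = r - pred
            for f in range(k):
                u_f = U[u][f]
                U[u][f] += lr * (err * V[i][f] - reg * U[u][f])
                V[i][f] += lr * (err * u_f - reg * V[i][f])
    return U, V

# Tiny (user, item, rating) log; unseen pairs can be scored from the learned factors.
ratings = [(0, 0, 5.0), (0, 1, 3.0), (1, 0, 4.0), (2, 1, 1.0), (2, 2, 5.0)]
U, V = factorize(ratings, n_users=3, n_items=3)
pred = sum(U[0][f] * V[0][f] for f in range(2))
print(round(pred, 2))  # close to the observed 5.0 if training converged
```

ALS replaces the SGD loop with closed-form least-squares solves that alternate between the user and item factors, which is what makes it easy to distribute on Spark.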
+ +In the repository we have the following examples that can be used in retail: + +| Scenario | Description | Algorithm | Implementation | +|----------|-------------|-----------|----------------| +| Collaborative Filtering with explicit interactions in Spark environment | Matrix factorization algorithm for explicit feedback in large datasets, optimized by Spark MLlib for scalability and distributed computing capability | Alternating Least Squares (ALS) | [pyspark notebook using Movielens dataset](https://github.com/microsoft/recommenders/blob/staging/notebooks/00_quick_start/als_movielens.ipynb) | +| Content-Based Filtering for content recommendation in Spark environment | Gradient Boosting Tree algorithm for fast training and low memory usage in content-based problems | LightGBM/MMLSpark | [spark notebook using Criteo dataset](https://github.com/microsoft/recommenders/blob/staging/notebooks/02_model/mmlspark_lightgbm_criteo.ipynb) | + + + +## References and resources + +[1] Aggarwal, Charu C. Recommender systems. Vol. 1. Cham: Springer International Publishing, 2016. +[2] Park, Yoon-Joo, and Alexander Tuzhilin. "The long tail of recommender systems and how to leverage it." In Proceedings of the 2008 ACM conference on Recommender systems, pp. 11-18. 2008. [Link to paper](http://people.stern.nyu.edu/atuzhili/pdf/Park-Tuzhilin-RecSys08-final.pdf). +[3] Armstrong, Robert. "The long tail: Why the future of business is selling less of more." Canadian Journal of Communication 33, no. 1 (2008). [Link to paper](https://www.cjc-online.ca/index.php/journal/article/view/1946/3141).
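The ALS row in the table above factorizes a rating matrix by alternating ridge-regression solves for user and item factors. A toy NumPy sketch of that idea follows; the ratings are made up, and this is a simplification of the alternating-least-squares scheme, not the Spark MLlib implementation the notebook uses.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy explicit-rating matrix (0 = unrated); purely illustrative data.
R = np.array([
    [5, 4, 0, 1],
    [4, 0, 0, 1],
    [1, 1, 0, 5],
    [0, 1, 5, 4],
], dtype=float)
observed = R > 0

k, lam = 2, 0.1                       # latent dimension, L2 regularization
U = rng.normal(size=(R.shape[0], k))  # user factors
V = rng.normal(size=(R.shape[1], k))  # item factors

for _ in range(20):
    # Alternate: with V fixed, each user's factors solve a small ridge
    # regression over that user's observed ratings; then the same for items.
    for u in range(R.shape[0]):
        Vu = V[observed[u]]
        U[u] = np.linalg.solve(Vu.T @ Vu + lam * np.eye(k), Vu.T @ R[u, observed[u]])
    for i in range(R.shape[1]):
        Ui = U[observed[:, i]]
        V[i] = np.linalg.solve(Ui.T @ Ui + lam * np.eye(k), Ui.T @ R[observed[:, i], i])

rmse = np.sqrt(np.mean((U @ V.T - R)[observed] ** 2))
print(f"training RMSE: {rmse:.3f}")
```

Each subproblem is convex, so alternating them drives the regularized reconstruction error down; `U @ V.T` then scores the unrated cells.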
+ + +sources: [1](https://emerj.com/ai-sector-overviews/use-cases-recommendation-systems/), [2](https://cloud.google.com/recommendations-ai/docs/placements), [3](https://www.researchgate.net/post/Can_anyone_explain_what_is_cold_start_problem_in_recommender_system) \ No newline at end of file From 7f44a9df23c04e5d1aa928a997a6689b7cb66a57 Mon Sep 17 00:00:00 2001 From: miguelgfierro Date: Thu, 11 Jun 2020 12:44:16 +0100 Subject: [PATCH 17/61] glossary --- scenarios/GLOSSARY.md | 26 ++++++++++++++++++++++++++ scenarios/retail/README2.md | 32 ++++++++++++++++++++------------ 2 files changed, 46 insertions(+), 12 deletions(-) create mode 100644 scenarios/GLOSSARY.md diff --git a/scenarios/GLOSSARY.md b/scenarios/GLOSSARY.md new file mode 100644 index 0000000000..a073d3d775 --- /dev/null +++ b/scenarios/GLOSSARY.md @@ -0,0 +1,26 @@ + + +* Click-through rate (CTR): Optimizing for CTR emphasizes engagement; you should optimize for CTR when you want to maximize the likelihood that the user interacts with the recommendation. + +* Collaborative filtering algorithms: This type of recommendation system makes predictions of what might interest a person based on the taste of many other users. It assumes that if person X likes Snickers, and person Y likes Snickers and Milky Way, then person X might like Milky Way as well. See the [list of examples in Recommenders repository](../../examples/02_model_collaborative_filtering). + +* Content-based filtering algorithms: This type of recommendation system focuses on the products themselves and recommends other products that have similar attributes. Content-based filtering relies on the characteristics of the products themselves, so it doesn’t rely on other users to interact with the products before making a recommendation. See the [list of examples in Recommenders repository](../../examples/02_model_content_based_filtering). 
+ +* Conversion rate: Optimizing for conversion rate maximizes the likelihood that the user purchases the recommended item; if you want to increase the number of purchases per session, optimize for conversion rate. + +* Diversity metrics: + +* Hybrid filtering algorithms: This type of recommendation system can implement a combination of any two of the above systems. See the [list of examples in Recommenders repository](../../examples/02_model_hybrid). + +* Knowledge-base algorithms: ... + +* Ranking metrics: + +* Rating metrics: + +* Revenue per order: The revenue per order optimization objective is the default optimization objective for the "Frequently bought together" recommendation model type. This optimization objective cannot be specified for any other recommendation model type. + + + + + diff --git a/scenarios/retail/README2.md index ffa9f4eea7..7a98805b94 100644 --- a/scenarios/retail/README2.md +++ b/scenarios/retail/README2.md @@ -4,18 +4,36 @@ Retail is one of the areas where recommendation systems have been more successfu An increasing number of online retailers are utilizing recommendation systems to increase revenue, improve customer engagement and satisfaction, increase time on the page, enhance customer’s purchasing experience, gain understanding about customers, expand the shopping cart, etc. -Next we will list the most common scenarios retailers use +Next we will list the most common scenarios retailers use. ## Personalized recommendation This scenario predicts which products or set of products a user is most likely to engage with or purchase, based on the shopping or viewing history of that user. This scenario is commonly shown on the home page or in a personalized newsletter.
-The kind of data +The kind of data these kind of scenarios need is: + +* Interactions: + +* User information: + +* Item information: + +To measure the performance of the personalized recommendation machine learning algorithm, it is common to use [ranking metrics](../GLOSSARY.md). In production, the metrics used are CTR, + ## You might also like This recommendation scenario is similar to the personalized recommendation as it predicts the next product a user is likely to engage with or purchase. However, the starting point is typically a product page, so in addition to considering the entire shopping or viewing history of the user, the relevance of the specified product in relation to other items is used to recommend additional products. +The kind of data these kind of scenarios need is: + +* Interactions: + +* User information: + +* Item information: + + ## Frequently bought together This scenario is the machine learning solution for up-selling and cross-selling. Frequently bought together predicts which product or set of products are complementary or usually bought together with a specified product, as opposed as substituting it. Normally, this scenario is displayed in the shopping cart page, just before buying. @@ -70,11 +88,7 @@ Offline metrics in RS are based on rating, ranking, classification or diversity. Below are some of the various potential benefits of recommendation systems in business, and the metrics that tipically are used: -* Click-through rate (CTR): Optimizing for CTR emphasizes engagement; you should optimize for CTR when you want to maximize the likelihood that the user interacts with the recommendation. -* Revenue per order: The revenue per order optimization objective is the default optimization objective for the "Frequently bought together" recommendation model type. This optimization objective cannot be specified for any other recommendation model type. 
- -* Conversion rate: Optimizing for conversion rate maximizes the likelihood that the user purchases the recommended item; if you want to increase the number of purchases per session, optimize for conversion rate. ### Relationship between online and offline metrics in retail @@ -91,13 +105,7 @@ There is some literature about the relationship between offline and online metri From a technical perspective, RS can be grouped in these categories [1]: -* Collaborative filtering: This type of recommendation system makes predictions of what might interest a person based on the taste of many other users. It assumes that if person X likes Snickers, and person Y likes Snickers and Milky Way, then person X might like Milky Way as well. See the [list of examples in Recommenders repository](../../examples/02_model_collaborative_filtering). - -* Content-based filtering: This type of recommendation system focuses on the products themselves and recommends other products that have similar attributes. Content-based filtering relies on the characteristics of the products themselves, so it doesn’t rely on other users to interact with the products before making a recommendation. See the [list of examples in Recommenders repository](../../examples/02_model_content_based_filtering). - -* Hybrid filtering: This type of recommendation system can implement a combination fo any two of the above systems. See the [list of examples in Recommenders repository](../../examples/02_model_hybrid). -* Knowledge-base: ... 
In the repository we have the following examples that can be used in retail From 61923c747240b79c0b095f6da94779c5431d1 Mon Sep 17 00:00:00 2001 From: miguelgfierro Date: Thu, 11 Jun 2020 12:59:27 +0100 Subject: [PATCH 18/61] :boom: --- scenarios/GLOSSARY.md | 23 +++++++++ scenarios/retail/README2.md | 99 +++++++------------------------------ 2 files changed, 42 insertions(+), 80 deletions(-) diff --git a/scenarios/GLOSSARY.md b/scenarios/GLOSSARY.md index a073d3d775..f27c1f0162 100644 --- a/scenarios/GLOSSARY.md +++ b/scenarios/GLOSSARY.md @@ -1,7 +1,11 @@ +# Glossary +* A/B testing: * Click-through rate (CTR): Optimizing for CTR emphasizes engagement; you should optimize for CTR when you want to maximize the likelihood that the user interacts with the recommendation. +* Cold-start problem: The cold start problem concerns the personalized recommendations for users with little or no past history (new users). Providing recommendations to users with little past history becomes a difficult problem for CF models because their learning and predictive ability is limited. Multiple research studies have been conducted in this direction using hybrid models. These models use auxiliary information (multimodal information, side information, etc.) to overcome the cold start problem. + * Collaborative filtering algorithms: This type of recommendation system makes predictions of what might interest a person based on the taste of many other users. It assumes that if person X likes Snickers, and person Y likes Snickers and Milky Way, then person X might like Milky Way as well. See the [list of examples in Recommenders repository](../../examples/02_model_collaborative_filtering). + * Content-based filtering algorithms: This type of recommendation system focuses on the products themselves and recommends other products that have similar attributes.
Content-based filtering relies on the characteristics of the products themselves, so it doesn’t rely on other users to interact with the products before making a recommendation. See the [list of examples in Recommenders repository](../../examples/02_model_content_based_filtering). @@ -10,16 +14,35 @@ * Diversity metrics: +* Explicit interaction data: When a user explicitly rates an item, typically between 1-5, the user is giving a value indicating how much they like the item. In retail, this kind of data is not very common. + * Hybrid filtering algorithms: This type of recommendation system can implement a combination fo any two of the above systems. See the [list of examples in Recommenders repository](../../examples/02_model_hybrid). +* Implicit interaction data: Implicit interactions are views or clicks that show a certain interest of the user in a specific item. This kind of data is more common, but it doesn't define the intention of the user as clearly as explicit data. + +* Item information: These include information about the item; some examples are SKU, description, brand, price, etc. + * Knowledge-base algorithms: ... +* Knowledge graph data: ... + +* Long tail products: Typically, the distribution of item interactions in retail follows a long tail [1,2].... + * Ranking metrics: * Rating metrics: * Revenue per order: The revenue per order optimization objective is the default optimization objective for the "Frequently bought together" recommendation model type. This optimization objective cannot be specified for any other recommendation model type. +* User information: These include all information that defines the user; some examples are name, address, email, demographics, etc. + +## References and resources + +[1] Aggarwal, Charu C. Recommender systems. Vol. 1. Cham: Springer International Publishing, 2016. +[2] Park, Yoon-Joo, and Alexander Tuzhilin. "The long tail of recommender systems and how to leverage it."
In Proceedings of the 2008 ACM conference on Recommender systems, pp. 11-18. 2008. [Link to paper](http://people.stern.nyu.edu/atuzhili/pdf/Park-Tuzhilin-RecSys08-final.pdf). +[3] Armstrong, Robert. "The long tail: Why the future of business is selling less of more." Canadian Journal of Communication 33, no. 1 (2008). [Link to paper](https://www.cjc-online.ca/index.php/journal/article/view/1946/3141). + + diff --git a/scenarios/retail/README2.md index 7a98805b94..90c441c50d 100644 --- a/scenarios/retail/README2.md +++ b/scenarios/retail/README2.md @@ -10,117 +10,56 @@ Next we will list the most common scenarios retailers use. This scenario predicts which products or set of products a user is most likely to engage with or purchase, based on the shopping or viewing history of that user. This scenario is commonly shown on the home page or in a personalized newsletter. -The kind of data these kind of scenarios need is: +The kind of data these kinds of scenarios need is [implicit interaction data](../GLOSSARY.md), [user information](../GLOSSARY.md) and [item information](../GLOSSARY.md). -* Interactions: - -* User information: - -* Item information: - -To measure the performance of the personalized recommendation machine learning algorithm, it is common to use [ranking metrics](../GLOSSARY.md). In production, the metrics used are CTR, +To measure the performance of the personalized recommendation machine learning algorithm, it is common to use [ranking metrics](../GLOSSARY.md). In production, the business metrics used are [CTR](../GLOSSARY.md) and [revenue per order](../GLOSSARY.md). To be able to measure the business metrics in production, it is recommended to implement [A/B testing](../GLOSSARY.md). ## You might also like This recommendation scenario is similar to the personalized recommendation as it predicts the next product a user is likely to engage with or purchase.
However, the starting point is typically a product page, so in addition to considering the entire shopping or viewing history of the user, the relevance of the specified product in relation to other items is used to recommend additional products. -The kind of data these kind of scenarios need is: +The kind of data these kind of scenarios need is... -* Interactions: - -* User information: - -* Item information: +To measure the performance ... ## Frequently bought together This scenario is the machine learning solution for up-selling and cross-selling. Frequently bought together predicts which product or set of products are complementary or usually bought together with a specified product, as opposed as substituting it. Normally, this scenario is displayed in the shopping cart page, just before buying. -## Similar alternatives - -This scenario covers down-selling or out of stock alternatives and its objective is to avoid loosing a sale. Similar alternatives predicts other products with similar features, like price, type, brand or visual appearance. - -## Recommendations of product subset - -In certain situations, the retailer would like to recommend products from a subset, they could be products for sale, products with a high margin or products that have a low number of units left. This scenario can be used to delimit the outputs of all previous recommendation scenarios. - - - - - -## Data in Recommendation Systems - -### Data types - -In RS for retail there are typically the following types of data - -* Explicit interactions: When a user explicitly rate an item, typically between 1-5, the user is giving a value on the likeliness of the item. In retail, this kind of data is not very common. - -* Implicit interactions: Implicit interactions are views or clicks that show a certain interest of the user about a specific items. These kind of data is more common but it doesn't define the intention of the user as clearly as the explicit data. 
- -* User features: These include all information that define the user, some examples can be name, address, email, demographics, etc. +The kind of data these kind of scenarios need is... -* Item features: These include information about the item, some examples can be SKU, description, brand, price, etc. +To measure the performance ... -* Knowledge graph data: ... - -### Considerations about data size - -The size of the data is important when designing the system... - -### Cold start scenarios - -Personalized recommender systems take advantage of users past history to make predictions. The cold start problem concerns the personalized recommendations for users with no or few past history (new users). Providing recommendations to users with small past history becomes a difficult problem for CF models because their learning and predictive ability is limited. Multiple research have been conducted in this direction using hybrid models. These models use auxiliary information (multimodal information, side information, etc.) to overcome the cold start problem. - -### Long tail products - -Typically, the shape of items interacted in retail follow a long tail distribution [1,2]. - -## Measuring Recommendation performance - -### Machine learning metrics (offline metrics) - -Offline metrics in RS are based on rating, ranking, classification or diversity. For learning more about offline metrics, see the [definitions available in Recommenders repository](../../examples/03_evaluate) - -### Business success metrics (online metrics) +## Similar alternatives -Below are some of the various potential benefits of recommendation systems in business, and the metrics that tipically are used: +This scenario covers down-selling or out of stock alternatives and its objective is to avoid loosing a sale. Similar alternatives predicts other products with similar features, like price, type, brand or visual appearance. +The kind of data these kind of scenarios need is... 
+To measure the performance ... -### Relationship between online and offline metrics in retail +## Recommendations of product subset -There is some literature about the relationship between offline and online metrics... +In certain situations, the retailer would like to recommend products from a subset, they could be products for sale, products with a high margin or products that have a low number of units left. This scenario can be used to delimit the outputs of all previous recommendation scenarios. +The kind of data these kind of scenarios need is... -### A/B testing +To measure the performance ... -### Advanced A/B testing: online learning with VW +This scenario is tightly related to the [long tail product](../GLOSSARY.md) concept... -... ## Examples of end 2 end recommendation scenarios with Microsoft Recommenders -From a technical perspective, RS can be grouped in these categories [1]: - - - In the repository we have the following examples that can be used in retail | Scenario | Description | Algorithm | Implementation | |----------|-------------|-----------|----------------| -| Collaborative Filtering with explicit interactions in Spark environment | Matrix factorization algorithm for explicit feedback in large datasets, optimized by Spark MLLib for scalability and distributed computing capability | Alternating Least Squares (ALS) | [pyspark notebook using Movielens dataset](https://github.com/microsoft/recommenders/blob/staging/notebooks/00_quick_start/als_movielens.ipynb) | -| Content-Based Filtering for content recommendation in Spark environment | Gradient Boosting Tree algorithm for fast training and low memory usage in content-based problems | LightGBM/MMLSpark | [spark notebook using Criteo dataset](https://github.com/microsoft/recommenders/blob/staging/notebooks/02_model/mmlspark_lightgbm_criteo.ipynb) | - - - -## References and resources - -[1] Aggarwal, Charu C. Recommender systems. Vol. 1. 
Cham: Springer International Publishing, 2016. -[2]. Park, Yoon-Joo, and Alexander Tuzhilin. "The long tail of recommender systems and how to leverage it." In Proceedings of the 2008 ACM conference on Recommender systems, pp. 11-18. 2008. [Link to paper](http://people.stern.nyu.edu/atuzhili/pdf/Park-Tuzhilin-RecSys08-final.pdf). -[3]. Armstrong, Robert. "The long tail: Why the future of business is selling less of more." Canadian Journal of Communication 33, no. 1 (2008). [Link to paper](https://www.cjc-online.ca/index.php/journal/article/view/1946/3141). +| Personalized recommendation | Matrix factorization algorithm for explicit feedback in large datasets, optimized by Spark MLLib for scalability and distributed computing capability | Alternating Least Squares (ALS) | [pyspark notebook using Movielens dataset](https://github.com/microsoft/recommenders/blob/staging/notebooks/00_quick_start/als_movielens.ipynb) | +| Personalized recommendation | Gradient Boosting Tree algorithm for fast training and low memory usage in content-based problems | LightGBM/MMLSpark | [spark notebook using Criteo dataset](https://github.com/microsoft/recommenders/blob/staging/notebooks/02_model/mmlspark_lightgbm_criteo.ipynb) | +| You might also like | Similarity-based algorithm for implicit feedback dataset | Simple Algorithm for Recommendation (SAR)* | [python notebook using Movielens dataset](notebooks/00_quick_start/sar_movielens.ipynb) | +**NOTE**: * indicates algorithms invented/contributed by Microsoft. 
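The "You might also like" row above uses SAR, described as a similarity-based algorithm for implicit feedback. A small sketch of that flavor of approach is co-occurrence counts normalized with the Jaccard index; the shop data below is invented, and this is an illustration of the general idea, not the repository's SAR implementation.

```python
# SAR-like item similarity from implicit feedback: for each item pair,
# count shared users and normalize by the union (Jaccard index).
# Hypothetical data, not the repository's SAR implementation.
users_to_items = {
    "u1": {"hat", "scarf"},
    "u2": {"hat", "scarf", "gloves"},
    "u3": {"hat", "gloves"},
    "u4": {"boots"},
}

items = sorted({i for basket in users_to_items.values() for i in basket})

def jaccard(a, b):
    """Overlap of the two items' user sets divided by their union."""
    users_a = {u for u, its in users_to_items.items() if a in its}
    users_b = {u for u, its in users_to_items.items() if b in its}
    return len(users_a & users_b) / len(users_a | users_b)

# Most similar item to "scarf", excluding itself.
best = max((i for i in items if i != "scarf"), key=lambda i: jaccard("scarf", i))
print(best)  # hat: both scarf buyers also bought a hat
```

On a product page for the scarf, the hat would be the top "you might also like" candidate, since every scarf buyer in the toy data also bought a hat.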
-sources: [1](https://emerj.com/ai-sector-overviews/use-cases-recommendation-systems/), [2](https://cloud.google.com/recommendations-ai/docs/placements), [3](https://www.researchgate.net/post/Can_anyone_explain_what_is_cold_start_problem_in_recommender_system) \ No newline at end of file From 78986b7b792dd25560785f6aa8f3ab9b24970285 Mon Sep 17 00:00:00 2001 From: miguelgfierro Date: Fri, 12 Jun 2020 10:23:49 +0100 Subject: [PATCH 19/61] readme --- scenarios/retail/README.md | 92 +++++++++++-------------------------- scenarios/retail/README2.md | 65 -------------------------- 2 files changed, 26 insertions(+), 131 deletions(-) delete mode 100644 scenarios/retail/README2.md diff --git a/scenarios/retail/README.md b/scenarios/retail/README.md index 3f94095bca..90c441c50d 100644 --- a/scenarios/retail/README.md +++ b/scenarios/retail/README.md @@ -4,102 +4,62 @@ Retail is one of the areas where recommendation systems have been more successfu An increasing number of online retailers are utilizing recommendation systems to increase revenue, improve customer engagement and satisfaction, increase time on the page, enhance customer’s purchasing experience, gain understanding about customers, expand the shopping cart, etc. -## Typical Business Scenarios in Recommendation Systems for Retail +Next we will list the most common scenarios retailers use. -The most common scenarios retailers use are: +## Personalized recommendation -* Personalized recommendation: This scenario predicts which products or set of products a user is most likely to engage with or purchase, based on the shopping or viewing history of that user. This scenario is commonly shown on the home page or in a personalized newsletter. +This scenario predicts which products or set of products a user is most likely to engage with or purchase, based on the shopping or viewing history of that user. This scenario is commonly shown on the home page or in a personalized newsletter. 
-* You might also like: This recommendation scenario is similar to the personalized recommendation as it predicts the next product a user is likely to engage with or purchase. However, the starting point is typically a product page, so in addition to considering the entire shopping or viewing history of the user, the relevance of the specified product in relation to other items is used to recommend additional products. +The kind of data these kinds of scenarios need is [implicit interaction data](../GLOSSARY.md), [user information](../GLOSSARY.md) and [item information](../GLOSSARY.md). -* Frequently bought together: This scenario is the machine learning solution for up-selling and cross-selling. Frequently bought together predicts which product or set of products are complementary or usually bought together with a specified product, as opposed as substituting it. Normally, this scenario is displayed in the shopping cart page, just before buying. +To measure the performance of the personalized recommendation machine learning algorithm, it is common to use [ranking metrics](../GLOSSARY.md). In production, the business metrics used are [CTR](../GLOSSARY.md) and [revenue per order](../GLOSSARY.md). To be able to measure the business metrics in production, it is recommended to implement [A/B testing](../GLOSSARY.md). -* Similar alternatives: This scenario covers down-selling or out of stock alternatives and its objective is to avoid loosing a sale. Similar alternatives predicts other products with similar features, like price, type, brand or visual appearance. -* Recommendations of product subset. In certain situations, the retailer would like to recommend products from a subset, they could be products for sale, products with a high margin or products that have a low number of units left. This scenario can be used to delimit the outputs of all previous recommendation scenarios.
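Measuring CTR in production with A/B testing, as recommended above, usually reduces to comparing click proportions between a control and a treatment group. A minimal sketch with made-up traffic numbers follows; the two-proportion z-test used here is one common way to analyze such an experiment, not a prescription from the repository.

```python
import math

def two_proportion_z(clicks_a, views_a, clicks_b, views_b):
    """z-statistic for the CTR difference between control (a) and treatment (b)."""
    p_a, p_b = clicks_a / views_a, clicks_b / views_b
    p_pool = (clicks_a + clicks_b) / (views_a + views_b)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / views_a + 1 / views_b))
    return (p_b - p_a) / se

# Hypothetical experiment: control CTR 2.0%, treatment CTR 2.5%, 10k views each.
z = two_proportion_z(clicks_a=200, views_a=10_000, clicks_b=250, views_b=10_000)
print(f"z = {z:.2f}")  # |z| > 1.96 -> significant at the 5% level
```

With these numbers the uplift is statistically significant; with much smaller traffic the same 0.5-point CTR gap would not be, which is why sample-size planning matters before launching the test.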
+## You might also like +This recommendation scenario is similar to the personalized recommendation as it predicts the next product a user is likely to engage with or purchase. However, the starting point is typically a product page, so in addition to considering the entire shopping or viewing history of the user, the relevance of the specified product in relation to other items is used to recommend additional products. -## Data in Recommendation Systems +The kind of data these kinds of scenarios need is... -### Data types +To measure the performance ... -In RS for retail there are typically the following types of data -* Explicit interactions: When a user explicitly rate an item, typically between 1-5, the user is giving a value on the likeliness of the item. In retail, this kind of data is not very common. +## Frequently bought together -* Implicit interactions: Implicit interactions are views or clicks that show a certain interest of the user about a specific items. These kind of data is more common but it doesn't define the intention of the user as clearly as the explicit data. +This scenario is the machine learning solution for up-selling and cross-selling. Frequently bought together predicts which product or set of products are complementary or usually bought together with a specified product, as opposed to substituting it. Normally, this scenario is displayed on the shopping cart page, just before buying. -* User features: These include all information that define the user, some examples can be name, address, email, demographics, etc. +The kind of data these kinds of scenarios need is... -* Item features: These include information about the item, some examples can be SKU, description, brand, price, etc. +To measure the performance ... -* Knowledge graph data: ... +## Similar alternatives -### Considerations about data size +This scenario covers down-selling or out of stock alternatives and its objective is to avoid losing a sale.
Similar alternatives predicts other products with similar features, like price, type, brand or visual appearance. -The size of the data is important when designing the system... +The kind of data these kinds of scenarios need is... -### Cold start scenarios +To measure the performance ... -Personalized recommender systems take advantage of users past history to make predictions. The cold start problem concerns the personalized recommendations for users with no or few past history (new users). Providing recommendations to users with small past history becomes a difficult problem for CF models because their learning and predictive ability is limited. Multiple research have been conducted in this direction using hybrid models. These models use auxiliary information (multimodal information, side information, etc.) to overcome the cold start problem. +## Recommendations of product subset -### Long tail products +In certain situations, the retailer would like to recommend products from a subset; they could be products for sale, products with a high margin or products that have a low number of units left. This scenario can be used to delimit the outputs of all previous recommendation scenarios. -Typically, the shape of items interacted in retail follow a long tail distribution [1,2]. +The kind of data these kinds of scenarios need is... -## Measuring Recommendation performance +To measure the performance ... -### Machine learning metrics (offline metrics) +This scenario is tightly related to the [long tail product](../GLOSSARY.md) concept... -Offline metrics in RS are based on rating, ranking, classification or diversity.
For learning more about offline metrics, see the [definitions available in Recommenders repository](../../examples/03_evaluate) - -### Business success metrics (online metrics) - -Below are some of the various potential benefits of recommendation systems in business, and the metrics that tipically are used: - -* Click-through rate (CTR): Optimizing for CTR emphasizes engagement; you should optimize for CTR when you want to maximize the likelihood that the user interacts with the recommendation. - -* Revenue per order: The revenue per order optimization objective is the default optimization objective for the "Frequently bought together" recommendation model type. This optimization objective cannot be specified for any other recommendation model type. - -* Conversion rate: Optimizing for conversion rate maximizes the likelihood that the user purchases the recommended item; if you want to increase the number of purchases per session, optimize for conversion rate. - -### Relationship between online and offline metrics in retail - -There is some literature about the relationship between offline and online metrics... - - -### A/B testing - -### Advanced A/B testing: online learning with VW - -... ## Examples of end 2 end recommendation scenarios with Microsoft Recommenders -From a technical perspective, RS can be grouped in these categories [1]: - -* Collaborative filtering: This type of recommendation system makes predictions of what might interest a person based on the taste of many other users. It assumes that if person X likes Snickers, and person Y likes Snickers and Milky Way, then person X might like Milky Way as well. See the [list of examples in Recommenders repository](../../examples/02_model_collaborative_filtering). - -* Content-based filtering: This type of recommendation system focuses on the products themselves and recommends other products that have similar attributes. 
Content-based filtering relies on the characteristics of the products themselves, so it doesn’t rely on other users to interact with the products before making a recommendation. See the [list of examples in Recommenders repository](../../examples/02_model_content_based_filtering). - -* Hybrid filtering: This type of recommendation system can implement a combination fo any two of the above systems. See the [list of examples in Recommenders repository](../../examples/02_model_hybrid). - -* Knowledge-base: ... - In the repository we have the following examples that can be used in retail | Scenario | Description | Algorithm | Implementation | |----------|-------------|-----------|----------------| -| Collaborative Filtering with explicit interactions in Spark environment | Matrix factorization algorithm for explicit feedback in large datasets, optimized by Spark MLLib for scalability and distributed computing capability | Alternating Least Squares (ALS) | [pyspark notebook using Movielens dataset](https://github.com/microsoft/recommenders/blob/staging/notebooks/00_quick_start/als_movielens.ipynb) | -| Content-Based Filtering for content recommendation in Spark environment | Gradient Boosting Tree algorithm for fast training and low memory usage in content-based problems | LightGBM/MMLSpark | [spark notebook using Criteo dataset](https://github.com/microsoft/recommenders/blob/staging/notebooks/02_model/mmlspark_lightgbm_criteo.ipynb) | - - - -## References and resources - -[1] Aggarwal, Charu C. Recommender systems. Vol. 1. Cham: Springer International Publishing, 2016. -[2]. Park, Yoon-Joo, and Alexander Tuzhilin. "The long tail of recommender systems and how to leverage it." In Proceedings of the 2008 ACM conference on Recommender systems, pp. 11-18. 2008. [Link to paper](http://people.stern.nyu.edu/atuzhili/pdf/Park-Tuzhilin-RecSys08-final.pdf). -[3]. Armstrong, Robert. "The long tail: Why the future of business is selling less of more." 
Canadian Journal of Communication 33, no. 1 (2008). [Link to paper](https://www.cjc-online.ca/index.php/journal/article/view/1946/3141). +| Personalized recommendation | Matrix factorization algorithm for explicit feedback in large datasets, optimized by Spark MLLib for scalability and distributed computing capability | Alternating Least Squares (ALS) | [pyspark notebook using Movielens dataset](https://github.com/microsoft/recommenders/blob/staging/notebooks/00_quick_start/als_movielens.ipynb) | +| Personalized recommendation | Gradient Boosting Tree algorithm for fast training and low memory usage in content-based problems | LightGBM/MMLSpark | [spark notebook using Criteo dataset](https://github.com/microsoft/recommenders/blob/staging/notebooks/02_model/mmlspark_lightgbm_criteo.ipynb) | +| You might also like | Similarity-based algorithm for implicit feedback dataset | Simple Algorithm for Recommendation (SAR)* | [python notebook using Movielens dataset](notebooks/00_quick_start/sar_movielens.ipynb) | +**NOTE**: * indicates algorithms invented/contributed by Microsoft. -sources: [1](https://emerj.com/ai-sector-overviews/use-cases-recommendation-systems/), [2](https://cloud.google.com/recommendations-ai/docs/placements), [3](https://www.researchgate.net/post/Can_anyone_explain_what_is_cold_start_problem_in_recommender_system) \ No newline at end of file diff --git a/scenarios/retail/README2.md b/scenarios/retail/README2.md deleted file mode 100644 index 90c441c50d..0000000000 --- a/scenarios/retail/README2.md +++ /dev/null @@ -1,65 +0,0 @@ -# Recommendation systems for Retail - -Retail is one of the areas where recommendation systems have been more successful. According to [McKinsey](https://www.mckinsey.com/industries/retail/our-insights/how-retailers-can-keep-up-with-consumers#), 35% of what consumers purchase on Amazon and 75% of what they watch on Netflix come from product recommendations. 
- -An increasing number of online retailers are utilizing recommendation systems to increase revenue, improve customer engagement and satisfaction, increase time on the page, enhance customer’s purchasing experience, gain understanding about customers, expand the shopping cart, etc. - -Next we will list the most common scenarios retailers use. - -## Personalized recommendation - -This scenario predicts which products or set of products a user is most likely to engage with or purchase, based on the shopping or viewing history of that user. This scenario is commonly shown on the home page or in a personalized newsletter. - -The kind of data these kind of scenarios need is [implicit interaction data](../GLOSSARY.md), [user information](../GLOSSARY.md) and [item information](../GLOSSARY.md). - -To measure the performance of the personalized recommendation machine learning algorithm, it is common to use [ranking metrics](../GLOSSARY.md). In production, the business metrics used are [CTR](../GLOSSARY.md) and [revenue per order](../GLOSSARY.md). For being able to measure the business metrics in production, it is recommended to implement [A/B testing](../GLOSSARY.md). - - -## You might also like - -This recommendation scenario is similar to the personalized recommendation as it predicts the next product a user is likely to engage with or purchase. However, the starting point is typically a product page, so in addition to considering the entire shopping or viewing history of the user, the relevance of the specified product in relation to other items is used to recommend additional products. - -The kind of data these kind of scenarios need is... - -To measure the performance ... - - -## Frequently bought together - -This scenario is the machine learning solution for up-selling and cross-selling. Frequently bought together predicts which product or set of products are complementary or usually bought together with a specified product, as opposed as substituting it. 
Normally, this scenario is displayed in the shopping cart page, just before buying. - -The kind of data these kind of scenarios need is... - -To measure the performance ... - -## Similar alternatives - -This scenario covers down-selling or out of stock alternatives and its objective is to avoid loosing a sale. Similar alternatives predicts other products with similar features, like price, type, brand or visual appearance. - -The kind of data these kind of scenarios need is... - -To measure the performance ... - -## Recommendations of product subset - -In certain situations, the retailer would like to recommend products from a subset, they could be products for sale, products with a high margin or products that have a low number of units left. This scenario can be used to delimit the outputs of all previous recommendation scenarios. - -The kind of data these kind of scenarios need is... - -To measure the performance ... - -This scenario is tightly related to the [long tail product](../GLOSSARY.md) concept... 
- - -## Examples of end 2 end recommendation scenarios with Microsoft Recommenders - -In the repository we have the following examples that can be used in retail - -| Scenario | Description | Algorithm | Implementation | -|----------|-------------|-----------|----------------| -| Personalized recommendation | Matrix factorization algorithm for explicit feedback in large datasets, optimized by Spark MLLib for scalability and distributed computing capability | Alternating Least Squares (ALS) | [pyspark notebook using Movielens dataset](https://github.com/microsoft/recommenders/blob/staging/notebooks/00_quick_start/als_movielens.ipynb) | -| Personalized recommendation | Gradient Boosting Tree algorithm for fast training and low memory usage in content-based problems | LightGBM/MMLSpark | [spark notebook using Criteo dataset](https://github.com/microsoft/recommenders/blob/staging/notebooks/02_model/mmlspark_lightgbm_criteo.ipynb) | -| You might also like | Similarity-based algorithm for implicit feedback dataset | Simple Algorithm for Recommendation (SAR)* | [python notebook using Movielens dataset](notebooks/00_quick_start/sar_movielens.ipynb) | - -**NOTE**: * indicates algorithms invented/contributed by Microsoft. - From 36ed9e618139fe0c0da980ba461d0966d6978337 Mon Sep 17 00:00:00 2001 From: Tao Wu Date: Sun, 14 Jun 2020 16:24:54 -0400 Subject: [PATCH 20/61] rewrite of retail readme for readability. --- scenarios/retail/README.md | 53 ++++++++++---------------------------- 1 file changed, 13 insertions(+), 40 deletions(-) diff --git a/scenarios/retail/README.md b/scenarios/retail/README.md index 90c441c50d..e56a5b5d16 100644 --- a/scenarios/retail/README.md +++ b/scenarios/retail/README.md @@ -1,65 +1,38 @@ -# Recommendation systems for Retail +# Recommender Systems for Retail -Retail is one of the areas where recommendation systems have been more successful. 
According to [McKinsey](https://www.mckinsey.com/industries/retail/our-insights/how-retailers-can-keep-up-with-consumers#), 35% of what consumers purchase on Amazon and 75% of what they watch on Netflix come from product recommendations. +Recommender systems have become a key growth and revenue driver for modern retail. For example, recommendation was estimated to [account for 35% of customer purchases on Amazon](https://www.mckinsey.com/industries/retail/our-insights/how-retailers-can-keep-up-with-consumers#). In addition, recommenders have been applied by retailers to delight and retain customers and improve staff productivity. -An increasing number of online retailers are utilizing recommendation systems to increase revenue, improve customer engagement and satisfaction, increase time on the page, enhance customer’s purchasing experience, gain understanding about customers, expand the shopping cart, etc. +Next we will describe several most common retail scenarios and main considerations when applying recommendations in retail. -Next we will list the most common scenarios retailers use. +## Personalized Recommendation -## Personalized recommendation - -This scenario predicts which products or set of products a user is most likely to engage with or purchase, based on the shopping or viewing history of that user. This scenario is commonly shown on the home page or in a personalized newsletter. - -The kind of data these kind of scenarios need is [implicit interaction data](../GLOSSARY.md), [user information](../GLOSSARY.md) and [item information](../GLOSSARY.md). - -To measure the performance of the personalized recommendation machine learning algorithm, it is common to use [ranking metrics](../GLOSSARY.md). In production, the business metrics used are [CTR](../GLOSSARY.md) and [revenue per order](../GLOSSARY.md). For being able to measure the business metrics in production, it is recommended to implement [A/B testing](../GLOSSARY.md). 
+A major task in applying recommenations in retail is to predict which products or set of products a user is most likely to engage with or purchase, based on the shopping or viewing history of that user. This scenario is commonly shown on the personalized home page, feed or newsletter. Most models in this repo such as [ALS](https://github.com/microsoft/recommenders/blob/staging/notebooks/00_quick_start/als_movielens.ipynb), [BPR] (https://github.com/microsoft/recommenders/blob/master/notebooks/02_model/cornac_bpr_deep_dive.ipynb), [LightGBM] (https://github.com/microsoft/recommenders/blob/master/notebooks/00_quick_start/lightgbm_tinycriteo.ipynb) and [NCF] (https://github.com/microsoft/recommenders/blob/master/notebooks/00_quick_start/ncf_movielens.ipynb) can be used for personalization. [Azure Personalizer](https://docs.microsoft.com/en-us/azure/cognitive-services/personalizer/concept-active-learning) also provides a cloud-based personalization service using reinforcement learning based on [Vowpal Wabbit](https://github.com/microsoft/recommenders/blob/master/notebooks/02_model/vowpal_wabbit_deep_dive.ipynb). ## You might also like -This recommendation scenario is similar to the personalized recommendation as it predicts the next product a user is likely to engage with or purchase. However, the starting point is typically a product page, so in addition to considering the entire shopping or viewing history of the user, the relevance of the specified product in relation to other items is used to recommend additional products. - -The kind of data these kind of scenarios need is... - -To measure the performance ... +In this scenario, the user is already viewing a product page, and the task is to make recommendations that are relevant to it. Personalized recommendaton techniques are still applicable here, but relevance to the product being viewed is of special importance. 
As such, item similarity can be useful here, especially for cold items and cold users that do not have much interaction data. ## Frequently bought together -This scenario is the machine learning solution for up-selling and cross-selling. Frequently bought together predicts which product or set of products are complementary or usually bought together with a specified product, as opposed as substituting it. Normally, this scenario is displayed in the shopping cart page, just before buying. - -The kind of data these kind of scenarios need is... - -To measure the performance ... +In this task, the retailer tries to predict product(s) complementary to or bought together with a product that a user already put into the shopping cart. This feature is great for cross-selling and is normally displayed just before checkout. In many cases, a machine learning solution is not required for this task. ## Similar alternatives -This scenario covers down-selling or out of stock alternatives and its objective is to avoid loosing a sale. Similar alternatives predicts other products with similar features, like price, type, brand or visual appearance. - -The kind of data these kind of scenarios need is... - -To measure the performance ... - -## Recommendations of product subset +This scenario covers down-selling or out of stock alternatives to avoid losing a sale. Similar alternatives predicts other products with similar features, like price, type, brand or visual appearance. -In certain situations, the retailer would like to recommend products from a subset, they could be products for sale, products with a high margin or products that have a low number of units left. This scenario can be used to delimit the outputs of all previous recommendation scenarios. +## Data and evaluation -The kind of data these kind of scenarios need is...
Datasets used in retail reommendations usually include [user information](../GLOSSARY.md), [item information](../GLOSSARY.md) and [interaction data](../GLOSSARY.md), among others. -To measure the performance ... +To measure the performance of the recommender, it is common to use [ranking metrics](../GLOSSARY.md). In production, the business metrics used are [CTR](../GLOSSARY.md) and [revenue per order](../GLOSSARY.md). To evaluate a model's performance in production in an online manner, [A/B testing](../GLOSSARY.md) is often applied. -This scenario is tightly related to the [long tail product](../GLOSSARY.md) concept... ## Other considerations -## Examples of end 2 end recommendation scenarios with Microsoft Recommenders +Retailers use recommendations to achieve a broad range of business objectives, such as attracting new customers through promotions, or clearing products that are at the end of their season. These objectives are often achieved by re-ranking the outputs from the recommenders in the scenarios above.
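A sketch of how this kind of business re-ranking can work: each recommendation score is multiplied by a weight derived from a business rule before the final sort. The item names, scores and boost factors below are hypothetical, not from any real model.

```python
# Re-rank recommender output against business objectives: boost items that are
# on promotion and items that should be cleared at the end of their season.
# All scores and metadata here are illustrative.
recommendations = [("item_a", 0.90), ("item_b", 0.85), ("item_c", 0.80)]
on_promotion = {"item_c"}
end_of_season = {"item_b"}

def rerank(recs, promo_boost=1.3, clearance_boost=1.1):
    """Multiply each score by the boosts that apply, then sort descending."""
    adjusted = []
    for item, score in recs:
        if item in on_promotion:
            score *= promo_boost
        if item in end_of_season:
            score *= clearance_boost
        adjusted.append((item, score))
    return sorted(adjusted, key=lambda pair: pair[1], reverse=True)

print(rerank(recommendations))  # item_c moves to the top: 0.80 * 1.3 = 1.04
```

The same pattern also covers restricting recommendations to a subset of the catalog (e.g. only in-stock items), by filtering the list before sorting.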
-In the repository we have the following examples that can be used in retail -| Scenario | Description | Algorithm | Implementation | -|----------|-------------|-----------|----------------| -| Personalized recommendation | Matrix factorization algorithm for explicit feedback in large datasets, optimized by Spark MLLib for scalability and distributed computing capability | Alternating Least Squares (ALS) | [pyspark notebook using Movielens dataset](https://github.com/microsoft/recommenders/blob/staging/notebooks/00_quick_start/als_movielens.ipynb) | -| Personalized recommendation | Gradient Boosting Tree algorithm for fast training and low memory usage in content-based problems | LightGBM/MMLSpark | [spark notebook using Criteo dataset](https://github.com/microsoft/recommenders/blob/staging/notebooks/02_model/mmlspark_lightgbm_criteo.ipynb) | -| You might also like | Similarity-based algorithm for implicit feedback dataset | Simple Algorithm for Recommendation (SAR)* | [python notebook using Movielens dataset](notebooks/00_quick_start/sar_movielens.ipynb) | -**NOTE**: * indicates algorithms invented/contributed by Microsoft. From 5b007f75a082d6a89d40117ab0990566b95ef6a4 Mon Sep 17 00:00:00 2001 From: Tao Wu Date: Sun, 14 Jun 2020 16:28:12 -0400 Subject: [PATCH 21/61] format --- scenarios/retail/README.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/scenarios/retail/README.md b/scenarios/retail/README.md index e56a5b5d16..fa7ecbcf9a 100644 --- a/scenarios/retail/README.md +++ b/scenarios/retail/README.md @@ -6,7 +6,7 @@ Next we will describe several most common retail scenarios and main consideratio ## Personalized Recommendation -A major task in applying recommenations in retail is to predict which products or set of products a user is most likely to engage with or purchase, based on the shopping or viewing history of that user. This scenario is commonly shown on the personalized home page, feed or newsletter. 
Most models in this repo such as [ALS](https://github.com/microsoft/recommenders/blob/staging/notebooks/00_quick_start/als_movielens.ipynb), [BPR] (https://github.com/microsoft/recommenders/blob/master/notebooks/02_model/cornac_bpr_deep_dive.ipynb), [LightGBM] (https://github.com/microsoft/recommenders/blob/master/notebooks/00_quick_start/lightgbm_tinycriteo.ipynb) and [NCF] (https://github.com/microsoft/recommenders/blob/master/notebooks/00_quick_start/ncf_movielens.ipynb) can be used for personalization. [Azure Personalizer](https://docs.microsoft.com/en-us/azure/cognitive-services/personalizer/concept-active-learning) also provides a cloud-based personalization service using reinforcement learning based on [Vowpal Wabbit](https://github.com/microsoft/recommenders/blob/master/notebooks/02_model/vowpal_wabbit_deep_dive.ipynb). +A major task in applying recommendations in retail is to predict which products or set of products a user is most likely to engage with or purchase, based on the shopping or viewing history of that user. This scenario is commonly shown on the personalized home page, feed or newsletter. Most models in this repo such as [ALS](https://github.com/microsoft/recommenders/blob/staging/notebooks/00_quick_start/als_movielens.ipynb), [BPR](https://github.com/microsoft/recommenders/blob/master/notebooks/02_model/cornac_bpr_deep_dive.ipynb), [LightGBM](https://github.com/microsoft/recommenders/blob/master/notebooks/00_quick_start/lightgbm_tinycriteo.ipynb) and [NCF](https://github.com/microsoft/recommenders/blob/master/notebooks/00_quick_start/ncf_movielens.ipynb) can be used for personalization.
[Azure Personalizer](https://docs.microsoft.com/en-us/azure/cognitive-services/personalizer/concept-active-learning) also provides a cloud-based personalization service using reinforcement learning based on [Vowpal Wabbit](https://github.com/microsoft/recommenders/blob/master/notebooks/02_model/vowpal_wabbit_deep_dive.ipynb). ## You might also like From e1a5f515737f2e764f8362d4105be53c0070e856 Mon Sep 17 00:00:00 2001 From: miguelgfierro Date: Mon, 15 Jun 2020 11:39:15 +0100 Subject: [PATCH 22/61] glossary --- scenarios/GLOSSARY.md => GLOSSARY.md | 0 1 file changed, 0 insertions(+), 0 deletions(-) rename scenarios/GLOSSARY.md => GLOSSARY.md (100%) diff --git a/scenarios/GLOSSARY.md b/GLOSSARY.md similarity index 100% rename from scenarios/GLOSSARY.md rename to GLOSSARY.md From 33c6e5e956ae47685512d8eae59f52c69822f55d Mon Sep 17 00:00:00 2001 From: miguelgfierro Date: Mon, 15 Jun 2020 12:15:34 +0100 Subject: [PATCH 23/61] :doc: --- GLOSSARY.md | 2 +- scenarios/retail/README.md | 4 ++-- 2 files changed, 3 insertions(+), 3 deletions(-) diff --git a/GLOSSARY.md b/GLOSSARY.md index f27c1f0162..bf2eb26975 100644 --- a/GLOSSARY.md +++ b/GLOSSARY.md @@ -1,6 +1,6 @@ # Glossary -* A/B testing: +* A/B testing: Methodology to evaluate the performance of a system in production. In the context of Recommendation Systems it is used to measure a machine learning model performance in real-time. It works by randomizing an environment response into two groups A and B. The first * Click-through rate (CTR): Optimizing for CTR emphasizes engagement; you should optimize for CTR when you want to maximize the likelihood that the user interacts with the recommendation. 
diff --git a/scenarios/retail/README.md b/scenarios/retail/README.md index fa7ecbcf9a..48a93eb09d 100644 --- a/scenarios/retail/README.md +++ b/scenarios/retail/README.md @@ -11,7 +11,7 @@ A major task in applying recommenations in retail is to predict which products o ## You might also like -In this scenario, the user is already viewing a product page, and the task is to make recommendations that are relevant to it. Personalized recommendaton techniques are still applicable here, but relevance to the product being viewed is of special importance. As such, item similarity can be useful here, especially for cold items and cold users that do not have much interaction data. +In this scenario, the user is already viewing a product page, and the task is to make recommendations that are relevant to it. Personalized recommendation techniques are still applicable here, but relevance to the product being viewed is of special importance. As such, item similarity can be useful here, especially for cold items and cold users that do not have much interaction data. ## Frequently bought together @@ -24,7 +24,7 @@ This scenario covers down-selling or out of stock alternatives to avoid losing a ## Data and evaluation -Datasets used in retail reommendations usually include [user information](../GLOSSARY.md), [item information](../GLOSSARY.md) and [interaction data](../GLOSSARY.md), among others. +Datasets used in retail recommendations usually include [user information](../GLOSSARY.md), [item information](../GLOSSARY.md) and [interaction data](../GLOSSARY.md), among others. To measure the performance of the recommender, it is common to use [ranking metrics](../GLOSSARY.md). In production, the business metrics used are [CTR](../GLOSSARY.md) and [revenue per order](../GLOSSARY.md). To evaluate a model's performance in production in an online manner, [A/B testing](../GLOSSARY.md) is often applied. 
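As a rough sketch of how the A/B test mentioned above can be read out once the experiment has run, the following compares the CTR of the two branches with a two-proportion z-test; the click and impression counts are invented for illustration.

```python
from math import erf, sqrt

# Hypothetical A/B test results: branch A is served without the model,
# branch B receives the model's recommendations.
clicks_a, impressions_a = 520, 10_000
clicks_b, impressions_b = 610, 10_000

ctr_a = clicks_a / impressions_a
ctr_b = clicks_b / impressions_b

# Two-proportion z-test on the difference in CTR between the branches.
p_pool = (clicks_a + clicks_b) / (impressions_a + impressions_b)
se = sqrt(p_pool * (1 - p_pool) * (1 / impressions_a + 1 / impressions_b))
z = (ctr_b - ctr_a) / se
p_value = 2 * (1 - 0.5 * (1 + erf(abs(z) / sqrt(2))))  # two-sided p-value

print(f"CTR A={ctr_a:.4f}, CTR B={ctr_b:.4f}, z={z:.2f}, p={p_value:.4f}")
```

A p-value below 0.05 together with a positive lift would suggest that branch B (the model) performs better on CTR; business metrics such as revenue per order would typically be checked the same way.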
From 40560c335303e621e48d3062c1dbbb5473d34a27 Mon Sep 17 00:00:00 2001 From: miguelgfierro Date: Mon, 15 Jun 2020 12:27:38 +0100 Subject: [PATCH 24/61] :doc: --- GLOSSARY.md | 10 ++++++++-- 1 file changed, 8 insertions(+), 2 deletions(-) diff --git a/GLOSSARY.md b/GLOSSARY.md index bf2eb26975..696468008d 100644 --- a/GLOSSARY.md +++ b/GLOSSARY.md @@ -1,8 +1,8 @@ # Glossary -* A/B testing: Methodology to evaluate the performance of a system in production. In the context of Recommendation Systems it is used to measure a machine learning model performance in real-time. It works by randomizing an environment response into two groups A and B. The first +* A/B testing: Methodology to evaluate the performance of a system in production. In the context of Recommendation Systems it is used to measure a machine learning model's performance in real-time. It works by randomizing an environment response into two groups, A and B: typically half of the traffic goes to the machine learning model's output, and the other half is served without the model. By comparing the metrics from the A and B branches, it is possible to evaluate whether using the model is beneficial or not. A test with more than two groups is named a Multi-Variate Test. -* Click-through rate (CTR): Optimizing for CTR emphasizes engagement; you should optimize for CTR when you want to maximize the likelihood that the user interacts with the recommendation. +* Click-through rate (CTR): Ratio of the number of users who click on a link over the total number of users that visited the page. CTR is a measure of user engagement. * Cold-start problem: The cold start problem concerns the personalized recommendations for users with no or few past history (new users). Providing recommendations to users with small past history becomes a difficult problem for CF models because their learning and predictive ability is limited. Multiple research have been conducted in this direction using hybrid models.
These models use auxiliary information (multimodal information, side information, etc.) to overcome the cold start problem. @@ -28,6 +28,12 @@ * Long tail products: Typically, the shape of items interacted with in retail follows a long tail distribution [1,2].... +* Multi-Variate Test (MVT): Methodology to evaluate the performance of a system in production. It is similar to A/B testing, with the difference that instead of having two test groups, MVT has multiple groups. + +* Online metrics: + +* Offline metrics: + * Ranking metrics: * Rating metrics: From 65dd13c05e4d5b9d719ba16975753b3508ac1e09 Mon Sep 17 00:00:00 2001 From: miguelgfierro Date: Mon, 15 Jun 2020 12:59:50 +0100 Subject: [PATCH 25/61] :doc: --- GLOSSARY.md | 23 +++++++++++++---------- 1 file changed, 13 insertions(+), 10 deletions(-) diff --git a/GLOSSARY.md b/GLOSSARY.md index 696468008d..ed3491b4f8 100644 --- a/GLOSSARY.md +++ b/GLOSSARY.md @@ -4,23 +4,23 @@ * Click-through rate (CTR): Ratio of the number of users who click on a link over the total number of users that visited the page. CTR is a measure of user engagement. -* Cold-start problem: The cold start problem concerns the personalized recommendations for users with no or few past history (new users). Providing recommendations to users with small past history becomes a difficult problem for CF models because their learning and predictive ability is limited. Multiple research have been conducted in this direction using hybrid models. These models use auxiliary information (multimodal information, side information, etc.) to overcome the cold start problem. +* Cold-start problem: The cold start problem concerns the recommendations for users with no or few past history (new users). Providing recommendations to users with small past history becomes a difficult problem for collaborative filtering models because their learning and predictive ability is limited.
Multiple research studies have been conducted in this direction using content-based filtering models or hybrid models. These models use auxiliary information like user or item metadata to overcome the cold start problem. -* Collaborative filtering algorithms: This type of recommendation system makes predictions of what might interest a person based on the taste of many other users. It assumes that if person X likes Snickers, and person Y likes Snickers and Milky Way, then person X might like Milky Way as well. See the [list of examples in Recommenders repository](../../examples/02_model_collaborative_filtering). +* Collaborative filtering algorithms (CF): CF algorithms make predictions of the likelihood of a user selecting an item based on the behavior of other users [1]. They assume that if user A likes items X and Y, and user B likes item X, user B would probably like item Y. See the [list of CF examples in Recommenders repository](../../examples/02_model_collaborative_filtering). -* Content-based filtering algorithms: This type of recommendation system focuses on the products themselves and recommends other products that have similar attributes. Content-based filtering relies on the characteristics of the products themselves, so it doesn’t rely on other users to interact with the products before making a recommendation. See the [list of examples in Recommenders repository](../../examples/02_model_content_based_filtering). +* Content-based filtering algorithms (CB): CB algorithms make predictions of the likelihood of a user selecting an item based on the similarity of users and items among themselves [1]. They assume that if user A lives in country X, has age Y and likes item Z, and user B lives in country X and has age Y, user B would probably like item Z. See the [list of CB examples in Recommenders repository](../../examples/02_model_content_based_filtering).
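The CF assumption above ("if user A likes item X and Y, and user B likes item X, user B would probably like item Y") can be made concrete with a toy item-based collaborative filter; the users, items and likes below are invented for illustration.

```python
from math import sqrt

# Toy implicit feedback: user A likes X and Y, user B likes only X,
# user C likes X and Y. All data here is illustrative.
likes = {"A": {"X", "Y"}, "B": {"X"}, "C": {"X", "Y"}}
items = ["X", "Y", "Z"]

def item_similarity(i, j):
    """Cosine similarity between two items over the users who liked them."""
    users_i = {u for u, liked in likes.items() if i in liked}
    users_j = {u for u, liked in likes.items() if j in liked}
    if not users_i or not users_j:
        return 0.0
    return len(users_i & users_j) / sqrt(len(users_i) * len(users_j))

def recommend(user):
    """Score each unseen item by its similarity to the items the user liked."""
    seen = likes[user]
    scores = {i: sum(item_similarity(i, j) for j in seen)
              for i in items if i not in seen}
    return max(scores, key=scores.get)

print(recommend("B"))  # X and Y co-occur for users A and C, so B gets "Y"
```

Production CF models such as ALS or SAR are far more sophisticated, but they build on this same co-occurrence intuition.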
-* Conversion rate: Optimizing for conversion rate maximizes the likelihood that the user purchases the recommended item; if you want to increase the number of purchases per session, optimize for conversion rate. +* Conversion rate: In the context of e-commerce, the conversion rate is the ratio of the number of conversions (e.g. number of items bought) to the total number of visits. In the context of recommendation systems, conversion rate measures how efficient an algorithm is at providing recommendations that the user buys. -* Diversity metrics: +* Diversity metrics: In the context of Recommendation Systems, diversity applies to a set of items, and is related to how different the items are with respect to each other [4]. -* Explicit interaction data: When a user explicitly rate an item, typically between 1-5, the user is giving a value on the likeliness of the item. In retail, this kind of data is not very common. +* Explicit interaction data: When a user explicitly rates an item, typically between 1 and 5, the user is giving a direct indication of how much they like the item. -* Hybrid filtering algorithms: This type of recommendation system can implement a combination fo any two of the above systems. See the [list of examples in Recommenders repository](../../examples/02_model_hybrid). +* Hybrid filtering algorithms: This type of recommendation system can implement a combination of collaborative and content-based filtering models. See the [list of examples in Recommenders repository](../../examples/02_model_hybrid). * Implicit interaction data: Implicit interactions are views or clicks that show a certain interest of the user in specific items. This kind of data is more common, but it doesn't define the intention of the user as clearly as explicit data. -* Item information: These include information about the item, some examples can be SKU, description, brand, price, etc.
+* Item information: These include information about the item, for example name, description, price, etc. * Knowledge-base algorithms: ... @@ -30,9 +30,11 @@ * Multi-Variate Test (MVT): Methodology to evaluate the performance of a system in production. It is similar to A/B testing, with the difference that instead of having two test groups, MVT has multiple groups. +* Novelty metrics: In Recommendation Systems, the novelty of a piece of information generally refers to how different it is with respect to "what has been previously seen" [4]. + * Online metrics: -* Offline metrics: +* Offline metrics: Metrics computed offline for measuring the performance of the machine learning model. These metrics include ranking, rating, diversity and novelty metrics. * Ranking metrics: * Rating metrics: @@ -45,8 +47,9 @@ ## References and resources [1] Aggarwal, Charu C. Recommender systems. Vol. 1. Cham: Springer International Publishing, 2016. -[2]. Park, Yoon-Joo, and Alexander Tuzhilin. "The long tail of recommender systems and how to leverage it." In Proceedings of the 2008 ACM conference on Recommender systems, pp. 11-18. 2008. [Link to paper](http://people.stern.nyu.edu/atuzhili/pdf/Park-Tuzhilin-RecSys08-final.pdf). +[2] Park, Yoon-Joo, and Tuzhilin, Alexander. "The long tail of recommender systems and how to leverage it." In Proceedings of the 2008 ACM conference on Recommender systems, pp. 11-18. 2008. [Link to paper](http://people.stern.nyu.edu/atuzhili/pdf/Park-Tuzhilin-RecSys08-final.pdf). [3] Armstrong, Robert. "The long tail: Why the future of business is selling less of more." Canadian Journal of Communication 33, no. 1 (2008). [Link to paper](https://www.cjc-online.ca/index.php/journal/article/view/1946/3141). +[4] Castells, P., Vargas, S., and Wang, Jun. "Novelty and diversity metrics for recommender systems: choice, discovery and relevance." (2011). [Link to paper](https://repositorio.uam.es/bitstream/handle/10486/666094/novelty_castells_DDR_2011.pdf?sequence=1).
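To make the diversity and novelty definitions above concrete (following the spirit of [4] rather than its exact formulations), here is a minimal sketch: diversity as the average pairwise dissimilarity of a recommended list, and novelty as the mean self-information of its items. The catalog, genres and popularity figures are invented for illustration.

```python
from itertools import combinations
from math import log2

# Hypothetical catalog metadata and popularity (fraction of users who have
# already seen each item). All numbers are illustrative.
genres = {"m1": {"action"}, "m2": {"action", "sci-fi"}, "m3": {"romance"}}
popularity = {"m1": 0.50, "m2": 0.30, "m3": 0.02}

def jaccard(a, b):
    """Similarity between two genre sets."""
    return len(a & b) / len(a | b)

def intra_list_diversity(rec_list):
    """Average pairwise dissimilarity (1 - Jaccard) over the list."""
    pairs = list(combinations(rec_list, 2))
    return sum(1 - jaccard(genres[i], genres[j]) for i, j in pairs) / len(pairs)

def mean_novelty(rec_list):
    """Mean self-information -log2(popularity): rarer items are more novel."""
    return sum(-log2(popularity[i]) for i in rec_list) / len(rec_list)

recs = ["m1", "m2", "m3"]
print(intra_list_diversity(recs), mean_novelty(recs))
```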
From 573e004a2731084f7fdb2e810bdbf713c8c72af9 Mon Sep 17 00:00:00 2001 From: wutaomsft <21267949+wutaomsft@users.noreply.github.com> Date: Mon, 15 Jun 2020 08:43:57 -0400 Subject: [PATCH 26/61] Update README.md --- scenarios/retail/README.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/scenarios/retail/README.md b/scenarios/retail/README.md index 48a93eb09d..139f557d41 100644 --- a/scenarios/retail/README.md +++ b/scenarios/retail/README.md @@ -4,7 +4,7 @@ Recommender systems have become a key growth and revenue driver for modern retai Next we will describe several most common retail scenarios and main considerations when applying recommendations in retail. -## Personalized Recommendation +## Personalized recommendation A major task in applying recommendations in retail is to predict which products or set of products a user is most likely to engage with or purchase, based on the shopping or viewing history of that user. This scenario is commonly shown on the personalized home page, feed or newsletter. Most models in this repo such as [ALS](https://github.com/microsoft/recommenders/blob/staging/notebooks/00_quick_start/als_movielens.ipynb), [BPR](https://github.com/microsoft/recommenders/blob/master/notebooks/02_model/cornac_bpr_deep_dive.ipynb), [LightGBM](https://github.com/microsoft/recommenders/blob/master/notebooks/00_quick_start/lightgbm_tinycriteo.ipynb) and [NCF](https://github.com/microsoft/recommenders/blob/master/notebooks/00_quick_start/ncf_movielens.ipynb) can be used for personalization. [Azure Personalizer](https://docs.microsoft.com/en-us/azure/cognitive-services/personalizer/concept-active-learning) also provides a cloud-based personalization service using reinforcement learning based on [Vowpal Wabbit](https://github.com/microsoft/recommenders/blob/master/notebooks/02_model/vowpal_wabbit_deep_dive.ipynb).
From f42e8f5c1ed47484945660193a479289c6cefa74 Mon Sep 17 00:00:00 2001 From: miguelgfierro Date: Mon, 15 Jun 2020 15:59:29 +0100 Subject: [PATCH 27/61] wip --- GLOSSARY.md | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/GLOSSARY.md b/GLOSSARY.md index ed3491b4f8..18a7afdf14 100644 --- a/GLOSSARY.md +++ b/GLOSSARY.md @@ -32,13 +32,13 @@ * Novelty metrics: In Recommendation Systems, the novelty of a piece of information generally refers to how different it is with respect to "what has been previously seen" [4]. -* Online metrics: +* Online metrics: Also named business metrics. They are the metrics computed online that reflect how the Recommendation System is helping the business to improve user engagement or revenue. These metrics include CTR, conversion rate, etc. * Offline metrics: Metrics computed offline for measuring the performance of the machine learning model. These metrics include ranking, rating, diversity and novelty metrics. * Ranking metrics: -* Rating metrics: +* Rating metrics: These are used to evaluate how accurate a recommender is at predicting ratings that users gave to items. They include RMSE, MAE, R squared or explained variance. * Revenue per order: The revenue per order optimization objective is the default optimization objective for the "Frequently bought together" recommendation model type. This optimization objective cannot be specified for any other recommendation model type. From a4420966911046249ba847910da60ce34388d0ba Mon Sep 17 00:00:00 2001 From: miguelgfierro Date: Mon, 15 Jun 2020 19:10:47 +0100 Subject: [PATCH 28/61] glossary --- GLOSSARY.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/GLOSSARY.md b/GLOSSARY.md index 18a7afdf14..967aea9ba1 100644 --- a/GLOSSARY.md +++ b/GLOSSARY.md @@ -36,7 +36,7 @@ * Offline metrics: Metrics computed offline for measuring the performance of the machine learning model. These metrics include ranking, rating, diversity and novelty metrics. 
-* Ranking metrics: +* Ranking metrics: These are used to evaluate how relevant recommendations are for users. They include precision at k, recall at k, nDCG and MAP. * Rating metrics: These are used to evaluate how accurate a recommender is at predicting ratings that users gave to items. They include RMSE, MAE, R squared or explained variance. From 47f9d25b679355f3a6da7b5b1c45f75dd09fc342 Mon Sep 17 00:00:00 2001 From: miguelgfierro Date: Mon, 15 Jun 2020 19:13:53 +0100 Subject: [PATCH 29/61] glossary --- GLOSSARY.md | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/GLOSSARY.md b/GLOSSARY.md index 967aea9ba1..1928fc3367 100644 --- a/GLOSSARY.md +++ b/GLOSSARY.md @@ -36,9 +36,9 @@ * Offline metrics: Metrics computed offline for measuring the performance of the machine learning model. These metrics include ranking, rating, diversity and novelty metrics. -* Ranking metrics: These are used to evaluate how relevant recommendations are for users. They include precision at k, recall at k, nDCG and MAP. +* Ranking metrics: These are used to evaluate how relevant recommendations are for users. They include precision at k, recall at k, nDCG and MAP. See the [list of metrics in Recommenders repository](../../examples/03_evaluate). -* Rating metrics: These are used to evaluate how accurate a recommender is at predicting ratings that users gave to items. They include RMSE, MAE, R squared or explained variance. +* Rating metrics: These are used to evaluate how accurate a recommender is at predicting ratings that users gave to items. They include RMSE, MAE, R squared or explained variance. See the [list of metrics in Recommenders repository](../../examples/03_evaluate). * Revenue per order: The revenue per order optimization objective is the default optimization objective for the "Frequently bought together" recommendation model type. This optimization objective cannot be specified for any other recommendation model type. 
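The ranking and rating metric definitions filled in above can be illustrated with a minimal sketch. These helpers and the toy data are for illustration only, not the evaluation utilities shipped in the repository:

```python
def precision_at_k(recommended, relevant, k):
    """Ranking metric: fraction of the top-k recommendations that are relevant."""
    hits = sum(1 for item in recommended[:k] if item in relevant)
    return hits / k

def recall_at_k(recommended, relevant, k):
    """Ranking metric: fraction of all relevant items recovered in the top k."""
    hits = sum(1 for item in recommended[:k] if item in relevant)
    return hits / len(relevant)

def rmse(predicted, actual):
    """Rating metric: root mean squared error between predicted and true ratings."""
    return (sum((p - a) ** 2 for p, a in zip(predicted, actual)) / len(actual)) ** 0.5

recommended = ["a", "b", "c", "d"]   # ranked model output
relevant = {"a", "c", "e"}           # items the user actually engaged with
print(precision_at_k(recommended, relevant, 2))  # 0.5: one of the top-2 is relevant
print(recall_at_k(recommended, relevant, 4))     # 2/3 of the relevant items appear in the top-4
print(rmse([3.5, 4.0], [4.0, 4.0]))
```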
From f572dcf8dc8b4e3b5d8341a3f9ba6921d0ca0a27 Mon Sep 17 00:00:00 2001 From: miguelgfierro Date: Tue, 16 Jun 2020 15:26:32 +0100 Subject: [PATCH 30/61] kg --- GLOSSARY.md | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/GLOSSARY.md b/GLOSSARY.md index 1928fc3367..1a3da65073 100644 --- a/GLOSSARY.md +++ b/GLOSSARY.md @@ -22,9 +22,9 @@ * Item information: These include information about the item, some examples can be name, description, price, etc. -* Knowledge-base algorithms: ... +* Knowledge graph algorithms: A knowledge graph algorithm is the one that uses knowledge graph data. In comparison with standard algorithms, it allows to explore graph's latent connections and improve the precision of results; the various relations in the graph can extend users' interest and increase the diversity of recommended items; also, these algorithms bring explainability to recommendation systems [5]. -* Knowledge graph data: ... +* Knowledge graph data: A knowledge graph is a directed heterogeneous graph in which nodes correspond to entities (items or item attributes) and edges correspond to relations [5]. * Long tail products: Typically, the shape of items interacted in retail follow a long tail distribution [1,2].... @@ -46,11 +46,11 @@ ## References and resources -[1] Aggarwal, Charu C. Recommender systems. Vol. 1. Cham: Springer International Publishing, 2016. +[1] Aggarwal, Charu C. "Recommender systems". Vol. 1. Cham: Springer International Publishing, 2016. [2]. Park, Yoon-Joo, and Tuzhilin, Alexander. "The long tail of recommender systems and how to leverage it." In Proceedings of the 2008 ACM conference on Recommender systems, pp. 11-18. 2008. [Link to paper](http://people.stern.nyu.edu/atuzhili/pdf/Park-Tuzhilin-RecSys08-final.pdf). [3]. Armstrong, Robert. "The long tail: Why the future of business is selling less of more." Canadian Journal of Communication 33, no. 1 (2008). 
[Link to paper](https://www.cjc-online.ca/index.php/journal/article/view/1946/3141). [4] Castells, P., Vargas, S., and Wang, Jun. "Novelty and diversity metrics for recommender systems: choice, discovery and relevance." (2011). [Link to paper](https://repositorio.uam.es/bitstream/handle/10486/666094/novelty_castells_DDR_2011.pdf?sequence=1). - +[5] Wang, Hongwei; Zhao, Miao; Xie, Xing; Li, Wenjie and Guo, Minyi. "Knowledge Graph Convolutional Networks for Recommender Systems". The World Wide Web Conference WWW'19. 2019. [Link to paper](https://arxiv.org/abs/1904.12575). From 4930065c74dfa3b59d99a307d44036c1de64bd2e Mon Sep 17 00:00:00 2001 From: miguelgfierro Date: Tue, 16 Jun 2020 15:28:36 +0100 Subject: [PATCH 31/61] fix links --- scenarios/retail/README.md | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/scenarios/retail/README.md b/scenarios/retail/README.md index 139f557d41..677d79a3fd 100644 --- a/scenarios/retail/README.md +++ b/scenarios/retail/README.md @@ -1,6 +1,6 @@ # Recommender Systems for Retail -Recommender systems have become a key growth and revenue driver for modern retail. For example, recommendation was estimated to [account for 35% of customer purchases on Amazon](https://www.mckinsey.com/industries/retail/our-insights/how-retailers-can-keep-up-with-consumers#). In addition, recommenders have been applied by retailers to delight and retain customers and improve staff productivity. +Recommender systems have become a key growth and revenue driver for modern retail. For example, recommendation was estimated to [account for 35% of customer purchases on Amazon](https://www.mckinsey.com/industries/retail/our-insights/how-retailers-can-keep-up-with-consumers#). In addition, recommenders have been applied by retailers to delight and retain customers and improve staff productivity. Next we will describe several most common retail scenarios and main considerations when applying recommendations in retail. 
@@ -24,9 +24,9 @@ This scenario covers down-selling or out of stock alternatives to avoid losing a ## Data and evaluation -Datasets used in retail recommendations usually include [user information](../GLOSSARY.md), [item information](../GLOSSARY.md) and [interaction data](../GLOSSARY.md), among others. +Datasets used in retail recommendations usually include [user information](../../GLOSSARY.md), [item information](../../GLOSSARY.md) and [interaction data](../../GLOSSARY.md), among others. -To measure the performance of the recommender, it is common to use [ranking metrics](../GLOSSARY.md). In production, the business metrics used are [CTR](../GLOSSARY.md) and [revenue per order](../GLOSSARY.md). To evaluate a model's performance in production in an online manner, [A/B testing](../GLOSSARY.md) is often applied. +To measure the performance of the recommender, it is common to use [ranking metrics](../GLOSSARY.md). In production, the business metrics used are [CTR](../../GLOSSARY.md) and [revenue per order](../../GLOSSARY.md). To evaluate a model's performance in production in an online manner, [A/B testing](../../GLOSSARY.md) is often applied. ## Other considerations From 97672a9880101fcecf8e9b509713fcc566baae23 Mon Sep 17 00:00:00 2001 From: miguelgfierro Date: Tue, 16 Jun 2020 15:32:39 +0100 Subject: [PATCH 32/61] readme --- scenarios/retail/README.md | 5 +---- 1 file changed, 1 insertion(+), 4 deletions(-) diff --git a/scenarios/retail/README.md b/scenarios/retail/README.md index 677d79a3fd..ffce41d7f6 100644 --- a/scenarios/retail/README.md +++ b/scenarios/retail/README.md @@ -6,14 +6,12 @@ Next we will describe several most common retail scenarios and main consideratio ## Personalized recommendation -A major task in applying recommenations in retail is to predict which products or set of products a user is most likely to engage with or purchase, based on the shopping or viewing history of that user. 
This scenario is commonly shown on the personalized home page, feed or newsletter. Most models in this repo such as [ALS](https://github.com/microsoft/recommenders/blob/staging/notebooks/00_quick_start/als_movielens.ipynb), [BPR](https://github.com/microsoft/recommenders/blob/master/notebooks/02_model/cornac_bpr_deep_dive.ipynb), [LightGBM](https://github.com/microsoft/recommenders/blob/master/notebooks/00_quick_start/lightgbm_tinycriteo.ipynb) and [NCF](https://github.com/microsoft/recommenders/blob/master/notebooks/00_quick_start/ncf_movielens.ipynb) can be used for personalization. [Azure Personalizer](https://docs.microsoft.com/en-us/azure/cognitive-services/personalizer/concept-active-learning) also provides a cloud-based personalization service using reinforcement learning based on [Vowpal Wabbit](https://github.com/microsoft/recommenders/blob/master/notebooks/02_model/vowpal_wabbit_deep_dive.ipynb). - +A major task in applying recommendations in retail is to predict which products or set of products a user is most likely to engage with or purchase, based on the shopping or viewing history of that user. This scenario is commonly shown on the personalized home page, feed or newsletter. Most models in this repo such as [ALS](../../examples/00_quick_start/als_movielens.ipynb), [BPR](../../examples/02_model_collaborative_filtering/cornac_bpr_deep_dive.ipynb), [LightGBM](../../examples/00_quick_start/lightgbm_tinycriteo.ipynb) and [NCF](../../examples/00_quick_start/ncf_movielens.ipynb) can be used for personalization. [Azure Personalizer](https://docs.microsoft.com/en-us/azure/cognitive-services/personalizer/concept-active-learning) also provides a cloud-based personalization service using reinforcement learning based on [Vowpal Wabbit](../../examples/02_model_content_based_filtering/vowpal_wabbit_deep_dive.ipynb). 
## You might also like In this scenario, the user is already viewing a product page, and the task is to make recommendations that are relevant to it. Personalized recommendation techniques are still applicable here, but relevance to the product being viewed is of special importance. As such, item similarity can be useful here, especially for cold items and cold users that do not have much interaction data. - ## Frequently bought together In this task, the retailer tries to predict product(s) complementary to or bought together with a product that a user already put in to shopping cart. This feature is great for cross-selling and is normally displayed just before checkout. In many cases, a machine learning solution is not required for this task. @@ -28,7 +26,6 @@ Datasets used in retail recommendations usually include [user information](../. To measure the performance of the recommender, it is common to use [ranking metrics](../GLOSSARY.md). In production, the business metrics used are [CTR](../../GLOSSARY.md) and [revenue per order](../../GLOSSARY.md). To evaluate a model's performance in production in an online manner, [A/B testing](../../GLOSSARY.md) is often applied. - ## Other considerations Retailers use recommendation to achieve a broad range of business objectives, such as attracting new customers through promotions, or clearing products that are at the end of their season. These objectives are often achieved by re-ranking the outputs from recommenders in scenarios above. 
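The retail README above notes that "Frequently bought together" often needs no machine learning. A hedged sketch of the simplest non-ML approach is counting basket co-occurrences; the function name and toy baskets below are invented for illustration and are not part of the repository:

```python
from collections import Counter

def frequently_bought_together(baskets, item, top_n=3):
    """Rank other items by how often they share a basket with `item`."""
    co_counts = Counter()
    for basket in baskets:
        if item in basket:
            co_counts.update(other for other in basket if other != item)
    return [other for other, _ in co_counts.most_common(top_n)]

baskets = [
    ["bread", "butter", "milk"],
    ["bread", "butter"],
    ["jam", "bread"],
    ["milk", "eggs"],  # no bread: ignored for this query
]
print(frequently_bought_together(baskets, "bread", top_n=1))  # ['butter'] — co-occurs with bread in two baskets
```

Re-ranking these raw counts (e.g. by margin or stock level) is one way to meet the business objectives mentioned under "Other considerations".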
From f022427b8b0b864dbaf52fd3d8699c464616a27a Mon Sep 17 00:00:00 2001 From: miguelgfierro Date: Tue, 16 Jun 2020 15:41:58 +0100 Subject: [PATCH 33/61] fix paths --- SETUP.md | 35 +++++++++++++++++------------------ 1 file changed, 17 insertions(+), 18 deletions(-) diff --git a/SETUP.md b/SETUP.md index c2c0f22447..0b059ffb1e 100644 --- a/SETUP.md +++ b/SETUP.md @@ -35,8 +35,7 @@ Currently, this repository supports **Python CPU**, **Python GPU** and **PySpark * A machine running Linux, MacOS or Windows * Anaconda with Python version >= 3.6 - * This is pre-installed on Azure DSVM such that one can run the following steps directly. To setup on your local machine, - [Miniconda](https://docs.conda.io/en/latest/miniconda.html) is a quick way to get started. + * This is pre-installed on Azure DSVM such that one can run the following steps directly. To setup on your local machine, [Miniconda](https://docs.conda.io/en/latest/miniconda.html) is a quick way to get started. * [Apache Spark](https://spark.apache.org/downloads.html) (this is only needed for the PySpark environment). ### Dependencies setup @@ -48,7 +47,7 @@ conda update conda -n root conda update anaconda # use 'conda install anaconda' if the package is not installed ``` -We provide a script, [generate_conda_file.py](scripts/generate_conda_file.py), to generate a conda-environment yaml file +We provide a script, [generate_conda_file.py](tools/generate_conda_file.py), to generate a conda-environment yaml file which you can use to create the target environment using the Python version 3.6 with all the correct dependencies. **NOTE** the `xlearn` package has dependency on `cmake`. If one uses the `xlearn` related notebooks or scripts, make sure `cmake` is installed in the system. Detailed instructions for installing `cmake` can be found [here](https://vitux.com/how-to-install-cmake-on-ubuntu-18-04/). The default version of `cmake` is 3.15.2. 
One can specify a different version by configuring the argument of `CMAKE` in building the Docker image. @@ -56,7 +55,7 @@ which you can use to create the target environment using the Python version 3.6 Assuming the repo is cloned as `Recommenders` in the local system, to install **a default (Python CPU) environment**: cd Recommenders - python scripts/generate_conda_file.py + python tools/generate_conda_file.py conda env create -f reco_base.yaml You can specify the environment name as well with the flag `-n`. @@ -69,7 +68,7 @@ Click on the following menus to see how to install Python GPU and PySpark enviro Assuming that you have a GPU machine, to install the Python GPU environment: cd Recommenders - python scripts/generate_conda_file.py --gpu + python tools/generate_conda_file.py --gpu conda env create -f reco_gpu.yaml @@ -80,12 +79,12 @@ Assuming that you have a GPU machine, to install the Python GPU environment: To install the PySpark environment: cd Recommenders - python scripts/generate_conda_file.py --pyspark + python tools/generate_conda_file.py --pyspark conda env create -f reco_pyspark.yaml > Additionally, if you want to test a particular version of spark, you may pass the --pyspark-version argument: > -> python scripts/generate_conda_file.py --pyspark-version 2.4.0 +> python tools/generate_conda_file.py --pyspark-version 2.4.0 Then, we need to set the environment variables `PYSPARK_PYTHON` and `PYSPARK_DRIVER_PYTHON` to point to the conda python executable. @@ -160,7 +159,7 @@ With this environment, you can run both PySpark and Python GPU notebooks in this To install the environment: cd Recommenders - python scripts/generate_conda_file.py --gpu --pyspark + python tools/generate_conda_file.py --gpu --pyspark conda env create -f reco_full.yaml Then, we need to set the environment variables `PYSPARK_PYTHON` and `PYSPARK_DRIVER_PYTHON` to point to the conda python executable. 
@@ -200,7 +199,7 @@ An example of how to create an Azure Databricks workspace and an Apache Spark cl ### Repository installation -You can setup the repository as a library on Databricks either manually or by running an [installation script](scripts/databricks_install.py). Both options assume you have access to a provisioned Databricks workspace and cluster and that you have appropriate permissions to install libraries. +You can setup the repository as a library on Databricks either manually or by running an [installation script](tools/databricks_install.py). Both options assume you have access to a provisioned Databricks workspace and cluster and that you have appropriate permissions to install libraries.
Quick install @@ -228,20 +227,20 @@ This option utilizes an installation script to do the setup, and it requires add The installation script has a number of options that can also deal with different databricks-cli profiles, install a version of the mmlspark library, overwrite the libraries, or prepare the cluster for operationalization. For all options, please see: ```{shell} -python scripts/databricks_install.py -h +python tools/databricks_install.py -h ``` Once you have confirmed the databricks cluster is *RUNNING*, install the modules within this repository with the following commands. ```{shell} cd Recommenders -python scripts/databricks_install.py +python tools/databricks_install.py ``` -**Note** If you are planning on running through the sample code for operationalization [here](notebooks/05_operationalize/als_movie_o16n.ipynb), you need to prepare the cluster for operationalization. You can do so by adding an additional option to the script run. is the same as that mentioned above, and can be identified by running `databricks clusters list` and selecting the appropriate cluster. +**Note** If you are planning on running through the sample code for operationalization [here](examples/05_operationalize/als_movie_o16n.ipynb), you need to prepare the cluster for operationalization. You can do so by adding an additional option to the script run. is the same as that mentioned above, and can be identified by running `databricks clusters list` and selecting the appropriate cluster. ```{shell} -python ./scripts/databricks_install.py --prepare-o16n +python tools/databricks_install.py --prepare-o16n ``` See below for details. 
@@ -285,7 +284,7 @@ import reco_utils ### Prepare Azure Databricks for Operationalization -This repository includes an end-to-end example notebook that uses Azure Databricks to estimate a recommendation model using matrix factorization with Alternating Least Squares, writes pre-computed recommendations to Azure Cosmos DB, and then creates a real-time scoring service that retrieves the recommendations from Cosmos DB. In order to execute that [notebook](notebooks/05_operationalize/als_movie_o16n.ipynb), you must install the Recommenders repository as a library (as described above), **AND** you must also install some additional dependencies. With the *Quick install* method, you just need to pass an additional option to the [installation script](scripts/databricks_install.py). +This repository includes an end-to-end example notebook that uses Azure Databricks to estimate a recommendation model using matrix factorization with Alternating Least Squares, writes pre-computed recommendations to Azure Cosmos DB, and then creates a real-time scoring service that retrieves the recommendations from Cosmos DB. In order to execute that [notebook](examples/05_operationalize/als_movie_o16n.ipynb), you must install the Recommenders repository as a library (as described above), **AND** you must also install some additional dependencies. With the *Quick install* method, you just need to pass an additional option to the [installation script](tools/databricks_install.py).
Quick install @@ -294,7 +293,7 @@ This option utilizes the installation script to do the setup. Just run the insta with an additional option. If you have already run the script once to upload and install the `Recommenders.egg` library, you can also add an `--overwrite` option: ```{shell} -python scripts/databricks_install.py --overwrite --prepare-o16n +python tools/databricks_install.py --overwrite --prepare-o16n ``` This script does all of the steps described in the *Manual setup* section below. @@ -328,7 +327,7 @@ Additionally, you must install the [spark-cosmosdb connector](https://docs.datab ## Install the utilities via PIP -A [setup.py](reco_utils/setup.py) file is provided in order to simplify the installation of the utilities in this repo from the main directory. +A [setup.py](setup.py) file is provided in order to simplify the installation of the utilities in this repo from the main directory. This still requires the conda environment to be installed as described above. Once the necessary dependencies are installed, you can use the following command to install `reco_utils` as a python package. @@ -343,11 +342,11 @@ It is also possible to install directly from GitHub. Or from a specific branch a ## Setup guide for Docker -A [Dockerfile](docker/Dockerfile) is provided to build images of the repository to simplify setup for different environments. You will need [Docker Engine](https://docs.docker.com/install/) installed on your system. +A [Dockerfile](tools/docker/Dockerfile) is provided to build images of the repository to simplify setup for different environments. You will need [Docker Engine](https://docs.docker.com/install/) installed on your system. *Note: `docker` is already available on Azure Data Science Virtual Machine* -See guidelines in the Docker [README](docker/README.md) for detailed instructions of how to build and run images for different environments. 
+See guidelines in the Docker [README](tools/docker/README.md) for detailed instructions of how to build and run images for different environments. Example command to build and run Docker image with base CPU environment. ```{shell} From a46b18f73c99d38b1f1965b99b9641b6fbcabb68 Mon Sep 17 00:00:00 2001 From: miguelgfierro Date: Tue, 16 Jun 2020 15:58:21 +0100 Subject: [PATCH 34/61] fix paths --- README.md | 93 +++++++++++++++++++++++++++---------------------------- setup.py | 2 +- 2 files changed, 47 insertions(+), 48 deletions(-) diff --git a/README.md b/README.md index 516f4d9503..2f80b2538d 100644 --- a/README.md +++ b/README.md @@ -3,11 +3,11 @@ [![Documentation Status](https://readthedocs.org/projects/microsoft-recommenders/badge/?version=latest)](https://microsoft-recommenders.readthedocs.io/en/latest/?badge=latest) This repository contains examples and best practices for building recommendation systems, provided as Jupyter notebooks. The examples detail our learnings on five key tasks: -- [Prepare Data](notebooks/01_prepare_data): Preparing and loading data for each recommender algorithm -- [Model](notebooks/02_model): Building models using various classical and deep learning recommender algorithms such as Alternating Least Squares ([ALS](https://spark.apache.org/docs/latest/api/python/_modules/pyspark/ml/recommendation.html#ALS)) or eXtreme Deep Factorization Machines ([xDeepFM](https://arxiv.org/abs/1803.05170)). 
-- [Evaluate](notebooks/03_evaluate): Evaluating algorithms with offline metrics -- [Model Select and Optimize](notebooks/04_model_select_and_optimize): Tuning and optimizing hyperparameters for recommender models -- [Operationalize](notebooks/05_operationalize): Operationalizing models in a production environment on Azure +- [Prepare Data](examples/01_prepare_data): Preparing and loading data for each recommender algorithm +- [Model](examples/00_quick_start): Building models using various classical and deep learning recommender algorithms such as Alternating Least Squares ([ALS](https://spark.apache.org/docs/latest/api/python/_modules/pyspark/ml/recommendation.html#ALS)) or eXtreme Deep Factorization Machines ([xDeepFM](https://arxiv.org/abs/1803.05170)). +- [Evaluate](examples/03_evaluate): Evaluating algorithms with offline metrics +- [Model Select and Optimize](examples/04_model_select_and_optimize): Tuning and optimizing hyperparameters for recommender models +- [Operationalize](examples/05_operationalize): Operationalizing models in a production environment on Azure Several utilities are provided in [reco_utils](reco_utils) to support common tasks such as loading datasets in the format expected by different algorithms, evaluating model outputs, and splitting training/test data. Implementations of several state-of-the-art algorithms are included for self-study and customization in your own applications. See the [reco_utils documentation](https://readthedocs.org/projects/microsoft-recommenders/). @@ -48,9 +48,9 @@ python -m ipykernel install --user --name reco_base --display-name "Python (reco jupyter notebook ``` -6. Run the [SAR Python CPU MovieLens](notebooks/00_quick_start/sar_movielens.ipynb) notebook under the `00_quick_start` folder. Make sure to change the kernel to "Python (reco)". +6. Run the [SAR Python CPU MovieLens](examples/00_quick_start/sar_movielens.ipynb) notebook under the `00_quick_start` folder. 
Make sure to change the kernel to "Python (reco)". -**NOTE** - The [Alternating Least Squares (ALS)](notebooks/00_quick_start/als_movielens.ipynb) notebooks require a PySpark environment to run. Please follow the steps in the [setup guide](SETUP.md#dependencies-setup) to run these notebooks in a PySpark environment. For the deep learning algorithms, it is recommended to use a GPU machine. +**NOTE** - The [Alternating Least Squares (ALS)](examples/00_quick_start/als_movielens.ipynb) notebooks require a PySpark environment to run. Please follow the steps in the [setup guide](SETUP.md#dependencies-setup) to run these notebooks in a PySpark environment. For the deep learning algorithms, it is recommended to use a GPU machine. ## Algorithms @@ -58,30 +58,30 @@ The table below lists the recommender algorithms currently available in the repo | Algorithm | Environment | Type | Description | | --- | --- | --- | --- | -| Alternating Least Squares (ALS) | [PySpark](notebooks/00_quick_start/als_movielens.ipynb) | Collaborative Filtering | Matrix factorization algorithm for explicit or implicit feedback in large datasets, optimized by Spark MLLib for scalability and distributed computing capability | -| Attentive Asynchronous Singular Value Decomposition (A2SVD)* | [Python CPU / Python GPU](notebooks/00_quick_start/sequential_recsys_amazondataset.ipynb) | Collaborative Filtering | Sequential-based algorithm that aims to capture both long and short-term user preferences using attention mechanism | -| Cornac/Bayesian Personalized Ranking (BPR) | [Python CPU](notebooks/02_model/cornac_bpr_deep_dive.ipynb) | Collaborative Filtering | Matrix factorization algorithm for predicting item ranking with implicit feedback | -| Convolutional Sequence Embedding Recommendation (Caser) | [Python CPU / Python GPU](notebooks/00_quick_start/sequential_recsys_amazondataset.ipynb) | Collaborative Filtering | Algorithm based on convolutions that aims to capture both user’s general preferences and 
sequential patterns | -| Deep Knowledge-Aware Network (DKN)* | [Python CPU / Python GPU](notebooks/00_quick_start/dkn_synthetic.ipynb) | Content-Based Filtering | Deep learning algorithm incorporating a knowledge graph and article embeddings to provide powerful news or article recommendations | -| Extreme Deep Factorization Machine (xDeepFM)* | [Python CPU / Python GPU](notebooks/00_quick_start/xdeepfm_criteo.ipynb) | Hybrid | Deep learning based algorithm for implicit and explicit feedback with user/item features | -| FastAI Embedding Dot Bias (FAST) | [Python CPU / Python GPU](notebooks/00_quick_start/fastai_movielens.ipynb) | Collaborative Filtering | General purpose algorithm with embeddings and biases for users and items | -| LightFM/Hybrid Matrix Factorization | [Python CPU](notebooks/02_model/lightfm_deep_dive.ipynb) | Hybrid | Hybrid matrix factorization algorithm for both implicit and explicit feedbacks | -| LightGBM/Gradient Boosting Tree* | [Python CPU](notebooks/00_quick_start/lightgbm_tinycriteo.ipynb) / [PySpark](notebooks/02_model/mmlspark_lightgbm_criteo.ipynb) | Content-Based Filtering | Gradient Boosting Tree algorithm for fast training and low memory usage in content-based problems | -| GRU4Rec | [Python CPU / Python GPU](notebooks/00_quick_start/sequential_recsys_amazondataset.ipynb) | Collaborative Filtering | Sequential-based algorithm that aims to capture both long and short-term user preferences using recurrent neural networks | -| Neural Recommendation with Long- and Short-term User Representations (LSTUR)* | [Python CPU / Python GPU](notebooks/00_quick_start/lstur_synthetic.ipynb) | Content-Based Filtering | Neural recommendation algorithm with long- and short-term user interest modeling | -| Neural Recommendation with Attentive Multi-View Learning (NAML)* | [Python CPU / Python GPU](notebooks/00_quick_start/naml_synthetic.ipynb) | Content-Based Filtering | Neural recommendation algorithm with attentive multi-view learning | -| Neural 
Collaborative Filtering (NCF) | [Python CPU / Python GPU](notebooks/00_quick_start/ncf_movielens.ipynb) | Collaborative Filtering | Deep learning algorithm with enhanced performance for implicit feedback | -| Neural Recommendation with Personalized Attention (NPA)* | [Python CPU / Python GPU](notebooks/00_quick_start/npa_synthetic.ipynb) | Content-Based Filtering | Neural recommendation algorithm with personalized attention network | -| Neural Recommendation with Multi-Head Self-Attention (NRMS)* | [Python CPU / Python GPU](notebooks/00_quick_start/nrms_synthetic.ipynbb) | Content-Based Filtering | Neural recommendation algorithm with multi-head self-attention | -| Restricted Boltzmann Machines (RBM) | [Python CPU / Python GPU](notebooks/00_quick_start/rbm_movielens.ipynb) | Collaborative Filtering | Neural network based algorithm for learning the underlying probability distribution for explicit or implicit feedback | -| Riemannian Low-rank Matrix Completion (RLRMC)* | [Python CPU](notebooks/00_quick_start/rlrmc_movielens.ipynb) | Collaborative Filtering | Matrix factorization algorithm using Riemannian conjugate gradients optimization with small memory consumption. 
| -| Simple Algorithm for Recommendation (SAR)* | [Python CPU](notebooks/00_quick_start/sar_movielens.ipynb) | Collaborative Filtering | Similarity-based algorithm for implicit feedback dataset | -| Short-term and Long-term preference Integrated Recommender (SLi-Rec)* | [Python CPU / Python GPU](notebooks/00_quick_start/sequential_recsys_amazondataset.ipynb) | Collaborative Filtering | Sequential-based algorithm that aims to capture both long and short-term user preferences using attention mechanism, a time-aware controller and a content-aware controller | -| Surprise/Singular Value Decomposition (SVD) | [Python CPU](notebooks/02_model/surprise_svd_deep_dive.ipynb) | Collaborative Filtering | Matrix factorization algorithm for predicting explicit rating feedback in datasets that are not very large | -| Term Frequency - Inverse Document Frequency (TF-IDF) | [Python CPU](notebooks/00_quick_start/tfidf_covid.ipynb) | Content-Based Filtering | Simple similarity-based algorithm for content-based recommendations with text datasets | -| Vowpal Wabbit Family (VW)* | [Python CPU (online training)](notebooks/02_model/vowpal_wabbit_deep_dive.ipynb) | Content-Based Filtering | Fast online learning algorithms, great for scenarios where user features / context are constantly changing | -| Wide and Deep | [Python CPU / Python GPU](notebooks/00_quick_start/wide_deep_movielens.ipynb) | Hybrid | Deep learning algorithm that can memorize feature interactions and generalize user features | -| xLearn/Factorization Machine (FM) & Field-Aware FM (FFM) | [Python CPU](notebooks/02_model/fm_deep_dive.ipynb) | Content-Based Filtering | Quick and memory efficient algorithm to predict labels with user/item features | +| Alternating Least Squares (ALS) | [PySpark](examples/00_quick_start/als_movielens.ipynb) | Collaborative Filtering | Matrix factorization algorithm for explicit or implicit feedback in large datasets, optimized by Spark MLLib for scalability and distributed computing capability 
| +| Attentive Asynchronous Singular Value Decomposition (A2SVD)* | [Python CPU / Python GPU](examples/00_quick_start/sequential_recsys_amazondataset.ipynb) | Collaborative Filtering | Sequential-based algorithm that aims to capture both long and short-term user preferences using attention mechanism | +| Cornac/Bayesian Personalized Ranking (BPR) | [Python CPU](examples/02_model/cornac_bpr_deep_dive.ipynb) | Collaborative Filtering | Matrix factorization algorithm for predicting item ranking with implicit feedback | +| Convolutional Sequence Embedding Recommendation (Caser) | [Python CPU / Python GPU](examples/00_quick_start/sequential_recsys_amazondataset.ipynb) | Collaborative Filtering | Algorithm based on convolutions that aims to capture both user’s general preferences and sequential patterns | +| Deep Knowledge-Aware Network (DKN)* | [Python CPU / Python GPU](examples/00_quick_start/dkn_synthetic.ipynb) | Content-Based Filtering | Deep learning algorithm incorporating a knowledge graph and article embeddings to provide powerful news or article recommendations | +| Extreme Deep Factorization Machine (xDeepFM)* | [Python CPU / Python GPU](examples/00_quick_start/xdeepfm_criteo.ipynb) | Hybrid | Deep learning based algorithm for implicit and explicit feedback with user/item features | +| FastAI Embedding Dot Bias (FAST) | [Python CPU / Python GPU](examples/00_quick_start/fastai_movielens.ipynb) | Collaborative Filtering | General purpose algorithm with embeddings and biases for users and items | +| LightFM/Hybrid Matrix Factorization | [Python CPU](examples/02_model/lightfm_deep_dive.ipynb) | Hybrid | Hybrid matrix factorization algorithm for both implicit and explicit feedbacks | +| LightGBM/Gradient Boosting Tree* | [Python CPU](examples/00_quick_start/lightgbm_tinycriteo.ipynb) / [PySpark](examples/02_model/mmlspark_lightgbm_criteo.ipynb) | Content-Based Filtering | Gradient Boosting Tree algorithm for fast training and low memory usage in content-based 
problems | +| GRU4Rec | [Python CPU / Python GPU](examples/00_quick_start/sequential_recsys_amazondataset.ipynb) | Collaborative Filtering | Sequential-based algorithm that aims to capture both long and short-term user preferences using recurrent neural networks | +| Neural Recommendation with Long- and Short-term User Representations (LSTUR)* | [Python CPU / Python GPU](examples/00_quick_start/lstur_synthetic.ipynb) | Content-Based Filtering | Neural recommendation algorithm with long- and short-term user interest modeling | +| Neural Recommendation with Attentive Multi-View Learning (NAML)* | [Python CPU / Python GPU](examples/00_quick_start/naml_synthetic.ipynb) | Content-Based Filtering | Neural recommendation algorithm with attentive multi-view learning | +| Neural Collaborative Filtering (NCF) | [Python CPU / Python GPU](examples/00_quick_start/ncf_movielens.ipynb) | Collaborative Filtering | Deep learning algorithm with enhanced performance for implicit feedback | +| Neural Recommendation with Personalized Attention (NPA)* | [Python CPU / Python GPU](examples/00_quick_start/npa_synthetic.ipynb) | Content-Based Filtering | Neural recommendation algorithm with personalized attention network | +| Neural Recommendation with Multi-Head Self-Attention (NRMS)* | [Python CPU / Python GPU](examples/00_quick_start/nrms_synthetic.ipynb) | Content-Based Filtering | Neural recommendation algorithm with multi-head self-attention | +| Restricted Boltzmann Machines (RBM) | [Python CPU / Python GPU](examples/00_quick_start/rbm_movielens.ipynb) | Collaborative Filtering | Neural network based algorithm for learning the underlying probability distribution for explicit or implicit feedback | +| Riemannian Low-rank Matrix Completion (RLRMC)* | [Python CPU](examples/00_quick_start/rlrmc_movielens.ipynb) | Collaborative Filtering | Matrix factorization algorithm using Riemannian conjugate gradients optimization with small memory consumption.
| +| Simple Algorithm for Recommendation (SAR)* | [Python CPU](examples/00_quick_start/sar_movielens.ipynb) | Collaborative Filtering | Similarity-based algorithm for implicit feedback dataset | +| Short-term and Long-term preference Integrated Recommender (SLi-Rec)* | [Python CPU / Python GPU](examples/00_quick_start/sequential_recsys_amazondataset.ipynb) | Collaborative Filtering | Sequential-based algorithm that aims to capture both long and short-term user preferences using attention mechanism, a time-aware controller and a content-aware controller | +| Surprise/Singular Value Decomposition (SVD) | [Python CPU](examples/02_model/surprise_svd_deep_dive.ipynb) | Collaborative Filtering | Matrix factorization algorithm for predicting explicit rating feedback in datasets that are not very large | +| Term Frequency - Inverse Document Frequency (TF-IDF) | [Python CPU](examples/00_quick_start/tfidf_covid.ipynb) | Content-Based Filtering | Simple similarity-based algorithm for content-based recommendations with text datasets | +| Vowpal Wabbit Family (VW)* | [Python CPU (online training)](examples/02_model/vowpal_wabbit_deep_dive.ipynb) | Content-Based Filtering | Fast online learning algorithms, great for scenarios where user features / context are constantly changing | +| Wide and Deep | [Python CPU / Python GPU](examples/00_quick_start/wide_deep_movielens.ipynb) | Hybrid | Deep learning algorithm that can memorize feature interactions and generalize user features | +| xLearn/Factorization Machine (FM) & Field-Aware FM (FFM) | [Python CPU](examples/02_model/fm_deep_dive.ipynb) | Content-Based Filtering | Quick and memory efficient algorithm to predict labels with user/item features | **NOTE**: * indicates algorithms invented/contributed by Microsoft. 
@@ -97,12 +97,12 @@ We provide a [benchmark notebook](benchmarks/movielens.ipynb) to illustrate how | Algo | MAP | nDCG@k | Precision@k | Recall@k | RMSE | MAE | R2 | Explained Variance | | --- | --- | --- | --- | --- | --- | --- | --- | --- | -| [ALS](notebooks/00_quick_start/als_movielens.ipynb) | 0.004732 | 0.044239 | 0.048462 | 0.017796 | 0.965038 | 0.753001 | 0.255647 | 0.251648 | -| [SVD](notebooks/02_model/surprise_svd_deep_dive.ipynb) | 0.012873 | 0.095930 | 0.091198 | 0.032783 | 0.938681 | 0.742690 | 0.291967 | 0.291971 | -| [SAR](notebooks/00_quick_start/sar_movielens.ipynb) | 0.113028 | 0.388321 | 0.333828 | 0.183179 | N/A | N/A | N/A | N/A | -| [NCF](notebooks/02_model/ncf_deep_dive.ipynb) | 0.107720 | 0.396118 | 0.347296 | 0.180775 | N/A | N/A | N/A | N/A | -| [BPR](notebooks/02_model/cornac_bpr_deep_dive.ipynb) | 0.105365 | 0.389948 | 0.349841 | 0.181807 | N/A | N/A | N/A | N/A | -| [FastAI](notebooks/00_quick_start/fastai_movielens.ipynb) | 0.025503 | 0.147866 | 0.130329 | 0.053824 | 0.943084 | 0.744337 | 0.285308 | 0.287671 | +| [ALS](examples/00_quick_start/als_movielens.ipynb) | 0.004732 | 0.044239 | 0.048462 | 0.017796 | 0.965038 | 0.753001 | 0.255647 | 0.251648 | +| [SVD](examples/02_model_collaborative_filtering/surprise_svd_deep_dive.ipynb) | 0.012873 | 0.095930 | 0.091198 | 0.032783 | 0.938681 | 0.742690 | 0.291967 | 0.291971 | +| [SAR](examples/00_quick_start/sar_movielens.ipynb) | 0.113028 | 0.388321 | 0.333828 | 0.183179 | N/A | N/A | N/A | N/A | +| [NCF](examples/02_model_hybrid/ncf_deep_dive.ipynb) | 0.107720 | 0.396118 | 0.347296 | 0.180775 | N/A | N/A | N/A | N/A | +| [BPR](examples/02_model_collaborative_filtering/cornac_bpr_deep_dive.ipynb) | 0.105365 | 0.389948 | 0.349841 | 0.181807 | N/A | N/A | N/A | N/A | +| [FastAI](examples/00_quick_start/fastai_movielens.ipynb) | 0.025503 | 0.147866 | 0.130329 | 0.053824 | 0.943084 | 0.744337 | 0.285308 | 0.287671 | ## Contributing @@ -125,17 +125,16 @@ The following tests run on a Windows and 
Linux DSVM daily. These machines run 24 | **Windows GPU** | master | [![Build Status](https://dev.azure.com/best-practices/recommenders/_apis/build/status/windows-tests/dsvm_nightly_win_gpu?branchName=master)](https://dev.azure.com/best-practices/recommenders/_build/latest?definitionId=102&branchName=master) | | staging | [![Build Status](https://dev.azure.com/best-practices/recommenders/_apis/build/status/windows-tests/dsvm_nightly_win_gpu?branchName=staging)](https://dev.azure.com/best-practices/recommenders/_build/latest?definitionId=102&branchName=staging) | | **Windows Spark** | master | [![Build Status](https://dev.azure.com/best-practices/recommenders/_apis/build/status/windows-tests/dsvm_nightly_win_pyspark?branchName=master)](https://dev.azure.com/best-practices/recommenders/_build/latest?definitionId=103&branchName=master) | | staging | [![Build Status](https://dev.azure.com/best-practices/recommenders/_apis/build/status/windows-tests/dsvm_nightly_win_pyspark?branchName=staging)](https://dev.azure.com/best-practices/recommenders/_build/latest?definitionId=103&branchName=staging) | - +## Reference papers -### Related projects +* A. Argyriou, M. González-Fierro, and L. Zhang, "Microsoft Recommenders: Best Practices for Production-Ready Recommendation Systems", *WWW 2020: International World Wide Web Conference Taipei*, 2020. Available online: https://dl.acm.org/doi/abs/10.1145/3366424.3382692 + +* S. Graham, J.K. Min, T. Wu, "Microsoft recommenders: tools to accelerate developing recommender systems", *RecSys '19: Proceedings of the 13th ACM Conference on Recommender Systems*, 2019. Available online: https://dl.acm.org/doi/10.1145/3298689.3346967 -[Microsoft AI Github](https://github.com/microsoft/ai): Find other Best Practice projects, and Azure AI design patterns in our central repository. 
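The ranking metrics reported in the benchmark table above follow their standard definitions. As a rough illustration only (a simplified binary-relevance sketch, not the repository's `reco_utils` implementation), Precision@k and nDCG@k can be computed as:

```python
import math

def precision_at_k(recommended, relevant, k=10):
    """Fraction of the top-k recommended items that are relevant."""
    return sum(1 for item in recommended[:k] if item in relevant) / k

def ndcg_at_k(recommended, relevant, k=10):
    """Normalized discounted cumulative gain with binary relevance."""
    dcg = sum(
        1.0 / math.log2(rank + 2)  # rank is 0-based, so the discount is log2(rank + 2)
        for rank, item in enumerate(recommended[:k])
        if item in relevant
    )
    ideal = sum(1.0 / math.log2(rank + 2) for rank in range(min(len(relevant), k)))
    return dcg / ideal if ideal > 0 else 0.0

recommended = ["A", "B", "C", "D"]   # ranked model output
relevant = {"A", "C"}                # held-out ground truth
print(precision_at_k(recommended, relevant, k=4))       # 0.5
print(round(ndcg_at_k(recommended, relevant, k=4), 4))  # 0.9197
```

MAP aggregates average precision over users in the same spirit; RMSE, MAE, R2, and explained variance are the usual rating-prediction metrics and only apply to algorithms that predict explicit ratings, which is why several rows in the table show N/A for them.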
diff --git a/setup.py b/setup.py index 7c5581f07e..04f706bd5e 100644 --- a/setup.py +++ b/setup.py @@ -38,6 +38,6 @@ ], keywords="recommendations recommenders recommender system engine machine learning python spark gpu", package_dir={"reco_utils": "reco_utils"}, - packages=find_packages(where=".", exclude=["tests", "scripts"]), + packages=find_packages(where=".", exclude=["tests", "tools", "examples"]), python_requires=">=3.6, <4", ) From f156c0b07f39bc10c15b6eb0878a20564e5dbe11 Mon Sep 17 00:00:00 2001 From: miguelgfierro Date: Tue, 16 Jun 2020 16:00:14 +0100 Subject: [PATCH 35/61] fix paths --- README.md | 10 +++++----- 1 file changed, 5 insertions(+), 5 deletions(-) diff --git a/README.md b/README.md index 2f80b2538d..ce9fcf5208 100644 --- a/README.md +++ b/README.md @@ -60,13 +60,13 @@ The table below lists the recommender algorithms currently available in the repo | --- | --- | --- | --- | | Alternating Least Squares (ALS) | [PySpark](examples/00_quick_start/als_movielens.ipynb) | Collaborative Filtering | Matrix factorization algorithm for explicit or implicit feedback in large datasets, optimized by Spark MLLib for scalability and distributed computing capability | | Attentive Asynchronous Singular Value Decomposition (A2SVD)* | [Python CPU / Python GPU](examples/00_quick_start/sequential_recsys_amazondataset.ipynb) | Collaborative Filtering | Sequential-based algorithm that aims to capture both long and short-term user preferences using attention mechanism | -| Cornac/Bayesian Personalized Ranking (BPR) | [Python CPU](examples/02_model/cornac_bpr_deep_dive.ipynb) | Collaborative Filtering | Matrix factorization algorithm for predicting item ranking with implicit feedback | +| Cornac/Bayesian Personalized Ranking (BPR) | [Python CPU](examples/02_model_collaborative_filtering/cornac_bpr_deep_dive.ipynb) | Collaborative Filtering | Matrix factorization algorithm for predicting item ranking with implicit feedback | | Convolutional Sequence Embedding 
Recommendation (Caser) | [Python CPU / Python GPU](examples/00_quick_start/sequential_recsys_amazondataset.ipynb) | Collaborative Filtering | Algorithm based on convolutions that aims to capture both user’s general preferences and sequential patterns | | Deep Knowledge-Aware Network (DKN)* | [Python CPU / Python GPU](examples/00_quick_start/dkn_synthetic.ipynb) | Content-Based Filtering | Deep learning algorithm incorporating a knowledge graph and article embeddings to provide powerful news or article recommendations | | Extreme Deep Factorization Machine (xDeepFM)* | [Python CPU / Python GPU](examples/00_quick_start/xdeepfm_criteo.ipynb) | Hybrid | Deep learning based algorithm for implicit and explicit feedback with user/item features | | FastAI Embedding Dot Bias (FAST) | [Python CPU / Python GPU](examples/00_quick_start/fastai_movielens.ipynb) | Collaborative Filtering | General purpose algorithm with embeddings and biases for users and items | -| LightFM/Hybrid Matrix Factorization | [Python CPU](examples/02_model/lightfm_deep_dive.ipynb) | Hybrid | Hybrid matrix factorization algorithm for both implicit and explicit feedbacks | -| LightGBM/Gradient Boosting Tree* | [Python CPU](examples/00_quick_start/lightgbm_tinycriteo.ipynb) / [PySpark](examples/02_model/mmlspark_lightgbm_criteo.ipynb) | Content-Based Filtering | Gradient Boosting Tree algorithm for fast training and low memory usage in content-based problems | +| LightFM/Hybrid Matrix Factorization | [Python CPU](examples/02_model_collaborative_filtering/lightfm_deep_dive.ipynb) | Hybrid | Hybrid matrix factorization algorithm for both implicit and explicit feedbacks | +| LightGBM/Gradient Boosting Tree* | [Python CPU](examples/00_quick_start/lightgbm_tinycriteo.ipynb) / [PySpark](examples/02_model_content_based_filtering/mmlspark_lightgbm_criteo.ipynb) | Content-Based Filtering | Gradient Boosting Tree algorithm for fast training and low memory usage in content-based problems | | GRU4Rec | [Python CPU / 
Python GPU](examples/00_quick_start/sequential_recsys_amazondataset.ipynb) | Collaborative Filtering | Sequential-based algorithm that aims to capture both long and short-term user preferences using recurrent neural networks | | Neural Recommendation with Long- and Short-term User Representations (LSTUR)* | [Python CPU / Python GPU](examples/00_quick_start/lstur_synthetic.ipynb) | Content-Based Filtering | Neural recommendation algorithm with long- and short-term user interest modeling | | Neural Recommendation with Attentive Multi-View Learning (NAML)* | [Python CPU / Python GPU](examples/00_quick_start/naml_synthetic.ipynb) | Content-Based Filtering | Neural recommendation algorithm with attentive multi-view learning | @@ -77,9 +77,9 @@ The table below lists the recommender algorithms currently available in the repo | Riemannian Low-rank Matrix Completion (RLRMC)* | [Python CPU](examples/00_quick_start/rlrmc_movielens.ipynb) | Collaborative Filtering | Matrix factorization algorithm using Riemannian conjugate gradients optimization with small memory consumption. 
| | Simple Algorithm for Recommendation (SAR)* | [Python CPU](examples/00_quick_start/sar_movielens.ipynb) | Collaborative Filtering | Similarity-based algorithm for implicit feedback dataset | | Short-term and Long-term preference Integrated Recommender (SLi-Rec)* | [Python CPU / Python GPU](examples/00_quick_start/sequential_recsys_amazondataset.ipynb) | Collaborative Filtering | Sequential-based algorithm that aims to capture both long and short-term user preferences using attention mechanism, a time-aware controller and a content-aware controller | -| Surprise/Singular Value Decomposition (SVD) | [Python CPU](examples/02_model/surprise_svd_deep_dive.ipynb) | Collaborative Filtering | Matrix factorization algorithm for predicting explicit rating feedback in datasets that are not very large | +| Surprise/Singular Value Decomposition (SVD) | [Python CPU](examples/02_model_collaborative_filtering/surprise_svd_deep_dive.ipynb) | Collaborative Filtering | Matrix factorization algorithm for predicting explicit rating feedback in datasets that are not very large | | Term Frequency - Inverse Document Frequency (TF-IDF) | [Python CPU](examples/00_quick_start/tfidf_covid.ipynb) | Content-Based Filtering | Simple similarity-based algorithm for content-based recommendations with text datasets | -| Vowpal Wabbit Family (VW)* | [Python CPU (online training)](examples/02_model/vowpal_wabbit_deep_dive.ipynb) | Content-Based Filtering | Fast online learning algorithms, great for scenarios where user features / context are constantly changing | +| Vowpal Wabbit Family (VW)* | [Python CPU (online training)](examples/02_model_content_based_filtering/vowpal_wabbit_deep_dive.ipynb) | Content-Based Filtering | Fast online learning algorithms, great for scenarios where user features / context are constantly changing | | Wide and Deep | [Python CPU / Python GPU](examples/00_quick_start/wide_deep_movielens.ipynb) | Hybrid | Deep learning algorithm that can memorize feature interactions 
and generalize user features | | xLearn/Factorization Machine (FM) & Field-Aware FM (FFM) | [Python CPU](examples/02_model/fm_deep_dive.ipynb) | Content-Based Filtering | Quick and memory efficient algorithm to predict labels with user/item features | From 4f7750689a226abfde2416174faa2ad18ec2f659 Mon Sep 17 00:00:00 2001 From: miguelgfierro Date: Tue, 16 Jun 2020 16:25:15 +0100 Subject: [PATCH 36/61] rename --- {notebooks => examples}/00_quick_start/lstur_synthetic.ipynb | 0 {notebooks => examples}/00_quick_start/naml_synthetic.ipynb | 0 {notebooks => examples}/00_quick_start/npa_synthetic.ipynb | 0 {notebooks => examples}/00_quick_start/nrms_synthetic.ipynb | 0 {notebooks => examples}/00_quick_start/tfidf_covid.ipynb | 0 .../fm_deep_dive.ipynb | 0 .../02_model => examples/02_model_hybrid}/lightfm_deep_dive.ipynb | 0 .../04_model_select_and_optimize/nni_ncf.ipynb | 0 8 files changed, 0 insertions(+), 0 deletions(-) rename {notebooks => examples}/00_quick_start/lstur_synthetic.ipynb (100%) rename {notebooks => examples}/00_quick_start/naml_synthetic.ipynb (100%) rename {notebooks => examples}/00_quick_start/npa_synthetic.ipynb (100%) rename {notebooks => examples}/00_quick_start/nrms_synthetic.ipynb (100%) rename {notebooks => examples}/00_quick_start/tfidf_covid.ipynb (100%) rename examples/{02_model_collaborative_filtering => 02_model_hybrid}/fm_deep_dive.ipynb (100%) rename {notebooks/02_model => examples/02_model_hybrid}/lightfm_deep_dive.ipynb (100%) rename {notebooks => examples}/04_model_select_and_optimize/nni_ncf.ipynb (100%) diff --git a/notebooks/00_quick_start/lstur_synthetic.ipynb b/examples/00_quick_start/lstur_synthetic.ipynb similarity index 100% rename from notebooks/00_quick_start/lstur_synthetic.ipynb rename to examples/00_quick_start/lstur_synthetic.ipynb diff --git a/notebooks/00_quick_start/naml_synthetic.ipynb b/examples/00_quick_start/naml_synthetic.ipynb similarity index 100% rename from notebooks/00_quick_start/naml_synthetic.ipynb rename 
to examples/00_quick_start/naml_synthetic.ipynb diff --git a/notebooks/00_quick_start/npa_synthetic.ipynb b/examples/00_quick_start/npa_synthetic.ipynb similarity index 100% rename from notebooks/00_quick_start/npa_synthetic.ipynb rename to examples/00_quick_start/npa_synthetic.ipynb diff --git a/notebooks/00_quick_start/nrms_synthetic.ipynb b/examples/00_quick_start/nrms_synthetic.ipynb similarity index 100% rename from notebooks/00_quick_start/nrms_synthetic.ipynb rename to examples/00_quick_start/nrms_synthetic.ipynb diff --git a/notebooks/00_quick_start/tfidf_covid.ipynb b/examples/00_quick_start/tfidf_covid.ipynb similarity index 100% rename from notebooks/00_quick_start/tfidf_covid.ipynb rename to examples/00_quick_start/tfidf_covid.ipynb diff --git a/examples/02_model_collaborative_filtering/fm_deep_dive.ipynb b/examples/02_model_hybrid/fm_deep_dive.ipynb similarity index 100% rename from examples/02_model_collaborative_filtering/fm_deep_dive.ipynb rename to examples/02_model_hybrid/fm_deep_dive.ipynb diff --git a/notebooks/02_model/lightfm_deep_dive.ipynb b/examples/02_model_hybrid/lightfm_deep_dive.ipynb similarity index 100% rename from notebooks/02_model/lightfm_deep_dive.ipynb rename to examples/02_model_hybrid/lightfm_deep_dive.ipynb diff --git a/notebooks/04_model_select_and_optimize/nni_ncf.ipynb b/examples/04_model_select_and_optimize/nni_ncf.ipynb similarity index 100% rename from notebooks/04_model_select_and_optimize/nni_ncf.ipynb rename to examples/04_model_select_and_optimize/nni_ncf.ipynb From 5dd3f68446e1dc602d6e5c273f222a20bd5a0522 Mon Sep 17 00:00:00 2001 From: miguelgfierro Date: Tue, 16 Jun 2020 16:34:16 +0100 Subject: [PATCH 37/61] fix :bug: and paths --- README.md | 4 +-- examples/02_model_hybrid/fm_deep_dive.ipynb | 4 +-- tests/conftest.py | 28 ++++++++++++++------- tests/ncf_common.py | 3 +-- tests/notebooks_common.py | 2 +- tests/sar_common.py | 2 ++ 6 files changed, 27 insertions(+), 16 deletions(-) diff --git a/README.md 
b/README.md index ce9fcf5208..48cbfcf15b 100644 --- a/README.md +++ b/README.md @@ -65,7 +65,7 @@ The table below lists the recommender algorithms currently available in the repo | Deep Knowledge-Aware Network (DKN)* | [Python CPU / Python GPU](examples/00_quick_start/dkn_synthetic.ipynb) | Content-Based Filtering | Deep learning algorithm incorporating a knowledge graph and article embeddings to provide powerful news or article recommendations | | Extreme Deep Factorization Machine (xDeepFM)* | [Python CPU / Python GPU](examples/00_quick_start/xdeepfm_criteo.ipynb) | Hybrid | Deep learning based algorithm for implicit and explicit feedback with user/item features | | FastAI Embedding Dot Bias (FAST) | [Python CPU / Python GPU](examples/00_quick_start/fastai_movielens.ipynb) | Collaborative Filtering | General purpose algorithm with embeddings and biases for users and items | -| LightFM/Hybrid Matrix Factorization | [Python CPU](examples/02_model_collaborative_filtering/lightfm_deep_dive.ipynb) | Hybrid | Hybrid matrix factorization algorithm for both implicit and explicit feedbacks | +| LightFM/Hybrid Matrix Factorization | [Python CPU](examples/02_model_hybrid/lightfm_deep_dive.ipynb) | Hybrid | Hybrid matrix factorization algorithm for both implicit and explicit feedbacks | | LightGBM/Gradient Boosting Tree* | [Python CPU](examples/00_quick_start/lightgbm_tinycriteo.ipynb) / [PySpark](examples/02_model_content_based_filtering/mmlspark_lightgbm_criteo.ipynb) | Content-Based Filtering | Gradient Boosting Tree algorithm for fast training and low memory usage in content-based problems | | GRU4Rec | [Python CPU / Python GPU](examples/00_quick_start/sequential_recsys_amazondataset.ipynb) | Collaborative Filtering | Sequential-based algorithm that aims to capture both long and short-term user preferences using recurrent neural networks | | Neural Recommendation with Long- and Short-term User Representations (LSTUR)* | [Python CPU / Python 
GPU](examples/00_quick_start/lstur_synthetic.ipynb) | Content-Based Filtering | Neural recommendation algorithm with long- and short-term user interest modeling | @@ -81,7 +81,7 @@ The table below lists the recommender algorithms currently available in the repo | Term Frequency - Inverse Document Frequency (TF-IDF) | [Python CPU](examples/00_quick_start/tfidf_covid.ipynb) | Content-Based Filtering | Simple similarity-based algorithm for content-based recommendations with text datasets | | Vowpal Wabbit Family (VW)* | [Python CPU (online training)](examples/02_model_content_based_filtering/vowpal_wabbit_deep_dive.ipynb) | Content-Based Filtering | Fast online learning algorithms, great for scenarios where user features / context are constantly changing | | Wide and Deep | [Python CPU / Python GPU](examples/00_quick_start/wide_deep_movielens.ipynb) | Hybrid | Deep learning algorithm that can memorize feature interactions and generalize user features | -| xLearn/Factorization Machine (FM) & Field-Aware FM (FFM) | [Python CPU](examples/02_model/fm_deep_dive.ipynb) | Content-Based Filtering | Quick and memory efficient algorithm to predict labels with user/item features | +| xLearn/Factorization Machine (FM) & Field-Aware FM (FFM) | [Python CPU](examples/02_model_hybrid/fm_deep_dive.ipynb) | Content-Based Filtering | Quick and memory efficient algorithm to predict labels with user/item features | **NOTE**: * indicates algorithms invented/contributed by Microsoft. diff --git a/examples/02_model_hybrid/fm_deep_dive.ipynb b/examples/02_model_hybrid/fm_deep_dive.ipynb index e8518dfcaf..eb1d0240a3 100644 --- a/examples/02_model_hybrid/fm_deep_dive.ipynb +++ b/examples/02_model_hybrid/fm_deep_dive.ipynb @@ -15,7 +15,7 @@ "source": [ "# Factorization Machine Deep Dive\n", "\n", - "Factorization machine (FM) is one of the representative algorithms that are used for building content-based recommenders model. 
The algorithm is powerful in terms of capturing the effects of not just the input features but also their interactions. The algorithm provides better generalization capability and expressiveness compared to other classic algorithms such as SVMs. The most recent research extends the basic FM algorithms by using deep learning techniques, which achieve remarkable improvement in a few practical use cases.\n", + "Factorization machine (FM) is one of the representative algorithms that are used for building hybrid recommender models. The algorithm is powerful in terms of capturing the effects of not just the input features but also their interactions. The algorithm provides better generalization capability and expressiveness compared to other classic algorithms such as SVMs. The most recent research extends the basic FM algorithms by using deep learning techniques, which achieve remarkable improvement in a few practical use cases.\n", "\n", "This notebook presents a deep dive into the Factorization Machine algorithm, and demonstrates some best practices of using the contemporary FM implementations like [`xlearn`](https://github.com/aksnzhy/xlearn) for dealing with tasks like click-through rate prediction."
] @@ -890,4 +890,4 @@ }, "nbformat": 4, "nbformat_minor": 2 -} +} \ No newline at end of file diff --git a/tests/conftest.py b/tests/conftest.py index ce98ace4e2..dfa46f5488 100644 --- a/tests/conftest.py +++ b/tests/conftest.py @@ -232,31 +232,41 @@ def notebooks(): folder_notebooks, "01_prepare_data", "wikidata_knowledge_graph.ipynb" ), "als_deep_dive": os.path.join( - folder_notebooks, "02_model", "als_deep_dive.ipynb" + folder_notebooks, "02_model_collaborative_filtering", "als_deep_dive.ipynb" ), "surprise_svd_deep_dive": os.path.join( - folder_notebooks, "02_model", "surprise_svd_deep_dive.ipynb" + folder_notebooks, + "02_model_collaborative_filtering", + "surprise_svd_deep_dive.ipynb", ), "baseline_deep_dive": os.path.join( - folder_notebooks, "02_model", "baseline_deep_dive.ipynb" + folder_notebooks, + "02_model_collaborative_filtering", + "baseline_deep_dive.ipynb", ), "ncf_deep_dive": os.path.join( - folder_notebooks, "02_model", "ncf_deep_dive.ipynb" + folder_notebooks, "02_model_hybrid", "ncf_deep_dive.ipynb" ), "sar_deep_dive": os.path.join( - folder_notebooks, "02_model", "sar_deep_dive.ipynb" + folder_notebooks, "02_model_collaborative_filtering", "sar_deep_dive.ipynb" ), "vowpal_wabbit_deep_dive": os.path.join( - folder_notebooks, "02_model", "vowpal_wabbit_deep_dive.ipynb" + folder_notebooks, + "02_model_content_based_filtering", + "vowpal_wabbit_deep_dive.ipynb", ), "mmlspark_lightgbm_criteo": os.path.join( - folder_notebooks, "02_model", "mmlspark_lightgbm_criteo.ipynb" + folder_notebooks, + "02_model_content_based_filtering", + "mmlspark_lightgbm_criteo.ipynb", ), "cornac_bpr_deep_dive": os.path.join( - folder_notebooks, "02_model", "cornac_bpr_deep_dive.ipynb" + folder_notebooks, + "02_model_collaborative_filtering", + "cornac_bpr_deep_dive.ipynb", ), "xlearn_fm_deep_dive": os.path.join( - folder_notebooks, "02_model", "fm_deep_dive.ipynb" + folder_notebooks, "02_model_hybrid", "fm_deep_dive.ipynb" ), "evaluation": 
os.path.join(folder_notebooks, "03_evaluate", "evaluation.ipynb"), "spark_tuning": os.path.join( diff --git a/tests/ncf_common.py b/tests/ncf_common.py index fa0a9c42be..df85a8f2cf 100644 --- a/tests/ncf_common.py +++ b/tests/ncf_common.py @@ -33,8 +33,7 @@ def python_dataset_ncf(test_specs_ncf): def random_date_generator(start_date, range_in_days): """Helper function to generate random timestamps. - Reference: https://stackoverflow.com/questions/41006182/generate-random-dates-within-a - -range-in-numpy + Reference: https://stackoverflow.com/questions/41006182/generate-random-dates-within-a-range-in-numpy """ days_to_add = np.arange(0, range_in_days) random_dates = [] diff --git a/tests/notebooks_common.py b/tests/notebooks_common.py index 910c1d9baa..2354f4855f 100644 --- a/tests/notebooks_common.py +++ b/tests/notebooks_common.py @@ -12,5 +12,5 @@ def path_notebooks(): """Returns the path of the notebooks folder""" return os.path.abspath( - os.path.join(os.path.dirname(__file__), os.path.pardir, "notebooks") + os.path.join(os.path.dirname(__file__), os.path.pardir, "examples") ) diff --git a/tests/sar_common.py b/tests/sar_common.py index 820bbce2dc..a225d9eebd 100644 --- a/tests/sar_common.py +++ b/tests/sar_common.py @@ -34,6 +34,7 @@ def load_userpred(file, k=10): def read_matrix(file, row_map=None, col_map=None): """read in test matrix and hash it""" reader = _csv_reader_url(file) + # skip the header col_ids = next(reader)[1:] row_ids = [] @@ -42,6 +43,7 @@ def read_matrix(file, row_map=None, col_map=None): rows += [row[1:]] row_ids += [row[0]] array = np.array(rows) + # now map the rows and columns to the right values if row_map is not None and col_map is not None: row_index = [row_map[x] for x in row_ids] From 123a737e2f4f900085e4d8d5c9b0354217eddfe4 Mon Sep 17 00:00:00 2001 From: miguelgfierro Date: Wed, 17 Jun 2020 19:30:19 +0100 Subject: [PATCH 38/61] tests --- tests/README.md | 32 ++++++++++++++++---------------- 1 file changed, 16 insertions(+), 16 
deletions(-) diff --git a/tests/README.md b/tests/README.md index fffae0081a..bea213bcde 100644 --- a/tests/README.md +++ b/tests/README.md @@ -22,13 +22,13 @@ Note: Spark tests are not currently run on AzureML and may be set up in the futu
Unit tests -Unit tests ensure that each class or function behaves as it should. Every time a developer makes a pull request to staging or master branch, a battery of unit tests is executed. +Unit tests ensure that each class or function behaves as it should. Every time a developer makes a pull request to staging or master branch, a battery of unit tests is executed. **Note that the next instructions execute the tests from the root folder.** For executing the Python unit tests for the utilities: - pytest tests/unit -m "not notebooks and not spark and not gpu" + pytest tests/unit -m "not notebooks and not spark and not gpu" --durations 0 For executing the Python unit tests for the notebooks: @@ -61,17 +61,17 @@ Smoke tests make sure that the system works and are executed just before the int For executing the Python smoke tests: - pytest --durations=0 tests/smoke -m "smoke and not spark and not gpu" + pytest tests/smoke -m "smoke and not spark and not gpu" --durations 0 For executing the Python GPU smoke tests: - pytest --durations=0 tests/smoke -m "smoke and not spark and gpu" + pytest tests/smoke -m "smoke and not spark and gpu" --durations 0 For executing the PySpark smoke tests: - pytest --durations=0 tests/smoke -m "smoke and spark and not gpu" + pytest tests/smoke -m "smoke and spark and not gpu" --durations 0 -*NOTE: Adding `--durations=0` shows the computation time of all tests.* +*NOTE: Adding `--durations 0` shows the computation time of all tests.*
@@ -84,17 +84,17 @@ Integration tests make sure that the program results are acceptable. For executing the Python integration tests: - pytest --durations=0 tests/integration -m "integration and not spark and not gpu" + pytest tests/integration -m "integration and not spark and not gpu" --durations 0 For executing the Python GPU integration tests: - pytest --durations=0 tests/integration -m "integration and not spark and gpu" + pytest tests/integration -m "integration and not spark and gpu" --durations 0 For executing the PySpark integration tests: - pytest --durations=0 tests/integration -m "integration and spark and not gpu" + pytest tests/integration -m "integration and spark and not gpu" --durations 0 -*NOTE: Adding `--durations=0` shows the computation time of all tests.* +*NOTE: Adding `--durations 0` shows the computation time of all tests.*
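The `-m` expressions used in the commands above select tests by pytest markers. For them to work, markers such as `notebooks`, `spark`, `gpu`, `smoke`, and `integration` need to be registered with pytest; a sketch of what that registration might look like in a `pytest.ini` (the descriptions are illustrative assumptions, not the repository's actual configuration):

```ini
[pytest]
markers =
    notebooks: tests that execute the example notebooks with papermill
    spark: tests that require a PySpark environment
    gpu: tests that require a GPU
    smoke: quick checks executed before the integration tests
    integration: longer-running tests that validate computed metrics
```

With markers registered, an expression like `-m "integration and spark and not gpu"` runs only the tests carrying both the `integration` and `spark` markers while skipping any that are also marked `gpu`.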
@@ -136,7 +136,7 @@ Several of the tests are skipped for various reasons which are noted below. In order to skip a test because there is an OS or upstream issue which cannot be resolved you can use pytest [annotations](https://docs.pytest.org/en/latest/skipping.html). - + Example: @pytest.mark.skip(reason="") @@ -154,7 +154,7 @@ In the notebooks of this repo, we use [Papermill](https://github.com/nteract/pap Executing a notebook with Papermill is easy; this is what we mostly do in the unit tests. Next we show just one of the tests that we have in [tests/unit/test_notebooks_python.py](unit/test_notebooks_python.py). -``` +```python import pytest import papermill as pm from tests.notebooks_common import OUTPUT_NOTEBOOK, KERNEL_NAME @@ -171,7 +171,7 @@ For executing this test, first make sure you are in the correct environment as d **Note that the next instruction executes the tests from the root folder.** -``` +```bash pytest tests/unit/test_notebooks_python.py::test_sar_single_node_runs ``` @@ -179,15 +179,15 @@ pytest tests/unit/test_notebooks_python.py::test_sar_single_node_runs A more advanced option is used in the smoke and integration tests, where we not only execute the notebook, but also inject parameters and recover the computed metrics. -The first step is to tag the parameters that we are going to inject. For it we need to modify the notebook. We will add a tag with the name `parameters`. To add a tag, go the the notebook menu, View, Cell Toolbar and Tags. A tag field will appear on every cell. The variables in the cell tagged with `parameters` can be injected. The typical variables that we inject are `MOVIELENS_DATA_SIZE`, `EPOCHS` and other configuration variables for our algorithms. +The first step is to tag the parameters that we are going to inject. For this, we need to modify the notebook. We will add a tag with the name `parameters`. To add a tag, go to the notebook menu, View, Cell Toolbar and Tags. A tag field will appear on every cell.
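As a hypothetical illustration, a cell tagged `parameters` contains ordinary Python assignments; `MOVIELENS_DATA_SIZE` and `EPOCHS` are variable names used in this repo, but the values and the `TOP_K` variable below are illustrative placeholders:

```python
# Cell tagged "parameters": Papermill reads these defaults and, at execution
# time, appends a new cell right after it containing the injected overrides.
MOVIELENS_DATA_SIZE = "100k"
EPOCHS = 5
TOP_K = 10
```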
The variables in the cell tagged with `parameters` can be injected. The typical variables that we inject are `MOVIELENS_DATA_SIZE`, `EPOCHS` and other configuration variables for our algorithms. -The way papermill works to inject parameters is very simple, it generates a copy of the notebook (in our code we call it `OUTPUT_NOTEBOOK`), and creates a new cell with the injected variables. +The way papermill works to inject parameters is very simple: it generates a copy of the notebook (in our code we call it `OUTPUT_NOTEBOOK`) and creates a new cell with the injected variables. The second modification we need to make to the notebook is to record the metrics we want to test using `pm.record("output_variable", python_variable_name)`. We normally use the last cell of the notebook to record all the metrics. These are the metrics that we are going to check in the smoke and integration tests. This is an example of how we do a smoke test. The complete code can be found in [tests/smoke/test_notebooks_python.py](smoke/test_notebooks_python.py): -``` +```python import pytest import papermill as pm from tests.notebooks_common import OUTPUT_NOTEBOOK, KERNEL_NAME From 14d7c5054bfa2a3569bd1011615fe7d1e276b62d Mon Sep 17 00:00:00 2001 From: miguelgfierro Date: Wed, 17 Jun 2020 19:48:40 +0100 Subject: [PATCH 39/61] fixing tests --- tests/unit/test_lightfm_utils.py | 1 + 1 file changed, 1 insertion(+) diff --git a/tests/unit/test_lightfm_utils.py b/tests/unit/test_lightfm_utils.py index 0cef7bc273..b04cb25295 100644 --- a/tests/unit/test_lightfm_utils.py +++ b/tests/unit/test_lightfm_utils.py @@ -103,6 +103,7 @@ def fitting(model, interactions, df): test_interactions=test_interactions, user_features=user_features, item_features=item_features, + no_epochs=1, show_plot=False, ) return output, fitted_model From 1ee91fa3af4f7f3d9d69992afaf1317216ed020c Mon Sep 17 00:00:00 2001 From: miguelgfierro Date: Wed, 17 Jun 2020 22:59:08 +0100 Subject: [PATCH 40/61] :bug: ---
.../mmlspark_lightgbm_criteo.ipynb | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/examples/02_model_content_based_filtering/mmlspark_lightgbm_criteo.ipynb b/examples/02_model_content_based_filtering/mmlspark_lightgbm_criteo.ipynb index 4bd13a1ff0..fe8005bad4 100644 --- a/examples/02_model_content_based_filtering/mmlspark_lightgbm_criteo.ipynb +++ b/examples/02_model_content_based_filtering/mmlspark_lightgbm_criteo.ipynb @@ -77,7 +77,7 @@ "# Setup MML Spark\n", "if not is_databricks():\n", " # get the maven coordinates for MML Spark from databricks_install script\n", - " from scripts.databricks_install import MMLSPARK_INFO\n", + " from tools.databricks_install import MMLSPARK_INFO\n", " packages = [MMLSPARK_INFO[\"maven\"][\"coordinates\"]]\n", " repo = MMLSPARK_INFO[\"maven\"].get(\"repo\")\n", " spark = start_or_get_spark(packages=packages, repository=repo)\n", @@ -466,7 +466,7 @@ "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", - "version": "3.6.8" + "version": "3.6.0" } }, "nbformat": 4, From d90c9a37c1d21a7f60e31774436da071af6e1dd9 Mon Sep 17 00:00:00 2001 From: miguelgfierro Date: Thu, 18 Jun 2020 14:32:43 +0100 Subject: [PATCH 41/61] :bug: --- examples/04_model_select_and_optimize/nni_surprise_svd.ipynb | 1 + 1 file changed, 1 insertion(+) diff --git a/examples/04_model_select_and_optimize/nni_surprise_svd.ipynb b/examples/04_model_select_and_optimize/nni_surprise_svd.ipynb index bdda2e997e..de9c2aaa95 100644 --- a/examples/04_model_select_and_optimize/nni_surprise_svd.ipynb +++ b/examples/04_model_select_and_optimize/nni_surprise_svd.ipynb @@ -101,6 +101,7 @@ "# Select Movielens data size: 100k, 1m\n", "MOVIELENS_DATA_SIZE = '100k'\n", "SURPRISE_READER = 'ml-100k'\n", + "tmp_dir = TemporaryDirectory()\n", "TMP_DIR = tmp_dir.name\n", "NUM_EPOCHS = 30\n", "MAX_TRIAL_NUM = 10\n", From 39705c14f826a00a284e5a4bd67448bc6a857a46 Mon Sep 17 00:00:00 2001 From: miguelgfierro Date: Thu, 18 Jun 2020 15:34:01 +0100 
Subject: [PATCH 42/61] typo --- .../01_prepare_data/wikidata_knowledge_graph.ipynb | 11 ++--------- 1 file changed, 2 insertions(+), 9 deletions(-) diff --git a/examples/01_prepare_data/wikidata_knowledge_graph.ipynb b/examples/01_prepare_data/wikidata_knowledge_graph.ipynb index 909f712e26..ce58430023 100644 --- a/examples/01_prepare_data/wikidata_knowledge_graph.ipynb +++ b/examples/01_prepare_data/wikidata_knowledge_graph.ipynb @@ -211,7 +211,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "### Visualize KG using networkx" + "### Visualize KG using network" ] }, { @@ -550,13 +550,6 @@ "# Record results with papermill for unit-tests\n", "pm.record(\"length_result\", number_movies)" ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [] } ], "metadata": { @@ -576,7 +569,7 @@ "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", - "version": "3.6.8" + "version": "3.6.10" } }, "nbformat": 4, From e1bbd2a7c2e6dd74dc01d515325e3d192c6fe918 Mon Sep 17 00:00:00 2001 From: miguelgfierro Date: Fri, 19 Jun 2020 11:20:28 +0100 Subject: [PATCH 43/61] fix :bug: test lightfm --- tests/unit/test_lightfm_utils.py | 12 +++++++++++- 1 file changed, 11 insertions(+), 1 deletion(-) diff --git a/tests/unit/test_lightfm_utils.py b/tests/unit/test_lightfm_utils.py index b04cb25295..ee80c53eaa 100644 --- a/tests/unit/test_lightfm_utils.py +++ b/tests/unit/test_lightfm_utils.py @@ -137,7 +137,17 @@ def test_interactions(interactions): def test_fitting(fitting): output, _ = fitting - assert output.shape == (600, 4) + assert output.shape == (4, 4) + target = np.array( + [ + [0, 0.10000000894069672, "train", "Precision"], + [0, 0.10000000149011612, "test", "Precision"], + [0, 1.0, "train", "Recall"], + [0, 1.0, "test", "Recall"], + ], + dtype="object", + ) + np.testing.assert_array_equal(output, target) def test_sim_users(sim_users): From 9d7c661167af655721c8efd1f361315f7d45c497 Mon Sep 17 00:00:00 
2001 From: miguelgfierro Date: Fri, 19 Jun 2020 14:06:15 +0100 Subject: [PATCH 44/61] papers --- README.md | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/README.md b/README.md index 48cbfcf15b..cbad18188c 100644 --- a/README.md +++ b/README.md @@ -136,5 +136,6 @@ The following tests run on a Windows and Linux DSVM daily. These machines run 24 * A. Argyriou, M. González-Fierro, and L. Zhang, "Microsoft Recommenders: Best Practices for Production-Ready Recommendation Systems", *WWW 2020: International World Wide Web Conference Taipei*, 2020. Available online: https://dl.acm.org/doi/abs/10.1145/3366424.3382692 -* S. Graham, J.K. Min, T. Wu, "Microsoft recommenders: tools to accelerate developing recommender systems", *RecSys '19: Proceedings of the 13th ACM Conference on Recommender Systems*, 2019. Available online: https://dl.acm.org/doi/10.1145/3298689.3346967 +* L. Zhang, T. Wu, X. Xie, A. Argyriou, M. González-Fierro and J. Lian, "Building Production-Ready Recommendation System at Scale", *ACM SIGKDD Conference on Knowledge Discovery and Data Mining 2019 (KDD 2019)*, 2019. +* S. Graham, J.K. Min, T. Wu, "Microsoft recommenders: tools to accelerate developing recommender systems", *RecSys '19: Proceedings of the 13th ACM Conference on Recommender Systems*, 2019. Available online: https://dl.acm.org/doi/10.1145/3298689.3346967 From 44b4843f7fef422ccf1d6a2492719bbc374eb824 Mon Sep 17 00:00:00 2001 From: miguelgfierro Date: Fri, 19 Jun 2020 14:06:41 +0100 Subject: [PATCH 45/61] papers --- README.md | 8 +++----- 1 file changed, 3 insertions(+), 5 deletions(-) diff --git a/README.md b/README.md index cbad18188c..e7edb3f292 100644 --- a/README.md +++ b/README.md @@ -134,8 +134,6 @@ The following tests run on a Windows and Linux DSVM daily. These machines run 24 ## Reference papers -* A. Argyriou, M. González-Fierro, and L. 
Zhang, "Microsoft Recommenders: Best Practices for Production-Ready Recommendation Systems", *WWW 2020: International World Wide Web Conference Taipei*, 2020. Available online: https://dl.acm.org/doi/abs/10.1145/3366424.3382692 - -* L. Zhang, T. Wu, X. Xie, A. Argyriou, M. González-Fierro and J. Lian, "Building Production-Ready Recommendation System at Scale", *ACM SIGKDD Conference on Knowledge Discovery and Data Mining 2019 (KDD 2019)*, 2019. - -* S. Graham, J.K. Min, T. Wu, "Microsoft recommenders: tools to accelerate developing recommender systems", *RecSys '19: Proceedings of the 13th ACM Conference on Recommender Systems*, 2019. Available online: https://dl.acm.org/doi/10.1145/3298689.3346967 +- A. Argyriou, M. González-Fierro, and L. Zhang, "Microsoft Recommenders: Best Practices for Production-Ready Recommendation Systems", *WWW 2020: International World Wide Web Conference Taipei*, 2020. Available online: https://dl.acm.org/doi/abs/10.1145/3366424.3382692 +- L. Zhang, T. Wu, X. Xie, A. Argyriou, M. González-Fierro and J. Lian, "Building Production-Ready Recommendation System at Scale", *ACM SIGKDD Conference on Knowledge Discovery and Data Mining 2019 (KDD 2019)*, 2019. +- S. Graham, J.K. Min, T. Wu, "Microsoft recommenders: tools to accelerate developing recommender systems", *RecSys '19: Proceedings of the 13th ACM Conference on Recommender Systems*, 2019. Available online: https://dl.acm.org/doi/10.1145/3298689.3346967 From da7cdbfb8d2172c41f532ca197e1ba4682da21b2 Mon Sep 17 00:00:00 2001 From: miguelgfierro Date: Fri, 19 Jun 2020 14:07:35 +0100 Subject: [PATCH 46/61] typo --- README.md | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/README.md b/README.md index e7edb3f292..3c85cfc769 100644 --- a/README.md +++ b/README.md @@ -127,10 +127,10 @@ The following tests run on a Windows and Linux DSVM daily. 
These machines run 24 ## Related projects -[Microsoft AI Github](https://github.com/microsoft/ai): Find other Best Practice projects, and Azure AI design patterns in our central repository. -[NLP best practices](https://github.com/microsoft/nlp-recipes): Best practices and examples on NLP. -[Computer vision best practices](https://github.com/microsoft/computervision-recipes): Best practices and examples on computer vision. -[Forecasting best practices](https://github.com/microsoft/forecasting): Best practices and examples on time series forecasting. +- [Microsoft AI Github](https://github.com/microsoft/ai): Find other Best Practice projects, and Azure AI design patterns in our central repository. +- [NLP best practices](https://github.com/microsoft/nlp-recipes): Best practices and examples on NLP. +- [Computer vision best practices](https://github.com/microsoft/computervision-recipes): Best practices and examples on computer vision. +- [Forecasting best practices](https://github.com/microsoft/forecasting): Best practices and examples on time series forecasting. 
## Reference papers From a6e441e9a66f2b96d76cd2fa238385c5e872cbbe Mon Sep 17 00:00:00 2001 From: miguelgfierro Date: Mon, 22 Jun 2020 07:59:23 +0000 Subject: [PATCH 47/61] fixed :bug: with pymanopt --- tools/generate_conda_file.py | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/tools/generate_conda_file.py b/tools/generate_conda_file.py index 1056fa8af0..1aba5abc1b 100644 --- a/tools/generate_conda_file.py +++ b/tools/generate_conda_file.py @@ -83,7 +83,7 @@ "memory-profiler": "memory-profiler>=0.54.0", "nbconvert": "nbconvert==5.5.0", "pydocumentdb": "pydocumentdb>=2.3.3", - "pymanopt": "pymanopt==0.2.3", + "pymanopt": "pymanopt==0.2.5", "xlearn": "xlearn==0.40a1", "transformers": "transformers==2.5.0", "tensorflow": "tensorflow==1.15.2", From 57b0c8a0d8b69d4cf03ef97e361691ba9a75e229 Mon Sep 17 00:00:00 2001 From: miguelgfierro Date: Mon, 22 Jun 2020 08:19:16 +0000 Subject: [PATCH 48/61] long tail --- GLOSSARY.md | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/GLOSSARY.md b/GLOSSARY.md index 1a3da65073..9bd7ec03d9 100644 --- a/GLOSSARY.md +++ b/GLOSSARY.md @@ -26,7 +26,7 @@ * Knowledge graph data: A knowledge graph is a directed heterogeneous graph in which nodes correspond to entities (items or item attributes) and edges correspond to relations [5]. -* Long tail products: Typically, the shape of items interacted in retail follow a long tail distribution [1,2].... +* Long tail items: Typically, the item interaction distribution has the form of a long tail, where items in the tail have a small number of interactions, corresponding to unpopular items, and items in the head have a large number of interactions [1,2]. From the algorithmic point of view, items in the tail suffer from the cold-start problem, making them hard for recommendation systems to use. However, from the business point of view, the items in the tail can be highly profitable: since these items are less popular, businesses can apply a higher margin to them.
Recommendation systems that optimize metrics like novelty and diversity can help find users interested in these long-tail items. * Multi-Variate Test (MVT): Methodology to evaluate the performance of a system in production. It is similar to A/B testing, with the difference that instead of having two test groups, MVT has multiple groups. @@ -38,7 +38,7 @@ * Ranking metrics: These are used to evaluate how relevant recommendations are for users. They include precision at k, recall at k, nDCG and MAP. See the [list of metrics in Recommenders repository](../../examples/03_evaluate). -* Rating metrics: These are used to evaluate how accurate a recommender is at predicting ratings that users gave to items. They include RMSE, MAE, R squared or explained variance. See the [list of metrics in Recommenders repository](../../examples/03_evaluate). +* Rating metrics: These are used to evaluate how accurate a recommender is at predicting ratings that users give to items. They include RMSE, MAE, R squared or explained variance. See the [list of metrics in Recommenders repository](../../examples/03_evaluate). * Revenue per order: The revenue per order optimization objective is the default optimization objective for the "Frequently bought together" recommendation model type. This optimization objective cannot be specified for any other recommendation model type.
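The head/tail split described in the glossary can be sketched numerically. This is an illustrative sketch only: the interaction counts and the 80% head threshold below are assumed values, not data from this repo.

```python
# Hypothetical interaction counts per item, sorted by descending popularity.
counts = [500, 300, 120, 40, 15, 8, 5, 3, 2, 1]
total = sum(counts)

# Define the "head" as the most popular items covering 80% of all
# interactions; the remaining items form the long tail.
running, head_size = 0, 0
for c in counts:
    running += c
    head_size += 1
    if running / total >= 0.8:
        break
tail_items = len(counts) - head_size
print(head_size, tail_items)  # 2 head items cover 80%; 8 items sit in the tail
```

Under this toy distribution, two items account for over 80% of interactions while eight items share the remainder, which is the shape that makes tail items both cold-start-prone and potentially high-margin.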
From c0185c1d7131f85a9af9f0ba086d992b7b4647f9 Mon Sep 17 00:00:00 2001 From: miguelgfierro Date: Mon, 22 Jun 2020 10:33:14 +0000 Subject: [PATCH 49/61] spark --- SETUP.md | 8 ++++++-- 1 file changed, 6 insertions(+), 2 deletions(-) diff --git a/SETUP.md b/SETUP.md index 0b059ffb1e..49471cf4c4 100644 --- a/SETUP.md +++ b/SETUP.md @@ -96,6 +96,8 @@ To set these variables every time the environment is activated, we can follow th First, get the path of the environment `reco_pyspark` is installed: RECO_ENV=$(conda env list | grep reco_pyspark | awk '{print $NF}') + mkdir -p $RECO_ENV/etc/conda/activate.d + mkdir -p $RECO_ENV/etc/conda/deactivate.d Then, create the file `$RECO_ENV/etc/conda/activate.d/env_vars.sh` and add: @@ -107,8 +109,7 @@ Then, create the file `$RECO_ENV/etc/conda/activate.d/env_vars.sh` and add: unset SPARK_HOME This will export the variables every time we do `conda activate reco_pyspark`. -To unset these variables when we deactivate the environment, -create the file `$RECO_ENV/etc/conda/deactivate.d/env_vars.sh` and add: +To unset these variables when we deactivate the environment, create the file `$RECO_ENV/etc/conda/deactivate.d/env_vars.sh` and add: #!/bin/sh unset PYSPARK_PYTHON @@ -180,6 +181,7 @@ If you are using the DSVM, you can [connect to JupyterHub](https://docs.microsof ### Troubleshooting for the DSVM * We found that there can be problems if the Spark version of the machine is not the same as the one in the conda file. You can use the option `--pyspark-version` to address this issue. + * When running Spark on a single local node it is possible to run out of disk space as temporary files are written to the user's home directory. To avoid this on a DSVM, we attached an additional disk to the DSVM and made modifications to the Spark configuration. This is done by including the following lines in the file at `/dsvm/tools/spark/current/conf/spark-env.sh`. 
```{shell} @@ -188,6 +190,8 @@ SPARK_WORKER_DIR="/mnt" SPARK_WORKER_OPTS="-Dspark.worker.cleanup.enabled=true, -Dspark.worker.cleanup.appDataTtl=3600, -Dspark.worker.cleanup.interval=300, -Dspark.storage.cleanupFilesAfterExecutorExit=true" ``` +* Another source of problems is when the variable `SPARK_HOME` is not set correctly. In the Azure DSVM, `SPARK_HOME` should be `/dsvm/tools/spark/current`. + ## Setup guide for Azure Databricks ### Requirements of Azure Databricks From b0f8a592067217aaa388bd8e6626c32afabca89b Mon Sep 17 00:00:00 2001 From: miguelgfierro Date: Mon, 22 Jun 2020 12:00:01 +0000 Subject: [PATCH 50/61] ignore --- .gitignore | 6 ++++-- 1 file changed, 4 insertions(+), 2 deletions(-) diff --git a/.gitignore b/.gitignore index 7ef2c5419f..254e930c1b 100644 --- a/.gitignore +++ b/.gitignore @@ -134,6 +134,7 @@ reco_*.yaml *.dat *.csv *.zip +*.7z .vscode/ u.item ml-100k/ @@ -150,7 +151,8 @@ ml-20m/ *.ckpt* *.png *.jpg -*.gif *.jpeg +*.gif *.model -*.mml \ No newline at end of file +*.mml +nohup.out From 841fc49489d2333b6c36d5f07992df72944e46ab Mon Sep 17 00:00:00 2001 From: miguelgfierro Date: Mon, 22 Jun 2020 12:29:14 +0000 Subject: [PATCH 51/61] mmlspark lgb criteo --- .../mmlspark_lightgbm_criteo.ipynb | 12 ++++++++++-- 1 file changed, 10 insertions(+), 2 deletions(-) diff --git a/examples/02_model_content_based_filtering/mmlspark_lightgbm_criteo.ipynb b/examples/02_model_content_based_filtering/mmlspark_lightgbm_criteo.ipynb index fe8005bad4..53c7d46b96 100644 --- a/examples/02_model_content_based_filtering/mmlspark_lightgbm_criteo.ipynb +++ b/examples/02_model_content_based_filtering/mmlspark_lightgbm_criteo.ipynb @@ -39,7 +39,7 @@ "This notebook can be run in a Spark environment in a DSVM or in Azure Databricks. 
For more details about the installation process, please refer to the [setup instructions](../../SETUP.md).\n", "\n", "**NOTE for Azure Databricks:**\n", - "* A python script is provided to simplify setting up Azure Databricks with the correct dependencies. Run ```python scripts/databricks_install.py -h``` for more details.\n", + "* A python script is provided to simplify setting up Azure Databricks with the correct dependencies. Run ```python tools/databricks_install.py -h``` for more details.\n", "* MMLSpark should not be run on a cluster with autoscaling enabled. Disable the flag in the Azure Databricks Cluster configuration before running this notebook." ] }, @@ -354,7 +354,15 @@ "metadata": {}, "outputs": [], "source": [ - "model = lgbm.fit(train)\n", + "model = lgbm.fit(train)\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ "predictions = model.transform(test)" ] }, From 871ef72e649027fc8a60f2285e48ee42678744b5 Mon Sep 17 00:00:00 2001 From: miguelgfierro Date: Mon, 22 Jun 2020 12:35:43 +0000 Subject: [PATCH 52/61] :bug: --- examples/06_benchmarks/movielens.ipynb | 30 ++++++++++++++------------ 1 file changed, 16 insertions(+), 14 deletions(-) diff --git a/examples/06_benchmarks/movielens.ipynb b/examples/06_benchmarks/movielens.ipynb index 3af571cc43..ee2768bef4 100644 --- a/examples/06_benchmarks/movielens.ipynb +++ b/examples/06_benchmarks/movielens.ipynb @@ -69,31 +69,33 @@ }, { "cell_type": "code", - "execution_count": 1, + "execution_count": 18, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ - "System version: 3.6.8 |Anaconda, Inc.| (default, Dec 30 2018, 01:22:34) \n", + "System version: 3.6.10 |Anaconda, Inc.| (default, May 8 2020, 02:54:21) \n", "[GCC 7.3.0]\n", - "Pandas version: 0.25.1\n", - "PySpark version: 2.3.1\n", + "Pandas version: 0.25.3\n", + "PySpark version: 2.4.4\n", "Surprise version: 1.1.0\n", - "PyTorch version: 1.2.0\n", + "PyTorch 
version: 1.4.0\n", "Fast AI version: 1.0.46\n", - "Cornac version: 1.2.0\n", - "Tensorflow version: 1.12.0\n", - "CUDA version: CUDA Version 10.1.168\n", - "CuDNN version: 7.5.1\n", - "Number of cores: 48\n" + "Cornac version: 1.6.1\n", + "Tensorflow version: 1.15.2\n", + "CUDA version: CUDA Version 10.1.243\n", + "CuDNN version: 7.6.5\n", + "Number of cores: 6\n", + "The autoreload extension is already loaded. To reload it, use:\n", + " %reload_ext autoreload\n" ] } ], "source": [ "import sys\n", - "sys.path.append(\"../\")\n", + "sys.path.append(\"../../\")\n", "import os\n", "import json\n", "import pandas as pd\n", @@ -857,9 +859,9 @@ ], "metadata": { "kernelspec": { - "display_name": "Python (reco_full)", + "display_name": "reco_full", "language": "python", - "name": "reco_full" + "name": "conda-env-reco_full-py" }, "language_info": { "codemirror_mode": { @@ -871,7 +873,7 @@ "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", - "version": "3.6.8" + "version": "3.6.10" } }, "nbformat": 4, From 18810667267be05f5d306ecafc273d154d9496ae Mon Sep 17 00:00:00 2001 From: miguelgfierro Date: Mon, 22 Jun 2020 12:49:57 +0000 Subject: [PATCH 53/61] java8 --- SETUP.md | 7 +++++++ 1 file changed, 7 insertions(+) diff --git a/SETUP.md b/SETUP.md index 49471cf4c4..c6bfe507e7 100644 --- a/SETUP.md +++ b/SETUP.md @@ -192,6 +192,13 @@ SPARK_WORKER_OPTS="-Dspark.worker.cleanup.enabled=true, -Dspark.worker.cleanup.a * Another source of problems is when the variable `SPARK_HOME` is not set correctly. In the Azure DSVM, `SPARK_HOME` should be `/dsvm/tools/spark/current`. +* Java 11 might produce errors when running the notebooks. 
To change it to Java 8: + +``` +sudo apt install openjdk-8-jdk +sudo update-alternatives --config java +``` + ## Setup guide for Azure Databricks ### Requirements of Azure Databricks From 24b6ba9664b808abb41f118c9adefb983b56be1d Mon Sep 17 00:00:00 2001 From: miguelgfierro Date: Mon, 22 Jun 2020 15:02:21 +0000 Subject: [PATCH 54/61] benchmark --- examples/06_benchmarks/movielens.ipynb | 269 ++++++++++--------- reco_utils/recommender/ncf/ncf_singlenode.py | 58 ++-- tools/databricks_install.py | 4 +- 3 files changed, 176 insertions(+), 155 deletions(-) diff --git a/examples/06_benchmarks/movielens.ipynb b/examples/06_benchmarks/movielens.ipynb index ee2768bef4..9d18fc9eb4 100644 --- a/examples/06_benchmarks/movielens.ipynb +++ b/examples/06_benchmarks/movielens.ipynb @@ -26,7 +26,7 @@ "\n", "* Environment\n", " * The comparison is run on a [Azure Data Science Virtual Machine](https://azure.microsoft.com/en-us/services/virtual-machines/data-science-virtual-machines/). \n", - " * The virtual machine size is Standard NC6s_v2 (6 vcpus, 112 GB memory, 1P100 GPU).\n", + " * The virtual machine size is Standard NC6 (6 vcpus, 55 GB memory, 1K80 GPU).\n", " * It should be noted that the single node DSVM is not supposed to run scalable benchmarking analysis. Either scaling up or out the computing instances is necessary to run the benchmarking in an run-time efficient way without any memory issue.\n", " * **NOTE ABOUT THE DEPENDENCIES TO INSTALL**: This notebook uses CPU, GPU and PySpark algorithms, so make sure you install the `full environment` as detailed in the [SETUP.md](../SETUP.md). 
\n", " \n", @@ -64,12 +64,12 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "## 0 Globals settings" + "## Globals settings" ] }, { "cell_type": "code", - "execution_count": 18, + "execution_count": 1, "metadata": {}, "outputs": [ { @@ -87,9 +87,7 @@ "Tensorflow version: 1.15.2\n", "CUDA version: CUDA Version 10.1.243\n", "CuDNN version: 7.6.5\n", - "Number of cores: 6\n", - "The autoreload extension is already loaded. To reload it, use:\n", - " %reload_ext autoreload\n" + "Number of cores: 6\n" ] } ], @@ -415,7 +413,7 @@ "outputs": [], "source": [ "data_sizes = [\"100k\", \"1m\"] # Movielens data size: 100k, 1m, 10m, or 20m\n", - "algorithms = [\"als\", \"svd\", \"sar\", \"ncf\", \"fastai\", \"bpr\"]" + "algorithms = [\"als\", \"svd\", \"sar\", \"ncf\", \"fastai\", \"bpr\"]\n" ] }, { @@ -429,7 +427,7 @@ "name": "stderr", "output_type": "stream", "text": [ - "100%|██████████| 4.81k/4.81k [00:01<00:00, 2.79kKB/s]\n" + "100%|██████████| 4.81k/4.81k [00:00<00:00, 8.98kKB/s]\n" ] }, { @@ -445,9 +443,23 @@ "Computing sar algorithm on Movielens 100k\n", "\n", "Computing ncf algorithm on Movielens 100k\n", + "WARNING:tensorflow:\n", + "The TensorFlow contrib module will not be included in TensorFlow 2.0.\n", + "For more information, please see:\n", + " * https://github.com/tensorflow/community/blob/master/rfcs/20180907-contrib-sunset.md\n", + " * https://github.com/tensorflow/addons\n", + " * https://github.com/tensorflow/io (for I/O related ops)\n", + "If you depend on functionality not listed there, please file an issue.\n", "\n", - "Computing fastai algorithm on Movielens 100k\n", + "WARNING:tensorflow:From /anaconda/envs/reco_full/lib/python3.6/site-packages/tensorflow_core/contrib/layers/python/layers/layers.py:1866: Layer.apply (from tensorflow.python.keras.engine.base_layer) is deprecated and will be removed in a future version.\n", + "Instructions for updating:\n", + "Please use `layer.__call__` method instead.\n", + 
"WARNING:tensorflow:From /anaconda/envs/reco_full/lib/python3.6/site-packages/tensorflow_core/python/ops/losses/losses_impl.py:121: where (from tensorflow.python.ops.array_ops) is deprecated and will be removed in a future version.\n", + "Instructions for updating:\n", + "Use tf.where in 2.0, which has the same broadcast rule as np.where\n", "\n", + "Computing fastai algorithm on Movielens 100k\n", + "█\n", "Computing bpr algorithm on Movielens 100k\n" ] }, @@ -455,7 +467,7 @@ "name": "stderr", "output_type": "stream", "text": [ - "100%|██████████| 5.78k/5.78k [00:01<00:00, 3.10kKB/s]\n" + "100%|██████████| 5.78k/5.78k [00:00<00:00, 9.57kKB/s]\n" ] }, { @@ -473,10 +485,10 @@ "Computing ncf algorithm on Movielens 1m\n", "\n", "Computing fastai algorithm on Movielens 1m\n", - "\n", + "█\n", "Computing bpr algorithm on Movielens 1m\n", - "CPU times: user 48min 7s, sys: 4min 42s, total: 52min 49s\n", - "Wall time: 50min 1s\n" + "CPU times: user 39min 59s, sys: 4min 12s, total: 44min 11s\n", + "Wall time: 45min 35s\n" ] } ], @@ -596,204 +608,204 @@ " \n", " \n", " \n", - " 1\n", + " 1\n", " 100k\n", " als\n", " 10\n", - " 5.2630\n", - " 0.0637\n", - " 0.970278\n", - " 0.756116\n", - " 0.247909\n", - " 0.243499\n", - " 0.0937\n", - " 0.004784\n", - " 0.045017\n", - " 0.047508\n", - " 0.015420\n", + " 4.6377\n", + " 0.0359\n", + " 0.967222\n", + " 0.752874\n", + " 0.252683\n", + " 0.248257\n", + " 0.0588\n", + " 0.004343\n", + " 0.042099\n", + " 0.046023\n", + " 0.014854\n", " \n", " \n", - " 2\n", + " 2\n", " 100k\n", " svd\n", " 10\n", - " 4.2469\n", - " 0.6350\n", + " 4.0243\n", + " 0.2475\n", " 0.938681\n", " 0.742690\n", " 0.291967\n", " 0.291971\n", - " 17.1032\n", + " 15.7942\n", " 0.012873\n", " 0.095930\n", " 0.091198\n", " 0.032783\n", " \n", " \n", - " 3\n", + " 3\n", " 100k\n", " sar\n", " 10\n", - " 0.2794\n", + " 0.3573\n", " NaN\n", " NaN\n", " NaN\n", " NaN\n", " NaN\n", - " 0.1062\n", + " 0.0892\n", " 0.113028\n", " 0.388321\n", " 0.333828\n", " 
0.183179\n", " \n", " \n", - " 4\n", + " 4\n", " 100k\n", " ncf\n", " 10\n", - " 70.0113\n", + " 47.5429\n", " NaN\n", " NaN\n", " NaN\n", " NaN\n", " NaN\n", - " 4.2467\n", - " 0.102130\n", - " 0.379410\n", - " 0.331707\n", - " 0.173080\n", + " 3.0374\n", + " 0.106153\n", + " 0.391953\n", + " 0.347402\n", + " 0.182610\n", " \n", " \n", - " 5\n", + " 5\n", " 100k\n", " fastai\n", " 10\n", - " 94.1108\n", - " 0.0459\n", - " 0.943084\n", - " 0.744337\n", - " 0.285308\n", - " 0.287671\n", - " 3.3439\n", - " 0.025503\n", - " 0.147866\n", - " 0.130329\n", - " 0.053824\n", + " 70.8712\n", + " 0.0389\n", + " 0.943093\n", + " 0.744332\n", + " 0.285296\n", + " 0.287658\n", + " 2.9128\n", + " 0.025590\n", + " 0.148119\n", + " 0.130541\n", + " 0.053903\n", " \n", " \n", - " 6\n", + " 6\n", " 100k\n", " bpr\n", " 10\n", - " 0.6386\n", + " 0.4485\n", " NaN\n", " NaN\n", " NaN\n", " NaN\n", " NaN\n", - " 1.1675\n", + " 1.2334\n", " 0.105365\n", " 0.389948\n", " 0.349841\n", " 0.181807\n", " \n", " \n", - " 7\n", + " 7\n", " 1m\n", " als\n", " 10\n", - " 2.9332\n", - " 0.0222\n", - " 0.860514\n", - " 0.679387\n", - " 0.412306\n", - " 0.406364\n", - " 0.0778\n", - " 0.002144\n", - " 0.025975\n", - " 0.032284\n", - " 0.010219\n", + " 3.3197\n", + " 0.0124\n", + " 0.860612\n", + " 0.679803\n", + " 0.412120\n", + " 0.406228\n", + " 0.0348\n", + " 0.001747\n", + " 0.022182\n", + " 0.028458\n", + " 0.008963\n", " \n", " \n", - " 8\n", + " 8\n", " 1m\n", " svd\n", " 10\n", - " 42.5560\n", - " 3.5602\n", + " 41.3965\n", + " 3.1037\n", " 0.883017\n", " 0.695366\n", " 0.374910\n", " 0.374911\n", - " 251.1241\n", + " 233.4965\n", " 0.008828\n", " 0.089320\n", " 0.082856\n", " 0.021582\n", " \n", " \n", - " 9\n", + " 9\n", " 1m\n", " sar\n", " 10\n", - " 2.3723\n", + " 3.4665\n", " NaN\n", " NaN\n", " NaN\n", " NaN\n", " NaN\n", - " 2.1282\n", + " 2.2988\n", " 0.066214\n", " 0.313502\n", " 0.279692\n", " 0.111135\n", " \n", " \n", - " 10\n", + " 10\n", " 1m\n", " ncf\n", " 10\n", - " 
868.2188\n", + " 676.7188\n", " NaN\n", " NaN\n", " NaN\n", " NaN\n", " NaN\n", - " 46.7333\n", - " 0.063082\n", - " 0.350428\n", - " 0.322213\n", - " 0.109614\n", + " 44.3242\n", + " 0.065849\n", + " 0.355279\n", + " 0.324648\n", + " 0.111693\n", " \n", " \n", - " 11\n", + " 11\n", " 1m\n", " fastai\n", " 10\n", - " 745.9277\n", - " 0.4392\n", - " 0.874488\n", - " 0.695508\n", - " 0.386927\n", - " 0.389560\n", - " 57.9075\n", - " 0.026265\n", - " 0.184667\n", - " 0.168561\n", - " 0.055841\n", + " 664.2760\n", + " 0.3803\n", + " 0.874491\n", + " 0.695498\n", + " 0.386921\n", + " 0.389541\n", + " 57.5519\n", + " 0.026166\n", + " 0.184278\n", + " 0.167881\n", + " 0.055688\n", " \n", " \n", - " 12\n", + " 12\n", " 1m\n", " bpr\n", " 10\n", - " 5.2941\n", + " 4.9206\n", " NaN\n", " NaN\n", " NaN\n", " NaN\n", " NaN\n", - " 20.4489\n", + " 23.9635\n", " 0.067077\n", " 0.353277\n", " 0.324449\n", @@ -805,45 +817,45 @@ ], "text/plain": [ " Data Algo K Train time (s) Predicting time (s) RMSE MAE \\\n", - "1 100k als 10 5.2630 0.0637 0.970278 0.756116 \n", - "2 100k svd 10 4.2469 0.6350 0.938681 0.742690 \n", - "3 100k sar 10 0.2794 NaN NaN NaN \n", - "4 100k ncf 10 70.0113 NaN NaN NaN \n", - "5 100k fastai 10 94.1108 0.0459 0.943084 0.744337 \n", - "6 100k bpr 10 0.6386 NaN NaN NaN \n", - "7 1m als 10 2.9332 0.0222 0.860514 0.679387 \n", - "8 1m svd 10 42.5560 3.5602 0.883017 0.695366 \n", - "9 1m sar 10 2.3723 NaN NaN NaN \n", - "10 1m ncf 10 868.2188 NaN NaN NaN \n", - "11 1m fastai 10 745.9277 0.4392 0.874488 0.695508 \n", - "12 1m bpr 10 5.2941 NaN NaN NaN \n", + "1 100k als 10 4.6377 0.0359 0.967222 0.752874 \n", + "2 100k svd 10 4.0243 0.2475 0.938681 0.742690 \n", + "3 100k sar 10 0.3573 NaN NaN NaN \n", + "4 100k ncf 10 47.5429 NaN NaN NaN \n", + "5 100k fastai 10 70.8712 0.0389 0.943093 0.744332 \n", + "6 100k bpr 10 0.4485 NaN NaN NaN \n", + "7 1m als 10 3.3197 0.0124 0.860612 0.679803 \n", + "8 1m svd 10 41.3965 3.1037 0.883017 0.695366 \n", + "9 1m sar 10 
3.4665 NaN NaN NaN \n", + "10 1m ncf 10 676.7188 NaN NaN NaN \n", + "11 1m fastai 10 664.2760 0.3803 0.874491 0.695498 \n", + "12 1m bpr 10 4.9206 NaN NaN NaN \n", "\n", " R2 Explained Variance Recommending time (s) MAP nDCG@k \\\n", - "1 0.247909 0.243499 0.0937 0.004784 0.045017 \n", - "2 0.291967 0.291971 17.1032 0.012873 0.095930 \n", - "3 NaN NaN 0.1062 0.113028 0.388321 \n", - "4 NaN NaN 4.2467 0.102130 0.379410 \n", - "5 0.285308 0.287671 3.3439 0.025503 0.147866 \n", - "6 NaN NaN 1.1675 0.105365 0.389948 \n", - "7 0.412306 0.406364 0.0778 0.002144 0.025975 \n", - "8 0.374910 0.374911 251.1241 0.008828 0.089320 \n", - "9 NaN NaN 2.1282 0.066214 0.313502 \n", - "10 NaN NaN 46.7333 0.063082 0.350428 \n", - "11 0.386927 0.389560 57.9075 0.026265 0.184667 \n", - "12 NaN NaN 20.4489 0.067077 0.353277 \n", + "1 0.252683 0.248257 0.0588 0.004343 0.042099 \n", + "2 0.291967 0.291971 15.7942 0.012873 0.095930 \n", + "3 NaN NaN 0.0892 0.113028 0.388321 \n", + "4 NaN NaN 3.0374 0.106153 0.391953 \n", + "5 0.285296 0.287658 2.9128 0.025590 0.148119 \n", + "6 NaN NaN 1.2334 0.105365 0.389948 \n", + "7 0.412120 0.406228 0.0348 0.001747 0.022182 \n", + "8 0.374910 0.374911 233.4965 0.008828 0.089320 \n", + "9 NaN NaN 2.2988 0.066214 0.313502 \n", + "10 NaN NaN 44.3242 0.065849 0.355279 \n", + "11 0.386921 0.389541 57.5519 0.026166 0.184278 \n", + "12 NaN NaN 23.9635 0.067077 0.353277 \n", "\n", " Precision@k Recall@k \n", - "1 0.047508 0.015420 \n", + "1 0.046023 0.014854 \n", "2 0.091198 0.032783 \n", "3 0.333828 0.183179 \n", - "4 0.331707 0.173080 \n", - "5 0.130329 0.053824 \n", + "4 0.347402 0.182610 \n", + "5 0.130541 0.053903 \n", "6 0.349841 0.181807 \n", - "7 0.032284 0.010219 \n", + "7 0.028458 0.008963 \n", "8 0.082856 0.021582 \n", "9 0.279692 0.111135 \n", - "10 0.322213 0.109614 \n", - "11 0.168561 0.055841 \n", + "10 0.324648 0.111693 \n", + "11 0.167881 0.055688 \n", "12 0.324449 0.118385 " ] }, @@ -855,6 +867,13 @@ "source": [ "df_results" ] + }, + { + 
"cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [] } ], "metadata": { diff --git a/reco_utils/recommender/ncf/ncf_singlenode.py b/reco_utils/recommender/ncf/ncf_singlenode.py index 7dc1e20da1..36f464ab66 100644 --- a/reco_utils/recommender/ncf/ncf_singlenode.py +++ b/reco_utils/recommender/ncf/ncf_singlenode.py @@ -55,7 +55,7 @@ def __init__( """ # seed - tf.set_random_seed(seed) + tf.compat.v1.set_random_seed(seed) np.random.seed(seed) self.seed = seed @@ -83,28 +83,28 @@ def __init__( # create ncf model self._create_model() # set GPU use with demand growth - gpu_options = tf.GPUOptions(allow_growth=True) + gpu_options = tf.compat.v1.GPUOptions(allow_growth=True) # set TF Session - self.sess = tf.Session(config=tf.ConfigProto(gpu_options=gpu_options)) + self.sess = tf.compat.v1.Session(config=tf.compat.v1.ConfigProto(gpu_options=gpu_options)) # parameters initialization - self.sess.run(tf.global_variables_initializer()) + self.sess.run(tf.compat.v1.global_variables_initializer()) def _create_model(self,): # reset graph - tf.reset_default_graph() + tf.compat.v1.reset_default_graph() - with tf.variable_scope("input_data", reuse=tf.AUTO_REUSE): + with tf.compat.v1.variable_scope("input_data", reuse=tf.compat.v1.AUTO_REUSE): # input: index of users, items and ground truth - self.user_input = tf.placeholder(tf.int32, shape=[None, 1]) - self.item_input = tf.placeholder(tf.int32, shape=[None, 1]) - self.labels = tf.placeholder(tf.float32, shape=[None, 1]) + self.user_input = tf.compat.v1.placeholder(tf.int32, shape=[None, 1]) + self.item_input = tf.compat.v1.placeholder(tf.int32, shape=[None, 1]) + self.labels = tf.compat.v1.placeholder(tf.float32, shape=[None, 1]) - with tf.variable_scope("embedding", reuse=tf.AUTO_REUSE): + with tf.compat.v1.variable_scope("embedding", reuse=tf.compat.v1.AUTO_REUSE): # set embedding table self.embedding_gmf_P = tf.Variable( - tf.truncated_normal( + tf.random.truncated_normal( 
shape=[self.n_users, self.n_factors], mean=0.0, stddev=0.01, @@ -115,7 +115,7 @@ def _create_model(self,): ) self.embedding_gmf_Q = tf.Variable( - tf.truncated_normal( + tf.random.truncated_normal( shape=[self.n_items, self.n_factors], mean=0.0, stddev=0.01, @@ -127,7 +127,7 @@ def _create_model(self,): # set embedding table self.embedding_mlp_P = tf.Variable( - tf.truncated_normal( + tf.random.truncated_normal( shape=[self.n_users, int(self.layer_sizes[0] / 2)], mean=0.0, stddev=0.01, @@ -138,7 +138,7 @@ def _create_model(self,): ) self.embedding_mlp_Q = tf.Variable( - tf.truncated_normal( + tf.random.truncated_normal( shape=[self.n_items, int(self.layer_sizes[0] / 2)], mean=0.0, stddev=0.01, @@ -148,7 +148,7 @@ def _create_model(self,): dtype=tf.float32, ) - with tf.variable_scope("gmf", reuse=tf.AUTO_REUSE): + with tf.compat.v1.variable_scope("gmf", reuse=tf.compat.v1.AUTO_REUSE): # get user embedding p and item embedding q self.gmf_p = tf.reduce_sum( @@ -161,7 +161,7 @@ def _create_model(self,): # get gmf vector self.gmf_vector = self.gmf_p * self.gmf_q - with tf.variable_scope("mlp", reuse=tf.AUTO_REUSE): + with tf.compat.v1.variable_scope("mlp", reuse=tf.compat.v1.AUTO_REUSE): # get user embedding p and item embedding q self.mlp_p = tf.reduce_sum( @@ -188,7 +188,7 @@ def _create_model(self,): # self.output = tf.sigmoid(tf.reduce_sum(self.mlp_vector, axis=1, keepdims=True)) - with tf.variable_scope("ncf", reuse=tf.AUTO_REUSE): + with tf.compat.v1.variable_scope("ncf", reuse=tf.compat.v1.AUTO_REUSE): if self.model_type == "gmf": # GMF only @@ -231,15 +231,15 @@ def _create_model(self,): ) self.output = tf.sigmoid(output) - with tf.variable_scope("loss", reuse=tf.AUTO_REUSE): + with tf.compat.v1.variable_scope("loss", reuse=tf.compat.v1.AUTO_REUSE): # set loss function - self.loss = tf.losses.log_loss(self.labels, self.output) + self.loss = tf.compat.v1.losses.log_loss(self.labels, self.output) - with tf.variable_scope("optimizer", reuse=tf.AUTO_REUSE): + with 
tf.compat.v1.variable_scope("optimizer", reuse=tf.compat.v1.AUTO_REUSE): # set optimizer - self.optimizer = tf.train.AdamOptimizer( + self.optimizer = tf.compat.v1.train.AdamOptimizer( learning_rate=self.learning_rate ).minimize(self.loss) @@ -253,7 +253,7 @@ def save(self, dir_name): # save trained model if not os.path.exists(dir_name): os.makedirs(dir_name) - saver = tf.train.Saver() + saver = tf.compat.v1.train.Saver() saver.save(self.sess, os.path.join(dir_name, MODEL_CHECKPOINT)) def load(self, gmf_dir=None, mlp_dir=None, neumf_dir=None, alpha=0.5): @@ -277,15 +277,15 @@ def load(self, gmf_dir=None, mlp_dir=None, neumf_dir=None, alpha=0.5): # load pre-trained model if self.model_type == "gmf" and gmf_dir is not None: - saver = tf.train.Saver() + saver = tf.compat.v1.train.Saver() saver.restore(self.sess, os.path.join(gmf_dir, MODEL_CHECKPOINT)) elif self.model_type == "mlp" and mlp_dir is not None: - saver = tf.train.Saver() + saver = tf.compat.v1.train.Saver() saver.restore(self.sess, os.path.join(mlp_dir, MODEL_CHECKPOINT)) elif self.model_type == "neumf" and neumf_dir is not None: - saver = tf.train.Saver() + saver = tf.compat.v1.train.Saver() saver.restore(self.sess, os.path.join(neumf_dir, MODEL_CHECKPOINT)) elif self.model_type == "neumf" and gmf_dir is not None and mlp_dir is not None: @@ -300,24 +300,24 @@ def _load_neumf(self, gmf_dir, mlp_dir, alpha): NeuMF model --> load parameters in `gmf_dir` and `mlp_dir` """ # load gmf part - variables = tf.global_variables() + variables = tf.compat.v1.global_variables() # get variables with 'gmf' var_flow_restore = [ val for val in variables if "gmf" in val.name and "ncf" not in val.name ] # load 'gmf' variable - saver = tf.train.Saver(var_flow_restore) + saver = tf.compat.v1.train.Saver(var_flow_restore) # restore saver.restore(self.sess, os.path.join(gmf_dir, MODEL_CHECKPOINT)) # load mlp part - variables = tf.global_variables() + variables = tf.compat.v1.global_variables() # get variables with 'gmf' 
var_flow_restore = [ val for val in variables if "mlp" in val.name and "ncf" not in val.name ] # load 'gmf' variable - saver = tf.train.Saver(var_flow_restore) + saver = tf.compat.v1.train.Saver(var_flow_restore) # restore saver.restore(self.sess, os.path.join(mlp_dir, MODEL_CHECKPOINT)) diff --git a/tools/databricks_install.py b/tools/databricks_install.py index a7b865f4d1..6f9c47fb4d 100644 --- a/tools/databricks_install.py +++ b/tools/databricks_install.py @@ -48,7 +48,9 @@ "5": "https://search.maven.org/remotecontent?filepath=com/microsoft/azure/azure-cosmosdb-spark_2.4.0_2.11/1.3.5/azure-cosmosdb-spark_2.4.0_2.11-1.3.5-uber.jar", } - +# "coordinates": "Azure:mmlspark:0.17" +# "coordinates": "com.microsoft.ml.spark:mmlspark_2.11:0.18.1" +# "coordinates": "com.microsoft.ml.spark:mmlspark_2.11:1.0.0-rc1" # not working MMLSPARK_INFO = { "maven": { "coordinates": "Azure:mmlspark:0.17", From 4e9263a3b2eaca9841862928d82bbfb97fc9f31a Mon Sep 17 00:00:00 2001 From: miguelgfierro Date: Tue, 23 Jun 2020 11:19:56 +0000 Subject: [PATCH 55/61] retail --- scenarios/retail/README.md | 12 +++++++----- 1 file changed, 7 insertions(+), 5 deletions(-) diff --git a/scenarios/retail/README.md b/scenarios/retail/README.md index ffce41d7f6..600f60b913 100644 --- a/scenarios/retail/README.md +++ b/scenarios/retail/README.md @@ -2,21 +2,23 @@ Recommender systems have become a key growth and revenue driver for modern retail. For example, recommendation was estimated to [account for 35% of customer purchases on Amazon](https://www.mckinsey.com/industries/retail/our-insights/how-retailers-can-keep-up-with-consumers#). In addition, recommenders have been applied by retailers to delight and retain customers and improve staff productivity. +## Scenarios + Next we will describe several most common retail scenarios and main considerations when applying recommendations in retail. 
-## Personalized recommendation +### Personalized recommendation A major task in applying recommendations in retail is to predict which products or set of products a user is most likely to engage with or purchase, based on the shopping or viewing history of that user. This scenario is commonly shown on the personalized home page, feed or newsletter. Most models in this repo such as [ALS](../../examples/00_quick_start/als_movielens.ipynb), [BPR](../../examples/02_model_collaborative_filtering/cornac_bpr_deep_dive.ipynb), [LightGBM](../../examples/00_quick_start/lightgbm_tinycriteo.ipynb) and [NCF](../../examples/00_quick_start/ncf_movielens.ipynb) can be used for personalization. [Azure Personalizer](https://docs.microsoft.com/en-us/azure/cognitive-services/personalizer/concept-active-learning) also provides a cloud-based personalization service using reinforcement learning based on [Vowpal Wabbit](../../examples/02_model_content_based_filtering/vowpal_wabbit_deep_dive.ipynb). -## You might also like +### You might also like In this scenario, the user is already viewing a product page, and the task is to make recommendations that are relevant to it. Personalized recommendation techniques are still applicable here, but relevance to the product being viewed is of special importance. As such, item similarity can be useful here, especially for cold items and cold users that do not have much interaction data. -## Frequently bought together +### Frequently bought together In this task, the retailer tries to predict product(s) complementary to or bought together with a product that a user already put into the shopping cart. This feature is great for cross-selling and is normally displayed just before checkout. In many cases, a machine learning solution is not required for this task. -## Similar alternatives +### Similar alternatives This scenario covers down-selling or out of stock alternatives to avoid losing a sale.
Similar alternatives predicts other products with similar features, like price, type, brand or visual appearance. @@ -24,7 +26,7 @@ This scenario covers down-selling or out of stock alternatives to avoid losing a Datasets used in retail recommendations usually include [user information](../../GLOSSARY.md), [item information](../../GLOSSARY.md) and [interaction data](../../GLOSSARY.md), among others. -To measure the performance of the recommender, it is common to use [ranking metrics](../GLOSSARY.md). In production, the business metrics used are [CTR](../../GLOSSARY.md) and [revenue per order](../../GLOSSARY.md). To evaluate a model's performance in production in an online manner, [A/B testing](../../GLOSSARY.md) is often applied. +To measure the performance of the recommender, it is common to use [ranking metrics](../../GLOSSARY.md). In production, the business metrics used are [CTR](../../GLOSSARY.md) and [revenue per order](../../GLOSSARY.md). To evaluate a model's performance in production in an online manner, [A/B testing](../../GLOSSARY.md) is often applied. 
## Other considerations From 16baaed7c5c49c5652541b7360c6ab2ed6c2c096 Mon Sep 17 00:00:00 2001 From: miguelgfierro Date: Tue, 23 Jun 2020 12:00:58 +0000 Subject: [PATCH 56/61] spark 2.4.3 --- .../mmlspark_lightgbm_criteo.ipynb | 43 ++++++++++--------- tools/databricks_install.py | 5 +-- tools/generate_conda_file.py | 4 +- 3 files changed, 25 insertions(+), 27 deletions(-) diff --git a/examples/02_model_content_based_filtering/mmlspark_lightgbm_criteo.ipynb b/examples/02_model_content_based_filtering/mmlspark_lightgbm_criteo.ipynb index 53c7d46b96..5b5e39e793 100644 --- a/examples/02_model_content_based_filtering/mmlspark_lightgbm_criteo.ipynb +++ b/examples/02_model_content_based_filtering/mmlspark_lightgbm_criteo.ipynb @@ -45,16 +45,17 @@ }, { "cell_type": "code", - "execution_count": 1, + "execution_count": 2, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ - "MMLSpark version: Azure:mmlspark:0.17\n", - "System version: 3.6.8 |Anaconda, Inc.| (default, Feb 11 2019, 15:03:47) [MSC v.1915 64 bit (AMD64)]\n", - "PySpark version: 2.3.1\n" + "MMLSpark version: com.microsoft.ml.spark:mmlspark_2.11:0.18.1\n", + "System version: 3.6.10 |Anaconda, Inc.| (default, May 8 2020, 02:54:21) \n", + "[GCC 7.3.0]\n", + "PySpark version: 2.4.3\n" ] } ], @@ -84,8 +85,8 @@ " dbutils = None\n", " print(\"MMLSpark version: {}\".format(MMLSPARK_INFO['maven']['coordinates']))\n", "\n", - "from mmlspark import ComputeModelStatistics\n", - "from mmlspark import LightGBMClassifier\n", + "from mmlspark.train import ComputeModelStatistics\n", + "from mmlspark.lightgbm import LightGBMClassifier\n", "\n", "print(\"System version: {}\".format(sys.version))\n", "print(\"PySpark version: {}\".format(pyspark.version.__version__))" @@ -93,7 +94,7 @@ }, { "cell_type": "code", - "execution_count": 2, + "execution_count": 3, "metadata": { "tags": [ "parameters" @@ -129,14 +130,14 @@ }, { "cell_type": "code", - "execution_count": 3, + "execution_count": 4, "metadata": 
{}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ - "8.79MB [00:00, 32.6MB/s] \n" + "100%|██████████| 8.58k/8.58k [00:01<00:00, 5.15kKB/s]\n" ] }, { @@ -253,7 +254,7 @@ "[2 rows x 40 columns]" ] }, - "execution_count": 3, + "execution_count": 4, "metadata": {}, "output_type": "execute_result" } @@ -276,7 +277,7 @@ }, { "cell_type": "code", - "execution_count": 4, + "execution_count": 5, "metadata": {}, "outputs": [], "source": [ @@ -285,7 +286,7 @@ }, { "cell_type": "code", - "execution_count": 5, + "execution_count": 6, "metadata": {}, "outputs": [], "source": [ @@ -295,7 +296,7 @@ }, { "cell_type": "code", - "execution_count": 6, + "execution_count": 7, "metadata": {}, "outputs": [], "source": [ @@ -321,7 +322,7 @@ }, { "cell_type": "code", - "execution_count": 7, + "execution_count": 8, "metadata": {}, "outputs": [], "source": [ @@ -350,7 +351,7 @@ }, { "cell_type": "code", - "execution_count": 8, + "execution_count": 9, "metadata": {}, "outputs": [], "source": [ @@ -359,7 +360,7 @@ }, { "cell_type": "code", - "execution_count": null, + "execution_count": 10, "metadata": {}, "outputs": [], "source": [ @@ -368,7 +369,7 @@ }, { "cell_type": "code", - "execution_count": 9, + "execution_count": 11, "metadata": {}, "outputs": [ { @@ -378,7 +379,7 @@ "+---------------+------------------+\n", "|evaluation_type| AUC|\n", "+---------------+------------------+\n", - "| Classification|0.6870253907336659|\n", + "| Classification|0.6892773832319504|\n", "+---------------+------------------+\n", "\n" ] @@ -460,9 +461,9 @@ "metadata": { "celltoolbar": "Tags", "kernelspec": { - "display_name": "Python (reco_pyspark)", + "display_name": "reco_full", "language": "python", - "name": "reco_pyspark" + "name": "conda-env-reco_full-py" }, "language_info": { "codemirror_mode": { @@ -474,7 +475,7 @@ "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", - "version": "3.6.0" + "version": "3.6.10" } }, "nbformat": 4, diff --git 
a/tools/databricks_install.py b/tools/databricks_install.py index 6f9c47fb4d..3e0b2f0317 100644 --- a/tools/databricks_install.py +++ b/tools/databricks_install.py @@ -48,12 +48,9 @@ "5": "https://search.maven.org/remotecontent?filepath=com/microsoft/azure/azure-cosmosdb-spark_2.4.0_2.11/1.3.5/azure-cosmosdb-spark_2.4.0_2.11-1.3.5-uber.jar", } -# "coordinates": "Azure:mmlspark:0.17" -# "coordinates": "com.microsoft.ml.spark:mmlspark_2.11:0.18.1" -# "coordinates": "com.microsoft.ml.spark:mmlspark_2.11:1.0.0-rc1" # not working MMLSPARK_INFO = { "maven": { - "coordinates": "Azure:mmlspark:0.17", + "coordinates": "com.microsoft.ml.spark:mmlspark_2.11:0.18.1", "repo": "https://mvnrepository.com/artifact", } } diff --git a/tools/generate_conda_file.py b/tools/generate_conda_file.py index 1aba5abc1b..c2481bdeaf 100644 --- a/tools/generate_conda_file.py +++ b/tools/generate_conda_file.py @@ -61,7 +61,7 @@ "tqdm": "tqdm>=4.31.1", } -CONDA_PYSPARK = {"pyarrow": "pyarrow>=0.8.0", "pyspark": "pyspark==2.3.1"} +CONDA_PYSPARK = {"pyarrow": "pyarrow>=0.8.0", "pyspark": "pyspark==2.4.3"} CONDA_GPU = { "fastai": "fastai==1.0.46", @@ -133,7 +133,7 @@ "PySpark version input must be valid numeric format (e.g. 
--pyspark-version=2.3.1)" ) else: - args.pyspark_version = "2.3.1" + args.pyspark_version = "2.4.3" # set name for environment and output yaml file conda_env = "reco_base" From fd1eb0bfe5c5d4075d0429041397c4d2315927f6 Mon Sep 17 00:00:00 2001 From: Andreas Argyriou Date: Thu, 25 Jun 2020 14:15:45 +0100 Subject: [PATCH 57/61] Update README.md --- scenarios/retail/README.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/scenarios/retail/README.md b/scenarios/retail/README.md index 600f60b913..1208adb47b 100644 --- a/scenarios/retail/README.md +++ b/scenarios/retail/README.md @@ -20,7 +20,7 @@ In this task, the retailer tries to predict product(s) complementary to or bough ### Similar alternatives -This scenario covers down-selling or out of stock alternatives to avoid losing a sale. Similar alternatives predicts other products with similar features, like price, type, brand or visual appearance. +This scenario covers down-selling or out of stock alternatives to avoid losing a sale. Similar alternatives predict other products with similar features, like price, type, brand or visual appearance. ## Data and evaluation From d4a524411ba3e3f9f852d122c4afd32646b5ea4e Mon Sep 17 00:00:00 2001 From: miguelgfierro Date: Thu, 25 Jun 2020 15:07:47 +0100 Subject: [PATCH 58/61] fix :bug: in readme --- examples/README.md | 19 ++++++++++++------- 1 file changed, 12 insertions(+), 7 deletions(-) diff --git a/examples/README.md b/examples/README.md index 5b94fb3d0e..32c808f364 100644 --- a/examples/README.md +++ b/examples/README.md @@ -13,12 +13,15 @@ The following summarizes each directory of the best practice notebooks. 
| Directory | Runs Local | Description | | --- | --- | --- | -| [00_quick_start](./00_quick_start)| Yes | Quick start notebooks that demonstrate workflow of developing a recommender by using an algorithm in local environment| -| [01_prepare_data](./01_prepare_data) | Yes | Data preparation notebooks for each recommender algorithm| -| [02_model](./02_model) | Yes | Deep dive notebooks about model building by using various classical and deep learning recommender algorithms| -| [03_evaluate](./03_evaluate) | Yes | Notebooks that introduce different evaluation methods for recommenders| -| [04_model_select_and_optimize](04_model_select_and_optimize) | Some local, some on Azure | Best practice notebooks for model tuning and selecting by using Azure Machine Learning Service and/or open source technologies| -| [05_operationalize](05_operationalize) | No, Run on Azure | Operationalization notebooks that illustrate an end-to-end pipeline by using a recommender algorithm for a certain real-world use case scenario| +| [00_quick_start](00_quick_start)| Yes | Quick start notebooks that demonstrate workflow of developing a recommender by using an algorithm in local environment| +| [01_prepare_data](01_prepare_data) | Yes | Data preparation notebooks for each recommender algorithm| +| [02_model_collaborative_filtering](02_model_collaborative_filtering) | Yes | Deep dive notebooks about model training and evaluation using collaborative filtering algorithms | +| [02_model_content_based_filtering](02_model_content_based_filtering) | Yes | Deep dive notebooks about model training and evaluation using content-based filtering algorithms | +| [02_model_hybrid](02_model_hybrid) | Yes | Deep dive notebooks about model training and evaluation using hybrid algorithms | +| [03_evaluate](03_evaluate) | Yes | Notebooks that introduce different evaluation methods for recommenders | +| [04_model_select_and_optimize](04_model_select_and_optimize) | Some local, some on Azure | Best practice
notebooks for model tuning and selecting by using Azure Machine Learning Service and/or open source technologies | +| [05_operationalize](05_operationalize) | No, Run on Azure | Operationalization notebooks that illustrate an end-to-end pipeline by using a recommender algorithm for a certain real-world use case scenario | +| [06_benchmarks](06_benchmarks) | Yes | Benchmark comparison of several recommender algorithms | ## On-premise notebooks @@ -57,11 +60,13 @@ those will be provided in the notebooks. ### Submit an existing notebook to Azure Machine Learning The [run_notebook_on_azureml](./run_notebook_on_azureml.ipynb) notebook provides a scaffold to directly submit an existing notebook to AzureML compute targets. After setting up a compute target and creating a run configuration, simply replace the notebook file name and submit the notebook directly. + ```python cfg = NotebookRunConfig(source_directory='../', - notebook='notebooks/00_quick_start/' + NOTEBOOK_NAME, + notebook='examples/00_quick_start/' + NOTEBOOK_NAME, output_notebook='outputs/out.ipynb', parameters={"MOVIELENS_DATA_SIZE": "100k", "TOP_K": 10}, run_config=run_config) ``` + All metrics and parameters logged with `pm.record` will be stored on the run as tracked metrics. The initial notebook that was submitted, will be stored as an output notebook ```out.ipynb``` in the outputs tab of the Azure Portal. 
From 845964ac4ed5d135681f050450606183d7b3c23b Mon Sep 17 00:00:00 2001 From: miguelgfierro Date: Thu, 25 Jun 2020 15:20:13 +0100 Subject: [PATCH 59/61] readms --- examples/02_model_collaborative_filtering/README.md | 13 +++---------- examples/02_model_content_based_filtering/README.md | 10 ++++++++++ examples/02_model_hybrid/README.md | 11 +++++++++++ 3 files changed, 24 insertions(+), 10 deletions(-) create mode 100644 examples/02_model_content_based_filtering/README.md create mode 100644 examples/02_model_hybrid/README.md diff --git a/examples/02_model_collaborative_filtering/README.md b/examples/02_model_collaborative_filtering/README.md index b527e56395..620682c5b9 100644 --- a/examples/02_model_collaborative_filtering/README.md +++ b/examples/02_model_collaborative_filtering/README.md @@ -1,22 +1,15 @@ -# Model +# Deep dive in collaborative filtering algorithms -In this directory, notebooks are provided to give a deep dive into training models using different algorithms such as - Alternating Least Squares ([ALS](https://spark.apache.org/docs/latest/api/python/_modules/pyspark/ml/recommendation.html#ALS)) and Singular Value Decomposition (SVD) using [Surprise](http://surpriselib.com/) python package. The notebooks make use of the utility functions ([reco_utils](../../reco_utils)) - available in the repo. +In this directory, notebooks are provided to give a deep dive of collaborative filtering recommendation algorithms. The notebooks make use of the utility functions ([reco_utils](../../reco_utils)) available in the repo. | Notebook | Environment | Description | | --- | --- | --- | | [als_deep_dive](als_deep_dive.ipynb) | PySpark | Deep dive on the ALS algorithm and implementation. | [baseline_deep_dive](baseline_deep_dive.ipynb) | --- | Deep dive on baseline performance estimation. | [cornac_bpr_deep_dive](cornac_bpr_deep_dive.ipynb) | Python CPU | Deep dive on the BPR algorithm and implementation. 
-| [fm_deep_dive](fm_deep_dive.ipynb) | Python CPU | Deep dive into factorization machine (FM) and field-aware FM (FFM) algorithm. -| [lightfm_deep_dive](lightfm_deep_dive.ipynb) | Python CPU | Deep dive into hybrid matrix factorisation model with LightFM. -| [mmlspark_lightgbm_criteo](mmlspark_lightgbm_criteo.ipynb) | PySpark | LightGBM gradient boosting tree algorithm implementation in MML Spark with Criteo dataset. -| [ncf_deep_dive](ncf_deep_dive.ipynb) | Python CPU, GPU | Deep dive on a NCF algorithm and implementation. +| [lightgcn_deep_dive](lightgcn_deep_dive.ipynb) | Python CPU, GPU | Deep dive on a LightGCN algorithm and implementation. | [rbm_deep_dive](rbm_deep_dive.ipynb)| Python CPU, GPU | Deep dive on the rbm algorithm and its implementation. | [sar_deep_dive](sar_deep_dive.ipynb) | Python CPU | Deep dive on the SAR algorithm and implementation. | [surprise_svd_deep_dive](surprise_svd_deep_dive.ipynb) | Python CPU | Deep dive on a SVD algorithm and implementation. -| [vowpal_wabbit_deep_dive](vowpal_wabbit_deep_dive.ipynb) | Python CPU | Deep dive into using Vowpal Wabbit for regression and matrix factorization. -| [lightgcn_deep_dive](lightgcn_deep_dive.ipynb) | Python CPU, GPU | Deep dive on a LightGCN algorithm and implementation. Details on model training are best found inside each notebook. diff --git a/examples/02_model_content_based_filtering/README.md b/examples/02_model_content_based_filtering/README.md new file mode 100644 index 0000000000..67b69fdbb5 --- /dev/null +++ b/examples/02_model_content_based_filtering/README.md @@ -0,0 +1,10 @@ +# Deep dive in content-based filtering algorithms + +In this directory, notebooks are provided to give a deep dive of content-based filtering recommendation algorithms. The notebooks make use of the utility functions ([reco_utils](../../reco_utils)) available in the repo. 
+ +| Notebook | Environment | Description | +| --- | --- | --- | +| [mmlspark_lightgbm_criteo](mmlspark_lightgbm_criteo.ipynb) | PySpark | LightGBM gradient boosting tree algorithm implementation in MML Spark with Criteo dataset. +| [vowpal_wabbit_deep_dive](vowpal_wabbit_deep_dive.ipynb) | Python CPU | Deep dive into using Vowpal Wabbit for regression and matrix factorization. + +Details on model training are best found inside each notebook. diff --git a/examples/02_model_hybrid/README.md b/examples/02_model_hybrid/README.md new file mode 100644 index 0000000000..836db7d734 --- /dev/null +++ b/examples/02_model_hybrid/README.md @@ -0,0 +1,11 @@ +# Deep dive in hybrid algorithms + +In this directory, notebooks are provided to give a deep dive of hybrid recommendation algorithms. The notebooks make use of the utility functions ([reco_utils](../../reco_utils)) available in the repo. + +| Notebook | Environment | Description | +| --- | --- | --- | +| [fm_deep_dive](fm_deep_dive.ipynb) | Python CPU | Deep dive into factorization machine (FM) and field-aware FM (FFM) algorithm. +| [lightfm_deep_dive](lightfm_deep_dive.ipynb) | Python CPU | Deep dive into hybrid matrix factorisation model with LightFM. +| [ncf_deep_dive](ncf_deep_dive.ipynb) | Python CPU, GPU | Deep dive on a NCF algorithm and implementation. + +Details on model training are best found inside each notebook. 
From d5ae933f762f777a1d1542e496b44dd05f9c1932 Mon Sep 17 00:00:00 2001 From: miguelgfierro Date: Thu, 25 Jun 2020 15:27:23 +0100 Subject: [PATCH 60/61] update authors --- AUTHORS.md | 19 ++++++++++--------- 1 file changed, 10 insertions(+), 9 deletions(-) diff --git a/AUTHORS.md b/AUTHORS.md index dcdad68ec1..f6a91e2955 100644 --- a/AUTHORS.md +++ b/AUTHORS.md @@ -12,8 +12,10 @@ They have admin access to the repo and provide support reviewing issues and pull * Reco utils metrics computations * Tests for Surprise * Model selection notebooks (AzureML for SVD, NNI) -* **[Jeremy Reynolds](https://github.com/jreynolds01)** - * Reference architecture +* **[Jianxun Lian](https://github.com/Leavingseason)** + * xDeepFM algorithm + * DKN algorithm + * Review, development and optimization of MSRA algorithms. * **[Jun Ki Min](https://github.com/loomlike)** * ALS notebook * Wide & Deep algorithm @@ -27,10 +29,6 @@ They have admin access to the repo and provide support reviewing issues and pull * Reco utils review, development and optimization. * Github statistics. * Continuous integration build / test setup. 
-* **[Nikhil Joglekar](https://github.com/nikhilrj)** - * Improving documentation - * Quick start notebook - * Operationalization notebook * **[Scott Graham](https://github.com/gramhagen)** * Improving documentation * VW notebook @@ -61,9 +59,8 @@ To contributors: please add your name to the list when you submit a patch to the * Spark optimization and support * **[Heather Spetalnick (Shapiro)](https://github.com/heatherbshapiro)** * AzureML documentation and support -* **[Jianxun Lian](https://github.com/Leavingseason)** - * xDeepFM algorithm - * DKN algorithm +* **[Jeremy Reynolds](https://github.com/jreynolds01)** + * Reference architecture * **[Markus Cozowicz](https://github.com/eisber)** * SAR improvements on Spark * **[Max Kaznady](https://github.com/maxkazmsft)** @@ -76,6 +73,10 @@ To contributors: please add your name to the list when you submit a patch to the * Restricted Boltzmann Machine algorithm * **[Nicolas Hug](https://github.com/NicolasHug)** * Jupyter notebook demonstrating the use of [Surprise](https://github.com/NicolasHug/Surprise) library for recommendations +* **[Nikhil Joglekar](https://github.com/nikhilrj)** + * Improving documentation + * Quick start notebook + * Operationalization notebook * **[Pratik Jawanpuria](https://github.com/pratikjawanpuria)** * RLRMC algorithm * **[Qi Wan](https://github.com/Qcactus)** From 930427f4a4ed25f01c61cdd7c7897ddcb94caea8 Mon Sep 17 00:00:00 2001 From: miguelgfierro Date: Mon, 29 Jun 2020 13:14:51 +0100 Subject: [PATCH 61/61] :bug: --- examples/00_quick_start/README.md | 9 +++++---- 1 file changed, 5 insertions(+), 4 deletions(-) diff --git a/examples/00_quick_start/README.md b/examples/00_quick_start/README.md index b582d2a174..48c6503cd8 100644 --- a/examples/00_quick_start/README.md +++ b/examples/00_quick_start/README.md @@ -19,8 +19,8 @@ In this directory, notebooks are provided to perform a quick 
demonstration of di | [sar_azureml](sar_movielens_with_azureml.ipynb)| MovieLens | Python CPU | An example of how to utilize and evaluate SAR using the [Azure Machine Learning service](https://docs.microsoft.com/azure/machine-learning/service/overview-what-is-azure-ml) (AzureML). It takes the content of the [sar quickstart notebook](sar_movielens.ipynb) and demonstrates how to use the power of the cloud to manage data, switch to powerful GPU machines, and monitor runs while training a model. | [a2svd](sequential_recsys_amazondataset.ipynb) | Amazon | Python CPU, GPU | Use A2SVD [11] to predict a set of movies the user is going to interact in a short time. | | [caser](sequential_recsys_amazondataset.ipynb) | Amazon | Python CPU, GPU | Use Caser [12] to predict a set of movies the user is going to interact in a short time. | -| [nextitnet](sequential_recsys_amazondataset.ipynb) | Amazon | Python CPU, GPU | Use NextItNet [12] to predict a set of movies the user is going to interact in a short time. | | [gru4rec](sequential_recsys_amazondataset.ipynb) | Amazon | Python CPU, GPU | Use GRU4Rec [13] to predict a set of movies the user is going to interact in a short time. | +| [nextitnet](sequential_recsys_amazondataset.ipynb) | Amazon | Python CPU, GPU | Use NextItNet [14] to predict a set of movies the user is going to interact in a short time. | | [sli-rec](sequential_recsys_amazondataset.ipynb) | Amazon | Python CPU, GPU | Use SLi-Rec [11] to predict a set of movies the user is going to interact in a short time. | | [wide-and-deep](wide_deep_movielens.ipynb) | MovieLens | Python CPU, GPU | Utilizing Wide-and-Deep Model (Wide-and-Deep) [5] to predict movie ratings in a Python+GPU (TensorFlow) environment. | [xdeepfm](xdeepfm_criteo.ipynb) | Criteo, Synthetic Data | Python CPU, GPU | Utilizing the eXtreme Deep Factorization Machine (xDeepFM) [3] to learn both low and high order feature interactions for predicting CTR, in a Python+GPU (TensorFlow) environment. 
@@ -35,6 +35,7 @@ In this directory, notebooks are provided to perform a quick demonstration of di [8] _NRMS: Neural News Recommendation with Multi-Head Self-Attention_, Chuhan Wu, Fangzhao Wu, Suyu Ge, Tao Qi, Yongfeng Huang, Xing Xie. in Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP).
[9] _LSTUR: Neural News Recommendation with Long- and Short-term User Representations_, Mingxiao An, Fangzhao Wu, Chuhan Wu, Kun Zhang, Zheng Liu and Xing Xie. ACL 2019.
[10] _NPA: Neural News Recommendation with Personalized Attention_, Chuhan Wu, Fangzhao Wu, Mingxiao An, Jianqiang Huang, Yongfeng Huang and Xing Xie. KDD 2019, ADS track.
-[11] _Adaptive User Modeling with Long and Short-Term Preferences for Personailzed Recommendation_, Z. Yu, J. Lian, A. Mahmoody, G. Liu and X. Xie, IJCAI 2019.
-[12] _Personalized top-n sequential recommendation via convolutional sequence embedding_, J. Tang and K. Wang, ACM WSDM 2018.
-[13] _Session-based Recommendations with Recurrent Neural Networks_, B. Hidasi, A. Karatzoglou, L. Baltrunas and D. Tikk, ICLR 2016.
+[11] _Adaptive User Modeling with Long and Short-Term Preferences for Personailzed Recommendation_, Zeping Yu, Jianxun Lian, Ahmad Mahmoody, Gongshen Liu and Xing Xie, IJCAI 2019.
+[12] _Personalized top-n sequential recommendation via convolutional sequence embedding_, Jiaxi Tang and Ke Wang, ACM WSDM 2018.
+[13] _Session-based Recommendations with Recurrent Neural Networks_, Balazs Hidasi, Alexandros Karatzoglou, Linas Baltrunas and Domonkos Tikk, ICLR 2016.
+[14] _A Simple Convolutional Generative Network for Next Item Recommendation_, Fajie Yuan, Alexandros Karatzoglou, Ioannis Arapakis, Joemon M. Jose and Xiangnan He, WSDM 2019.
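The TF 1.x → `tf.compat.v1` refactor in PATCH 01 above renames the same handful of symbols in `ncf_singlenode.py` over and over by hand. The same mechanical rewrite can be scripted; the helper below is a hypothetical sketch (not part of the patch series), with the symbol list taken directly from the hunks above:

```python
import re

# TF 1.x symbols that PATCH 01 moves under tf.compat.v1 (list taken from the diff).
# Longer names come before their prefixes (e.g. global_variables_initializer
# before global_variables) so a prefix never shadows a longer match.
V1_SYMBOLS = [
    "set_random_seed", "GPUOptions", "Session", "ConfigProto",
    "global_variables_initializer", "global_variables", "reset_default_graph",
    "variable_scope", "AUTO_REUSE", "placeholder", "losses.log_loss",
    "train.AdamOptimizer", "train.Saver",
]

# Symbols that moved to a new namespace instead of compat.v1.
RENAMES = {"truncated_normal": "random.truncated_normal"}


def to_compat_v1(source: str) -> str:
    """Rewrite bare TF 1.x calls to their TF 2.x-compatible spellings."""
    for old, new in RENAMES.items():
        source = source.replace("tf." + old, "tf." + new)
    for symbol in V1_SYMBOLS:
        # Lookbehind ensures `tf` starts the attribute chain (not e.g. `mytf.`);
        # already-migrated `tf.compat.v1.X` text contains no bare `tf.X` match.
        source = re.sub(
            r"(?<![\w.])tf\." + re.escape(symbol),
            "tf.compat.v1." + symbol,
            source,
        )
    return source
```

Run over `ncf_singlenode.py`, a helper like this would reproduce the `tf.compat.v1` hunks of PATCH 01; `truncated_normal`, which moved to `tf.random` rather than `compat.v1`, is handled by the `RENAMES` table, and names that survive in TF 2.x (such as `tf.Variable`) are deliberately left alone.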