DKN quick start notebook and deep dive #1165

Merged on Jul 31, 2020 (35 commits)
Commits
6456ae0
dkn
miguelgfierro Jul 24, 2020
4017096
refact dkn
miguelgfierro Jul 27, 2020
88d5245
dkn quick
miguelgfierro Jul 27, 2020
1a7a0a8
dkn deep dive
miguelgfierro Jul 27, 2020
8c4fd45
dkn
miguelgfierro Jul 27, 2020
370564d
split PRs
miguelgfierro Jul 27, 2020
12aa0ab
tests
miguelgfierro Jul 27, 2020
4886398
:bug:
miguelgfierro Jul 27, 2020
f6af6c6
unit test working
miguelgfierro Jul 27, 2020
245561c
Merge branch 'staging' into dkn_fix
miguelgfierro Jul 27, 2020
5a8c9da
:bug:
miguelgfierro Jul 27, 2020
1c0d2f1
dkn deep dive
miguelgfierro Jul 27, 2020
698a991
mind
miguelgfierro Jul 27, 2020
63e0533
smoke
miguelgfierro Jul 27, 2020
adb9f37
integration
miguelgfierro Jul 27, 2020
5c32e42
mind
miguelgfierro Jul 27, 2020
b5f619e
mind
miguelgfierro Jul 27, 2020
9c31e90
wip
miguelgfierro Jul 27, 2020
cb49110
embeddings generation
miguelgfierro Jul 27, 2020
64ed25f
wip
miguelgfierro Jul 27, 2020
c46f412
wip
miguelgfierro Jul 27, 2020
38728a5
wip
miguelgfierro Jul 27, 2020
14e4272
wip
miguelgfierro Jul 27, 2020
a6e2ec3
wip
miguelgfierro Jul 27, 2020
c0e018e
training
miguelgfierro Jul 27, 2020
5c7afc2
training
miguelgfierro Jul 28, 2020
6a32582
metrics
miguelgfierro Jul 28, 2020
c02f237
Merge pull request #1166 from microsoft/dkn_deep_dive
miguelgfierro Jul 28, 2020
adf0f0e
tempdir
miguelgfierro Jul 29, 2020
d3900e1
rerun
miguelgfierro Jul 31, 2020
d2157dd
rerun
miguelgfierro Jul 31, 2020
9fc4440
:memo:
miguelgfierro Jul 31, 2020
d24fd9b
:memo:
miguelgfierro Jul 31, 2020
168b15a
:memo:
miguelgfierro Jul 31, 2020
b5a81d2
:memo:
miguelgfierro Jul 31, 2020
407 changes: 121 additions & 286 deletions examples/00_quick_start/dkn_MIND.ipynb

Large diffs are not rendered by default.

1 change: 1 addition & 0 deletions examples/02_model_content_based_filtering/README.md
@@ -4,6 +4,7 @@ In this directory, notebooks are provided to give a deep dive of content-based f

 | Notebook | Environment | Description |
 | --- | --- | --- |
+| [dkn_deep_dive](dkn_deep_dive.ipynb) | Python GPU | Deep dive into DKN algorithm for news recommendation. |
 | [mmlspark_lightgbm_criteo](mmlspark_lightgbm_criteo.ipynb) | PySpark | LightGBM gradient boosting tree algorithm implementation in MML Spark with Criteo dataset.
 | [vowpal_wabbit_deep_dive](vowpal_wabbit_deep_dive.ipynb) | Python CPU | Deep dive into using Vowpal Wabbit for regression and matrix factorization.

641 changes: 641 additions & 0 deletions examples/02_model_content_based_filtering/dkn_deep_dive.ipynb

Large diffs are not rendered by default.

18 changes: 17 additions & 1 deletion reco_utils/dataset/download_utils.py
@@ -5,6 +5,7 @@
 import logging
 import requests
 import math
+import zipfile
 from contextlib import contextmanager
 from tempfile import TemporaryDirectory
 from tqdm import tqdm
@@ -44,7 +45,7 @@ def maybe_download(url, filename=None, work_directory=".", expected_bytes=None):
             ):
                 file.write(data)
     else:
-        log.debug("File {} already downloaded".format(filepath))
+        log.info("File {} already downloaded".format(filepath))
     if expected_bytes is not None:
         statinfo = os.stat(filepath)
         if statinfo.st_size != expected_bytes:
@@ -79,3 +80,18 @@ def download_path(path=None):
     else:
         path = os.path.realpath(path)
         yield path
+
+
+def unzip_file(zip_src, dst_dir, clean_zip_file=True):
+    """Unzip a file
+
+    Args:
+        zip_src (str): Zip file.
+        dst_dir (str): Destination folder.
+        clean_zip_file (bool): Whether or not to clean the zip file.
+    """
+    fz = zipfile.ZipFile(zip_src, "r")
+    for file in fz.namelist():
+        fz.extract(file, dst_dir)
+    if clean_zip_file:
+        os.remove(zip_src)
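
Review note: the new unzip_file helper opens the archive but never closes it before calling os.remove. On platforms that lock open files (Windows in particular), that delete can fail. The sketch below shows one possible alternative, not the code merged in this PR: it uses the ZipFile context manager and extractall from the standard library. The name unzip_file_safe and the sample archive contents are hypothetical, chosen only for illustration.

```python
import os
import zipfile
from tempfile import TemporaryDirectory


def unzip_file_safe(zip_src, dst_dir, clean_zip_file=True):
    """Hypothetical variant of unzip_file that closes the archive before deleting it.

    Args:
        zip_src (str): Zip file.
        dst_dir (str): Destination folder.
        clean_zip_file (bool): Whether to delete the zip file after extraction.
    """
    # The context manager closes the archive handle even if extraction raises.
    with zipfile.ZipFile(zip_src, "r") as fz:
        fz.extractall(dst_dir)
    # Delete the archive only after its handle has been released.
    if clean_zip_file:
        os.remove(zip_src)


# Usage: build a small archive in a temporary directory, then extract it.
with TemporaryDirectory() as tmp:
    zip_path = os.path.join(tmp, "sample.zip")
    with zipfile.ZipFile(zip_path, "w") as zf:
        zf.writestr("data/news.tsv", "N1\tsports\n")  # made-up sample content
    out_dir = os.path.join(tmp, "out")
    unzip_file_safe(zip_path, out_dir)
    extracted = sorted(os.listdir(os.path.join(out_dir, "data")))
    zip_removed = not os.path.exists(zip_path)
```

The behavior matches the merged helper for the common case (extract everything, optionally delete the archive); the only intended difference is that the file handle is released before the delete.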