Fix SSL Error #4825

Merged
merged 3 commits into from
Dec 11, 2024
18 changes: 11 additions & 7 deletions python/cugraph/cugraph/datasets/dataset.py
@@ -11,15 +11,17 @@
# See the License for the specific language governing permissions and
# limitations under the License.

import cudf
import dask_cudf
import yaml
import os
import pandas as pd
import cugraph.dask as dcg
import yaml
from pathlib import Path
import urllib.request
from urllib.request import urlretrieve

import cudf
import cugraph.dask as dcg
import dask_cudf
from cugraph.structure.graph_classes import Graph
from cugraph.utilities import install_ssl_cert


class DefaultDownloadDir:
@@ -142,7 +144,8 @@ def __download_csv(self, url):
filename = self.metadata["name"] + self.metadata["file_type"]
if self._dl_path.path.is_dir():
self._path = self._dl_path.path / filename
urllib.request.urlretrieve(url, str(self._path))
install_ssl_cert()
urlretrieve(url, str(self._path))

else:
raise RuntimeError(
@@ -458,7 +461,8 @@ def download_all(force=False):
filename = meta["name"] + meta["file_type"]
save_to = default_download_dir.path / filename
if not save_to.is_file() or force:
urllib.request.urlretrieve(meta["url"], str(save_to))
install_ssl_cert()
urlretrieve(meta["url"], str(save_to))
Member

If you end up needing to push another change, I recommend reverting these import changes, which just look like stylistic changes to me (please correct me).

That'd make this whole file drop out of the diff, and make us even more confident that this is safe to merge this late in the release cycle.

But the changes look fine to me so don't push another commit and go through another round of CI just to revert this.

Contributor Author

Hmm, I don't believe I have any more changes to push as of now. The PR is just about to pass CI, so let me know if you think it would still be better to revert the changes to keep the diff as minimal as possible. Thanks!

Member

For those finding this from search... we talked offline and agreed that given the time-sensitivity of this, it wasn't worth another round of CI to revert this change.



def set_download_dir(path):
5 changes: 2 additions & 3 deletions python/cugraph/cugraph/testing/resultset.py
@@ -13,16 +13,14 @@

import warnings
import tarfile

import urllib.request

import cudf
from cugraph.datasets.dataset import (
DefaultDownloadDir,
default_download_dir,
)

# results_dir_path = utils.RAPIDS_DATASET_ROOT_DIR_PATH / "tests" / "resultsets"
from cugraph.utilities import install_ssl_cert


class Resultset:
@@ -107,6 +105,7 @@ def load_resultset(resultset_name, resultset_download_url):
if not curr_resultset_download_dir.exists():
curr_resultset_download_dir.mkdir(parents=True, exist_ok=True)
if not compressed_file_path.exists():
install_ssl_cert()
urllib.request.urlretrieve(resultset_download_url, compressed_file_path)
tar = tarfile.open(str(compressed_file_path), "r:gz")
# TODO: pass filter="fully_trusted" when minimum supported Python version >=3.12
3 changes: 2 additions & 1 deletion python/cugraph/cugraph/utilities/__init__.py
@@ -1,4 +1,4 @@
# Copyright (c) 2019-2022, NVIDIA CORPORATION.
# Copyright (c) 2019-2024, NVIDIA CORPORATION.
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
@@ -18,6 +18,7 @@
from cugraph.utilities.nx_factory import df_edge_score_to_dictionary
from cugraph.utilities.nx_factory import cugraph_to_nx
from cugraph.utilities.utils import (
install_ssl_cert,
import_optional,
ensure_cugraph_obj,
ensure_cugraph_obj_for_nx,
14 changes: 14 additions & 0 deletions python/cugraph/cugraph/utilities/utils.py
@@ -25,6 +25,10 @@

from warnings import warn

import certifi
from ssl import create_default_context
from urllib.request import build_opener, HTTPSHandler, install_opener

# optional dependencies
try:
import cupy as cp
@@ -549,3 +553,13 @@ def create_directory_with_overwrite(directory):
if os.path.exists(directory):
shutil.rmtree(directory)
os.makedirs(directory)


def install_ssl_cert():
Member

Putting this here in cugraph.utilities means that now certifi needs to be introduced as a runtime dependency of cugraph.

ModuleNotFoundError: No module named 'certifi'

(build link)

I'm really nervous about the idea of introducing a new runtime dependency this long after code freeze and this close to the release.

And even just in general... does this really need to be in the cugraph package? Are those code paths in datasets.py intended to be used by downstream users, or are they just in the cugraph package for convenience in its own testing (and testing for cugraph-gnn libraries)?

Here in 24.12, to minimize risk, I think we should only run this at test time in CI here.

That'd probably mean:

  • make certifi a test-only dependency
  • put this bit of Python code in a script like ci/install-certifi-certs.py
  • run that in each ci/test_* script, before any tests
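
The ModuleNotFoundError quoted above comes from importing certifi at module import time. As a rough sketch only (this is not what the PR does), the import could be deferred into the function and treated as optional, so that a missing certifi falls back to the system trust store. The name `install_ssl_cert_if_available` is hypothetical and does not exist in cugraph.

```python
import ssl
from urllib.request import HTTPSHandler, build_opener, install_opener


def install_ssl_cert_if_available():
    """Hypothetical helper: install a certifi-backed HTTPS opener if possible.

    Returns True if certifi's CA bundle was used, False if the default
    system trust store was used instead.
    """
    try:
        # Deferred import: certifi is only needed when this function runs,
        # so importing the enclosing module never fails without it.
        import certifi
    except ModuleNotFoundError:
        cafile = None
    else:
        cafile = certifi.where()

    # cafile=None makes create_default_context() load the system CA certs.
    ssl_context = ssl.create_default_context(cafile=cafile)
    install_opener(build_opener(HTTPSHandler(context=ssl_context)))
    return cafile is not None
```

Whether a silent fallback is acceptable depends on the environment; in an image where the system certificates are the problem, the fallback would leave the original error in place.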

Contributor Author

I agree that it is not a good time to add an additional runtime dependency.

The codepaths in datasets.py are used both by end users and for our own testing purposes. You can think of them as analogous to NetworkX's built-in graphs.

I don't think that the SSL issue would affect our users. I think this is just affecting CI because of how our images are set up. It would probably be fine to go with the solution you suggested and install it for our tests. I'll go ahead and add that.

Member

Ok got it, thanks for the explanation! Then yeah, I think doing this only as a testing dependency is a good solution.

I tested whether this could just be run outside of pytest, like I suggested... and you're absolutely right, it can't. I guess something in this solution must also modify os.environ or similar.

code I used to confirm that:
docker run \
    --rm \
    -it rapidsai/citestwheel:cuda11.8.0-rockylinux8-py3.10 \
    bash

python -c 'import urllib.request; urllib.request.urlretrieve("https://data.rapids.ai/cugraph/results/resultsets.tar.gz", "foo.tar.gz")'
# urllib.error.URLError: <urlopen error [SSL: CERTIFICATE_VERIFY_FAILED

cat > ./install-ca-certs.py <<EOF
import certifi
from ssl import create_default_context
from urllib.request import build_opener, HTTPSHandler, install_opener

ssl_context = create_default_context(cafile=certifi.where())
https_handler = HTTPSHandler(context=ssl_context)
install_opener(build_opener(https_handler))
EOF

python -m pip install certifi
python ./install-ca-certs.py

python -c 'import urllib.request; urllib.request.urlretrieve("https://data.rapids.ai/cugraph/results/resultsets.tar.gz", "foo.tar.gz")'
# urllib.error.URLError: <urlopen error [SSL: CERTIFICATE_VERIFY_FAILED

Given that, I think the way you've set this up as a testing-only dependency is a good solution. Thanks for working through it with me.
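
A likely explanation for the result above, offered as an observation about urllib rather than something stated in this thread: `install_opener` only stores the opener in a module-level global of the running interpreter, so a script that calls it and then exits leaves later `python` processes untouched. The sketch below illustrates this without network access. It reads `urllib.request._opener`, a private CPython attribute, purely for demonstration, and it uses the default trust store in place of certifi.

```python
import ssl
import subprocess
import sys
import urllib.request
from urllib.request import HTTPSHandler, build_opener, install_opener

# Install a custom opener in *this* process (the default CA store stands in
# for certifi here, to keep the sketch dependency-free).
opener = build_opener(HTTPSHandler(context=ssl.create_default_context()))
install_opener(opener)

# The opener is now the process-wide default used by urlopen/urlretrieve.
installed_here = urllib.request._opener is opener

# A fresh interpreter starts with no installed opener, regardless of what
# any earlier process did.
child = subprocess.run(
    [sys.executable, "-c", "import urllib.request; print(urllib.request._opener)"],
    capture_output=True,
    text=True,
    check=True,
)
installed_in_child = child.stdout.strip() != "None"

print(installed_here, installed_in_child)
```

If that reading is right, the call has to happen inside the same process that performs the download, which is consistent with running it from within the test session.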

Contributor Author

Thank you for helping track this down and brainstorming a solution with me.

"""
Build and install an opener with the custom HTTPS handler. Use this when
downloading datasets to have the proper SSL certificate.
"""
ssl_context = create_default_context(cafile=certifi.where())
https_handler = HTTPSHandler(context=ssl_context)
install_opener(build_opener(https_handler))
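
Because `install_opener` changes the default opener for the whole process, an alternative worth considering is to pass the SSL context to a single request instead. The following is a minimal sketch and not part of this PR: `download_with_context` is a hypothetical name, and the default system trust store stands in for `certifi.where()`.

```python
import shutil
import ssl
from pathlib import Path
from urllib.request import urlopen


def download_with_context(url, dest, cafile=None):
    """Hypothetical helper: download url to dest using a per-call SSL context.

    cafile may point at a CA bundle (for example the path returned by
    certifi.where()); None uses the system trust store.
    """
    ssl_context = ssl.create_default_context(cafile=cafile)
    dest = Path(dest)
    # The context applies to this request only; no global opener is installed.
    with urlopen(url, context=ssl_context) as response, dest.open("wb") as out:
        shutil.copyfileobj(response, out)
    return dest
```

This keeps the certificate choice local to the download call, at the cost of replacing `urlretrieve` at each call site.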