Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[BUG] warning message dumps large amount of data to stdout #537

Open
s4ke opened this issue Jun 20, 2024 · 6 comments
Open

[BUG] warning message dumps large amount of data to stdout #537

s4ke opened this issue Jun 20, 2024 · 6 comments
Labels
bug Something isn't working

Comments

@s4ke
Copy link

s4ke commented Jun 20, 2024

Description

We just ran into the case where the warnings logged out huge chunks of actual data into stdout. While I understand where this comes from, this is a bad idea because this can leak PII data into stdout/stderr by accident. Plus: The old behaviour was "just working as intended" for our usecase.

In our case, this was caused by this class: https://github.com/neuroforgede/nfcompose/blob/1ad30313e1bdbdb7c3d8e35fd74f905924e2003e/client/compose_client/library/models/definition/datapoint.py#L32

Note that we are using dataclass_json in serialization, but have extra code preventing the non primitive data reaching the actual serialization into json.

Example log (from python 3.8):

/home/<....>/integration/function_layout/compose/venv/lib/python3.8/site-packages/dataclasses_json/core.py:342: UserWarning: Failed to decode {'data': {'season': '', 'branchCombination': '',

Code snippet that reproduces the issue

# This Source Code Form is subject to the terms of the Mozilla Public License, v. 2.0. 
# If a copy of the MPL was not distributed with this file, 
# You can obtain one at https://mozilla.org/MPL/2.0/.
# This file is part of NF Compose
# [2019] - [2024] © NeuroForge GmbH & Co. KG

from dataclasses import dataclass, field
from typing import Any, Dict, Union, BinaryIO

from dataclasses_json import dataclass_json, Undefined

from compose_client.library.models.definition.data_series_definition import DataSeriesDefinition
from compose_client.library.models.identifiable import Identifiable
from compose_client.library.models.raw.datapoint import RawDataPoint


# explicitly no dataclass so we dont forget to implement a proper serializer
class FileTypeContent:
    url: str

    def __init__(self, url: str):
        self.url = url


Primitive = Union[str, float, int, bool]


@dataclass_json(undefined=Undefined.EXCLUDE)
@dataclass
class DataPoint(Identifiable):
    external_id: str
    payload: Dict[str, Union[Primitive, FileTypeContent, BinaryIO]]
    identify_dimensions_by_external_id: bool = field(default=True)

    @staticmethod
    def from_raw(raw: RawDataPoint, definition: DataSeriesDefinition) -> 'DataPoint':
        payload = raw.payload.copy()
        for file_like_fact in [*definition.structure.file_facts, *definition.structure.image_facts]:
            if file_like_fact.external_id in payload:
                payload[file_like_fact.external_id] = FileTypeContent(url=payload[file_like_fact.external_id])
        return DataPoint(
            external_id=raw.external_id,
            payload=payload
        )

    def to_dict(self) -> Any: ...

    @staticmethod
    def from_dict(dict: Dict[str, Any]) -> 'DataPoint': ...

Describe the results you expected

The warning should not log out the data unprompted. This is a data security issue.

Python version you are using

Python 3.10.12

Environment description

certifi==2024.6.2
charset-normalizer==3.3.2
click==8.1.7
compose_client @ https://github.com/neuroforgede/nfcompose/releases/download/2.2.1/compose_client-2.2.1.tar.gz#sha256=08b5d99570e34734b1c5938c26cd57b456282443961d19a69754a096b8f8b14d
dataclasses-json==0.6.7
idna==3.7
marshmallow==3.21.3
mypy-extensions==1.0.0
packaging==24.1
requests==2.32.3
typing-inspect==0.9.0
typing_extensions==4.12.2
urllib3==2.2.2

@s4ke s4ke added the bug Something isn't working label Jun 20, 2024
@s4ke
Copy link
Author

s4ke commented Jun 20, 2024

I think we should improve the code on our end. I am just pointing out that maybe data should not be logged out to stdout by default.

@george-zubrienko
Copy link
Collaborator

@DavidCain do you think it will be possible to turn warnings off with a config flag?

@DavidCain
Copy link
Contributor

@george-zubrienko - oh absolutely, one could definitely make a config flag if desired. However, filterwarnings can also be used right now to silence warnings on dataclasses-json==0.6.7 (perhaps that's helpful to you, @s4ke ?)

I think @s4ke makes a solid point though about having sensible logging as the default -- it's probably not wise to log the entirety of the payload when the warning message simply needs to say that the Union wasn't properly handled (the traceback can be used for more information).

Just to be clear, though, I believe this is a different warning than the one I recently added. The warning message in this bug report comes from:

f"Failed to decode {value} Union dataclasses."
f"Expected Union to include a matching dataclass and it didn't."

The warning message I wrote should only ever mention the type annotation, and I would hope that PII or other sensitive information isn't in a type annotation.

@DavidCain
Copy link
Contributor

tl;dr: I imagine you could solve this concern with a smaller warning message:

                    warnings.warn(
-                        f"Failed to decode {value} Union dataclasses."
+                        "Failed to decode Union dataclasses. "
-                        f"Expected Union to include a matching dataclass and it didn't."
+                        "Expected Union to include a matching dataclass and it didn't."
                    )

@DavidCain
Copy link
Contributor

@s4ke you'd originally commented on my PR:

We just ran into the case where according to this PR buggy behaviour logged out huge chunks of actual data into stdout. While I understand where this comes from, this is a bad idea because this can leak PII data into stdout by accident. Plus: The old behaviour was "just working as intended" for our usecase.

I assume you deleted the comment because the warning is actually coming from a different part of the codebase? (Let me know if you'd prefer me not repeat your comments here, I can edit to delete).

@s4ke
Copy link
Author

s4ke commented Jun 21, 2024

Its fine. The comment didnt make sense after all. After some digging I found that the issue was not introduced by new code but rather already existed for a while. Turns out the Code path that caused this issue was not really known yet.

This was an error on my part. Sorry about that. I deleted my message because it was nonsense and completely in the wrong place. And to be fair, the tone was out of place as well.

@s4ke s4ke closed this as completed Jun 21, 2024
@s4ke s4ke reopened this Jun 21, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

3 participants