Clearer underlying dataset issues #3971

Open
datajoely opened this issue Jun 28, 2024 · 6 comments
Labels
Issue: Feature Request New feature or improvement to existing feature

@datajoely
Contributor

Description

A user reported that Kedro was unable to read a CSV file. They get the following logs in AWS:
[Image: the AWS logs]

The "No columns to parse from file" error is raised by the underlying pandas implementation in this file.

It would be helpful if Kedro could surface that the error is raised in pandas.io.parsers.python_parser, so that it is clear where the issue lies. The error above mentions kedro.io.core.DatasetError; is it not possible to do the same?
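One way to read this request, as a rough sketch only: when wrapping the error, look at the innermost traceback frame of the original exception and put its module name in the message. `DatasetError` below is a local stand-in rather than Kedro's class, and `origin_module` and `load_with_origin` are hypothetical names used for illustration.

```python
class DatasetError(Exception):
    """Local stand-in for kedro.io.core.DatasetError (assumption, not Kedro's class)."""


def origin_module(exc: BaseException) -> str:
    """Return the module name of the innermost Python frame in exc's traceback."""
    tb = exc.__traceback__
    if tb is None:
        return "<unknown>"
    while tb.tb_next is not None:  # walk down to the frame that raised
        tb = tb.tb_next
    return tb.tb_frame.f_globals.get("__name__", "<unknown>")


def load_with_origin(load_func, dataset_name: str):
    """Hypothetical wrapper: call load_func and name the raising module on failure."""
    try:
        return load_func()
    except DatasetError:
        raise
    except Exception as exc:
        message = (
            f"Failed while loading data from data set {dataset_name}.\n"
            f"{type(exc).__name__} raised in {origin_module(exc)}: {exc}"
        )
        raise DatasetError(message) from exc
```

Only Python frames appear in a traceback, so for a failure inside a C extension this would report the closest Python-level module rather than the extension itself.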

@datajoely datajoely added the Issue: Feature Request New feature or improvement to existing feature label Jun 28, 2024
@astrojuanlu
Member

It is unclear why those logs don't show tracebacks.

Anyway, the current implementation of AbstractDataset is responsible for that DatasetError:

kedro/kedro/io/core.py

Lines 192 to 202 in adfc593

try:
    return self._load()
except DatasetError:
    raise
except Exception as exc:
    # This exception handling is by design as the composed data sets
    # can throw any type of exception.
    message = (
        f"Failed while loading data from data set {str(self)}.\n{str(exc)}"
    )
    raise DatasetError(message) from exc

@datajoely
Contributor Author

They must be in the exc object somewhere; I refuse to believe otherwise.
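For what it's worth, `raise ... from exc` stores the original exception on `__cause__`, and that exception keeps its own traceback. A minimal sketch of following the chain, where `DatasetError` is a local stand-in and `cause_chain` and `failing_load` are hypothetical names:

```python
import traceback


class DatasetError(Exception):
    """Local stand-in for kedro.io.core.DatasetError (assumption)."""


def cause_chain(exc: BaseException) -> list[BaseException]:
    """Follow __cause__ links from the outermost exception to the root cause."""
    chain = [exc]
    while chain[-1].__cause__ is not None:
        chain.append(chain[-1].__cause__)
    return chain


def failing_load():
    # Stand-in for a dataset's _load() failing inside a third-party library.
    raise ValueError("No columns to parse from file")


try:
    try:
        failing_load()
    except Exception as exc:
        raise DatasetError(f"Failed while loading data.\n{exc}") from exc
except DatasetError as wrapped:
    chain = cause_chain(wrapped)
    root = chain[-1]
    # The root cause keeps its own traceback, so the failing frame is recoverable.
    root_frames = traceback.extract_tb(root.__traceback__)
    print([type(e).__name__ for e in chain])
    print(root_frames[-1].filename, root_frames[-1].name)
```

So the information should be recoverable from the wrapped exception, provided whatever renders the logs prints the chained traceback and not just the final message.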

@ElenaKhaustova
Contributor

Thank you @datajoely! Could you please provide some more context on which AWS service was used to run the Kedro pipeline? We would like to check whether the service is filtering the error messages, as it seems like we always show the entire error log.

@datajoely
Contributor Author

I've asked the user to comment here to double check, but I think it was:

Docker image running on AWS ECS

@astrojuanlu
Member

First, I amend my comment above: the traceback is there (File /usr/local/...).

The problem of AbstractDataset hiding the real error has been mentioned in other places (#1936 (comment), #2199 (comment)) although I don't think we have an issue for it (@ElenaKhaustova?). If that's the case, maybe we can keep this issue open?

@astrojuanlu
Member

In #2943 we partly addressed the issue of unclear errors with datasets. However, we now have more evidence that this is still a problem.

For example: https://kedro.hall.community/running-kedroviz-on-docker-without-installing-the-library-H0d61LTldx29#bae33c48-aa82-447b-82e7-80486a95ecef

The user was getting

Class 'projx.models.audio.io.LargeModel' not found, is this a typo?

but the actual underlying error was:

>>> from projx.models.audio.io import LargeModel
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/app/src/projx/models/audio/__init__.py", line 1, in <module>
    from .base import LAM
  File "/app/src/projx/models/audio/base.py", line 1, in <module>
    from elevenlabs.client import ElevenLabs
ModuleNotFoundError: No module named 'elevenlabs'

Another internal user reported this today.
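A sketch of how a class loader could tell these two cases apart, using the `name` attribute that Python sets on `ModuleNotFoundError`. `load_class` is a hypothetical helper written for illustration, not Kedro's actual implementation, and the message wording is made up.

```python
import importlib


def load_class(class_path: str):
    """Hypothetical loader: import `pkg.mod.Class` and explain failures precisely."""
    module_path, _, class_name = class_path.rpartition(".")
    if not module_path:
        raise ValueError(f"Expected a dotted path like 'pkg.mod.Class', got '{class_path}'")
    try:
        module = importlib.import_module(module_path)
    except ModuleNotFoundError as exc:
        # exc.name is the module Python could not find. If it is not the
        # requested module or one of its parent packages, the module exists
        # but one of its own imports failed.
        missing = exc.name or "<unknown>"
        requested = module_path == missing or module_path.startswith(f"{missing}.")
        if requested:
            raise ImportError(
                f"Module '{module_path}' not found, is this a typo?"
            ) from exc
        raise ImportError(
            f"Module '{module_path}' was found, but importing it failed: "
            f"missing dependency '{missing}'."
        ) from exc
    try:
        return getattr(module, class_name)
    except AttributeError as exc:
        raise ImportError(
            f"Class '{class_name}' not found in '{module_path}', is this a typo?"
        ) from exc
```

With something along these lines, the case above would point at the missing `elevenlabs` dependency instead of suggesting a typo in the class path.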


3 participants