Clearer underlying dataset issues #3971

datajoely · 2024-06-28T10:28:01Z

Description

A user reported that Kedro was unable to read a CSV file. They get the following logs in AWS:

The "No columns to parse from file" error is thrown by the underlying pandas implementation in this file.

It would be helpful if Kedro could bubble up that the error is thrown in pandas.io.parsers.python_parser, so that it is clear where the issue lies. The error above mentions kedro.io.core.DatasetError; is it not possible to do the same?


astrojuanlu · 2024-06-30T21:40:50Z

It is unclear why those logs don't show tracebacks.

Anyway, the current implementation of AbstractDataset is responsible for that DatasetError:

kedro/kedro/io/core.py

Lines 192 to 202 in adfc593

    try:
        return self._load()
    except DatasetError:
        raise
    except Exception as exc:
        # This exception handling is by design as the composed data sets
        # can throw any type of exception.
        message = (
            f"Failed while loading data from data set {str(self)}.\n{str(exc)}"
        )
        raise DatasetError(message) from exc
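Because the wrapper uses `raise ... from exc`, the original exception stays attached to the DatasetError as `__cause__`. A minimal sketch of how a caller could recover it follows. It assumes a local stand-in `DatasetError` class and a simulated failure so that it runs without Kedro or pandas; `root_cause` is a hypothetical helper, not part of Kedro.

```python
class DatasetError(Exception):
    """Local stand-in for kedro.io.core.DatasetError (assumption for this sketch)."""


def _load():
    # Simulates the underlying library failing on an empty CSV.
    raise ValueError("No columns to parse from file")


def load():
    # Same wrapping pattern as the snippet quoted above.
    try:
        return _load()
    except DatasetError:
        raise
    except Exception as exc:
        message = f"Failed while loading data from data set demo.\n{exc}"
        raise DatasetError(message) from exc


def root_cause(exc):
    """Follow the __cause__ chain down to the innermost exception (hypothetical helper)."""
    while exc.__cause__ is not None:
        exc = exc.__cause__
    return exc


caught = None
try:
    load()
except DatasetError as err:
    caught = err

original = root_cause(caught)
# Report the type of the underlying exception together with its message.
summary = f"{type(original).__module__}.{type(original).__name__}: {original}"
print(summary)
```

With a real pandas failure, the module part of the summary would name the pandas exception class rather than a builtin one.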

datajoely · 2024-07-01T09:29:47Z

They must be in the exc object somewhere, I refuse to believe otherwise
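For reference, the traceback is reachable from the exception object through its `__traceback__` attribute. The sketch below uses only the standard library to find the frame where the underlying error was raised. The function names `failing_reader`, `load` and `innermost_frame` are made up for illustration, and a plain `RuntimeError` stands in for DatasetError.

```python
import traceback


def innermost_frame(exc):
    """Return the FrameSummary of the frame that raised exc, or None if there is no traceback."""
    frames = traceback.extract_tb(exc.__traceback__)
    return frames[-1] if frames else None


def failing_reader():
    # Stand-in for the library code that actually fails.
    raise ValueError("No columns to parse from file")


def load():
    try:
        failing_reader()
    except Exception as exc:
        # Stand-in for the DatasetError wrapping done by AbstractDataset.
        raise RuntimeError("Failed while loading data") from exc


wrapped = None
try:
    load()
except RuntimeError as err:
    wrapped = err

cause = wrapped.__cause__
frame = innermost_frame(cause)
# File, line and function of the code that raised the original error.
location = f"{frame.filename}:{frame.lineno} in {frame.name}"
print(location)
```

A wrapper could include this location, or the module it belongs to, in the message it raises, so the origin is visible even when the log viewer drops the chained traceback.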

ElenaKhaustova · 2024-07-29T13:15:47Z

Thank you @datajoely! Could you please provide some more context on which AWS service was used to run the Kedro pipeline? We would like to check whether the service is filtering the error messages, as it seems that we always show the entire error log.

datajoely · 2024-07-29T13:19:31Z

I've asked the user to comment here to double check, but I think it was:

Docker image running on AWS ECS

astrojuanlu · 2024-11-04T23:05:53Z

First, I amend my comment above: the traceback is there (File /usr/local/...).

The problem of AbstractDataset hiding the real error has been mentioned in other places (#1936 (comment), #2199 (comment)) although I don't think we have an issue for it (@ElenaKhaustova?). If that's the case, maybe we can keep this issue open?

astrojuanlu · 2024-11-13T16:37:14Z

In #2943 we partly addressed the issue of unclear errors with datasets. Yet we have a bit more evidence about this still being a problem.

For example: https://kedro.hall.community/running-kedroviz-on-docker-without-installing-the-library-H0d61LTldx29#bae33c48-aa82-447b-82e7-80486a95ecef

The user was getting

Class 'projx.models.audio.io.LargeModel' not found, is this a typo?

but the actual underlying error was:

>>> from projx.models.audio.io import LargeModel
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/app/src/projx/models/audio/__init__.py", line 1, in <module>
    from .base import LAM
  File "/app/src/projx/models/audio/base.py", line 1, in <module>
    from elevenlabs.client import ElevenLabs
ModuleNotFoundError: No module named 'elevenlabs'
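One way to tell these two cases apart is to compare the `name` attribute of the `ModuleNotFoundError` with the module that was requested. The sketch below is an illustration under that assumption, not Kedro's actual class-loading code; `load_class` and its messages are hypothetical, and it expects a dotted path with at least one dot.

```python
import importlib


def load_class(dotted_path):
    """Import a class from a dotted path, reporting missing dependencies separately from typos."""
    module_path, _, class_name = dotted_path.rpartition(".")
    try:
        module = importlib.import_module(module_path)
    except ModuleNotFoundError as exc:
        missing = exc.name or ""
        # If the missing module is the requested one (or one of its parent
        # packages), the path itself is probably wrong.
        if missing and (module_path == missing or module_path.startswith(missing + ".")):
            raise ImportError(
                f"Module '{module_path}' not found, is this a typo?"
            ) from exc
        # Otherwise the requested module exists but one of its own imports failed.
        raise ImportError(
            f"Importing '{module_path}' failed because dependency "
            f"'{missing}' is not installed."
        ) from exc
    try:
        return getattr(module, class_name)
    except AttributeError as exc:
        raise ImportError(
            f"Class '{class_name}' not found in '{module_path}', is this a typo?"
        ) from exc


# A path that resolves normally.
print(load_class("collections.OrderedDict"))
```

In the report above, the requested module was projx.models.audio.io while the missing one was elevenlabs, so a check like this would take the dependency branch instead of suggesting a typo.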

Another internal user reported this today.

datajoely added the Issue: Feature Request New feature or improvement to existing feature label Jun 28, 2024

ElenaKhaustova added this to Kedro Framework Jun 28, 2024

github-actions bot mentioned this issue Jul 1, 2024

Monthly issue metrics report #3975

Open

yury-fedotov mentioned this issue Nov 13, 2024

Make import failures in kedro-datasets clearer, take 2 #4331

Closed
