"Dataset could not be previewed" #1847
Comments
The code looks right to me. @SajidAlamQB, since you added 4 dataset previews recently, did you encounter this error? Is there anything you think looks incorrect?
Maybe you could try setting a default value for
Setting a default value avoids that error, although there must be something fishy happening here: `kedro-viz/package/kedro_viz/models/flowchart.py`, lines 806 to 811 at `3eeb565`
On the other hand, now I get another error, but that's unrelated to this issue:
Hi @astrojuanlu, thank you for raising the issue. I am able to reproduce the bug on my side. The documentation for creating a custom dataset seems to create a dictionary, but the values inside it are of type `pandas.Series` or `polars.Series` (in your case). I created a `CustomDataset` mimicking the `excel_dataset` implementation and modified the `preview` method as below:

```python
def preview(self, nrows, ncolumns, filters) -> TablePreview:
    dataset_copy = self._copy()
    dataset_copy._load_args["nrows"] = nrows
    filtered_data = dataset_copy.load()
    for column, value in filters.items():
        filtered_data = filtered_data[filtered_data[column] == value]
    subset = filtered_data.iloc[:nrows, :ncolumns]
    df_dict = {}
    for column in subset.columns:
        df_dict[column] = subset[column]
    return df_dict
```

Catalog:

```yaml
shuttles:
  type: demo_project.extras.datasets.custom_dataset.CustomDataset
  filepath: ${_base_location}/01_raw/shuttles.xlsx
  metadata:
    kedro-viz:
      layer: raw
      preview_args:
        nrows: 5
        ncolumns: 2
        filters: {
          engine_type: Quantum
        }
```

This gives me the below preview data:

```
preview={'id': 0    63561
1    36260
2    57015
Name: id, dtype: int64, 'shuttle_location': 0    Niue
1    Anguilla
2    Russian Federation
Name: shuttle_location, dtype: object},
```

So, when FastAPI tries to serialize this response, it rightly throws a `PydanticSerializationError`. I tried to correct the `preview` method as below (change: `df_dict[column] = subset[column].to_dict()`):

```python
def preview(self, nrows, ncolumns, filters) -> TablePreview:
    dataset_copy = self._copy()
    dataset_copy._load_args["nrows"] = nrows
    filtered_data = dataset_copy.load()
    for column, value in filters.items():
        filtered_data = filtered_data[filtered_data[column] == value]
    subset = filtered_data.iloc[:nrows, :ncolumns]
    df_dict = {}
    for column in subset.columns:
        df_dict[column] = subset[column].to_dict()
    return df_dict
```

Now the response is a valid dict like:

```
preview={'id': {0: 63561, 1: 36260, 2: 57015}, 'shuttle_location': {0: 'Niue', 1: 'Anguilla', 2: 'Russian Federation'}}
```

Though the error is resolved, there needs to be further discussion with the team to check whether the dictionary needs to be in a certain format so that it will be previewed, as I am seeing a blank box in the UI with the above preview response. An example preview response (`TablePreview`) which works:

```
preview={'index': [0, 1, 2, 3, 4], 'columns': ['id', 'company_rating', 'company_location', 'total_fleet_count', 'iata_approved'], 'data': [[35029, '100%', 'Niue', 4.0, 'f'], [30292, '67%', 'Anguilla', 6.0, 'f'], [19032, '67%', 'Russian Federation', 4.0, 'f'], [8238, '91%', 'Barbados', 15.0, 't'], [30342, nan, 'Sao Tome and Principe', 2.0, 't']]}
```

Next Steps:
Thank you
@ravi-kumar-pilla for the TablePreview, the UI expects the format below:
|
Thanks @jitu5, I think we should document the expected `{key: value}` pairs for all the supported preview types -
I will create a ticket to document the expected schema, and also see if we can enforce it in the backend so that the FE always gets what is required. For this ticket, I would like to add an except block to catch `PydanticSerializationError` and inform users to return a valid dictionary of `{key: value}` pairs. Thank you
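A hedged sketch of what such a check could look like on the backend, using only the stdlib for illustration (the real code would catch `PydanticSerializationError` from pydantic; the helper name and message here are made up):

```python
import json


def serialize_preview(preview: dict) -> str:
    """Hypothetical backend helper: surface a clear, user-facing message
    when a dataset's preview() returns values that cannot be serialized,
    instead of letting the raw serialization error bubble up."""
    try:
        return json.dumps(preview)
    except TypeError as exc:
        raise ValueError(
            "Dataset preview() must return a JSON-serializable dictionary "
            "of {key: value} pairs (lists, dicts, numbers, strings - not "
            "pandas or polars Series). Original error: %s" % exc
        ) from exc
```

The point of the wrapper is the error message: the user learns what their `preview` method should have returned, rather than seeing an opaque serialization traceback.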
Thanks, Ravi. Great debugging. I now realize this is happening because, while our TablePreview accepts a generic dict, our frontend only supports pandas DataFrames for displaying tabular data and not other formats such as Polars. We could add validation in the backend, but I do feel our frontend needs to be more versatile to handle different tabular formats. Update: we could check with Vizro how they do this in the BE.
The FE logic to distinguish between Polars and pandas dataframes is quite straightforward. Ideally, we could natively support both formats. My question is: do we anticipate more formats? If so, maybe we could use Plotly to preview data tables as well, instead of our own custom table preview?
I guess the main problem I see is that, as a user, it's not clear what the |
Yes, I think TablePreview is very generic when it is actually only meant for pandas. So I am proposing we rename TablePreview to PandasTablePreview and create another NewType called PolarsTablePreview, which also returns a dict. The only difference is that the FE would then have a different way of displaying each dict as a table. What do you all think? @astrojuanlu, @ravi-kumar-pilla, @jitu5
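For reference, the proposal could be sketched as below (the `PolarsTablePreview` alias is hypothetical at this point; only `TablePreview` exists in `kedro_datasets._typing`):

```python
from typing import NewType

# Two aliases over a plain dict; only the annotation differs, which the
# frontend could use to pick the right rendering component.
PandasTablePreview = NewType("PandasTablePreview", dict)
PolarsTablePreview = NewType("PolarsTablePreview", dict)


def preview_pandas() -> PandasTablePreview:
    # NewType adds no runtime behavior: the call returns the dict as-is.
    return PandasTablePreview({"index": [0], "columns": ["id"], "data": [[1]]})
```

Because `NewType` is erased at runtime, it documents intent but cannot enforce the dict's shape, which is exactly the limitation discussed later in this thread.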
@rashidakanchwala If the logic to distinguish between Polars and pandas dataframes is straightforward in the FE, then based on the type we can create a separate component for each type to display.
To clarify: say I change my `preview` method to this:

```python
def preview(self, nrows: int) -> PolarsTablePreview:
    subset = self._load().head(nrows)
    df_dict = {}
    for column in subset.columns:
        df_dict[column] = subset[column]
    return df_dict
```

My understanding is that someone needs to make sure the output of the function can be serialized.
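One simple way to make such a per-column dict serializable, sketched here for illustration: convert each column to a plain list before returning (this assumes each value is an iterable of scalars, as both pandas and polars Series are).

```python
def to_serializable_preview(columns: dict) -> dict:
    """Hypothetical helper: replace Series-like column values with plain
    lists so the resulting dict survives JSON serialization."""
    return {name: list(values) for name, values in columns.items()}


# Tuples stand in for Series here; real code would pass Series objects.
preview = to_serializable_preview(
    {"id": (63561, 36260), "shuttle_location": ("Niue", "Anguilla")}
)
```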
I think ideally the user would be helped by the IDE to return a proper object. How can we achieve this, so that the journey is clearer?
I agree. @ravi-kumar-pilla's ticket here, #1884, proposes what you mentioned. At the time we decided to use NewType, we did not anticipate running into this; if we need to enforce validation, we might have to change the NewTypes into classes, which we did consider before.
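For illustration, changing the NewType into a class that validates on construction could look roughly like this (a hypothetical sketch, not code from the actual proposal):

```python
from dataclasses import dataclass


@dataclass
class TablePreview:
    """Hypothetical class-based replacement for the NewType, so the
    expected schema can be enforced when the object is created."""
    index: list
    columns: list
    data: list

    def __post_init__(self):
        # Reject malformed previews early, at the user's code, rather
        # than failing later during response serialization.
        if any(len(row) != len(self.columns) for row in self.data):
            raise ValueError("each row in 'data' must have one value per column")
```

With a class, the IDE can also autocomplete the fields, which addresses the "help the user return a proper object" point above.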
I was trying to prove whether my
and I get this on the CLI
Now, if I change the code to be this:
I get a perfectly empty table. So there are, at least, 4 problems:
Hey @astrojuanlu, sorry it's taken a bit to get back to this. Could you try this instead:
The expected format in the FE, as @jitu5 mentioned, is: preview={
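That shape has three keys, `index`, `columns`, and `data`, as the working examples elsewhere in this thread show. A stdlib-only sketch of building it from plain rows (the helper name is made up for illustration):

```python
def make_table_preview(rows, columns):
    """Build the dict shape the Kedro-Viz frontend renders as a table."""
    return {
        "index": list(range(len(rows))),    # row labels, 0..n-1
        "columns": list(columns),           # column headers
        "data": [list(row) for row in rows],  # one list per row
    }


preview = make_table_preview(
    rows=[(35029, "Niue"), (30292, "Anguilla")],
    columns=["id", "company_location"],
)
```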
Thanks @SajidAlamQB, I confirm that this works. Minor tweak:

```yaml
# catalog.yml
companies:
  type: kedro_preview.datasets.CustomCSVDataset
  filepath: data/01_raw/companies.csv
  metadata:
    kedro-viz:
      preview_args:
        nrows: 5
```

```python
from kedro_datasets.polars import CSVDataset
from kedro_datasets._typing import TablePreview


class CustomCSVDataset(CSVDataset):
    def preview(self, nrows: int) -> TablePreview:
        subset = self._load().head(nrows)
        return {
            "index": list(range(len(subset))),
            "columns": list(subset.columns),
            "data": subset.to_numpy().tolist(),
        }
```

It is my understanding then that the docs are wrong? https://docs.kedro.org/projects/kedro-viz/en/latest/preview_custom_datasets.html#extend-preview-to-custom-datasets

In other news, this confirms my intuition that there's nothing pandas-specific to this functionality, so we don't need to introduce `PolarsTablePreview`.

As a user, I think it would have been easier if I could have seen an error not on the Kedro-Viz validation side, but in my own code. Something like:
But then again, as discussed in kedro-org/kedro-plugins#504, it's not entirely clear where this validation logic should live...
Yes @astrojuanlu, you are correct. I believe the key points for this issue are:
I have a draft PR that adds a validation check for the expected data format in the kedro-viz backend, #2070, and opened a PR to update the documentation, #2074. We can discuss these solutions in the upcoming PS session.
Update from PS Session on Dataset Previews: We’ve identified two key action items to address the ongoing issue with custom dataset previews:
Closing this for now as docs have been updated, we can follow the spike on this new issue: #2090. |
Description
I tried adding preview support to a custom dataset like this:
And yet I got an error:
What am I doing wrong?
Your Environment
Include as many relevant details as possible about the environment you experienced the bug in:
Checklist