Fix inspect between nodes #784

Merged

Jhonatannunessilva
Contributor

@Jhonatannunessilva Jhonatannunessilva commented Dec 22, 2023

This PR resolves #768

After this PR:

With DataFrame

iex(test_0@myhost)1> df = Explorer.Datasets.fossil_fuels()       
#Explorer.DataFrame<      
  Polars[1094 x 10]
  year integer [2010, 2010, 2010, 2010, 2010, ...]
  country string ["AFGHANISTAN", "ALBANIA", "ALGERIA", "ANDORRA",
   "ANGOLA", ...]
  total integer [2308, 1254, 32500, 141, 7924, ...]
  solid_fuel integer [627, 117, 332, 0, 0, ...]
  liquid_fuel integer [1601, 953, 12381, 141, 3649, ...]
  gas_fuel integer [74, 7, 14565, 0, 374, ...]
  cement integer [5, 177, 2598, 0, 204, ...]
  gas_flaring integer [0, 0, 2623, 0, 3697, ...]
  per_capita f64 [0.08, 0.43, 0.9, 1.68, 0.37, ...]
  bunker_fuels integer [9, 7, 663, 0, 321, ...]
>

iex(test_0@myhost)2> Node.spawn_link(:"test_1@myhost", fn -> IO.inspect(df) end)    
#PID<21907.295.0>
#Explorer.DataFrame<        
  Polars[node: test_0@myhost]
  year integer ???
  country string ???
  total integer ???
  solid_fuel integer ???
  liquid_fuel integer ???
  gas_fuel integer ???
  cement integer ???
  gas_flaring integer ???
  per_capita f64 ???
  bunker_fuels integer ???
>

iex(test_0@myhost)3> IO.inspect(df)
#Explorer.DataFrame<
  Polars[1094 x 10]
  year integer [2010, 2010, 2010, 2010, 2010, ...]
  country string ["AFGHANISTAN", "ALBANIA", "ALGERIA", "ANDORRA", "ANGOLA", ...]
  total integer [2308, 1254, 32500, 141, 7924, ...]
  solid_fuel integer [627, 117, 332, 0, 0, ...]
  liquid_fuel integer [1601, 953, 12381, 141, 3649, ...]
  gas_fuel integer [74, 7, 14565, 0, 374, ...]
  cement integer [5, 177, 2598, 0, 204, ...]
  gas_flaring integer [0, 0, 2623, 0, 3697, ...]
  per_capita f64 [0.08, 0.43, 0.9, 1.68, 0.37, ...]
  bunker_fuels integer [9, 7, 663, 0, 321, ...]
>
#Explorer.DataFrame<
  Polars[1094 x 10]
  year integer [2010, 2010, 2010, 2010, 2010, ...]
  country string ["AFGHANISTAN", "ALBANIA", "ALGERIA", "ANDORRA",
   "ANGOLA", ...]
  total integer [2308, 1254, 32500, 141, 7924, ...]
  solid_fuel integer [627, 117, 332, 0, 0, ...]
  liquid_fuel integer [1601, 953, 12381, 141, 3649, ...]
  gas_fuel integer [74, 7, 14565, 0, 374, ...]
  cement integer [5, 177, 2598, 0, 204, ...]
  gas_flaring integer [0, 0, 2623, 0, 3697, ...]
  per_capita f64 [0.08, 0.43, 0.9, 1.68, 0.37, ...]
  bunker_fuels integer [9, 7, 663, 0, 321, ...]
>

With Series

iex(test_0@myhost)1> series = Explorer.Series.from_list([1, 2, 3])                         
#Explorer.Series<
  Polars[3]
  integer [1, 2, 3]
>

iex(test_0@myhost)2> Node.spawn_link(:"test_1@myhost", fn -> IO.inspect(series) end)
#PID<21661.295.0>
#Explorer.Series<           
  Polars[node: test_0@myhost]
  integer ???
>

iex(test_0@myhost)3> IO.inspect(series)
#Explorer.Series<
  Polars[3]
  integer [1, 2, 3]
>
#Explorer.Series<
  Polars[3]
  integer [1, 2, 3]
>

@Jhonatannunessilva
Contributor Author

I ran into a problem when running the tests with the "mix ci" command: 6 tests fail with this error:

  6) test from_query/3 queries database (Explorer.DataFrameTest)
     test/explorer/data_frame_test.exs:40
     ** (RuntimeError) failed to start child with the spec {Adbc.Database, [driver: :sqlite]}.
     Reason: an exception was raised:
         ** (ErlangError) Erlang error: :not_loaded
             :erlang.nif_error(:not_loaded)
             (adbc 0.2.2) lib/adbc_nif.ex:24: Adbc.Nif.adbc_database_new/0
             (adbc 0.2.2) lib/adbc_database.ex:51: Adbc.Database.start_link/1
             (stdlib 4.0.1) supervisor.erl:414: :supervisor.do_start_child_i/3
             (stdlib 4.0.1) supervisor.erl:400: :supervisor.do_start_child/2
             (stdlib 4.0.1) supervisor.erl:706: :supervisor.handle_start_child/2
             (stdlib 4.0.1) supervisor.erl:455: :supervisor.handle_call/3
             (stdlib 4.0.1) gen_server.erl:1146: :gen_server.try_handle_call/4
             (stdlib 4.0.1) gen_server.erl:1175: :gen_server.handle_msg/6
             (stdlib 4.0.1) proc_lib.erl:240: :proc_lib.init_p_do_apply/3
     stacktrace:
       (ex_unit 1.14.0) lib/ex_unit/callbacks.ex:538: ExUnit.Callbacks.start_supervised!/2
       test/explorer/data_frame_test.exs:35: Explorer.DataFrameTest.__ex_unit_setup_0_0/1
       Explorer.DataFrameTest.__ex_unit_describe_0/1

I think I had this problem before and I had already solved it, but now that it's back I don't remember what I did... I've already tried to recompile, update the project, etc.

All tests work with the "mix test --only cloud_integration" command.

Versions I'm using:

  • rust nightly
  • elixir 1.14.0-otp-25
  • erlang 25.0.4
  • cmake version 3.27.0-rc4

I also tried with the versions I was on before:

  • rust nightly
  • elixir 1.14.0-otp-24
  • erlang 24.3.4.11

Would anyone know why?

Member

@philss philss left a comment

A minor detail :)

"Would anyone know why?" (about the Adbc exception)

Sorry, I don't remember either. But maybe José can help.

@@ -5785,6 +5785,12 @@ defmodule Explorer.DataFrame do
  defimpl Inspect do
    import Inspect.Algebra

    def inspect(df, _opts) when node(df.data.resource) != node() do
      raise RuntimeError.exception(
        "It is not possible to inspect a DataFrame that belongs to another node"
Member

We normally don't capitalize the message:

Suggested change
- "It is not possible to inspect a DataFrame that belongs to another node"
+ "it is not possible to inspect a DataFrame that belongs to another node"

Member

There are two issues here:

  1. We probably should not raise, otherwise it becomes impossible to see the value in logs, IEx, etc.

  2. df.data.resource is accessing an implementation detail of the backend. Ideally we want to move this to Explorer.DF.PolarsBackend or similar. And instead of raising, maybe we just print something like:

#Explorer.DataFrame<        
  Polars[node: other@foobar]
>

WDYT?
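
For illustration, here is a minimal, self-contained sketch of the non-raising approach. `RemoteAwareSketch` and its fields are hypothetical stand-ins, not Explorer's structs, and the output format is only an assumption. For brevity the check sits directly in the `Inspect` implementation, whereas point 2 above would move it into the backend. The point is the guard on `node/1`, which selects a placeholder rendering instead of raising when the resource belongs to another node:

```elixir
defmodule RemoteAwareSketch do
  # Hypothetical struct: `resource` stands in for a backend reference,
  # `values` for data that is only readable on the owning node.
  defstruct [:resource, :values]

  defimpl Inspect do
    import Inspect.Algebra

    # The reference lives on another node: print where it lives instead of raising.
    def inspect(%{resource: ref}, _opts) when node(ref) != node() do
      concat(["#RemoteAwareSketch<node: ", Atom.to_string(node(ref)), ">"])
    end

    # Local reference (or no reference at all): print the values as usual.
    def inspect(%{values: values}, opts) do
      concat(["#RemoteAwareSketch<", to_doc(values, opts), ">"])
    end
  end
end
```

On a single node the guard never matches, so the second clause runs. Because nothing raises, the value still shows up in logs and IEx when it is sent to another node.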

Contributor Author

Yes, it makes sense

Member

Just to add a little bit: we have the column names and types. We could also display them if we want (without the values). WDYT?

Member

Ah, that's good. We can show them as:

Polars[node: ...]
foo s64 ???
bar string ???
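
A rough sketch of how that elided layout could be assembled, assuming the column names and dtypes are already available as plain strings. `ElidedInspectSketch.format/2` is a hypothetical helper for illustration, not part of Explorer:

```elixir
defmodule ElidedInspectSketch do
  # Hypothetical helper: renders a header with the owning node, then one
  # line per column with its name and dtype, replacing the values by "???".
  def format(node_name, columns) do
    header = "  Polars[node: #{node_name}]"
    rows = for {name, dtype} <- columns, do: "  #{name} #{dtype} ???"

    Enum.join(["#Explorer.DataFrame<", header] ++ rows ++ [">"], "\n")
  end
end

IO.puts(ElidedInspectSketch.format(:"other@foobar", [{"foo", "s64"}, {"bar", "string"}]))
```

Only metadata that travels with the struct is used here, so no call to the owning node is needed.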

@@ -6048,6 +6048,12 @@ defmodule Explorer.Series do
  defimpl Inspect do
    import Inspect.Algebra

    def inspect(series, _opts) when node(series.data.resource) != node() do
      raise RuntimeError.exception(
        "It is not possible to inspect a Series that belongs to another node"
Member

Suggested change
- "It is not possible to inspect a Series that belongs to another node"
+ "it is not possible to inspect a Series that belongs to another node"

@@ -858,6 +858,10 @@ defmodule Explorer.PolarsBackend.DataFrame do
  # Inspect

  @impl true
  def inspect(df, opts) when node(df.data.resource) != node() do
    Explorer.Backend.DataFrame.inspect(df, "Polars", nil, opts, from_another_node: true)
Member

Suggested change
- Explorer.Backend.DataFrame.inspect(df, "Polars", nil, opts, from_another_node: true)
+ Explorer.Backend.DataFrame.inspect(df, "Polars", "node: #{df.data.resource}", opts, elide_columns: true)

Contributor Author

This will generate this error:

iex(test_0@myhost)1> df = Explorer.Datasets.fossil_fuels()                                                                                                   
#Explorer.DataFrame<                                                                                                                                                
  Polars[1094 x 10]                                                                                                                                                 
  year integer [2010, 2010, 2010, 2010, 2010, ...]                                                                                                                  
  country string ["AFGHANISTAN", "ALBANIA", "ALGERIA", "ANDORRA", "ANGOLA", ...]                                                                                    
  total integer [2308, 1254, 32500, 141, 7924, ...]                                                                                                                 
  solid_fuel integer [627, 117, 332, 0, 0, ...]                                                                                                                     
  liquid_fuel integer [1601, 953, 12381, 141, 3649, ...]                                                                                                            
  gas_fuel integer [74, 7, 14565, 0, 374, ...]                                                                                                                      
  cement integer [5, 177, 2598, 0, 204, ...]                                                                                                                        
  gas_flaring integer [0, 0, 2623, 0, 3697, ...]
  per_capita f64 [0.08, 0.43, 0.9, 1.68, 0.37, ...]
  bunker_fuels integer [9, 7, 663, 0, 321, ...]
>

iex(test_0@myhost)2> "node: #{df.data.resource}"
** (Protocol.UndefinedError) protocol String.Chars not implemented for #Reference<0.641693014.1003356202.157361> of type Reference. This protocol is implemented for the following type(s): Atom, BitString, Complex, Date, DateTime, Explorer.Duration, Float, Hex.Solver.Assignment, Hex.Solver.Constraints.Empty, Hex.Solver.Constraints.Range, Hex.Solver.Constraints.Union, Hex.Solver.Incompatibility, Hex.Solver.PackageRange, Hex.Solver.Term, Integer, List, NaiveDateTime, Time, URI, Version, Version.Requirement
    (elixir 1.14.0) lib/string/chars.ex:3: String.Chars.impl_for!/1
    (elixir 1.14.0) lib/string/chars.ex:22: String.Chars.to_string/1
    iex:2: (file)

Member

Sorry, we need to inspect the node, but the overall idea should be solid. :D
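
A small sketch of the difference, using a local reference from `make_ref/0` as a stand-in for `df.data.resource` (an assumption for illustration only). A reference does not implement `String.Chars`, but `node/1` returns the owning node as an atom, which does:

```elixir
# Stand-in for df.data.resource (assumption: any reference behaves the same here).
ref = make_ref()

# Interpolating the reference itself raises Protocol.UndefinedError:
# "node: #{ref}"

# node/1 returns the name of the node that owns the reference, as an atom,
# and atoms can be interpolated into strings.
header = "node: #{node(ref)}"
IO.puts(header)
```

For a reference created locally, `node(ref)` is the same as `node()`, so the header names the current node.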

@josevalim josevalim merged commit 5c51c26 into elixir-explorer:main Dec 29, 2023
3 checks passed
@josevalim
Member

💚 💙 💜 💛 ❤️

Successfully merging this pull request may close these issues.

Dataframes created on a different node cannot be introspected