
Network loading API: support for binary streams #401

Closed
sylvlecl opened this issue May 23, 2022 · 13 comments

@sylvlecl
Contributor

sylvlecl commented May 23, 2022

  • Do you want to request a feature or report a bug?

Feature.

  • What is the current behavior?

We have 2 loading methods:

  • load: takes a file path as argument
  • load_from_string: takes the content of a file, as a string, as argument; it does not support bytes input.

So, in particular, in-memory "blobs" cannot be provided as arguments.
Streaming content is not supported either.

  • What is the expected behavior?

As proposed in #144, we should support byte streams (file objects) as arguments to load.

Type checking and runtime behaviour
After some digging, it seems there is no standard way of type checking for file objects, neither at runtime nor at type-check time.
We have typing.BinaryIO and various classes in the io module, but typing.BinaryIO exposes many more methods than necessary:
we could go for a lighter protocol.
As an example, the pandas library seems to perform type checking using a union of many types (see comment below), and at runtime only checks for the presence of a read method.

--> a good, mixed approach could be to define a simple protocol with at least a read method, and check for its presence at runtime, as sketched below.
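
A minimal sketch of that approach (SupportsRead and the load signature are illustrative, not the actual pypowsybl API):

from typing import Protocol, runtime_checkable

@runtime_checkable
class SupportsRead(Protocol):
    """Lightweight file-like protocol: only a read() method is required."""
    def read(self, size: int = -1) -> bytes: ...

def load(source):
    # runtime_checkable lets isinstance verify that read() is present,
    # so the runtime check mirrors the static one.
    if not isinstance(source, SupportsRead):
        raise TypeError(f"expected a file-like object, got {type(source).__name__}")
    data = source.read()
    ...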

String input handling

We should deprecate the load_from_string method:
users can provide an in-memory buffer instead, with io.BytesIO for example, as shown below.
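
For instance (the call at the end is the proposed, not yet existing, signature):

import io

# content previously passed to load_from_string
xml_content = "<network ...>"
buffer = io.BytesIO(xml_content.encode("utf-8"))
# then: network = Network.load(buffer)  # proposed: load accepting a binary buffer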

One question remains: do we allow only binary IO as input (such as io.BytesIO), or also text IO (such as io.StringIO)?
The latter could be handy for text formats.
If we want to distinguish between the two, there is no standard way... For example, pandas ends up looking for a 'b' character in "mode", and also checking the actual class of the object (against a predefined set of classes including io.TextIOWrapper etc.).
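
A sketch of that kind of heuristic (the function name is illustrative):

import io

def is_binary_file_object(obj) -> bool:
    # pandas-style heuristic: real files expose a 'mode' attribute;
    # otherwise fall back on the known text-IO base class.
    mode = getattr(obj, "mode", None)
    if isinstance(mode, str):
        return "b" in mode
    return not isinstance(obj, io.TextIOBase)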

  • What is the motivation / use case for changing the behavior?

A use case is being able to load a network from in-memory zip content.

@sylvlecl
Contributor Author

For reference, pandas uses these unions for typing file inputs:

# filenames and file-like-objects
from io import BufferedIOBase, RawIOBase, TextIOBase, TextIOWrapper
from mmap import mmap
from os import PathLike
from typing import IO, AnyStr, Union

Buffer = Union[IO[AnyStr], RawIOBase, BufferedIOBase, TextIOBase, TextIOWrapper, mmap]
FileOrBuffer = Union[str, Buffer[AnyStr]]
FilePathOrBuffer = Union["PathLike[str]", FileOrBuffer[AnyStr]]

@sylvlecl sylvlecl self-assigned this May 24, 2022
@Haigutus

Haigutus commented May 31, 2022

  • I would propose to support only io.BytesIO
  • The same should be done for export
  • On the packaging of files, I would propose that the function take in a list of binary objects and find all XMLs independent of packaging

Network.load_bytes([object1, object2, objectN])

where each object could be an XML file, a zip file with a single XML inside, or a zip file with multiple XMLs inside

@Haigutus

Any updates on this?

@Haigutus

It seems a change is needed first on the Java side of things.

Currently:
filePath -> Network.load -> DataSource -> Importer find -> Import

Proposal to change the Network.load implementation:

  1. if filePath: use the current solution
  2. if not filePath and DataSource:
    DataSource -> Network.load -> Importer find -> Import

Then on the Python side of things one could implement creation of a DataSource from file-like objects,
so either
DataSource(open(filePath))
or
DataSource(io.BytesIO(...))

This would help integrations in any language, as it would remove the main filesystem dependency from the source code.

https://github.com/powsybl/powsybl-core/blob/8b2850d803738d471c76b1f6ac7c8903af70e3db/commons/src/main/java/com/powsybl/commons/datasource/DataSourceUtil.java#L65
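
A hypothetical Python-side shape for such a wrapper (the DataSource class and file name here are illustrative, not an existing pypowsybl API):

import io

class DataSource:
    # Illustrative wrapper: hold any binary file-like object so the
    # loading pipeline no longer touches the filesystem directly.
    def __init__(self, stream):
        if not hasattr(stream, "read"):
            raise TypeError("DataSource expects a binary file-like object")
        self._stream = stream

    def open_stream(self):
        return self._stream

ds_from_file = DataSource(open("network.xiidm", "rb"))
ds_from_memory = DataSource(io.BytesIO(b"<iidm ...>"))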

@geofjamg
Member

geofjamg commented May 2, 2023

An easy solution could be to fully load the byte stream on the Python side and then transfer it as is to the Java side, but it would not be memory efficient. The right way (and the hardest!) is to somehow connect Python BytesIO.read to Java InputStream.read, to be able to really stream the bytes from end to end.
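
A minimal sketch of the streaming idea, with the Java consumer stubbed out as a plain callable (illustrative only):

CHUNK_SIZE = 64 * 1024

def pump(source, sink):
    """Forward bytes chunk by chunk instead of materializing the whole
    stream; in the real bridge, sink would feed Java's InputStream.read."""
    while True:
        chunk = source.read(CHUNK_SIZE)
        if not chunk:
            break
        sink(chunk)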

@geofjamg
Member

geofjamg commented May 2, 2023

  • I would propose to support only io.BytesIO
  • The same should be done for export
  • On the packaging of files, I would propose that the function take in a list of binary objects and find all XMLs independent of packaging

Network.load_bytes([object1, object2, objectN])

where each object could be an XML file, a zip file with a single XML inside, or a zip file with multiple XMLs inside

If we want to support this, we first have to detect whether the byte stream is a zip or not (using the magic number), and then we will be able to map it to the right DataSource implementation on the Java side.
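
A sketch of that detection on the Python side (the function name is illustrative; it assumes a seekable buffer such as io.BytesIO):

import io

ZIP_MAGIC = b"PK\x03\x04"  # first four bytes of a zip archive

def looks_like_zip(buffer: io.BytesIO) -> bool:
    # Peek at the leading bytes, then rewind so the caller
    # sees the stream untouched.
    pos = buffer.tell()
    head = buffer.read(len(ZIP_MAGIC))
    buffer.seek(pos)
    return head == ZIP_MAGIC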

@Haigutus

Haigutus commented May 3, 2023

@obrix
Member

obrix commented May 25, 2023

FYI, following discussions with @geofjamg, I am working to support io.BytesIO as an input to load a network. This means the content has to be fully loaded in memory, but it should work with ASCII-based network descriptions and also with a zip file held as a memory blob (this requires a pull request on powsybl-core to add an InMemoryZipFileDataSource, as the existing ZipFileDataSource reloads from a file).
I'll try to investigate a completely streamed pipeline, but zip support will probably be an issue there.

@sylvlecl sylvlecl removed their assignment Jun 6, 2023
@geofjamg
Member

@Haigutus the last release, 0.24, supports BytesIO for network loading. You can provide load_from_binary_buffers with a list of BytesIO objects containing, for instance, individually zipped profiles and boundaries. Let us know if it solves your issue.
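
A usage sketch based on that description (file names are illustrative, and the module path pypowsybl.network is assumed):

import io
import pypowsybl.network as pn

# Read each zipped profile into an in-memory binary buffer.
buffers = []
for path in ("profiles.zip", "boundaries.zip"):
    with open(path, "rb") as f:
        buffers.append(io.BytesIO(f.read()))

network = pn.load_from_binary_buffers(buffers)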

@Haigutus

Haigutus commented Aug 3, 2023

@geofjamg I can confirm the new API works for import; we have been using/testing it for a week and have not encountered any issues, thank you. As for export, should we make another ticket, or will that also be handled under this one?

@geofjamg
Member

geofjamg commented Aug 3, 2023

We can leave this issue open until export is done.

@Haigutus

@geofjamg do you have any information on when the export to binary buffers could be expected?

@Haigutus

Both import and export are implemented, thanks. This issue can be closed.
