Network loading API: support for binary streams #401
Comments
For reference, pandas uses the following unions for typing file inputs:
# filenames and file-like-objects
Buffer = Union[IO[AnyStr], RawIOBase, BufferedIOBase, TextIOBase, TextIOWrapper, mmap]
FileOrBuffer = Union[str, Buffer[AnyStr]]
FilePathOrBuffer = Union["PathLike[str]", FileOrBuffer[AnyStr]]
Network.load_bytest(list(object1, object2, objectN)), where each object could be an XML file, a zip file with a single XML file inside, or a zip file with multiple XML files inside.
Any updates on this?
It seems a change should first be made on the Java side.
Currently:
Proposal to change the Network.load implementation:
Then, on the Python side, one could implement the creation of a DataSource from file-like objects. This would help integrations in any language, as it would remove the main filesystem dependency from the source code.
An easy solution could be to fully load the byte stream on the Python side and then transfer it as is to the Java side, but this would not be memory efficient. The right way (and hardest!) is to somehow connect python
If we want to support this, we first have to detect whether the byte stream is a zip or not (using its magic number); we will then be able to map it to the right DataSource implementation on the Java side.
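As a rough illustration of the magic-number check mentioned above, here is a minimal Python sketch. The names `looks_like_zip` and `make_zip_blob` are hypothetical helpers and not part of any existing API; the sketch also assumes the stream is seekable, which is true for io.BytesIO but not for every byte stream.

```python
import io
import zipfile

# Signature of a local file header, which starts any non-empty zip archive.
ZIP_MAGIC = b"PK\x03\x04"
# Signature of the end-of-central-directory record (an empty archive starts with it).
ZIP_EMPTY_MAGIC = b"PK\x05\x06"


def looks_like_zip(stream):
    """Peek at the first four bytes of a seekable binary stream (hypothetical helper)."""
    start = stream.tell()
    header = stream.read(4)
    stream.seek(start)  # restore the position so the caller can read from the start
    return header in (ZIP_MAGIC, ZIP_EMPTY_MAGIC)


def make_zip_blob(name, payload):
    """Build a small in-memory zip archive, only to exercise the check."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr(name, payload)
    buffer.seek(0)
    return buffer


xml_stream = io.BytesIO(b"<?xml version='1.0'?><network/>")
zip_stream = make_zip_blob("network.xiidm", b"<network/>")
```

The standard library's zipfile.is_zipfile also accepts a file-like object and could replace the manual check; the explicit version is shown only because the comment refers to the magic number directly.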
Maybe the GraalVM API helps here: https://www.graalvm.org/sdk/javadoc/index.html?org/graalvm/nativeimage/c/type/CTypeConversion.html
FYI, following discussions with @geofjamg, I am working on supporting io.BytesIO as an input to load a network. This means the content has to be fully loaded in memory, but it should work with ASCII-based network descriptions and also with a zip file as a memory blob (this requires a pull request on powsybl core to add an InMemoryZipFileDataSource, as the existing ZipFileDataSource reloads from a file).
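To make the "zip file as a memory blob" idea concrete, here is a small Python-side sketch using only the standard library. It does not show the powsybl implementation; `read_zip_entries` and the entry name `network.xiidm` are assumptions chosen for illustration. It only demonstrates that a zip held in an io.BytesIO can be read entry by entry without touching the filesystem.

```python
import io
import zipfile


def read_zip_entries(blob):
    """Return {entry name: content bytes} for a zip held fully in memory.

    Hypothetical helper: an in-memory data source would need roughly this
    information (entry names and their bytes) instead of a file path.
    """
    with zipfile.ZipFile(blob) as archive:
        return {name: archive.read(name) for name in archive.namelist()}


# Build a zip archive in memory, standing in for content received from
# a network call or a database rather than from a file on disk.
blob = io.BytesIO()
with zipfile.ZipFile(blob, "w") as archive:
    archive.writestr("network.xiidm", "<network id='demo'/>")
blob.seek(0)

entries = read_zip_entries(blob)
```

Because the whole archive lives in the buffer, memory use grows with the archive size, which matches the limitation noted in the comment above.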
@Haigutus the last release, 0.24, supports BytesIO for network loading. You can provide to
@geofjamg I can confirm that the new API works for import: we have been using and testing it for a week and have not encountered any issues, thank you. As for export, should we open another ticket, or will that also be handled under this ticket?
We can leave this issue open until export is done.
@geofjamg do you have any information on when export to binary buffers can be expected?
Both import and export are implemented, thanks. This issue can be closed |
Feature.
We have 2 loading methods:
load: takes a file path as argument.
load_from_string: takes the content of a file, as a string, as argument. Does not support byte entries.
So, in particular, in-memory "blobs" cannot be provided as arguments.
Streaming content is not supported either.
As proposed in #144, we should support byte streams (file objects) as arguments to load.
Type checking and runtime behaviour
After some digging, it seems there is no really standard way of type checking file objects, neither at runtime nor at typecheck time.
We have typing.BinaryIO and various classes in the io module. But typing.BinaryIO exposes many more methods than necessary: we could go for a lighter protocol.
As an example, the pandas library seems to perform type checking by using a union of many types (see comment below), and at runtime only checks for the presence of a read method.
--> a good, mixed approach could be to define a simple protocol with at least a read method, and check for its presence at runtime.
String input handling
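The mixed approach described above (a light protocol for static typing, plus a runtime check for a read method) could look roughly like the following sketch. `Readable` and `check_readable` are hypothetical names, not an existing API; note that a `runtime_checkable` protocol only verifies that the attribute exists, not its signature or return type.

```python
import io
from typing import Protocol, runtime_checkable


@runtime_checkable
class Readable(Protocol):
    """Minimal protocol: anything exposing a read method (hypothetical name)."""

    def read(self, size: int = -1) -> bytes: ...


def check_readable(source):
    """Runtime counterpart of the protocol: reject objects without read()."""
    if not isinstance(source, Readable):
        raise TypeError(
            f"expected a file-like object with a read method, got {type(source).__name__}"
        )
    return source


buffer = check_readable(io.BytesIO(b"<network/>"))  # accepted

try:
    check_readable("network.xiidm")  # a plain path string has no read method
    rejected = False
except TypeError:
    rejected = True
```

A type checker would flag a str passed where `Readable` is annotated, while the isinstance check gives the same protection to callers who do not run one.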
We should deprecate the load_from_string method: users can provide an in-memory buffer instead, with io.BytesIO for example.
One question remains: do we allow only binary IO as input (such as io.BytesIO), or also text IO (such as io.StringIO)?
The latter could be handy for text formats.
If we want to distinguish between the 2, there is no standard way ... For example, pandas ends up looking for the 'b' character in "mode", and also checking the actual class of the object (against a predefined set of classes including io.TextIOWrapper etc).
A use case is being able to load a network from in-memory zip content.
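For the binary versus text question raised above, a heuristic in the same spirit as the one attributed to pandas (class checks first, then the 'b' character in "mode") might be sketched as follows. `is_binary_stream` is a hypothetical helper, and the fallback is an assumption: objects that expose neither a known io base class nor a string mode are treated as non-binary.

```python
import io


def is_binary_stream(stream):
    """Best-effort guess at whether a file-like object yields bytes (hypothetical helper)."""
    # Text streams from the io module (StringIO, TextIOWrapper) derive from TextIOBase.
    if isinstance(stream, io.TextIOBase):
        return False
    # Binary streams from the io module (BytesIO, BufferedReader, FileIO) derive from these.
    if isinstance(stream, (io.RawIOBase, io.BufferedIOBase)):
        return True
    # Fallback for duck-typed objects: look for 'b' in a string "mode" attribute.
    mode = getattr(stream, "mode", None)
    return isinstance(mode, str) and "b" in mode


class DuckBinary:
    """Duck-typed stand-in that is unrelated to the io class hierarchy."""

    mode = "rb"

    def read(self, size=-1):
        return b""


binary_result = is_binary_stream(io.BytesIO(b"PK"))
text_result = is_binary_stream(io.StringIO("<network/>"))
duck_result = is_binary_stream(DuckBinary())
```

If text IO were accepted, one option would be to encode its content to bytes before handing it over, but which encoding to assume is a separate design decision.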