Feature/file reporter #78
base: main
Conversation
Thanks - this looks promising, but we'll have to tweak it slightly to make it as effective as possible.
If anything is unclear, either ping me here or I can push some aspects as I have them in my head to your branch if you prefer.
src/nnbench/reporter/file.py
Outdated
```python
from nnbench.types import BenchmarkRecord


class Parser:
```
Just for clarification: we're not actually parsing things when we are reading files; that is the job of the respective modules (`json`|`yaml`|`toml`).
src/nnbench/reporter/file.py
Outdated
```python
# Register custom parsers here
parsers = {"json": JsonParser, "yaml": YamlParser}
```
I like the idea of making it a module-wide default, but we should take care that
a) the variable is private (i.e. has a leading underscore) to prevent accidental export, and
b) we should make the value structure of the map as easy as possible.
To address b), I would start by making it a `tuple[ser, de]`, where `ser` is a `Callable[[IO, dict[str, Any]], None]`, i.e., a function taking a file descriptor in write mode and a record and writing it to a file, and `de` is a `Callable[[IO], dict[str, Any]]`, a function taking a file descriptor in read mode and returning the loaded record.
You can then register the SerDe factories based on whether the necessary packages are installed (json is available out of the box, yaml and toml are not).
Should I use a class-based approach to register the SerDe factories (like I did in the Parser class)?
One other option is to use a simple register method which takes three arguments (the ser and de functions and a file_type).
Also, I think the class-based approach is more concise for defining the ser and de methods on the user side.
I'm happy with either, though the optional import part (i.e., erroring if `toml`/`yaml` are not installed) will be a bit easier in the class case.
For now, I think the quickest way is a functional approach, though. Maybe like this:
```python
_file_loaders: dict[str, tuple[Any, Any]] = {}


def yaml_load(fp: IO, options=None):
    try:
        import yaml
    except ImportError:
        raise ModuleNotFoundError("`pyyaml` is not installed")
    # takes no options, but the slot is useful for passing options to file loaders.
    obj = yaml.safe_load(fp)
    return BenchmarkRecord(context=obj["context"], benchmarks=obj["benchmarks"])


def yaml_save(record: BenchmarkRecord, fp: IO, options=None) -> None:
    try:
        import yaml
    except ImportError:
        raise ModuleNotFoundError("`pyyaml` is not installed")
    yaml.safe_dump(record, fp, **(options or {}))


_file_loaders["yaml"] = (yaml_save, yaml_load)
```
With the option of defining e.g. a `register_file_io(ser, de)` later to do the dict insertion if we want.
src/nnbench/reporter/file.py
Outdated
```python
if not self.dir:
    raise BaseException("Directory is not initialized")
file_path = os.path.join(self.dir, file_name)
file_type = file_name.split(".")[1]
try:
    with open(file_path) as file:
        data = file.read()
        parsed_data = parse_records(data, file_type)
        return parsed_data
except FileNotFoundError:
    raise ValueError(f"Could not read the file: {file_path}")
```
This needs the restructured file loading dict I talked about earlier, but in essence, all that should happen here is the file being loaded with `open` (like you did), and then calling the deserializer on the opened file.
In particular, no error handling for `open()` should be necessary, since those errors are informative enough for the user on their own.
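In other words, the read path could shrink to something like the following sketch (class shape, `_loaders` attribute, and names are assumptions for illustration):

```python
import os
from typing import Any, Callable


class FileReporter:
    def __init__(self, dir: str, loaders: dict[str, tuple[Callable, Callable]]):
        self.dir = dir
        # extension -> (ser, de), per the module-level map discussed above.
        self._loaders = loaders

    def read(self, file_name: str) -> dict[str, Any]:
        # os.path.splitext is more robust than file_name.split(".")[1]
        # for names containing more than one dot.
        file_type = os.path.splitext(file_name)[1].lstrip(".")
        _, de = self._loaders[file_type]
        # No try/except around open(): FileNotFoundError is informative as-is.
        with open(os.path.join(self.dir, file_name)) as fp:
            return de(fp)
```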
src/nnbench/reporter/file.py
Outdated
```python
if not self.dir:
    raise BaseException("Directory is not initialized")

file_path = os.path.join(self.dir, file_name)
if not os.path.exists(file_path):  # Create the file
    with open(file_path, "w") as file:
        file.write("")
try:
    parsed_records = self.read(file_name)
    file_type = file_name.split(".")[1]
    new_records = append_record_to_records(parsed_records, record, file_type)
    with open(file_path, "w") as file:
        file.write(new_records)
except FileNotFoundError:
    raise ValueError(f"Could not read the file: {file_path}")
```
Same here, just load the serializer from the dict and call it on the opened file.
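The write path would then mirror the read path; a minimal sketch under the same assumed names:

```python
import os
from typing import Any, Callable


class FileReporter:
    def __init__(self, dir: str, loaders: dict[str, tuple[Callable, Callable]]):
        self.dir = dir
        self._loaders = loaders

    def write(self, record: dict[str, Any], file_name: str) -> None:
        file_type = os.path.splitext(file_name)[1].lstrip(".")
        ser, _ = self._loaders[file_type]
        # open(..., "w") creates the file if it does not exist, so no
        # separate existence check or empty-file pre-creation is needed.
        with open(os.path.join(self.dir, file_name), "w") as fp:
            ser(record, fp)
```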
src/nnbench/reporter/file.py
Outdated
```python
def finalize(self) -> None:
    del self.dir
```
This does not do what you think it does - it just stages the `self.dir` variable for garbage collection.
To safely remove the directory (which you might not want to do anyway; we could add a flag in the constructor for that?), you should call `shutil.rmtree(self.dir, ignore_errors=True)`. (Though you might want to check existence first and set `ignore_errors` to `False`.)
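Put together, a `finalize` along those lines could look like this sketch; the `delete_on_finalize` constructor flag is the assumption floated above, not an agreed API:

```python
import os
import shutil


class FileReporter:
    def __init__(self, dir: str, delete_on_finalize: bool = False):
        self.dir = dir
        self.delete_on_finalize = delete_on_finalize

    def finalize(self) -> None:
        if self.delete_on_finalize and os.path.exists(self.dir):
            # Existence was checked above, so surface any real removal errors.
            shutil.rmtree(self.dir, ignore_errors=False)
```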
…ded changes to read and write methods of base `BenchmarkReporter`.
I've changed the value of the …
Thanks! To get you up to speed on the current situation: we (I) have found that a file reporter is necessary as a building block for the duckDB reporter implementation that I started in #75. For this reason, I implemented a file reporter myself in that PR, which we will go ahead with ASAP. That is not meant as a knock on you - you did a great job here, and a lot of the logic you brought in here also made it into #75; we just had to speed things up a bit to consolidate the interface. If you are interested, I'd like to welcome you to help improve on it (especially the file driver registration/deregistration hooks you gave here come to mind)! I'm going to open some tickets related to improving that file IO solution, which I can assign to you if you want - just let me know by commenting on the respective issue.
Closes #74
Here, I have written a generic FileReporter according to your recommendations mentioned in PR #74.
To specialize reading and writing files in any particular format, we can use the Parser class to define read and write methods.
I have defined JSONParser and YAMLFileParser as samples in `file.py` for handling read and write operations on json and yaml files. Since the FileReporter read and write functions use the Parser class, it automatically adjusts the read and write methods based on the file extension we are writing to.
Registering a Parser
Once a parser is registered with a particular extension, we can use that parser implicitly based on the extension of the `file_name` we want to write to. Do let me know your views and recommendations on these changes.