Summary dataset with filter option #12

michalkrawczyk · 2023-05-16T13:38:48Z

(Currently only for short summaries)
datasets.PaperDataset lets you stack summaries keyed by PDF file path and later filter them by content.
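A minimal sketch of that storage model, assuming each paper's summary is a dict of field name to text and the dataset is a dict keyed by PDF file path. `stack_summary` and `filter_papers` are hypothetical helpers for illustration, not the project's API.

```python
from typing import Callable, Dict

# Hypothetical layout: PDF file path -> {summary field name: summary text}
Papers = Dict[str, Dict[str, str]]


def stack_summary(papers: Papers, pdf_path: str, summary: Dict[str, str]) -> None:
    # One record per PDF path; an already stacked path keeps its existing record
    papers.setdefault(pdf_path, summary)


def filter_papers(papers: Papers, keep: Callable[[Dict[str, str]], bool]) -> Papers:
    # Return only the records whose summary satisfies the predicate
    return {path: s for path, s in papers.items() if keep(s)}


papers: Papers = {}
stack_summary(papers, "pdfs/a.pdf", {"New Features": "sparse attention"})
stack_summary(papers, "pdfs/b.pdf", {"New Features": "faster tokenizer"})
hits = filter_papers(papers, lambda s: "attention" in s.get("New Features", ""))
print(list(hits))  # ['pdfs/a.pdf']
```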

Basic Input Operations:

refresh_summary - overwrites the existing data of papers with newly generated summaries. Useful after prompts have been changed.
add_papers_by_id - adds papers to the dataset by arXiv ID. Papers not yet present locally are also downloaded from arXiv.
search_and_add_papers(search_query: str, limit: float = 10.0, output_dir: str = ".") - searches arXiv for papers matching the query and adds them to the dataset. Papers not already present in the file system are also downloaded from arXiv.
add_paper(filepath: str, reload_if_exist: bool = False) - loads a specific PDF file by file path. If reload_if_exist is True and the file already has a record, that record is overwritten.
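The input operations above can be sketched as follows. This is a hypothetical stand-in, not the project's implementation: `PaperStore`, the injected `summarize` and `download` callables, and the `<output_dir>/<arxiv id>.pdf` naming scheme are all assumptions. It only illustrates the described semantics: skip existing records unless `reload_if_exist` is set, regenerate everything on refresh, and download from arXiv only when the file is missing.

```python
import os
from typing import Callable, Dict, Iterable

Summary = Dict[str, str]


class PaperStore:
    """Hypothetical sketch of the input operations; not the project's code."""

    def __init__(self, summarize: Callable[[str], Summary]):
        self.summarize = summarize  # assumed: PDF path -> summary fields
        self.papers: Dict[str, Summary] = {}

    def add_paper(self, filepath: str, reload_if_exist: bool = False) -> None:
        # Keep an existing record unless a reload is explicitly requested
        if filepath in self.papers and not reload_if_exist:
            return
        self.papers[filepath] = self.summarize(filepath)

    def refresh_summary(self) -> None:
        # Regenerate every stored summary, e.g. after the prompts change
        for filepath in list(self.papers):
            self.papers[filepath] = self.summarize(filepath)

    def add_papers_by_id(
        self,
        paper_ids: Iterable[str],
        download: Callable[[str, str], None],
        output_dir: str = ".",
    ) -> None:
        for paper_id in paper_ids:
            # Assumed naming scheme: <output_dir>/<arxiv id>.pdf
            path = os.path.join(output_dir, f"{paper_id}.pdf")
            if not os.path.exists(path):
                download(paper_id, path)  # fetch only when the PDF is missing
            self.add_paper(path)
```

Injecting `summarize` and `download` keeps the sketch free of network and LLM calls, so the skip/overwrite logic can be exercised on its own.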

Filter/View Options:

get_paper_by_filename - shows paper data for a given filename (name only, without the full path)
search_by_field_value(field: str, value: str, regex_search: bool = True) - searches a specific field (e.g. "New Features") for given values. If regex_search is False, the value must match the field exactly.
list_data_fields - shows all data fields existing in the dataset
list_values_by_field - shows all existing values of a specific field in the dataset
list_of_papers - shows all loaded papers, with filenames as dict keys
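A rough sketch of how these view options could behave over records shaped as `{pdf_path: {field: text}}`. `PaperView` is a hypothetical class, and returning values (rather than printing them) is an assumption. Regex mode here uses `re.search` (the pattern may match anywhere in the field text), while the non-regex mode requires exact equality, as described above.

```python
import os
import re
from typing import Dict, List, Optional

Summary = Dict[str, str]


class PaperView:
    """Hypothetical read-only view over {pdf_path: {field: text}} records."""

    def __init__(self, papers: Dict[str, Summary]):
        self.papers = papers

    def list_of_papers(self) -> Dict[str, Summary]:
        # Re-key records by bare filename instead of the full path
        return {os.path.basename(p): s for p, s in self.papers.items()}

    def get_paper_by_filename(self, filename: str) -> Optional[Summary]:
        # Look up by filename only; None if no such paper is loaded
        return self.list_of_papers().get(filename)

    def list_data_fields(self) -> List[str]:
        # Union of field names across all papers, sorted for stable output
        fields = set()
        for summary in self.papers.values():
            fields.update(summary)
        return sorted(fields)

    def list_values_by_field(self, field: str) -> List[str]:
        # Values of one field, skipping papers that lack it
        return [s[field] for s in self.papers.values() if field in s]

    def search_by_field_value(
        self, field: str, value: str, regex_search: bool = True
    ) -> Dict[str, Summary]:
        hits = {}
        for path, summary in self.papers.items():
            text = summary.get(field)
            if text is None:
                continue
            # Regex: pattern found anywhere; otherwise: exact string equality
            if regex_search:
                matched = re.search(value, text) is not None
            else:
                matched = text == value
            if matched:
                hits[path] = summary
        return hits
```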


michalkrawczyk added the documentation label (Improvements or additions to documentation) on May 16, 2023
