Summary dataset with filter option #12

Open
michalkrawczyk opened this issue May 16, 2023 · 0 comments
Labels
documentation Improvements or additions to documentation


@michalkrawczyk
Owner

(Currently only for short summaries.)
datasets.PaperDataset lets you collect summaries, keyed by PDF file path, and later filter them by their content.

Basic Input Operations:

  • refresh_summary - overwrites the existing data of papers with newly created summaries. This may be useful when the prompts have been changed.
  • add_papers_by_id - adds papers to the dataset by arXiv ID. Papers that are not present locally are also downloaded from arXiv.
  • search_and_add_papers(search_query: str, limit: float = 10.0, output_dir: str = ".") - searches arXiv for papers matching the given query. Papers that are not present in the file system are also downloaded from arXiv.
  • add_paper(filepath: str, reload_if_exist: bool = False) - loads a specific PDF file by its file path. If reload_if_exist is set to True and a record for the file already exists, that record is overwritten.
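
The add/refresh behaviour described above can be illustrated with a small in-memory sketch. It is not the project's implementation: PaperStoreSketch, the Record shape (field name mapped to a list of values) and the summarize callback, which stands in for the real PDF summarization step, are all hypothetical. Records are keyed by file path, add_paper skips a path that is already stored unless reload_if_exist=True, and refresh_summary re-runs summarization for every stored path.

```python
from pathlib import Path
from typing import Callable, Dict, List

# Hypothetical record shape: field name -> extracted values,
# e.g. {"New Features": ["sparse attention"]}.
Record = Dict[str, List[str]]


class PaperStoreSketch:
    """Toy model of the input operations; not the real datasets.PaperDataset."""

    def __init__(self, summarize: Callable[[str], Record]) -> None:
        # `summarize` is a hypothetical callback standing in for PDF summarization.
        self._summarize = summarize
        self.papers: Dict[str, Record] = {}  # PDF file path -> record

    def add_paper(self, filepath: str, reload_if_exist: bool = False) -> None:
        key = str(Path(filepath))
        if key in self.papers and not reload_if_exist:
            return  # record already exists and reloading was not requested
        self.papers[key] = self._summarize(key)

    def refresh_summary(self) -> None:
        # Overwrite every stored record, e.g. after the prompts have changed.
        for key in list(self.papers):
            self.papers[key] = self._summarize(key)


# Usage with a fake summarizer that just counts how often it was called.
_calls = iter(range(1, 1000))
store = PaperStoreSketch(lambda path: {"Version": [str(next(_calls))]})
store.add_paper("a.pdf")
store.add_paper("a.pdf")                        # ignored: record already exists
store.add_paper("a.pdf", reload_if_exist=True)  # overwritten with a new summary
print(store.papers)  # {'a.pdf': {'Version': ['2']}}
```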

Filter/View Options:

  • get_paper_by_filename - shows the data of a paper given its filename (without the full path)
  • search_by_field_value(field: str, value: str, regex_search: bool = True) - searches a specific field (e.g. "New Features") for the given value. If regex_search is set to False, the value must exactly match the field's content.
  • list_data_fields - shows all existing data fields in the dataset
  • list_values_by_field - shows all existing values of a specific field in the dataset
  • list_of_papers - shows all loaded papers, with their filenames as dict keys
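
The filter/view options can likewise be sketched over a plain dict of records. This is a hypothetical model, not the library's code: PaperViewSketch and the record shape are assumptions, search_by_field_value is assumed to return the filenames of matching papers, and regex mode is assumed to use re.search (a match anywhere inside a value), while exact mode compares whole values.

```python
import re
from pathlib import Path
from typing import Dict, List, Optional

Record = Dict[str, List[str]]  # hypothetical: field name -> extracted values


class PaperViewSketch:
    """Toy model of the filter/view options; not the real datasets.PaperDataset."""

    def __init__(self, papers: Dict[str, Record]) -> None:
        self.papers = papers  # PDF file path -> record

    def get_paper_by_filename(self, filename: str) -> Optional[Record]:
        # Compare only the bare filename, ignoring the directory part.
        for path, record in self.papers.items():
            if Path(path).name == filename:
                return record
        return None

    def search_by_field_value(self, field: str, value: str,
                              regex_search: bool = True) -> List[str]:
        matches = []
        for path, record in self.papers.items():
            for entry in record.get(field, []):
                hit = (re.search(value, entry) is not None) if regex_search else (entry == value)
                if hit:
                    matches.append(Path(path).name)
                    break  # one matching value is enough for this paper
        return matches

    def list_data_fields(self) -> List[str]:
        return sorted({field for record in self.papers.values() for field in record})

    def list_values_by_field(self, field: str) -> List[str]:
        return sorted({v for record in self.papers.values() for v in record.get(field, [])})

    def list_of_papers(self) -> Dict[str, Record]:
        return {Path(path).name: record for path, record in self.papers.items()}


view = PaperViewSketch({
    "papers/attention.pdf": {"New Features": ["sparse attention", "rotary embeddings"]},
    "papers/diffusion.pdf": {"New Features": ["latent diffusion"], "Datasets": ["LAION"]},
})
print(view.search_by_field_value("New Features", r"attention"))                     # ['attention.pdf']
print(view.search_by_field_value("New Features", "diffusion", regex_search=False))  # []
print(view.list_data_fields())                                                      # ['Datasets', 'New Features']
```

The second search returns nothing because exact matching requires the whole value ("latent diffusion") to equal the query.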
@michalkrawczyk added the documentation label (Improvements or additions to documentation) on May 16, 2023