Summarizer class object (#3)
- introduces a `Summarizer` class object defined in `summarize.py`
- this helps organize/wrangle the many functions required for processing
  and enables easier usage in the python API

Usage is fairly straightforward:
```python
from textsum.summarize import Summarizer

summ = Summarizer()
output_path = summ.summarize_file('test.txt', batch_length=1024)
print(f"summary saved to {output_path.resolve()}")
```

`Summarizer`, of course, accepts several different kwargs for defining the
model and so on.

Signed-off-by: Peter <[email protected]>
Signed-off-by: peter szemraj <[email protected]>
pszemraj authored Jan 18, 2023
1 parent aa63119 commit f096278
Showing 9 changed files with 582 additions and 489 deletions.
21 changes: 20 additions & 1 deletion CHANGELOG.md
@@ -1 +1,20 @@
# Changelog
### Changelog

All notable changes to this project will be documented in this file. Dates are displayed in UTC.

Generated by [`auto-changelog`](https://github.com/CookPete/auto-changelog).

#### [v0.0.5](https://github.com/pszemraj/textsum/compare/v0.0.1...v0.0.5)

> 16 January 2023
- Summarization Pipeline CLI [`#2`](https://github.com/pszemraj/textsum/pull/2)

#### v0.0.1

> 20 December 2022
- min working example [`#1`](https://github.com/pszemraj/textsum/pull/1)
- 🚚 migrate docsum space files [`a33b00c`](https://github.com/pszemraj/textsum/commit/a33b00c676add7db63a163b37f6ca6dba61d646b)
- 🎉 add pyscaffold skeleton [`cacaea3`](https://github.com/pszemraj/textsum/commit/cacaea3840ac620dedfcbdce8f92ae023fbf161b)
- Initial commit [`ec48913`](https://github.com/pszemraj/textsum/commit/ec48913456d314908838db7574183e21e698a066)
60 changes: 47 additions & 13 deletions README.md
@@ -14,24 +14,24 @@

> utility for using transformers summarization models on text docs
An extension/generalization of the [document summarization](<https://huggingface.co/spaces/pszemraj/document-summarization>) space on huggingface. The purpose of this package is to provide a simple interface for using summarization models on text documents of arbitrary length.
The purpose of this package is to provide a simple interface (python API, CLI, gradio web UI) for using summarization models on text documents of arbitrary length.
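
As a rough illustration of how a document of arbitrary length can be fed to a model with a fixed input size, the sketch below splits the text into overlapping batches and summarizes each one in turn. It is not the code in `summarize.py`: `split_into_batches`, `summarize_long_text`, the `stride` parameter, and the whitespace "tokenizer" are all hypothetical stand-ins, and `summarize_batch` is any callable you supply.

```python
from typing import Callable, List


def split_into_batches(tokens: List[str], batch_length: int, stride: int = 0) -> List[List[str]]:
    """Split tokens into windows of at most batch_length, overlapping by stride tokens."""
    if batch_length <= 0:
        raise ValueError("batch_length must be positive")
    if not 0 <= stride < batch_length:
        raise ValueError("stride must satisfy 0 <= stride < batch_length")
    step = batch_length - stride
    batches = []
    for start in range(0, len(tokens), step):
        batches.append(tokens[start:start + batch_length])
        if start + batch_length >= len(tokens):
            break  # the last window already reaches the end of the text
    return batches


def summarize_long_text(
    text: str,
    summarize_batch: Callable[[str], str],
    batch_length: int = 1024,
    stride: int = 16,
) -> str:
    """Summarize each batch independently and join the partial summaries."""
    tokens = text.split()  # assumption: whitespace tokens stand in for a real tokenizer
    batches = split_into_batches(tokens, batch_length, stride)
    return "\n".join(summarize_batch(" ".join(batch)) for batch in batches)
```

In this sketch a larger `batch_length` means fewer, longer batches, and a nonzero `stride` repeats a few tokens between neighbouring batches so that a sentence cut at a boundary still appears whole in one of them.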

⚠️ **WARNING**: _This package is a WIP and is not ready for production use. Some things may not work yet._ ⚠️

## Installation

Install the package using pip:
Install using pip:

```bash
# create a virtual environment (optional)
pip install git+https://github.com/pszemraj/textsum.git
```

The textsum package is now installed in your virtual environment. You can now use the CLI or UI demo (see [Usage](#usage)).
The `textsum` package is now installed in your virtual environment. You can now use the CLI or python API to summarize text docs; see the [Usage](#usage) section for more details.

### Full Installation _(PDF OCR, gradio UI demo)_
### Full Installation

To install all the dependencies _(includes PDF OCR, gradio UI demo)_, run:
To install all the dependencies _(includes PDF OCR, gradio UI demo, optimum, etc)_, run:

```bash
git clone https://github.com/pszemraj/textsum.git
@@ -42,6 +42,31 @@ pip install -e .[all]

## Usage

There are three ways to use this package:

1. [python API](#python-api)
2. [CLI](#cli)
3. [Demo App](#demo-app)

### Python API

```python
from textsum.summarize import Summarizer

summarizer = Summarizer() # loads default model and parameters

# summarize a long string
out_str = summarizer.summarize_string('This is a long string of text that will be summarized.')
print(f'summary: {out_str}')
```

You can also summarize a file directly:

```python
out_path = summarizer.summarize_file('/path/to/file.txt')
print(f'summary saved to {out_path}')
```

### CLI

To summarize a directory of text files, run the following command:
@@ -66,27 +91,36 @@ For more information, run:
textsum-dir --help
```
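
A similar directory-level workflow can be approximated from Python with the `Summarizer` class. The helper below is a minimal sketch, not part of the package: `summarize_directory` and its `pattern` argument are hypothetical, and it assumes only what the commit message shows, namely that `summarize_file` takes a path string and returns the path of the saved summary.

```python
from pathlib import Path


def summarize_directory(summarizer, input_dir, pattern="*.txt"):
    """Call summarizer.summarize_file on every matching file in input_dir.

    `summarizer` is any object exposing summarize_file(path) -> output path.
    Returns a dict mapping each source file to the value summarize_file returned.
    """
    results = {}
    for path in sorted(Path(input_dir).glob(pattern)):
        # Pass a plain string, matching the usage shown in the README.
        results[path] = summarizer.summarize_file(str(path))
    return results


# Hypothetical usage, assuming textsum is installed:
#   from textsum.summarize import Summarizer
#   outputs = summarize_directory(Summarizer(), "/path/to/dir")
#   for src, out in outputs.items():
#       print(f"{src.name} -> {out}")
```

Loading the model once and reusing the same `Summarizer` instance for every file avoids paying the model start-up cost per document.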

### UI Demo
### Demo App

For convenience, a UI demo[^1] is provided using [gradio](https://gradio.app/). To ensure you have the dependencies installed, clone the repo and run the following command:

```bash
pip install -e .[app]
```

For convenience, a UI demo is provided using [gradio](https://gradio.app/). To run the demo, run the following command:
To run the demo, run the following command:

```bash
textsum-ui
```

This is currently a minimal demo, but it will be expanded in the future to accept other arguments and options.
This will start a local server that you can access in your browser, and a shareable link will be printed to the console.

[^1]: The demo is currently minimal, but will be expanded in the future to accept other arguments and options.

---

## Roadmap

- [ ] add argparse CLI for UI demo
- [x] add CLI for summarization of all text files in a directory
- [ ] python API for summarization of text docs
- [ ] optimum inference integration
- [ ] better documentation, details on improving performance (speed, quality, memory usage, etc.)
- [x] python API for summarization of text docs
- [ ] add argparse CLI for UI demo
- [ ] put on pypi
- [ ] optimum inference integration, LLM.int8 inference
- [ ] better documentation [in the wiki](https://github.com/pszemraj/textsum/wiki), details on improving performance (speed, quality, memory usage, etc.)

and other things I haven't thought of yet
_Other ideas? Open an issue or PR!_

---

5 changes: 3 additions & 2 deletions setup.cfg
@@ -70,9 +70,10 @@ optimum = optimum
 PDF =
     python-doctr[torch]
     pyspellchecker
-app = gradio
-all =
+app =
+    gradio
     %(PDF)s
+all =
     %(app)s
     %(optimum)s
     %(8bit)s
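
After this change the `app` extra pulls in the `PDF` extra, and `all` pulls in `app`, through `%(name)s` placeholders. The snippet below is a small sketch of how those placeholders expand, assuming they are resolved with `configparser`-style interpolation. The `[options.extras_require]` section name and the `requirements` helper are assumptions for illustration, and the `8bit` extra is left out because its definition is not visible in this hunk.

```python
import configparser

# Trimmed copy of the extras as they read after the change.
# Assumption: they live under [options.extras_require]; `8bit` is omitted.
SETUP_CFG = """
[options.extras_require]
optimum = optimum
PDF =
    python-doctr[torch]
    pyspellchecker
app =
    gradio
    %(PDF)s
all =
    %(app)s
    %(optimum)s
"""

parser = configparser.ConfigParser()
parser.read_string(SETUP_CFG)
extras = parser["options.extras_require"]


def requirements(name):
    """Return the fully expanded requirement list for one extra."""
    return [line.strip() for line in extras[name].splitlines() if line.strip()]


print(requirements("app"))
print(requirements("all"))
```

Under these assumptions, `pip install -e .[app]` would also install the PDF OCR dependencies, which the old `app = gradio` definition did not.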
2 changes: 1 addition & 1 deletion src/textsum/__init__.py
@@ -4,7 +4,7 @@
"""
import sys

-from . import cli, utils
+from . import summarize, utils

if sys.version_info[:2] >= (3, 8):
# TODO: Import directly (no need for conditional) when `python_requires = >= 3.8`