Text summarizer (based on BERT) as a service/API on fly #292

Open
geshan opened this issue Jul 22, 2020 · 10 comments
Labels
example An example project + readme

Comments


geshan commented Jul 22, 2020

I was playing around with an open source summarizer for creating an executive summary of a given text. It is a generalization of a solution based on a paper that uses BERT (by Google) to summarize lectures. Basically, give it a text of around 2k words and ask for a 20% summary, and it will come back with the sentences it thinks are important, which adds up to around 400+ words (roughly 20%, depending on sentence lengths).
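To make the ratio idea concrete, here is a minimal Python sketch. It is not the project's algorithm: the real summarizer relies on BERT, while this toy version ranks sentences by plain word frequency, an assumption made only to keep the example self-contained. The names `split_sentences` and `summarize_by_ratio` are hypothetical.

```python
import re
from collections import Counter


def split_sentences(text: str) -> list:
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def summarize_by_ratio(text: str, ratio: float = 0.2) -> str:
    """Keep roughly `ratio` of the sentences, in their original order.

    Scoring is average word frequency, a stand-in (assumption) for the
    BERT-based scoring used by the real project.
    """
    sentences = split_sentences(text)
    if not sentences:
        return ""

    # Count how often each word appears in the whole text.
    freq = Counter(re.findall(r"[a-z']+", text.lower()))

    def score(sentence: str) -> float:
        tokens = re.findall(r"[a-z']+", sentence.lower())
        return sum(freq[t] for t in tokens) / max(len(tokens), 1)

    # Always keep at least one sentence.
    keep = max(1, round(len(sentences) * ratio))
    ranked = sorted(range(len(sentences)),
                    key=lambda i: score(sentences[i]), reverse=True)
    chosen = sorted(ranked[:keep])  # restore document order
    return " ".join(sentences[i] for i in chosen)
```

With a 10-sentence input and `ratio=0.2`, this keeps 2 sentences. The word count of the result only approximates 20% because sentences differ in length, which matches the "roughly 20%" caveat above.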

It has multiple use cases, like building a news summary service (something to summarize all the Corona news, for instance) or using an ML algorithm to summarize any long text you need to read.

As the project comes with a Docker container that exposes a REST API through Flask, I can quickly build that and make it work on fly.io, with Fly-specific instructions on how to do it. I think this will be a good addition to the examples.

From my previous experience, this 3.5 GB container needs a lot of resources (given the ML model it uses). It needs about 2 GB of RAM to run, just a heads up. On the bright side, this translates into documentation on how to scale up services for a heavy and useful application. Thanks!


PS: I am not a Machine Learning enthusiast. I had to solve a problem for a side project, and basic googling led me to this project. I even evaluated the Meaning Cloud API, but this repo was better at summarizing and cheaper, with virtually no limit on the number of calls :).

@mrkurt mrkurt added the example An example project + readme label Jul 23, 2020
Member

mrkurt commented Jul 23, 2020

I like this a lot. We have a lot of people who've built TensorFlow apps for doing quick predictions to detect things like bots. That stuff is always heavy on the CPU, so it sounds like an interesting thing to show people.

We can give you some credits for experimenting + running this on Fly while you build the example out, if it's helpful.

Author

geshan commented Jul 23, 2020

Hey @mrkurt, let me know what you think of this: https://github.com/geshan/bert-extractive-summarizer. You can play around with it at https://summarizer.fly.dev/ and here is a quick curl to try out: https://gist.github.com/geshan/0aba03355dc987892b3aa16f87f6eb0b. Let me know your thoughts, thanks!
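For readers who prefer Python over curl, a request to the hosted API might be sketched as below. The base URL comes from the comment above, but the `/summarize` path, the `ratio` query parameter, and the plain-text request body are all assumptions, since the thread does not spell them out; the linked gist has the actual curl command. The helper names `build_summary_request` and `fetch_summary` are hypothetical.

```python
from urllib.parse import urlencode
from urllib.request import Request, urlopen

BASE_URL = "https://summarizer.fly.dev"  # from the comment above
ENDPOINT = "/summarize"                  # assumption: path not confirmed here


def build_summary_request(text: str, ratio: float = 0.2,
                          base_url: str = BASE_URL) -> Request:
    """Build (but do not send) a POST request carrying the raw text.

    The `ratio` query parameter and text/plain body are assumptions.
    """
    query = urlencode({"ratio": ratio})
    return Request(
        url=f"{base_url}{ENDPOINT}?{query}",
        data=text.encode("utf-8"),
        headers={"Content-Type": "text/plain; charset=utf-8"},
        method="POST",
    )


def fetch_summary(text: str, ratio: float = 0.2, timeout: float = 60.0) -> str:
    """Send the request and return the response body as text."""
    with urlopen(build_summary_request(text, ratio), timeout=timeout) as resp:
        return resp.read().decode("utf-8")
```

Keeping request construction separate from sending makes it easy to inspect the URL and body before hitting a service that is heavy on CPU and memory.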

Member

mrkurt commented Jul 28, 2020

@geshan Sorry for the slow response here, @codepope or I will work through this one this week.

Author

geshan commented Jul 28, 2020

@mrkurt it's totally fine. I am working on the js-renderer-fly, where @codepope's comments have been very helpful, thanks!


rizqventures commented Jul 30, 2020 via email

@codepope

Hi @geshan,

Looking over the readme, I think you might get better flow by opening with ...

I am not a Machine Learning (ML) enthusiast yet, but I was digging into the subject when I discovered the .... open source summariser. I wanted to do a quick deploy with it onto a service and start experimenting with text summarisation. For the service I have chosen Fly.io which deploys apps closer to the user so that it responds much faster.

Let's look at the summariser we'll be deploying....

etc....

That should smooth the flow at the start.

Author

geshan commented Jul 31, 2020

Hello @codepope ,

Thanks for pointing me to a good opener. Let's discuss the sections first this time as it will be easier to do it section by section. This is what is in my mind:


Bert Extractive Summarizer on Fly.io

--> the opener goes here.

Running the summarizer API on Fly.io

--> One line here

Prerequisites

--> 2 steps similar to puppeteer-js-renderer

Steps

--> Similar to puppeteer-js-renderer. I will remove the images and recheck it; the main difference here will be the scaling to the cpu2mem2 VM and the curl example. If you have a suggestion for the curl example, I am open to it, like a sample text from a wiki page or something of that sort.

Endless possibilities

--> I am not sure about this section either, so let me know your views.


If I need to add any new section, please let me know that too. I would like to be more structured this time :) as we have already worked on the puppeteer-js-renderer. Hope to hear from you soon, thanks!

Author

geshan commented Jul 31, 2020

Maybe a summary of the first 3 paragraphs of https://en.wikipedia.org/wiki/Wiki (one of the most viewed pages on Wikipedia).

@codepope

Looks reasonable as an outline.

I'd switch "Endless Possibilities" for "Possible Applications" and loosely suggest some ideas for things that would benefit from summarizing (news feeds, instructions, blog articles...) ...

Possible bonus section: leverage puppeteer-js to extract some text from a URL and feed it to the summarizer. And refer people to the guide for that too.
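As a rough illustration of the "extract some text, then summarize" step, here is a minimal Python sketch that strips tags from HTML that has already been rendered (by puppeteer-js or anything else). It only covers the text extraction; rendering the page and calling the summarizer are out of scope. The names `TextExtractor` and `html_to_text` are hypothetical, and skipping only `script` and `style` is a simplifying assumption.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect visible text, ignoring the contents of script and style tags."""

    SKIP = {"script", "style"}

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self._chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-empty text that is outside skipped tags.
        if not self._skip_depth and data.strip():
            self._chunks.append(data.strip())

    def text(self) -> str:
        return " ".join(self._chunks)


def html_to_text(html: str) -> str:
    """Return the text content of an HTML string as one line of plain text."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return parser.text()
```

The resulting plain text is what would be posted to the summarizer API as the request body.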

I'd avoid summarizing Wikipedia articles, as encyclopedias tend to be short statements of fact which are either overly easy to summarize or terribly complex. For the example paragraphs, how about summarizing a recent blog post at Fly, like the Sandbox and Isolation one (or at least see how it comes out)?

Member

mrkurt commented Jul 31, 2020

@geshan We are migrating content discussions to our new community site. Will you please copy your topic for this over to the new forum: https://community.fly.io/c/write/writers-room/8

Just copying and pasting the original issue is fine, with a link back here.

@superfly superfly locked as resolved and limited conversation to collaborators Jul 31, 2020
4 participants