Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

✨ Access to Bedrock and Textract for CJS & Capability Data Science Teams #4740

Closed
K1Br opened this issue Jul 22, 2024 · 6 comments
Closed

Comments

@K1Br
Copy link

K1Br commented Jul 22, 2024

Describe the feature request.

Headline: Request for the CJS & Capability Data Science team to have access to use textract to get data in the right format from pdfs and the Claude generative AI models on the AP.

Describe the context.

One current projects, and potentially others in the pipelins for the Parole Board. The current proposed data is publically available. However, there be some additional internal data added subject to adding to the DPIA/sharing agreements.

For all projects, we need to be able to:

  1. Read and extract text and non text data from pdf stored in s3, ideally using amazon textract. We've tried other methods to little success.
  2. Test the capabilities of the different models. To retrieveand/or summarise relevant information.
  3. Run the models for inference in production. If the models meet our evaluation thresholds, we then want to be able to use them in production. In practice, this means things like the below for future development:
    3.1 Running a script on schedule using Airflow to send data to the models, receive outputs, and save those to an s3 bucket or database on AWS.
    3.2 Calling the models via API from a deployed streamlit application

Project details:
Parole Policy binders: Parole Board colleagues currently have a lot of guidance on SharePoint. Members must sift through lots of policy documents. This is problematic as it is not always clear which document to look in and the SharePoint search functionality isn't adequate.  They also may need bespoke advice for their case. We would deliver:

  1. Text analysis of policies and flagging which documents the advice has come from.
  2. Potentially served as an app pulling data from the pdfs and linking members to the places in the documents
  3. Members might also provide details of the case and have related policies flagged to them to consider.

Value / Purpose

We have tested open source methods of extracting text and other mediums from data. Due to the size and complexity of the documents (the documents are very long and may include a wide range of diagrams etc).

Impact:
Members more quickly able to access the right advice needed to progress the case they are working on and get through the parole board faster without decreasing quality of decisions. There are people of varied abilities and this additional tolling may provide extra support in their work.

User Types

Data scientists in CJS & Capability Data Science

@julialawrence
Copy link
Contributor

julialawrence commented Jul 23, 2024

Hiya, unless you need some kind of additional support in implementing Bedrock in your usecase, Bedrock can now be requested via a support ticket. When you open a ticket please list users and/or apps that need that access and region you want to use it in. A support request can be opened here: https://github.com/ministryofjustice/data-platform-support/issues

@RolakeO-mojo
Copy link

Hiya, unless you need some kind of additional support in implementing Bedrock in your usecase, Bedrock can now be requested via a support ticket. When you open a ticket please list users and/or apps that need that access and region you want to use it in. A support request can be opened here: https://github.com/ministryofjustice/data-platform-support/issues

Thanks for the response, would it be the same method of request for Amazon Textract?

@RolakeO-mojo
Copy link

@julialawrence can we get the access to Textract and implement Bedrock seperately?

@julialawrence
Copy link
Contributor

Apologies, I missed your question.

We don't currently offer textract which is why we don't provide it via a support request.

We have not had a chance to assess this request yet but are happy for you to request bedrock via our support process.

@RolakeO-mojo
Copy link

Thank you for your response @julialawrence , we will investigate other methods instead of Textract and request bedrock via the support process once we need to use it, I think this FR can be closed now.

@simon-pope
Copy link

@RolakeO-mojo Thanks for the update, I will close this Feature Request.

@simon-pope simon-pope moved this from 👀 TODO to 🎉 Done in Analytical Platform Sep 20, 2024
@simon-pope simon-pope closed this as completed by moving to 🎉 Done in Analytical Platform Sep 20, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
Archived in project
Development

No branches or pull requests

4 participants