Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[community] Refactor PebbloRetrievalQA #84

Closed
wants to merge 27 commits into from

Conversation

Raj725
Copy link
Collaborator

@Raj725 Raj725 commented Aug 14, 2024

Thank you for contributing to LangChain!

  • PR title: "package: description"

    • Where "package" is whichever of langchain, community, core, experimental, etc. is being modified. Use "docs: ..." for purely docs changes, "templates: ..." for template changes, "infra: ..." for CI changes.
    • Example: "community: add foobar LLM"
  • PR message: Delete this entire checklist and replace with

    • Description: a description of the change
    • Issue: the issue # it fixes, if applicable
    • Dependencies: any dependencies required for this change
    • Twitter handle: if your PR gets announced, and you'd like a mention, we'll gladly shout you out!
  • Add tests and docs: If you're adding a new integration, please include

    1. a test for the integration, preferably unit tests that do not rely on network access,
    2. an example notebook showing its use. It lives in docs/docs/integrations directory.
  • Lint and test: Run make format, make lint and make test from the root of the package(s) you've modified. See contribution guidelines for more: https://python.langchain.com/docs/contributing/

Additional guidelines:

  • Make sure optional dependencies are imported within a function.
  • Please do not add dependencies to pyproject.toml files (even optional ones) unless they are required for unit tests.
  • Most PRs should not touch more than one package.
  • Changes should be backwards compatible.
  • If you are adding something to community, do not re-import it in langchain.

If no one reviews your PR within a few days, please @-mention one of baskaryan, efriis, eyurtsev, ccurme, vbarda, hwchase17.

@Raj725 Raj725 requested review from srics and gr8nishan August 14, 2024 05:49
Copy link
Collaborator

@gr8nishan gr8nishan Aug 14, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

_acall function does not give call to prompt API. Is prompt API call in _call function not required ?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The prompt API call in _acall is needed, but the team member who worked on it didn't add it. Since _acall is an async function, the API calls inside it should also be async.

I'll include it in the next PR. Keeping this PR to refactoring only since it's already over 500 lines.

pebblo_resp = None
payload = app.dict(exclude_unset=True)

if self.classifier_location == "local":
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if someone has set the classifier location as local and also has sent the API key will we be calling classifier from both daxa and local.. Is it the desired flow ?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. This PR doesn't change the existing flow—just refactored the existing code.

"Request: method %s, url %s, body %s len %s response status %s body %s",
method,
response.request.url,
str(response.request.body),
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should not be logging body here as it might print some info which is PII

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure why this was logged. This line was in all the API calls, including prompt/governance, so I left it as is. I'll remove it if it's not needed. Since it's a debug log, it shouldn't be an issue.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be removed from both the places if not in this then next iteration.. The issue I see is if someone is trying to debug the RAG app, this info might get absorbed in the centralized log and people who may not have access to the data might get access to the info in the log.

resp.get("retrieval_data", {}).get("prompt", {})
)
payload["prompt"].pop("data")
context = payload["context"]
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should use get here in payload["response"], payload["prompt"], payload["context"]

@Raj725 Raj725 force-pushed the refactor/PebbloRetrievalQA branch 4 times, most recently from 49e0549 to 330d10e Compare August 21, 2024 04:47
daziz and others added 22 commits August 21, 2024 08:49
…in-ai#25605)

Complete missing arguments in api doc of `PineconeVectorStore`.
…ader (langchain-ai#25593)

- **Description:** Updating metadata for sharepoint loader with full
path i.e., webUrl
- **Issue:** NA
- **Dependencies:** NA
- **Tests:** NA
- **Docs** NA

Co-authored-by: dristy.cd <[email protected]>
Co-authored-by: ccurme <[email protected]>
Otherwise I've got KeyError from `fastembeds`
…-ai#25619)

Mistral 7B occasionally fails tool-calling tests. Updating to Mixtral
appears to improve this.
and update langsmtih example selector docs
This ensures version 0.15.7+ is pulled. 
This version of unstructured uses a version of NLTK >= 3.8.2 that has a
fix for a critical CVE:
GHSA-cgvx-9447-vcch
Thank you for contributing to LangChain!


**Description:** Adding `BoxRetriever` for langchain_box. This retriever
handles two use cases:
* Retrieve all documents that match a full-text search
* Retrieve the answer to a Box AI prompt as a Document

**Twitter handle:** @BoxPlatform


- [x] **Add tests and docs**: If you're adding a new integration, please
include
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in
`docs/docs/integrations` directory.


- [x] **Lint and test**: Run `make format`, `make lint` and `make test`
from the root of the package(s) you've modified. See contribution
guidelines for more: https://python.langchain.com/docs/contributing/

Additional guidelines:
- Make sure optional dependencies are imported within a function.
- Please do not add dependencies to pyproject.toml files (even optional
ones) unless they are required for unit tests.
- Most PRs should not touch more than one package.
- Changes should be backwards compatible.
- If you are adding something to community, do not re-import it in
langchain.

If no one reviews your PR within a few days, please @-mention one of
baskaryan, efriis, eyurtsev, ccurme, vbarda, hwchase17.

---------

Co-authored-by: Erick Friis <[email protected]>
…tation (langchain-ai#25430)

### Summary

Create `langchain-databricks` as a new partner packages. This PR does
not migrate all existing Databricks integration, but the package will
eventually contain:

* `ChatDatabricks` (implemented in this PR)
* `DatabricksVectorSearch`
* `DatabricksEmbeddings`
* ~`UCFunctionToolkit`~ (will be done after UC SDK work which
drastically simplify implementation)

Also, this PR does not add integration tests yet. This will be added
once the Databricks test workspace is ready.

Tagging @efriis as POC


### Tracker
[✍️] Create a package and imgrate ChatDatabricks
[ ] Migrate DatabricksVectorSearch, DatabricksEmbeddings, and their docs
~[ ] Migrate UCFunctionToolkit and its doc~
[ ] Add provider document and update README.md
[ ] Add integration tests and set up secrets (after moved to an external
package)
[ ] Add deprecation note to the community implementations.

---------

Signed-off-by: B-Step62 <[email protected]>
Co-authored-by: Erick Friis <[email protected]>
@Raj725 Raj725 force-pushed the refactor/PebbloRetrievalQA branch from 7aa9b62 to 1d2942a Compare August 22, 2024 03:41
@Raj725 Raj725 closed this Aug 22, 2024
@Raj725 Raj725 deleted the refactor/PebbloRetrievalQA branch August 22, 2024 16:12
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.