feat: loader improvements (chunking, pipeline integration) #161

Open
1 of 5 tasks
0xMochan opened this issue Dec 19, 2024 · 0 comments

  • I have looked for existing issues (including closed) about this

Feature Request

Loaders could do with some more features to create a more complete experience.

  • Add a more straightforward chunking experience (including multiple strategies, isolated + overlapping)
  • Add pipeline integration by implementing a new method or the Op and TryOp traits
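To make the two strategies concrete, here is a minimal sketch of isolated and overlapping chunking over plain text. Everything in it (`ChunkStrategy`, `chunk_text`, the `size` and `overlap` parameters) is a hypothetical name made up for illustration, not an existing loader API, and it counts characters rather than tokens to keep the example small.

```rust
/// Hypothetical chunking strategies; the names are illustrative only.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ChunkStrategy {
    /// Non-overlapping windows of `size` characters.
    Isolated { size: usize },
    /// Windows of `size` characters where each window repeats the last
    /// `overlap` characters of the previous one.
    Overlapping { size: usize, overlap: usize },
}

/// Splits `text` into chunks by character count (not bytes), so multi-byte
/// UTF-8 characters are never cut in half.
pub fn chunk_text(text: &str, strategy: ChunkStrategy) -> Vec<String> {
    let (size, overlap) = match strategy {
        ChunkStrategy::Isolated { size } => (size, 0),
        ChunkStrategy::Overlapping { size, overlap } => (size, overlap),
    };
    assert!(size > 0, "chunk size must be positive");
    assert!(overlap < size, "overlap must be smaller than chunk size");

    let chars: Vec<char> = text.chars().collect();
    // Each new window starts `step` characters after the previous one.
    let step = size - overlap;
    let mut chunks: Vec<String> = Vec::new();
    let mut start = 0;
    while start < chars.len() {
        let end = (start + size).min(chars.len());
        chunks.push(chars[start..end].iter().collect::<String>());
        // Stop once a window reaches the end of the text, so no trailing
        // chunk is emitted that only repeats already-covered characters.
        if end == chars.len() {
            break;
        }
        start += step;
    }
    chunks
}
```

With this sketch, `chunk_text("abcdefghij", ChunkStrategy::Isolated { size: 4 })` yields `abcd`, `efgh`, `ij`, while `ChunkStrategy::Overlapping { size: 4, overlap: 2 }` yields `abcd`, `cdef`, `efgh`, `ghij`.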

Motivation

The loaders' first implementations are straightforward. We should keep improving the interface so that incorporating knowledge for RAG use cases becomes easier.

Proposal

  • Add .chunk with various options to the loaders
  • Add Op and/or TryOp trait implementations
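As a rough illustration of what pipeline integration could look like, the sketch below models loading as a fallible step and chunking as an infallible one. Note the assumptions: `SimpleOp` and `SimpleTryOp` are simplified, synchronous stand-ins written for this example and are not the crate's actual `Op` / `TryOp` definitions; `LoadFileOp`, `ChunkOp`, and `load_then_chunk` are likewise hypothetical names.

```rust
use std::fs;
use std::io;
use std::path::PathBuf;

/// Simplified stand-in for an infallible pipeline operation (assumption).
pub trait SimpleOp {
    type Input;
    type Output;
    fn call(&self, input: Self::Input) -> Self::Output;
}

/// Simplified stand-in for a fallible pipeline operation (assumption).
pub trait SimpleTryOp {
    type Input;
    type Output;
    type Error;
    fn try_call(&self, input: Self::Input) -> Result<Self::Output, Self::Error>;
}

/// Hypothetical loader step: reads a whole file into a `String`.
pub struct LoadFileOp;

impl SimpleTryOp for LoadFileOp {
    type Input = PathBuf;
    type Output = String;
    type Error = io::Error;

    fn try_call(&self, path: PathBuf) -> Result<String, io::Error> {
        fs::read_to_string(path)
    }
}

/// Hypothetical chunking step: fixed-size, non-overlapping character chunks.
pub struct ChunkOp {
    size: usize,
}

impl ChunkOp {
    pub fn new(size: usize) -> Self {
        assert!(size > 0, "chunk size must be positive");
        Self { size }
    }
}

impl SimpleOp for ChunkOp {
    type Input = String;
    type Output = Vec<String>;

    fn call(&self, text: String) -> Vec<String> {
        let chars: Vec<char> = text.chars().collect();
        chars
            .chunks(self.size)
            .map(|chunk| chunk.iter().collect::<String>())
            .collect()
    }
}

/// Runs the fallible loader first and feeds its output to the chunker,
/// propagating the loader's error unchanged.
pub fn load_then_chunk<L, C>(
    loader: &L,
    chunker: &C,
    input: L::Input,
) -> Result<C::Output, L::Error>
where
    L: SimpleTryOp,
    C: SimpleOp<Input = L::Output>,
{
    loader.try_call(input).map(|loaded| chunker.call(loaded))
}
```

Separating the fallible and infallible steps keeps I/O errors at the loader boundary, so the chunking step stays a pure transformation that could be reused elsewhere in a pipeline.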

Alternatives

  • Add chunking as a part of the pipeline system instead
    • This is a fairly interesting thought, as chunking is a generic way of breaking up text. It could also apply to lengthy LLM output streams, which is natural from a pipeline POV. However, chunking is mostly used for loaders, and restricting it to loaders keeps the helpers seamless, since anyone can still use iterators on their own to build custom chunking if needed.
  • Create iterator helper methods directly rather than tying them to the context of loaders.
    • Another approach, which would increase flexibility. It would exist more as a loaders helper / utils module that covers the common chunking strategies (perhaps the loaders implementation just wraps the internal iterator extenders).
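The iterator-helper alternative could be sketched as an extension trait over any `Iterator`. The names `ChunkExt`, `Chunked`, and `chunked` are hypothetical and chosen only for this example; a loader-level chunking method could, under this assumption, simply wrap the adapter.

```rust
/// Iterator adapter that groups items into `Vec`s of at most `size` items.
pub struct Chunked<I> {
    inner: I,
    size: usize,
}

impl<I: Iterator> Iterator for Chunked<I> {
    type Item = Vec<I::Item>;

    fn next(&mut self) -> Option<Self::Item> {
        // Pull up to `size` items from the underlying iterator without
        // consuming the iterator itself.
        let chunk: Vec<I::Item> = self.inner.by_ref().take(self.size).collect();
        if chunk.is_empty() {
            None
        } else {
            Some(chunk)
        }
    }
}

/// Hypothetical extension trait that adds `.chunked(size)` to every iterator.
pub trait ChunkExt: Iterator + Sized {
    fn chunked(self, size: usize) -> Chunked<Self> {
        assert!(size > 0, "chunk size must be positive");
        Chunked { inner: self, size }
    }
}

// Blanket implementation: every iterator gets the helper.
impl<I: Iterator> ChunkExt for I {}
```

For example, `"a b c".split_whitespace().chunked(2)` would yield one group with `a` and `b`, followed by one group with `c`. Because the adapter is generic over the item type, the same helper works for words, lines, or whole documents.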

Notes

The loaders follow a similar design, but they do not share any traits or implementations, so it's very easy to duplicate code. It might be worth investigating how some of the logic could be shared.
