feat(loaders): CSV Loader to Document Loaders #30

Closed
wants to merge 4 commits into from

Tachikoma000
Contributor

This PR implements a CSV loader as part of the document loaders module in Rig. It allows users to easily load and process CSV documents for use in RAG systems and other document processing tasks.

Changes

  • Implemented CsvLoader struct in src/document_loaders/csv.rs
  • Added CsvLoader to the document_loaders module
  • Implemented DocumentLoader trait for CsvLoader
  • Used the csv crate for CSV parsing
  • Added error handling for file operations and CSV parsing
  • Updated Cargo.toml with the csv dependency
  • Updated documentation with CsvLoader usage examples

Implementation Details
The CsvLoader uses the csv crate to parse CSV files and extract content. It handles potential errors such as file not found or parsing errors. The extracted content is converted into a single DocumentEmbeddings object for further processing in Rig. Each row of the CSV is formatted as "header: value" pairs, separated by newlines.
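
The row formatting described above can be sketched as follows. This is a minimal illustration, not the PR's code: the function name `csv_to_text` is hypothetical, it uses plain comma splitting from the standard library instead of the csv crate (so quoted fields are not supported), and the blank line between rows is an assumption.

```rust
/// Hypothetical helper (not from the PR): renders CSV text as "header: value" lines.
/// Assumes comma-separated input without quoted fields; the PR itself relies on
/// the `csv` crate for real parsing.
fn csv_to_text(input: &str) -> Result<String, String> {
    // Skip blank lines so a trailing newline does not produce an empty row.
    let mut lines = input.lines().filter(|l| !l.trim().is_empty());

    // The first non-empty line is treated as the header row.
    let headers: Vec<&str> = match lines.next() {
        Some(h) => h.split(',').map(str::trim).collect(),
        None => return Ok(String::new()), // empty input yields empty content
    };

    let mut out = String::new();
    for (i, line) in lines.enumerate() {
        let values: Vec<&str> = line.split(',').map(str::trim).collect();
        if values.len() != headers.len() {
            return Err(format!(
                "row {} has {} fields, expected {}",
                i + 1,
                values.len(),
                headers.len()
            ));
        }
        for (h, v) in headers.iter().zip(values.iter()) {
            out.push_str(h);
            out.push_str(": ");
            out.push_str(v);
            out.push('\n');
        }
        out.push('\n'); // assumption: a blank line separates rows
    }
    Ok(out)
}
```

Under these assumptions, `csv_to_text("name,age\nAda,36\n")` returns `Ok("name: Ada\nage: 36\n\n")`, and a row with the wrong number of fields returns an `Err` describing the mismatch.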

Testing
Ran tests to ensure the CsvLoader correctly loads CSV files and handles various edge cases. The tests covered:

  • Loading a valid CSV file
  • Handling a non-existent file
  • Processing a CSV with multiple columns and rows
  • Dealing with empty CSV files
  • Handling CSV files with different delimiters
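
As a rough illustration of two of the cases listed above (a non-existent file and a non-comma delimiter), the sketch below uses only the standard library. The error type `CsvLoadError` and the helpers `read_csv` and `split_row` are hypothetical names for this example; the PR's actual error type and tests are not shown in this conversation.

```rust
use std::fs;
use std::io;

/// Hypothetical error type for this sketch; not the PR's actual type.
#[derive(Debug)]
enum CsvLoadError {
    /// The file could not be read (for example, it does not exist).
    Io(io::Error),
    /// A row did not have the expected number of fields.
    Parse(String),
}

/// Reads a file into a string, mapping I/O failures to `CsvLoadError::Io`.
fn read_csv(path: &str) -> Result<String, CsvLoadError> {
    fs::read_to_string(path).map_err(CsvLoadError::Io)
}

/// Splits one row on a caller-chosen delimiter and checks the field count.
/// Quoted fields are not supported in this simplified version.
fn split_row(
    line: &str,
    delimiter: char,
    expected: usize,
) -> Result<Vec<&str>, CsvLoadError> {
    let fields: Vec<&str> = line.split(delimiter).map(str::trim).collect();
    if fields.len() != expected {
        return Err(CsvLoadError::Parse(format!(
            "expected {} fields, found {}",
            expected,
            fields.len()
        )));
    }
    Ok(fields)
}
```

With these definitions, reading a path that does not exist yields `Err(CsvLoadError::Io(_))`, and `split_row("a; b;c", ';', 3)` yields the three trimmed fields `a`, `b`, and `c`.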

Documentation
Code files are commented, and usage examples have been added to the documentation.

Related Issue
Closes #29

Checklist

  • Code follows the project's coding style
  • Tests have been added and all tests pass
  • Documentation has been updated
  • Commit messages are clear and descriptive
  • Changes have been reviewed for potential performance impacts

Additional Notes
This implementation focuses on converting CSV data into a single document for embedding. Future enhancements could include options for creating separate embeddings for each row or handling more complex CSV structures.
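
One possible shape for the per-row option mentioned above is sketched here. It is not part of this PR: `rows_to_documents` is a hypothetical name, and the sketch assumes comma-separated input without quoted fields.

```rust
/// Hypothetical sketch: produces one "header: value" text block per CSV row,
/// so that each row could be embedded separately.
/// Assumes comma-separated input without quoted fields.
fn rows_to_documents(input: &str) -> Vec<String> {
    let mut lines = input.lines().filter(|l| !l.trim().is_empty());

    // The first non-empty line is treated as the header row.
    let headers: Vec<&str> = match lines.next() {
        Some(h) => h.split(',').map(str::trim).collect(),
        None => return Vec::new(),
    };

    let docs: Vec<String> = lines
        .map(|line| {
            // `zip` stops at the shorter side, so ragged rows are truncated
            // rather than rejected in this simplified version.
            headers
                .iter()
                .zip(line.split(',').map(str::trim))
                .map(|(h, v)| format!("{}: {}", h, v))
                .collect::<Vec<String>>()
                .join("\n")
        })
        .collect();
    docs
}
```

For the input `"id,city\n1,Paris\n2,Lima\n"` this returns two strings, `"id: 1\ncity: Paris"` and `"id: 2\ncity: Lima"`.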

- Implement PdfLoader struct in src/document_loaders/pdf.rs
- Add PdfLoader to document_loaders module
- Implement DocumentLoader trait for PdfLoader
- Use lopdf crate for PDF parsing
- Add error handling for file operations and PDF parsing
- Update Cargo.toml with lopdf dependency
- Add unit tests for PdfLoader
- Update documentation with PdfLoader usage examples
- Comments added to pdf.rs, mod.rs, document_loaders.rs
- Implement CsvLoader struct in src/document_loaders/csv.rs
- Add CsvLoader to the document_loaders module
- Implement DocumentLoader trait for CsvLoader
- Use csv crate for CSV parsing
- Add error handling for file operations and CSV parsing
- Update Cargo.toml with csv dependency
- Update documentation with CsvLoader usage examples
- Addressed a Clippy warning by replacing `push_str("\n")` with `push('\n')` for better performance and code quality.
@Tachikoma000 Tachikoma000 changed the title Feat/CSV Loader to Document Loaders feat(Loaders): CSV Loader to Document Loaders Sep 19, 2024
@Tachikoma000 Tachikoma000 changed the title feat(Loaders): CSV Loader to Document Loaders feat(loaders): CSV Loader to Document Loaders Sep 19, 2024
@cvauclair
Contributor

cvauclair commented Oct 14, 2024

This is being (re-)implemented in #55

@cvauclair cvauclair closed this Oct 14, 2024
Development

Successfully merging this pull request may close these issues.

feat: CSV loader support
2 participants