moving current wip to main #6

Merged: b08x merged 78 commits into main from topicmodeler on Sep 25, 2024

@b08x (Owner) commented Sep 25, 2024

No description provided.

b08x added 30 commits July 5, 2024 17:44
…inding.pry breakpoints.
Updated the AdvancedAnalysisTask:
- Modified the file path for advanced_analysis_cartridge.yml.
- Changed the analysis prompt to generate a short narrative.
…training process:
- Now trains in iterations, printing progress.
- Outputs more detailed model statistics.

Updated the infer_topics method:
- Now uses the make_doc method.
- Handles the case where topic inference fails.
- Identifies and returns the most probable topic.
- Prints the full topic distribution.
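
The infer_topics behaviour described in this commit (handle a failed inference, print the full distribution, return the most probable topic) could look roughly like the sketch below. It is not the code from this PR: `most_probable_topic` is a hypothetical helper, and the distribution is assumed to arrive as a plain Array of Floats, with `nil` or an empty array standing in for a failed inference.

```ruby
# Hypothetical helper, not taken from the PR.
# `distribution` is assumed to be an Array of Floats (one weight per topic),
# or nil/empty when topic inference failed.
def most_probable_topic(distribution)
  # Treat a missing or empty distribution as a failed inference.
  return nil if distribution.nil? || distribution.empty?

  # Print the full topic distribution, one line per topic.
  distribution.each_with_index do |prob, topic_id|
    puts format("topic %d: %.4f", topic_id, prob)
  end

  # each_with_index yields [probability, index] pairs; keep the pair
  # with the largest probability.
  prob, topic_id = distribution.each_with_index.max_by { |p, _i| p }
  { topic_id: topic_id, probability: prob }
end
```

Returning `nil` on failure lets the caller decide whether to skip the document or log the problem, instead of raising from inside the helper.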
…ts in flowbots.rb
- Deleted topic_modeler.rb file
- Simplified TextProcessor and TextSegmenter classes
- Updated TextProcessingWorkflow to use get_topics
- Removed Redis initialization from WorkflowOrchestrator
- Improved error handling and logging
- Updated Docker configuration
- Removed unused segmentation code
- Enhanced configuration management
- Adjusted file paths and dependencies
- Updated nano-bots submodule
- Improved error handling and logging throughout
- Removed redundant code and improved readability
- Added logger initialization in the constructor
b08x added 28 commits July 27, 2024 06:34
- Update GrammarProcessor to use Treetop grammar file
- Simplify markdown_yaml.treetop grammar for better YAML parsing
- Enhance PreprocessTextFileTask with improved error handling and logging
- Modify TextSegmentTask to use preprocessed content
- Add parallel processing support to flowbots.rb
- Update CLI to use TopicModelTrainerWorkflow instead of test version
- Improve error logging and context in GrammarProcessor
- Enhance WorkflowOrchestrator cleanup process

This commit significantly improves the text processing pipeline, 
particularly in handling YAML front matter in Markdown files. It also 
adds better error handling and logging throughout the workflow.
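
To illustrate what handling YAML front matter in a Markdown file involves, here is a minimal sketch. It deliberately swaps in a regular expression plus Ruby's standard `yaml` library rather than the Treetop grammar this commit uses, and `split_front_matter` and `FRONT_MATTER` are hypothetical names that do not come from the PR.

```ruby
require "yaml"

# Matches a leading block delimited by lines that contain only "---".
# (Assumption: front matter always starts on the first line of the file.)
FRONT_MATTER = /\A---[ \t]*\r?\n(.*?)\r?\n---[ \t]*(?:\r?\n|\z)/m

# Hypothetical helper: returns [metadata_hash, markdown_body].
def split_front_matter(text)
  match = FRONT_MATTER.match(text)
  return [{}, text] unless match

  begin
    metadata = YAML.safe_load(match[1])
  rescue Psych::Exception => e
    # Log the problem and fall back to treating the whole file as body.
    warn "front matter could not be parsed: #{e.message}"
    return [{}, text]
  end

  # Front matter that is empty or not a mapping is treated as no metadata.
  metadata = {} unless metadata.is_a?(Hash)
  [metadata, match.post_match]
end
```

For example, `split_front_matter("---\ntitle: Hello\n---\n# Body\n")` separates the `title` key from the Markdown body, while a file without front matter comes back unchanged with an empty hash.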
Renaming the Textfile model to FileObject.
Updating all references to Textfile to FileObject.
Modifying the FileLoader class to use the FileObject model.
Updating the InputRetrieval module to retrieve FileObject instances.
Adjusting the RedisKeys module to use keys related to FileObject.
Updating tasks and workflows to use the FileObject model.
- Reduce log file max size to 2,145,728 bytes
- Increase max number of log files to 100
- Comment out flush_redis_cache in unified_file_processing
- Add batch mode to TextProcessingWorkflow
- Implement separate processing for batch and single file modes
- Add methods for fetching unprocessed file IDs and creating/fetching file objects
- Update perform_additional_tasks to work with specific file IDs
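
The log rotation limits mentioned in the commits above (a 2,145,728-byte maximum file size and up to 100 log files) map directly onto the size-based rotation in Ruby's standard `Logger`. The sketch below assumes the project uses that class, which the PR does not confirm; `build_rotating_logger` and the constant names are hypothetical.

```ruby
require "logger"

# Values taken from the commit message; the names are illustrative only.
MAX_LOG_FILES = 100        # number of rotated log files to keep
MAX_LOG_BYTES = 2_145_728  # size in bytes at which the current file is rotated

# Hypothetical helper: builds a logger that rotates by size.
def build_rotating_logger(path)
  # Logger.new(logdev, shift_age, shift_size): with an Integer shift_age,
  # Logger keeps that many files and rotates once shift_size is exceeded.
  Logger.new(path, MAX_LOG_FILES, MAX_LOG_BYTES)
end
```

A caller would then write `logger = build_rotating_logger("flowbots.log")` (the file name here is an assumption) and log as usual with `logger.info`.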
@b08x b08x merged commit 844ca60 into main Sep 25, 2024
0 of 5 checks passed
@b08x b08x deleted the topicmodeler branch September 25, 2024 12:08
1 participant