Commit
* wip: putting together the scaffolding
* demo working
* wip: examples
* added examples command
* wip
* setting interface output colors here results in ascii chars sent to redis
* added gems
* added example
* added provider placeholder
* wip: flowise api
* added helper module from monadic-chat, wip: flowise api working
* added setup instructions for python libs
* wip ToT example, workflow architecture
* added original example
* moved cartridges to nano-bot registry
* wip
* added singleton class for spacy tasks
* wip ERROR -- : No valid words found in the provided documents
* Moved the require statement for text_processing_workflow to after other component requires.
* Changed logging level from DEBUG to INFO.
  Commented out most of the binding.pry breakpoints.
  Updated the AdvancedAnalysisTask:
  - Modified the file path for the advanced_analysis_cartridge.yml.
  - Changed the prompt for analysis to generate a short narrative.
* Added more detailed logging during document processing.
  Modified the training process:
  - Now trains in iterations, printing progress.
  - Outputs more detailed model statistics.

  Updated the infer_topics method:
  - Now uses make_doc method.
  - Handles case where topic inference fails.
  - Identifies and returns the most probable topic.
  - Prints full topic distribution.
* Removed unused imports and dependencies,
  Reorganized require statements in flowbots.rb,
  Deleted topic_modeler.rb file,
  Simplified TextProcessor and TextSegmenter classes,
  Updated TextProcessingWorkflow to use get_topics,
  Removed Redis initialization from WorkflowOrchestrator
* - Modularized topic modeling functionality
  - Improved error handling and logging
  - Updated Docker configuration
  - Removed unused segmentation code
  - Enhanced configuration management
  - Adjusted file paths and dependencies
  - Updated nano-bots submodule
* - Extracted train_model and infer_topics methods
  - Improved error handling and logging throughout
  - Removed redundant code and improved readability
  - Added logger initialization in the constructor
* adding tty-box functions
* moved workflows, renamed components
* future utils
* wip: error handler
* wip: ui
* added error handling cartridge
* separated cli module from main
* a nice and accurate exceptionhandler :)
* snapshot
* wip: almost back together
* working in ohm
* adjusted to output exception reports in markdown
* 1. ExceptionAgent improvements:
     - Removed the "Relevant Files" section from exception reports, simplifying the output.
  2. TopicModelProcessor enhancements:
     - Improved model loading and creation process with a new `load_or_create_model` method.
     - Enhanced `process` method to handle empty documents and ensure model existence.
     - Improved `train_model` method with better handling of empty documents and word counting.
     - Added more robust error handling and logging throughout.
     - Improved `save_model` method with checks for directory existence, write permissions, and disk space.
     - Enhanced `store_topics` method with better error handling and logging.
  3. Task structure changes:
     - Modified the base `Task` class to no longer inherit from `Jongleur::WorkerTask`.
     - Updated specific task classes (LlmAnalysisTask, NlpAnalysisTask, TopicModelingTask) to inherit directly from `Jongleur::WorkerTask`.
  4. UI improvements:
     - Simplified the `info` method in the UI module.
  5. TextProcessingWorkflow updates:
     - Commented out some workflow steps (process_input, run_nlp_analysis, run_topic_modeling) in the `execute` method.
     - Changed logging to use UI.info instead of logger.info in the `run_workflow` method.
* set messages to print
* added cartridges
* snapshot: working
* wip: created task to display results
* removed redundant includes
* assets
* added cartridges
* set text segmentation as its own task
* added Fileloader task, added tokenizer, adjusted ohm models
* wip: working, set batch_size or else large datasets overflow mem
* Refactor topic modeling workflow and improve text processing pipeline

  This commit significantly updates the topic modeling workflow and text processing pipeline, improving efficiency and adding new features:

  1. TopicModelTrainerWorkflow:
     - Implement batch processing with BATCH_SIZE constant
     - Add flush_redis_cache method for clean slate processing
     - Refactor process_files method to handle batches
     - Implement train_topic_model method with cleaning and filtering
     - Add clean_segments_for_modeling method to improve data quality
  2. Task Updates:
     - Modify LoadTextFilesTask to process single files
     - Update TextSegmentTask, TokenizeSegmentsTask, and NlpAnalysisTask for single file processing
     - Refactor FilterSegmentsTask with improved logging and error handling
     - Add AccumulateFilteredSegmentsTask for batch accumulation
     - Update TrainTopicModelTask to handle accumulated segments
  3. LLM Analysis:
     - Refactor LlmAnalysisTask to use preprocessed content and file metadata
     - Implement generate_analysis_prompt method for better LLM input
  4. UI Improvements:
     - Add BoxUI module with side_by_side_boxes method for improved result display
     - Update DisplayResultsTask to use new BoxUI for better visualization
  5. NLP Processing:
     - Refactor NLPProcessor to return more detailed token information
     - Update NlpAnalysisTask to handle new NLP processor output
  6. Miscellaneous:
     - Remove unused code and comments
     - Update error handling and logging across multiple files
     - Improve code organization and readability

  This refactoring enhances the workflow's ability to handle large datasets efficiently, improves the quality of topic modeling input, and provides better visualization of results.
* added treetop grammar, working on clean interrupt
* wip: grammar parser
* Refactor text processing workflow and improve YAML front matter parsing

  - Update GrammarProcessor to use Treetop grammar file
  - Simplify markdown_yaml.treetop grammar for better YAML parsing
  - Enhance PreprocessTextFileTask with improved error handling and logging
  - Modify TextSegmentTask to use preprocessed content
  - Add parallel processing support to flowbots.rb
  - Update CLI to use TopicModelTrainerWorkflow instead of test version
  - Improve error logging and context in GrammarProcessor
  - Enhance WorkflowOrchestrator cleanup process

  This commit significantly improves the text processing pipeline, particularly in handling YAML front matter in Markdown files. It also adds better error handling and logging throughout the workflow.
* added sublayer gem
* wip: text processing ohm models
* renamed textfile to Sourcefile
* wip: create new ohm objects
* Key Components and Changes:
  1. OhmModels.rb:
     - Introduction of OhmIndexManager module for managing Ohm model indices
     - Updates to Workflow, Task, and Sourcefile models
     - New methods for index management and file processing
  2. WorkflowOrchestrator.rb:
     - Modifications to setup_workflow and run_workflow methods
     - Improved error handling and logging
  3. flowbots.rb:
     - Addition of init_redis method for Redis setup
     - Removal of direct Redis configuration in favor of environment variables
     - Introduction of more specific error classes
  4. New Components:
     - ExceptionHandler.rb for structured exception handling
     - FileLoader.rb for file processing and data storage
     - WorkflowAgent.rb for representing workflow agents
  5. Task Updates:
     - Various task files updated or added (e.g., PreprocessTextFileTask, NlpAnalysisTask)
     - Tasks now work with updated Ohm models and new workflow structure
  6. Workflow Updates:
     - TextProcessingWorkflow and TopicModelTrainerWorkflow adapted to new components and models
  7. CLI Updates:
     - New commands and improved error handling
* readme updates
* last for the day
* wip: full wf run
* one nice thing, each error has been different.
* edits
* this one just to save, not working
* check
* edits
* strip and refactor time!
* Update README.md
* adding exception reports for posterity
* leaving examples
* this works at least
* update readme
* set preprocess task to get the current_textfile_id in the workflow
* add engtagger task wip: text compressor
* added rdocs
* documentation
* extras
* fix: linear logic for detecting file type
* wip
* Refactor tasks and implement uniform input retrieval (Epics 1 & 2)
* added lemmas ohm model
* ui improvements
* UI improvements
* cartridge updates
* ui improvements
* adjusted readme
* submodule update
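The batch processing mentioned above (a BATCH_SIZE constant, and the note that large datasets overflow memory without a batch size) could look roughly like the sketch below. It is only an illustration of the general idea, not the repository's code: the method name `process_in_batches`, the batch size of 3, and the stand-in block that produces segments are all assumptions made for this example.

```ruby
# Hypothetical value; the commit names a BATCH_SIZE constant but not its size.
BATCH_SIZE = 3

# Walk the file list in fixed-size batches so that only one batch of
# segments is held in memory at a time. The block stands in for the
# per-file work (load, segment, tokenize, filter) and must return an
# array of segments for the given path.
def process_in_batches(file_paths, batch_size: BATCH_SIZE)
  batch_counts = []
  file_paths.each_slice(batch_size) do |batch|
    segments = batch.flat_map { |path| yield(path) }
    batch_counts << segments.size
    # A real workflow would pass `segments` to the trainer here and then
    # let them go out of scope before the next batch is loaded.
  end
  batch_counts
end

paths = %w[a.md b.md c.md d.md e.md]
counts = process_in_batches(paths) do |path|
  ["#{path}: segment 1", "#{path}: segment 2"]
end
```

With five files and a batch size of three, the files are handled in two batches, so `counts` holds one segment count per batch rather than one per file.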