Development (#12)

* wip: putting together the scaffolding
* demo working
* wip: examples
* added examples command
* wip
* setting interface output colors here results in ascii chars sent to redis
* added gems
* added example
* added provider placeholder
* wip: flowise api
* added helper module from monadic-chat, wip: flowise api working
* added setup instructions for python libs
* wip ToT example, workflow architecture
* added original example
* moved cartridges to nano-bot registry
* wip
* added singleton class for spacy tasks
* wip ERROR -- : No valid words found in the provided documents
* Moved the require statement for text_processing_workflow to after other component requires.
* Changed logging level from DEBUG to INFO.
  Commented out most of the binding.pry breakpoints.
  Updated the AdvancedAnalysisTask:
    Modified the file path for the advanced_analysis_cartridge.yml.
    Changed the prompt for analysis to generate a short narrative.
* Added more detailed logging during document processing.
  Modified the training process:
    Now trains in iterations, printing progress.
    Outputs more detailed model statistics.
  Updated the infer_topics method:
    Now uses make_doc method.
    Handles case where topic inference fails.
    Identifies and returns the most probable topic.
    Prints full topic distribution.
* Removed unused imports and dependencies,
  Reorganized require statements in flowbots.rb,
  Deleted topic_modeler.rb file,
  Simplified TextProcessor and TextSegmenter classes,
  Updated TextProcessingWorkflow to use get_topics,
  Removed Redis initialization from WorkflowOrchestrator
* - Modularized topic modeling functionality
  - Improved error handling and logging
  - Updated Docker configuration
  - Removed unused segmentation code
  - Enhanced configuration management
  - Adjusted file paths and dependencies
  - Updated nano-bots submodule
* - Extracted train_model and infer_topics methods
  - Improved error handling and logging throughout
  - Removed redundant code and improved readability
  - Added logger initialization in the constructor
* adding tty-box functions
* moved workflows, renamed components
* future utils
* wip: error handler
* wip: ui
* added error handling cartridge
* separated cli module from main
* a nice and accurate exceptionhandler :)
* snapshot
* wip: almost back together
* working in ohm
* adjusted to output exception reports in markdown
* 1. ExceptionAgent improvements:
     - Removed the "Relevant Files" section from exception reports, simplifying the output.
  2. TopicModelProcessor enhancements:
     - Improved model loading and creation process with a new `load_or_create_model` method.
     - Enhanced `process` method to handle empty documents and ensure model existence.
     - Improved `train_model` method with better handling of empty documents and word counting.
     - Added more robust error handling and logging throughout.
     - Improved `save_model` method with checks for directory existence, write permissions, and disk space.
     - Enhanced `store_topics` method with better error handling and logging.
  3. Task structure changes:
     - Modified the base `Task` class to no longer inherit from `Jongleur::WorkerTask`.
     - Updated specific task classes (LlmAnalysisTask, NlpAnalysisTask, TopicModelingTask) to inherit directly from `Jongleur::WorkerTask`.
  4. UI improvements:
     - Simplified the `info` method in the UI module.
  5. TextProcessingWorkflow updates:
     - Commented out some workflow steps (process_input, run_nlp_analysis, run_topic_modeling) in the `execute` method.
     - Changed logging to use UI.info instead of logger.info in the `run_workflow` method.
* set messages to print
* added cartridges
* snapshot: working
* wip: created task to display results
* removed redundant includes
* assets
* added cartridges
* set text segmentation as its own task
* added Fileloader task, added tokenizer, adjusted ohm models
* wip: working, set batch_size or else large datasets overflow mem
* Refactor topic modeling workflow and improve text processing pipeline

  This commit significantly updates the topic modeling workflow and text processing pipeline, improving efficiency and adding new features:

  1. TopicModelTrainerWorkflow:
     - Implement batch processing with BATCH_SIZE constant
     - Add flush_redis_cache method for clean slate processing
     - Refactor process_files method to handle batches
     - Implement train_topic_model method with cleaning and filtering
     - Add clean_segments_for_modeling method to improve data quality
  2. Task Updates:
     - Modify LoadTextFilesTask to process single files
     - Update TextSegmentTask, TokenizeSegmentsTask, and NlpAnalysisTask for single file processing
     - Refactor FilterSegmentsTask with improved logging and error handling
     - Add AccumulateFilteredSegmentsTask for batch accumulation
     - Update TrainTopicModelTask to handle accumulated segments
  3. LLM Analysis:
     - Refactor LlmAnalysisTask to use preprocessed content and file metadata
     - Implement generate_analysis_prompt method for better LLM input
  4. UI Improvements:
     - Add BoxUI module with side_by_side_boxes method for improved result display
     - Update DisplayResultsTask to use new BoxUI for better visualization
  5. NLP Processing:
     - Refactor NLPProcessor to return more detailed token information
     - Update NlpAnalysisTask to handle new NLP processor output
  6. Miscellaneous:
     - Remove unused code and comments
     - Update error handling and logging across multiple files
     - Improve code organization and readability

  This refactoring enhances the workflow's ability to handle large datasets efficiently, improves the quality of topic modeling input, and provides better visualization of results.
* added treetop grammar, working on clean interrupt
* wip: grammar parser
* Refactor text processing workflow and improve YAML front matter parsing

  - Update GrammarProcessor to use Treetop grammar file
  - Simplify markdown_yaml.treetop grammar for better YAML parsing
  - Enhance PreprocessTextFileTask with improved error handling and logging
  - Modify TextSegmentTask to use preprocessed content
  - Add parallel processing support to flowbots.rb
  - Update CLI to use TopicModelTrainerWorkflow instead of test version
  - Improve error logging and context in GrammarProcessor
  - Enhance WorkflowOrchestrator cleanup process

  This commit significantly improves the text processing pipeline, particularly in handling YAML front matter in Markdown files. It also adds better error handling and logging throughout the workflow.
* added sublayer gem
* wip: text processing ohm models
* renamed textfile to Sourcefile
* wip: create new ohm objects
* Key Components and Changes:

  1. OhmModels.rb:
     - Introduction of OhmIndexManager module for managing Ohm model indices
     - Updates to Workflow, Task, and Sourcefile models
     - New methods for index management and file processing
  2. WorkflowOrchestrator.rb:
     - Modifications to setup_workflow and run_workflow methods
     - Improved error handling and logging
  3. flowbots.rb:
     - Addition of init_redis method for Redis setup
     - Removal of direct Redis configuration in favor of environment variables
     - Introduction of more specific error classes
  4. New Components:
     - ExceptionHandler.rb for structured exception handling
     - FileLoader.rb for file processing and data storage
     - WorkflowAgent.rb for representing workflow agents
  5. Task Updates:
     - Various task files updated or added (e.g., PreprocessTextFileTask, NlpAnalysisTask)
     - Tasks now work with updated Ohm models and new workflow structure
  6. Workflow Updates:
     - TextProcessingWorkflow and TopicModelTrainerWorkflow adapted to new components and models
  7. CLI Updates:
     - New commands and improved error handling
* readme updates
* last for the day
* wip: full wf run
* one nice thing, each error has been different.
* edits
* this one just to save, not working
* check
* edits
* strip and refactor time!
* Update README.md
* adding exception reports for posterity
* leaving examples
* this works at least
* update readme
* set preprocess task to get the current_textfile_id in the workflow
* add engtagger task
  wip: text compressor
* added rdocs
* documentation
* extras
* fix: linear logic for detecting file type
* wip
* Refactor tasks and implement uniform input retrieval (Epics 1 & 2)
* added lemmas ohm model
* ui improvements
* UI improvements
* cartridge updates
* ui improvements
* adjusted readme
* submodule update
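
The commit message mentions batch processing with a `BATCH_SIZE` constant so that large datasets do not overflow memory. The workflow code itself is not part of this excerpt, so the following is only a minimal sketch of the general idea. The method name `process_in_batches`, the value of `BATCH_SIZE`, and the block interface are assumptions made for illustration, not the actual `TopicModelTrainerWorkflow` implementation.

```ruby
# Hypothetical illustration; not the code from TopicModelTrainerWorkflow.
BATCH_SIZE = 10 # assumed value; the real constant's value is not shown here

# Yields the given paths in groups of at most `batch_size`, so the caller
# only has to hold one batch of documents in memory at a time.
# Returns the total number of paths that were handed to the block.
def process_in_batches(file_paths, batch_size: BATCH_SIZE)
  processed = 0
  file_paths.each_slice(batch_size).with_index(1) do |batch, batch_number|
    yield batch, batch_number
    processed += batch.size
  end
  processed
end
```

A caller would pass the per-batch work as a block, for example `process_in_batches(paths) { |batch, n| puts "batch #{n}: #{batch.size} files" }`, and could release or flush any per-batch state before the next slice is yielded.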
b08x · Sep 25, 2024 · bcc7c3c
1 parent b508e5e
commit bcc7c3c
Showing 169 changed files with 13,130 additions and 129 deletions.
diff --git a/Rakefile b/Rakefile
@@ -189,3 +189,105 @@ Rake::RDocTask.new do |rdoc|
   rdoc.rdoc_files.include "lib/workflows/topic_model_trainer_workflow.rb"
   rdoc.rdoc_files.include "lib/workflows/topic_model_trainer_workflowtest.rb"
 end
+
+desc "Build all images"
+task "build-all" do
+  ALL_IMAGES.each do |image|
+    Rake::Task["build/#{image}"].invoke
+  end
+end
+
+desc "Tag all images"
+task "tag-all" do
+  ALL_IMAGES.each do |image|
+    Rake::Task["tag/#{image}"].invoke
+  end
+end
+
+desc "Push all images"
+task "push-all" do
+  ALL_IMAGES.each do |image|
+    Rake::Task["push/#{image}"].invoke
+  end
+end
+
+Rake::RDocTask.new do |rdoc|
+  rdoc.title    = "flowbots v0.1"
+  rdoc.rdoc_dir = "#{APP_ROOT}/doc"
+  rdoc.options += [
+    "-w",
+    "2",
+    "-H",
+    "-A",
+    "-f",
+    "darkfish", # This bit
+    "-m",
+    "README.md",
+    "--visibility",
+    "nodoc",
+    "--markup",
+    "markdown"
+  ]
+  rdoc.rdoc_files.include "README.md"
+  rdoc.rdoc_files.include "LICENSE"
+  rdoc.rdoc_files.include "exe/flowbots"
+
+  rdoc.rdoc_files.include "lib/api.rb"
+  rdoc.rdoc_files.include "lib/cli.rb"
+  rdoc.rdoc_files.include "lib/flowbots.rb"
+  rdoc.rdoc_files.include "lib/helper.rb"
+  rdoc.rdoc_files.include "lib/logging.rb"
+  rdoc.rdoc_files.include "lib/tasks.rb"
+  rdoc.rdoc_files.include "lib/ui.rb"
+  rdoc.rdoc_files.include "lib/workflows.rb"
+
+  rdoc.rdoc_files.include "lib/integrations/flowise.rb"
+
+  rdoc.rdoc_files.include "lib/processors/GrammarProcessor.rb"
+  rdoc.rdoc_files.include "lib/processors/NLPProcessor.rb"
+  rdoc.rdoc_files.include "lib/processors/TextProcessor.rb"
+  rdoc.rdoc_files.include "lib/processors/TextSegmentProcessor.rb"
+  rdoc.rdoc_files.include "lib/processors/TextTaggerProcessor.rb"
+  rdoc.rdoc_files.include "lib/processors/TextTokenizeProcessor.rb"
+  rdoc.rdoc_files.include "lib/processors/TopicModelProcessor.rb"
+
+  rdoc.rdoc_files.include "lib/tasks/accumulate_filtered_segments_task.rb"
+  rdoc.rdoc_files.include "lib/tasks/display_results_task.rb"
+  rdoc.rdoc_files.include "lib/tasks/file_loader_task.rb"
+  rdoc.rdoc_files.include "lib/tasks/filter_segments_task.rb"
+  rdoc.rdoc_files.include "lib/tasks/llm_analysis_task.rb"
+  rdoc.rdoc_files.include "lib/tasks/load_text_files_task.rb"
+  rdoc.rdoc_files.include "lib/tasks/nlp_analysis_task.rb"
+  rdoc.rdoc_files.include "lib/tasks/preprocess_text_file_task.rb"
+  rdoc.rdoc_files.include "lib/tasks/text_segment_task.rb"
+  rdoc.rdoc_files.include "lib/tasks/text_tagger_task.rb"
+  rdoc.rdoc_files.include "lib/tasks/text_tokenize_task.rb"
+  rdoc.rdoc_files.include "lib/tasks/tokenize_segments_task.rb"
+  rdoc.rdoc_files.include "lib/tasks/topic_modeling_task.rb"
+  rdoc.rdoc_files.include "lib/tasks/train_topic_model_task.rb"
+
+  rdoc.rdoc_files.include "lib/components/ExceptionAgent.rb"
+  rdoc.rdoc_files.include "lib/components/ExceptionHandler.rb"
+  rdoc.rdoc_files.include "lib/components/FileLoader.rb"
+  rdoc.rdoc_files.include "lib/components/OhmModels.rb"
+  rdoc.rdoc_files.include "lib/components/WorkflowAgent.rb"
+  rdoc.rdoc_files.include "lib/components/WorkflowOrchestrator.rb"
+  rdoc.rdoc_files.include "lib/components/word_salad.rb"
+
+  rdoc.rdoc_files.include "lib/grammars/markdown_yaml.rb"
+
+  rdoc.rdoc_files.include "lib/utils/command.rb"
+  rdoc.rdoc_files.include "lib/utils/transcribe.rb"
+  rdoc.rdoc_files.include "lib/utils/tts.rb"
+  rdoc.rdoc_files.include "lib/utils/writefile.rb"
+
+  rdoc.rdoc_files.include "lib/workflows/text_processing_workflow.rb"
+  rdoc.rdoc_files.include "lib/workflows/topic_model_trainer_workflow.rb"
+  rdoc.rdoc_files.include "lib/workflows/topic_model_trainer_workflowtest.rb"
+end
+
+Gokdok::Dokker.new do |gd|
+  gd.remote_path = "" # Put into the root directory
+  gd.repo_url = "git@github.com:b08x/flowbots.git"
+  gd.doc_home = "#{APP_ROOT}/doc"
+end
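
The aggregate tasks added in this hunk (`build-all`, `tag-all`, `push-all`) look up per-image tasks such as `build/<image>` and iterate over `ALL_IMAGES`, neither of which is defined in the lines shown. A minimal sketch of how such tasks fit together, assuming a hypothetical image list and stub task bodies (the real definitions live elsewhere in the Rakefile and are not visible here):

```ruby
require "rake"
extend Rake::DSL # makes `task` and `desc` available outside a Rakefile

# Hypothetical image list; the real ALL_IMAGES is defined elsewhere.
ALL_IMAGES = %w[app worker].freeze
# Records which stub tasks ran, instead of shelling out to a container tool.
BUILD_LOG = []

# One per-image task, named the way the aggregate task expects to find it.
ALL_IMAGES.each do |image|
  desc "Build #{image} (stub)"
  task "build/#{image}" do
    BUILD_LOG << image
  end
end

# Same shape as the task in the diff: invoke each per-image task by name.
desc "Build all images"
task "build-all" do
  ALL_IMAGES.each { |image| Rake::Task["build/#{image}"].invoke }
end

Rake::Task["build-all"].invoke
```

Note that `invoke` runs a given task at most once per Rake session, so invoking `build-all` a second time does nothing unless the tasks are re-enabled with `reenable`.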
diff --git a/compressed_prompt_test.rb b/compressed_prompt_test.rb
@@ -1,20 +1,20 @@
 
 **Evaluation Test:**
 
-Now, to evaluate the effectiveness of the condensed prompt, I will employ various criteria: 
+Now, to evaluate the effectiveness of the condensed prompt, I will employ various criteria:
 
-- Clarity: Does the optimized version provide a clear and concise explanation of the benefits of using a visual representation? Does it eliminate ambiguity and ensure that the message is easily understandable? 
+- Clarity: Does the optimized version provide a clear and concise explanation of the benefits of using a visual representation? Does it eliminate ambiguity and ensure that the message is easily understandable?
 
-Evaluation: The optimized version excels in clarity. By elaborating on the advantages of the visual representation, it provides a more explicit explanation. The use of phrases like "elucidating the inherent sequential nature" and "providing insights into the flow of data" offers a precise understanding of how visuals aid in comprehension and communication. The message is straightforward and free from ambiguity, making it easily comprehensible to the target audience. 
+Evaluation: The optimized version excels in clarity. By elaborating on the advantages of the visual representation, it provides a more explicit explanation. The use of phrases like "elucidating the inherent sequential nature" and "providing insights into the flow of data" offers a precise understanding of how visuals aid in comprehension and communication. The message is straightforward and free from ambiguity, making it easily comprehensible to the target audience.
 
-- Completeness: Does the optimized version retain all the crucial information from the original prompt? Does it provide a comprehensive understanding of the topic without omitting relevant details? 
+- Completeness: Does the optimized version retain all the crucial information from the original prompt? Does it provide a comprehensive understanding of the topic without omitting relevant details?
 
-Evaluation: The optimized version effectively maintains the completeness of information. It expands on the ideas presented in the original prompt, offering a more detailed yet concise explanation. The addition of phrases like "distinct process stages" and "shared understanding of the system's architecture" ensures that the full scope of the visual representation's utility is conveyed. No relevant details have been omitted, and the optimized version successfully captures the core message of the original prompt. 
+Evaluation: The optimized version effectively maintains the completeness of information. It expands on the ideas presented in the original prompt, offering a more detailed yet concise explanation. The addition of phrases like "distinct process stages" and "shared understanding of the system's architecture" ensures that the full scope of the visual representation's utility is conveyed. No relevant details have been omitted, and the optimized version successfully captures the core message of the original prompt.
 
-- Eliciting Desired Responses: Will the optimized version be more effective in eliciting the desired response from the AI assistant or model? Does it provide a clear directive, enabling the AI to generate a more accurate and contextually appropriate response? 
+- Eliciting Desired Responses: Will the optimized version be more effective in eliciting the desired response from the AI assistant or model? Does it provide a clear directive, enabling the AI to generate a more accurate and contextually appropriate response?
 
-Evaluation: The optimized version is designed to elicit a more focused and accurate response from the AI assistant. By providing additional context and clarity, the AI has a better understanding of the specific benefits attributed to the visual representation. The use of phrases like "graphical depiction," "shared understanding," and "communication of complex ideas" offers a clear framework for the AI to generate a response that aligns with the prompt's intent. The optimized version reduces potential ambiguity and enhances the likelihood of receiving a contextually relevant and high-quality response from the AI. 
+Evaluation: The optimized version is designed to elicit a more focused and accurate response from the AI assistant. By providing additional context and clarity, the AI has a better understanding of the specific benefits attributed to the visual representation. The use of phrases like "graphical depiction," "shared understanding," and "communication of complex ideas" offers a clear framework for the AI to generate a response that aligns with the prompt's intent. The optimized version reduces potential ambiguity and enhances the likelihood of receiving a contextually relevant and high-quality response from the AI.
 
-Overall Conclusion: 
+Overall Conclusion:
 
 Based on the evaluation test, the optimized version of the prompt demonstrates superior effectiveness compared to the original. It achieves a higher standard of clarity by providing explicit and detailed explanations while maintaining the completeness of the information conveyed. The optimized version is also tailored to elicit more accurate and contextually appropriate responses from AI assistants or models, ensuring a more productive and efficient interaction. This comprehensive test underscores the value of careful prompt design and analysis, highlighting the potential for enhanced AI performance and output quality.