From bcc7c3c34406d378d830e79d1d717e9899ce226e Mon Sep 17 00:00:00 2001 From: Robert Pannick Date: Wed, 25 Sep 2024 08:47:20 -0400 Subject: [PATCH] Development (#12) * wip: putting together the scaffolding * demo working * wip: examples * added examples command * wip * setting interface output colors here results in ascii chars sent to redis * added gems * added example * added provider placeholder * wip: flowise api * added helper module from monadic-chat, wip: flowise api working * added setup instructions for python libs * wip ToT example, workflow architecture * added original example * moved cartridges to nano-bot registry * wip * added singleton class for spacy tasks * wip ERROR -- : No valid words found in the provided documents * Moved the require statement for text_processing_workflow to after other component requires. * Changed logging level from DEBUG to INFO. Commented out most of the binding.pry breakpoints. Updated the AdvancedAnalysisTask: Modified the file path for the advanced_analysis_cartridge.yml. Changed the prompt for analysis to generate a short narrative. * Added more detailed logging during document processing. Modified the training process: Now trains in iterations, printing progress. Outputs more detailed model statistics. Updated the infer_topics method: Now uses make_doc method. Handles case where topic inference fails. Identifies and returns the most probable topic. Prints full topic distribution. 
* Removed unused imports and dependencies, Reorganized require statements in flowbots.rb, Deleted topic_modeler.rb file, Simplified TextProcessor and TextSegmenter classes, Updated TextProcessingWorkflow to use get_topics, Removed Redis initialization from WorkflowOrchestrator * - Modularized topic modeling functionality - Improved error handling and logging - Updated Docker configuration - Removed unused segmentation code - Enhanced configuration management - Adjusted file paths and dependencies - Updated nano-bots submodule * - Extracted train_model and infer_topics methods - Improved error handling and logging throughout - Removed redundant code and improved readability - Added logger initialization in the constructor * adding tty-box functions * moved workflows, renamed components * future utils * wip: error handler * wip: ui * added error handling cartridge * separated cli module from main * a nice and accurate exceptionhandler :) * snapshot * wip: almost back together * working in ohm * adjusted to output exception reports in markdown * 1. ExceptionAgent improvements: - Removed the "Relevant Files" section from exception reports, simplifying the output. 2. TopicModelProcessor enhancements: - Improved model loading and creation process with a new `load_or_create_model` method. - Enhanced `process` method to handle empty documents and ensure model existence. - Improved `train_model` method with better handling of empty documents and word counting. - Added more robust error handling and logging throughout. - Improved `save_model` method with checks for directory existence, write permissions, and disk space. - Enhanced `store_topics` method with better error handling and logging. 3. Task structure changes: - Modified the base `Task` class to no longer inherit from `Jongleur::WorkerTask`. - Updated specific task classes (LlmAnalysisTask, NlpAnalysisTask, TopicModelingTask) to inherit directly from `Jongleur::WorkerTask`. 4. 
UI improvements: - Simplified the `info` method in the UI module. 5. TextProcessingWorkflow updates: - Commented out some workflow steps (process_input, run_nlp_analysis, run_topic_modeling) in the `execute` method. - Changed logging to use UI.info instead of logger.info in the `run_workflow` method. * set messages to print * added cartridges * snapshot: working * wip: created task to display results * removed redundant includes * assets * added cartridges * set text segmentation as its own task * added Fileloader task, added tokenizer, adjusted ohm models * wip: working, set batch_size or else large datasets overflow mem * Refactor topic modeling workflow and improve text processing pipeline This commit significantly updates the topic modeling workflow and text processing pipeline, improving efficiency and adding new features: 1. TopicModelTrainerWorkflow: - Implement batch processing with BATCH_SIZE constant - Add flush_redis_cache method for clean slate processing - Refactor process_files method to handle batches - Implement train_topic_model method with cleaning and filtering - Add clean_segments_for_modeling method to improve data quality 2. Task Updates: - Modify LoadTextFilesTask to process single files - Update TextSegmentTask, TokenizeSegmentsTask, and NlpAnalysisTask for single file processing - Refactor FilterSegmentsTask with improved logging and error handling - Add AccumulateFilteredSegmentsTask for batch accumulation - Update TrainTopicModelTask to handle accumulated segments 3. LLM Analysis: - Refactor LlmAnalysisTask to use preprocessed content and file metadata - Implement generate_analysis_prompt method for better LLM input 4. UI Improvements: - Add BoxUI module with side_by_side_boxes method for improved result display - Update DisplayResultsTask to use new BoxUI for better visualization 5. NLP Processing: - Refactor NLPProcessor to return more detailed token information - Update NlpAnalysisTask to handle new NLP processor output 6. 
Miscellaneous: - Remove unused code and comments - Update error handling and logging across multiple files - Improve code organization and readability This refactoring enhances the workflow's ability to handle large datasets efficiently, improves the quality of topic modeling input, and provides better visualization of results. * added treetop grammar, working on clean interrupt * wip: grammar parser * Refactor text processing workflow and improve YAML front matter parsing - Update GrammarProcessor to use Treetop grammar file - Simplify markdown_yaml.treetop grammar for better YAML parsing - Enhance PreprocessTextFileTask with improved error handling and logging - Modify TextSegmentTask to use preprocessed content - Add parallel processing support to flowbots.rb - Update CLI to use TopicModelTrainerWorkflow instead of test version - Improve error logging and context in GrammarProcessor - Enhance WorkflowOrchestrator cleanup process This commit significantly improves the text processing pipeline, particularly in handling YAML front matter in Markdown files. It also adds better error handling and logging throughout the workflow. * added sublayer gem * wip: text processing ohm models * renamed textfile to Sourcefile * wip: create new ohm objects * Key Components and Changes: 1. OhmModels.rb: - Introduction of OhmIndexManager module for managing Ohm model indices - Updates to Workflow, Task, and Sourcefile models - New methods for index management and file processing 2. WorkflowOrchestrator.rb: - Modifications to setup_workflow and run_workflow methods - Improved error handling and logging 3. flowbots.rb: - Addition of init_redis method for Redis setup - Removal of direct Redis configuration in favor of environment variables - Introduction of more specific error classes 4. New Components: - ExceptionHandler.rb for structured exception handling - FileLoader.rb for file processing and data storage - WorkflowAgent.rb for representing workflow agents 5. 
Task Updates: - Various task files updated or added (e.g., PreprocessTextFileTask, NlpAnalysisTask) - Tasks now work with updated Ohm models and new workflow structure 6. Workflow Updates: - TextProcessingWorkflow and TopicModelTrainerWorkflow adapted to new components and models 7. CLI Updates: - New commands and improved error handling * readme updates * last for the day * wip: full wf run * one nice thing, each error has been different. * edits * this one just to save, not working * check * edits * strip and refactor time! * Update README.md * adding exception reports for posterity * leaving examples * this works at least * update readme * set preprocess task to get the current_textfile_id in the workflow * add engtagger task wip: text compressor * added rdocs * documentation * extras * fix: linear logic for detecting file type * wip * Refactor tasks and implement uniform input retrieval (Epics 1 & 2) * added lemmas ohm model * ui improvements * UI improvements * cartridge updates * ui improvements * adjusted readme * submodule update --- Rakefile | 102 ++++ compressed_prompt_test.rb | 16 +- doc/BoxUI.html | 271 +++++++++ doc/Flowbots/UI.html | 285 ++++++++++ doc/PreprocessTextFileTask.html | 293 ++++++++++ doc/TextTaggerTask.html | 15 +- doc/TextTokenizeTask.html | 15 +- doc/Textfile.html | 456 +++++++++++++++ doc/TokenizeSegmentsTask.html | 15 +- doc/TopicModelingTask.html | 23 +- doc/TrainTopicModelTask.html | 15 +- doc/UIBox.html | 245 ++++++++ doc/WorkflowAgent.html | 19 +- doc/WorkflowOrchestrator.html | 15 +- doc/index.html | 3 +- doc/js/search_index.js | 2 +- doc/js/search_index.js.gz | Bin 14516 -> 10496 bytes doc/table_of_contents.html | 1 - examples/agileBloom.rb | 173 ++++++ examples/crazy-story-gen.rb | 134 +++++ examples/crazy-story-genv2.rb | 130 +++++ examples/llm_analysis.rb | 49 ++ examples/text_analysis_workflow2.rb | 121 ++++ examples/tree-of-thoughts.rb | 233 ++++++++ .../exception_report_20240716_115957.md | 67 +++ 
.../exception_report_20240716_120556.md | 61 ++ .../exception_report_20240716_120827.md | 59 ++ .../exception_report_20240716_121921.md | 63 +++ .../exception_report_20240716_122845.md | 69 +++ .../exception_report_20240716_123021.md | 66 +++ .../exception_report_20240716_123526.md | 64 +++ .../exception_report_20240716_124043.md | 70 +++ .../exception_report_20240716_124418.md | 57 ++ .../exception_report_20240716_125232.md | 62 +++ .../exception_report_20240716_125402.md | 65 +++ .../exception_report_20240716_125459.md | 56 ++ .../exception_report_20240716_130418.md | 60 ++ .../exception_report_20240716_131051.md | 61 ++ .../exception_report_20240716_131202.md | 62 +++ .../exception_report_20240716_131311.md | 62 +++ .../exception_report_20240716_133135.md | 57 ++ .../exception_report_20240716_143838.md | 68 +++ .../exception_report_20240716_171356.md | 66 +++ .../exception_report_20240716_172747.md | 50 ++ .../exception_report_20240716_173704.md | 45 ++ .../exception_report_20240716_173948.md | 44 ++ .../exception_report_20240716_174247.md | 70 +++ .../exception_report_20240716_174346.md | 42 ++ .../exception_report_20240716_174620.md | 48 ++ .../exception_report_20240716_175140.md | 55 ++ .../exception_report_20240716_175200.md | 65 +++ .../exception_report_20240716_175428.md | 89 +++ .../exception_report_20240716_175620.md | 111 ++++ .../exception_report_20240716_180146.md | 54 ++ .../exception_report_20240716_180303.md | 88 +++ .../exception_report_20240716_180746.md | 43 ++ .../exception_report_20240716_181219.md | 54 ++ .../exception_report_20240716_181325.md | 98 ++++ .../exception_report_20240716_181407.md | 47 ++ .../exception_report_20240716_181645.md | 43 ++ .../exception_report_20240716_182656.md | 44 ++ .../exception_report_20240716_183231.md | 55 ++ .../exception_report_20240716_183357.md | 45 ++ .../exception_report_20240716_183619.md | 73 +++ .../exception_report_20240716_183714.md | 44 ++ .../exception_report_20240716_184657.md | 52 ++ 
.../exception_report_20240716_184921.md | 50 ++ .../exception_report_20240716_185055.md | 50 ++ .../exception_report_20240716_185439.md | 236 ++++++++ .../exception_report_20240716_185729.md | 65 +++ .../exception_report_20240716_185959.md | 40 ++ .../exception_report_20240716_190125.md | 78 +++ .../exception_report_20240716_190318.md | 60 ++ .../exception_report_20240716_190433.md | 57 ++ .../exception_report_20240716_191401.md | 68 +++ .../exception_report_20240716_191440.md | 66 +++ .../exception_report_20240716_191750.md | 77 +++ .../exception_report_20240716_191948.md | 43 ++ .../exception_report_20240716_192633.md | 41 ++ .../exception_report_20240716_192932.md | 79 +++ .../exception_report_20240716_232004.md | 57 ++ .../exception_report_20240716_232107.md | 62 +++ .../exception_report_20240716_232628.md | 70 +++ .../exception_report_20240716_233021.md | 162 ++++++ .../exception_report_20240720_191059.md | 134 +++++ .../exception_report_20240720_192539.md | 74 +++ .../exception_report_20240720_192728.md | 125 +++++ .../exception_report_20240720_193431.md | 65 +++ .../exception_report_20240720_193849.md | 109 ++++ .../exception_report_20240720_194725.md | 49 ++ .../exception_report_20240720_220052.md | 41 ++ .../exception_report_20240720_222604.md | 46 ++ .../exception_report_20240720_225122.md | 183 ++++++ .../exception_report_20240720_230245.md | 59 ++ .../exception_report_20240720_230836.md | 65 +++ .../exception_report_20240720_232832.md | 54 ++ .../exception_report_20240727_013422.md | 230 ++++++++ .../exception_report_20240727_013631.md | 80 +++ .../exception_report_20240727_014142.md | 524 ++++++++++++++++++ .../exception_report_20240727_014616.md | 120 ++++ .../exception_report_20240727_014728.md | 143 +++++ .../exception_report_20240727_015016.md | 61 ++ .../exception_report_20240727_015143.md | 90 +++ .../exception_report_20240727_015651.md | 82 +++ .../exception_report_20240727_015940.md | 46 ++ .../exception_report_20240727_020003.md | 84 +++ 
.../exception_report_20240727_020236.md | 83 +++ .../exception_report_20240727_020459.md | 126 +++++ .../exception_report_20240727_020900.md | 115 ++++ .../exception_report_20240727_021209.md | 290 ++++++++++ .../exception_report_20240727_043105.md | 61 ++ .../exception_report_20240727_043421.md | 56 ++ .../exception_report_20240727_043538.md | 70 +++ .../exception_report_20240727_044025.md | 62 +++ .../exception_report_20240727_044106.md | 99 ++++ .../exception_report_20240727_044125.md | 69 +++ .../exception_report_20240727_044153.md | 76 +++ .../exception_report_20240731_105234.md | 36 ++ .../exception_report_20240731_110151.md | 157 ++++++ .../exception_report_20240731_110531.md | 38 ++ .../exception_report_20240731_110832.md | 163 ++++++ .../exception_report_20240731_111121.md | 61 ++ .../exception_report_20240731_111604.md | 41 ++ .../exception_report_20240731_112054.md | 41 ++ .../exception_report_20240731_112600.md | 41 ++ .../exception_report_20240731_112637.md | 97 ++++ .../exception_report_20240731_112734.md | 41 ++ .../exception_report_20240731_112829.md | 65 +++ .../exception_report_20240731_113031.md | 69 +++ .../exception_report_20240731_113239.md | 71 +++ .../exception_report_20240731_114248.md | 65 +++ .../exception_report_20240731_151440.md | 56 ++ .../exception_report_20240731_152047.md | 63 +++ .../exception_report_20240731_152235.md | 56 ++ .../exception_report_20240731_153002.md | 81 +++ .../exception_report_20240731_153050.md | 65 +++ .../exception_report_20240731_153126.md | 65 +++ .../exception_report_20240731_153216.md | 70 +++ .../exception_report_20240731_153821.md | 56 ++ .../exception_report_20240731_154930.md | 73 +++ .../exception_report_20240731_155512.md | 70 +++ .../exception_report_20240731_155905.md | 62 +++ .../exception_report_20240731_161555.md | 58 ++ .../exception_report_20240731_162132.md | 68 +++ .../exception_report_20240731_162437.md | 54 ++ .../exception_report_20240731_165736.md | 71 +++ 
.../exception_report_20240801_072049.md | 38 ++ .../exception_report_20240801_072425.md | 38 ++ .../exception_report_20240801_074652.md | 64 +++ .../exception_report_20240801_074800.md | 59 ++ .../exception_report_20240801_081744.md | 76 +++ .../exception_report_20240801_082408.md | 68 +++ .../exception_report_20240801_123817.md | 35 ++ .../exception_report_20240801_124144.md | 35 ++ .../exception_report_20240801_124201.md | 35 ++ .../exception_report_20240801_124358.md | 35 ++ .../exception_report_20240801_124942.md | 34 ++ .../exception_report_20240801_125243.md | 36 ++ .../exception_report_20240801_131009.md | 35 ++ final_report.md | 110 ++-- lib/components/task_helper.rb | 41 ++ lib/ohm/DocumentProcessing.rb | 97 ++++ lib/tasks/base_task.rb | 27 + lib/tasks/batch_completion_task.rb | 35 ++ lib/tasks/preprocess_text_file_task.og | 32 ++ lib/tasks/preprocess_text_file_task.rb | 56 ++ lib/tasks/workflow_initializer_task.rb | 65 +++ lib/utils/errors.rb | 13 + test/topic_model_trainer_workflowtest.rb | 124 +++++ 169 files changed, 13130 insertions(+), 129 deletions(-) create mode 100644 doc/BoxUI.html create mode 100644 doc/Flowbots/UI.html create mode 100644 doc/PreprocessTextFileTask.html create mode 100644 doc/Textfile.html create mode 100644 doc/UIBox.html create mode 100644 examples/agileBloom.rb create mode 100755 examples/crazy-story-gen.rb create mode 100644 examples/crazy-story-genv2.rb create mode 100644 examples/llm_analysis.rb create mode 100644 examples/text_analysis_workflow2.rb create mode 100644 examples/tree-of-thoughts.rb create mode 100644 exception_reports/exception_report_20240716_115957.md create mode 100644 exception_reports/exception_report_20240716_120556.md create mode 100644 exception_reports/exception_report_20240716_120827.md create mode 100644 exception_reports/exception_report_20240716_121921.md create mode 100644 exception_reports/exception_report_20240716_122845.md create mode 100644 
exception_reports/exception_report_20240716_123021.md create mode 100644 exception_reports/exception_report_20240716_123526.md create mode 100644 exception_reports/exception_report_20240716_124043.md create mode 100644 exception_reports/exception_report_20240716_124418.md create mode 100644 exception_reports/exception_report_20240716_125232.md create mode 100644 exception_reports/exception_report_20240716_125402.md create mode 100644 exception_reports/exception_report_20240716_125459.md create mode 100644 exception_reports/exception_report_20240716_130418.md create mode 100644 exception_reports/exception_report_20240716_131051.md create mode 100644 exception_reports/exception_report_20240716_131202.md create mode 100644 exception_reports/exception_report_20240716_131311.md create mode 100644 exception_reports/exception_report_20240716_133135.md create mode 100644 exception_reports/exception_report_20240716_143838.md create mode 100644 exception_reports/exception_report_20240716_171356.md create mode 100644 exception_reports/exception_report_20240716_172747.md create mode 100644 exception_reports/exception_report_20240716_173704.md create mode 100644 exception_reports/exception_report_20240716_173948.md create mode 100644 exception_reports/exception_report_20240716_174247.md create mode 100644 exception_reports/exception_report_20240716_174346.md create mode 100644 exception_reports/exception_report_20240716_174620.md create mode 100644 exception_reports/exception_report_20240716_175140.md create mode 100644 exception_reports/exception_report_20240716_175200.md create mode 100644 exception_reports/exception_report_20240716_175428.md create mode 100644 exception_reports/exception_report_20240716_175620.md create mode 100644 exception_reports/exception_report_20240716_180146.md create mode 100644 exception_reports/exception_report_20240716_180303.md create mode 100644 exception_reports/exception_report_20240716_180746.md create mode 100644 
exception_reports/exception_report_20240716_181219.md create mode 100644 exception_reports/exception_report_20240716_181325.md create mode 100644 exception_reports/exception_report_20240716_181407.md create mode 100644 exception_reports/exception_report_20240716_181645.md create mode 100644 exception_reports/exception_report_20240716_182656.md create mode 100644 exception_reports/exception_report_20240716_183231.md create mode 100644 exception_reports/exception_report_20240716_183357.md create mode 100644 exception_reports/exception_report_20240716_183619.md create mode 100644 exception_reports/exception_report_20240716_183714.md create mode 100644 exception_reports/exception_report_20240716_184657.md create mode 100644 exception_reports/exception_report_20240716_184921.md create mode 100644 exception_reports/exception_report_20240716_185055.md create mode 100644 exception_reports/exception_report_20240716_185439.md create mode 100644 exception_reports/exception_report_20240716_185729.md create mode 100644 exception_reports/exception_report_20240716_185959.md create mode 100644 exception_reports/exception_report_20240716_190125.md create mode 100644 exception_reports/exception_report_20240716_190318.md create mode 100644 exception_reports/exception_report_20240716_190433.md create mode 100644 exception_reports/exception_report_20240716_191401.md create mode 100644 exception_reports/exception_report_20240716_191440.md create mode 100644 exception_reports/exception_report_20240716_191750.md create mode 100644 exception_reports/exception_report_20240716_191948.md create mode 100644 exception_reports/exception_report_20240716_192633.md create mode 100644 exception_reports/exception_report_20240716_192932.md create mode 100644 exception_reports/exception_report_20240716_232004.md create mode 100644 exception_reports/exception_report_20240716_232107.md create mode 100644 exception_reports/exception_report_20240716_232628.md create mode 100644 
exception_reports/exception_report_20240716_233021.md create mode 100644 exception_reports/exception_report_20240720_191059.md create mode 100644 exception_reports/exception_report_20240720_192539.md create mode 100644 exception_reports/exception_report_20240720_192728.md create mode 100644 exception_reports/exception_report_20240720_193431.md create mode 100644 exception_reports/exception_report_20240720_193849.md create mode 100644 exception_reports/exception_report_20240720_194725.md create mode 100644 exception_reports/exception_report_20240720_220052.md create mode 100644 exception_reports/exception_report_20240720_222604.md create mode 100644 exception_reports/exception_report_20240720_225122.md create mode 100644 exception_reports/exception_report_20240720_230245.md create mode 100644 exception_reports/exception_report_20240720_230836.md create mode 100644 exception_reports/exception_report_20240720_232832.md create mode 100644 exception_reports/exception_report_20240727_013422.md create mode 100644 exception_reports/exception_report_20240727_013631.md create mode 100644 exception_reports/exception_report_20240727_014142.md create mode 100644 exception_reports/exception_report_20240727_014616.md create mode 100644 exception_reports/exception_report_20240727_014728.md create mode 100644 exception_reports/exception_report_20240727_015016.md create mode 100644 exception_reports/exception_report_20240727_015143.md create mode 100644 exception_reports/exception_report_20240727_015651.md create mode 100644 exception_reports/exception_report_20240727_015940.md create mode 100644 exception_reports/exception_report_20240727_020003.md create mode 100644 exception_reports/exception_report_20240727_020236.md create mode 100644 exception_reports/exception_report_20240727_020459.md create mode 100644 exception_reports/exception_report_20240727_020900.md create mode 100644 exception_reports/exception_report_20240727_021209.md create mode 100644 
exception_reports/exception_report_20240727_043105.md create mode 100644 exception_reports/exception_report_20240727_043421.md create mode 100644 exception_reports/exception_report_20240727_043538.md create mode 100644 exception_reports/exception_report_20240727_044025.md create mode 100644 exception_reports/exception_report_20240727_044106.md create mode 100644 exception_reports/exception_report_20240727_044125.md create mode 100644 exception_reports/exception_report_20240727_044153.md create mode 100644 exception_reports/exception_report_20240731_105234.md create mode 100644 exception_reports/exception_report_20240731_110151.md create mode 100644 exception_reports/exception_report_20240731_110531.md create mode 100644 exception_reports/exception_report_20240731_110832.md create mode 100644 exception_reports/exception_report_20240731_111121.md create mode 100644 exception_reports/exception_report_20240731_111604.md create mode 100644 exception_reports/exception_report_20240731_112054.md create mode 100644 exception_reports/exception_report_20240731_112600.md create mode 100644 exception_reports/exception_report_20240731_112637.md create mode 100644 exception_reports/exception_report_20240731_112734.md create mode 100644 exception_reports/exception_report_20240731_112829.md create mode 100644 exception_reports/exception_report_20240731_113031.md create mode 100644 exception_reports/exception_report_20240731_113239.md create mode 100644 exception_reports/exception_report_20240731_114248.md create mode 100644 exception_reports/exception_report_20240731_151440.md create mode 100644 exception_reports/exception_report_20240731_152047.md create mode 100644 exception_reports/exception_report_20240731_152235.md create mode 100644 exception_reports/exception_report_20240731_153002.md create mode 100644 exception_reports/exception_report_20240731_153050.md create mode 100644 exception_reports/exception_report_20240731_153126.md create mode 100644 
exception_reports/exception_report_20240731_153216.md create mode 100644 exception_reports/exception_report_20240731_153821.md create mode 100644 exception_reports/exception_report_20240731_154930.md create mode 100644 exception_reports/exception_report_20240731_155512.md create mode 100644 exception_reports/exception_report_20240731_155905.md create mode 100644 exception_reports/exception_report_20240731_161555.md create mode 100644 exception_reports/exception_report_20240731_162132.md create mode 100644 exception_reports/exception_report_20240731_162437.md create mode 100644 exception_reports/exception_report_20240731_165736.md create mode 100644 exception_reports/exception_report_20240801_072049.md create mode 100644 exception_reports/exception_report_20240801_072425.md create mode 100644 exception_reports/exception_report_20240801_074652.md create mode 100644 exception_reports/exception_report_20240801_074800.md create mode 100644 exception_reports/exception_report_20240801_081744.md create mode 100644 exception_reports/exception_report_20240801_082408.md create mode 100644 exception_reports/exception_report_20240801_123817.md create mode 100644 exception_reports/exception_report_20240801_124144.md create mode 100644 exception_reports/exception_report_20240801_124201.md create mode 100644 exception_reports/exception_report_20240801_124358.md create mode 100644 exception_reports/exception_report_20240801_124942.md create mode 100644 exception_reports/exception_report_20240801_125243.md create mode 100644 exception_reports/exception_report_20240801_131009.md create mode 100644 lib/components/task_helper.rb create mode 100644 lib/ohm/DocumentProcessing.rb create mode 100644 lib/tasks/base_task.rb create mode 100644 lib/tasks/batch_completion_task.rb create mode 100644 lib/tasks/preprocess_text_file_task.og create mode 100644 lib/tasks/preprocess_text_file_task.rb create mode 100644 lib/tasks/workflow_initializer_task.rb create mode 100644 lib/utils/errors.rb 
create mode 100644 test/topic_model_trainer_workflowtest.rb diff --git a/Rakefile b/Rakefile index 06ef59a..f8c9a71 100644 --- a/Rakefile +++ b/Rakefile @@ -189,3 +189,105 @@ Rake::RDocTask.new do |rdoc| rdoc.rdoc_files.include "lib/workflows/topic_model_trainer_workflow.rb" rdoc.rdoc_files.include "lib/workflows/topic_model_trainer_workflowtest.rb" end + +desc "Build all images" +task "build-all" do + ALL_IMAGES.each do |image| + Rake::Task["build/#{image}"].invoke + end +end + +desc "Tag all images" +task "tag-all" do + ALL_IMAGES.each do |image| + Rake::Task["tag/#{image}"].invoke + end +end + +desc "Push all images" +task "push-all" do + ALL_IMAGES.each do |image| + Rake::Task["push/#{image}"].invoke + end +end + +Rake::RDocTask.new do |rdoc| + rdoc.title = "flowbots v0.1" + rdoc.rdoc_dir = "#{APP_ROOT}/doc" + rdoc.options += [ + "-w", + "2", + "-H", + "-A", + "-f", + "darkfish", # This bit + "-m", + "README.md", + "--visibility", + "nodoc", + "--markup", + "markdown" + ] + rdoc.rdoc_files.include "README.md" + rdoc.rdoc_files.include "LICENSE" + rdoc.rdoc_files.include "exe/flowbots" + + rdoc.rdoc_files.include "lib/api.rb" + rdoc.rdoc_files.include "lib/cli.rb" + rdoc.rdoc_files.include "lib/flowbots.rb" + rdoc.rdoc_files.include "lib/helper.rb" + rdoc.rdoc_files.include "lib/logging.rb" + rdoc.rdoc_files.include "lib/tasks.rb" + rdoc.rdoc_files.include "lib/ui.rb" + rdoc.rdoc_files.include "lib/workflows.rb" + + rdoc.rdoc_files.include "lib/integrations/flowise.rb" + + rdoc.rdoc_files.include "lib/processors/GrammarProcessor.rb" + rdoc.rdoc_files.include "lib/processors/NLPProcessor.rb" + rdoc.rdoc_files.include "lib/processors/TextProcessor.rb" + rdoc.rdoc_files.include "lib/processors/TextSegmentProcessor.rb" + rdoc.rdoc_files.include "lib/processors/TextTaggerProcessor.rb" + rdoc.rdoc_files.include "lib/processors/TextTokenizeProcessor.rb" + rdoc.rdoc_files.include "lib/processors/TopicModelProcessor.rb" + + rdoc.rdoc_files.include 
"lib/tasks/accumulate_filtered_segments_task.rb" + rdoc.rdoc_files.include "lib/tasks/display_results_task.rb" + rdoc.rdoc_files.include "lib/tasks/file_loader_task.rb" + rdoc.rdoc_files.include "lib/tasks/filter_segments_task.rb" + rdoc.rdoc_files.include "lib/tasks/llm_analysis_task.rb" + rdoc.rdoc_files.include "lib/tasks/load_text_files_task.rb" + rdoc.rdoc_files.include "lib/tasks/nlp_analysis_task.rb" + rdoc.rdoc_files.include "lib/tasks/preprocess_text_file_task.rb" + rdoc.rdoc_files.include "lib/tasks/text_segment_task.rb" + rdoc.rdoc_files.include "lib/tasks/text_tagger_task.rb" + rdoc.rdoc_files.include "lib/tasks/text_tokenize_task.rb" + rdoc.rdoc_files.include "lib/tasks/tokenize_segments_task.rb" + rdoc.rdoc_files.include "lib/tasks/topic_modeling_task.rb" + rdoc.rdoc_files.include "lib/tasks/train_topic_model_task.rb" + + rdoc.rdoc_files.include "lib/components/ExceptionAgent.rb" + rdoc.rdoc_files.include "lib/components/ExceptionHandler.rb" + rdoc.rdoc_files.include "lib/components/FileLoader.rb" + rdoc.rdoc_files.include "lib/components/OhmModels.rb" + rdoc.rdoc_files.include "lib/components/WorkflowAgent.rb" + rdoc.rdoc_files.include "lib/components/WorkflowOrchestrator.rb" + rdoc.rdoc_files.include "lib/components/word_salad.rb" + + rdoc.rdoc_files.include "lib/grammars/markdown_yaml.rb" + + rdoc.rdoc_files.include "lib/utils/command.rb" + rdoc.rdoc_files.include "lib/utils/transcribe.rb" + rdoc.rdoc_files.include "lib/utils/tts.rb" + rdoc.rdoc_files.include "lib/utils/writefile.rb" + + rdoc.rdoc_files.include "lib/workflows/text_processing_workflow.rb" + rdoc.rdoc_files.include "lib/workflows/topic_model_trainer_workflow.rb" + rdoc.rdoc_files.include "lib/workflows/topic_model_trainer_workflowtest.rb" +end + +Gokdok::Dokker.new do |gd| + gd.remote_path = "" # Put into the root directory + gd.repo_url = "git@github.com:b08x/flowbots.git" + gd.doc_home = "#{APP_ROOT}/doc" +end diff --git a/compressed_prompt_test.rb b/compressed_prompt_test.rb index 
3d8bc08..c9029d0 100644 --- a/compressed_prompt_test.rb +++ b/compressed_prompt_test.rb @@ -1,20 +1,20 @@ **Evaluation Test:** -Now, to evaluate the effectiveness of the condensed prompt, I will employ various criteria: +Now, to evaluate the effectiveness of the condensed prompt, I will employ various criteria: -- Clarity: Does the optimized version provide a clear and concise explanation of the benefits of using a visual representation? Does it eliminate ambiguity and ensure that the message is easily understandable? +- Clarity: Does the optimized version provide a clear and concise explanation of the benefits of using a visual representation? Does it eliminate ambiguity and ensure that the message is easily understandable? -Evaluation: The optimized version excels in clarity. By elaborating on the advantages of the visual representation, it provides a more explicit explanation. The use of phrases like "elucidating the inherent sequential nature" and "providing insights into the flow of data" offers a precise understanding of how visuals aid in comprehension and communication. The message is straightforward and free from ambiguity, making it easily comprehensible to the target audience. +Evaluation: The optimized version excels in clarity. By elaborating on the advantages of the visual representation, it provides a more explicit explanation. The use of phrases like "elucidating the inherent sequential nature" and "providing insights into the flow of data" offers a precise understanding of how visuals aid in comprehension and communication. The message is straightforward and free from ambiguity, making it easily comprehensible to the target audience. -- Completeness: Does the optimized version retain all the crucial information from the original prompt? Does it provide a comprehensive understanding of the topic without omitting relevant details? +- Completeness: Does the optimized version retain all the crucial information from the original prompt? 
Does it provide a comprehensive understanding of the topic without omitting relevant details? -Evaluation: The optimized version effectively maintains the completeness of information. It expands on the ideas presented in the original prompt, offering a more detailed yet concise explanation. The addition of phrases like "distinct process stages" and "shared understanding of the system's architecture" ensures that the full scope of the visual representation's utility is conveyed. No relevant details have been omitted, and the optimized version successfully captures the core message of the original prompt. +Evaluation: The optimized version effectively maintains the completeness of information. It expands on the ideas presented in the original prompt, offering a more detailed yet concise explanation. The addition of phrases like "distinct process stages" and "shared understanding of the system's architecture" ensures that the full scope of the visual representation's utility is conveyed. No relevant details have been omitted, and the optimized version successfully captures the core message of the original prompt. -- Eliciting Desired Responses: Will the optimized version be more effective in eliciting the desired response from the AI assistant or model? Does it provide a clear directive, enabling the AI to generate a more accurate and contextually appropriate response? +- Eliciting Desired Responses: Will the optimized version be more effective in eliciting the desired response from the AI assistant or model? Does it provide a clear directive, enabling the AI to generate a more accurate and contextually appropriate response? -Evaluation: The optimized version is designed to elicit a more focused and accurate response from the AI assistant. By providing additional context and clarity, the AI has a better understanding of the specific benefits attributed to the visual representation. 
The use of phrases like "graphical depiction," "shared understanding," and "communication of complex ideas" offers a clear framework for the AI to generate a response that aligns with the prompt's intent. The optimized version reduces potential ambiguity and enhances the likelihood of receiving a contextually relevant and high-quality response from the AI. +Evaluation: The optimized version is designed to elicit a more focused and accurate response from the AI assistant. By providing additional context and clarity, the AI has a better understanding of the specific benefits attributed to the visual representation. The use of phrases like "graphical depiction," "shared understanding," and "communication of complex ideas" offers a clear framework for the AI to generate a response that aligns with the prompt's intent. The optimized version reduces potential ambiguity and enhances the likelihood of receiving a contextually relevant and high-quality response from the AI. -Overall Conclusion: +Overall Conclusion: Based on the evaluation test, the optimized version of the prompt demonstrates superior effectiveness compared to the original. It achieves a higher standard of clarity by providing explicit and detailed explanations while maintaining the completeness of the information conveyed. The optimized version is also tailored to elicit more accurate and contextually appropriate responses from AI assistants or models, ensuring a more productive and efficient interaction. This comprehensive test underscores the value of careful prompt design and analysis, highlighting the potential for enhanced AI performance and output quality. diff --git a/doc/BoxUI.html b/doc/BoxUI.html new file mode 100644 index 0000000..dab09f6 --- /dev/null +++ b/doc/BoxUI.html @@ -0,0 +1,271 @@ + + + + + + +module BoxUI - flowbots v0.1 + + + + + + + + + + + + + + + + +
+

+ module BoxUI +

+ +
+ +

Add this to your ui.rb file or create a new file called box_ui.rb

+ +
+ +
+ + + + + +
+
+

Public Class Methods

+
+ +
+
+ side_by_side_boxes(text1, text2, title1: "Box 1", title2: "Box 2")
+ +
+ + +
+
# File lib/ui.rb, line 154
+def side_by_side_boxes(text1, text2, title1: "Box 1", title2: "Box 2")
+  screen_width = TTY::Screen.width
+  screen_height = TTY::Screen.height
+  box_width = (screen_width / 2) - 2
+  box_height = screen_height - 4  # Leave some space for prompts
+
+  box1 = create_scrollable_box(text1, box_width, box_height, title1)
+  box2 = create_scrollable_box(text2, box_width, box_height, title2)
+
+  display_boxes(box1, box2, box_height)
+end
+
+
+ + +
+ +
+ +
+
+

Private Class Methods

+
+ +
+
+ create_scrollable_box(text, width, height, title)
+ +
+ + +
+
# File lib/ui.rb, line 168
+def create_scrollable_box(text, width, height, title)
+  lines = text.split("\n")
+  total_pages = (lines.length.to_f / (height - 2)).ceil
+  {
+    title: title,
+    lines: lines,
+    width: width,
+    height: height,
+    total_pages: total_pages,
+    current_page: 1
+  }
+end
+
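The page count above comes from dividing the number of lines by `height - 2`. A minimal sketch of how one page of lines could be selected from such a box hash follows. `page_lines` is a hypothetical helper, not something shown in `lib/ui.rb`, and it assumes the two subtracted rows are the box's top and bottom borders.

```ruby
# Hypothetical helper (not part of lib/ui.rb): returns the lines visible on
# the box's current page. Assumes (height - 2) rows are usable per page,
# matching the divisor used in create_scrollable_box.
def page_lines(box)
  per_page = box[:height] - 2
  start = (box[:current_page] - 1) * per_page
  box[:lines][start, per_page] || []
end

# Sample box built the same way create_scrollable_box builds its hash.
box = {
  title: "Demo",
  lines: (1..10).map { |i| "line #{i}" },
  width: 40,
  height: 6, # 4 usable rows per page
  total_pages: (10 / 4.0).ceil,
  current_page: 3
}

puts page_lines(box) # the last page holds only the two remaining lines
```

With 10 lines and 4 usable rows, the sketch yields 3 pages, and the third page holds the final two lines.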
+
+ + +
+ +
+
+ display_boxes(box1, box2, box_height)
+ +
+ + +
+
# File lib/ui.rb, line 181
+def display_boxes(box1, box2, box_height)
+  loop do
+    system('clear') || system('cls')
+    print_boxes(box1, box2, box_height)
+    print_navigation_info(box1, box2)
+
+    input = STDIN.getch
+    case input.downcase
+    when 'q'
+      break
+    when 'a'
+      box1[:current_page] = [1, box1[:current_page] - 1].max
+    when 'd'
+      box1[:current_page] = [box1[:total_pages], box1[:current_page] + 1].min
+    when 'j'
+      box2[:current_page] = [1, box2[:current_page] - 1].max
+    when 'l'
+      box2[:current_page] = [box2[:total_pages], box2[:current_page] + 1].min
+    end
+  end
+end
+
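The key handling in `display_boxes` (`a`/`d` page the first box, `j`/`l` page the second) can be pulled out of the terminal loop so it is testable on its own. The sketch below is one possible shape, not the project's code: `clamp_page`, `PAGE_KEYS`, and `handle_key` are hypothetical names. It also treats a box with `total_pages` of 0 (which an empty text produces, since `"".split("\n")` is empty) as a single page.

```ruby
# Hypothetical helper, not part of lib/ui.rb: returns the page a box should
# show after moving `delta` pages, kept within 1..total_pages.
def clamp_page(box, delta)
  last_page = [box[:total_pages], 1].max # treat an empty box as one page
  (box[:current_page] + delta).clamp(1, last_page)
end

# Key-to-action table mirroring the bindings used in display_boxes.
PAGE_KEYS = {
  "a" => [:left, -1],
  "d" => [:left, 1],
  "j" => [:right, -1],
  "l" => [:right, 1]
}.freeze

# Applies a key press to the matching box; returns false for unknown keys.
def handle_key(key, boxes)
  side, delta = PAGE_KEYS[key.downcase]
  return false if side.nil?

  boxes[side][:current_page] = clamp_page(boxes[side], delta)
  true
end

boxes = {
  left: { current_page: 1, total_pages: 3 },
  right: { current_page: 3, total_pages: 3 }
}
handle_key("d", boxes) # left box advances one page
handle_key("l", boxes) # right box is already on its last page
```

Keeping the bindings in a table means a new key only needs a new entry, and the loop body shrinks to reading a character and calling `handle_key`.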
+
+ + +
+ +
+
+ print_boxes(box1, box2, box_height)
+ +
+ + + +
+ + +
+ +
+
+ print_navigation_info(box1, box2)
+ +
+ + + +
+ + +
+ +
+ +
+
+ + + + diff --git a/doc/Flowbots/UI.html b/doc/Flowbots/UI.html new file mode 100644 index 0000000..699512a --- /dev/null +++ b/doc/Flowbots/UI.html @@ -0,0 +1,285 @@ + + + + + + +module Flowbots::UI - flowbots v0.1 + + + + + + + + + + + + + + + + +
+

+ module Flowbots::UI +

+ +
+ +
+ +
+ + + + + +
+
+

Public Instance Methods

+
+ +
+
+ exception(text)
+ +
+ + +
+
# File lib/ui.rb, line 53
+def exception(text)
+  ui.framed do
+    ui.failed "Exception Message" do
+      ui.puts text, glyph: "💡"
+    end
+  end
+end
+
+
+ + +
+ +
+
+ header()
+ +
+ + +
+
# File lib/ui.rb, line 89
+def header
+  ui.space
+  ui.h1 "UI: Message Types"
+  ui.space
+end
+
+
+ + +
+ +
+
+ info(text)
+ +
+ + +
+
# File lib/ui.rb, line 46
+def info(text)
+  header
+  ui.framed do
+      ui.puts text, glyph: "💡"
+  end
+end
+
+
+ + +
+ +
+
+ prompt()
+ +
+ + +
+
# File lib/ui.rb, line 24
+def prompt
+  @prompt = TTY::Prompt.new(enable_color: true, active_color: :cyan)
+end
+
+
+ + +
+ +
+
+ response(response)
+ +
+ + +
+
# File lib/ui.rb, line 61
+def response(response)
+  ui.space
+  ui.h1 "UI: Text Line Animation"
+  ui.space
+
+  response.each_line do |line|
+    input = line.chomp
+    unless input.nil?
+      input.each_char do |char|
+        print "\e[34m#{char}\e[0m"
+        ui.message char
+        sleep 0.02
+      end
+      puts "\n"
+    end
+    res = line.chomp
+    unless res.nil?
+      puts "\e[32m#{res.strip.chomp}\e[0m"
+      puts "\n"
+    end
+    cap = line.chomp
+    puts "\e[33m#{cap.strip.chomp}\e[0m\n" unless cap.nil?
+  end
+  puts "\n"
+  sleep 3
+  ui.space
+end
+
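The `response` method wraps each character in raw ANSI escape sequences (34 for blue, 32 for green, 33 for yellow). A small sketch of the same character-by-character effect with the escape codes named is shown below. `ANSI_COLORS`, `colorize`, and `typewriter` are hypothetical names introduced for illustration, and the `out:` parameter is an assumption added so the output can be captured.

```ruby
# Standard ANSI foreground color codes used by the response method.
ANSI_COLORS = { blue: 34, green: 32, yellow: 33 }.freeze

# Hypothetical helper: wraps text in an ANSI color sequence and a reset.
def colorize(text, color)
  code = ANSI_COLORS.fetch(color)
  "\e[#{code}m#{text}\e[0m"
end

# Hypothetical helper: prints a line one colored character at a time.
def typewriter(line, color: :blue, delay: 0.02, out: $stdout)
  line.each_char do |char|
    out.print colorize(char, color)
    out.flush
    sleep delay
  end
  out.puts
end

typewriter("hello", delay: 0)
```

Passing an IO object through `out:` keeps the animation usable in a terminal while letting a test capture the exact bytes written.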
+
+ + +
+ +
+
+ say(type, statement)
+ +
+ + +
+
# File lib/ui.rb, line 28
+def say(type, statement)
+  prompt
+  type = :ok if type.nil?
+  case type
+  when :ok
+    @prompt.ok(statement)
+    logger.info statement
+  when :warn
+    @prompt.warn(statement)
+    logger.warn statement
+  when :error
+    @prompt.error(statement)
+    logger.fatal statement
+  else
+    PASTEL.say(statement)
+  end
+end
+
+
+ + +
+ +
+ +
+
+ + + + diff --git a/doc/PreprocessTextFileTask.html b/doc/PreprocessTextFileTask.html new file mode 100644 index 0000000..21e23c8 --- /dev/null +++ b/doc/PreprocessTextFileTask.html @@ -0,0 +1,293 @@ + + + + + + +class PreprocessTextFileTask - flowbots v0.1 + + + + + + + + + + + + + + + + +
+

+ class PreprocessTextFileTask +

+ +
+ +

This task preprocesses a text file, extracting metadata and content.

+ +
+ +
+ + + + + +
+
+

Public Instance Methods

+
+ +
+
+ execute()
+ +
+

Executes the task.

+ +

@return [void]

+ +
+
# File lib/tasks/preprocess_text_file_task.rb, line 9
+def execute
+  logger.info "Starting PreprocessTextFileTask"
+
+  @textfile = retrieve_current_textfile
+
+  logger.debug "File content: #{@textfile.content[0..500]}..." # Log up to the first 501 characters
+
+  begin
+    grammar_processor = Flowbots::GrammarProcessor.new("markdown_yaml")
+    parse_result = grammar_processor.parse(@textfile.content)
+    logger.debug "Parse result: #{parse_result.inspect}"
+
+    if parse_result
+      content = parse_result[:markdown_content]
+      metadata = extract_metadata(parse_result[:yaml_front_matter])
+      store_preprocessed_data(content, metadata)
+      logger.info "Successfully preprocessed file with custom grammar"
+    else
+      logger.error "Failed to parse the document with custom grammar"
+      @textfile.update(preprocessed_content: "")
+      @textfile.update(metadata: {})
+      @textfile.save
+    end
+  rescue StandardError => e
+    logger.error "Error in grammar processing: #{e.message}"
+    logger.error e.backtrace.join("\n")
+    Flowbots::UI.exception("#{e.message}")
+    @textfile.update(preprocessed_content: "")
+    @textfile.update(metadata: {})
+    @textfile.save
+  end
+
+  logger.info "PreprocessTextFileTask completed"
+end
+
+
+ + +
+ +
+ +
+
+

Private Instance Methods

+
+ +
+
+ extract_metadata(yaml_front_matter)
+ +
+

Extracts metadata from the YAML front matter.

+ +

@param yaml_front_matter [String] The YAML front matter string.

+ +

@return [Hash] The extracted metadata.

+ +
+
# File lib/tasks/preprocess_text_file_task.rb, line 68
+def extract_metadata(yaml_front_matter)
+  return {} if yaml_front_matter.nil? || yaml_front_matter.empty?
+
+  begin
+    YAML.safe_load(yaml_front_matter)
+  rescue StandardError => e
+    logger.error "Error parsing YAML front matter: #{e.message}"
+    {}
+  end
+end
+
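To illustrate the front-matter handling above, here is a standalone sketch of the same idea using only the standard library. `extract_metadata_sketch` is a hypothetical name. Unlike the documented method, it writes errors with `warn` instead of the project's `logger`, and it additionally returns an empty hash for `nil` input or when the YAML parses to something other than a Hash (for example a bare string). Those extra guards are assumptions about what callers would want, not behavior confirmed by the source.

```ruby
require "yaml"

# Hypothetical standalone variant of extract_metadata.
def extract_metadata_sketch(yaml_front_matter)
  return {} if yaml_front_matter.nil? || yaml_front_matter.strip.empty?

  parsed = YAML.safe_load(yaml_front_matter)
  parsed.is_a?(Hash) ? parsed : {} # a bare scalar or list is not metadata
rescue StandardError => e
  warn "Error parsing YAML front matter: #{e.message}"
  {}
end

front_matter = "title: Notes\ntags:\n  - ruby\n"
p extract_metadata_sketch(front_matter)
```

Note that `YAML.safe_load` only permits basic types by default, so values such as dates raise an error unless their classes are explicitly permitted. In this sketch that case falls into the `rescue` branch and yields an empty hash.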
+
+ + +
+ +
+
+ normalize_text(text)
+ +
+

Normalizes the given text by converting it to lowercase and removing non-alphanumeric characters.

+ +

@param text [String] The text to normalize.

+ +

@return [String] The normalized text.

+ +
+
# File lib/tasks/preprocess_text_file_task.rb, line 51
+def normalize_text(text)
+  text.downcase.gsub(/[^a-z0-9\s]/i, "")
+end
+
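A quick worked example shows what the character class in `normalize_text` keeps and drops. Whitespace survives, and because the class is ASCII-only, accented letters are removed along with punctuation. `normalize_text_unicode` is a hypothetical alternative, shown only to contrast the behavior. It is not part of the task.

```ruby
# Same body as the documented method.
def normalize_text(text)
  text.downcase.gsub(/[^a-z0-9\s]/i, "")
end

# Hypothetical variant that keeps letters and digits from any script.
def normalize_text_unicode(text)
  text.downcase.gsub(/[^\p{Alnum}\s]/, "")
end

puts normalize_text("Hello, World! 42")       # punctuation is removed
puts normalize_text("Café au lait")           # the accented letter is dropped too
puts normalize_text_unicode("Café au lait!")  # the accented letter is kept
```

Whether dropping non-ASCII letters is acceptable depends on the corpus. For English-only text the original pattern is simpler and sufficient.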
+
+ + +
+ +
+
+ retrieve_current_textfile()
+ +
+

Retrieves the current Textfile object from Redis.

+ +

@return [Textfile] The Textfile object representing the current file.

+ +
+
# File lib/tasks/preprocess_text_file_task.rb, line 58
+def retrieve_current_textfile
+  textfile_id = Jongleur::WorkerTask.class_variable_get(:@@redis).get("current_textfile_id")
+  Textfile[textfile_id]
+end
+
+
+ + +
+ +
+
+ store_preprocessed_data(content, metadata)
+ +
+

Stores the preprocessed content and metadata in the database.

+ +

@param content [String] The preprocessed content. @param metadata [Hash] The extracted metadata.

+ +

@return [void]

+ +
+
# File lib/tasks/preprocess_text_file_task.rb, line 85
+def store_preprocessed_data(content, metadata)
+  # redis = Jongleur::WorkerTask.class_variable_get(:@@redis)
+  @textfile.update(preprocessed_content: content)
+  @textfile.update(metadata:)
+  @textfile.save
+  # redis.set("preprocessed_content", content)
+  # redis.set("file_metadata", metadata.to_json)
+  logger.debug "Stored preprocessed content (first 100 chars): #{content[0, 100]}"
+  logger.debug "Stored metadata: #{metadata.inspect}"
+end
+
+
+ + +
+ +
+ +
+
+ + + + diff --git a/doc/TextTaggerTask.html b/doc/TextTaggerTask.html index 7c16e6c..9b77dc6 100644 --- a/doc/TextTaggerTask.html +++ b/doc/TextTaggerTask.html @@ -53,18 +53,18 @@

# File lib/general_task_agent.rb, line 63
@@ -152,7 +152,7 @@ 

Public Instance Methods

- +
# File lib/general_task_agent.rb, line 67
@@ -214,7 +214,7 @@ 

Private Instance Methods

- +
# File lib/general_task_agent.rb, line 76
@@ -328,7 +328,7 @@ 

Private Instance Methods

- +
# File lib/general_task_agent.rb, line 80
@@ -352,4 +352,3 @@ 

Private Instance Methods

Generated by RDoc 6.4.0.

Based on Darkfish by Michael Granger. - diff --git a/doc/TrainTopicModelTask.html b/doc/TrainTopicModelTask.html index 70c9851..6f3dfeb 100644 --- a/doc/TrainTopicModelTask.html +++ b/doc/TrainTopicModelTask.html @@ -53,20 +53,20 @@

@@ -158,7 +158,7 @@

Attributes

- +
@@ -343,4 +343,3 @@

Private Instance Methods

Generated by RDoc 6.4.0.

Based on Darkfish by Michael Granger. - diff --git a/doc/WorkflowOrchestrator.html b/doc/WorkflowOrchestrator.html index 3fcfc57..3868e76 100644 --- a/doc/WorkflowOrchestrator.html +++ b/doc/WorkflowOrchestrator.html @@ -53,20 +53,20 @@

- +
- - + + - - - + + +
- +