Plugins are a way to extend the functionality of Whispering Tiger.
There might still be changes to the plugin system in the future.
Plugins are loaded from the `Plugins` directory in the root of the project. The directory is scanned for `.py` files and each file is loaded as a plugin.
Plugins are written as Python classes. The class must inherit from `Plugins.Base` and implement at least the `init` method.
At the very top of the file, you should add a comment with a short description and, most importantly, a version line so the version can be compared inside the UI application. Example:
```python
# ============================================================
# This is the plugin xyz for Whispering Tiger
# Version: 1.0.0
# some more information about the plugin
# ============================================================
```
The version line can start with `Version:`, `Version`, `V` or `V:`, followed by `<major>.<minor>.<patch>`.
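Putting the header and the required `init` method together, a minimal plugin file could look roughly like this (a sketch; the class name is a placeholder, and `init_plugin_settings` is described in the helper methods below):

```python
# ============================================================
# This is a minimal plugin for Whispering Tiger
# Version: 0.0.1
# ============================================================
import Plugins


class MinimalPlugin(Plugins.Base):
    def init(self):
        # declare the settings this plugin uses (none so far)
        self.init_plugin_settings({})
```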
Function | Optionality | Description |
---|---|---|
`init(self)` | Required | is called at the initialization of Whispering Tiger, right after the settings file is loaded. |
`__plugin_init__(self)` | Optional | is called at the initialization of the Plugins, before the settings file is loaded. (Only use if really needed.) |
`on_enable(self)` | Optional | is called when the plugin is enabled. |
`on_disable(self)` | Optional | is called when the plugin is disabled. |
`timer(self)` | Optional | is called every x seconds (defined in `plugin_timer`) and is paused for x seconds (defined in `plugin_timer_timeout`) when the Speech-to-Text engine returned a result. This can be used for a regular output that is stopped occasionally when a more important transcription is supposed to be displayed. |
`stt(self, text, result_obj)` | Optional | is called when the Speech-to-Text engine returns a result. |
`stt_intermediate(self, text, result_obj)` | Optional | is called when a live transcription result is available. Make sure to use the `stt` function for final results. |
`stt_processing(self, audio_data, sample_rate, final_audio) -> dict (data_obj)` | Optional | is called when audio is recorded and no local Speech-to-Text model is loaded (useful for Speech-to-Text plugins). Needs to return a result object in the format `{'text': transcribed_text, 'type': "transcript", 'language': source_language}`. `audio_data` is raw 16-bit float mono audio at a `sample_rate` of 16000. |
`tts(self, text, device_index, websocket_connection=None, download=False, path='')` | Optional | is called when the TTS engine is about to play a result, except when called by the STT engine. If you want to play a sound when the Speech-to-Text engine returns a result, you should do it in the `stt` method as well. `path` is set if the TTS request was to save the TTS result. |
`sts(self, wavefiledata, sample_rate)` | Optional | is called when a recording is finished (which is sent to the Speech-to-Text model). This function gets the audio recording to be processed by the plugin. |
`text_translate(self, text, from_code, to_code) -> tuple (txt, from_lang, to_lang)` | Optional | is called when a translation is requested and no included translator is available. Must return a tuple of `(translation_text, from_lang_code, to_lang_code)`. |
`on_{event_name}_call(self, data_obj) -> dict (data_obj)` | Optional | is called when a custom plugin event is called via `Plugins.plugin_custom_event_call(event_name, data_obj)`. See Custom Plugin events below for more info. |
The `Base` class provides some helper methods to make it easier to write plugins.
`init_plugin_settings(self, settings, settings_groups=None)` - Prepare all possible plugin settings and their default values. This method must be called in the `init` method of the plugin. The `settings` parameter is a dictionary with the settings and their default values. The `settings_groups` parameter is an optional dictionary with the settings groups and the settings that belong to each group. The settings groups are used in the settings window to group the settings. If the `settings_groups` parameter is not provided, no groups are displayed in the UI.
IMPORTANT: Settings that are not initialized with `init_plugin_settings` are deleted when calling `init_plugin_settings`, so make sure to define every setting your plugin needs.

`get_plugin_setting(self, setting, default=None)` - Get a plugin setting from the settings file. If the setting is not yet in the settings file, the default value is used. (If `default` is not set, the default from `init_plugin_settings()` is used.)

`set_plugin_setting(self, setting, value)` - Set a plugin setting in the settings file.

Note: *When using the `*plugin_setting` methods, the settings are saved with the class name as the section name. So if you have a plugin called `ExamplePlugin`, the settings will be saved in the `ExamplePlugin` section.*

`is_enabled(self, default=False)` - Check if the plugin is enabled. If the plugin is not yet in the settings file, the default value is used. So by default, plugins will be disabled. Use this around your main functionality to allow enabling/disabling of your plugin functionality even at runtime.
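A minimal sketch of how these helpers fit together (the `GreeterPlugin` class and the `greeting` setting are made up for illustration):

```python
import Plugins


class GreeterPlugin(Plugins.Base):
    def init(self):
        # every setting the plugin uses must be declared here,
        # otherwise it is deleted from the settings file
        self.init_plugin_settings({"greeting": "hello"})

    def stt(self, text, result_obj):
        if self.is_enabled():  # honors the enable/disable toggle, even at runtime
            greeting = self.get_plugin_setting("greeting")  # "hello" until changed
            print(greeting + " " + text)
            # stored in the "GreeterPlugin" section of the settings file
            self.set_plugin_setting("greeting", greeting.upper())
```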
To use specific widgets in plugin settings, you can add specific structs to the `init_plugin_settings` method. The following structs are available (a usage sketch follows the list):
{"type": "slider", "min": 0.0, "max": 1.0, "step": 0.01, "value": 0.7}
- A slider{"type": "button", "label": "Batch Generate", "style": "primary"}
- A button (style can be "primary" or "default"){"type": "select", "label": "Label", "value": "default value", "options": ["default value", "option2", "option3"]}
- A select box{"type": "textarea", "rows": 5, "value": ""}
- A textarea{"type": "textfield", "password": false, "value": ""}
- A textfield (password field if "password" is true){"type": "hyperlink", "label": "hyperlink", "value": "https://github.com/Sharrnah/whispering-ui"}
{"type": "label", "label": "Some infotext in a label.", "style": "center"}
- A label (style can be "left", "right" or "center"){"type": "file_open", "accept": ".wav,.mp3", "value": "bark_clone_voice/clone_voice.wav"}
- A file open dialog (accept can be any file extension or a comma separated list of file extensions){"type": "file_save", "accept": ".npz", "value": "last_prompt.npz"}
- A file save dialog (accept can be any file extension or a comma separated list of file extensions){"type": "folder_open", "accept": "", "value": ""}
- A folder open dialog{"type": "dir_open", "accept": "", "value": ""}
- Alias for a folder open dialog{"type": "select_audio", "device_api": "|wasapi|mme|directsound|all", "device_type": "input|output", "value": ""}
- List of audio devices. Only listing input / output devices if device_type = "input" / device_type = "output". "device_api" can be empty (using main app api), 'wasapi', 'mme' or 'directsound' for a specific audio api, or 'all' for all APIs.get_plugin_setting
on aselect_audio
will return a valid pyAudio Audio Device ID.
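As a sketch of how such structs are passed inside a plugin's `init` method (the setting names here are made up; the struct formats are the ones listed above):

```python
def init(self):
    self.init_plugin_settings(
        {
            "volume": {"type": "slider", "min": 0.0, "max": 1.0, "step": 0.01, "value": 0.7},
            "voice": {"type": "select", "label": "Voice", "value": "voice_a", "options": ["voice_a", "voice_b"]},
            "api_key": {"type": "textfield", "password": True, "value": ""},
            # button presses arrive as "plugin_button_press" messages in on_event_received (see the example plugin below)
            "generate_btn": {"type": "button", "label": "Generate", "style": "primary"},
        },
        settings_groups={
            "General": ["volume", "voice", "api_key", "generate_btn"],
        }
    )
```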
You can use event calls in plugins using `Plugins.plugin_custom_event_call(event_name, data_obj)`. The handler function names have the form `on_{event_name}_call`. `event_name` should be unique and self-explanatory. The function needs to return `None` if something failed or should be skipped, or the `data_obj` again with your necessary changes to the object.
As an example, the call from the Silero TTS:

```python
plugin_audio = Plugins.plugin_custom_event_call('silero_tts_after_audio', {'audio': audio})
if plugin_audio is not None:
    audio = plugin_audio['audio']
```
The called function in the Plugin looks like this:
```python
def on_silero_tts_after_audio_call(self, data_obj):
    if self.is_enabled(False) and self.get_plugin_setting("voice_change_source") == CONSTANTS["SILERO_TTS"]:
        audio = data_obj['audio']
        # doing stuff ...
        # set modified audio back to data_obj
        data_obj['audio'] = audio
        return data_obj
    return None
```
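Returning the modified `data_obj` hands the changed audio back to the caller, while returning `None` makes the calling code (like the Silero snippet above) keep its unmodified audio.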
Before calling events from other plugins, make sure all plugins are already loaded. (They should not be called in `__init__`.)
Make sure to check if the event should be callable (Is the plugin enabled? Are the plugin settings configured properly? ...); otherwise return `None`.
Function Name | Description |
---|---|
`on_silero_tts_after_audio_call(data_obj)` | Called after the included Silero TTS generated audio. Expects the `audio` key in the `data_obj` (audio as a PyTorch Tensor). |
`on_audio_processor_stt_{audio_processor_caller}_call(data_obj)` | Called after the `stt` and `stt_intermediate` plugin methods, with the name taken from the `audio_processor_caller` setting (`settings.SetOption("audio_processor_caller", "custom_name")`). `data_obj` will be `{"text": predicted_text, "result_obj": result_obj, "final_audio": final_audio}`. (Used by the Secondary Profile plugin.) |
`on_plugin_get_languages_call(data_obj)` | Called to fetch available text translation languages for Text Translation plugins. Should return the `data_obj` with a dict in the form `data_obj['languages'] = tuple([{"code": code, "name": language}])`. Return `None` if the plugin is disabled. |
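As a sketch, a Text Translation plugin could answer the last event like this, following the documented format (the language list is just a placeholder):

```python
def on_plugin_get_languages_call(self, data_obj):
    if not self.is_enabled(False):
        # plugin disabled: return None so the event is skipped
        return None
    data_obj['languages'] = tuple([
        {"code": "en", "name": "English"},
        {"code": "de", "name": "German"},
    ])
    return data_obj
```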
A complete example plugin, putting the pieces together:

```python
# ============================================================
# This is the example plugin for Whispering Tiger
# Version: 1.0.0
# some more information about the plugin
# ============================================================
import json

import Plugins
import settings
import VRC_OSCLib
import websocket


class ExamplePlugin(Plugins.Base):
    def init(self):
        # prepare all possible plugin settings and their default values
        self.init_plugin_settings(
            {
                "hello_world": "default value",
                "hello_world2": "foo bar",
                "osc_auto_processing_enabled": False,
                "tts_answer": False,
                "homepage_link": {"label": "Whispering Tiger Link", "value": "https://github.com/Sharrnah/whispering-ui", "type": "hyperlink"},
                "more_settings_a": "default value",
                "more_settings_b": "default value\nmultiline",
                "more_settings_c": 0.15,
                "more_settings_d": 60,
            },
            settings_groups={
                "General": ["osc_auto_processing_enabled", "tts_answer", "hello_world", "hello_world2", "homepage_link"],
                "Second Group": ["more_settings_a", "more_settings_b", "more_settings_c", "more_settings_d"],
            }
        )

        if self.is_enabled():
            print(self.__class__.__name__ + " is enabled")

            # disable OSC processing so the Plugin can take it over:
            settings.SetOption("osc_auto_processing_enabled", False)
            # disable TTS so the Plugin can take it over:
            settings.SetOption("tts_answer", False)

            # disable websocket final messages processing so the Plugin can take it over:
            # this is really only needed if you want to use the websocket to send your own messages.
            # for the Websocket clients to understand the messages, you must follow the format. (see the LLM Plugin for a good example)
            ## settings.SetOption("websocket_final_messages", False)
        else:
            print(self.__class__.__name__ + " is disabled")

    ## OPTIONAL. called every x seconds (defined in plugin_timer)
    def timer(self):
        # get the settings from the global app settings
        osc_ip = settings.GetOption("osc_ip")
        osc_address = settings.GetOption("osc_address")
        osc_port = settings.GetOption("osc_port")

        # get the plugin settings
        hello_world = self.get_plugin_setting("hello_world", "default foo bar")
        hello_world2 = self.get_plugin_setting("hello_world2")
        print(hello_world2)

        if self.is_enabled():
            VRC_OSCLib.Chat(hello_world, True, False, osc_address, IP=osc_ip, PORT=osc_port,
                            convert_ascii=False)
        pass

    ## OPTIONAL. called when the STT engine returns a result
    def stt(self, text, result_obj):
        if self.is_enabled():
            print("Plugin Example")
            print(result_obj['language'])
        return

    ## OPTIONAL. only called when the STT engine returns an intermediate live result
    def stt_intermediate(self, text, result_obj):
        if self.is_enabled():
            print("Plugin Example")
            print(result_obj['language'])
        return

    ## OPTIONAL. only called when audio is recorded and no STT model is loaded. (Can be used for Speech-to-Text Plugins)
    # audio_data is raw audio bytes in 16 bit float mono audio and sample_rate of 16000.
    def stt_processing(self, audio_data, sample_rate, final_audio) -> dict | None:
        if self.is_enabled():
            if final_audio:
                print("Plugin Example, process the audio_data to get text")
                return {
                    'text': "transcribed_text",  # the transcribed text
                    'type': "transcript",  # needs to always be "transcript"!
                    'language': "source_language"  # the recognized spoken language
                }
        return None

    ## OPTIONAL. called when the "send TTS" function is called
    def tts(self, text, device_index, websocket_connection=None, download=False, path=''):
        wav = None  # generate the TTS audio here, e.g. wav = self.generate_tts(text.strip())
        if wav is not None:
            if download:
                if path is not None and path != '':
                    # write wav data to file in path
                    with open(path, "wb") as f:
                        f.write(wav)
                    websocket.BroadcastMessage(json.dumps({"type": "info",
                                                           "data": "File saved to: " + path}))
        return

    ## OPTIONAL - called when audio is finished recording and the audio is sent to the STT model
    def sts(self, wavefiledata, sample_rate):
        return

    ## OPTIONAL - called when translation is requested and no other translator is selected.
    # must return a tuple consisting of text, from_code, to_code.
    def text_translate(self, text, from_code, to_code) -> tuple:
        return text, from_code, to_code

    ## OPTIONAL - called when a websocket message is received.
    ## formats are: (where 'ExamplePlugin' is the plugin class name)
    ## {"name": "ExamplePlugin", "type": "plugin_button_press", "value": "button_name"}
    ## {"name": "ExamplePlugin", "type": "plugin_custom_event", "value": []}
    def on_event_received(self, message, websocket_connection=None):
        if "type" not in message:
            return
        if message["type"] == "plugin_button_press":
            if message["value"] == "button_name":
                print("button pressed")
        if message["type"] == "plugin_custom_event":
            if message["value"] == "other_event_name":
                print("custom event received")
        pass

    ## OPTIONAL
    def on_enable(self):
        pass

    ## OPTIONAL
    def on_disable(self):
        pass

    ## OPTIONAL - custom event call function.
    # def on_{event_name}_call(self, data_obj):
    #     if self.is_enabled(False):
    #         return data_obj
    #     return None
```