[Feature Request]: Voice Input and Audio Response #1877
Right now voice input isn't implemented. If I want to implement this function myself, how should I go about it? Thank you.
You can implement the voice input function by modifying the message-input file, using the Web Audio API. Alternatively, you can use a third-party library such as react-mic.
Thank you for your reply. Do you have a development plan for when the audio input function will be implemented? In the meantime, can you tell me the message-input file path? Good luck to you!
**Follow-up on voice input**

I've been following the discussions about implementing voice input and audio responses in the chat application. Given the recent updates and the plan to integrate voice input, I'd like to share a potential approach for the audio transcription aspect. I have implemented a similar feature in my project, where I receive an audio file (in WAV format) from the front end:

```python
@manager.route('/completion_with_transcribe', methods=['POST'])
@login_required
def completion_with_transcribe():
    audio_file = request.files.get('file')
    transcription_text = None
    if audio_file:
        print(f"Received file: {audio_file.filename}")
        # Secure the filename and save it temporarily
        filename = secure_filename(audio_file.filename)
        temp_path = os.path.join('/tmp', filename)
        audio_file.save(temp_path)

        # Convert the audio file to MP3 format
        try:
            audio = AudioSegment.from_file(temp_path)
            mp3_path = os.path.join('/tmp', f"{os.path.splitext(filename)[0]}.mp3")
            audio.export(mp3_path, format="mp3")
            print(f"File converted to MP3: {mp3_path}")
        except Exception as e:
            print(f"Error converting file to MP3: {e}")
            return jsonify({"error": "Failed to convert audio to MP3"}), 500

        # Transcription logic using the Groq client
        try:
            client = Groq(api_key="YOUR_API_KEY_HERE")
            with open(mp3_path, "rb") as file:
                transcription = client.audio.transcriptions.create(
                    file=(mp3_path, file.read()),
                    model="whisper-large-v3",
                    response_format="verbose_json",
                )
            transcription_text = transcription.text
            print(f"Transcription: {transcription_text}")
        except Exception as e:
            print(f"Error during transcription: {e}")
            return jsonify({"error": "Failed to transcribe audio"}), 500

        # Clean up temporary files
        os.remove(temp_path)
        os.remove(mp3_path)

    if not transcription_text:
        return jsonify({"error": "Failed to transcribe audio"}), 500

    # Access JSON data from the form
    req_data = request.form.get('data')
    if not req_data:
        return jsonify({"error": "No JSON data provided"}), 400
    req = json.loads(req_data)
    print('completion------->', req)

    # Rebuild the chat history: skip system messages and a leading assistant turn
    msg = []
    for m in req["messages"]:
        if m["role"] == "system":
            continue
        if m["role"] == "assistant" and not msg:
            continue
        msg.append({"role": m["role"], "content": m["content"]})

    # Add the transcription as a new user message if available
    if transcription_text:
        msg.append({"role": "user", "content": transcription_text})

    try:
        e, conv = ConversationService.get_by_id(req["conversation_id"])
        if not e:
            return get_data_error_result(retmsg="Conversation not found!")
        conv.message.append(deepcopy(msg[-1]))
        e, dia = DialogService.get_by_id(conv.dialog_id)
        if not e:
            return get_data_error_result(retmsg="Dialog not found!")
        del req["conversation_id"]
        del req["messages"]

        if not conv.reference:
            conv.reference = []
        conv.message.append({"role": "assistant", "content": ""})
        conv.reference.append({"chunks": [], "doc_aggs": []})

        def fillin_conv(ans):
            nonlocal conv
            if not conv.reference:
                conv.reference.append(ans["reference"])
            else:
                conv.reference[-1] = ans["reference"]
            conv.message[-1] = {"role": "assistant", "content": ans["answer"]}

        def stream():
            nonlocal dia, msg, req, conv
            try:
                for ans in chat(dia, msg, True, **req):
                    fillin_conv(ans)
                    yield "data:" + json.dumps({"retcode": 0, "retmsg": "", "data": ans}, ensure_ascii=False) + "\n\n"
                ConversationService.update_by_id(conv.id, conv.to_dict())
            except Exception as e:
                yield "data:" + json.dumps({"retcode": 500, "retmsg": str(e), "data": {"answer": "**ERROR**: " + str(e), "reference": []}}, ensure_ascii=False) + "\n\n"
            yield "data:" + json.dumps({"retcode": 0, "retmsg": "", "data": True}, ensure_ascii=False) + "\n\n"

        if req.get("stream", True):
            resp = Response(stream(), mimetype="text/event-stream")
            resp.headers.add_header("Cache-control", "no-cache")
            resp.headers.add_header("Connection", "keep-alive")
            resp.headers.add_header("X-Accel-Buffering", "no")
            resp.headers.add_header("Content-Type", "text/event-stream; charset=utf-8")
            return resp
        else:
            answer = None
            for ans in chat(dia, msg, **req):
                answer = ans
                fillin_conv(ans)
                ConversationService.update_by_id(conv.id, conv.to_dict())
                break
            return get_json_result(data=answer)
    except Exception as e:
        return server_error_response(e)
```

Hope this helps!
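As an aside on the handler above: its `stream()` generator emits server-sent events, one JSON payload per `data:` line, with a blank line between events. The sketch below is a hypothetical client-side helper (`parse_sse_events` is my name, not part of RAGFlow) showing how such a stream can be parsed once received:

```python
import json

def parse_sse_events(raw: str):
    """Split an SSE text stream into the JSON payloads carried by 'data:' lines."""
    events = []
    for block in raw.split("\n\n"):
        block = block.strip()
        if block.startswith("data:"):
            events.append(json.loads(block[len("data:"):]))
    return events

# Example frames shaped like the ones stream() yields above
raw = (
    'data:{"retcode": 0, "retmsg": "", "data": {"answer": "Hi", "reference": {}}}\n\n'
    'data:{"retcode": 0, "retmsg": "", "data": true}\n\n'
)
events = parse_sse_events(raw)
print(events[0]["data"]["answer"])  # → Hi
print(events[-1]["data"])           # → True
```

A real client would read chunks from the HTTP response incrementally and buffer until it sees the blank-line delimiter; the final `"data": true` frame marks the end of the answer stream.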
Thank you for your reply. I will test this code as soon as possible and give you feedback. Best wishes!
Does this code need to go in its own file under the web/src/components/message-input folder? And should a recording component be added to the application interface? Thanks, looking forward to your reply!
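Regardless of where the recording component ends up, it may help to see the history-filtering step of the handler quoted earlier in isolation: drop system messages, drop a leading assistant greeting, then append the transcription as the newest user turn. A hypothetical standalone sketch (`build_chat_history` is my name, not a RAGFlow function):

```python
def build_chat_history(messages, transcription_text=None):
    """Mirror the handler's filtering of req["messages"] before chat() is called."""
    msg = []
    for m in messages:
        if m["role"] == "system":
            continue  # system prompts are injected server-side, not replayed
        if m["role"] == "assistant" and not msg:
            continue  # skip an assistant greeting that precedes any user turn
        msg.append({"role": m["role"], "content": m["content"]})
    if transcription_text:
        msg.append({"role": "user", "content": transcription_text})
    return msg

history = build_chat_history(
    [
        {"role": "system", "content": "You are helpful."},
        {"role": "assistant", "content": "Hi, how can I help?"},
        {"role": "user", "content": "Hello"},
    ],
    transcription_text="What is RAGFlow?",
)
print(history)
# → [{'role': 'user', 'content': 'Hello'}, {'role': 'user', 'content': 'What is RAGFlow?'}]
```

So the transcribed audio simply becomes the last user message; the rest of the pipeline is unchanged from the text-only completion path.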
…ow#2436)

### What problem does this PR solve?

feat: Supports text output and sound output infiniflow#1877

### Type of change

- [x] New Feature (non-breaking change which adds functionality)

…to the tab of the conversation infiniflow#1877 (infiniflow#2440)

### What problem does this PR solve?

feat: After the voice in the new conversation window is played, jump to the tab of the conversation infiniflow#1877

### Type of change

- [x] New Feature (non-breaking change which adds functionality)

…llowed to be turned on infiniflow#1877 (infiniflow#2446)

### What problem does this PR solve?

feat: If the tts model is not set, the Text to Speech switch is not allowed to be turned on infiniflow#1877

### Type of change

- [x] New Feature (non-breaking change which adds functionality)

…ly message when the answer is empty infiniflow#1877 (infiniflow#2447)

### What problem does this PR solve?

feat: When voice is turned on, the page will not display an empty reply message when the answer is empty infiniflow#1877

### Type of change

- [x] New Feature (non-breaking change which adds functionality)
Is there an existing issue for the same feature request?
Is your feature request related to a problem?
No response
Describe the feature you'd like
Summary:
Add voice input and audio response capabilities to the chat application.
Description:
Voice Input:
Audio Response:
Benefits:
Implementation:
Describe implementation you've considered
No response
Documentation, adoption, use case
No response
Additional information
No response