0.5. Generation
Jim Schwoebel edited this page on Aug 17, 2018 (14 revisions).
This section documents all the scripts in the chapter_5_generation folder.
Term | Definition |
---|---|
machine-generated data | information automatically generated by a computer process, application, or other mechanism without the active intervention of a human. |
machine-generated voice data | voice data generated by a machine; could take the form of text, audio, or mixed voice data. |
generate_text.py (from CLI)
cd ~
cd voicebook/chapter_5_generation/
python3 generate_text.py
54,919 texts collected.
Training on 2,887,069 character sequences.
Epoch 1/1
118/22555 [..............................] - ETA: 1:25:28 - loss: 2.3823
11508/22555 [==============>...............] - ETA: 28:44 - loss: 1.8921
19778/22555 [=========================>....] - ETA: 7:32 - loss: 1.8588
22555/22555 [==============================] - 3707s 164ms/step - loss: 1.8509
This produces output like the following:
####################
Temperature: 0.5
####################
Ok i can have the inside to confirm the room to go to meet u all in the place prepare to come to be at hair. Haha. Thanks!
Ehh ok. Thanks! I dunno in the plan profe late to go me to prois already and like the things and my check time lol
Shall see <#> to me true late on the rest time to contell on the class money to reply my mrt reply.
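The `Temperature: 0.5` header gives the sampling temperature used for that batch of samples. The log format matches textgenrnn, which is listed in the references below. At low temperatures the model almost always picks its most likely next character. At higher temperatures the output becomes more varied and more error-prone.

The sketch below shows the standard reweighting rule, p_i^(1/T) renormalised, in plain Python. The four-character distribution is made up for illustration and does not come from the trained model. The function names are hypothetical.

```python
import math
import random

def apply_temperature(probs, temperature):
    """Rescale a distribution as p_i ** (1/T), renormalised (all p_i must be > 0)."""
    if temperature <= 0:
        raise ValueError('temperature must be positive')
    # work in log space so tiny probabilities do not underflow
    logits = [math.log(p) / temperature for p in probs]
    peak = max(logits)
    weights = [math.exp(l - peak) for l in logits]
    total = sum(weights)
    return [w / total for w in weights]

def sample_char(chars, probs, temperature, rng=random):
    """Draw one character from the temperature-adjusted distribution."""
    adjusted = apply_temperature(probs, temperature)
    return rng.choices(chars, weights=adjusted, k=1)[0]

# hypothetical next-character distribution, for illustration only
chars = ['e', 'a', 'o', 'z']
probs = [0.5, 0.3, 0.15, 0.05]
for t in (0.2, 0.5, 1.0):
    print(t, [round(p, 3) for p in apply_temperature(probs, t)])
print(sample_char(chars, probs, 0.5))
```

At T = 1.0 the distribution is unchanged. At T = 0.2 almost all of the probability mass moves to `'e'`, which is why low-temperature samples tend to repeat common words and phrases.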
generate_email.py (from CLI)
cd ~
cd voicebook/chapter_5_generation
python3 generate_email.py
Using TensorFlow backend.
601 texts collected.
Training on 521,976 character sequences.
Epoch 1/1
24/4077 [..............................] - ETA: 17:35 - loss: 2.4576
This generates output like the following:
Lucy,
I will be on a reviewer on a fund to bring the formation on the Steve Englide with a consummisse for the designards to track on the contracts to computer to out of the description to trade the to the profit state to pay the process.
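The line `Training on 521,976 character sequences` counts the training examples. A character-level model is typically trained on (context, next character) pairs cut from each text with a sliding window.

The sketch below illustrates that idea with a hypothetical `max_length` of 4 and two made-up texts. It is not textgenrnn's exact preprocessing, so counts produced this way will not match the log.

```python
def char_sequences(texts, max_length=10):
    """Yield (context, next_char) pairs from each text using a sliding window."""
    for text in texts:
        for end in range(1, len(text)):
            # context is up to max_length characters immediately before position `end`
            context = text[max(0, end - max_length):end]
            yield context, text[end]

texts = ['Hi Lucy,', 'Thanks!']  # hypothetical mini-corpus
pairs = list(char_sequences(texts, max_length=4))
print(len(pairs))
for context, target in pairs[:5]:
    print(repr(context), '->', repr(target))
```

Each text of length n contributes n - 1 pairs. The number of training sequences therefore grows with the total number of characters in the corpus, not with the number of texts.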
generate_poem.py (from CLI)
cd ~
cd voicebook/chapter_5_generation
python3 generate_poem.py
would you like a random poem?
n
what is the name of the poem? (noun)
voice
what is the description?
a short poem about voices.
This generates a poem like the following:
Voice
I seek a religious-emotional Voice
The systems– is approximate
Why is it still to go?
The forces is subjective
The things is a implicitly, transcendence
I seek a fruitful Voice
The Yesterday is illimitable
Why is it about to know?
The year is false
The journey. is a little, disease
I seek a challenging Voice
The tension. is changeable
Why is it only to pursue?
The other. is speculative
The consciousness. is a sensibly. pure
I seek a happiness Voice
The towards is stable
Why is it again to come?
The hopes is human-machine
The species; is a divergent game
I seek a estimating Voice
The acumen is subjective
Why is it ceaselessly to white,?
The reliance. is consummate
The proof is a peripheral house
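Each stanza in the output follows the same five-line pattern: "I seek a ___ Voice", "The ___ is ___", "Why is it ___ to ___?", and so on. That pattern suggests templates with word slots filled at random.

The sketch below shows that template-filling approach. It is an assumption about how such poems can be produced, not the script's actual method. The word lists and function names are hypothetical; a real generator would draw words from a corpus.

```python
import random
import re

# Hypothetical word lists; a real generator would draw these from a corpus.
WORDS = {
    'adj': ['fruitful', 'challenging', 'stable', 'subjective'],
    'noun': ['journey', 'year', 'tension', 'proof'],
    'adv': ['still', 'again', 'only'],
    'verb': ['know', 'pursue', 'come'],
}

# One stanza template; {title} is the user-supplied poem name.
STANZA = [
    'I seek a {adj} {title}',
    'The {noun} is {adj}',
    'Why is it {adv} to {verb}?',
    'The {noun} is {adj}',
    'The {noun} is a {adj} {noun}',
]

SLOT = re.compile(r'\{(\w+)\}')

def fill(template, title, rng):
    """Replace each {slot} independently, so repeated slots can get different words."""
    def pick(match):
        slot = match.group(1)
        return title if slot == 'title' else rng.choice(WORDS[slot])
    return SLOT.sub(pick, template)

def make_poem(title, stanzas=3, seed=None):
    rng = random.Random(seed)
    lines = [title]
    for _ in range(stanzas):
        lines.extend(fill(t, title, rng) for t in STANZA)
    return '\n'.join(lines)

print(make_poem('Voice', stanzas=2, seed=1))
```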
generate_summary.py (from CLI)
cd ~
cd voicebook/chapter_5_generation
python3 generate_summary.py
what file type is this (t) for text, (w) for website.
w
what link would you like to summarize on Wikipedia?
https://en.wikipedia.org/wiki/Information_technology
This generates a summary like the following:
Humans have been storing, retrieving, manipulating, and communicating information since the Sumerians in Mesopotamia developed writing in about 3000 BC, [3] but the term information technology in its modern sense first appeared in a 1958 article published in the Harvard Business Review ; authors Harold J. Leavitt and Thomas L. Whisler commented that "the new technology does not yet have a single established name. Their definition consists of three categories: techniques for processing, the application of statistical and mathematical methods to decision-making , and the simulation of higher-order thinking through computer programs. Several products or services within an economy are associated with information technology, including computer hardware , software, electronics, semiconductors, internet, telecom equipment , and e-commerce . Based on the storage and processing technologies employed, it is possible to distinguish four distinct phases of IT development: pre-mechanical (3000 BC – 1450 AD), mechanical (1450–1840), electromechanical (1840–1940), and electronic (1940–present). Comparable geared devices did not emerge in Europe until the 16th century, and it was not until 1645 that the first mechanical calculator capable of performing the four basic arithmetical operations was developed. The development of transistors in the late 1940s at Bell Laboratories allowed a new generation of computers to be designed with greatly reduced power consumption. Although XML data can be stored in normal file systems , it is commonly held in relational databases to take advantage of their "robust implementation verified by years of both theoretical and practical effort”.
… [continued reference section]
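The summary above is extractive: it copies whole sentences from the article rather than writing new ones.

The sketch below shows one simple extractive scheme, using only the standard library. Each sentence is scored by the average corpus frequency of its non-stop-words, and the top sentences are kept in their original order. This is a simplified, Luhn-style stand-in, not Sumy's implementation. The stop-word list and function names are hypothetical.

```python
import re
from collections import Counter

# Tiny stop-word list for illustration; real tools ship much larger ones.
STOP_WORDS = {'the', 'a', 'an', 'of', 'and', 'to', 'in', 'is', 'it',
              'for', 'on', 'was', 'that', 'by', 'with', 'as'}

def content_words(sentence):
    """Lower-cased alphabetic tokens that are not stop words."""
    return [w for w in re.findall(r'[a-z]+', sentence.lower()) if w not in STOP_WORDS]

def summarize(text, n_sentences=2):
    """Return the n highest-scoring sentences, kept in their original order."""
    sentences = re.split(r'(?<=[.!?])\s+', text.strip())
    freq = Counter(w for s in sentences for w in content_words(s))

    def score(sentence):
        words = content_words(sentence)
        # average word frequency, so long sentences are not favoured automatically
        return sum(freq[w] for w in words) / len(words) if words else 0.0

    ranked = sorted(range(len(sentences)), key=lambda i: score(sentences[i]), reverse=True)
    return ' '.join(sentences[i] for i in sorted(ranked[:n_sentences]))

sample = ("Information technology uses computers to store information. "
          "Computers process information quickly. "
          "The weather was nice. "
          "Information technology includes computers and software.")
print(summarize(sample, 2))
```

The off-topic sentence about the weather scores lowest because its words appear nowhere else in the text.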
generate_blogpost.py (from CLI)
cd voicebook/chapter_5_generation
python3 generate_blogpost.py
1,850 texts collected.
Training on 2,399,298 character sequences.
Epoch 1/1
257/18744 [..............................] - ETA: 1:05:57 - loss: 1.8847
This generates output like the following:
I think I am sure if is the care with this is the includes and did I could tell it in the summer company. You can add out of the taster age and again, the path and i was the exciting in the swirl that is some inversed to talk to child of summing at the one... we were because its a bad with my heart
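The step counts in the three training logs above match the number of character sequences divided by 128 and rounded down. For example, 2,887,069 / 128 ≈ 22,555.2, which gives 22,555 steps. That is consistent with training in batches of 128 sequences per step. The batch size is inferred from the logs, not read from the scripts.

The check below reproduces the arithmetic and estimates the epoch time from the logged 164 ms/step.

```python
# (character sequences, logged steps per epoch), copied from the logs above
LOGS = {
    'generate_text.py': (2_887_069, 22_555),
    'generate_email.py': (521_976, 4_077),
    'generate_blogpost.py': (2_399_298, 18_744),
}

BATCH_SIZE = 128  # assumption: inferred from the logs, not read from the scripts

def steps_per_epoch(n_sequences, batch_size=BATCH_SIZE):
    """Number of batches per epoch, using floor division as the logged counts suggest."""
    return max(n_sequences // batch_size, 1)

for script, (n_seq, logged_steps) in LOGS.items():
    print(script, steps_per_epoch(n_seq), logged_steps)

# rough epoch time for generate_text.py at 164 ms/step (from its log)
print(round(22_555 * 0.164 / 60), 'minutes')
```

The same arithmetic lets you estimate training time for a new corpus before starting a run. Divide the sequence count by the batch size, then multiply by the time per step shown in the first few seconds of the progress bar.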
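make_chatbot.py
This script scrapes the question-and-answer pairs from the CyberLaunch FAQ page and trains a ChatterBot chatbot on them.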
from chatterbot import ChatBot
from chatterbot.trainers import ListTrainer
import requests
from bs4 import BeautifulSoup
# works on Drupal FAQ forms
page = requests.get('http://cyberlaunch.vc/faq-page')
soup = BeautifulSoup(page.content, 'lxml')
g = soup.find_all(class_="faq-question-answer")
y = list()
# initialize chatbot parameters
# (ChatterBot 1.0+ API; older releases used chatbot.set_trainer(ListTrainer)
# followed by chatbot.train([...]))
chatbot = ChatBot("CyberLaunch")
trainer = ListTrainer(chatbot)
# parse through soup and get Q&A: the first block is the question,
# the second block is the answer
for item in g:
    entry = item.get_text().replace('\xa0', '').split(' \n\n')
    if len(entry) < 2:
        # skip entries that do not split into a question and an answer
        continue
    question = entry[0].replace('\n', '')
    # collapse line breaks and runs of spaces in the answer
    answer = ' '.join(entry[1].split())
    y.append([question, answer])
# train chatbot with Q&A training corpus
for question, answer in y:
    print(question)
    print(answer)
    trainer.train([question, answer])
# now ask the user 2 sample questions to get response.
for i in range(2):
    question = input('how can I help you?')
    response = chatbot.get_response(question)
    print(response)
Now we can run the chatbot and ask it some questions:
cd ~
cd voicebook/chapter_5_generation
python3 make_chatbot.py
how can I help you?
-> when is demo day?
Our Summer 16’ Demo Day will be on August 25, 2016. Please visit demoday.cyberlaunch.vc for more information.
how can I help you?
-> where are you located?
Our office is located in Technology Square - Georgia’s ground zero for innovation. Visit www.cyberlaunch.vc/contact for the address and directions.
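ChatterBot answers roughly by finding the stored statement closest to the input and returning the response it learned for that statement.

The sketch below illustrates the closest-match idea with the standard library's `difflib.SequenceMatcher` as the similarity measure. It is a simplified stand-in, not ChatterBot's own logic adapter. The stored questions are hypothetical, the answers are shortened from the transcript above, and the function names and the 0.5 threshold are assumptions.

```python
from difflib import SequenceMatcher

# Hypothetical stored questions; answers shortened from the transcript above.
FAQ = [
    ('When is Demo Day?', 'Our Summer 16’ Demo Day will be on August 25, 2016.'),
    ('Where are you located?', 'Our office is located in Technology Square.'),
]

def similarity(a, b):
    """Ratio in [0, 1]; 1.0 means the lower-cased strings are identical."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

def get_response(question, pairs=FAQ, threshold=0.5):
    """Return the answer whose stored question is most similar to the input."""
    best_q, best_a = max(pairs, key=lambda qa: similarity(question, qa[0]))
    if similarity(question, best_q) < threshold:
        return "Sorry, I don't know the answer to that."
    return best_a

print(get_response('when is demo day?'))
print(get_response('where are you located?'))
```

A fallback threshold keeps the bot from answering unrelated questions with whichever FAQ entry happens to score highest.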
generate_tts.py (from CLI)
python3 generate_tts.py
['Voice.txt']
Voice.txt found, processing...
Voice
making tts file...
Voice.aiff
converting Voice.aiff to Voice.wav
Voice.wav
sleeping..
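The log suggests a watch-folder loop. Each .txt file becomes an .aiff file, which is then converted to .wav, and the script sleeps before checking again. AIFF is what the macOS `say` command writes, so the sketch below assumes `say` for synthesis and `ffmpeg` for conversion. Both tools, and the names `tts_commands`, `process_folder` and `watch`, are assumptions, not taken from generate_tts.py.

The runner is passed in as a parameter, so the logic can be checked without either tool installed.

```python
import os
import subprocess
import time

def tts_commands(txt_path):
    """Build the commands that turn one text file into a WAV file (assumed tools)."""
    stem = os.path.splitext(txt_path)[0]
    aiff, wav = stem + '.aiff', stem + '.wav'
    return [
        ['say', '-f', txt_path, '-o', aiff],  # macOS text-to-speech, writes AIFF
        ['ffmpeg', '-y', '-i', aiff, wav],    # convert AIFF to WAV
    ]

def process_folder(folder, run=subprocess.run):
    """Convert every .txt file in `folder` that does not have a .wav file yet."""
    done = []
    for name in sorted(os.listdir(folder)):
        stem, ext = os.path.splitext(name)
        if ext != '.txt' or os.path.exists(os.path.join(folder, stem + '.wav')):
            continue
        for cmd in tts_commands(os.path.join(folder, name)):
            run(cmd, check=True)
        done.append(name)
    return done

def watch(folder, interval=10):
    """Poll the folder forever, like the 'sleeping..' loop in the log."""
    while True:
        process_folder(folder)
        print('sleeping..')
        time.sleep(interval)
```

Skipping files that already have a .wav counterpart keeps the loop from re-synthesizing the same text on every pass.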
generate_filtered.py (from CLI)
cd ~
cd voicebook/chapter_5_generation
python3 generate_filtered.py
what is the name of the wav file (in ./data/ dir) you would like to manipulate?
Voice.wav
This writes Voice_lowpitch.wav, Voice_noise.wav, and Voice_slow.wav to the current directory.
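The exact filters the script applies are not shown here. The sketch below produces two similar effects with only the standard library, assuming 16-bit PCM input:

- Gaussian noise added to every sample.
- The same samples rewritten with a lower sample rate.

The second trick slows the clip down and lowers its pitch together, like playing a tape too slowly. Changing pitch without changing tempo needs a proper pitch-shifting or time-stretching method, for example SoX's `pitch` and `tempo` effects. The function names are hypothetical.

```python
import array
import random
import sys
import wave

def read_pcm16(path):
    """Return (params, samples) for a 16-bit PCM WAV file."""
    with wave.open(path, 'rb') as w:
        params = w.getparams()
        frames = w.readframes(w.getnframes())
    if params.sampwidth != 2:
        raise ValueError('this sketch only handles 16-bit PCM')
    samples = array.array('h', frames)
    if sys.byteorder == 'big':
        samples.byteswap()  # WAV sample data is little-endian
    return params, samples

def write_pcm16(path, params, samples, framerate=None):
    out = array.array('h', samples)
    if sys.byteorder == 'big':
        out.byteswap()
    with wave.open(path, 'wb') as w:
        w.setnchannels(params.nchannels)
        w.setsampwidth(2)
        w.setframerate(framerate or params.framerate)
        w.writeframes(out.tobytes())

def add_noise(samples, level=500.0, seed=0):
    """Add Gaussian noise (std dev `level`) and clip to the 16-bit range."""
    rng = random.Random(seed)
    return array.array('h', (max(-32768, min(32767, s + int(rng.gauss(0, level))))
                             for s in samples))

def make_variants(path, slow_factor=0.8):
    """Write <name>_noise.wav and <name>_slow.wav next to the input file."""
    params, samples = read_pcm16(path)
    stem = path[:-4] if path.lower().endswith('.wav') else path
    write_pcm16(stem + '_noise.wav', params, add_noise(samples))
    # same samples at a lower playback rate: slower *and* lower-pitched
    write_pcm16(stem + '_slow.wav', params, samples,
                framerate=round(params.framerate * slow_factor))
```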
generate_splice.py and generate_remix.py (from CLI)
cd ~
cd voicebook/chapter_5_generation
python3 generate_splice.py
what folder (in ./data folder) do you want to create splices for?
mix
how long (in secs) do you want the splices
2
python3 generate_remix.py
what folder (in ./data directory) would you like to remix?
mix_snipped
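The splice step cuts each WAV file in a folder into fixed-length pieces, 2 seconds in the run above. The remix step then works on those pieces, which are presumably written to the `mix_snipped` folder.

The sketch below does both with the standard `wave` module: fixed-length chunking, then concatenating the chunks in shuffled order. Shuffling is an assumption about what "remix" means here. The functions `splice` and `remix` are hypothetical names.

```python
import os
import random
import wave

def splice(path, out_dir, seconds=2):
    """Cut one WAV file into consecutive chunks of `seconds` (the last may be shorter)."""
    os.makedirs(out_dir, exist_ok=True)
    stem = os.path.splitext(os.path.basename(path))[0]
    written = []
    with wave.open(path, 'rb') as src:
        params = src.getparams()
        chunk_frames = int(seconds * params.framerate)
        index = 0
        while True:
            frames = src.readframes(chunk_frames)
            if not frames:
                break
            out_path = os.path.join(out_dir, f'{stem}_{index}.wav')
            with wave.open(out_path, 'wb') as dst:
                dst.setparams(params)  # frame count is corrected when the file closes
                dst.writeframes(frames)
            written.append(out_path)
            index += 1
    return written

def remix(chunk_paths, out_path, seed=None):
    """Concatenate chunks in random order (all chunks must share one audio format)."""
    order = list(chunk_paths)
    random.Random(seed).shuffle(order)
    with wave.open(order[0], 'rb') as first:
        params = first.getparams()
    with wave.open(out_path, 'wb') as dst:
        dst.setparams(params)
        for p in order:
            with wave.open(p, 'rb') as src:
                dst.writeframes(src.readframes(src.getnframes()))
```

Fixed-length chunks keep the remix rhythmically regular. Cutting at silences instead would give more natural joins but needs a silence detector.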
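make_vchatbot.py
This script adds a voice interface to make_chatbot.py. It speaks the prompt with pyttsx3, records a 5-second reply with sounddevice, transcribes it with PocketSphinx, and speaks the chatbot's answer.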
from chatterbot import ChatBot
from chatterbot.trainers import ListTrainer
import os
import requests
from bs4 import BeautifulSoup
import speech_recognition as sr_audio
import sounddevice as sd
import soundfile as sf
import pyttsx3
# define some helper functions
def sync_record(filename, duration, fs, channels):
    # record `duration` seconds at sample rate `fs`, blocking until it finishes
    print('recording')
    myrecording = sd.rec(int(duration * fs), samplerate=fs, channels=channels)
    sd.wait()
    sf.write(filename, myrecording, fs)
    print('done recording')
def transcribe_sphinx(file):
    # offline speech recognition with CMU PocketSphinx (requires pocketsphinx)
    r = sr_audio.Recognizer()
    with sr_audio.AudioFile(file) as source:
        audio = r.record(source)
    try:
        transcript = r.recognize_sphinx(audio)
    except sr_audio.UnknownValueError:
        # no intelligible speech in the recording
        transcript = ''
    print('sphinx transcript: ' + transcript)
    return transcript
def speak_text(text):
    # speak text aloud with the system text-to-speech engine
    engine = pyttsx3.init()
    engine.say(text)
    engine.runAndWait()
# works on Drupal FAQ forms
page = requests.get('http://cyberlaunch.vc/faq-page')
soup = BeautifulSoup(page.content, 'lxml')
g = soup.find_all(class_="faq-question-answer")
y = list()
# initialize chatbot parameters
# (ChatterBot 1.0+ API; older releases used chatbot.set_trainer(ListTrainer)
# followed by chatbot.train([...]))
chatbot = ChatBot("CyberLaunch")
trainer = ListTrainer(chatbot)
# parse through soup and get Q&A: the first block is the question,
# the second block is the answer
for item in g:
    entry = item.get_text().replace('\xa0', '').split(' \n\n')
    if len(entry) < 2:
        # skip entries that do not split into a question and an answer
        continue
    question = entry[0].replace('\n', '')
    # collapse line breaks and runs of spaces in the answer
    answer = ' '.join(entry[1].split())
    y.append([question, answer])
# train chatbot with Q&A training corpus
for question, answer in y:
    print(question)
    print(answer)
    trainer.train([question, answer])
# now ask the user 2 sample questions to get response.
for i in range(2):
    speak_text('how can I help you?')
    # record a 5-second, 16 kHz mono voice sample
    sync_record('sample.wav', 5, 16000, 1)
    # transcribe this voice sample and remove the audio
    question = transcribe_sphinx('sample.wav')
    os.remove('sample.wav')
    # speak_text('okay, processing...')
    response = chatbot.get_response(question)
    # print the response and also speak it aloud
    print(str(response))
    speak_text(str(response))
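Each pass of the loop in make_vchatbot.py follows the same sequence: speak the prompt, record, transcribe, look up an answer, and speak the answer.

The sketch below expresses one such turn with the audio functions passed in as parameters, so the flow can be tested without a microphone or speakers. This is a hypothetical refactoring, not part of the original script. It also asks again when the transcript comes back empty. The function `voice_turn` and its `retries` parameter are assumptions.

```python
def voice_turn(speak, listen, respond, prompt='how can I help you?', retries=2):
    """Run one spoken question/answer exchange and return (question, answer)."""
    for _ in range(retries + 1):
        speak(prompt)
        question = listen().strip()
        if question:
            answer = str(respond(question))
            print(answer)
            speak(answer)
            return question, answer
        # nothing was recognised, so ask again with a different prompt
        prompt = "sorry, I didn't catch that. how can I help you?"
    return '', ''
```

In the real script, you would pass `speak_text` as `speak` and `chatbot.get_response` as `respond`. For `listen`, you would pass a small function that calls `sync_record`, then `transcribe_sphinx`, and then deletes sample.wav.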
If you are interested in reading more about any of these topics, check out the resources below.
Text data
- textgenrnn
- Sumy
- ChatterBot
- NLTK
- spaCy
- Enron dataset (emails)
- NUS SMS Corpus (text messages)
- Blog corpus (blogs)
- 20 Newsgroups (news)
Audio data
Mixed data
Modeling