Home
Welcome to the WearableIntelligenceSystem (WIS) documentation wiki! Here, we provide an overview of what the system is, what it can do, how it works, how you can contribute, etc. If you're keen to start and just want a quick overview, go ahead and read the Getting Started section below.
This is mainly written for developers and researchers. If you just want to use the WIS as a user, check out the README, which describes how to get it set up.
If you read through this and can't find an answer to your question, please create an issue.
The Wearable Intelligence System is a smart glasses home page and suite of powerful applications for users. But, you already know that from the README.
As a developer, the WIS represents a better way to develop applications for smart glasses. There are so many things to get right in a successful smart glasses application (including phone connection, voice transcription, sensor access, voice control, UI, edge AI, etc.) that building even simple apps for smart glasses can be difficult and time-consuming. The WIS has done a lot of the tough legwork for smart glasses development, allowing you to focus on building your app. It's also a step towards an egocentric operating system (OS), an OS which needs to work very differently because of the form factor, interface, and use cases that wearable applications demand.
Check out the main README for a high-level view of what the system is, what it's trying to accomplish, and how it helps users.
OK, if you're here, you've already read the README. You're a developer, industry partner, start-up, or otherwise a super user who wants to upgrade or modify the system in some way. If so, you or your team will need some basic knowledge of Android and Android Studio. If you've never used Android/Android Studio before, we recommend you run through this tutorial first.
Finally, if you upgrade the system in any way, please consider making a pull request, so everyone else can benefit from that change, too.
These abbreviations will be used everywhere, so read them twice.
WIS - Wearable Intelligence System
ASP - Android Smart Phone
ASG - Android Smart Glasses
GLBOX - GNU/Linux 'Single Board Computer'/Laptop (just a regular old computer server)
You'll need two pieces of hardware to run the system:
- ASP - Android Smart Phone, currently supporting Android 9+
- ASG - Android Smart Glasses, currently supporting Android 5.1 (version update coming soon)
- Vuzix Blade - Supported
- Nreal Light - In development
- Vuzix Shield - Coming Soon
- Epson Moverio BT200 - Previously supported, may still work
If your device isn't supported, create an issue and we may be able to support it.
Please follow the instructions in the README for how to get the system up and running. Note that the consumer-facing application uses a server hosted by Emex Labs. If you want to use your own server, you'll need to follow the developer setup and install instructions.
If you're a developer and you want to build the system on your own machine, follow the instructions here. Remember that there are 3 main hardware components to this system, and each has its own build process.
To install the system, you have three options:
- USER INSTALL - Install the pre-built APKs on your ASP and ASG, and use the Emex Labs public GLBOX.
- DEVELOPER INSTALL - Build your own APKs for ASP and ASG, and use the Emex Labs public GLBOX
- DEVELOPER+ INSTALL - Build your own APKs for ASP and ASG, and setup your own GLBOX
Head on back to the README Install section for instructions on how to install without any modifications to the application.
- Clone this repo:
git clone [email protected]:emexlabs/WearableIntelligenceSystem.git #clone main repo
git submodule update --init --recursive #clone submodules
- Setup and build the ASG app with Android Studio
- Setup and build the ASP app with Android Studio
- Start the ASG app on the ASG and the ASP app on the ASP, then follow the WiFi hotspot instructions in the README Install section to get the system running.
- Clone this repo:
git clone [email protected]:emexlabs/WearableIntelligenceSystem.git #clone main repo
git submodule update --init --recursive #clone submodules
- Setup and build the ASG app with Android Studio
- Setup and build the ASP app with Android Studio
- Modify the ASP application at comms/RestServerComms.java and change the variable serverUrl to point to your GLBOX domain name or public IP address.
- Install and run the GLBOX
- Start the ASG app on the ASG and the ASP app on the ASP, then follow the WiFi hotspot instructions in the README Install section to get the system running.
The system is architected such that the ASG is treated as only an input/output device. The ASG receives human input and sensor input and sends all of that data immediately to the ASP. The ASP then processes and saves the data. Outputs from data processing (commands, insights, events, etc.) are then passed from the ASP to the ASG, and the ASG displays that info to the user. Thus, the ASG is running as little computation as possible.
There is also a cloud server. Due to data bandwidth limits (e.g. a 2GB/month ISP data plan), streaming video and audio to a backend all the time is not possible. The backend handles functionality that is best suited to run in the cloud (mostly third party API calls that require an external key) and will be expanded in the future to include a web interface.
In order to remain modular and decoupled in such a large and fast-changing system, the ASP and ASG communicate using JSON IPC.
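As a rough illustration of that IPC, here is a minimal sketch (not the actual WIS code) of how an incoming JSON message from the ASG might be parsed and dispatched on the ASP. The key names ("MESSAGE_TYPE", "TRANSCRIPT_TEXT") and the message type string are hypothetical, not the real constants in comms/MessageTypes.java.

```java
import org.json.JSONException;
import org.json.JSONObject;

// Hypothetical sketch of parsing a JSON IPC message received from the ASG.
// Key names and message types are illustrative assumptions only.
public class IpcParseSketch {
    public static void handleIncoming(String rawJson) {
        try {
            JSONObject msg = new JSONObject(rawJson);
            String type = msg.getString("MESSAGE_TYPE"); // hypothetical type key
            switch (type) {
                case "FINAL_TRANSCRIPT":
                    System.out.println("transcript: " + msg.getString("TRANSCRIPT_TEXT"));
                    break;
                default:
                    System.out.println("unhandled message type: " + type);
            }
        } catch (JSONException e) {
            System.err.println("malformed IPC message: " + e.getMessage());
        }
    }

    public static void main(String[] args) {
        handleIncoming("{\"MESSAGE_TYPE\":\"FINAL_TRANSCRIPT\",\"TRANSCRIPT_TEXT\":\"hello world\"}");
    }
}
```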
Data and function calls are passed around the application on an event bus. Right now, we are using RxJava as the event bus, with our own custom parsing, and all event keys can be found in comms/MessageTypes.java.
Instead of calling functions directly, which requires passing many objects around and becomes too complex in a big system like this, we only pass around the "dataObservable" RxJava object, which handles sending data and triggering messages anywhere in the app. These events are multicast, so multiple different systems can respond to the same message.
- Soon, we'll move from RxJava to Android EventBus
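To make the event bus pattern concrete, below is a minimal publish/subscribe sketch assuming RxJava 3's PublishSubject. The "MESSAGE_TYPE" key, the message type string, and the subscriber roles are illustrative assumptions, not the actual keys and classes in the WIS.

```java
import io.reactivex.rxjava3.subjects.PublishSubject;
import org.json.JSONObject;

// Minimal sketch of the "dataObservable" event-bus pattern described above.
// Key names and message types are illustrative, not the real constants from
// comms/MessageTypes.java.
public class EventBusSketch {
    public static void main(String[] args) throws Exception {
        PublishSubject<JSONObject> dataObservable = PublishSubject.create();

        // Any subsystem can subscribe; events are multicast, so several
        // subscribers can react to the same message.
        dataObservable
                .filter(msg -> "FINAL_TRANSCRIPT".equals(msg.optString("MESSAGE_TYPE")))
                .subscribe(msg -> System.out.println("voice command system got: " + msg.optString("TEXT")));

        dataObservable
                .subscribe(msg -> System.out.println("database logger got: " + msg));

        // A producer (e.g. the speech recognition system) posts an event:
        JSONObject event = new JSONObject();
        event.put("MESSAGE_TYPE", "FINAL_TRANSCRIPT");
        event.put("TEXT", "take a picture");
        dataObservable.onNext(event);
    }
}
```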
The main files and directories of the ASP app:
- MainActivity.java - the first class run on app launch; in charge of the UI and launching the WearableAiAspService.java background service
- WearableAiAspService.java - where the connection to the ASG starts and all processing happens. This launches connections, moves data around, runs processing, and stays alive in the background.
- ASGRepresentative.java - a system that communicates with the ASG
- GLBOXRepresentative.java - a system that communicates with the GLBOX
- comms/ - handles all communications like WiFi and Bluetooth
- comms/AudioSystem.java - handles decrypting encrypted audio
- speechrecognition/ - handles transcribing speech
- voicecommand/ - handles voice command parsing and task generation
- nlp/ - handles natural language processing (NLP) tasks
- facialrecognition/ - runs processing and saving of facial recognition data
- ui/ - all the UI interactive fragments are here
- database/ - handles saving and recall of data
- utils/ - a number of utils to support file saving, common functions, etc.
The main files and directories of the ASG app:
- MainActivity.java - the first class run on app launch; in charge of the UI and launching the WearableAiAspService.java background service
- WearableAiAspService.java - where the connection to the ASP starts, and data is passed from the ASP to the ASG UI at MainActivity.java
- comms/ - handles all communications like WiFi and Bluetooth
- sensors/ - handles all sensors like microphone, EEG, etc.
- utils/ - a number of utils to support file saving, common functions, etc.
The main files and directories of the GLBOX backend:
- main_webserver.py - the main system which runs the web server
- api/ - all of the REST API endpoints
- main_tools.py - a helper class that contains top-level functionality that classes in api/ can call; utilizes lower-level functions in utils/
- utils/ - a bunch of supporting functions, classes, and API keys
For a list of all the voice commands, see the README Voice Commands section.
The voice command system runs entirely on the ASP. Audio is received from the ASG and transcribed on the ASP (see Speech Recognition). The voice transcripts are then sent to the voice command system on the ASP for processing and firing off commands.
- voicecommand/VoiceCommandServer.java is the top-level system which receives transcripts, parses wake words, parses commands, and runs voice commands.
- voicecommand/VoiceCommand.java is the interface that all voice commands implement.
- The other files in voicecommand/ implement individual voice commands.
- Copy one of the existing voice commands to a new file name and class name.
- Modify the wakeWords list, commandList, commandName, and run functions to perform the function you wish. Remember, the system uses an RxJava event bus, so don't write the business logic in the command itself; just send off an event/message on the event bus so the appropriate subsystem can run the actual functionality. If that isn't clear, go reread Architecture.
- Add your new command to the voiceCommand list of live commands in voicecommand/VoiceCommand.java (a hypothetical sketch of a new command follows this list).
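Since the actual VoiceCommand interface isn't reproduced in this wiki, the following is only a hypothetical sketch of what a new command class might look like under the pattern described above (a command phrase list, a command name, and a run method that only posts an event on the bus). All names, fields, and signatures here are assumptions; copy a real command from voicecommand/ and mirror its structure.

```java
import io.reactivex.rxjava3.subjects.PublishSubject;
import org.json.JSONObject;

import java.util.Arrays;
import java.util.List;

// Hypothetical sketch of a new voice command class. Fields and the run(...)
// signature are assumptions for illustration only.
public class TakePictureVoiceCommand {
    private final List<String> commandList = Arrays.asList("take a picture", "take a photo");
    private final String commandName = "take picture";

    // Returns true if the (post-wake-word) transcript matches this command.
    public boolean matches(String transcript) {
        String lower = transcript.toLowerCase();
        for (String phrase : commandList) {
            if (lower.contains(phrase)) {
                return true;
            }
        }
        return false;
    }

    // No business logic here: just post an event on the bus so the subsystem
    // that owns the camera can do the actual work.
    public void run(PublishSubject<JSONObject> dataObservable) throws Exception {
        JSONObject event = new JSONObject();
        event.put("MESSAGE_TYPE", "TAKE_PICTURE_REQUEST"); // hypothetical type key
        event.put("COMMAND_NAME", commandName);
        dataObservable.onNext(event);
    }
}
```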
We use Vosk for automatic speech recognition (ASR) because it is high accuracy, runs locally on Android, and is almost completely open source. The audio data is streamed from the ASG microphone (a connected Bluetooth SCO microphone) to the ASP, where it's transcribed by Vosk.
Audio streaming from ASG - android_smart_glasses/.../AudioSystem.java
Audio receiving on ASP - android_smart_phone/.../comms/AudioSystem.java
Vosk Speech recognition system - android_smart_phone/.../speechrecognition/SpeechRecVosk.java
We use the main Vosk Android model, vosk-model-small-en-us-0.15, included as a dependency in the app. However, we have successfully tested both vosk-model-en-us-0.22 and vosk-model-en-us-0.22-lgraph. The problem with vosk-model-en-us-0.22 is that it makes the build time ~10 minutes because the model is so large, which is unreasonable, and transcription lags behind on all but the most powerful ASPs (older chipsets can't keep up). For now we will use the standard model, with a future interest in upgrading the model, especially for far-field, conversational voice recognition.
In case it wasn't clear - all speech recognition runs locally, with no audio streamed over the internet.
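For orientation, here is a minimal sketch of the kind of Vosk recognizer loop that speechrecognition/SpeechRecVosk.java wraps, assuming 16 kHz mono PCM audio chunks. The model path and the audio source are placeholders; in the real system the chunks arrive from the ASG and the transcripts go out on the event bus.

```java
import org.vosk.Model;
import org.vosk.Recognizer;

// Minimal sketch of local Vosk transcription, assuming 16 kHz mono PCM chunks.
// The model path and the audio source are placeholders; in the WIS the audio
// chunks are streamed in from the ASG microphone.
public class VoskSketch {
    public static void main(String[] args) throws Exception {
        Model model = new Model("vosk-model-small-en-us-0.15"); // hypothetical local path
        Recognizer recognizer = new Recognizer(model, 16000.0f);

        byte[][] audioChunks = { new byte[3200] }; // stand-in: ~100 ms of silence
        for (byte[] chunk : audioChunks) {
            if (recognizer.acceptWaveForm(chunk, chunk.length)) {
                System.out.println("final: " + recognizer.getResult());   // full utterance
            } else {
                System.out.println("partial: " + recognizer.getPartialResult());
            }
        }
        System.out.println("flush: " + recognizer.getFinalResult());

        recognizer.close();
        model.close();
    }
}
```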
This app runs on any Android 9+ smart phone. We recommend significant computing power for the ASP, something like a Snapdragon 855+ or better, and something that supports WiFi sharing.
Open, build, and run the app in main/ from Android Studio, just like any other Android app.
Not currently working with the latest master since we switched to Gradle. The MediaPipe library is how we run AI/ML on the edge. Since we just moved to Gradle/Android Studio and MediaPipe builds with Bazel, we still need to either convert the MediaPipe system to an AAR that we build in Bazel and import into the main Gradle app, or deprecate the MediaPipe system. Below is how to build the MediaPipe system:
- Follow these instructions to setup Bazel and MediaPipe: https://google.github.io/mediapipe/getting_started/android.html (including the external link on this page on how to install MediaPipe)
- don't forget to follow these instructions on that same page: https://google.github.io/mediapipe/getting_started/install.html
- Change the SDK and NDK in ./main/WORKSPACE to point to your own Android SDK install (if you don't have one, install Android Studio and download an SDK and NDK)
- Run this command:
bazel build -c opt --config=android_arm64 --java_runtime_version=1.8 --noincremental_dexing --verbose_failures mediapipe/examples/android/src/java/com/google/mediapipe/apps/wearableai:wearableai;
- You have now built the application!
- For subsequent builds where you don't change anything in WORKSPACE file, use the following command for faster build:
bazel build -c opt --config=android_arm64 --java_runtime_version=1.8 --noincremental_dexing --verbose_failures --fetch=false mediapipe/examples/android/src/java/com/google/mediapipe/apps/wearableai:wearableai;
For now, to add references to your database, update the CSV in the ASP application at assets/wearable_referencer_references.csv. This will also be moved to a user-facing UI shortly.
This is how we run a number of different machine learning models on the edge, all at the same time, using the ASP's GPU. This system makes it easier to add new models and makes it possible to cascade models which run inference on the output of models higher in the perception pipeline.
To do so, we use Google MediaPipe, which is a way to define intelligence graphs ("perception pipelines") that take input and do intelligence processing by creating a flow of data between machine learning models and hard-coded functions known as "Calculators". This app is built on Google MediaPipe, even though ./main/ is not currently tracking the Google MediaPipe repo. In the future, if we want to pull in new work from the main MediaPipe repository, we will set things up again to track Google MediaPipe.
This is a WIP: detect/classify/recognise the scene/location/place from POV video.
Keras-VGG16-places365/ is the Places365 system converted to a TensorFlow Lite model for our WearableAI graph, currently running on the ASP.
This app runs on Android Smart Glasses (ASG). It's designed to be able to work on any pair of Android smart glasses. See Officially Supported Smart Glasses for officially supported hardware.
android_smart_glasses/main is the main Android application to run on the ASG.
Open android_smart_glasses/main in Android Studio. Set up ASG USB debugging (see below). Then use Android Studio to build, run, upload to the glasses, and edit the code.
In order to install the application from Android Studio to your ASG, you need to enable USB debugging on the ASG.
This setup depends on your hardware. There is usually a similar method for all Android devices, with slight differences between hardware.
- Go to "Settings" -> "System" -> "About" -> "Device Info" -> (swipe forward ten times)
- Go to "Settings" -> "System" -> "Dev Options" -> "USB Debugging" -> (turn this on)
There is already a publicly accessible GLBOX running at https://wis.emexwearables.com/api. You only need to follow these steps if you want to use your own server.
This is simply a cloud server, running Flask (a Python HTTP server) with Flask-Restful for API handling.
There are two main things:
- main_webserver.py - the main web server which runs functions for the mobile computer
- (DEPRECATED) main_socket.py - this is being deprecated, but it contains a lot of the fundamental system that we are moving to the web server
- Clone this repo:
git clone [email protected]:emexlabs/WearableIntelligenceSystem.git #clone main repo
git submodule update --init --recursive #clone submodules
- cd to /gnu_linux_box/backend
- Make and activate a virtualenv. Example:
python3 -m virtualenv venv && source venv/bin/activate
- Install requirements: pip3 install -r requirements.txt
- Install the spaCy NLP model:
python3 -m spacy download en_core_web_sm
- Make the file ./utils/wolfram_api_key.txt and paste your WolframOne App ID in there. If you don't have a WolframOne App ID, make one here. This is, right now, required for natural language queries.
- (IGNORE UNLESS DEVELOPING WITH GOOGLE TRANSLATE OR GOOGLE SPEECH TO TEXT) Set up GCP to allow Speech-to-Text and Translation, make credentials for it, and place your JSON credentials at ./utils/creds.json. This is used for speech-to-text and translation services.
- Set up Microsoft Azure to allow the Bing Search API, get the key, and paste just the key string into ./utils/azure_key.txt. This is needed, for now, for Visual Search.
- Run main_webserver.py
If you follow the steps above, follow the steps in the main README (connect the ASG to the ASP hotspot), and run the Android apps on the ASG and ASP, you will be running the WIS on a local server. Follow the Deploy steps below to deploy this to the internet.
This setup is arbitrary; do what works best for your stack, whether that is Nginx, Apache, etc. However, a ready-to-go setup is provided in this repo, and the steps are:
- Setup a cloud connected Linux box (tested on Ubuntu 18 LTS AWS EC2) with a domain name or static IP that you can SSH into
- Install Nginx:
sudo apt-get install nginx
- Enable Nginx:
sudo systemctl start nginx && sudo systemctl enable nginx
- Clone the repo at /var/www/html and ensure permissions are properly set for /var/www/html: https://askubuntu.com/questions/767504/permissions-problems-with-var-www-html-and-my-own-home-directory-for-a-website
- Add the two .conf files at gnu_linux_box/backend/deploy to /etc/nginx/sites-available and activate them with:
sudo rm /etc/nginx/sites-enabled/default
sudo ln /etc/nginx/sites-available/wis_backend.conf /etc/nginx/sites-enabled/
sudo ln /etc/nginx/sites-available/wis_ssl.conf /etc/nginx/sites-enabled/
sudo systemctl restart nginx
- Set up the backend to run by setting up the virtualenv and installing requirements (follow the [Install / Setup](#build-and-install-glbox) section above)
- Copy the .service file from gnu_linux_box/backend/deploy to /etc/systemd/system
- Enable the service file with
sudo systemctl start wis_gunicorn && sudo systemctl enable wis_gunicorn
Change the URL and image location as required:
(echo -n '{"image": "'; base64 ~/Pictures/grp.jpg; echo '"}') |
curl -H "Content-Type: application/json" -d @- http://localhost:5000/visual_search_search
Change the URL and query text as required:
curl -X POST -F "query=who's the president of the us" http://127.0.0.1:5000/natural_language_query -vvv
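If you'd rather exercise an endpoint from Java (for example while experimenting with the ASP-side comms code), here is a rough sketch that POSTs the same kind of JSON body as the visual search curl test above. The localhost URL, port, and "image" field come from that example; the class name and everything else is illustrative, not WIS code.

```java
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Base64;

// Rough Java equivalent of the visual search curl test above: POST a JSON body
// {"image": "<base64>"} to the GLBOX. URL and field name come from the curl
// example; the rest is illustrative.
public class VisualSearchClientSketch {
    public static void main(String[] args) throws Exception {
        byte[] imageBytes = Files.readAllBytes(Paths.get(args[0])); // path to a .jpg
        String body = "{\"image\": \"" + Base64.getEncoder().encodeToString(imageBytes) + "\"}";

        HttpURLConnection conn = (HttpURLConnection)
                new URL("http://localhost:5000/visual_search_search").openConnection();
        conn.setRequestMethod("POST");
        conn.setRequestProperty("Content-Type", "application/json");
        conn.setDoOutput(true);
        try (OutputStream out = conn.getOutputStream()) {
            out.write(body.getBytes(StandardCharsets.UTF_8));
        }
        System.out.println("HTTP " + conn.getResponseCode());
    }
}
```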
To show your ASG or ASP screen to others (e.g. over video chat with "share screen") you can open a window on your computer that will mirror the ASG or ASP display. The steps to do so are:
- Install scrcpy: https://github.com/Genymobile/scrcpy
- Run scrcpy
- MediaPipe Graph: https://arxiv.org/abs/1906.08172
- Facial recognition from: https://github.com/shubham0204/FaceRecognition_With_FaceNet_Android
- Memory tools and database structure: https://github.com/stairs1/memory-expansion-tools
- Cayden Pierce - emexwearables.com
- you could be here, please consider contributing!