AR Subtitles
Andre Ahuna
AR Subtitles is a project to help people communicate across language barriers and hearing impairments.
The aim is to develop an app for the Meta Quest 3 which shows the user's surroundings through the device's onboard passthrough cameras and composites speech-to-text subtitles onto the view. The user can pick the language the processed text will be translated to. All in-app menus will support hand tracking.
A possible additional feature is locating the speaker(s). With this, subtitles could be positioned more helpfully, and it may be possible to isolate individual speakers' audio from background chatter. These features are still in R&D.
Platform: Meta Quest 3
Technology: Unity
Milestone 1 (07.10)
- Feature 1 - OpenXR Unity Scene setup
- Feature 2 - Hand tracking setup + a mock menu
Used OpenXR because Meta is moving towards deprecating its own Unity plugin, and for future compatibility with other VR devices. For hand tracking, OpenXR provides a great sample project.
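In practice, the hand-tracking setup boils down to querying joint poses from the running hand subsystem. A minimal sketch, assuming the XR Hands package (com.unity.xr.hands) that the OpenXR sample builds on; the class name is illustrative, not the project's actual code:

```csharp
// Minimal sketch: read the index-tip pose from the XR Hands subsystem each frame.
// Assumes the com.unity.xr.hands package and an OpenXR rig are already set up.
using System.Collections.Generic;
using UnityEngine;
using UnityEngine.XR.Hands;

public class HandJointReader : MonoBehaviour
{
    XRHandSubsystem _hands;

    void Update()
    {
        if (_hands == null)
        {
            // Grab the running hand subsystem once it comes up.
            var subsystems = new List<XRHandSubsystem>();
            SubsystemManager.GetSubsystems(subsystems);
            if (subsystems.Count == 0) return;
            _hands = subsystems[0];
        }

        var rightHand = _hands.rightHand;
        if (!rightHand.isTracked) return;

        var indexTip = rightHand.GetJoint(XRHandJointID.IndexTip);
        if (indexTip.TryGetPose(out Pose pose))
        {
            // Poses are relative to the XR origin; a menu could be anchored here.
            Debug.Log($"Index tip at {pose.position}");
        }
    }
}
```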
Milestone 2 (21.10)
- Feature 1 - Implement cloud-based speech-to-text. Required supported languages: English, Estonian, German, French, Spanish.
Really time-consuming. Quick overview of tested approaches:
- Local AI model with Whisper - an OpenAI-trained model which comes in different sizes. Got its tiny model running before consulting the documentation and finding out Estonian is not supported.
- Google speech-to-text - the initial approach, mostly because of their extensive API and Translate integration. Got some API calls running on PC (a hassle, no Unity SDK) but wasn't able to authenticate on an Android device. A lot of time wasted.
- Azure speech-to-text - easier to set up with NuGet for Unity. Their website is a mess (standard Microsoft, tbh) but they do offer student packages and 3 months free.
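For reference, the Azure route reduces to surprisingly little code. A minimal sketch of continuous recognition with the Microsoft.CognitiveServices.Speech SDK; the key and region values are placeholders:

```csharp
// Minimal sketch: continuous speech-to-text with the Azure Speech SDK.
using System.Threading.Tasks;
using Microsoft.CognitiveServices.Speech;

public static class SpeechToTextSketch
{
    public static async Task RunAsync()
    {
        // Placeholders for the actual subscription key and service region.
        var config = SpeechConfig.FromSubscription("YOUR_KEY", "YOUR_REGION");
        config.SpeechRecognitionLanguage = "et-EE"; // Estonian, one of the required languages

        using var recognizer = new SpeechRecognizer(config); // uses the default microphone

        // Fires whenever a final recognition result is available.
        recognizer.Recognized += (s, e) =>
        {
            if (e.Result.Reason == ResultReason.RecognizedSpeech)
                UnityEngine.Debug.Log(e.Result.Text);
        };

        await recognizer.StartContinuousRecognitionAsync();
    }
}
```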
Finished milestone was a 2D app with speech-to-text, hard-coded languages and a clear way forward.
Milestone 3 (04.11)
- Feature 1 - Implement cloud-based translation. Required supported languages: English, Estonian, German, French, Spanish.
As the speech-to-text solution was picked for its integration with translation, this milestone was a lot simpler than the last. Azure's documentation covering translation is also broader than its speech-to-text docs (when it isn't shoving AI down your throat), so real-time API calls were a breeze.
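Roughly, the combined speech-plus-translation call looks like the sketch below, assuming the SDK's TranslationRecognizer is the integration path taken; key and region are again placeholders:

```csharp
// Minimal sketch: speech recognition with on-the-fly translation via the Azure Speech SDK.
using System.Threading.Tasks;
using Microsoft.CognitiveServices.Speech;
using Microsoft.CognitiveServices.Speech.Translation;

public static class SpeechTranslationSketch
{
    public static async Task RunAsync()
    {
        var config = SpeechTranslationConfig.FromSubscription("YOUR_KEY", "YOUR_REGION");
        config.SpeechRecognitionLanguage = "en-US"; // what the speaker is saying
        config.AddTargetLanguage("et");             // what the subtitles should show

        using var recognizer = new TranslationRecognizer(config);

        recognizer.Recognized += (s, e) =>
        {
            if (e.Result.Reason == ResultReason.TranslatedSpeech)
                UnityEngine.Debug.Log(e.Result.Translations["et"]);
        };

        await recognizer.StartContinuousRecognitionAsync();
    }
}
```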
Finished milestone had a 2D app with switchable languages (not perfect, switching broke the session) and live translation.
Milestone 4 (18.11)
- Feature 1 - Integrate everything. VR passthrough + speech-to-text + translation.
- Bonus feature - Hand-tracked menus for selecting languages. Gesture-based menu summoning.
Integrated two parts which thus far had stayed in different Unity projects - VR and translation. Spent a good two hours debugging mic access until I realized it was turned off in the system menu (at least I got a nice debug menu out of it). Language switching isn't quite as seamless as I'd hoped.
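For anyone hitting the same wall: the relevant state can at least be surfaced from code. A sketch of the kind of checks such a debug menu can do, using standard Unity/Android APIs (not the project's actual menu):

```csharp
// Minimal sketch: surface microphone state for a debug menu on Quest (Android).
using UnityEngine;
using UnityEngine.Android;

public class MicDebug : MonoBehaviour
{
    void Start()
    {
        // App-level permission; note that the Quest system-menu mic toggle is
        // separate and can silently mute you even when this check passes.
        if (!Permission.HasUserAuthorizedPermission(Permission.Microphone))
            Permission.RequestUserPermission(Permission.Microphone);

        // An empty device list is a strong hint the mic is blocked somewhere.
        foreach (var device in Microphone.devices)
            Debug.Log($"Mic device: {device}");
    }
}
```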
Implemented hand-tracked menus, but no gesture-based summoning yet.
Milestone 5 (02.12)
- Feature 1 - Gesture-based menu summoning.
- Feature 2 - Always-on, threshold-based decaying voice detection (see the sketch after this list).
- Feature 3 - Polish: UI coat of paint, non-interrupting language change.
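The voice detection idea can be sketched as a decaying loudness envelope compared against a threshold; the constants below are made-up tuning values, not the ones used in the app:

```csharp
// Minimal sketch: threshold-based voice detection with a decaying envelope.
// Reads recent samples from a looping Microphone clip and keeps speech
// "active" while the smoothed level stays above the threshold.
using UnityEngine;

public class VoiceActivity : MonoBehaviour
{
    const float Threshold = 0.02f; // made-up tuning value
    const float Decay = 0.95f;     // per-frame envelope decay, also made up

    AudioClip _clip;
    readonly float[] _buffer = new float[1024];
    float _envelope;

    void Start() => _clip = Microphone.Start(null, loop: true, lengthSec: 1, frequency: 16000);

    void Update()
    {
        int pos = Microphone.GetPosition(null) - _buffer.Length;
        if (pos < 0) return;
        _clip.GetData(_buffer, pos);

        // RMS loudness of the latest window.
        float sum = 0f;
        foreach (var s in _buffer) sum += s * s;
        float rms = Mathf.Sqrt(sum / _buffer.Length);

        // Envelope jumps up instantly but decays slowly, so short pauses
        // between words don't immediately cut recognition off.
        _envelope = Mathf.Max(rms, _envelope * Decay);

        bool speaking = _envelope > Threshold;
        // ... start/stop the recognizer based on `speaking` ...
    }
}
```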
First foray into Quest gestures; quite an intuitive framework. Liked it so much that I opted to use it for starting/stopping translation as well. Added the translation canvas to the user's hand because I've had negative experiences with things parented to the VR camera.
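To illustrate the underlying idea (the project uses the gesture framework rather than raw joints), a pinch-style summon reduces to an edge-triggered distance check between thumb tip and index tip, reusing the joint lookup sketched under Milestone 1:

```csharp
// Minimal sketch: edge-triggered menu toggle on a pinch gesture.
// Illustrative only; the distance threshold is a made-up tuning value.
using UnityEngine;
using UnityEngine.XR.Hands;

public class PinchToggle : MonoBehaviour
{
    public GameObject menu;
    const float PinchDistance = 0.02f; // metres

    bool _wasPinching;

    // Call each frame with a tracked hand (e.g. from the Milestone 1 reader).
    public void CheckPinch(XRHand hand)
    {
        if (hand.GetJoint(XRHandJointID.ThumbTip).TryGetPose(out var thumb) &&
            hand.GetJoint(XRHandJointID.IndexTip).TryGetPose(out var index))
        {
            bool pinching = Vector3.Distance(thumb.position, index.position) < PinchDistance;
            if (pinching && !_wasPinching) // fire once on the rising edge
                menu.SetActive(!menu.activeSelf);
            _wasPinching = pinching;
        }
    }
}
```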
Milestone 6 (16.12)
- Feature 1 - Fix UI bugs.
- Feature 2 - Investigate passthrough ending when the device wakes from sleep.
- Feature 3 - Replace manual menu adjustment with automatic re-centering.
- Bonus feature - Play around with external mic.
Fixed the non-draggable language buttons (scrollbars are bad!) and made the menu align to the user when summoning it (see the sketch below). Fixing the passthrough + sleep issue took longer than expected. I'd encountered a similar issue with Meta's XR framework where the fix was a simple matter of toggling passthrough off/on, but OpenXR + ARFoundation wasn't that permissive. An internet search found a lot of similar issues popping up recently, with a promised fix on the way. Finally, after countless annoying builds, I arrived at a working solution. I'm too afraid to touch it again.
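The re-centering itself is just a yaw-only alignment to the headset at summon time. A sketch under the assumption of a world-space canvas; the names and distance are illustrative:

```csharp
// Minimal sketch: place the menu in front of the user, level with the horizon,
// whenever it is summoned (replacing manual adjustment).
using UnityEngine;

public class MenuRecenter : MonoBehaviour
{
    public Transform head;        // the XR camera transform
    public float distance = 0.6f; // metres in front of the user; illustrative

    // Call this when the menu is summoned.
    public void Recenter()
    {
        // Flatten the head's forward vector so the menu stays upright
        // instead of tilting with the user's head.
        Vector3 forward = head.forward;
        forward.y = 0f;
        forward.Normalize();

        transform.position = head.position + forward * distance;
        // World-space UI reads correctly when its forward axis points
        // away from the viewer.
        transform.rotation = Quaternion.LookRotation(forward);
    }
}
```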
Tested my earbuds as the external mic; they didn't work. That old microphone debug menu should come in handy now.