The World's First Spatial Computing Hack
I finally found a bug that literally fills your room full of bugs.
Overview
TL;DR: I found a bug in visionOS Safari that allows a malicious website to bypass all warnings and forcibly fill your room with an arbitrary number of animated 3D objects (CVE-2024-27812). These objects persist in your space even after you exit Safari. I reported this bug to Apple in February 2024; they fixed it in June and awarded me a bounty.
Disclaimer
Before we jump in, I want to set the stage: this is not a long, complicated kill-chain write-up like my previous posts. This is a fun short story about exploring a new piece of technology, discovering an unusually "scary" vulnerability, and revealing the pitfalls of bug triaging. We aren't breaking the Same-Origin Policy (SOP) or gaining unauthorized camera access today. No, today we have a different mission: we are trying to hack "Spatial Computing."
Background
When Tim Cook announced the Vision Pro in 2023, he made it very clear that this was a different type of Apple device. This marvel of engineering is a magic face computer that tracks your eyes and fills your home with virtual 3D objects. Understandably, such a deeply personal interaction made people nervous about privacy and security, which is why Apple built in a plethora of privacy protections. Let's take a brief look at some of them.
One of the big areas Apple is rightfully protective of is who and what is allowed to enter your personal space inside Vision Pro. Wouldn't it be awful if a malicious app could scare you by spawning items behind you? Well, thankfully, native apps are restricted by default to a "Shared Space" context, where they act predictably and can be easily closed.
If an app wants a more immersive experience, it must receive explicit permission from the user via an OS-level prompt that places it in a trusted "Full Space" context.
But what about websites? Can a website piggyback off Safari's privileged context and break out of the 2D world? Well, sort of. Apple quietly rolled out support for WebXR in its visionOS build of WebKit, although it hid the feature behind a series of experimental feature flags.
But letting websites spawn 3D objects in your room without permission is of course a massive privacy violation, which is why Apple rebuilt the "Full Space" permission model in a web context (similar to how they rebuilt the camera permission model). Websites that want to use WebXR (assuming the user has manually enabled the experimental feature) must be manually granted permission via a popup in Safari.
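For contrast, here is roughly what the gated WebXR path looks like from a page's point of view. It is a minimal sketch that assumes the experimental WebXR flag is on, so that navigator.xr exists. The name enterImmersive is hypothetical; isSessionSupported and requestSession are standard WebXR Device API methods. As I understand the spec, requestSession("immersive-vr") is only honored during a user gesture, and the browser may show its own prompt before the session starts.

```javascript
// Hypothetical helper: request an immersive WebXR session.
// `xr` is passed in (normally navigator.xr) so the logic can be
// exercised without a headset.
async function enterImmersive(xr) {
  // navigator.xr is undefined when WebXR is not exposed,
  // e.g. when the experimental feature flag is off.
  if (!xr) return null;

  // Ask whether this device can run an immersive VR session at all.
  const supported = await xr.isSessionSupported("immersive-vr");
  if (!supported) return null;

  // Must be called from a user gesture; the browser may show its own
  // permission prompt and may reject the promise if the user declines.
  return xr.requestSession("immersive-vr");
}

// Usage in a page: only ever called from a real click handler, e.g.
// document.querySelector("#enter-vr")
//   .addEventListener("click", () => enterImmersive(navigator.xr));
```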
OK, so our mission to hack Spatial Computing seems impossible. Untrusted websites apparently have no way to forcibly spawn 3D objects in the victim's room, right?
The Bug
Surprisingly, there is an older web-based 3D model viewing standard that the visionOS team seemed to have forgotten about: Apple's AR Quick Look! Back in 2018, when Apple first started to dabble in AR/VR/XR, it developed a new HTML-based method in iOS called In-Place USDZ Viewing. The method renders USDZ files, a packaged form of Pixar's Universal Scene Description 3D format. By adding the "ar" value to an anchor tag's "rel" attribute and placing an <img> tag inside the <a> element, any website could instruct mobile Safari to treat the link as an in-place 3D model. All the user had to do was click the link, and Safari would hand the file to the Quick Look application to render.
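To make the markup rule concrete, here is a simplified model of which links would qualify. It is written as a plain JavaScript check over element-like objects rather than the real DOM. The names isArQuickLookLink and relTokens are hypothetical. Requiring exactly one <img> child is my reading of the rule above, not WebKit's actual logic.

```javascript
// Hypothetical, simplified model of the In-Place USDZ Viewing markup rule.
// Elements are plain objects: { tag, attrs, children }.
function relTokens(el) {
  // rel is a space-separated token list; link types are ASCII case-insensitive.
  return ((el.attrs && el.attrs.rel) || "")
    .split(/\s+/)
    .filter(Boolean)
    .map((t) => t.toLowerCase());
}

function isArQuickLookLink(el) {
  if (el.tag !== "a") return false;
  // The whole token must be "ar"; rel="archive" does not count.
  if (!relTokens(el).includes("ar")) return false;
  // Simplification: the link must wrap exactly one <img>.
  const kids = el.children || [];
  return kids.length === 1 && kids[0].tag === "img";
}

// The bat link from the exploit below would qualify:
// isArQuickLookLink({ tag: "a", attrs: { rel: "ar", href: "/bat.reality" },
//                     children: [{ tag: "img" }] })  // true
```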
After some quick testing, I noticed that this standard is still alive and well in WebKit (including the visionOS build), and even supports the more modern ".reality" file type produced by Apple's Reality Composer. In fact, we can even add Spatial Audio so it feels like sound is coming from the object itself. Even better, these features work out of the box, so the victim does not need to enable any fancy experimental features.
And here is the fun part: Safari does not enforce any type of permission model on this feature. Furthermore, it does not even require the anchor tag to have been "clicked" by a human, so a programmatic JavaScript click (e.g., document.querySelector('a').click()) works with no problem! This means that we can launch an arbitrary number of animated, sound-emitting 3D objects without any user interaction whatsoever.
If the victim just views our website in Vision Pro, we can instantly fill their room with hundreds of crawling spiders and screeching bats! Freaky stuff.
Screen recording of spiders literally crawling out of my malicious website
A huge NOPE right away.
My office was full of hundreds of screeching bats after viewing my website for a couple seconds
The exploit code was very straightforward:
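<!-- Every 10 ms, programmatically click the first link on the page; no user gesture is involved -->
<body onload="setInterval(function(){document.querySelector('a').click()},10)">
<!-- rel="ar" plus a child <img> makes Safari hand /bat.reality to Quick Look -->
<a rel="ar" href="/bat.reality"><img></a>
</body>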
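Here is some quick, hedged arithmetic on why a couple of seconds is enough. A setInterval with a 10 ms delay can fire at most about 100 times per second. The helper below is hypothetical and gives only an upper bound on click() calls. Real timers can fire late, and not every call necessarily becomes a rendered object.

```javascript
// Hypothetical helper: upper bound on how many times setInterval(fn, intervalMs)
// can fire within durationMs. The first callback runs after one full interval,
// and timers may fire late but not early, so this is a ceiling, not a prediction.
function maxIntervalFires(durationMs, intervalMs) {
  return Math.floor(durationMs / intervalMs);
}

maxIntervalFires(1000, 10); // 100 click() calls in one second
maxIntervalFires(2000, 10); // 200 click() calls in "a couple seconds"
```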
To make things even freakier: since these animated files are being handled by a separate application (Quick Look), closing Safari does not get rid of them. And because visionOS does not have a Dock or any other open-apps UI, there is no obvious way to get rid of them besides running around the room and physically tapping each one.
It turns out it was surprisingly easy to find a loophole in the visionOS Spatial Computing permissions model. I just needed to dig around in old WebKit guides until I found some neglected attack surface ported from iOS. Success!
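For illustration, the checks Safari was missing (a permission gate, and a requirement that a human actually clicked) can be sketched as a small policy hook. This is a hypothetical model. It is not WebKit code and not Apple's actual fix. The names createQuickLookGate and shouldOpenQuickLook are made up, and the per-page cap is an assumed mitigation. The sketch relies on one real DOM behavior: events fired by HTMLElement.click() have isTrusted set to false. A gate that requires a trusted event would therefore reject the scripted clicks in the exploit above.

```javascript
// Hypothetical policy hook (not WebKit code): decide whether a click on an
// AR Quick Look link may launch Quick Look.
function createQuickLookGate(maxOpenPerPage) {
  let opened = 0; // objects this page has already launched

  return function shouldOpenQuickLook(event) {
    // element.click() dispatches an untrusted (isTrusted === false) event,
    // so scripted clicks never pass this check.
    if (!event.isTrusted) return false;
    // Assumed mitigation: even real taps are capped per page,
    // so one site cannot flood the room.
    if (opened >= maxOpenPerPage) return false;
    opened += 1;
    return true;
  };
}
```

In this model, an untrusted event is rejected before it counts against the cap, so a script cannot use up the user's allowance.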
Reporting the Bug
This is where things get even more interesting. For some unknown reason, the Apple security team seemed to downplay the Spatial Computing angle and focused entirely on whether this issue could induce a system crash and reboot (which it eventually will if enough 3D objects get rendered). They focused on this so much that it is actually the entire Impact statement on the CVE. Even weirder, the description claims that this issue was addressed by improving the file handling protocol, file:// (which has nothing to do with this bug).
We all know that triaging and classifying bugs is hard, but dang. To Apple's credit, this was a very weird vulnerability. In a lot of ways, this bug perfectly highlights where rigid vulnerability classification taxonomies fall apart. In my opinion, judging the impact of this bug by its RAM, CPU, and thread utilization totally misses the point.
The goal of this project was to analyze threats unique to this new computing platform. I wanted to see how this new frontier of hyperrealistic mixed reality could invade personal space in surprising and unsettling ways. But I don't think a nuanced analysis of psychological impact is on the radar of bug triaging teams.
Perhaps it's time for Apple to re-evaluate its Vision Pro threat model. This is a deeply personal product, and classic vulnerability triaging guidelines may no longer capture the full impact. Just imagine what would happen if this guy viewed my website:
[Update: After reading my blog post, Apple updated the CVE description to something more sensible.]