Hacking a Meta Quest 3
How we built a VR headset that sees people through walls (Treehacks ’25).
For Treehacks 2025, my team & I decided that we needed X-Ray vision.
Shoutout to River Dowdy, Ben Richeson, & Timothy Yu for being incredible, kind, hard-working teammates. I had a TON of fun building this with you guys.
The Technology
To accomplish this, we designed a mixed-reality device spanning the entire hardware stack:
- Memory allocation on ESP32 microcontrollers
- Convolutional neural networks for channel-state information (CSI) data
- Delivering information via TCP servers
- AI-inference software on NVIDIA Jetson
- Spatial information rendering through Unity
How it all started:
We formed the team on meet.treehacks.com.
I received a dozen or so “matches” based on my interests, as well as their phone numbers. Having recently swapped to a Samsung Z Flip 6, I saw a perfect opportunity to free myself from Apple’s draconian grasp & set up Pulse SMS to mass-deliver automated invites to potential teammates. Who wants to say “hey, wanna build something cool” 12 separate times?
The day of, I wound up with just two responses — my script had accidentally sent each copy/pasted message twice. Naturally, River & everyone else thought it was some sort of scam, but he eventually realized it was just me being “efficient” with my time 😬.
The Team:
River brought solid EE expertise; his roommate Tim joined with AI knowledge; I enjoy spatial computing & design engineering; and we later met Ben, who excelled at backend architecture. This project felt special because everyone got the chance to work on sub-problems aligned with their own, individual interests.
The Idea:
We came up with this concept after reading research papers where physicists used software-defined radios & sensors costing around $400. In our case, we used two ESP32 microcontrollers costing a little under $6 each.
We thought this would have implications for search & rescue technology as well. One of the leading causes of firefighter deaths is time spent inside hot, smoky buildings, so detecting people behind walls before entering would be a game-changer. New advancements in local computing power make this sort of technology possible.
Implementation
We used the NVIDIA Jetson to power a convolutional neural network (CNN) trained on the channel-state information (CSI) data from two WiFi-enabled ESP32 microcontrollers, giving us a rudimentary “radar” system. Once we had a decent method of interpreting a room’s WiFi signature, we linked it up to a VR headset in passthrough mode & represented the location of human-shaped distortions as green dots in 3D space.
Hardware Stack
We transformed two ~$6 ESP32 microcontrollers into a makeshift software-defined radio system:
- CSI-TX: First ESP32 continuously transmits WiFi packets
- CSI-RX: Second ESP32 captures resulting CSI data
- Packets contain unique signal reflections from objects/humans through walls
- Pushed ESP32 CSI Tool framework beyond typical use cases
Data Pipeline
Built a custom pipeline streaming raw CSI data from ESP32 to Jetson Nano:
- 30 subcarriers per measurement
- 100 Hz sampling rate (10ms intervals)
- 50-packet capture windows
- Total throughput: 30 subcarriers × 100 packets/second = 3,000 measurements/second
- Critical bottleneck: Baud rate limits and data integrity
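To make the pipeline concrete, here's a minimal sketch of the Jetson-side capture loop. It assumes pyserial, a `/dev/ttyUSB0` device path, and the ESP32 CSI Tool's bracketed interleaved imag/real output format; all of these are illustrative assumptions, not our exact code:

```python
import serial  # pip install pyserial
import numpy as np

PORT = "/dev/ttyUSB0"  # assumed device path for the CSI-RX ESP32
BAUD = 921600          # assumed; baud rate was our real bottleneck
SUBCARRIERS = 30       # subcarriers per measurement
WINDOW = 50            # packets per capture window

def parse_csi_line(line: str) -> np.ndarray:
    """Parse one CSI record into complex subcarrier values.

    Assumes the record ends in a bracketed array of interleaved
    imag/real integers, roughly how the ESP32 CSI Tool emits them.
    """
    raw = [int(v) for v in line[line.index("[") + 1 : line.index("]")].split()]
    if len(raw) < 2 * SUBCARRIERS:
        raise ValueError("short packet")
    imag = np.array(raw[0 : 2 * SUBCARRIERS : 2], dtype=np.float32)
    real = np.array(raw[1 : 2 * SUBCARRIERS : 2], dtype=np.float32)
    return real + 1j * imag

def capture_windows(port: serial.Serial):
    """Yield (WINDOW, SUBCARRIERS) complex arrays, one per capture window."""
    buf = []
    while True:
        line = port.readline().decode(errors="ignore")
        if "CSI_DATA" not in line:
            continue                # skip boot logs and partial lines
        try:
            buf.append(parse_csi_line(line))
        except ValueError:
            continue                # corrupted packet: drop it, don't crash
        if len(buf) == WINDOW:
            yield np.stack(buf)     # shape (50, 30)
            buf.clear()

if __name__ == "__main__":
    with serial.Serial(PORT, BAUD, timeout=1) as port:
        for window in capture_windows(port):
            amplitude = np.abs(window)  # what the CNN actually consumes
            print(amplitude.shape, float(amplitude.mean()))
```

Dropping corrupt packets instead of crashing matters at 100 Hz; the data-integrity problems we hit show up as short or non-numeric records.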
Neural Network Architecture
Custom three-block CNN running on 2GB Jetson Nano:
- Input: Complex CSI patterns from data pipeline
- Binary classification for human presence
- Real-time (x,y) coordinate mapping
- Batch normalization + dropout layers for noise handling
- Sub-100ms inference latency
- ~90% detection accuracy
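For a sense of what a network like this looks like, here's a PyTorch sketch of a three-block CNN over a 50 × 30 amplitude window. The channel counts, kernel sizes, and two-head output are illustrative assumptions, not our exact architecture:

```python
import torch
import torch.nn as nn

class CSINet(nn.Module):
    """Three conv blocks over a (1, 50, 30) CSI amplitude window:
    50 packets x 30 subcarriers, one channel."""

    def __init__(self, dropout: float = 0.3):
        super().__init__()

        def block(c_in: int, c_out: int) -> nn.Sequential:
            # Batch norm + dropout in every block to cope with noisy CSI
            return nn.Sequential(
                nn.Conv2d(c_in, c_out, kernel_size=3, padding=1),
                nn.BatchNorm2d(c_out),
                nn.ReLU(inplace=True),
                nn.Dropout2d(dropout),
                nn.MaxPool2d(2),
            )

        self.features = nn.Sequential(block(1, 16), block(16, 32), block(32, 64))
        # After three 2x poolings: (64, 6, 3) -> 1152 flattened features
        self.presence = nn.Linear(64 * 6 * 3, 1)   # binary: human present?
        self.position = nn.Linear(64 * 6 * 3, 2)   # (x, y) coordinates

    def forward(self, x: torch.Tensor):
        h = self.features(x).flatten(1)
        return self.presence(h), self.position(h)

# One capture window -> one prediction; batch size 1 keeps latency low
model = CSINet().eval()
window = torch.randn(1, 1, 50, 30)  # stand-in for a real amplitude window
with torch.no_grad():
    logit, xy = model(window)
    print(torch.sigmoid(logit).item(), xy.squeeze().tolist())
```

A model this small fits comfortably in the Nano's 2GB of memory, which is what made sub-100ms inference plausible.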
Real-time Processing
Jetson Nano handles multiple critical tasks:
- Runs a lightweight WebSocket server
- Streams processed detection data
- Pushes coordinates + confidence scores
- Maintains a persistent connection
- Drives real-time Unity updates
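A minimal sketch of such a server using the `websockets` library; the port and JSON schema here are assumptions, and the fake inference loop stands in for the CNN feeding real detections:

```python
import asyncio
import json
import websockets  # pip install websockets

CLIENTS = set()

async def handler(ws):
    """Keep each client registered for the lifetime of its connection."""
    CLIENTS.add(ws)
    try:
        await ws.wait_closed()
    finally:
        CLIENTS.discard(ws)

async def broadcast_detections(queue: asyncio.Queue):
    """Push every detection to all connected clients as JSON."""
    while True:
        x, y, confidence = await queue.get()
        msg = json.dumps({"x": x, "y": y, "confidence": confidence})
        websockets.broadcast(CLIENTS, msg)

async def main():
    queue: asyncio.Queue = asyncio.Queue()

    async def fake_inference():
        # In the real pipeline, the CNN's inference loop feeds this queue
        while True:
            await queue.put((1.2, 0.4, 0.91))
            await asyncio.sleep(1.0)

    async with websockets.serve(handler, "0.0.0.0", 8765):
        await asyncio.gather(broadcast_detections(queue), fake_inference())

asyncio.run(main())
```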
Mixed Reality Interface
Unity application transforms data into spatial overlay:
- WebSocket client consuming Jetson data
- Meta Quest inside-out tracking integration
- Real-time 3D position mapping
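Unity consumes this stream through a C# WebSocket client, but outside the headset a few lines of Python (assuming the same JSON schema as above) make a quick sanity check that the Jetson is actually broadcasting:

```python
import asyncio
import json
import websockets  # pip install websockets

async def watch(uri: str = "ws://192.168.1.42:8765"):  # replace with the Jetson's IP
    async with websockets.connect(uri) as ws:
        async for msg in ws:  # one JSON message per detection
            d = json.loads(msg)
            print(f"person at ({d['x']:.2f}, {d['y']:.2f}), "
                  f"confidence {d['confidence']:.0%}")

asyncio.run(watch())
```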
Getting clean CSI data proved difficult: the ESP32’s readings showed significant noise and inconsistency. We spent hours tuning the capture configuration, balancing sampling rate, packet-window size, and subcarrier count to maintain signal quality while meeting real-time demands.
The biggest challenge was processing the CSI data streaming into our terminal at blazing speed. This data needed to be captured into CSV files for CNN training. Deploying the model on our constrained 2GB Jetson Nano required heavy optimization; the batch-normalization and dropout layers helped the network tolerate noisy data while we kept detection latency under 100ms.
Unity integration was another hurdle. Direct consumption of the Jetson Nano’s data stream wasn’t working, so we built a WebSocket server on the Jetson instead. This added architectural complexity, but we finally got real-time updates working through established protocols.
Training data quality was our final challenge. Our test environments were too stable and quiet, making it hard to gather data that reflected real-world conditions.
The Final Push
Day one went smoothly: we knew building this was ambitious, but figured a minimum viable product that could read a room’s signals would be worth attempting. Saturday night, things got intense. The neural net started having issues; the readings coming off the receiver didn’t match what we were transmitting. Tim and Ben needed sleep, but River (already running on 24+ hours awake) said he’d stick it out. We refused to give up.
Powered by caffeine and snacks, we worked until 8 AM, completing device hookups and testing on a Tesla advertisement sign. We submitted our demo at 8:58 AM, two minutes before the deadline.
The Aftermath
During project pitching, we faced an unexpected challenge — our sensitive radar couldn’t handle being in a room with 600+ hackers and their devices. Live demos were impossible. Instead, we used separated components and diagrams to pitch to a dozen NVIDIA judges who swarmed our table. They grilled us with physics questions, with one engineer explicitly stating afterward “I was just trying to trip you guys up.” Quite fun!
Mid-event, I spotted engineers from a frontier AI lab we all respected. Despite not being in their track, I couldn’t pass up the opportunity — it’s not every day engineers from a company you deeply respect walk directly in front of you. I left from behind my table, approached them, and said “Hi. We built a VR headset that can see people through walls. Would you like to see it?”
They seemed interested in our pitch before continuing on their way. I felt a bit embarrassed about pulling unrelated people over to make my team pitch for no apparent benefit. I briefly considered doubling down and offering my LinkedIn, but thankfully, I maintained some sense of shame (lol).
We didn’t qualify for the standard prizes — perhaps spending only 30 seconds on our presentation and lacking a live-demo setup had killed our chances. However, the frontier-lab engineers returned with unexpected news: “Hey, you guys should come see us after the event. We liked your project anyway. We also want you to give us your emails and we’ll mail you some merch. We also want to know where you live, because we’re going to fly you out to our offices in a few months.”
We were absolutely elated. After everything wrapped up, I collapsed and slept for twelve straight hours.
10/10. Would hack again.
Next Steps
Our proof of concept demonstrated significant potential, but several key improvements will enhance its capabilities as we continue to flesh out the system:
1. Upgrade to professional-grade software-defined radios (SDRs) to replace the ESP32s. While our $6 microcontrollers proved the concept, improved SDRs would provide:
- Increased detection range
- Higher resolution spatial mapping
- More reliable signal processing
2. Create comprehensive video documentation of the system in action:
- Through-wall detection tests
- Real-time visualization of the CNN processing
- Complete system architecture walkthrough + real-time POV recording over Oculus Link
3. Implement more sophisticated signal processing:
- Advanced noise reduction algorithms
- Real-time signal-quality monitoring and improvement (requires lots of math!)
Our cheap initial prototype showed that low-cost hardware can achieve what previously required expensive research equipment. With these improvements, we could develop a practical tool for real-world applications.