I Tried Running NVIDIA's PersonaPlex on RunPod — Here's What Happened
A step-by-step walkthrough of setting up NVIDIA's PersonaPlex speech-to-speech model on RunPod, including the challenges, workarounds, and whether it's worth the effort.
I recently wanted to test NVIDIA’s PersonaPlex — a real-time speech-to-speech conversational AI model — but I don’t have an NVIDIA GPU. My M3 Max has 64GB of unified memory, which sounds impressive until you realize PersonaPlex is built for CUDA. So I turned to cloud GPUs.
This is a walkthrough of getting PersonaPlex running on RunPod, the gotchas I hit along the way, and my honest take on whether the open-source speech-to-speech space is ready for prime time.
What is PersonaPlex?
PersonaPlex is NVIDIA’s real-time, full-duplex speech-to-speech model built on the Moshi architecture from Kyutai. It enables persona control through text-based role prompts and audio-based voice conditioning. In plain English: you can talk to it, and it talks back with consistent character voices.
The appeal is obvious — instead of the traditional STT → LLM → TTS pipeline with its accumulated latency, speech-to-speech models process audio directly. They understand emotional context and verbal cues better than text intermediaries.
Cloud GPU Options and Pricing
Since PersonaPlex requires CUDA, my options were cloud GPU providers. Here’s what the landscape looks like:
| GPU | Provider | Price/hour |
|---|---|---|
| A100 80GB | RunPod | $1.49 - $1.79 |
| A100 80GB | Lambda Labs | $1.79 |
| H100 | Hyperstack | $1.90 - $2.40 |
| H100 | AWS/GCP/Azure | $4.00 - $8.00 |
I went with RunPod for the per-second billing and straightforward interface.
Setting Up the Pod
After creating a RunPod account and adding credits (~$10-25 is plenty for testing), the deployment process is:
- Click Pods → Deploy
- Select A100 SXM 80GB
- Choose the RunPod PyTorch 2.x template
- Set Volume Disk to 50-100 GB (model weights are large)
- Important: Add `8998` to Expose HTTP Ports (more on this later)
- Deploy
The port exposure step is easy to miss, and you’ll hit a wall without it.
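Before installing anything, it's worth confirming the pod actually sees the GPU. Here's a quick check from the web terminal (this assumes the RunPod PyTorch template, which ships with torch preinstalled):

```python
# Sanity check: confirm CUDA is visible before spending time on setup.
import torch

print(torch.cuda.is_available())      # should print True
print(torch.cuda.get_device_name(0))  # should name the A100 you selected
```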
Installation Steps
Once the pod is running, connect via the web terminal and run:
```bash
cd /workspace
git clone https://github.com/NVIDIA/personaplex.git
cd personaplex

# Install the Opus codec
apt-get update && apt-get install -y libopus-dev

# Install the package (no requirements.txt; it installs from the moshi directory)
pip install moshi/.
```
The lack of a requirements.txt tripped me up initially. The install happens via the local package directory.
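Before launching the server, a quick import check confirms the install took (a minimal sanity check; the top-level module name matches the `python -m moshi.server` invocation used later):

```python
# Verify the moshi package is importable before starting the server.
import moshi
print("moshi installed at:", moshi.__file__)
```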
The HuggingFace Token Dance
PersonaPlex downloads model weights from HuggingFace, and you’ll hit authentication errors without proper setup. Here’s the sequence of errors I encountered:
Error 1: 401 Unauthorized
```
requests.exceptions.HTTPError: 401 Client Error: Unauthorized
```
Fix: Export your HuggingFace token:
```bash
export HF_TOKEN=hf_xxxxxxxxxxxxxxxxxxxxx
```
Error 2: 401 Still (Gated Repo)
```
huggingface_hub.utils._errors.GatedRepoError: 401 Client Error
Cannot access gated repo for url...
```
Fix: Go to the model page on HuggingFace and click “Agree and access repository” to accept the license.
Error 3: 403 Forbidden
```
403 Forbidden: Please enable access to public gated repositories in your fine-grained token settings
```
Fix: Your token needs the right permissions. Go to HuggingFace → Settings → Tokens and create a new token with Read access (not fine-grained), or if using fine-grained tokens, enable “Access to public gated repos.”
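Rather than restarting the server to test each fix, you can verify the token end to end with huggingface_hub, which gets pulled in as a dependency. Note that the repo id below is a placeholder; substitute the one from your own 401/403 error message:

```python
# Verify the HF token end to end before re-running the server.
import os
from huggingface_hub import whoami, model_info

token = os.environ["HF_TOKEN"]
print(whoami(token=token)["name"])  # raises a 401 if the token itself is invalid

# model_info raises GatedRepoError if the license hasn't been accepted,
# and a 403 if the token lacks access to public gated repos.
# "nvidia/personaplex" is a placeholder; use the repo id from your error.
info = model_info("nvidia/personaplex", token=token)
print("token OK:", info.id)
```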
Running the Server
The README says to run:
```bash
SSL_DIR=$(mktemp -d); python -m moshi.server --ssl "$SSL_DIR"
```
This works, and you’ll see the model download and load:
```
INFO - loading mimi
INFO - mimi loaded
INFO - loading moshi
INFO - moshi loaded
INFO - warming up the model
======= Running on https://0.0.0.0:8998 =======
```
But here’s the catch — the --ssl flag breaks RunPod’s proxy.
The Port Exposure Problem
RunPod proxies HTTP services through URLs like https://<pod-id>-<port>.proxy.runpod.net. When I tried accessing the server, I got a 502 Bad Gateway error.
The issue: RunPod’s proxy handles SSL termination itself. It expects backends to serve plain HTTP. The --ssl flag makes PersonaPlex serve HTTPS internally, which the proxy can’t handle.
The fix: Run without SSL:
```bash
python -m moshi.server --host 0.0.0.0 --port 8998
```
Now you can access the web interface at:
```
https://<your-pod-id>-8998.proxy.runpod.net
```
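If the proxy URL still returns a 502, confirm the server is answering plain HTTP from inside the pod before blaming RunPod. A stdlib check from a second terminal (assuming the web UI responds at the root path, which the proxy URL above suggests):

```python
# Confirm the server serves plain HTTP locally. If this returns 200 but
# the proxy URL still 502s, the problem is on the proxy side.
import urllib.request

with urllib.request.urlopen("http://localhost:8998", timeout=10) as resp:
    print(resp.status)
```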
Was It Worth It?
After all that setup — creating accounts, configuring tokens, debugging SSL issues — I finally got PersonaPlex running.
And… it was underwhelming.
The latency was noticeable, the voice quality didn’t match my expectations, and the persona control felt limited. For roughly 30-40 minutes of A100 time, I spent about $1.50. Not a big loss, but it highlighted how far open-source speech-to-speech models have to go.
The Alternatives
If you’re exploring real-time voice AI, here’s what else is out there:
Hosted (Easiest Path)
| Service | Notes |
|---|---|
| GPT-4o Voice | Best quality, ~$20/mo via ChatGPT Plus |
| Gemini Live | Google’s equivalent, free tier available |
| ElevenLabs | Great voice quality, conversational AI API |
Open Source / Self-Hosted
| Model | Notes |
|---|---|
| Kyutai TTS 1.6B | From the Moshi team, newer and improved |
| Kyutai Pocket TTS | 100M params, runs on CPU |
| Ultravox | Speech-to-speech from Fixie.ai |
| Sesame CSM | Expressive character voices |
Honest Take
If you just want a good voice conversation experience right now, GPT-4o’s voice mode in the ChatGPT app is the most polished option. The open-source models are catching up, but they still lag in naturalness and latency.
For production voice AI, the traditional STT → LLM → TTS pipeline with providers like Deepgram, Claude, and Cartesia still offers more control, better quality, and predictable costs — even if it means slightly higher latency.
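For a sense of what that pipeline looks like structurally, here's a minimal sketch. The three stage functions are hypothetical stand-ins, not real SDK calls; the point is the shape of the loop, where each stage adds a network hop but can be swapped or instrumented independently:

```python
# Structural sketch of an STT -> LLM -> TTS voice pipeline.
# transcribe(), respond(), and synthesize() are hypothetical wrappers
# around whichever STT, LLM, and TTS providers you pick.

def transcribe(audio: bytes) -> str:
    """STT stage, e.g. a speech-to-text API call. Hypothetical."""
    raise NotImplementedError

def respond(text: str) -> str:
    """LLM stage, e.g. a chat completion call. Hypothetical."""
    raise NotImplementedError

def synthesize(text: str) -> bytes:
    """TTS stage, e.g. a text-to-speech API call. Hypothetical."""
    raise NotImplementedError

def handle_turn(audio_in: bytes) -> bytes:
    # Each stage is a separate round trip: that's where the extra latency
    # comes from, and also where the control comes from, since you can
    # log, filter, or swap any stage without touching the others.
    text = transcribe(audio_in)
    reply = respond(text)
    return synthesize(reply)
```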
Key Takeaways
- Cloud GPU testing is cheap — You can test CUDA-only models for a few dollars
- RunPod's HTTP proxy expects plain HTTP — Don't use the `--ssl` flag for services you want to expose
- HuggingFace gated models require three things: a valid token, license acceptance, and correct token permissions
- Open-source speech-to-speech isn’t ready — For production use, hosted APIs or traditional pipelines are still the better choice
- Always clone to
/workspace— On RunPod, data outside this directory is lost when pods reset
Total cost of this experiment: ~$1.50 in cloud GPU time, plus the sunk cost of my expectations.