Talking to the Vault
Dispatch #003
Voice is the one area where every AI player has simultaneously tried and failed. There are options. ChatGPT has Advanced Voice. Gemini has Live. Claude has voice input. xAI now has Super Voice. None of it, until very recently, could reach into your actual stuff.
What I wanted was simple. A decent voice model, on my phone, that could access not just its training or a web search, but my notes. The actual junk I’ve been saving in Obsidian for years because I’ll need it one day. Insurance documents, project plans, half-finished ideas, meeting summaries from calls I have already forgotten.
I threw money at ElevenLabs. I built workarounds. I wired things together that had no business being wired together. It was always fragile. Latency, lag, sessions falling over. Nothing worked well enough to stick.
Last week Grok shipped custom MCP connectors. That meant there was a path. Last night I took it. I’ve vibed it, it works, it’s probably terrible, but it does exactly what I’ve wanted for months.
Here is how.
What we built
My Obsidian vault is now queryable by voice. I ask a question out loud. “When does my car insurance expire?” or “What’s the plan for the AI workshop?” Grok searches my notes and answers. No opening Obsidian, no searching, no copy-pasting.
The stack:
Obsidian
└── Local REST API plugin (localhost, HTTPS)
    └── obsidian-mcp-server (cyanheads)
        └── Bun proxy ← the critical piece
            └── Cloudflare named tunnel
                └── vault.yourdomain.com/mcp
                    └── Grok Voice connector
All three local services run as launchd agents on my M1 Mac so they survive reboots.
The proxy
The cyanheads obsidian-mcp-server does the heavy lifting: it translates your Obsidian vault into MCP tools that any MCP-capable AI can use. It’s well built. But Grok’s connector implementation has some quirks that required a thin proxy between the server and the public internet.
Four things the proxy handles:
1. Immediate tool list notification
This is the root cause of the “Authentication” error most people will hit.
Grok’s connector-manager waits for a notifications/tools/list_changed event on the SSE channel before it calls tools/list. It does not fetch the tool list proactively. It waits to be told the list is ready.
That notification has to arrive the instant the SSE channel opens. The MCP server sends ping events on that channel, but pings can be 30+ seconds apart. If you wait for the first ping before injecting the notification, Grok times out and kills the session.
The fix is to emit the notification from the ReadableStream’s start() callback, so it is enqueued the moment the stream is created rather than when the first backend data arrives.
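A minimal sketch of that idea, assuming the proxy relays the backend's SSE body as a web ReadableStream. The helper names (sseFrame, withImmediateToolsNotification) are hypothetical and this is not the published proxy code; only the notification method name comes from the behaviour described above.

```typescript
const encoder = new TextEncoder();

// Format one JSON-RPC message as a Server-Sent Events frame.
function sseFrame(message: unknown): Uint8Array {
  return encoder.encode(`event: message\ndata: ${JSON.stringify(message)}\n\n`);
}

// Wrap the backend SSE body so the notification is queued before any
// backend bytes (including pings) have arrived.
function withImmediateToolsNotification(
  backend: ReadableStream<Uint8Array>,
): ReadableStream<Uint8Array> {
  const reader = backend.getReader();
  return new ReadableStream<Uint8Array>({
    start(controller) {
      // start() runs as part of constructing the stream, so this frame is
      // the first thing the client reads.
      controller.enqueue(
        sseFrame({ jsonrpc: "2.0", method: "notifications/tools/list_changed" }),
      );
    },
    async pull(controller) {
      // After that, relay backend chunks unchanged.
      const result = await reader.read();
      if (result.done) controller.close();
      else controller.enqueue(result.value);
    },
    cancel(reason) {
      // Propagate a client disconnect to the backend stream.
      return reader.cancel(reason);
    },
  });
}
```

The wrapped stream can then be used as the body of the Response returned for the SSE request. Putting the enqueue in start() rather than pull() is the point: nothing in it depends on the backend having produced data.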
2. Strip the tasks capability
The cyanheads framework advertises a non-standard tasks capability in its response to the initialize request. Grok’s connector-manager doesn’t know what to do with it and fails silently.
3. Strip listChanged flags
Removing listChanged: true from capabilities means Grok fetches tools immediately rather than waiting for further change notifications.
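Fixes 2 and 3 both amount to rewriting the capabilities object in the initialize response before it reaches Grok. A minimal sketch, assuming the proxy has already parsed that response as a JSON-RPC object; the function name sanitizeInitializeResponse is hypothetical, and a real proxy would also need to pull the JSON out of an SSE frame if the server replies that way.

```typescript
type JsonObject = Record<string, unknown>;

function isObject(value: unknown): value is JsonObject {
  return typeof value === "object" && value !== null && !Array.isArray(value);
}

// Return a copy of an initialize response with the `tasks` capability
// removed and the `listChanged` flag dropped from every other capability.
function sanitizeInitializeResponse(response: JsonObject): JsonObject {
  const result = response.result;
  if (!isObject(result) || !isObject(result.capabilities)) return response;

  const capabilities: JsonObject = {};
  for (const [name, value] of Object.entries(result.capabilities)) {
    if (name === "tasks") continue; // fix 2: drop the non-standard capability
    if (isObject(value)) {
      const copy: JsonObject = { ...value };
      delete copy.listChanged; // fix 3: drop the flag, keep everything else
      capabilities[name] = copy;
    } else {
      capabilities[name] = value;
    }
  }
  return { ...response, result: { ...result, capabilities } };
}
```

Responses without a capabilities object (errors, replies to other methods) pass through untouched, so the function is safe to apply to anything the proxy believes is an initialize reply.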
4. Fix the OAuth discovery
The MCP server exposes /.well-known/oauth-protected-resource advertising bearer auth. Grok sees this, tries to find an auth server at /.well-known/oauth-authorization-server, gets a 404, and gives up.
The proxy implements a full OAuth server with a password-gated authorisation page. When you add the connector, Grok opens the auth page, you enter a password, and Grok stores the token. Without that password, nobody else can complete the OAuth flow and reach your vault.
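The discovery half of that fix is small: answer the well-known URL with an authorization server metadata document instead of a 404. A minimal sketch using the field names from RFC 8414. The /authorize and /token paths, the supported values, and both function names are illustrative assumptions, not taken from the actual proxy, and the password page and token issuance are left out.

```typescript
// Build the metadata document for the public origin the tunnel exposes,
// for example "https://vault.yourdomain.com".
function authorizationServerMetadata(publicOrigin: string) {
  return {
    issuer: publicOrigin,
    authorization_endpoint: `${publicOrigin}/authorize`, // assumed path
    token_endpoint: `${publicOrigin}/token`, // assumed path
    response_types_supported: ["code"],
    grant_types_supported: ["authorization_code"],
    code_challenge_methods_supported: ["S256"],
  };
}

// Answer the discovery request; return null so the caller can fall
// through to its other routes (OAuth pages, MCP forwarding).
function handleDiscovery(request: Request, publicOrigin: string): Response | null {
  const { pathname } = new URL(request.url);
  if (pathname !== "/.well-known/oauth-authorization-server") return null;
  return new Response(JSON.stringify(authorizationServerMetadata(publicOrigin)), {
    status: 200,
    headers: { "content-type": "application/json" },
  });
}
```

The origin is passed in explicitly because, behind a tunnel, the URL the proxy sees on an incoming request is not necessarily the public one.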
The debugging process
The MCP handshake was completing successfully. Grok was doing everything right: OAuth worked, the session initialised, the SSE channel opened. Then after exactly 30 seconds it deleted the session and showed “Authentication failed.”
Logging every request through the proxy made the pattern obvious: Grok never called tools/list. It was waiting. The 30-second window was its timeout for that wait.
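That kind of logging can be a thin wrapper around the proxy's request handler. A minimal sketch, assuming a fetch-style handler of the kind Bun.serve accepts; withRequestLog and rpcMethodOf are hypothetical names and the log format is arbitrary.

```typescript
type Handler = (request: Request) => Response | Promise<Response>;

// Pull the JSON-RPC method name out of a parsed message, if it has one.
function rpcMethodOf(message: unknown): string {
  if (typeof message === "object" && message !== null && "method" in message) {
    return String((message as { method: unknown }).method);
  }
  return "?";
}

// Wrap a handler so every request is logged with its HTTP method, path,
// JSON-RPC method (for POST bodies) and response status.
function withRequestLog(
  handler: Handler,
  log: (line: string) => void = console.log,
): Handler {
  return async (request) => {
    const { pathname } = new URL(request.url);
    let rpc = "-";
    if (request.method === "POST") {
      try {
        // Read a clone so the real handler can still consume the body.
        const body: unknown = await request.clone().json();
        rpc = Array.isArray(body) ? body.map(rpcMethodOf).join(",") : rpcMethodOf(body);
      } catch {
        rpc = "(non-JSON body)";
      }
    }
    const response = await handler(request);
    log(`${new Date().toISOString()} ${request.method} ${pathname} rpc=${rpc} -> ${response.status}`);
    return response;
  };
}
```

With a log like this, the missing call stands out: initialize and the SSE GET appear, and no line with rpc=tools/list ever follows.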
Once I found it was waiting for notifications/tools/list_changed, and that the notification wasn’t arriving because the proxy was waiting for the backend’s first ping, the fix was straightforward. The “Authentication” error turned out to be Grok’s generic failure message, not an OAuth problem at all.
This is the kind of bug that should take twenty minutes and takes seven hours. The system logs success until it doesn’t. Nothing points to the cause until you log everything and watch what never arrives.
Requirements
- A domain with DNS you control (for the Cloudflare tunnel)
- Cloudflare account (free tier works)
- The Obsidian Local REST API plugin installed and running
- Bun (for the proxy and the MCP server)
- Grok with connector access
Is it worth it?
Yes, genuinely. Voice retrieval from a personal knowledge base is something I’ve wanted for a while. The latency is low enough that it feels natural. Grok finds the right note, pulls the relevant section, and answers without me needing to remember which file something lives in.
The more you put in Obsidian, the more useful this becomes. Insurance docs, reference notes, meeting summaries, project plans. All queryable by voice.
Running since last night. Already used it twice. Early signs are good.
The proxy code
The proxy is on GitHub. The core of it is about 150 lines of Bun/TypeScript. If you’re hitting the same “Authentication” wall trying to connect a custom MCP server to Grok, the four fixes above are almost certainly why.
Originally published on Viberpsychosis.