Talk mode

Talk mode is a continuous voice conversation loop:

Listen for speech
Send transcript to the model (main session, chat.send)
Wait for the response
Speak it via the configured Talk provider (`talk.speak`)
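
The four steps above can be sketched as a single turn of the loop. This is a minimal illustration, not the actual implementation: `talk_turn` and its three callables are hypothetical stand-ins for speech recognition, the `chat.send` call, and the `talk.speak` call.

```python
from typing import Callable


def talk_turn(
    listen: Callable[[], str],
    send: Callable[[str], str],
    speak: Callable[[str], None],
) -> str:
    """Run one Talk turn: listen, send the transcript, wait, speak the reply."""
    transcript = listen()     # step 1: capture speech as text
    reply = send(transcript)  # steps 2-3: send to the model and wait for the response
    speak(reply)              # step 4: play the reply through the Talk provider
    return reply
```

Talk mode would repeat such a turn until it is toggled off.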

Behavior (macOS)

Always-on overlay while Talk mode is enabled.
Listening → Thinking → Speaking phase transitions.
On a short pause (silence window), the current transcript is sent.
Replies are written to WebChat (same as typing).
Interrupt on speech (default on): if the user starts talking while the assistant is speaking, we stop playback and note the interruption timestamp for the next prompt.
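
The silence-window rule can be pictured with a small timer. This is only a sketch under assumptions: the `SilenceWindow` class and its method names are hypothetical, timestamps are plain milliseconds supplied by the caller, and the 700 ms default mirrors the macOS pause window listed under Defaults.

```python
from typing import Optional


class SilenceWindow:
    """Reports when the pause since the last speech is long enough to send."""

    def __init__(self, timeout_ms: int = 700) -> None:
        self.timeout_ms = timeout_ms
        self.last_speech_ms: Optional[int] = None

    def on_speech(self, now_ms: int) -> None:
        # Any detected speech restarts the pause window.
        self.last_speech_ms = now_ms

    def should_send(self, now_ms: int) -> bool:
        # Nothing to send until some speech has been heard.
        if self.last_speech_ms is None:
            return False
        return now_ms - self.last_speech_ms >= self.timeout_ms
```

With a 700 ms window, speech at t=1000 followed by silence makes `should_send(1700)` return `True`.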

Voice directives in replies

The assistant may prefix its reply with a single JSON line to control voice:


```json
{ "voice": "<voice-id>", "once": true }
```

Rules:

First non-empty line only.
Unknown keys are ignored.
`once: true` applies to the current reply only.
Without `once`, the voice becomes the new default for Talk mode.
The JSON line is stripped before TTS playback.
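
One way to apply these rules is to split the reply into a directive and the text to speak. The sketch below is an assumption about how such parsing could look, not the project's parser; `split_voice_directive` is a hypothetical name, and treating a non-object JSON value as "no directive" is this sketch's choice.

```python
import json
from typing import Any, Dict, Tuple


def split_voice_directive(reply: str) -> Tuple[Dict[str, Any], str]:
    """Return (directive, text_to_speak); directive is {} when none is present."""
    lines = reply.splitlines()
    for i, line in enumerate(lines):
        if not line.strip():
            continue  # skip leading blank lines
        # Only the first non-empty line is considered.
        try:
            parsed = json.loads(line)
        except json.JSONDecodeError:
            return {}, reply
        if not isinstance(parsed, dict):
            return {}, reply
        # Strip the JSON line so it is never spoken.
        rest = "\n".join(lines[:i] + lines[i + 1:]).lstrip("\n")
        return parsed, rest
    return {}, reply
```

For example, a reply whose first line is `{ "voice": "abc", "once": true }` yields that object as the directive and only the remaining lines as speech text.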

Supported keys:

`voice` / `voice_id` / `voiceId`
`model` / `model_id` / `modelId`
`speed`, `rate` (WPM), `stability`, `similarity`, `style`, `speakerBoost`
`seed`, `normalize`, `lang`, `output_format`, `latency_tier`
`once`
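
Because several keys have aliases, a consumer might first fold them into one name. The following is a sketch only: `normalize_directive`, the choice of `voiceId` / `modelId` as canonical names, and the last-alias-wins behavior are assumptions, not documented behavior.

```python
from typing import Any, Dict

# Alias groups from the list above; the canonical target names are this sketch's choice.
ALIASES = {
    "voice": "voiceId", "voice_id": "voiceId", "voiceId": "voiceId",
    "model": "modelId", "model_id": "modelId", "modelId": "modelId",
}
PLAIN_KEYS = {
    "speed", "rate", "stability", "similarity", "style", "speakerBoost",
    "seed", "normalize", "lang", "output_format", "latency_tier", "once",
}


def normalize_directive(raw: Dict[str, Any]) -> Dict[str, Any]:
    """Fold alias keys into one name and drop keys that are not supported."""
    out: Dict[str, Any] = {}
    for key, value in raw.items():
        if key in ALIASES:
            out[ALIASES[key]] = value  # if several aliases appear, the last one wins
        elif key in PLAIN_KEYS:
            out[key] = value
        # Unknown keys are ignored.
    return out
```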

Config (`~/.openclaw/openclaw.json`)


```json5
{
  talk: {
    provider: "elevenlabs",
    providers: {
      elevenlabs: {
        voiceId: "elevenlabs_voice_id",
        modelId: "eleven_v3",
        outputFormat: "mp3_44100_128",
        apiKey: "elevenlabs_api_key",
      },
      mlx: {
        modelId: "mlx-community/Soprano-80M-bf16",
      },
      system: {},
    },
    speechLocale: "ru-RU",
    silenceTimeoutMs: 1500,
    interruptOnSpeech: true,
  },
}
```

Defaults:

`interruptOnSpeech`: true
`silenceTimeoutMs`: when unset, Talk keeps the platform default pause window before sending the transcript (`700 ms on macOS and Android, 900 ms on iOS`)
`provider`: selects the active Talk provider. Use `elevenlabs`, `mlx`, or `system` for the macOS-local playback paths.
`providers.<provider>.voiceId`: falls back to `ELEVENLABS_VOICE_ID` / `SAG_VOICE_ID` for ElevenLabs (or first ElevenLabs voice when API key is available).
`providers.elevenlabs.modelId`: defaults to `eleven_v3` when unset.
`providers.mlx.modelId`: defaults to `mlx-community/Soprano-80M-bf16` when unset.
`providers.elevenlabs.apiKey`: falls back to `ELEVENLABS_API_KEY` (or gateway shell profile if available).
`speechLocale`: optional BCP 47 locale id for on-device Talk speech recognition on iOS/macOS. Leave unset to use the device default.
`outputFormat`: defaults to `pcm_44100` on macOS/iOS and `pcm_24000` on Android (set `mp3_*` to force MP3 streaming)
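
The fallback rules above can be summarized in code. This is a hedged sketch, not the gateway's resolver: `resolve_talk_settings`, the lowercase platform names, and the flat result shape are hypothetical. Reading `outputFormat` from `providers.elevenlabs` follows the config example, and checking `ELEVENLABS_VOICE_ID` before `SAG_VOICE_ID` is an assumed precedence. The "first ElevenLabs voice" and gateway shell profile fallbacks are not modeled.

```python
from typing import Any, Dict, Mapping

# Platform defaults taken from the list above.
PLATFORM_SILENCE_MS = {"macos": 700, "android": 700, "ios": 900}
PLATFORM_OUTPUT_FORMAT = {"macos": "pcm_44100", "ios": "pcm_44100", "android": "pcm_24000"}


def resolve_talk_settings(
    talk: Dict[str, Any], platform: str, env: Mapping[str, str]
) -> Dict[str, Any]:
    """Fill unset Talk settings with the documented defaults and env fallbacks."""
    providers = talk.get("providers", {})
    eleven = providers.get("elevenlabs", {})
    mlx = providers.get("mlx", {})
    return {
        "interruptOnSpeech": talk.get("interruptOnSpeech", True),
        "silenceTimeoutMs": talk.get("silenceTimeoutMs", PLATFORM_SILENCE_MS[platform]),
        "elevenlabsModelId": eleven.get("modelId", "eleven_v3"),
        "mlxModelId": mlx.get("modelId", "mlx-community/Soprano-80M-bf16"),
        "outputFormat": eleven.get("outputFormat", PLATFORM_OUTPUT_FORMAT[platform]),
        # Env fallbacks; the order of the two voice variables is an assumption.
        "voiceId": eleven.get("voiceId")
        or env.get("ELEVENLABS_VOICE_ID")
        or env.get("SAG_VOICE_ID"),
        "apiKey": eleven.get("apiKey") or env.get("ELEVENLABS_API_KEY"),
    }
```

In real use, `env` would typically be `os.environ`; passing it in explicitly keeps the function easy to test.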

macOS UI

Menu bar toggle: Talk
Config tab: Talk Mode group (voice id + interrupt toggle)
Overlay:
- Listening: cloud pulses with mic level
- Thinking: sinking animation
- Speaking: radiating rings
- Click cloud: stop speaking
- Click X: exit Talk mode

Android UI

Voice tab toggle: Talk
Manual Mic and Talk are mutually exclusive runtime capture modes.
Manual Mic stops when the app leaves the foreground or the user leaves the Voice tab.
Talk Mode keeps running until toggled off or the Android node disconnects, and uses Android's microphone foreground-service type while active.

Notes

Requires Speech + Microphone permissions.
Uses `chat.send` against session key `main`.
The gateway resolves Talk playback through `talk.speak` using the active Talk provider. Android falls back to local system TTS only when that RPC is unavailable.
macOS local MLX playback uses the bundled `openclaw-mlx-tts` helper when present, or an executable on `PATH`. Set `OPENCLAW_MLX_TTS_BIN` to point at a custom helper binary during development.
`stability` for `eleven_v3` is validated to `0.0`, `0.5`, or `1.0`; other models accept `0..1`.
`latency_tier` is validated to `0..4` when set.
Android supports `pcm_16000`, `pcm_22050`, `pcm_24000`, and `pcm_44100` output formats for low-latency AudioTrack streaming.
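
The validation notes above can be expressed as simple predicates. This is a sketch under assumptions: the function names are hypothetical, `latency_tier` is assumed to be an integer, and what happens to an invalid value (rejected, clamped, or ignored) is not stated here, so the functions only report validity.

```python
from typing import Optional

# PCM formats listed above for Android AudioTrack streaming.
ANDROID_PCM_FORMATS = {"pcm_16000", "pcm_22050", "pcm_24000", "pcm_44100"}


def is_valid_stability(model_id: str, stability: float) -> bool:
    """eleven_v3 accepts only three discrete values; other models accept 0..1."""
    if model_id == "eleven_v3":
        return stability in (0.0, 0.5, 1.0)
    return 0.0 <= stability <= 1.0


def is_valid_latency_tier(tier: Optional[int]) -> bool:
    """An unset tier is fine; a set tier must fall in 0..4."""
    return tier is None or (isinstance(tier, int) and 0 <= tier <= 4)


def is_android_pcm_format(output_format: str) -> bool:
    """True when the format is one of the PCM formats Android streams directly."""
    return output_format in ANDROID_PCM_FORMATS
```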
