Note: This content is mirrored from docs.openclaw.ai and is subject to their terms and conditions.
    Google (Gemini)

    The Google plugin provides access to Gemini models through Google AI Studio, plus image generation, media understanding (image/audio/video), text-to-speech, and web search via Gemini Grounding.

• Provider: `google`
• Auth: `GEMINI_API_KEY` or `GOOGLE_API_KEY`
• API: Google Gemini API
• Runtime option: `agents.defaults.agentRuntime.id: "google-gemini-cli"` reuses Gemini CLI OAuth while keeping model refs canonical as `google/*`.
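
As a concrete illustration, the runtime option can sit alongside a default model in the config file. This is a minimal sketch assuming the same `agents.defaults` shape used by the other examples on this page:

```json5
{
  agents: {
    defaults: {
      // Canonical model ref stays google/*; the runtime handles CLI OAuth.
      model: { primary: "google/gemini-3.1-pro-preview" },
      agentRuntime: { id: "google-gemini-cli" },
    },
  },
}
```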

    Getting started

    Choose your preferred auth method and follow the setup steps.

**Best for:** standard Gemini API access through Google AI Studio.

<Steps>
<Step title="Run onboarding">

```bash
openclaw onboard --auth-choice gemini-api-key
```

Or pass the key directly:

```bash
openclaw onboard --non-interactive \
  --mode local \
  --auth-choice gemini-api-key \
  --gemini-api-key "$GEMINI_API_KEY"
```

</Step>
<Step title="Set a default model">

```json5
{
  agents: {
    defaults: {
      model: { primary: "google/gemini-3.1-pro-preview" },
    },
  },
}
```

</Step>
<Step title="Verify the model is available">

```bash
openclaw models list --provider google
```

</Step>
</Steps>

<Tip>
The environment variables `GEMINI_API_KEY` and `GOOGLE_API_KEY` are both accepted. Use whichever you already have configured.
</Tip>
**Best for:** reusing an existing Gemini CLI login via PKCE OAuth instead of a separate API key.

<Warning>
The `google-gemini-cli` provider is an unofficial integration. Some users report account restrictions when using OAuth this way. Use at your own risk.
</Warning>

<Steps>
<Step title="Install the Gemini CLI">

The local `gemini` command must be available on `PATH`.

```bash
# Homebrew
brew install gemini-cli

# or npm
npm install -g @google/gemini-cli
```

OpenClaw supports both Homebrew installs and global npm installs, including common Windows/npm layouts.

</Step>
<Step title="Log in via OAuth">

```bash
openclaw models auth login --provider google-gemini-cli --set-default
```

</Step>
<Step title="Verify the model is available">

```bash
openclaw models list --provider google
```

</Step>
</Steps>

* Default model: `google/gemini-3.1-pro-preview`
* Runtime: `google-gemini-cli`
* Alias: `gemini-cli`

Gemini 3.1 Pro's Gemini API model id is `gemini-3.1-pro-preview`. OpenClaw accepts the shorter `google/gemini-3.1-pro` as a convenience alias and normalizes it before provider calls.

**Environment variables:**

* `OPENCLAW_GEMINI_OAUTH_CLIENT_ID`
* `OPENCLAW_GEMINI_OAUTH_CLIENT_SECRET`

(Or the `GEMINI_CLI_*` variants.)

<Note>
If Gemini CLI OAuth requests fail after login, set `GOOGLE_CLOUD_PROJECT` or `GOOGLE_CLOUD_PROJECT_ID` on the gateway host and retry.
</Note>

<Note>
If login fails before the browser flow starts, make sure the local `gemini` command is installed and on `PATH`.
</Note>

`google-gemini-cli/*` model refs are legacy compatibility aliases. New configs should use `google/*` model refs plus the `google-gemini-cli` runtime when they want local Gemini CLI execution.

    Capabilities

| Capability | Supported |
| --- | --- |
| Chat completions | Yes |
| Image generation | Yes |
| Music generation | Yes |
| Text-to-speech | Yes |
| Realtime voice | Yes (Google Live API) |
| Image understanding | Yes |
| Audio transcription | Yes |
| Video understanding | Yes |
| Web search (Grounding) | Yes |
| Thinking/reasoning | Yes (Gemini 2.5+ / Gemini 3+) |
| Gemma 4 models | Yes |

    tip

    Gemini 3 models use `thinkingLevel` rather than `thinkingBudget`. OpenClaw maps Gemini 3, Gemini 3.1, and `gemini-*-latest` alias reasoning controls to `thinkingLevel` so default/low-latency runs do not send disabled `thinkingBudget` values.

`/think adaptive` keeps Google's dynamic thinking semantics instead of choosing a fixed OpenClaw level. Gemini 3 and Gemini 3.1 omit a fixed `thinkingLevel` so Google can choose the level; Gemini 2.5 sends Google's dynamic sentinel `thinkingBudget: -1`.
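
The mapping described above can be sketched as follows. This is an illustrative example only, not OpenClaw's actual implementation; the model-family checks and the budget numbers in the non-adaptive branch are simplified assumptions:

```typescript
// Sketch: map an abstract thinking setting onto Gemini request fields.
// Gemini 3.x (and *-latest aliases) take thinkingLevel; Gemini 2.5 takes thinkingBudget.
type ThinkingRequest = { thinkingLevel?: "minimal" | "low" | "high"; thinkingBudget?: number };

function mapThinking(
  model: string,
  level: "off" | "low" | "high" | "adaptive"
): ThinkingRequest {
  const usesThinkingLevel = model.startsWith("gemini-3") || model.endsWith("-latest");
  if (level === "adaptive") {
    // Omit thinkingLevel on Gemini 3.x so Google chooses; 2.5 uses the -1 sentinel.
    return usesThinkingLevel ? {} : { thinkingBudget: -1 };
  }
  if (usesThinkingLevel) {
    return { thinkingLevel: level === "off" ? "minimal" : level };
  }
  // Budget numbers are purely illustrative.
  return { thinkingBudget: level === "off" ? 0 : level === "low" ? 1024 : 8192 };
}

console.log(mapThinking("gemini-2.5-flash", "adaptive")); // { thinkingBudget: -1 }
```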

Gemma 4 models (for example `gemma-4-26b-a4b-it`) support thinking mode. OpenClaw rewrites `thinkingBudget` to a supported Google `thinkingLevel` for Gemma 4. Setting thinking to `off` keeps thinking disabled instead of mapping it to `MINIMAL`.

    Image generation

The bundled `google` image-generation provider defaults to `google/gemini-3.1-flash-image-preview`.

• Also supports `google/gemini-3-pro-image-preview`
• Generate: up to 4 images per request
• Edit mode: enabled, up to 5 input images
• Geometry controls: `size`, `aspectRatio`, and `resolution`
    To use Google as the default image provider:

```json5
{
  agents: {
    defaults: {
      imageGenerationModel: {
        primary: "google/gemini-3.1-flash-image-preview",
      },
    },
  },
}
```

    note

    See [Image Generation](/tools/image-generation) for shared tool parameters, provider selection, and failover behavior.

    Video generation

The bundled `google` plugin also registers video generation through the shared `video_generate` tool.

• Default video model: `google/veo-3.1-fast-generate-preview`
• Modes: text-to-video, image-to-video, and single-video reference flows
• Supports `aspectRatio`, `resolution`, and `audio`
• Current duration clamp: 4 to 8 seconds

    To use Google as the default video provider:

```json5
{
  agents: {
    defaults: {
      videoGenerationModel: {
        primary: "google/veo-3.1-fast-generate-preview",
      },
    },
  },
}
```

    note

    See [Video Generation](/tools/video-generation) for shared tool parameters, provider selection, and failover behavior.

    Music generation

The bundled `google` plugin also registers music generation through the shared `music_generate` tool.

• Default music model: `google/lyria-3-clip-preview`
• Also supports `google/lyria-3-pro-preview`
• Prompt controls: `lyrics` and `instrumental`
• Output format: `mp3` by default, plus `wav` on `google/lyria-3-pro-preview`
• Reference inputs: up to 10 images
• Session-backed runs detach through the shared task/status flow, including `action: "status"`
    To use Google as the default music provider:

```json5
{
  agents: {
    defaults: {
      musicGenerationModel: {
        primary: "google/lyria-3-clip-preview",
      },
    },
  },
}
```

    note

    See [Music Generation](/tools/music-generation) for shared tool parameters, provider selection, and failover behavior.

    Text-to-speech

The bundled `google` speech provider uses the Gemini API TTS path with `gemini-3.1-flash-tts-preview`.

• Default voice: `Kore`
• Auth: `messages.tts.providers.google.apiKey`, `models.providers.google.apiKey`, `GEMINI_API_KEY`, or `GOOGLE_API_KEY`
• Output: WAV for regular TTS attachments, Opus for voice-note targets, PCM for Talk/telephony
• Voice-note output: Google PCM is wrapped as WAV and transcoded to 48 kHz Opus with `ffmpeg`
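
The voice-note path wraps raw PCM in a WAV container before the `ffmpeg` transcode. A minimal sketch of that wrapping step, assuming 16-bit mono PCM at 24 kHz (a common Gemini TTS output format; this is not OpenClaw's actual code):

```typescript
// Build a 44-byte RIFF/WAVE header for 16-bit PCM and prepend it.
// Assumption: sampleRate/channels match the provider's raw PCM output.
function pcmToWav(pcm: Buffer, sampleRate = 24000, channels = 1): Buffer {
  const bytesPerSample = 2; // 16-bit samples
  const byteRate = sampleRate * channels * bytesPerSample;
  const header = Buffer.alloc(44);
  header.write("RIFF", 0);
  header.writeUInt32LE(36 + pcm.length, 4); // RIFF chunk size
  header.write("WAVE", 8);
  header.write("fmt ", 12);
  header.writeUInt32LE(16, 16); // fmt chunk size
  header.writeUInt16LE(1, 20); // audio format: PCM
  header.writeUInt16LE(channels, 22);
  header.writeUInt32LE(sampleRate, 24);
  header.writeUInt32LE(byteRate, 28);
  header.writeUInt16LE(channels * bytesPerSample, 32); // block align
  header.writeUInt16LE(16, 34); // bits per sample
  header.write("data", 36);
  header.writeUInt32LE(pcm.length, 40); // data chunk size
  return Buffer.concat([header, pcm]);
}
```

The resulting WAV can then be handed to `ffmpeg` for the 48 kHz Opus step, e.g. `ffmpeg -i note.wav -c:a libopus -ar 48000 note.ogg`.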

    To use Google as the default TTS provider:

```json5
{
  messages: {
    tts: {
      auto: "always",
      provider: "google",
      providers: {
        google: {
          model: "gemini-3.1-flash-tts-preview",
          voiceName: "Kore",
          audioProfile: "Speak professionally with a calm tone.",
        },
      },
    },
  },
}
```

Gemini API TTS uses natural-language prompting for style control. Set `audioProfile` to prepend a reusable style prompt before the spoken text. Set `speakerName` when your prompt text refers to a named speaker.

Gemini API TTS also accepts expressive square-bracket audio tags in the text, such as `[whispers]` or `[laughs]`. To keep tags out of the visible chat reply while sending them to TTS, put them inside a `[[tts:text]]...[[/tts:text]]` block:

```text
Here is the clean reply text. [[tts:text]][whispers] Here is the spoken version.[[/tts:text]]
```
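
The split implied by this convention can be sketched as below. This is an assumed client-side helper for illustration, not part of the OpenClaw API:

```typescript
// Separate the visible chat reply from the text sent to TTS, based on the
// [[tts:text]]...[[/tts:text]] convention described above.
function splitTts(reply: string): { visible: string; spoken: string } {
  const re = /\[\[tts:text\]\]([\s\S]*?)\[\[\/tts:text\]\]/g;
  let spoken = "";
  const visible = reply
    .replace(re, (_match, body: string) => {
      spoken += body; // collect the spoken-only segments
      return ""; // and strip them from the visible reply
    })
    .trim();
  // If no tagged block is present, speak the visible text as-is.
  return { visible, spoken: (spoken || visible).trim() };
}
```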

    note

    A Google Cloud Console API key restricted to the Gemini API is valid for this provider. This is not the separate Cloud Text-to-Speech API path.

    Realtime voice

The bundled `google` plugin registers a realtime voice provider backed by the Gemini Live API for backend audio bridges such as Voice Call and Google Meet.

| Setting | Config path | Default |
| --- | --- | --- |
| Model | `plugins.entries.voice-call.config.realtime.providers.google.model` | `gemini-2.5-flash-native-audio-preview-12-2025` |
| Voice | `...google.voice` | `Kore` |
| Temperature | `...google.temperature` | (unset) |
| VAD start sensitivity | `...google.startSensitivity` | (unset) |
| VAD end sensitivity | `...google.endSensitivity` | (unset) |
| Silence duration | `...google.silenceDurationMs` | (unset) |
| Activity handling | `...google.activityHandling` | Google default, `start-of-activity-interrupts` |
| Turn coverage | `...google.turnCoverage` | Google default, `only-activity` |
| Disable auto VAD | `...google.automaticActivityDetectionDisabled` | `false` |
| API key | `...google.apiKey` | Falls back to `models.providers.google.apiKey`, `GEMINI_API_KEY`, or `GOOGLE_API_KEY` |

    Example Voice Call realtime config:

```json5
{
  plugins: {
    entries: {
      "voice-call": {
        enabled: true,
        config: {
          realtime: {
            enabled: true,
            provider: "google",
            providers: {
              google: {
                model: "gemini-2.5-flash-native-audio-preview-12-2025",
                voice: "Kore",
                activityHandling: "start-of-activity-interrupts",
                turnCoverage: "only-activity",
              },
            },
          },
        },
      },
    },
  },
}
```

    note

    Google Live API uses bidirectional audio and function calling over a WebSocket. OpenClaw adapts telephony/Meet bridge audio to Gemini's PCM Live API stream and keeps tool calls on the shared realtime voice contract. Leave `temperature` unset unless you need sampling changes; OpenClaw omits non-positive values because Google Live can return transcripts without audio for `temperature: 0`. Gemini API transcription is enabled without `languageCodes`; the current Google SDK rejects language-code hints on this API path.

    note

    Control UI Talk supports Google Live browser sessions with constrained one-use tokens. Backend-only realtime voice providers can also run through the generic Gateway relay transport, which keeps provider credentials on the Gateway.

For maintainer live verification, run:

```bash
OPENAI_API_KEY=... GEMINI_API_KEY=... node --import tsx scripts/dev/realtime-talk-live-smoke.ts
```

The Google leg mints the same constrained Live API token shape used by Control UI Talk, opens the browser WebSocket endpoint, sends the initial setup payload, and waits for `setupComplete`.

    Advanced configuration

    Related

• **Model selection**: choosing providers, model refs, and failover behavior.
• **Image generation**: shared image tool parameters and provider selection.
• **Video generation**: shared video tool parameters and provider selection.
• **Music generation**: shared music tool parameters and provider selection.
