Does the transcript audio support models other than gemini-2.5-flash? #14724

zbf1999 · 2026-03-25T15:48:11Z

Hi, thanks for the great project.

I’m currently testing AFFiNE in a self-hosted/company environment, so I would prefer not to call external APIs directly. Because of that, I’m trying to use local models through Ollama instead of cloud providers.

I noticed that for Transcript audio, the official/default configuration seems to use gemini-2.5-flash. I want to ask:

1. Besides gemini-2.5-flash, what other models can be used for Transcript audio?
2. Does AFFiNE support using Ollama models for this feature?
3. If yes, are there any recommended models on Ollama that work well for audio transcription / transcript-related flows?
4. Is there any requirement for the model capability, such as:
   - audio input support
   - tool calling support
   - structured output support
   - specific OpenAI-compatible API behavior

My use case is:

- self-hosted AFFiNE
- internal company network
- avoid external network/API access as much as possible
- prefer local deployment with Ollama

I have already tried replacing the default model with an Ollama model in some AI-related configuration, but I’m not sure whether Transcript audio has stricter requirements than normal chat/structured generation.

So I’d like to confirm:

- which models are officially supported for Transcript audio
- whether local Ollama models are feasible
- whether there are any known working model examples

Thanks a lot.

name: 'Transcript audio',
action: 'Transcript audio',
model: 'gemini-2.5-flash',
optionalModels: [
  'gemini-2.5-flash',
  'gemini-2.5-pro',
  'gemini-3.1-pro-preview',
],
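The fragment above pairs a default `model` with a list of `optionalModels`. As a rough illustration only, here is one way such an allow-list could be consulted. The `PromptConfig` type and the `resolveModel` helper are hypothetical names introduced for this sketch, and the fallback behavior is an assumption, not something the thread confirms about AFFiNE.

```typescript
// Hypothetical shape mirroring the fragment above; not AFFiNE's actual type.
interface PromptConfig {
  name: string;
  action: string;
  model: string; // default model
  optionalModels: string[]; // models allowed as alternatives
}

const transcriptAudio: PromptConfig = {
  name: 'Transcript audio',
  action: 'Transcript audio',
  model: 'gemini-2.5-flash',
  optionalModels: [
    'gemini-2.5-flash',
    'gemini-2.5-pro',
    'gemini-3.1-pro-preview',
  ],
};

// Hypothetical helper: return the requested model only if it is on the
// allow-list, otherwise fall back to the default (assumed behavior).
function resolveModel(config: PromptConfig, requested?: string): string {
  if (
    requested !== undefined &&
    config.optionalModels.some((m) => m === requested)
  ) {
    return requested;
  }
  return config.model;
}

// A listed model is accepted; an unlisted one falls back to the default.
const accepted = resolveModel(transcriptAudio, 'gemini-2.5-pro');
const fallback = resolveModel(transcriptAudio, 'some-local-model');
```

Under this assumption, swapping in a model name that is not listed would silently keep the default, which is one possible reason a configuration change might appear to have no effect.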

Answered by darkskygit

darkskygit · 2026-03-31T08:37:29Z

Maintainer

Based on our tests, only Gemini 2.5/3.x Flash/Pro currently transcribes reliably without audio preprocessing. As far as I know, no offline model currently supports speech input, tool calling, and structured output at the same time, so we recommend using Gemini.
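The answer above names three capabilities a model needs at once: speech input, tool calling, and structured output. A minimal sketch of screening a candidate against that list might look like the following. The `ModelProfile` type, the `missingCapabilities` helper, and the example model are all hypothetical, and the capability flags would have to come from your own testing or the model's documentation.

```typescript
// The three capabilities named in the answer above.
type Capability = 'audioInput' | 'toolCalling' | 'structuredOutput';

const REQUIRED: Capability[] = ['audioInput', 'toolCalling', 'structuredOutput'];

// Hypothetical descriptor for a candidate model; fill in the capabilities
// from your own evaluation rather than trusting this placeholder.
interface ModelProfile {
  id: string;
  capabilities: Capability[];
}

// Return the required capabilities that the candidate does not provide.
function missingCapabilities(model: ModelProfile): Capability[] {
  return REQUIRED.filter((cap) => model.capabilities.indexOf(cap) === -1);
}

// Placeholder candidate, not a real model: it lacks audio input.
const candidate: ModelProfile = {
  id: 'example-local-model',
  capabilities: ['toolCalling', 'structuredOutput'],
};

// missing is ['audioInput'], so this candidate would not qualify.
const missing = missingCapabilities(candidate);
```

A candidate qualifies only when the returned list is empty. Passing this check says nothing about transcription quality, which the answer describes as the other limiting factor.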

1 reply

zbf1999 Apr 17, 2026
Author

Would gemma4 e4b or Step-Audio 2 mini work?
