Does the transcript audio support models other than gemini-2.5-flash? #14724
Hi, thanks for the great project.

I'm currently testing AFFiNE in a self-hosted company environment, so I would prefer not to call external APIs directly. Because of that, I'm trying to use local models through Ollama instead of cloud providers.

I noticed that for Transcript audio, the default configuration seems to use gemini-2.5-flash. Besides gemini-2.5-flash, what other models can be used for Transcript audio?

My use case is self-hosted AFFiNE. I have already tried replacing the default model with an Ollama model in some of the AI-related configuration, but I'm not sure whether Transcript audio has stricter requirements than normal chat or structured generation.

So I'd like to confirm which models are officially supported for Transcript audio.

Thanks a lot.
Replies: 1 comment 1 reply
Based on our tests, only Gemini 2.5/3.x Flash/Pro can currently transcribe reliably without audio preprocessing. As far as I know, there is currently no offline model that supports speech + tools + structured output at the same time, so using Gemini is recommended.
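To make that requirement concrete, here is a minimal sketch of the check described above: a model only qualifies if it offers all three capabilities at once. This is not AFFiNE's actual code. The names `Capability`, `ModelInfo`, `missingCapabilities`, and `canTranscribe` are hypothetical, and the capability list for `example-local-model` is a made-up placeholder rather than a tested result.

```typescript
// Hypothetical capability labels, taken from the three requirements named above.
type Capability = "speech" | "tools" | "structuredOutput";

// Hypothetical model descriptor; not an AFFiNE type.
interface ModelInfo {
  id: string;
  capabilities: Capability[];
}

// All three must be present at the same time.
const REQUIRED: Capability[] = ["speech", "tools", "structuredOutput"];

// Returns the required capabilities that the model lacks.
function missingCapabilities(model: ModelInfo): Capability[] {
  return REQUIRED.filter((cap) => !model.capabilities.includes(cap));
}

// True only when nothing required is missing.
function canTranscribe(model: ModelInfo): boolean {
  return missingCapabilities(model).length === 0;
}

// Illustrative entries only. The second one is a placeholder for a local
// model without speech input; it does not describe any real model.
const candidates: ModelInfo[] = [
  { id: "gemini-2.5-flash", capabilities: ["speech", "tools", "structuredOutput"] },
  { id: "example-local-model", capabilities: ["tools", "structuredOutput"] },
];

for (const model of candidates) {
  const missing = missingCapabilities(model);
  const status = missing.length === 0 ? "ok" : `missing: ${missing.join(", ")}`;
  console.log(`${model.id}: ${status}`);
}
```

A local model that handles chat and structured generation would still fail this check if it cannot accept audio input, which is why swapping in an Ollama model for the other AI features does not carry over to Transcript audio.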