Gemini 1.5 Pro update adds support for audio prompts, with limitations

Google introduced Gemini 1.5 Pro, a major upgrade to its large language model (LLM), in mid-February. The Pro model powers the free Gemini product available to all users, while the paid version, Gemini Ultra, is included with a Google One subscription.

Gemini 1.5 Pro is already on par with Ultra, and it comes with a context window of up to 1 million tokens. That is enough for a prompt of approximately 700,000 words, more than 30,000 lines of code, 11 hours of audio, or 1 hour of video.
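
For a rough sense of what those limits mean in practice, the quoted figures can be turned into per-unit estimates. The Python sketch below is back-of-the-envelope arithmetic based only on the numbers in this article. It assumes token usage scales linearly with input size, which is a simplification. It is not Google's tokenizer or an official API, and the names estimate_tokens and fits_in_context are made up for illustration.

```python
# Back-of-the-envelope estimates derived only from the limits quoted in this article.
# The ratios are illustrative assumptions, not Google's actual tokenizer behavior.
CONTEXT_WINDOW_TOKENS = 1_000_000

# Quoted capacity of a full 1M-token window, per input type.
QUOTED_CAPACITY = {
    "words": 700_000,            # about 700,000 words
    "audio_seconds": 11 * 3600,  # about 11 hours of audio
    "video_seconds": 1 * 3600,   # about 1 hour of video
}


def estimate_tokens(kind: str, amount: float) -> int:
    """Estimate tokens used by `amount` units of `kind`, assuming linear scaling."""
    tokens_per_unit = CONTEXT_WINDOW_TOKENS / QUOTED_CAPACITY[kind]
    return round(amount * tokens_per_unit)


def fits_in_context(kind: str, amount: float) -> bool:
    """Return True if the estimated token count fits in the 1M-token window."""
    return estimate_tokens(kind, amount) <= CONTEXT_WINDOW_TOKENS


if __name__ == "__main__":
    # A 90-minute interview recording versus a two-hour video.
    print(estimate_tokens("audio_seconds", 90 * 60))
    print(fits_in_context("audio_seconds", 90 * 60))
    print(fits_in_context("video_seconds", 2 * 3600))
```

By this estimate, a 90-minute recording would use only a fraction of the window, while a two-hour video would not fit in a single prompt.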

In mid-April, Google announced that Gemini 1.5 Pro is open for testing by enterprise users through the Vertex AI development platform. This testing phase includes support for audio files in prompts, a valuable feature for any genAI product. Unfortunately, not every user has access to Gemini 1.5 Pro yet.

Users who can test Gemini 1.5 Pro are able to upload audio files and ask the AI for information based on them. The feature is highly anticipated, especially among people who already use genAI tools such as OpenAI's Whisper to transcribe audio files.

Support for audio files in Gemini 1.5 Pro opens up new possibilities for users. It can be used to transcribe interviews and video calls, to help recall what was said in a recording, and to simplify transcription work in general.
