According to the announcement at the Google Next event, Gemini 1.5 Pro can now listen to its users: in practice, the model can react to uploaded audio files and provide information from calls and videos without a transcript having to be uploaded first.
The Gemini 1.5 Pro model first launched in February and is now Google's most powerful language model, outperforming even Gemini Ultra. Its standout feature is the amount of context it can process: from 128,000 up to 1 million tokens. A million tokens is approximately equivalent to 700,000 words or about 30,000 lines of code; that is roughly four times more data than Anthropic's flagship model, Claude 3, can handle, and about eight times more than OpenAI's GPT-4 Turbo.
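As a back-of-envelope check on the figures above, the conversion factors below are assumptions inferred from the article's own numbers (roughly 0.7 words per token and about 33 tokens per line of code), not official Google figures:

```python
# Rough capacity estimate for a context window, using conversion
# ratios inferred from the article's numbers (assumptions, not
# official figures): ~0.7 words per token, ~33 tokens per code line.

WORDS_PER_TOKEN = 0.7
TOKENS_PER_CODE_LINE = 33

def context_capacity(tokens: int) -> dict:
    """Estimate how many words / lines of code fit in `tokens` of context."""
    return {
        "words": int(tokens * WORDS_PER_TOKEN),
        "code_lines": tokens // TOKENS_PER_CODE_LINE,
    }

print(context_capacity(1_000_000))
# ~700,000 words and ~30,000 lines of code, in line with the article
```

The same helper applied to a 128,000-token window gives roughly 89,000 words, which illustrates the gap between the lower and upper bounds the article mentions.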
Gemini 1.5 Pro will be available in preview on Vertex AI, the platform where Google's business customers can build their own chatbots.
The Imagen 2 text-to-image model has also been updated: it now offers 'inpainting' and 'outpainting' features, which allow elements to be added to or removed from images. All images generated by the model can also carry a SynthID mark, an invisible watermark that indicates the image's origin.
Source: The Verge, TechCrunch