Summary of Google I/O 2024 — Gemini Pro, Multi Modality, Long Context, Agents, Veo, Imagen 3, Project Astra, Gemma
First of all, we are so sorry for the situation in Rafah, Gaza, Palestine. We call for an immediate ceasefire and urge that humanitarian aid be allowed in. Let Palestine be free from genocide.
Gemini
- Multimodal: Gemini is built to take any kind of input and generate any kind of output.
- More than 1.5 million developers use it.
- It is also used in Google products that have 2 billion users.
- In recent months, Gemini has supported prompts of up to 1 million tokens. With this, users can, for example, include a video in a prompt and generate JSON data describing the objects in the video.
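As a rough illustration of the video-to-JSON idea above, the sketch below turns a model reply into typed records. The reply text, the field names (label, start_sec), and the helper parse_objects are all hypothetical; the keynote did not specify a schema, and no Gemini API is called here.

```python
import json
from dataclasses import dataclass

FENCE = "`" * 3  # Markdown code fence that models sometimes wrap around JSON


@dataclass
class DetectedObject:
    label: str        # hypothetical field: what the object is
    start_sec: float  # hypothetical field: when it first appears in the video


def parse_objects(reply_text: str) -> list[DetectedObject]:
    """Parse a JSON array of detected objects from a model reply."""
    text = reply_text.strip()
    if text.startswith(FENCE):
        # Drop the opening fence line and everything from the closing fence on.
        text = text.split("\n", 1)[1]
        text = text.rsplit(FENCE, 1)[0]
    items = json.loads(text)
    return [DetectedObject(str(i["label"]), float(i["start_sec"])) for i in items]


# Hypothetical reply, standing in for what a model might return for a video prompt.
sample_reply = '[{"label": "bicycle", "start_sec": 3.5}, {"label": "dog", "start_sec": 12}]'
objects = parse_objects(sample_reply)
print([o.label for o in objects])
```

Parsing into a small dataclass rather than using the raw dictionaries means a missing or misnamed field fails immediately instead of later in the program.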
Gemini 1.5 Pro
- Available to all developers globally.
- A 2 million token context window is in private preview for developers.
Gemini 1.5 Flash
- Optimized for low latency and efficiency.
- Developers have access to a 2 million token context window.
Search
AI Overviews in Google Search
- Available in the US with plans for expansion to other countries in the near future.
- Search across videos, websites, maps, finance, shopping, hotels, and books, providing instant answers with just one query. The overview will be displayed before presenting website results.
- The locations map will be displayed below the answer.
- AI-curated dynamic search result pages for various categories such as books and hotels. Additionally, users can customize the presented results according to their preferences.
Planning
- You can design a personalized diet plan, tailor-made for a vegan lifestyle, and export it. Additionally, you can include necessary food items in a shopping list.
- Craft comprehensive travel itineraries including flights, accommodations, and suggested destinations to visit.
Ask/Troubleshoot live with video in Google Search
When encountering an issue, simply open your camera, voice your question, and voilà! The AI overview will appear alongside relevant websites, articles, videos, and more.
Project Astra
- Acts as your personal agent.
- Engage in real-time video conversations with Gemini, asking about anything the camera sees or about related objects in view. Draw on your phone to indicate specific points of interest. Get information while on a trip by asking about the history of the sights you encounter. 😎
Ask Google Photos complex questions
- Gemini comprehensively analyzes all your photos, establishing meaningful connections between them. You can inquire about anything, anyone, or any activity depicted in your images from anywhere.
Generative media
Imagen 3
- Generate intricately detailed images with effects based on text prompts.
- Create imaginative visuals that include text.
Music AI Sandbox
- Enhance your sounds with modifications and add captivating effects.
Veo: Create Videos
- Generates high-quality 1080p videos with effects from your prompts.
- Produces cinematic videos featuring timelapse, aerial shots, and stunning landscapes.
- You can extend the generated scenes to longer durations using VideoFX.
- The waitlist for labs.google is now open.
Gemini App
- Serve as your personal AI assistant.
- Learn, create, and code anything you can imagine.
- You can utilize it through text, voice, or camera inputs.
- Gemini Live: real-time video comprehension.
- You can create Gems, customized versions of Gemini tailored to act as anything you need. Coming in the next few months.
- Plan your trips and customize them according to your preferences.
- Gemini Advanced, launching in the summer, will support 35+ languages.
- Gemini Advanced introduces a premium subscription model. Subscribers gain access to Gemini 1.5 Pro.
- 1M tokens, long context understanding (up to 1500 pages of PDF, 30,000 lines of source code, or 1 hour of video).
- You can combine various file types in a single prompt, including PDFs, audio files, videos, and spreadsheets.
- You can add your spreadsheet data to the prompt and ask for a visualization.
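The long-context figures above (1M tokens as roughly 1,500 PDF pages, 30,000 lines of code, or 1 hour of video) imply approximate per-unit costs. The sketch below derives those rates from the quoted equivalences only and uses them to estimate whether a mixed prompt fits. The function names are hypothetical, and real token counts depend on the content and the tokenizer.

```python
CONTEXT_TOKENS = 1_000_000

# Per-unit rates derived only from the equivalences quoted in the list above.
TOKENS_PER_PDF_PAGE = CONTEXT_TOKENS / 1_500      # about 667 tokens per page
TOKENS_PER_CODE_LINE = CONTEXT_TOKENS / 30_000    # about 33 tokens per line
TOKENS_PER_VIDEO_SECOND = CONTEXT_TOKENS / 3_600  # about 278 tokens per second


def estimate_tokens(pdf_pages: int = 0, code_lines: int = 0, video_seconds: int = 0) -> int:
    """Rough token estimate for a prompt that mixes PDFs, code, and video."""
    total = (
        pdf_pages * TOKENS_PER_PDF_PAGE
        + code_lines * TOKENS_PER_CODE_LINE
        + video_seconds * TOKENS_PER_VIDEO_SECOND
    )
    return round(total)


def fits_in_context(pdf_pages: int = 0, code_lines: int = 0, video_seconds: int = 0) -> bool:
    """True if the estimated prompt size stays within the 1M token window."""
    return estimate_tokens(pdf_pages, code_lines, video_seconds) <= CONTEXT_TOKENS


# Example: a 300-page PDF, 10,000 lines of code, and a 10-minute video together.
estimate = estimate_tokens(pdf_pages=300, code_lines=10_000, video_seconds=600)
print(estimate, fits_in_context(pdf_pages=300, code_lines=10_000, video_seconds=600))
```

The quoted limits are alternatives, not a combined allowance: each one alone fills the window, so a mixed prompt has to share the same budget.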
Gemini in Shopping
- Discover and purchase products you're interested in using Circle to Search.
- Initiate product returns by retrieving order numbers from your Gmail, filling out return forms, and completing the process seamlessly.
AI integrated at the core of Android.
Gemini, at the core of the system, will provide continuous tracking and support, ensuring assistance whenever needed.
Additionally, Pixel phones will run Gemini Nano on-device, offering multimodal capabilities with low latency.
AI-powered Circle to Search
- Available only on Android.
- Get help with coursework questions, search with images, and edit images.
- Generate images.
- Gemini is aware of what is on your screen, so you can ask questions about a video while you watch it.
- When you open a PDF, Gemini recognizes it because it is integrated into the core of Android. This allows you to ask questions about the contents of the PDF.
- Gemini protects you when someone calls you. It listens to the call privately on your device and understands the content. If it detects bad intentions, it warns you.
- Gemini will help improve TalkBack.
AI in Google Workspace
Via the side panel:
- You can take actions based on services (mail, drive, docs…).
- Organize emails, and ask questions about your spreadsheets to get visualized answers (e.g., "When did I spend money?").
- Combine and synchronize Google ecosystem services such as mail, drive, and files.
Gemini in Gmail
- In Gmail on mobile, you can ask Gemini questions. It searches through all your emails, including attachments and videos. If the answer comes from an attachment such as a video, you can ask about that video as well.
- Summarize an email or an entire email thread.
- Reply to emails using Gemini based on context. Long-press to preview.
Gemini in Meet is expanding to 68 languages
AI Teammate
With this bot, you can track all activities in your team’s workspace. You can ask it anything, such as whether a task is completed or how much time remains until a meeting. By using this bot, you’ll save a lot of time and avoid getting lost in files, emails, and other documents.
- You can customize it to meet the specific needs of your team.
NotebookLM
- Audio Overview: You can interact with Gemini 1.5 Pro by providing your materials in any format, making the experience personalized and interactive.
Developers
Gemini 1.5 Pro and 1.5 Flash
Google AI, Vertex
Which model is the best fit for you?
Try AI Studio
- It is free
- You can try Flash and Pro models.
- Once you are done, you can obtain an API key and integration code with your configurations.
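To give a sense of what integration code with an API key can look like, here is a minimal sketch that builds a text-only request for the Gemini API using only the Python standard library. The endpoint URL, the model id gemini-1.5-flash, the x-goog-api-key header, the response shape, and the GEMINI_API_KEY environment variable name are assumptions based on the public REST API, not details from the keynote; the code AI Studio generates for you is the authoritative version.

```python
import json
import os
import urllib.request

# Assumption: public Gemini API REST endpoint; verify against AI Studio's generated code.
BASE_URL = "https://generativelanguage.googleapis.com/v1beta/models"


def build_request(prompt: str, api_key: str, model: str = "gemini-1.5-flash") -> urllib.request.Request:
    """Build (but do not send) a generateContent request for a text prompt."""
    body = {"contents": [{"parts": [{"text": prompt}]}]}
    return urllib.request.Request(
        url=f"{BASE_URL}/{model}:generateContent",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json", "x-goog-api-key": api_key},
        method="POST",
    )


if __name__ == "__main__":
    # Hypothetical variable name; the request is only sent when a key is present.
    key = os.environ.get("GEMINI_API_KEY")
    if key:
        request = build_request("Summarize Google I/O 2024 in one sentence.", key)
        with urllib.request.urlopen(request) as response:
            reply = json.load(response)
        # Assumed response shape: first candidate, first text part.
        print(reply["candidates"][0]["content"]["parts"][0]["text"])
    else:
        print("Set GEMINI_API_KEY to send the request.")
```

Reading the key from the environment instead of hardcoding it keeps it out of source control. Switching between the Flash and Pro models only requires changing the model argument.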
Gemma
- A family of lightweight open models built on the same technology as Gemini.
- PaliGemma, the first vision-language model in the Gemma family, supports image captioning and answering questions about images.
- Gemma 2 is coming in June.
- A 27B-parameter model will be added to Gemma 2 soon.
During the event, the speakers mentioned AI more than 120 times! 😂