Author Archives: Logan Kilpatrick

5 things to build with Google’s new Nano Banana image editing & generation model

By: Logan Kilpatrick

Re-posted from: https://medium.com/around-the-prompt/5-things-to-build-with-googles-new-nano-banana-image-editing-generation-model-ddfb0d167715?source=rss-2c8aac9051d3------2

How to build with Nano Banana for free in Google AI Studio and the Gemini API

Image created by Author

Two weeks ago, we launched Nano Banana (aka Gemini 2.5 Flash Image) and it has taken the world by storm. As of the end of September, more than 500,000,000 images have been edited in the Gemini app alone, with hundreds of millions more across other surfaces. This model, which excels at targeted edits, can be used for some pretty wild use cases. In this blog, we will explore 5 simple ideas for how you can start using Nano Banana right now to solve actual problems people have. We will be using https://aistudio.google.com, which is completely free, along with the Gemini API.

As always, you are reading this on my personal blog, so you guessed it, these are my personal opinions : )

AI powered interior design and editing with Nano Banana

Personally, this is one of the coolest use cases. I have always struggled to imagine what is possible in a room, but this model makes it super easy. In this example, which you can follow along with in AI Studio, we take a product image plus a scene and let the user drag the product into the scene, letting the Nano Banana model fuse them together into a single image. If you want to see the prompt used, which was not that complex, you can click the “code” tab, open “geminiService.ts”, and scroll down to line 300. This is a great example of Gemini’s native spatial understanding capabilities coming into play, something no other image model has.
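The drag-and-drop interaction ultimately boils down to one text instruction sent alongside the two images. Here is a minimal sketch of how that instruction could be assembled from the drop position; the function name and coordinate scheme are my own illustration, not the app’s actual geminiService.ts code:

```python
def build_fusion_prompt(product_desc: str, x_pct: float, y_pct: float) -> str:
    """Build an edit instruction telling the model where the dropped
    product should land in the scene, using normalized drop coordinates."""
    return (
        f"Place the {product_desc} from the first image into the scene in the "
        f"second image, centered at roughly {x_pct:.0f}% from the left and "
        f"{y_pct:.0f}% from the top. Match the scene's lighting, perspective, "
        "and shadows so the result looks like a single photograph."
    )

# Example: the user dropped an armchair near the lower-left of the room photo
prompt = build_fusion_prompt("mid-century armchair", 40, 70)
print(prompt)
```

The prompt plus the two images then go to the model in a single generate-content request, which is why the setup stays so simple.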

Image captured in Google AI Studio

If you want to riff on this example from Google AI Studio, just use the chat bar on the left to prompt the edits you want, and the model will rebuild the app to make that experience possible (this applies to all the other examples we look at as well!).

Image captured from Google AI Studio

Character consistency and editing with Nano Banana

So far, I think this has been the use case folks are most awe-struck by, mostly because you can easily upload a picture of yourself and see it in action. The Nano Banana model is exceptionally good at character consistency, meaning you can make targeted edits without distorting the key features of the original character. We made a free example of this called Past Forward in Google AI Studio where you can visualize what you would look like across the past 5 decades; it is pretty funny.

Image captured by author in Google AI Studio

The applications of this world-class character consistency are endless. I have already seen apps going viral that help people visualize what they would look like with different haircuts, as one example. And like I showed before, the cool part about this experience in Google AI Studio is that we can actually build that on the fly. Let me try taking the example above and using the prompt “okay now take the same idea we have here with past forward but help me visualize 8 different haircut styles, take into account common men / women styles”. This will take around 90 seconds (I am doing it live while I write this blog), so hopefully it all works and turns out okay!

Image captured by author in Google AI Studio

Okay wow, that is almost exactly what I was looking for (though I am not sure any of these styles are speaking to me). The level of complexity required to build these types of products continues to go down; it is so cool to see! You really are one prompt away from a great idea these days.

Creative editing with Nano Banana

When I saw this example, I immediately took an image of my childhood home and sent it to my parents; their response was so positive, they loved it. The model’s ability to capture different stylistic behaviors, in this case watercoloring, is extremely impressive, while still retaining the DNA of the original picture (that is my home, now some AI derivative of it).

Image captured by Author in Google AI Studio

In this example, we use the Google Maps API to capture satellite imagery of a location and edit the image into a watercolor style. You can try this yourself in Google AI Studio if you want; it is a lot of fun to play around with! I also imagine there are lots of cool and unique businesses to be created with something like this (letting you retrace a path through satellite images and do something creative with all of them).
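The satellite side of this is just a parameterized URL to the Maps Static API; the returned image bytes plus a styling prompt then go to Nano Banana. A sketch of the URL construction (the zoom level and size are illustrative defaults, and you would substitute a real Maps API key):

```python
from urllib.parse import urlencode

def static_map_url(lat: float, lng: float, api_key: str,
                   zoom: int = 17, size: str = "640x640") -> str:
    """Build a Maps Static API request for a satellite tile of a location."""
    params = {
        "center": f"{lat},{lng}",
        "zoom": zoom,
        "size": size,
        "maptype": "satellite",
        "key": api_key,
    }
    return "https://maps.googleapis.com/maps/api/staticmap?" + urlencode(params)

url = static_map_url(41.8827, -87.6233, "YOUR_API_KEY")
print(url)
```

From there, an edit prompt along the lines of “repaint this satellite image as a watercolor, keeping the street layout intact” does the creative work.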

Virtual “try on” experiences with Nano Banana

One of the biggest questions when someone shops for clothing is “will this look good on me?”. For the last 10 years, there has been a huge amount of investment and innovation trying to bridge this gap. With Nano Banana, it now “just works”. You can take an image of yourself and a clothing item you want to imagine yourself in, and simply fuse the two together. From a technical POV, this is a near-identical setup to the first example I showed above, home remodeling with AI.

Image captured by Author in Google AI Studio

The reason I wanted to include this example is that it is widely applicable. Everyone selling a physical product should be using this type of setup to showcase it in different settings. You can play around with the try-it-on example app we created in Google AI Studio. You can also imagine ending up with an AI avatar that does something like scrape your email and show you an inventory of all your personal clothes at home, which would be a great app to build : )!

Nano Banana for Video Generation

One of the last use cases I will talk about (even though there are hundreds more) is video generation, specifically with Veo 3 (which we just dropped the price of by ~50%). One of the big challenges of video generation today is that AI-generated videos are only 8 seconds long, so you need to stitch together multiple 8-second clips to create anything useful. Further, one of the most common failure modes is that character consistency between those 8-second clips ends up not being good enough and subtly changes in a way that breaks a longer-form video. With Nano Banana, however, you can lean on the model’s character consistency strength to ensure you have a good starting frame for every video you make.
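The stitching workflow described above can be planned as pairs of prompts, one pair per 8-second segment: a character-consistent start-frame edit for Nano Banana, then a motion prompt for Veo 3. This is a hypothetical planner sketch of my own, not an official API:

```python
def plan_segments(scene_beats: list[str], clip_seconds: int = 8) -> list[dict]:
    """For each story beat, pair a start-frame edit prompt (image model)
    with a video prompt (video model) for one fixed-length clip."""
    plan = []
    for i, beat in enumerate(scene_beats):
        frame_prompt = (
            "Edit the reference image of the character so they are "
            f"{beat}. Keep the face, outfit, and art style identical."
        )
        video_prompt = f"Animate this frame: {beat}, about {clip_seconds} seconds of motion."
        plan.append({"segment": i, "frame": frame_prompt, "video": video_prompt})
    return plan

plan = plan_segments(["walking into a cafe", "ordering coffee", "sitting by the window"])
print(len(plan))  # 3 segments -> roughly 24 seconds of stitched video
```

Each generated start frame anchors the next clip, which is exactly where the character-consistency strength pays off.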

Image captured by author in Google AI Studio

In the example above, we are using tldraw’s canvas which lets you chain together different workflows and do AI explorations visually, including with our models like Nano Banana and Veo 3. You can try this example for free in Google AI Studio (but note that Veo does require a paid API key).

The tldraw canvas is very powerful; you can put together pretty much anything, but it takes a little while to grok what is going on if you have never used it before. What helped me was putting an image into the main chat UI, selecting the dropdown on the input field, choosing “Generate image”, and then asking for a targeted edit based on the image I provided.

Overall, there is so much to be built with Nano Banana. I have already seen thousands of new startups spawning around these very simple ideas, and some even going after the most ambitious AI image problems you can imagine. To me, what has made this so much fun is being able to vibe code it all in AI Studio. I am of course biased since I work on AI Studio, but being able to play with or build apps around new frontier AI capabilities, and to have something up and running in ~90 seconds for free, has never been possible before. It is amazing to see this technology being democratized so anyone can build with it. Happy building, and please send over any feedback about AI Studio’s build mode or the Nano Banana model!


5 things to build with Google’s new Nano Banana image editing & generation model was originally published in Around the Prompt on Medium, where people are continuing the conversation by highlighting and responding to this story.

Going from 98% to 99.9% in AI is where all the work is

By: Logan Kilpatrick

Re-posted from: https://medium.com/around-the-prompt/going-from-98-to-99-9-in-ai-is-where-all-the-work-is-ff7f1adff6e4?source=rss-2c8aac9051d3------2

How to build in the age of AI, advice from Chamath Palihapitiya

Image created by Author and Imagen 3

There are lots of phenomena happening in AI right now. On one hand, going from idea to code to working app has never been easier. AI has proved it can dramatically accelerate the creation of very good demos / MVPs. But where is the value created in the world? I would posit that much of it comes down to actually making things work in production. This is more true now than ever, as the barrier to entry in AI continues to go down.

Tools like https://bolt.new, https://lovable.dev/, https://v0.dev and others are enabling this new wave of accelerated software creation. For the long tail of builders, these tools work very well, but one of the main limitations is how to capture the “cartilage” that makes lots of companies actually work. I had a conversation with Chamath Palihapitiya about this, and he did a great job of capturing the state of things.

So how do we get the last 2% and make some of these more difficult problems work? This is the $1,000,000 question. Right now, it still takes a lot of human work in order to translate super complex legacy processes into something powered by AI. Part of my inclination is that agents might be helpful to do this, but as Chamath mentioned, it’s likely this is going to be a “10 year process”.

One of the things I like to think about is the bitter lesson, which, for folks who have not heard of it, can be summarized as: general-purpose approaches usually win out over specialized approaches, specifically in technology. In the context of getting this last 2% of reliability, you might imagine building a bunch of scaffolding, 100 different vertical agents, or even completely re-engineering some human system to work well in the age of AI. A lot of this depends on your timelines, but if you believe that model capabilities will keep scaling and generalizing to solve new problems, it is worth considering how much of an investment you should make into any one of those today, versus just waiting for the models to get good enough to solve the problem out of the box for you. The caveat is that the level of agency you should take, versus waiting for the innovation to come to you, is likely a factor of how much this change is going to disrupt you. If the chance is high, then you should pay the cost of building the scaffolding, doing the process re-engineering, etc., in order to mitigate the risk of large-scale change.

While all of that is true, I was reminded by Sully this morning of just how beautiful it is that the barrier to creating software has come down 10x in the last 2 years, while what you can build has increased by 10x. The only thing stopping you is having an idea and the desire to solve the problem.

So yeah, solving problems in large legacy systems is not easy (regulated industries, large companies, etc), but if you just want to build 0 to 1, there has never been a better time in human history than today to do so. So go build something people want, bet on the models progressing, and make the world better along the axis you care about.


Going from 98% to 99.9% in AI is where all the work is was originally published in Around the Prompt on Medium, where people are continuing the conversation by highlighting and responding to this story.

Everything you need to know about the Gemini API as a developer in less than 5 minutes

By: Logan Kilpatrick

Re-posted from: https://medium.com/around-the-prompt/everything-you-need-to-know-about-the-gemini-api-as-a-developer-in-less-than-5-minutes-5e75343ccff9?source=rss-2c8aac9051d3------2

Get started building with the Gemini API

Image by Author

Gemini is Google’s family of frontier generative AI models, built from the ground up to be multi-modal and long context (more on this later). Gemini is available across the entire Google suite, from Gmail to the Gemini App. For developers who want to build with Gemini, the Gemini API is the best place to get started.

In this article, we will explore what the Gemini API offers, how to get started using Gemini for free, and more advanced use cases like fine-tuning. As always, you are reading my personal blog, so you guessed it, these are my personal views. Let’s dive in!

How can I test the latest Gemini models?

If you want to first test the Gemini models (everything from the latest experimental models to production models) without writing or running any code, you can head to Google AI Studio. Once you are done testing there, you can also generate a Gemini API key in AI Studio (“Get API Key” in the top left corner). AI Studio is free and there is a generous free tier on the API as well, which includes 1,500 requests per day with Gemini 1.5 Flash.

Image captured by Author in aistudio.google.com

What does the Gemini API offer?

The Gemini API comes standard with most of the things developers are looking for: function calling, JSON mode, system instructions, context caching, and much more. In general, the Gemini API offers most if not all of the features developers have come to expect when building with large language model APIs, in addition to many things that are unique to Gemini (like long context, video understanding, and more).

What models does the Gemini API support?

By default, the two model variants available in the Gemini API as of September 21st, 2024 are Gemini 1.5 Flash and Gemini 1.5 Pro. There are different instances of these models available, some of which are newer and have performance updates. Each model also offers different features, such as the context length or the ability for the model to be tuned. You can check out the Gemini models page for more details.

Image captured by Author on ai.google.dev

Sending your first Gemini API request

With as little as six lines of code, you can send your first API request. Make sure to get your API key from Google AI Studio before running the code below:

import google.generativeai as genai
import os

# Read the API key from the environment (create one in Google AI Studio)
genai.configure(api_key=os.environ["API_KEY"])

model = genai.GenerativeModel("gemini-1.5-flash")
response = model.generate_content("Explain how AI works")
print(response.text)

The Gemini API SDKs also support creating a chat object, which makes it easy to append messages to a simple conversation structure:

model = genai.GenerativeModel("gemini-1.5-flash")
chat = model.start_chat(
    history=[
        {"role": "user", "parts": "Hello"},
        {"role": "model", "parts": "Great to meet you. What would you like to know?"},
    ]
)
response = chat.send_message("I have 2 dogs in my house.")
print(response.text)
response = chat.send_message("How many paws are in my house?")
print(response.text)

If you want a simple repo with a little more complexity to get started with, check out the official Gemini API quickstart repo on GitHub.

How much does the Gemini API cost?

There are two tiers in the Gemini API: free and paid. The former is, well, free, and the latter comes with increased rate limits intended to support production workloads. Gemini 1.5 Flash is the most competitively priced large language model in its capability class and recently had its price decreased by 70%.

Image captured from Google Developers Blog

Or put another way, you can access 1.5 billion tokens for free with Gemini every single day.
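The arithmetic behind that number is the free tier’s 1,500 requests per day, each able to carry up to Gemini 1.5 Flash’s 1M-token context. A quick check (the per-token rate below is illustrative; confirm current numbers on the pricing page):

```python
free_requests_per_day = 1_500
context_window_tokens = 1_000_000  # Gemini 1.5 Flash context window

free_tokens_per_day = free_requests_per_day * context_window_tokens
print(f"{free_tokens_per_day:,}")  # 1,500,000,000 tokens per day

# On the paid tier, an example cost at an assumed $0.075 per 1M input tokens
# (for prompts up to 128K tokens; check the official pricing page for current rates):
input_tokens = 10_000_000
cost = input_tokens / 1_000_000 * 0.075
print(f"${cost:.2f}")
```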

Fine-tuning Gemini 1.5 Flash

Gemini 1.5 Flash can be fine-tuned for free through Google AI Studio, and the tuned model does not cost more to use than the base model, a benefit that is rather unique in the AI ecosystem. Once you tune the model, it can be used as a drop-in replacement in your existing code. Google AI Studio also comes with sample datasets to test tuning with, and a mode called “Structured prompting” which is useful for creating fine-tuning datasets.

Image captured by Author in Google AI Studio

Closing thoughts

The Gemini API continues to get better week over week, with a steady stream of new features landing that keep improving the developer experience. If you have feedback, suggestions, or questions, join the conversation on the Google AI developer forum. Happy building!


Everything you need to know about the Gemini API as a developer in less than 5 minutes was originally published in Around the Prompt on Medium, where people are continuing the conversation by highlighting and responding to this story.