
5 things to build with Google’s new Nano Banana image editing & generation model

By: Logan Kilpatrick

Re-posted from: https://medium.com/around-the-prompt/5-things-to-build-with-googles-new-nano-banana-image-editing-generation-model-ddfb0d167715?source=rss-2c8aac9051d3------2

How to build with Nano Banana for free in Google AI Studio and the Gemini API

Image created by Author

Two weeks ago, we launched Nano Banana (aka Gemini 2.5 Flash Image), and it has taken the world by storm. As of the end of September, more than 500,000,000 images have been edited in the Gemini app alone, with hundreds of millions more across other surfaces. This model, which excels at targeted edits, can be used for some pretty wild use cases. In this blog, we will explore 5 simple ideas for how you can start using Nano Banana right now to solve actual problems people have. We will be using https://aistudio.google.com, which is completely free, along with the Gemini API.

As always, you are reading this on my personal blog, so you guessed it, these are my personal opinions : )

AI-powered interior design and editing with Nano Banana

Personally, this is one of the coolest use cases. I have always struggled to imagine what is possible in a room, but this model makes it super easy. In this example, which you can follow along with in AI Studio, we take a product image plus a scene and let the user drag the image of the product into the scene, letting the Nano Banana model fuse them into a single image. If you want to see the prompt used, which is not that complex, you can click the “code” tab, open “geminiService.ts”, and scroll down to line 300. This is a great example of Gemini’s native spatial understanding capabilities coming into play, something no other image model has.

Image captured in Google AI Studio
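
If you want to call this kind of multi-image fuse directly from the Gemini API instead of through the AI Studio app, here is a minimal sketch in Python with the google-genai SDK. The model ID, file names, and prompt are my own placeholders rather than what the example app actually uses, so treat it as a starting point.

```python
# Minimal sketch of a multi-image edit with the Gemini API (google-genai SDK).
# Assumes GEMINI_API_KEY is set; model ID, file names, and prompt are placeholders.
from google import genai
from PIL import Image

client = genai.Client()  # picks up the API key from the environment

product = Image.open("product.png")    # e.g. a lamp on a white background
scene = Image.open("living_room.png")  # photo of the room to edit

response = client.models.generate_content(
    model="gemini-2.5-flash-image-preview",  # "Nano Banana"; check the docs for the current ID
    contents=[
        "Place the product from the first image onto the side table in the "
        "second image. Match the room's lighting and perspective, and keep "
        "everything else in the scene unchanged.",
        product,
        scene,
    ],
)

# The edited image comes back as inline data in one of the response parts.
for part in response.candidates[0].content.parts:
    if part.inline_data is not None:
        with open("composited.png", "wb") as f:
            f.write(part.inline_data.data)
```

The prompt carries most of the weight here; the more precisely you describe where the product should land and what should stay untouched, the better the targeted edit tends to be.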

If you want to riff on this example from Google AI Studio, just use the chat bar on the left to prompt the edits you want, and the model will rebuild the app to make that experience possible (this applies to all the other examples we look at as well!).

Image captured from Google AI Studio

Character consistency and editing with Nano Banana

So far, I think this has been the use case folks are most awe-struck by, mostly because you can easily upload a picture of yourself and see it in action. The Nano Banana model is exceptionally good at character consistency, meaning you can make targeted edits without distorting the key features of the original character. We made a free example of this called Past Forward in Google AI Studio, where you can visualize what you would look like across the past 5 decades; it is pretty funny.

Image captured by author in Google AI Studio
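
Under the hood, Past Forward is just making targeted, identity-preserving edits of one photo. Below is a rough sketch of the same idea against the API; the prompt, model ID, and file names are illustrative guesses on my part, not what the app actually ships with.

```python
# Sketch of identity-preserving edits ("Past Forward"-style restyles).
# The prompt and model ID are illustrative guesses, not the app's actual ones.
from google import genai
from PIL import Image

client = genai.Client()
portrait = Image.open("me.jpg")

decades = ["1970s", "1980s", "1990s", "2000s", "2010s"]
for decade in decades:
    response = client.models.generate_content(
        model="gemini-2.5-flash-image-preview",
        contents=[
            f"Restyle this photo as if it were taken in the {decade}: era-appropriate "
            "clothing, hair, film grain, and color grading. Keep the person's face "
            "and identity exactly the same.",
            portrait,
        ],
    )
    for part in response.candidates[0].content.parts:
        if part.inline_data is not None:
            with open(f"me_{decade}.jpg", "wb") as f:
                f.write(part.inline_data.data)
```

Swap the decades list for haircut styles, outfits, or anything else and you get the viral apps people are building right now.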

The applications of this world-class character consistency are endless. I have already seen apps going viral that help people visualize what they would look like with different haircuts, as one example. And like I showed before, the cool part about this experience in Google AI Studio is that we can actually build that on the fly. Let me try taking the example above and using the prompt “okay now take the same idea we have here with past forward but help me visualize 8 different haircut styles, take into account common men / women styles”. This will take around 90 seconds (I am doing it live while I write this blog), so hopefully it all works and turns out okay!

Image captured by author in Google AI Studio

Okay wow, that is almost exactly what I was looking for (though I’m not sure any of these styles are speaking to me). The level of complexity needed to build these types of products continues to go down; it is so cool to see! You really are 1 prompt away from a great idea these days.

Creative editing with Nano Banana

When I saw this example, I immediately took an image of my childhood home and sent it to my parents; their response was so positive, they loved it. The model’s ability to capture different stylistic treatments, in this case watercolor, while still retaining the DNA of the original picture (that is my home, now an AI derivative of it) is extremely impressive.

Image captured by Author in Google AI Studio

In this example, we use the Google Maps API to capture satellite imagery of a location and edit the image into a watercolor style. You can try this yourself in Google AI Studio if you want; it is a lot of fun to play around with! I also imagine there are lots of cool and unique businesses to be created with something like this (letting you retrace a path through satellite images and do something creative with all of them).
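
If you want to wire up something similar yourself, the rough shape is: pull a satellite tile from the Maps Static API, then hand it to Nano Banana with a style prompt. Here is a hedged sketch; the coordinates, keys, and prompt are placeholders, and you should check both APIs’ docs for current parameters and pricing.

```python
# Sketch: fetch a satellite image from the Google Maps Static API, then ask
# Nano Banana to repaint it as a watercolor. Keys, coordinates, and the prompt
# are placeholders.
import io
import os
import requests
from google import genai
from PIL import Image

maps_key = os.environ["MAPS_API_KEY"]
static_map_url = (
    "https://maps.googleapis.com/maps/api/staticmap"
    f"?center=41.8827,-87.6233&zoom=17&size=640x640&maptype=satellite&key={maps_key}"
)
satellite = Image.open(io.BytesIO(requests.get(static_map_url).content))

client = genai.Client()
response = client.models.generate_content(
    model="gemini-2.5-flash-image-preview",
    contents=[
        "Repaint this satellite view as a loose watercolor illustration, keeping "
        "the street layout and landmarks recognizable.",
        satellite,
    ],
)

for part in response.candidates[0].content.parts:
    if part.inline_data is not None:
        with open("watercolor_map.png", "wb") as f:
            f.write(part.inline_data.data)
```

Run that over a sequence of map centers along a route and you have the raw material for the “retrace a path” kind of product I mentioned above.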

Virtual “try on” experiences with Nano Banana

One of the biggest questions when shopping for clothing is “will this look good on me?” For the last 10 years there has been a huge amount of investment and innovation trying to bridge this gap. With Nano Banana, it now “just works”. You can take an image of yourself and a clothing item you want to picture yourself in, and simply fuse the two together. From a technical POV, this is a near-identical setup to the first example I showed above with home remodeling with AI.

Image captured by Author in Google AI Studio

The reason I wanted to include this example is that it is widely applicable. Everyone selling any physical product should be using this type of setup to showcase the product in different settings. You can play around with this try-it-on example app we created in Google AI Studio. You can also imagine ending up with a personal AI avatar that does something like scrape your email and show you an inventory of all your clothes at home, which would be a great app to build : )!

Nano Banana for Video Generation

One of the last use cases I will talk about (even though there are hundreds more) is video generation, specifically with Veo 3 (which we just dropped the price of by ~50%). One of the big challenges of video generation today is that AI-generated videos are only 8 seconds long; you need to stitch together multiple 8-second clips to create anything useful. Further, one of the most common failure modes is that character consistency between those 8-second clips ends up not being good enough and subtly changes in a way that breaks a longer-form video. With Nano Banana, however, you can lean on the model’s character consistency strength to ensure you have a good starting frame for every video you make.

Image captured by author in Google AI Studio

In the example above, we are using tldraw’s canvas, which lets you chain together different workflows and do AI explorations visually, including with our models like Nano Banana and Veo 3. You can try this example for free in Google AI Studio (but note that Veo does require a paid API key).

The tldraw canvas is very powerful; you can put together pretty much anything, but it takes a little while to grok what is going on if you have never used it before. What helped me was putting an image into the main chat UI, selecting the dropdown on the input field, choosing “Generate image”, and then asking for a targeted edit based on the image I provided.
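
If you would rather chain the two models from code instead of the canvas, the general recipe is: generate a character-consistent start frame with Nano Banana, then pass that frame to Veo as the image input for the next 8-second clip. The sketch below is my best guess at how that looks with the google-genai SDK; the Veo model ID, the generate_videos call, and the polling and download details are assumptions you should verify against the docs (and remember Veo needs a paid key).

```python
# Sketch: use a Nano Banana edit as the starting frame for a Veo clip.
# Model IDs, the generate_videos call, and polling/download details are
# assumptions; check the Gemini API docs for the exact, current shapes.
import time
from google import genai
from google.genai import types
from PIL import Image

client = genai.Client()

# 1) Character-consistent start frame from Nano Banana.
character = Image.open("character.png")
frame_resp = client.models.generate_content(
    model="gemini-2.5-flash-image-preview",
    contents=["Same character, now standing on a rainy street at night.", character],
)
frame_bytes = next(
    p.inline_data.data
    for p in frame_resp.candidates[0].content.parts
    if p.inline_data is not None
)

# 2) Feed that frame to Veo so the next 8-second clip starts from it.
operation = client.models.generate_videos(
    model="veo-3.0-generate-001",
    prompt="The character walks toward the camera as neon signs flicker.",
    image=types.Image(image_bytes=frame_bytes, mime_type="image/png"),
)
while not operation.done:
    time.sleep(10)
    operation = client.operations.get(operation)

video = operation.response.generated_videos[0]
client.files.download(file=video.video)
video.video.save("clip_01.mp4")
```

Repeat the two steps with the last frame (or another consistent edit) of each clip and you can stitch together something much longer than 8 seconds without the character drifting.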

Overall, there is so much to be built with Nano Banana. I have already seen thousands of new startups spawning around these very simple ideas, and some even going after the most ambitious AI image problems you can imagine. To me, what has made this so much fun is being able to vibe code it all in AI Studio. I am of course biased since I work on AI Studio, but being able to play with or build apps around new frontier AI capabilities and have something up and running in ~90 seconds for free has never happened before. It is amazing to see access to building with this technology get democratized. Happy building, and please send over any feedback about AI Studio’s build mode or the Nano Banana model!



Going from 98% to 99.9% in AI is where all the work is

By: Logan Kilpatrick

Re-posted from: https://medium.com/around-the-prompt/going-from-98-to-99-9-in-ai-is-where-all-the-work-is-ff7f1adff6e4?source=rss-2c8aac9051d3------2

How to build in the age of AI, advice from Chamath Palihapitiya

Image created by Author and Imagen 3

There are lots of phenomena happening in AI right now. On one hand, going from idea to code to working app has never been easier. AI has proven it can dramatically accelerate the creation of very good demos / MVPs. But where is the value created in the world? I would posit that much of it comes down to actually making things work in production. This is more true now than ever, as the barrier to entry in AI continues to go down.

Tools like https://bolt.new, https://lovable.dev/, https://v0.dev and others are enabling this new wave of accelerated software creation. For the long tail of builders, these tools work very well, but one of the main limitations is how to capture the “cartilage” that makes lots of companies actually work. I had a conversation with Chamath Palihapitiya about this, and he did a great job of capturing the current state of things:

So how do we get the last 2% and make some of these more difficult problems work? This is the $1,000,000 question. Right now, it still takes a lot of human work to translate super complex legacy processes into something powered by AI. Part of my inclination is that agents might be helpful here, but as Chamath mentioned, it’s likely this is going to be a “10 year process”.

One of the things I like to think about is the bitter lesson, which, for folks who have not heard of it, can be summarized as the observation that general purpose approaches usually win out over specialized approaches, in technology specifically. In the context of getting this last 2% of reliability, you might imagine that what you go do is build a bunch of scaffolding, 100 different vertical agents, or even completely re-engineer some human system to work well in the age of AI. A lot of this depends on your timelines, but if you believe that model capabilities will keep scaling and generalizing to solve new problems, it is worth considering how much of an investment you should make into any one of those today, versus just waiting for the models to get good enough to solve the problem out of the box for you. The caveat here is that the level of agency you should take, versus waiting for the innovation to come to you, is likely a factor of how much this change is going to disrupt you. If the chance is high, then you should pay the cost of building the scaffolding, doing the process re-engineering, etc., in order to mitigate the risk of large-scale change.

At the same time as all of that is true, I was reminded by Sully this morning of just how beautiful it is that the barrier to creating software has come down 10x in the last 2 years, while what you can build has increased by 10x. The only thing stopping you is having an idea and the desire to solve the problem.

So yeah, solving problems in large legacy systems is not easy (regulated industries, large companies, etc), but if you just want to build 0 to 1, there has never been a better time in human history than today to do so. So go build something people want, bet on the models progressing, and make the world better along the axis you care about.



The future of AI agents with Yohei Nakajima

By: Logan Kilpatrick

Re-posted from: https://logankilpatrick.medium.com/the-future-of-ai-agents-with-yohei-nakajima-2602e32a4765?source=rss-2c8aac9051d3------2

Delving into AI agents and where we are going next

The future is going to be full of AI agents, but there are still a lot of open questions about how to get there & what that world will look like. I had the chance to sit down with one of the deepest thinkers in the world of AI agents, Yohei Nakajima. If you want to check out the video of our conversation, you can watch it on YouTube:

Where are we today?

There has been a lot of talk of agents over the last year since the initial viral explosion of HustleGPT, where the creator famously told the chatbot system that it had $100 and asked it to try and help him make money for his startup.

Since then, the conversation and interest around agents has not stopped, despite there being a shockingly low number of successful agent deployments. Even as someone who is really interested in AI and has tried many of the agent tools, I still have a grand total of zero agents actually running in production right now helping me (which is pretty disappointing).

Despite the lack of large-scale deployments, companies are still investing heavily in the space, as it is widely assumed this is the application of LLMs that will end up providing the most value. I have been looking more and more into Zapier as the potential launching point for large-scale agent deployments. The main initial challenge with agent platforms is that they don’t actually hook up to all the things you need them to; they might support Gmail but not Outlook, etc. But Zapier already does the dirty work of connecting with the world’s tools, which gets me excited about the prospect that this could work out as a tool.

Why haven’t AI agents taken off yet?

To understand why agents have not taken off, you need to really understand the flow that autonomous agents take when solving tasks. I talked about this in depth when I explored what agents are in another post from earlier last year. The TLDR is that current agents typically use the LLM system itself as the planning mechanism for the agent. In many cases, this is sufficient to solve a simple task, but as anyone who uses LLMs frequently knows, the limitations of these planners are very real.

Simply put, current LLMs lack sufficient reasoning capabilities to really solve problems without human input. I am hopeful this will change with forthcoming models, but it might also be that we need to move the planning capabilities to more deterministic systems that are not controlled by LLMs. You could imagine a world where we also fine-tune LLMs specifically to perform the planning task, and potentially fine-tune other LLMs to do the debugging task in cases where the models get stuck.

Image by Simform
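
To make that planning bottleneck concrete, the typical autonomous agent loop looks roughly like the sketch below: the LLM itself decides the next action at every step, so the whole system is only as reliable as that planning call. This is a generic illustration of the pattern, not any particular framework’s API.

```python
# Generic sketch of the agent loop described above: the LLM is both the planner
# and the action-picker, which is exactly where the reliability problems show up.
# `call_llm` and the tools dict are stand-ins, not a real framework's API.

def call_llm(prompt: str) -> str:
    """Stand-in for a chat/completions call to whatever LLM you use."""
    raise NotImplementedError

def run_agent(goal: str, tools: dict, max_steps: int = 10) -> str:
    history: list[str] = []
    for _ in range(max_steps):
        # 1) Planning: the LLM decides the next action from the goal + history.
        decision = call_llm(
            f"Goal: {goal}\nHistory: {history}\n"
            f"Available tools: {list(tools)}\n"
            "Reply with either FINISH: <answer> or TOOL: <name> | <input>."
        )
        if decision.startswith("FINISH:"):
            return decision.removeprefix("FINISH:").strip()

        # 2) Acting: run the chosen tool and feed the observation back in.
        name, _, tool_input = decision.removeprefix("TOOL:").partition("|")
        observation = tools[name.strip()](tool_input.strip())
        history.append(f"{decision} -> {observation}")

    return "Stopped: the planner never converged on an answer."
```

Every failure mode people complain about with agents (loops, bad tool choices, never finishing) traces back to that single planning call, which is why the model’s reasoning quality matters so much here.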

Beyond the model limitations, the other challenge is tooling. Likely the closest thing to a widely used LLM agent framework is the OpenAI Assistants API. However, it lacks many of the true agentic features that you would need to really build an autonomous agent in production. Companies like https://www.agentops.ai/ and https://e2b.dev are taking a stab at providing a different layer of tooling / infra to help developers building agents, but these tools have not yet gained widespread adoption.

Where are we going from here?

The agent experience that gets me excited is one that is spun up in the background for me and just automates away some task / workflow I used to do manually. It still feels like we are a very long way away from this, but many companies are trying to get there using browser automation. In those workflows, you perform a task once and the agent learns how to mimic the workflow in the browser and then does it for you on demand. This could be one possible way to decrease the friction in making agents work at scale.

Another innovation will certainly be at the model layer. Increased reasoning / planning capabilities, while coupled with increased safety risks, present the likeliest path to improved adoption of agents. Some models, like Cohere’s Command R, are being optimized for tool use, which is a common pattern agents rely on to do the things they need. It is not yet clear if these workflows will require custom-made models; my guess is that general purpose reasoning models will perform best in the long term, but the short term will be won by models tailored for tool use.