Author Archives: Logan Kilpatrick

Delving into AI agents and where we are going next

The future is going to be full of AI agents, but there are still a lot of open questions on how to get there & what that world will look like. I had the chance to sit down with one of the deepest thinkers in the world of AI agents, Yohei Nakajima. If you want to check out the video of our conversation, you can watch it on YouTube:

<a href="https://medium.com/media/14b34006e9adc85e3cb22077614fd9b4/href">https://medium.com/media/14b34006e9adc85e3cb22077614fd9b4/href</a>

Where are we today?

There has been a lot of talk of agents over the last year since the initial viral explosion of HustleGPT, where the creator famously told the chatbot system that it had $100 and asked it to try and help him make money for his startup.

Since then, the conversation and interest around agents have not stopped, despite there being a shockingly low number of successful agent deployments. Even as someone who is really interested in AI and has tried many of the agent tools, I still have a grand total of zero agents actually running in production right now helping me (which is pretty disappointing).

Despite the lack of large scale deployments, companies are still investing heavily in the space, as it is widely assumed this is the application of LLMs that will end up providing the most value. I have been looking more and more into Zapier as the potential launching point for large scale agent deployments. Most of the initial challenge with agent platforms is that they don’t actually hook up to all the things you need them to. They might support Gmail but not Outlook, etc. But Zapier already does the dirty work of connecting with the world’s tools, which gets me excited about the prospect that this could work out as a tool.

Why haven’t AI agents taken off yet?

To understand why agents have not taken off, you need to really understand the flow that autonomous agents take when solving tasks. I talked about this in depth when I explored what agents were in another post from earlier last year. The TLDR is that current agents typically use the LLM system itself as the planning mechanism for the agent. In many cases, this is sufficient to solve a simple task, but as anyone who uses LLMs frequently knows, the limitations of these planners are very real.

Simply put, current LLMs lack sufficient reasoning capabilities to really solve problems without human input. I am hopeful this will change in the future with forthcoming new models, but it might also be that we need to move the planning capabilities to more deterministic systems that are not controlled by LLMs. You could imagine a world where we also fine-tune LLMs to specifically perform the planning task, and potentially fine-tune other LLMs to do the debugging task in cases where the models get stuck.
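To make the idea of moving planning out of the LLM a bit more concrete, here is a minimal sketch of an agent loop where the planner is just a swappable function. Everything in it is an assumption made for illustration: the `AgentState` class, the `run_agent` loop, and the step names (`fetch_inbox`, `summarize`, `draft_reply`) are hypothetical and not taken from any agent framework. An LLM-backed planner would have the same signature as `deterministic_planner`, which is what makes swapping one for the other cheap.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class AgentState:
    goal: str
    history: list[str] = field(default_factory=list)  # steps chosen so far


# A planner maps the current state to the next step name, or None when done.
# An LLM-backed planner would share this signature (hypothetical interface).
Planner = Callable[[AgentState], Optional[str]]


def deterministic_planner(state: AgentState) -> Optional[str]:
    """Hypothetical fixed workflow: the same steps, in the same order, every time."""
    workflow = ["fetch_inbox", "summarize", "draft_reply"]
    remaining = [step for step in workflow if step not in state.history]
    return remaining[0] if remaining else None


def run_agent(goal: str, planner: Planner, max_steps: int = 10) -> list[str]:
    """Ask the planner for one step at a time until it reports completion."""
    state = AgentState(goal=goal)
    for _ in range(max_steps):  # hard cap so a confused planner cannot loop forever
        step = planner(state)
        if step is None:
            break
        state.history.append(step)  # a real agent would execute the step here
    return state.history


print(run_agent("triage my email", deterministic_planner))
```

The `max_steps` cap is the piece worth noticing: with a planner that can get stuck, some outer, non-LLM control is what keeps the agent from running indefinitely.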

Beyond the model limitations, the other challenge is tooling. Likely the closest thing to a widely used LLM agent framework is the OpenAI Assistants API. However, it lacks many of the true agentic features that you would need to really build an autonomous agent in production. Companies like https://www.agentops.ai/ and https://e2b.dev are taking a stab at trying to provide a different layer of tooling / infra to help developers building agents, but these tools have not gained widespread adoption.

Where are we going from here?

The agent experience that gets me excited is the one that is spun up in the background for me and just automates away some task / workflow I used to do manually. It still feels like we are a very long way away from this, but many companies are trying this using browser automation. In those workflows, you can perform a task once and the agent will learn how to mimic the workflow in the browser and then do it for you on demand. This could be one possible way to decrease the friction in making agents work at scale.

Another innovation will certainly be at the model layer. Increased reasoning / planning capabilities, though they come with increased safety risks, present the likeliest path to improved adoption of agents. Some models, like Cohere’s Command R, are being optimized for tool use, which is a common pattern agents rely on to do the things they need. It is not yet clear whether these workflows will require custom-made models. My guess is that general purpose reasoning models will perform the best in the long term, but the short term will be won by models tailored to tool use.
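For readers who have not seen the tool use pattern, a rough sketch: the model emits a structured request naming a tool and its arguments, and ordinary code looks up that tool and runs it. The JSON shape, the `TOOLS` registry, and both example tools below are assumptions made for illustration. They are not Cohere's or any other vendor's actual format, and the model output is hard-coded rather than coming from a real model.

```python
import json


def get_weather(city: str) -> str:
    """Hypothetical tool that returns a canned answer instead of calling an API."""
    return f"Sunny in {city}"


def add(a: float, b: float) -> float:
    """Hypothetical tool: simple arithmetic the model can delegate."""
    return a + b


# Registry of the tools the agent is allowed to call, keyed by name.
TOOLS = {"get_weather": get_weather, "add": add}


def dispatch(tool_call_json: str):
    """Parse a model-emitted tool call and run the matching Python function."""
    call = json.loads(tool_call_json)
    name = call["name"]
    if name not in TOOLS:
        raise ValueError(f"unknown tool: {name}")
    return TOOLS[name](**call["arguments"])


# Stand-in for what a tool-use model might return (assumed format).
model_output = '{"name": "add", "arguments": {"a": 2, "b": 3}}'
print(dispatch(model_output))
```

The registry doubles as an allow-list: a tool name the model makes up is rejected instead of being executed.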

Don’t forget about GPT-4

By: Logan Kilpatrick

Re-posted from: https://logankilpatrick.medium.com/dont-forget-about-gpt-4-d5ab8c9493fc?source=rss-2c8aac9051d3------2

Exploring the model that changed the path of AI and machine learning history

The age of powerful language-based AI is upon us, and few players compare to the might and potential of OpenAI’s GPT-4. Let’s delve into the intricacies, capabilities, and potential applications of this revolutionary language model.

Picture the Power of GPT-4

GPT-4 has truly broken barriers with its ability to generate up to 25,000 words of text, a monumental increase of about eight times compared to its predecessor, ChatGPT. This leap forward enhances GPT-4’s abilities in handling long passages of text, making it a significant tool for a range of applications requiring long-duration interactions or wide-spanning narratives.

Advanced Image Understanding

GPT-4’s advance into understanding, interpreting, and coherently describing images revolutionizes the idea of automated systems. Imagine snapping a picture of a scene, uploading it to GPT-4, and having the AI describe the visual elements perfectly. The idea that an AI can not only “see” an image but also make sense of different elements and predict outcomes, like explaining that cutting the strings of balloons would make them fly away, is fascinatingly next-gen.

GPT-4’s ability to understand images makes it an invaluable assistant in several fields, from virtual education to any area where visuals need to be described in words.

Unique Challenges and Improvements

Like any technology, AI language models come with their challenges, including adversarial usage, unwanted content, and privacy concerns. However, OpenAI has put substantial effort into mitigating these issues. With GPT-4, the team has implemented further measures for safety, alignment, and usefulness to make the model more user-friendly and secure.

Groundbreaking Applications in Education

GPT-4’s potential in revolutionizing education is immense. Imagine enriching every classroom with a personal AI tutor capable of addressing questions on a wide range of subjects. Or a fifth-grader getting unlimited time for personalized math tutoring with this AI that never gets tired or impatient. GPT-4 makes tailor-made tutoring accessible to all, directly in the comfort of their homes.

Ultimately, GPT-4 elevates everyday life through advancements in AI. Whether it’s boosting productivity, teaching new skills, or simply organizing our days, AI like GPT-4 stands to improve our lives in countless ways.

Shaping the Future of AI with Microsoft

The strategic partnership between OpenAI and Microsoft is aimed at transforming AI technology into useful tools accessible to everyone. Their concerted efforts lay the groundwork for harnessing AI’s full potential to enhance productivity, ultimately leading to an improved quality of life. GPT-4, a product born from the convergence of numerous technology advances, holds incredible promise for the future.

From enhancing education with AI-powered tutors to bringing valuable assistance into our lives, GPT-4 is on the verge of redefining our interactions with technology. As with any tool, ensuring that AI serves us correctly and safely is essential to leverage its benefits fully. As we sculpt the future of AI, learning, updating, improving, and transparency stand as our guiding tenets.

As we eagerly anticipate wider access to GPT-4 and similar AI, it’s critical to approach this revolutionary technology with informed understanding and responsible usage. OpenAI’s breakthrough serves as a testament to humanity’s unyielding drive to innovate and evolve, even in the realms of artificial intelligence. Happy coding!

Source video [1]: https://www.youtube.com/watch?v=--khbXchTeE

Note: This blog post was generated by a GPT-4 pipeline as part of a demo for the AI Engineer Summit presentation in collaboration with Simon Posada Fishman.

Why is everyone in AI talking about Llamas?