First Impressions of GPT-4o
I’ve been exploring the capabilities of the recently announced successor to GPT-4: GPT-4o (“omni”). At a glance, GPT-4o feels like a natural evolution of the GPT family: not just bigger or faster, but more flexible in how it interacts with me across text, voice and vision. Where GPT-4 took a defining step into multimodality, GPT-4o feels like that step carried through into the real workflows I care about.
What’s Different: From GPT-4 to GPT-4o
When comparing GPT-4 to GPT-4o, a few clear shifts stand out.
Speed and responsiveness: GPT-4o is noticeably snappier in text interactions, something I notice the moment I switch between models. Replies arrive with far less lag, which makes longer back-and-forth sessions less mentally taxing.
Multimodal integration: GPT-4 introduced image input alongside text, but GPT-4o extends this further by reasoning natively across text, vision and audio. I can move between formats without consciously changing how I interact. A typed question, a spoken follow-up, or an image dropped into the conversation all feel part of the same flow.
Natural voice mode: The new voice interface feels genuinely conversational. Rather than making me stop, type, and wait, it listens, responds, and adapts in a way that feels fluid. This does not feel like text-to-speech layered on top of a text model. It feels like voice is a first-class input.
Multilingual understanding and translation: One of the more impressive moments during the launch was seeing just how capable GPT-4o is with languages. During the live OpenAI demonstration, CTO Mira Murati spoke to the model in Italian and had it translate fluidly between English and Italian in real time. Seeing this play out live made it clear that multilingual reasoning is not an afterthought in GPT-4o, but a core capability. In practice, this opens up far more natural translation, language learning, and cross-language conversations without breaking the flow of interaction.
Natural Voice Conversations
One of the standout features for me has been natural voice mode. Previous voice features felt interesting but slightly awkward. GPT-4o’s voice interactions feel far more natural and human.
I experimented with using voice mode for career-focused conversations, asking questions like “I enjoy DevOps and automation, but I’m unsure whether to specialise further or broaden my skills. How would you approach that decision?” and then discussing the response as if I were having a real conversation rather than issuing a prompt. I talked through where I am professionally, the areas I enjoy, the areas I feel less confident in, and where I might want to head next. It felt closer to speaking with a mentor or colleague than querying a tool.
What stood out most was the pacing and responsiveness. The model responded quickly, picked up on context, and asked sensible follow-up questions. I did not feel the need to carefully structure my thoughts. I could simply talk naturally.
This made it easier to explore:
- Career direction and long-term skill development
- Weighing trade-offs between specialising and staying broad
- Talking through uncertainty without feeling rushed to produce a perfect prompt
It was one of the first times an AI interaction genuinely felt conversational rather than transactional.
Multimodal Features in Practice
Beyond voice, GPT-4o’s ability to reason across multiple input types has been surprisingly useful in day-to-day tasks.
Images and text together: When troubleshooting or reviewing configuration issues, I can share screenshots alongside written context and get coherent, relevant responses that take both into account (the sketch at the end of this section shows the same pairing through the API).
Voice and visuals combined: In some cases, I’ve described a problem verbally while showing an image or diagram. GPT-4o handled this combination naturally, without losing context or requiring me to repeat myself.
Longer written workflows: Even with lengthy inputs such as logs or structured notes, the faster response times make interactions feel smoother and less disruptive.
These are small things individually, but together they reduce friction in a way that adds up.
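Most of this happens in the ChatGPT interface, but the same text-plus-image pairing is available through the API. Below is a minimal sketch assuming the official OpenAI Python SDK and the gpt-4o model name; the prompt, screenshot URL and troubleshooting scenario are placeholders rather than anything from my actual sessions.

```python
from openai import OpenAI

# Assumes the OPENAI_API_KEY environment variable is set.
client = OpenAI()

# Hypothetical example: a screenshot of a failing pipeline plus a short
# written description, sent together as one user turn.
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {
            "role": "user",
            "content": [
                {
                    "type": "text",
                    "text": "This pipeline run fails at the deploy stage. "
                            "What stands out in the attached screenshot?",
                },
                {
                    "type": "image_url",
                    "image_url": {"url": "https://example.com/pipeline-screenshot.png"},
                },
            ],
        }
    ],
)

print(response.choices[0].message.content)
```

The useful part is that the image and the written context land in the same turn, so the model can reason over both at once rather than being asked about each in isolation.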
Where GPT-4 Still Has Strengths
It is worth noting that GPT-4 can still feel more deliberate and cautious in certain scenarios, particularly when dealing with dense or highly specialised text. There are times when that slower, more methodical approach is desirable.
For my everyday use, though, the balance of speed, flexibility and multimodal reasoning in GPT-4o often makes it the more practical choice.
Final Thoughts
GPT-4o does not feel like a radical leap in intelligence. Instead, it feels like a meaningful step towards a more natural way of working with AI.
The ability to move seamlessly between typing, speaking and showing context has changed how I interact with the model. Natural voice conversations in particular have made it easier to think out loud, explore ideas, and reflect without over-engineering prompts.
For learning, problem-solving and general questions, GPT-4o feels less like a tool I query and more like an assistant I can talk to. I’m looking forward to exploring how this more natural style of interaction shapes the way I work with AI over the coming weeks and months.