Embodying AI: Why the Future of LLMs Is Visual
May 20, 2026
Large language models (LLMs) like ChatGPT have proven astonishingly adept at generating text and engaging in conversation. Yet, in their current form, they remain disembodied voices in a chat box. As powerful as they are, something fundamental is missing: a visual, social layer on top of these LLMs to make AI more relatable and useful for everyday consumers. In other words, the next evolution of AI will be about giving it a face (literally) and a social presence. Tech leaders from OpenAI to Meta are hinting at this future, and early evidence from research and industry alike shows why visual and social AI could dramatically accelerate adoption of AI in our daily lives.
From Text to Multimodal: Adding the Visual Layer
Humans are highly visual creatures: by some estimates, half or more of the human brain’s cortex is involved in visual processing. We naturally communicate with facial expressions, body language, and imagery, not just text. So it’s no surprise that an AI confined to text can feel limited. To bridge the gap, today’s LLMs are rapidly evolving into multimodal AI, able to see and generate visual content, not just words. OpenAI’s GPT-4 already accepts images as input, and competitors like Google’s Gemini are “built from the ground up to be multimodal,” capable of understanding text, images, video, and audio. OpenAI has even rolled out vision and voice features for ChatGPT, so it “can now see, hear, and speak” in conversations. These advances foreshadow a future where AI can seamlessly interact through visual media: describing what it sees, generating images on demand, or even guiding us through augmented reality.
However, visual AI isn’t just about an algorithm analyzing pixels. It’s also about giving AI a visual persona. Research shows that people respond strongly to embodied, human-like agents. In one study, an anthropomorphic robot assistant with facial expressions was trusted more and led to higher task engagement than a voice-only smart speaker. By simply giving a conversational AI a face and gaze, users felt more comfortable and performed better. This aligns with our intuitive experience: a friendly face or avatar can make an interaction feel natural, whereas disembodied chat can feel abstract or lifeless.
It’s no wonder, then, that AI leaders are pushing towards embodiment. Sam Altman, CEO of OpenAI, has emphasized that AI shouldn’t remain trapped behind a screen. OpenAI is actively exploring physical and virtual embodiment – backing efforts to put advanced models into robots – because as Altman says, “we live in a physical world, and we want things to happen in the physical world.” The true breakthrough will come when AI can move and act in our world, whether through actual robots or richly interactive avatars. This physical or visual presence is what will unlock capabilities beyond typing out answers, from helping around the house to engaging with us as companions.
Social AI: From Chatbot to Companion
Equally important is making AI social. Today’s LLM interactions are often one-on-one (user and chatbot) and task-oriented. But human intelligence is inherently social: we learn and thrive through interpersonal interaction and community. The next generation of AI will embrace that, turning chatbots into social companions, collaborators, and participants in our digital lives.
The thinking is that people might not flock to a generic Q&A bot, but they will enjoy interacting with an AI that has a name, face, and relatable personality. Snapchat’s chat assistant, My AI, was given a friendly avatar icon and injected into users’ friend lists, and in a matter of months it saw 150+ million users exchange over 10 billion messages in 2023.
The appeal goes beyond novelty. There’s a genuine human need for connection and conversation, which social AI can help fulfill (albeit with important caveats). Mark Zuckerberg recently mused that many people would welcome AI friends in their lives, noting that “the average person has demand for meaningfully more [friends]” and suggesting AI avatars could help fill that gap. This vision of AI companions to chat with, play with, and learn from is catching on fast, especially among younger users.
What makes an AI interaction social? It’s more than slapping a face on a chatbot. Social AI involves memory, personality, and interactivity. The latest LLM systems are being augmented with long-term memory and personalization, so they can remember your past conversations, adapt to your preferences, and develop a consistent persona over time. This means your AI could “remember who you are, what you like, and how you think.” We’re essentially teaching AI the art of conversation and relationship-building: skills like recalling that you hate mornings or that you prefer humorous banter, and responding in kind. When an AI can do that, it starts to feel less like a tool and more like a friend or colleague.
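One common pattern behind this kind of persistent persona is simple prompt-level memory: store facts gleaned from earlier chats, then prepend the persona and the relevant facts to each new prompt. A minimal sketch, assuming a hypothetical `CompanionAI` wrapper (no specific product’s API is implied; the assembled prompt would be handed to whatever LLM backend you use):

```python
from dataclasses import dataclass, field

@dataclass
class CompanionAI:
    """Toy chatbot layer that adds persona and long-term memory on top of an LLM."""
    persona: str                                        # fixed personality description
    memories: list[str] = field(default_factory=list)   # facts learned about the user

    def remember(self, fact: str) -> None:
        """Persist a fact about the user across conversations."""
        self.memories.append(fact)

    def build_prompt(self, user_message: str) -> str:
        """Assemble persona + remembered facts + the new message into one prompt."""
        memory_block = "\n".join(f"- {m}" for m in self.memories) or "- (none yet)"
        return (
            f"You are {self.persona}.\n"
            f"What you remember about this user:\n{memory_block}\n"
            f"User: {user_message}\n"
            f"Assistant:"
        )

ai = CompanionAI(persona="a cheerful companion who enjoys humorous banter")
ai.remember("hates mornings")
ai.remember("prefers humorous banter")
prompt = ai.build_prompt("Can you help me plan a 7am study session?")
```

Production systems typically go further, retrieving only the memories relevant to the current message rather than prepending all of them, but the principle is the same: the model feels consistent because its context is.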
New Use Cases Unlocked by Visual & Social AI
By combining these trends, the visual embodiment and social intelligence of AI, we unlock a host of new use cases that go far beyond today’s chatbots. A few examples are already coming to fruition:
Education and Tutoring: AI can step into the role of tutors or mentors that feel more like a personal teacher. Picture a friendly avatar on your tablet that not only gives you correct answers, but also cheers you on with a smile, adapts its teaching style to you, and uses visuals (diagrams, videos, sketches) to explain concepts. Early studies suggest students may respond better to an enthusiastic on-screen tutor than to plain text. The visual element helps convey tone and encouragement, while the social element (memory of past struggles, a dash of personality) keeps students engaged. An embodied AI can even read your facial expression or confusion and adjust its approach, much like a human teacher.
Gaming and Entertainment: Game worlds are poised to be revolutionized by AI characters. Traditionally, non-player characters in games follow scripted dialogues and behaviors. Now, with LLM-driven agents, we can have NPCs that converse freely, remember the player’s actions, and coordinate with each other. This makes games far more immersive – every minor character could have unscripted depth. A visually embodied AI can express emotion and humor in ways plain text never could, which opens the door to AI-driven entertainment that is truly engaging.
Social Networking and Companionship: As mentioned, billions of people are already encountering AI “friends” in chat apps. We can expect social networks to evolve with AI participants woven into the fabric. There might be AI influencers with avatar profiles that post content (curated entirely by AI), AI moderators that help manage online communities, or personal companion AI that you bring into your group chats and video calls. Far from replacing human friends, these AIs could act as ice-breakers, assistants, or creative collaborators among groups of people. The goal for social AI should be to augment human interaction, not erode it. Used thoughtfully, an AI buddy might help someone practice social skills, or simply be a comforting presence at times when no one else is around, without supplanting human relationships.
Work and Collaboration: In professional settings, visual-social AIs could serve as coworkers or aides that participate in meetings (perhaps as a little avatar on the video call), remember context across projects, and facilitate teamwork. Instead of a disembodied voice from a smart speaker, an office AI with a face on a screen or a telepresence robot could project more trust and accountability, acting like a true team member.
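The NPC idea in the gaming item above can be sketched in a few lines: each character keeps its own log of witnessed player actions and folds that, plus a character sheet, into every dialogue prompt. Here `generate_line` is a hypothetical stand-in for a real model call, and the class names are illustrative, not any engine’s API:

```python
from dataclasses import dataclass, field

def generate_line(context: str) -> str:
    """Stand-in for a real LLM call; a shipping game would call a model here."""
    return f"[reply conditioned on: {context[:40]}...]"

@dataclass
class NPC:
    name: str
    role: str                                            # character sheet, sent with every prompt
    event_log: list[str] = field(default_factory=list)   # player actions this NPC has witnessed

    def witness(self, event: str) -> None:
        """Record something the player did in front of this character."""
        self.event_log.append(event)

    def speak(self, player_line: str) -> str:
        """Build a context string from identity + memory and get a line of dialogue."""
        context = (
            f"{self.name}, {self.role}. "
            f"Known player actions: {'; '.join(self.event_log) or 'none'}. "
            f"Player says: {player_line}"
        )
        return generate_line(context)

guard = NPC(name="Mira", role="a gate guard who distrusts strangers")
guard.witness("player returned the merchant's stolen purse")
reply = guard.speak("May I enter the city?")
```

Because each NPC carries its own `event_log`, two characters can react differently to the same player, which is precisely the unscripted depth that fixed dialogue trees cannot provide.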
Conclusion: Toward a Visual, Social AI Era
The writing on the wall is clear: the next chapter of AI will be defined by embodiment and social intelligence. LLMs have given machines the gift of language; now we’re giving them a face and a personality. This shift is more than cosmetic: it’s about aligning AI with the fundamentally visual and social way that humans experience the world. When AI can show you what it means and share experiences with you, it becomes vastly more accessible and useful to everyday people.
There will be challenges, of course. We’ll have to guard against pitfalls like over-attachment to AI friends or misuse of ultra-realistic avatars. But done responsibly, visual and social AI can enrich our lives, making interactions with technology more natural, engaging, and even fun. It’s a future where interacting with an AI could feel as easy as talking to a neighbor or playing with a pet, rather than typing into a void.
This vision is exactly what we’ve been building toward at Genies. We believe your AI shouldn’t live in a simple chat UI forever. In the near future, AI avatars will act as our digital “mini-me” companions: fully animated, expressive, and able to think and speak with the power of LLMs. Our game-ready AI avatars are designed to bring this concept to life, embodying AI in visually rich, social experiences. From personalized virtual friends to interactive characters in games and communities, we’re excited to help usher in the era of AI that you can see, feel, and socialize with.