Breakthrough · MAY 28, 2026 · ACCESSIBILITY · VISION AI

What GPT-4o Vision Actually Changed for Blind Users

Be My AI processes 2M photos a month. Aira sells minutes by the hundred. The qualitative shift was not better description — it was always-on description.

By Kadin Nestler · May 28, 2026 · 11 min read


Be My Eyes was a video-call app. A blind user pointed their phone camera at something, and a sighted volunteer somewhere on the planet picked up and described what the camera showed. Founded in 2015 by Hans Jørgen Wiberg, a visually impaired Danish craftsman, it had 10,000 volunteers within the first 24 hours of launch. By early 2023 it had roughly 6 million volunteers and around 500,000 blind users across 150 countries.

That was the shape of the product for eight years. Useful, free, and structurally bottlenecked by one thing: a volunteer has to be available, and the user has to be willing to call one.

In February 2023, OpenAI phoned Be My Eyes and asked if they wanted early access to GPT-4's vision capability. By March 2023, "Be My AI" was in beta. By September 2023 it was open to hundreds of thousands of users. By the time GPT-4o shipped in May 2024 with real-time vision and lower latency, Be My AI was processing roughly 2 million photographs a month and had become, for many users, the default tool they reached for before they would have called a person.

The shift is not "AI describes images better than a volunteer." It usually does not. The shift is that AI describes images at 2 a.m., in the kitchen, when you do not want to bother anyone, when the question is small, when you do not want to be on a video call with a stranger to read a thermostat. The product changed when the friction of asking dropped to roughly zero.

The volunteer model and what it could not do

Be My Eyes' volunteer service still works exactly the way it did in 2015. You open the app, hit "Call a volunteer," and it rings out to sighted volunteers worldwide until one picks up. Average pickup time is under 30 seconds in well-staffed time zones. The volunteer sees through your phone camera, the two of you talk over the phone's microphone and speaker, and they describe what they see. The service has grown to 8.3 million volunteers as of January 2025 and handles 43 million requests per year at a reported 97% customer satisfaction.

The volunteer model has three constraints that nothing inside the model can fix.

It requires a stranger. Not every blind user wants to call a stranger to read a piece of mail. Especially mail that may be a medical bill, a court summons, a paycheck stub, or a letter from a family member. The volunteer is well-intentioned and screened, but they are still seeing into your life.

It requires availability. Volunteer density is concentrated in the United States and Western Europe. A user in Senegal who speaks French at 4 a.m. local time may wait several minutes for a pickup, or get bounced through three or four volunteers who do not speak the language. The 180-language coverage Be My Eyes advertises is real on paper and uneven in practice.

It requires the user to think a task is worth bothering someone for. This is the constraint nobody outside the community talks about. Blind users have told the Be My Eyes team for years that they ration their volunteer calls — they save them for "real" problems and live with annoyance for everything below the threshold. The threshold is the part that AI demolished.

What Be My AI actually does

Be My AI is built into the same app. Instead of "Call a volunteer," you tap "Describe with Be My AI," take a photo, and a few seconds later GPT-4 (now GPT-4o) returns a paragraph-length description. You can then ask follow-up questions in voice or text — "what color is the third button from the left," "what does the small text under the logo say," "is there any blood on the bandage."

Milagros Costabel, a totally blind writer who profiled the tool for Slate in October 2023, described what the follow-up loop unlocked: she could ask the AI to identify the colors of clothes in her closet and suggest combinations she would not have thought of on her own, photograph restaurant menus and ask for specific dishes that met dietary criteria, then go back through years of old photos on her camera roll and have them, in her words, "come to life before me, transformed into vivid descriptions." She also described, plainly, the dependency risk — "Artificial intelligence can be wrong, and a blind person like me would have little way of noticing unless I knew in advance what was in an image" — and the bigger fear, that the company providing the API could pull it at any time and the assistance she had built her routine around would vanish.

Lucy Edwards, a UK-based blind activist with several million followers across TikTok and Instagram, was one of the most public early adopters. Her demonstrations covered the same use cases the Be My Eyes team had been documenting internally: matching outfits, identifying food in the fridge, reading mail, describing photos sent by family. Her phrasing for what the experience felt like was the line that traveled: she could finally ask "small" questions without imposing on anyone.

The official Be My Eyes "100 ways to use" list, refreshed after the AI launch, reads like a catalog of small tasks the volunteer model was always overserving or underserving: identifying which can in the pantry is tomatoes versus beans, reading the dosage on a prescription bottle, checking whether the milk in the fridge has expired, learning the buttons on a new microwave, decoding the symbols on a clothing-care tag, telling which set of keys is which by their shape. None of these are emergencies. All of them used to require either memorizing layouts, asking a family member, or accepting that the question was not worth asking.

WHAT GPT-4 UNLOCKED THAT 8 MILLION VOLUNTEERS COULD NOT

The volunteer model has one mode: a live call that is either connected or not. The AI surface adds six interaction patterns. You can ask the same question three different ways. You can pause and come back. You can ask the model to read all the text on the page, then re-ask it to focus on just the small print at the bottom. You can take ten photos of the same thing from different angles and have the model reconcile them. You can ask for a description in plain language, then ask for it again in clinical language. You can ask the model to compare two photos. None of these patterns is practical with a human volunteer on a 90-second call. All of them are routine with an API.
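What makes these patterns cheap is that the conversation state (the photos taken so far and every earlier question and answer) can be resent to the model on each turn. The sketch below illustrates that idea only. It is not Be My Eyes' or OpenAI's actual code: `DescriptionSession`, `VisionModel`, and `fake_model` are hypothetical names, and the stand-in model simply echoes what it was given so the example runs without network access.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple

# One turn in the conversation: (role, text), where role is "user" or "assistant".
Turn = Tuple[str, str]

# Hypothetical model interface (an assumption, not a real API):
# it receives every photo so far plus the full turn history and returns text.
VisionModel = Callable[[List[bytes], List[Turn]], str]


@dataclass
class DescriptionSession:
    """Holds photos and turn history so follow-ups keep their context."""
    model: VisionModel
    photos: List[bytes] = field(default_factory=list)
    history: List[Turn] = field(default_factory=list)

    def add_photo(self, image: bytes) -> None:
        # Extra angles of the same object accumulate; nothing is discarded.
        self.photos.append(image)

    def ask(self, question: str) -> str:
        # Record the question, call the model with all context, record the answer.
        self.history.append(("user", question))
        answer = self.model(self.photos, self.history)
        self.history.append(("assistant", answer))
        return answer


def fake_model(photos: List[bytes], history: List[Turn]) -> str:
    """Stand-in for a real vision model; reports the context it received."""
    question = history[-1][1]
    return f"[{len(photos)} photo(s), turn {len(history)}] {question}"


session = DescriptionSession(model=fake_model)
session.add_photo(b"front-of-box")
first = session.ask("Read all the text on the page.")
session.add_photo(b"back-of-box")
second = session.ask("Now just the small print at the bottom.")
```

Pausing and coming back, rephrasing, and comparing photos all reduce to the same operation here: append to the session and call the model again. A live call has no equivalent of that persistent state.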

How it fits with the rest of the stack

Be My AI is not the only AI vision tool a blind user reaches for in 2026. The category is now a small ecosystem, and the tools have settled into reasonably clear lanes.

Microsoft Seeing AI, launched in 2017 by Saqib Shaikh, a Microsoft engineer who lost his sight at age 7, is the most heavily used standalone tool. It launched on iOS, has been available on Android since late 2023, is free, and has logged over 10 million tasks. It is purpose-built for narrow channels: short text, full documents, products via barcode, currency, color, scene description, handwriting recognition, face recognition for people the user has trained. AppleVis, the largest community of blind iOS users, gave it the 2024 Golden Apple for best app. Seeing AI's strength is reliability on the narrow channels: it is consistently rated the most accurate tool for printed-text reading.

Google Lookout is the Android equivalent. Same broad shape, fewer features, no face recognition. It is the default the Android-first half of the community uses.

Aira is the higher-touch alternative — a paid service that connects users to a trained human agent (not a random volunteer) within seconds, 24/7. Pricing runs on minute-based subscription tiers. Aira's value proposition is the credentialed agent — trained on accessibility, bound by privacy, capable of handling complex tasks like reading a legal document, navigating a hospital, or walking a user through an unfamiliar transit hub. Hundreds of organizations now sponsor free Aira access for their venues — Kansas City International Airport added it in August 2025, RWJBarnabas Health sponsors it across its hospital network, several universities sponsor it campus-wide. The free five-minutes-every-48-hours tier exists for short emergencies. The subscription model exists because some users do not want to pay per minute to read their mail. Aira also has its own AI assistant for photo descriptions.

Envision Glasses are a hardware play: a pair of smart glasses built on Google Glass Enterprise Edition 2 hardware running Envision's software. They sell for around $3,500 in the US with a $200/year subscription on top. The "Ask Envision" feature uses a GPT model to answer the wearer's spoken questions about a scene and plays the answers through bone-conduction speakers. The category is small but growing. Envision is what users adopt when they want a truly hands-free experience and pulling a smartphone out of a pocket is itself the friction.

Ray-Ban Meta and Oakley Meta glasses have become the surprise mass-market entry. Be My Eyes launched on Ray-Ban Meta in November 2024 — a user says "Hey Meta, Be My Eyes" and a sighted volunteer (or Meta's own multimodal AI) gets routed the live camera feed through the glasses. The glasses cost a few hundred dollars instead of a few thousand. They look like Ray-Bans. They have become the wedge that pulled smart-glasses-for-accessibility out of the assistive-tech catalog and into Sunglass Hut.

The pattern across all five tools is the same: GPT-class vision models commoditized the underlying recognition, and each product is now competing on form factor, latency, integration depth, and the trust profile the user wants — anonymous AI, named human, both at once.

The honest limitations

Vision-language models hallucinate. They hallucinate less than they did in early 2023, but they still hallucinate, and a blind user has structurally less ability to catch it than a sighted one. Costabel made this point in writing. Several beta testers on the AppleVis forum raised it in 2023 — one specifically asked the Be My Eyes team how to report cases where the AI confidently described a window as a door. The mechanism for reporting exists. The error rate is not published.

Low-light accuracy is meaningfully worse than daylight accuracy. A user trying to identify pill bottles in a dim bathroom at 3 a.m. is closer to the failure mode than a user reading mail at a kitchen table at noon. Several users on r/Blind and in AccessibleAndroid threads have flagged this specifically as the gap they wish the model would close before they fully trust it for medication confirmation.

Latency is variable. GPT-4o brought response times down from the 8-12 seconds the original GPT-4 vision API was hitting in 2023 to roughly 2-4 seconds for a single photograph in 2026. Real-time video — the demo OpenAI showed in May 2024 where GPT-4o described a New York street scene live — is still gated behind paid tiers and rolls out unevenly. The free Be My AI tier inside Be My Eyes runs on a frame-based pattern: snap a photo, wait for description, ask follow-up. It is not yet the "ambient camera describing the world to you" demo from the launch event. That product still requires the Ray-Ban Meta integration or a paid ChatGPT Plus subscription.

The biggest unspoken limitation is the one Costabel named: the providers can pull the plug. Be My Eyes is a free app, and Be My AI is free because OpenAI is sponsoring the API cost. If that sponsorship changes — if OpenAI decides to charge, or if Be My Eyes loses access — the daily-life pattern hundreds of thousands of users have built around the tool gets disrupted. The community remembers the apps that died: KNFB Reader, several earlier image-description tools that depended on now-discontinued APIs. Trust in the durability of these tools is lower than the marketing implies. That is structurally rational.

What changed about daily life

The right way to read this category is not "AI now helps blind people see." That framing is what the launch coverage in 2023 reached for, and it is the framing that the community itself has consistently pushed back on. Blind people were not waiting to be helped. They had a working set of tools, a deep set of techniques, and a low tolerance for technology that promised independence and shipped dependency.

What actually changed is the threshold for which questions get asked. Pre-2023, a blind user reading mail at home would either memorize the shape of envelopes, ask a family member when they were around, save the pile for the weekly volunteer reader, or call Be My Eyes if the envelope looked official. Post-Be-My-AI, the same user takes a photo of every envelope as it comes in and gets a one-sentence summary in three seconds. The decision tree collapsed. The set of envelopes that get opened on the day they arrive went up. The pile that used to sit on the kitchen table for a week stopped existing.

That is not "AI replacing volunteers." It is "AI handling the long tail of small questions that the volunteer service was never going to be the right tool for." Volunteer call volume on Be My Eyes did not collapse after the AI launch. It stayed roughly flat while AI usage exploded. Two different products, two different jobs to be done.

The same pattern showed up in groceries. Pre-AI, a blind shopper either memorized aisle layouts, asked the store for a helper (most chains comply, half the time the helper does not show up), or used the Aira sponsored hours that the chain had bought. Post-AI, the same shopper points the Ray-Ban Meta glasses at a shelf, asks "is the Heinz ketchup on the left or the right," and gets an answer in two seconds. They might still call Aira for the complicated stuff — pharmacy questions, returns, anything that needs navigating a person — but the basic shelf-reading loop moved to the model.

Mail-reading, medication identification, appliance buttons, restaurant menus, photo descriptions, and "what does this thing in my hand look like" together cover five of the top six tasks the volunteer service was getting called for in 2022. Across them, AI took the long tail and the volunteers retained the complex top. The total volume of visual assistance the community gets in a year went up roughly 10x, by Be My Eyes' own internal numbers. Most of that increase is questions that would never have been asked five years ago.

That is the real story of what GPT-4o vision changed. Not better description. Always-on, low-friction, no-one-bothered description, available at the moment of the question.

What this means for businesses serving accessibility users

If your business interacts with blind or low-vision users — a hotel, an airline, a hospital, a bank, a utility, a retailer — the relevant operational question changed too. You used to be designing for "the user calls our accessibility support line, possibly through a relay service." You are now designing for "the user is wearing camera-equipped smart glasses and has GPT-class vision on the device that can read your signage, your menu, your appliance panel, and your written instructions."

That changes what high-contrast signage is for. It changes what audio description on your website is for. It does not eliminate the need for either, because relying on the user's tool to bridge a gap in your design is the wrong stance, but it does mean the user's tool will bridge gaps you did not realize you had. The first audit a hotel chain ran after the Ray-Ban Meta launch found that its in-room temperature control panels were unreadable to both a human describer and Be My AI because the white-on-white labeling reflected too much glare. The fix was a $40 sticker pack. The same audit would have surfaced the same problem in 2019, but nobody had run it.
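A first-pass check for the kind of labeling problem described above is the WCAG contrast ratio between label color and background color. The sketch below implements the WCAG 2.x relative-luminance and contrast-ratio formulas. The specific colors are illustrative assumptions, not measurements from any real panel, and a color-only check does not capture glare on a physical surface, so it would complement an on-site audit rather than replace one.

```python
from typing import Tuple

RGB = Tuple[int, int, int]


def _linearize(channel_8bit: int) -> float:
    """Convert one 8-bit sRGB channel to linear light, per WCAG 2.x."""
    c = channel_8bit / 255.0
    return c / 12.92 if c <= 0.03928 else ((c + 0.055) / 1.055) ** 2.4


def relative_luminance(rgb: RGB) -> float:
    """Relative luminance: 0.0 for black up to 1.0 for white."""
    r, g, b = (_linearize(v) for v in rgb)
    return 0.2126 * r + 0.7152 * g + 0.0722 * b


def contrast_ratio(foreground: RGB, background: RGB) -> float:
    """WCAG contrast ratio, from 1.0 (identical) to about 21.0 (black on white)."""
    l1 = relative_luminance(foreground)
    l2 = relative_luminance(background)
    lighter, darker = max(l1, l2), min(l1, l2)
    return (lighter + 0.05) / (darker + 0.05)


# WCAG AA asks for at least 4.5:1 for normal-size text.
AA_NORMAL_TEXT = 4.5

white = (255, 255, 255)
off_white = (240, 240, 240)  # hypothetical "white-on-white" label color
black = (0, 0, 0)

low = contrast_ratio(off_white, white)   # label nearly matches the panel
high = contrast_ratio(black, white)      # dark label on the same panel

label_passes = low >= AA_NORMAL_TEXT
fix_passes = high >= AA_NORMAL_TEXT
```

Under these assumed colors the off-white label falls well short of the 4.5:1 threshold while a black label clears it, which is the same outcome the sticker-pack fix points to.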

The other shift is that the user is now arriving with a transcript. Hotel front desks have started receiving guests who say "your check-in form is the second sheet on the right side of the counter, correct," because their glasses already told them. That is a positive change for accessibility — the user knows what they are walking into — but it requires the front-of-house staff to assume the same baseline competence with the environment that they would assume of a sighted guest. The patronizing default — slow speech, repeated explanations, treating the question as if the user does not understand the space — was bad in 2019 and is worse now. The user has GPT-4o in their pocket.


Cite this article

Ascero AI. “What GPT-4o Vision Actually Changed for Blind Users.” May 28, 2026. https://asceroai.com/news/gpt4o-vision-blind-users-2026

Free to reference with attribution and a link back to this page.
