10/16/2025

Tech Notes

Artificial intelligence (AI) is rewriting the rules of digital communication. Text is written in seconds, images appear with a single prompt, and now – without lifting a finger – you can get a caption for almost any photo. Alt text, the essential descriptive layer that blind and low vision users rely on to make visual content meaningful, is being auto-generated everywhere: on Facebook and Twitter, on LinkedIn, even in Microsoft Office. The promise, at least on the surface, is enticing: more content covered, more users reached, seamless inclusion with less manual labor. But as is often the case with technological “solutions” to longstanding accessibility gaps, what appears to be progress may in fact obscure or even reinforce the very inequities it claims to address.

The Surface Appeal of Automated Descriptions

The appeal of automated image description is understandable. Historically, image description has been labor-intensive, inconsistently applied, and largely dependent on individual effort or organizational resources. In many digital spaces, alt text remains an afterthought at best and an unknown feature at worst. AI-generated descriptions appear to offer an elegant, scalable remedy to this persistent problem. With minimal effort, platforms can now ensure that at least some description accompanies an image, even when authors provide none.

Moreover, the use of large language models in accessibility contexts such as Be My Eyes or Seeing AI demonstrates clear potential for enhancing user autonomy. These tools allow blind users to obtain conversational, contextual explanations of images in real time, far beyond the capacity of static alt text alone. In these instances, the assistive utility is both immediate and tangible.

But the core issue is not whether AI can describe images (which it can, at least to a limited extent) but whether the type of description it provides is meaningful, trustworthy, and responsive to the actual needs of blind and low vision users. And in many cases, the answer is “no.”

If you’ve encountered AI-generated alt text, you might’ve seen something like this: “Image may contain: outdoor, dog, person, grass.”

Technically true, maybe. But on a practical level it’s almost useless. There’s no narrative, no sense of relationship, and certainly no understanding of what makes the image meaningful in context. This isn’t a one-off flaw; it’s a symptom of how these systems work. AI models don’t see images the way humans do. They analyze pixels, compare features, and generate statistically probable descriptions based on patterns in their training data. What they lack is intentionality. They don’t know what’s relevant, emotionally salient, or functionally important – and that is precisely the gap that keeps visual content from being genuinely accessible.
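
To make that concrete, here is a minimal sketch of how a generic machine caption gets produced. It assumes the open-source Hugging Face transformers library and the publicly available BLIP captioning model (illustrative choices, not the systems any particular platform actually runs), plus a placeholder photo path. The model returns the statistically likely caption for the pixels it sees; it has no idea why the image was posted or what the reader needs from it.

    # A rough sketch, not any platform's actual pipeline: caption a local photo
    # with an off-the-shelf model. "family-photo.jpg" is a placeholder path; the
    # BLIP checkpoint is just one public example of a captioning model.
    from PIL import Image
    from transformers import BlipProcessor, BlipForConditionalGeneration

    processor = BlipProcessor.from_pretrained("Salesforce/blip-image-captioning-base")
    model = BlipForConditionalGeneration.from_pretrained("Salesforce/blip-image-captioning-base")

    image = Image.open("family-photo.jpg").convert("RGB")
    inputs = processor(images=image, return_tensors="pt")
    output_ids = model.generate(**inputs, max_new_tokens=30)
    caption = processor.decode(output_ids[0], skip_special_tokens=True)

    # Prints something like "a dog sitting in the grass next to a person":
    # plausibly accurate, but with no sense of context, purpose, or audience.
    print(caption)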

And yes, in some contexts, this is a meaningful improvement over an empty alternative. People frequently forget to add alt text to their posts. Institutional image databases are overloaded. For a blind user trying to browse a website, some vague description is better than total silence. In this sense, AI is filling a long-standing gap in the digital ecosystem. But the thing is, images aren’t just data. Alt text descriptions are part of how blind people navigate the world, interpret media, and form impressions. When you strip them of context, intention, or audience awareness, you risk doing more harm than good.

The Nuance AI Can’t (Yet) Grasp

AI struggles because image description isn’t just about what’s visible. It’s about what’s relevant, and relevance usually depends on context. Describing a graph for a researcher, a piece of art for a museum-goer, or a meme for a casual Instagram scroll each requires a different kind of language, tone, and detail.

Human-authored alt text (when done well) accounts for that. It knows who the audience is, what the content is for, and what kinds of information might matter. It provides blind users with equivalent access to meaning. It invites someone into the visual world being constructed on the page. It is not merely a description of pixels. It is context, purpose, and relevance distilled into a few words. The best alt text writers know how to make decisions: when to describe, when to summarize, when to omit. Most of all, they know how to prioritize the user.

AI-generated alt text, on the other hand, defaults to what it thinks it sees and what it thinks people want to know, based on patterns in its training data. The result is a kind of lowest-common-denominator access: generic, impersonal, and often unhelpful. It ticks a box without really meeting a need.

Hallucinations and Misinformation

Even worse, sometimes the captions aren’t just vague; they’re flat-out wrong. Generative models are prone to “hallucination,” confidently asserting details that aren’t actually present. Low-quality photos, stylized content, or even minor distortions can lead the model to invent visual features, describe fictional people, or misinterpret settings. Imagine being told “a group of people smiling at a party” when it’s actually a stock image of three people in suits at a funeral.

For users who cannot verify the image visually, this isn’t a minor error – it’s a breakdown in trust. A blind user may take an AI-generated caption at face value and make decisions based on it, only to realize later that the description was misleading or incorrect. In essence, we’re giving users the illusion of access without the reliability or fidelity that meaningful access demands.

The Risk of “Automated Inclusion”

There’s something dangerous about AI being used to simulate inclusion while sidelining the very people it claims to serve. Let’s call it automated inclusion: the use of algorithms to create the appearance of accessibility without actually investing in accessibility infrastructure, human labor, or user autonomy.

Automated inclusion is tempting. It’s fast, cheap, and scalable. It ticks compliance boxes. But it often ends up being performative rather than transformative. It creates a sense that the problem is solved without actually improving the lived experience of disabled users. Even worse, once a system is automated, it becomes harder to challenge. When companies rely on AI-generated alt text, they may deprioritize manual workflows, assuming the machine will handle it. They may not offer users the ability to flag or edit faulty captions. They may not provide creators with meaningful prompts to write better alt text themselves.

The result? A false sense of completeness that disproportionately impacts users who are already systemically excluded from design conversations. When alt text becomes an automated output rather than a designed input, blind users are once again positioned as afterthoughts in digital systems.

How We Fix It

AI-based tools for alt text aren’t inherently bad. But they should be supportive, not substitutive. Here’s what needs to change:

  • Keep human-written alt text first. AI can still play a role, but not as the final author. It can offer suggestions, help designers think through possibilities, or highlight images missing descriptions. Let AI suggest or assist, but don’t let it replace human intent (see the sketch after this list).
  • Protect accessibility from the logic of scale. Fast is not the same thing as inclusive. If we care about access, we have to slow down and do it right. That means resisting the convenience of auto-describe buttons when they serve nobody, and remembering that disabled users deserve more than “good enough.”
  • Support multiple description modes. Short alt text for screen readers, detailed descriptions on request, and layered contextual tags can be flexibly implemented if designers are willing.
  • Test with real users. Blind and low vision users should be included in training, evaluation, and feedback cycles for these tools. What sounds helpful to a developer might not land the same way with a screen reader.
  • Push for platform transparency. If alt text is AI-generated, it should say so. Users should have pathways to request more information, flag errors, or contribute their own insights.
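
As one illustration of what “human first, AI assist, clearly labeled” could look like in practice, here is a rough sketch of a description workflow. The ImageRecord fields and the resolve_alt_text helper are hypothetical names invented for this example, not any platform’s actual API: a human-authored description always wins, an AI caption is only a labeled fallback, and a missing description is surfaced for follow-up rather than silently published.

    # A hypothetical "human-first" alt text workflow. None of these names come
    # from a real platform; they only sketch the priority order described above.
    from dataclasses import dataclass
    from typing import Optional


    @dataclass
    class ImageRecord:
        src: str
        human_alt: Optional[str] = None          # written by the content author
        ai_caption: Optional[str] = None         # suggestion from a captioning model
        long_description: Optional[str] = None   # optional extended description on request


    def resolve_alt_text(image: ImageRecord) -> dict:
        """Pick the description to publish and record where it came from."""
        if image.human_alt and image.human_alt.strip():
            return {"alt": image.human_alt.strip(), "source": "human"}
        if image.ai_caption and image.ai_caption.strip():
            # Fallback only: keep the provenance so the interface can say
            # "AI-generated description" and offer a way to flag or edit it.
            return {"alt": image.ai_caption.strip(), "source": "ai-generated"}
        # No description at all: flag it for follow-up instead of hiding it.
        return {"alt": "", "source": "missing"}


    photo = ImageRecord(
        src="team-retreat.jpg",
        ai_caption="Image may contain: outdoor, person, grass",
    )
    print(resolve_alt_text(photo))
    # {'alt': 'Image may contain: outdoor, person, grass', 'source': 'ai-generated'}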

A Final Note

Alt text is not just a feature; it’s a form of voice. It reflects not just what’s in an image, but what matters in it. When we hand that role over to AI, we should be asking hard questions about authorship, interpretation, and what lens the machine is using.

In some ways, this is better than nothing. But “better than nothing” is a dangerously low bar, especially when it comes to accessibility. If AI-generated alt text becomes the standard without careful oversight, intentionality, and user involvement, we risk entrenching a new kind of digital exclusion: one that’s harder to detect, easier to justify, and dressed up as progress.

Because access isn’t just about labeling. It’s about meaning. And meaning doesn’t scale quite so easily.

About The Author

Alex Katsarakes is a Digital Accessibility Resident at the American Foundation for the Blind and a Ph.D. student in Human Factors Psychology. With a background in cognitive psychology, artificial intelligence, and inclusive design, Alex brings a behavioral science lens to accessibility challenges in digital systems. Their work reflects a deep commitment to making technology more human-friendly – whether through prototyping interfaces for complex systems, mentoring emerging researchers, or advocating for more accessible digital experiences. Alex is actively charting a course at the intersection of AI, inclusion, and user experience. When not wrestling with research or designing user studies, they can usually be found propagating new plants, falling into Wikipedia rabbit holes, or hunting down the perfect bacon, egg, and cheese bagel.

About AFB Talent Lab

The AFB Talent Lab aims to meet the accessibility needs of the tech industry – and millions of people living with disabilities – through a unique combination of hands-on training, mentorship, and consulting services, created and developed by our own digital inclusion experts. To learn more about our internship and apprenticeship programs or our client services, please visit our website at www.afb.org/talentlab.

Citations

  • Das, Maitraye, et al. “From Provenance to Aberrations: Image Creator and Screen Reader User Perspectives on Alt Text for AI-Generated Images.” Proceedings of the 2024 CHI Conference on Human Factors in Computing Systems, Association for Computing Machinery, 2024, pp. 1–21. ACM Digital Library, https://doi.org/10.1145/3613904.3642325.
  • Hanley, Margot, et al. “Computer Vision and Conflicting Values: Describing People with Automated Alt Text.” Proceedings of the 2021 AAAI/ACM Conference on AI, Ethics, and Society, Association for Computing Machinery, 2021, pp. 543–54. ACM Digital Library, https://doi.org/10.1145/3461702.3462620.
  • Huntsman, Sherena. “An Image for All: The Rhetoric for Writing Alt-Text.” 2022 IEEE International Professional Communication Conference (ProComm), 2022, pp. 49–52. IEEE Xplore, https://doi.org/10.1109/ProComm53155.2022.00012.
  • Song, Hyungwoo, et al. “AltAuthor: A Context-Aware Alt Text Authoring Tool with Image Classification and LMM-Powered Accessibility Compliance.” Companion Proceedings of the 30th International Conference on Intelligent User Interfaces, Association for Computing Machinery, 2025, pp. 124–28. ACM Digital Library, https://doi.org/10.1145/3708557.3716366.
  • Vu, Thao Chi. “Human-Centered Alt Text: Advocating for Inclusive Digital Practices through Education.” New York University Tandon School of Engineering, 2025. ProQuest, Order No. 32039675. Accessed 28 July 2025.