Will Peterson Design

Alt Text Generation for ChatGPT

Personal Project

I am currently in the testing stage of this project, but the beta version is viewable at this link.

Background

Alt Text for All is a personal project I started to explore how AI could support better alt text writing without taking the human out of the loop. Many tools promise to “automate” alt text, but their outputs often end up being plain descriptions of the image, albeit very good ones. After consulting with a number of experts, it became clear that a well-described image alone isn't good enough.

Good alt text depends on context. What’s the image for? Who’s the audience? What does the image need to say?

That's where the idea to build alt text generation into a conversational AI came from. Instead of a one-shot input/output tool, the user could provide the image, the AI could ask questions about the context and details, and together they could arrive at well-written alt text.

Research and Development Plan

After I started to explore this idea a little further, I realized I was going to need a very robust research and development plan. It would be unethical for me to put something out into the market that claims to support accessibility but doesn't. I wanted to make sure that the project was well researched and thoroughly tested.

You can read the full research and development plan here.

Where Other GPTs Fail

What stood out as I looked at the existing field of alt text GPTs was how many of them prioritized SEO over actual accessibility needs. Unfortunately, accessible alt text and alt text that improves SEO are rarely the same.

The example I often talk about is a photo of a group of students sitting in Harvard Yard outside of the Harvard Library. On the official website for the Harvard Library, the alt text should identify the library in the photo as the Harvard Library. But if the same photo were used as the hero image for a blog post titled "10 Tips for Freshmen This Fall," the fact that it's a photo of Harvard wouldn't be relevant. It's meant to stand in for a generic "kids in college" vibe. More appropriate alt text might be "Three students sit on a campus quad laughing."

Implementation

When ChatGPT launched the ability to include documents in the knowledge base attached to a GPT, I made a little discovery that set the project in motion: I could put a decision tree into the knowledge base, and have the core instructions tell the GPT to work through the decision tree with the user before giving a final output.

I did this with a "smoothie" machine that asked users what kinds of fruits they wanted, encouraged them to add spinach, and checked for any dietary restrictions. It was a really simple experiment, but it laid the groundwork for being able to write the more complicated decision trees needed to write effective alt text.
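To make the idea concrete, here is a minimal Python sketch of the kind of question-by-question flow the smoothie experiment followed. This is only an illustration: in the actual experiment the decision tree lived in a document in the GPT's knowledge base, not in code, and the node names, question wording, and the walk_tree helper below are all hypothetical.

```python
# Hypothetical sketch: the real decision tree is a knowledge-base document
# that the GPT reads; this only models the "ask everything first" flow.

SMOOTHIE_TREE = {
    "fruits": {
        "question": "What kinds of fruit would you like?",
        "next": "spinach",
    },
    "spinach": {
        "question": "Would you like to add spinach? (yes/no)",
        "next": "restrictions",
    },
    "restrictions": {
        "question": "Do you have any dietary restrictions? (yes/no)",
        # Only ask the follow-up when the user answers "yes".
        "branches": {"yes": "restriction_details"},
        "next": None,
    },
    "restriction_details": {
        "question": "Which restrictions should I account for?",
        "next": None,
    },
}


def walk_tree(tree, start, ask):
    """Ask each question in turn and return the answers keyed by node name.

    `ask` is any callable that takes a question string and returns the
    user's answer, so the same walker works with input() or scripted replies.
    """
    answers = {}
    node_id = start
    while node_id is not None:
        node = tree[node_id]
        answer = ask(node["question"]).strip()
        answers[node_id] = answer
        # Follow a branch if this answer has one, otherwise the default path.
        node_id = node.get("branches", {}).get(answer.lower(), node["next"])
    return answers
```

Calling walk_tree(SMOOTHIE_TREE, "fruits", input) would run the conversation in a terminal. The point of the structure is that no final output is produced until the walk has finished, which is the behavior the core instructions ask the GPT to follow.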

Alt Text for All relies on three core documents to guide the Custom GPT’s behavior and outputs:

Guidelines Document: Covers personality traits, guardrails, overarching instructions, and output formatting.
Context Decision Tree: Maps out a structured approach for the GPT to gather all necessary information before generating alt text. This ensures thoroughness and relevance in outputs.
Alt Text Writing Guide: Offers specific instructions and examples for the GPT to follow, ensuring high-quality alt text creation tailored to various scenarios.

Testing

I am currently implementing three types of testing to refine and improve the GPT before its public launch.

The first is quantitative testing with an expert. We've created a data set that pairs images with contexts and lists the features a piece of alt text would need in order to be successful for each pairing. We then run those images through the system, give it the context information it asks for, and score its output against the required features.
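As a rough illustration of how one entry in that data set and its score could be recorded, here is a small Python sketch. The field names, the example features, and the choice to score as the fraction of required features present are all assumptions made for this example; the expert still decides which features a given output actually contains.

```python
from dataclasses import dataclass


@dataclass
class AltTextCase:
    """One image/context pairing and what good alt text for it must include."""
    image: str
    context: str
    required_features: list[str]


def score(case: AltTextCase, features_present: set[str]) -> float:
    """Return the fraction of required features the expert found in the output.

    `features_present` holds the features the expert marked as present.
    This scoring rule is an assumption, not the project's actual rubric.
    """
    if not case.required_features:
        raise ValueError("A test case needs at least one required feature.")
    met = [f for f in case.required_features if f in features_present]
    return len(met) / len(case.required_features)


# Hypothetical entry based on the Harvard Library example.
library_case = AltTextCase(
    image="students-on-lawn.jpg",
    context="Official Harvard Library website",
    required_features=["names the Harvard Library", "mentions the students"],
)

# Suppose the expert finds the output mentions the students but not the library.
library_score = score(library_case, {"mentions the students"})
```

Here library_score comes out to 0.5, because one of the two required features was met.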

The second is a qualitative test with actual users. Because this is a human/AI partnership, we need to make sure that people without a deep background in accessibility practices are able to judge the GPT's output and feel confident in it.

After running a test on an image, the user is asked the following questions:

Accuracy: Do you feel like the image has been accurately described?
Quality: Do you think the GPT has provided quality alt text?
Satisfaction: Given the context provided for you, how satisfied are you with the results?

We can then compare the users' qualitative responses against the experts' quantitative evaluation to see whether the output of the human/AI partnership meets our standards.
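One way this comparison could be sketched in Python is shown below. It assumes each of the three user questions is answered on a 1 to 5 scale and that the expert score is a fraction between 0 and 1; the thresholds, field names, and sample values are invented for illustration and are not results from the project.

```python
from statistics import mean


def flag_mismatches(results, expert_pass=0.8, user_pass=4.0):
    """Return (image, reason) pairs where users and the expert disagree.

    Each result is a dict with user ratings for accuracy, quality, and
    satisfaction (assumed 1-5) plus an expert_score (assumed 0-1).
    """
    flagged = []
    for r in results:
        user_avg = mean([r["accuracy"], r["quality"], r["satisfaction"]])
        expert_ok = r["expert_score"] >= expert_pass
        user_ok = user_avg >= user_pass
        if user_ok and not expert_ok:
            # Users trusted alt text that the expert scored as lacking.
            flagged.append((r["image"], "users more confident than expert"))
        elif expert_ok and not user_ok:
            # Good alt text that users did not recognize as good.
            flagged.append((r["image"], "users less confident than expert"))
    return flagged


# Made-up sample values, for illustration only.
sample_results = [
    {"image": "a.jpg", "accuracy": 5, "quality": 4, "satisfaction": 5, "expert_score": 0.5},
    {"image": "b.jpg", "accuracy": 4, "quality": 4, "satisfaction": 5, "expert_score": 1.0},
    {"image": "c.jpg", "accuracy": 2, "quality": 3, "satisfaction": 2, "expert_score": 0.9},
]

mismatches = flag_mismatches(sample_results)
```

The first kind of mismatch matters most for a tool like this, since it means a user would have published alt text that an expert considers inadequate.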

Progress and the Future

My hope is to release Alt Text for All on the ChatGPT platform sometime in the early fall of 2025.

After launch, I think the next thing to look at is creating a dedicated mobile and web app that doesn't live on ChatGPT, so that we can design a more intentional interface for users to store their images and alt text, and a way for users to work alt text generation into their content creation workflows.

Portfolio

About
