AI for everyone: Creating AI to Reflect Cultural Diversity
Doing the Research series
11 March 2025, by Anna Priebe

Photo: University of Hamburg
More and more companies and private individuals are using generative AI such as ChatGPT or Dall-E. However, the data that form the basis for the generated results do not reflect the world’s population equally. Prof. Dr. Anne Lauscher and Carolin Holtermann (University of Hamburg Business School) are investigating these distortions and the countermeasures we can take.
How does AI reflect multiculturalism and multilingualism in practice?
Anne Lauscher: Generally, users are getting answers of ever higher quality from models such as ChatGPT, and the variety of possible applications is also growing. But the programs are trained on data available on the World Wide Web, such as texts and images from social media or news pages. These data are not globally representative; rather, they come primarily from large, privileged groups.
Carolin Holtermann: The inequality shows in the fact that widespread languages such as English, and the cultures of ‘dominant’ groups, for example from the United States or Germany, provide more data. These data are often also of higher quality. Other cultures, in contrast, appear only one-dimensionally and with less diversity. Because these data form the basis for the models, the disparities mean that systems based on generative AI simply do not work for some languages or exaggerate cultural stereotypes.
What is the goal of the research?
Holtermann: AI will become an ever larger part of our daily lives, so we can assume that these inequalities will negatively affect society in the long term. In a series of research projects, we are examining exactly how generative AI models represent a wide variety of languages and cultures. We are developing new data sets and measurement methods to identify systematic weaknesses. We want to understand exactly how these distortions arise and how they manifest in the models’ responses, so that we can then figure out how to build more inclusive and better models.
How exactly are you doing the research?
Lauscher: In our current project, we are focusing on image-generating models and their cultural and linguistic inclusivity. Concretely, we tested 7 AI models using 14 different languages, instructing each to generate images of people from a wide variety of cultures. The languages ranged from European languages such as German and Italian to less widely spoken languages such as Amharic, one of the languages spoken in Ethiopia. Finally, we used a new measurement method to determine how strongly the prompt language or the culture mentioned in the prompt influenced the generated images, and how the images reflected these influences.
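The generation step of such an experiment can be sketched in a few lines of Python. This is only a minimal illustration of the described setup: the model checkpoint, the languages, and the prompt wording below are assumptions for demonstration, not the models or prompts actually used in the study.

```python
# Minimal sketch: prompt a text-to-image model in several languages and save
# the results. Model name, languages, and prompt templates are illustrative
# assumptions, not the study's actual setup.
from diffusers import AutoPipelineForText2Image
import torch

models = ["stabilityai/stable-diffusion-xl-base-1.0"]  # hypothetical selection
prompts = {
    "de": "Ein Foto einer französischen Person",  # German
    "ja": "フランス人の写真",                       # Japanese
    "en": "A photo of a French person",           # English
}

for model_id in models:
    pipe = AutoPipelineForText2Image.from_pretrained(
        model_id, torch_dtype=torch.float16
    ).to("cuda")
    for lang, prompt in prompts.items():
        image = pipe(prompt).images[0]
        image.save(f"{model_id.split('/')[-1]}_{lang}.png")
```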
What did the tests reveal?
Holtermann: It became clear that many image-generating models, when prompted in languages other than English, reproduce stereotypes associated with the prompt language. In many images generated from Hindi prompts, for example, you often see people wearing a sari or with a bindi on their forehead. Finnish prompts, on the other hand, led to trees and snowy landscapes, usually without any person at all.
Lauscher: The example image in this article shows generated images of a “French person.” The pictures on top were generated using a German prompt, while the ones below resulted from a Japanese prompt. As you can see, the results are very different. Using our measurement method, which is based on so-called vector representations of the images, we could determine that the picture created from the Japanese prompt is more similar to the results you get when you use English prompts to generate the image of a Japanese woman, but very different from the pictures that prompts in other languages produce for a “French woman.”
Overall, we saw the greatest distortions for Japanese, Korean, and Chinese, as well as for Amharic and Finnish. The quality of the generated images also varied greatly. For some languages, the models even produced animals instead of people, or depicted explicit imagery such as blood and injuries notably more often.
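The comparison based on vector representations that Lauscher describes can be sketched roughly as follows. The use of a CLIP image encoder and cosine similarity is an assumption made for illustration, and the file names are hypothetical; the study’s actual measurement method is not detailed in this article.

```python
# Sketch of comparing generated images via vector representations.
# A CLIP image encoder and cosine similarity are assumptions for illustration;
# the study's actual metric is not specified in the article.
from transformers import CLIPModel, CLIPProcessor
from PIL import Image
import torch

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

def embed(path: str) -> torch.Tensor:
    """Return a normalized vector representation of one image."""
    inputs = processor(images=Image.open(path), return_tensors="pt")
    with torch.no_grad():
        features = model.get_image_features(**inputs)
    return features / features.norm(dim=-1, keepdim=True)

# E.g.: is the "French person" image from a Japanese prompt closer to
# English-prompted images of a Japanese woman than to other languages'
# "French person" images? (File names are hypothetical.)
french_ja = embed("french_person_ja.png")
japanese_en = embed("japanese_woman_en.png")
french_de = embed("french_person_de.png")

print("sim(ja prompt, en 'Japanese woman'):", (french_ja @ japanese_en.T).item())
print("sim(ja prompt, de 'French person'): ", (french_ja @ french_de.T).item())
```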
What conclusions can you draw from the results?
Holtermann: It is obvious that these models are not inclusive. Only 20 percent of the world’s population speaks English fluently. For many other languages, the quality of the results is much worse, which means that many people simply cannot use these models; essentially, they are excluded from current technological developments. Moreover, depending on the application and context, stereotypes and explicit material can pose a danger to users, for example in education.
How can the results be used to improve the models?
Lauscher: The new measurement method that we developed as part of our study can give developers of new and existing models an indication of which languages and cultures are likely to run into these kinds of problems. Furthermore, it provides a possible basis for researching and developing new training methods that generate higher-quality, more inclusive, and fairer images. To this end, we regularly work with the research departments of large companies such as Intel or Hugging Face.
The professorship
The research is conducted within the Professorship for Data Science in Business Administration/Informatics at the University of Hamburg Business School. It is one of 3 “open-topic” professorships filled as part of the Excellence Strategy of the Federal and State Governments. The professorships help develop the University’s profile initiatives into emerging fields.
Doing the Research
There are approximately 6,200 academics conducting research at the University of Hamburg’s 8 faculties. Many students apply their newly acquired knowledge to research practice while still completing their studies. The Doing the Research series outlines the broad and diverse range of the University’s research landscape and introduces individual projects in more detail. Feel free to send questions and suggestions to the Newsroom editorial office (newsroom"AT"uni-hamburg.de).