Inbreeding refers to the genomic corruption that occurs when members of a population reproduce with other members who are too genetically similar. This often leads to offspring with significant health problems and other deformities because it amplifies the expression of recessive genes. When inbreeding is widespread, as it can be in modern livestock production, the entire gene pool can be degraded over time, amplifying deformities as the population gets less and less diverse.
In the world of generative AI, a similar problem exists, potentially threatening the long-term effectiveness of AI systems and the diversity of human culture. From an evolutionary perspective, first-generation large language models (LLMs) and other gen AI systems were trained on a relatively clean “gene pool” of human artifacts, using massive quantities of textual, visual and audio content to represent the essence of our cultural sensibilities.
But as the internet gets flooded with AI-generated artifacts, there is a significant risk that new AI systems will train on datasets that include large quantities of AI-created content. This content is not direct human culture, but emulated human culture with varying levels of distortion, thereby corrupting the “gene pool” through inbreeding. And as gen AI systems increase in use, this problem will only accelerate. After all, newer AI systems that are trained on copies of human culture will fill the world with increasingly distorted artifacts, causing the next generation of AI systems to train on copies of copies of human culture, and so on.
Degrading gen AI systems, distorting human culture
I refer to this emerging problem as “Generative Inbreeding,” and I worry about two troubling consequences. First, there is the potential degradation of gen AI systems, as inbreeding reduces their ability to accurately represent human language, culture and artifacts. Second, there is the distortion of human culture by inbred AI systems that increasingly introduce “deformities” into our cultural gene pool that don’t actually represent our collective sensibilities.
On the first issue, recent studies suggest that generative inbreeding could break AI systems, causing them to produce worse and worse artifacts over time, like making a photocopy of a photocopy of a photocopy. This is sometimes referred to as “model collapse” due to “data poisoning,” and recent research suggests that foundation models are far more susceptible to this recursive danger than previously believed. Another recent study found that as AI-generated data increases in a training set, generative models become increasingly “doomed” to have their quality progressively decrease.
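The photocopy analogy can be illustrated with a toy simulation. The sketch below is not the method used in the studies mentioned above. It is a deliberately simplified stand-in that assumes the "model" is just a Gaussian fitted to a small sample, and that each new generation is trained only on samples drawn from the previous generation's fit. The function name `simulate_recursive_training` and its parameters are hypothetical, chosen for illustration.

```python
import random
import statistics


def simulate_recursive_training(generations=200, sample_size=20, seed=0):
    """Repeatedly fit a Gaussian to samples drawn from the previous fit.

    Returns the fitted standard deviation at each generation,
    starting with the original "human" distribution (std = 1.0).
    """
    rng = random.Random(seed)
    mean, std = 0.0, 1.0  # generation 0: the original data distribution
    history = [std]
    for _ in range(generations):
        # "Generate content" from the current model.
        samples = [rng.gauss(mean, std) for _ in range(sample_size)]
        # "Train" the next model on that generated content only.
        mean = statistics.fmean(samples)
        std = statistics.pstdev(samples)  # maximum-likelihood estimate
        history.append(std)
    return history


if __name__ == "__main__":
    history = simulate_recursive_training()
    for g in (0, 50, 100, 200):
        print(f"generation {g:3d}: std = {history[g]:.6f}")
```

In this toy setting the fitted spread tends to shrink from one generation to the next, because each small sample slightly underrepresents the tails of the distribution it came from and the loss compounds. Real generative models are far more complex, so this should be read only as intuition for why copies of copies lose diversity.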
On the second issue, the distortion of human culture, generative inbreeding could introduce progressively larger “deformities” into our collective artifacts until our culture is influenced more by AI systems than human creators. And a recent U.S. federal court ruling that AI-generated content cannot be copyrighted paves the way for AI artifacts to be more widely used, copied and shared than human content that carries legal restrictions.
This could mean that human artists, writers, composers, photographers and videographers, by virtue of their work being copyrighted, could soon have less impact on the direction of our collective culture than AI-generated content.
Distinguishing AI content from human content
One potential solution to inbreeding is the use of AI systems designed to distinguish generative content from human content. Many researchers thought this would be an easy solution, but it’s turning out to be far more difficult than it seemed.
For example, in early 2023, OpenAI announced an “AI classifier” that was designed to distinguish AI-generated text from human text. This promised to help identify fake documents or, in educational settings, flag cheating students. The same technology could be used to filter out AI-generated content from training datasets, preventing inbreeding.
By July of 2023, however, OpenAI announced that its AI classifier was no longer available due to its low rate of accuracy, stating that it was currently “impossible to reliably detect all AI-written text.”
Watermarking generative artifacts
Another potential solution is for AI companies to embed “watermarking” data into all generative artifacts they produce. This would be valuable for many purposes, from aiding in the identification of fake documents and misinformation to preventing cheating by students.
Unfortunately, watermarking is likely to be moderately effective at best, especially in text-based documents, which can be easily edited in ways that defeat the watermark while leaving the inbreeding problem intact. Still, the White House is pushing for watermarking solutions, announcing in July 2023 that seven of the largest AI companies producing foundation models have agreed to “developing robust technical mechanisms to ensure that users know when content is AI generated, such as watermarking.”
It remains to be seen whether companies can technically achieve this objective and whether they will deploy solutions in ways that help reduce inbreeding.
We need to look forward, not back
Even if we solve the inbreeding problem, I fear widespread reliance on AI could be stifling to human culture. That’s because gen AI systems are explicitly trained to emulate the style and content of the past, introducing a strong backward-looking bias.
I know there are those who argue that human artists are also influenced by prior works, but human creators bring their own sensibilities and experiences to the process, thoughtfully creating new cultural directions. Current AI systems bring no personal inspiration to anything they produce.
And with the distorting effects of generative inbreeding added in, we could face a future where our culture is stifled by an invisible force pulling towards the past, along with “genetic deformities” that don’t faithfully represent the creative thoughts, feelings and insights of humanity.
Unless we address these issues with both technical and policy protections, we could soon find ourselves in a world where our culture is influenced more by generative AI systems than actual human creators.
Louis Rosenberg is a well-known technologist in the fields of VR, AR and AI. He founded Immersion Corporation, Microscribe 3D, Outland Research and Unanimous AI. He earned his PhD from Stanford, was a tenured professor at California State University and has been awarded more than 300 patents.