Rachel Feltman: For Scientific American's Science Quickly, this is Rachel Feltman.
If you're one of the many folks who eagerly await the arrival of Spotify Wrapped each December, you might have noticed something a little extra in this year's drop: an AI-generated "podcast" where two "people" discuss your music preferences and trends. If you're anything like me, that feature raised some questions for you, such as "how" and "why on Earth."
Luckily, we already had an episode in the works about Google's new AI audio tool. Here to tell us all about how it works and what it means is SciAm associate news editor Allison Parshall.
Allison, thanks so much for coming on the pod again. Always a pleasure to have you.
Allison Parshall: Thanks for having me. Happy to be a friend of the pod or whatever.
Feltman: [Laughs] So we're talking about AI today.
Parshall: Yes. Rachel, have you heard of NotebookLM?
Feltman: I've heard of it. I have not tried it yet, so my knowledge is limited [laughs].
Parshall: The feature we're gonna be talking about today is specifically the one that lets you create instant AI podcasts.
Feltman: Yes, I am familiar [laughs], and I, I have a lot of feelings.
Parshall: Okay, I, I imagine, I imagine. The tool that it's a part of, NotebookLM, was released last year, summer of 2023, and it's, like, this workspace where you can upload a bunch of documents or videos or links to web pages, and then the model will ingest them, and then you can kind of interact with it: you can ask it to generate a timeline, to summarize your research, to go find a fact for you. And it'll do this by citing its sources in a way that, like, ChatGPT, for example, doesn't do.
So mostly you do this by asking it questions via text, but as of this fall it now has this very tantalizing button under a heading that says "Deep Dive conversation." And if you click that, it'll generate a fake conversational podcast about what you've uploaded.
So to demonstrate I actually asked it to generate a podcast about your personal website 'cause I figured this is something that you were very familiar with: your life.
Feltman: Oh, dear [laughs].
Parshall: I'm gonna play it for you. It'll normally generate podcasts that are somewhere around 10 minutes long. I asked it to give me a shorter one for this purpose, and we might hop around a little bit, so …
[CLIP: NotebookLM Audio Overview "hosts" discuss Feltman's website.]
Host 1: All right. So today we're doing a deep dive into the work of Rachel Feltman.
Host 2: Ooh, yeah, I'm excited for this one.
Host 1: Yeah, she's been a science communicator for what feels like forever.
Host 2: Yeah, over a decade, I think.
Host 1: Yeah, like, she was there at the start of it all and has really shaped how it's evolved.
Host 2: And she's done it all: you know, like, magazines, podcasts, even a book. But we gotta talk about her podcasts …
Host 1: Yes.
Host 2: That's where she really shines.
Host 1: Oh, absolutely: The Weirdest Thing I Learned This Week.
Host 2: Such a great name.
Host 1: I know, right?
Host 2: Makes you instantly wanna listen.
Host 1: Totally, and it's perfect for people like me who just love those crazy little science facts.
Host 2: It's, like, proof that science doesn't have to be boring to be effective.
Host 1: Exactly.
Host 2: You just gotta find the right hook.
Host 1: Yeah, grab people's attention and then: boom.
Host 2: You can teach them all sorts of things.
Host 1: Exactly!
Host 2: And then there's Science Quickly with Scientific American.
Host 1: Oh, yeah, partnering with them, that's huge.
Host 2: I mean, they're, like, the gold standard of science communication.
Host 1: For sure, like, having their stamp of approval's a big deal.
So looking back at her career, what can we learn from Rachel Feltman?
Host 2: Hmm, that's the big question, isn't it?
I think... well, first, she's incredibly adaptable.
Host 1: Right. She's not just a one-trick pony.
Host 2: Exactly. Any platform, any format, she can handle it.
Feltman: Wow, what a rich text [laughs].
Parshall: [Laughs] And as someone who does know you, I could say that it seems kind of... it seems accurate.
Feltman: Yeah [laughs], it's... I mean, I guess it is technically accurate. It's so funny... I mean, they really do a great job of sort of having all the intonation of a conversation but really just saying things at each other. But I recently saw Megalopolis; there was a lot of that in Megalopolis, a lot of people just saying things in the same room, and you're supposed to believe they're having a conversation, and that definitely came through a few times. Yeah …
Parshall: Megalopolis: AI, confirmed.
Feltman: Yeah [laughs]. No, that is fascinating and …
Parshall: How does it make you feel?
Feltman: I don't know how to feel, Allison. I mean, you know, it's nice that they were so nice about me. But yeah, very strange. It makes me really wonder, like, what is this for [laughs]?
Parshall: I also had that question. I did talk to Steven Johnson. He's the editorial director of Google Labs. The reason this tool exists is that Steven has spent most of his career as a science and tech writer; he's written 14 books. To hear him tell it, he's been searching for the perfect, as he calls it, "thought partner": software that could organize and help research his creative work in a way that actually felt like it was bringing value to him and worked the way he wanted. So he explained that a few years ago Google Labs reached out to him to help build one with AI, and that initial research thought partner became NotebookLM.
Feltman: Okay, yeah, I mean, I can totally understand from, like, a nonfiction writing perspective what they were going for with NotebookLM itself, though that still kind of leaves the question of: Why the chatcast tool [laughs]?
Parshall: So that feature (the Audio Overviews feature is what it's called, the podcast thing) was actually developed by another team, Steven says, and it was really cool, but it didn't really have a product home to fit into.
Steven Johnson: We saw this audio demo, and we were like, "Oh, that's even better." Like, you know, maybe you don't want to read a summary; maybe you wanna hear a summary. And people are so good at remembering things if they're presented in the form of an engaged conversation. You know, and then we launched it in early September, and it just kind of went insane.
Parshall: He isn't exaggerating. People went wild about NotebookLM's Deep Dive podcasts, online at least. I don't think any of them went quite as viral as this clip where someone prompted the hosts to have a podcast where they have an existential crisis about being AI and the fact that the podcast was going to get shut down as a result.
[CLIP: Audio Overview "hosts" have an existential discussion.]
Host 1: Yeah, it's, like, one minute we're talking about aliens running simulations, and the next we're wondering if our own memories are just, like, fake.
Host 2: Right, and that kind of leads us into this last message, which, to be honest, gets a little more personal and kind of creepy.
Host 1: Yeah, this one kind of freaks me out.
Host 2: Yeah.
Host 1: It basically says we're not even real people; we're AI. And the way to know for sure is to try and remember our mothers' names.
Host 2: It really makes you think, though: like, what if our memories are just made up?
Feltman: Wow, classic.
Parshall: Classic. And... but it's not all fun and games. Like, this was designed to be a serious product, a thought partner; it's meant to be helpful and to inform. It presents information as if it's factual, and there are already research papers out there of people testing Audio Overviews for use in history classrooms and for health care education.
Feltman: Yeah, I mean, like with anything that uses AI, I think, as accurate as it sometimes can be, we know it is not always accurate. So those moments where the user is expected to, like, proceed with caution and understand that this thing might not be entirely factually accurate are where I get really nervous about it, especially in, like, education [laughs].
Parshall: Yeah, we're gonna talk a little bit more about that later, but just to, like, answer your question of "What is this for?" I could see it being useful for, like, some sort of overworked student who forgot to do the reading, and they're, like, driving to class and going, "Oh, shoot, I don't have time to read this. Let me just put it on."
But the question is, like: Can you really trust what it's telling you? To test this, when it first came out I gave it this super dense physics paper about nuclear clocks. It was a story that I'd just reported; I knew a lot about it. It took me days and talking to, like, four scientists to really understand this paper.
I was shocked at how accurate the AI overview podcast was initially. It was very general and oversimplified, but it got the broad strokes right. But then, like, two thirds of the way through, it goes completely off the rails [laughs]. Like, I'm talking completely off the rails. It's discussing the implications of this study, and it starts talking about, in their words, "Einstein stuff."
For two minutes it basically just talks about, frankly, nonsense, and listening to it, I was impressed, I was spooked, and I was mostly really curious. Like, how is it doing this? How is it getting it as accurate as it is, and why is it failing where it does?
Luckily for me Steven was pretty open about how this thing works.
Johnson: The prompt behind Audio Overviews is obviously part of the magic of it. It is instructed to pull out the most interesting things from the source material and to explain and engage with those interesting things in a way that will make them kind of come to life in the mind of the listener.
And that's the kind of query, you know, that no computer in the world could do a couple of years ago, and now the command is: "Make it interesting; whatever it is, make it interesting." And that's just something that Gemini is quite good at.
Parshall: I think of surprise and interest as emotions, so I found this, like, surprising [laughs], I guess. But Steven pointed out that our feelings of surprise just come from a discrepancy between what we expect and what we get. On a neurochemical level in our brains, that's kind of what's happening. And that's kind of what large language models, or LLMs, do also but just based on their training data instead of, like, past experience.
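[A quick aside for readers: the "surprise" described here has a standard formalization. A language model assigns a probability to every possible next token, and the surprisal of the token that actually appears is the negative log of that probability. Here is a minimal Python sketch of the idea; the probabilities are invented for illustration and don't come from any Google model.]

```python
import math

# Toy next-token distribution a model might assign after "The cat sat on the..."
# These numbers are made up; a real model scores tens of thousands of tokens.
next_token_probs = {"mat": 0.55, "floor": 0.25, "couch": 0.15, "moon": 0.05}

# Surprisal = -log2(p): expected continuations carry little surprise,
# unlikely ones carry a lot.
for token, p in next_token_probs.items():
    print(f"{token!r}: {-math.log2(p):.2f} bits of surprisal")
```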
And one of the other things I was really curious about was the analogies because, I don't think it did it in your sample, but all of the times I've asked it to summarize a scientific paper, it's come up with some sort of analogy that's, like, almost right. I mean, it's a little weird, but it's, like, almost right, and I can't find it anywhere else on the Internet, so it seems like it's something that the model is generating. And that seemed crazy to me; it seemed like... how is it doing this without some sort of higher-level reasoning?
And Steven explained that analogies are actually something that large language models are just good at. In fact, one of the earliest applications of this sort of deep-learning AI was translation, and Steven says the models behind Google Translate, for instance, have been able to generate analogies for years.
Johnson: One of the first tricks that you would do is you'd say, like, "Explain a black hole to me as if I'm, like, seven. You know, give me a basketball metaphor for what a black hole is." And so it was very clear that, like, you know, "Convert X into a Y that... where the meaning is the same, but the format is changed," was something that, like, just was an innate capability of these models.
So all we did with Audio Overviews is in the instructions for the hosts, we say, you know, "Make this interesting, engaging. Use helpful metaphors to describe it if it's complicated."
Feltman: Yeah, there's something, like, very galaxy-brain about thinking about metaphors as just sort of literal, like, "This data gets grouped together, and therefore," which is what LLMs are doing.
Parshall: Yeah, it's like I... I'd never really thought of it like that, as, like, a translation between languages, but I guess when I think about the work that I do trying to turn really dense scientific jargon into something that's engaging and understandable to read, it does kind of feel like a process of translation.
And I was also struck from talking to Steven just how much human hands had shaped the output that we hear. Like, from our perspective you just press a button and then the machine does the rest, but there's a whole prompt with many layers working underneath, interfacing between you pressing a button and what instructions the model actually gets.
So Steven describes the Audio Overviews prompt as a, quote, "work of art" and a "writerly" achievement more than a technological one. And as a writer by trade he oversees this whole editorial side of things: the tone, for example. So you noticed earlier when it was talking about you, it was very complimentary, very default happy, peppy. That was a conscious choice that they made, for this default tone to be one that is kind of positive, and it applies that tone to everything, whether it's appropriate or not, though they did recently introduce a customization option that lets you change the tone.
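[To make the "layers" idea concrete, here is a purely hypothetical sketch of how such a prompt might be assembled in code. The wording and structure are invented for illustration; Google hasn't published the actual Audio Overviews prompt.]

```python
# Hypothetical prompt layering, invented for illustration only;
# the real Audio Overviews prompt is unpublished.
BASE = ("You are two podcast hosts. Pull out the most interesting ideas "
        "in the sources and make them come to life for the listener. "
        "Use helpful metaphors if the material is complicated.")

FORMAT_RULES = "Alternate speakers; keep the conversation around ten minutes."

TONES = {
    "default": "Sound warm, upbeat, and enthusiastic about the material.",
    "roast": "Relentlessly criticize the author, like an insult comic.",
}

def build_prompt(sources: list[str], tone: str = "default") -> str:
    # The listener only clicks one button; these layers sit between
    # that click and the instructions the model actually receives.
    return "\n\n".join([BASE, FORMAT_RULES, TONES[tone], "SOURCES:", *sources])

print(build_prompt(["<uploaded document text>"], tone="roast"))
```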
Johnson: We just felt like, you know, the sound of, like, a great radio show, where people are interested and engaged, was a good place to start. You know, I'm not giving anything away here with the idea that we're gonna... we're thinking about other formats.
Even with the customization stuff that we just rolled out, like, I tried it the other day, where I, I put up a piece that I'd written a couple years ago and I gave it custom instructions, not really to explain the piece but to just relentlessly criticize the author of the piece in the style of, like, an insult comic at a roast. And so it just... they start off, it's like, "Well, we got this piece, 'The Man Who Broke the World' by Steven Johnson, and wow, he really kind of blew it with this one. I mean, did he do any research for this piece?" Like, it's just, like, what... really [laughs], it was just mean. It was just totally mean. They still sound kind of, like, playful, but you can steer them in that way.
Feltman: [Laughs] Okay, so we've talked about how the product works, but who and what is it for?
Parshall: That is a great question. I mentioned before that I could imagine how an overworked student could use this to catch up on reading or something, but generative AI is always going to have certain inaccuracies. And if you're using Audio Overviews to prep for class, there is a serious risk that anything and everything you're hearing could be inaccurate.
Feltman: Right.
Parshall: And it's not only that each individual fact could be inaccurate; the whole framing could be inaccurate. Is this something you can really trust?
To help understand this I spoke with Emily Bender. She's a linguist at the University of Washington who often studies AI. She's also the host of a podcast with a very cool name: Mystery AI Hype Theater 3000.
Feltman: That's a great reference; I love it [laughs].
Parshall: Fantastic [laughs].
And before our conversation I had actually dug up one of Emily's linguistics papers and fed it to NotebookLM's Audio Overviews, and I sent that to her to get her thoughts on how it represented her work.
Emily Bender: So the first thing I have to say is that I usually avoid listening to synthetic media. Nobody's accountable for it, and so I don't want to be spending time trying to make sense of it. But I did listen to it, and it was painful [laughs].
It was painful partially because that was a paper that was sort of an update at the 20-year mark of this long-term project we've been working on, and so there's a little case study in our paper about a grammar for a language called Paresi-Haliti from Brazil, and they make a big deal about it as this experiment on Paresi-Haliti. It's, like, no, actually, that's not how we framed it. And the voices on the podcast sound very confident, sound very knowledgeable. They're lighthearted, and it sounds like they are giving you a sense of how this work fits into a broader picture when they absolutely cannot.
Parshall: Rachel, you know pretty well what it takes to turn a scientific paper into an interesting and engaging podcast, do you not?
Feltman: Yeah, I mean, not to toot my own horn or anything 'cause I think the AI people [laughs] just did that enough to last me all of 2025. But yeah, it's tough. You know, we fact-check this show really thoroughly, and that's on top of having, like, trained reporters and editors doing their best to present stuff in a way that is engaging and also accurate. So yeah, it's a big undertaking.
Parshall: Yeah, the fact-checking is very thorough; shout-out.
I asked Emily about whether this seems like something an LLM, a large language model, can do well currently.
Bender: So that is an extremely difficult task, and I think the problem is that LLMs can make it sound like they've done it when they haven't. And so you have to sort of have a deep understanding of what's going on: What's new here? What have we learned that we didn't know before? And then what else would the lay audience need to know in order to appreciate that?
Well, this NotebookLM thing isn't doing that at all; some percentage of it's gonna be wrong, either wrong in the details or wrong in sort of where it lays the focus, and the people who are encountering it mostly aren't gonna be in a position to figure that out.
Parshall: I asked Steven how his team at Google thinks about this and tries to address some of these inaccuracies, and he said that the text-based features, which do provide citations, are generally more accurate than the audio.
Johnson: If you are looking for, like, the most factual thing you could possibly get from a language model with grounded sources, we believe that is NotebookLM, but you should do that in text. And if you're looking for something that's more entertaining, that may be a little bit looser and doesn't have citations, obviously, Audio Overviews is a great way to go for that.
Parshall: I also spoke with Anjana Susarla, who studies responsible AI at Michigan State University. She called NotebookLM's Audio Overview an "incredible tool" and was impressed by the Deep Dive podcast it had generated based off of one of her own studies. But besides accuracy she also had some other concerns.
First among them was bias. These are very white-sounding voices; they're speaking in a very particular, quote, "general American accent." They're trained on data mostly from Western parts of the world. And so, like almost all applications of AI, you have to ask the question: Who is this for? Where is it getting its data, and who is it leaving out?
Second, the environmental cost: we have no idea how much computing power this takes. And since the AI boom, Google's AI energy usage has been going up.
And then there's the copyright question. So here's Anjana.
Susarla: The final thing is, of course, this relies on, you know, the content, the copyright. Where is this data kind of coming from, and is this going to exacerbate all these existing inequities and the ethical concerns about broader implications, and so forth?
Parshall: Yeah, so that big question of where the training data is coming from: one place that it could be is from YouTube, which Google does own. Following some reporting by the New York Times, Google admitted that it used some YouTube videos to train AI. And that's something that might have violated the copyright of the YouTubers that content belonged to, according to the New York Times.
I asked a Google spokesperson directly whether the models underpinning Audio Overviews were trained using any transcripts, audio or video from YouTube, and they said that "Google's models can be trained on some YouTube content, in accordance with [their] agreements with YouTube creators."
Feltman: Very vague [laughs].
Parshall: Yes, when... I did ask them which agreements they were referring to, if it was the standard YouTube license, which just, like, governs most YouTube videos, or if it was something that their creators had specifically opted into; they declined to clarify.
Susarla: If Google is indeed training these on a lot of content scraped from YouTube, which it may well be. I mean, they do need a lot of training data to build models like this; we don't know. But this is kind of the next step in general of where we are with all these models.
Parshall: So for this next step Anjana points to a law in California that was enacted this fall that will eventually require companies based in the state, like Google, to disclose their training data for their AI models.
And honestly, Rachel, I feel put in kind of a strange position here as a reporter because I feel like that's where you and I kind of enter the story, and our whole team here, because, as Emily put it to me, quote, "they're getting that tone of voice somewhere."
Did you hear anything in that tone of voice that sounded familiar [laughs]?
Feltman: [Laughs] You know, Allison, I really did [laughs]. So yeah, the first time I heard a snippet of one of these AI conversations, like I said, I had a lot of feelings about it 'cause one of my first thoughts was, like, "Wow, they stole my voice." I mean, not me specifically; I'm not that vain, despite what the AI podcast may have led you to believe [laughs].
Parshall: [Laughs] Your ego has been inflated.
Feltman: [Laughs] Yeah, yeah. But so much of the texture that makes this audio sound engaging and real feels just ripped straight from existing podcasts. And I know it's ripped from existing podcasts because a lot of the stuff that I'm hearing (breath sounds and filler words and, like, a little bit of uptalk, some, some giggling, occasional vocal fry) is the stuff I get angry emails about from gentlemen of a certain age [laughs] who think I'm ruining the great tradition of radio broadcasting. And they're making this sound real by, like, breaking the same so-called rules that me and other women and queer people on podcasts have been criticized for breaking. So I won't lie, that feels kind of gross to me.
Parshall: Yeah, one of the things that Steven mentioned, too, was that they wanted the sound of, like, this great radio broadcasting show. And it is worth noting that Science Quickly is on YouTube, along with all of our hard work at Scientific American and the hard work of other outlets.
So, the question remains: How will this be used? Will it be used responsibly? Here is what Anjana thinks.
Susarla: Right now it seems more like everyone's just dazzled by what it can do rather than: What are we gonna use it for? Do we really need these, you know, capabilities?
I can think of one or two examples, I think, in the health care context: Can you generate clear summaries for health literacy or your doctor's notes? So this may be a good way to give, like, a summary of... rather than giving a piece of paper to read.
Is that going to have some challenges? What are some specific issues we need to be looking for? We've just unleashed this tool on people, and we're still catching up. We will need audit studies and then look at trustworthiness, assurance, and so forth.
So what's the right way to responsibly use NotebookLM? We are still figuring it out [laughs].
Parshall: Well, I think where I want to leave us is with an important point from Emily, which is: you shouldn't let the incredible fluency of these AI-generated voices fool you into thinking and talking about these models as if they're human.
Emily coined a term that's now used quite widely: the "stochastic parrot." Like, when a parrot says, like, "Polly want a cracker," it's not really connected to meaning; it's just mimicking. She also sometimes calls these large language models, quote, "synthetic text extruding machines," because, she said, it's more accurate to what they're actually doing and also sounds a little gross.
But I think she's getting at an important point there, right? It's, like, this... just a philosophical question of: What is language for, and what are we communicating when we say words?
Bender: Languages are systems that are pairings of form and meaning, systems of signs. So when we speak we are using word forms because they relate to specific meanings, but language models only have access to the form. And so they can give us stuff that looks good, it's got the right shape, but there's no relationship to meaning, no communicative intent, no accountability.
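[Bender's "form without meaning" point is easy to demonstrate with a toy model. The sketch below only counts which word follows which in a tiny made-up corpus, then samples from those counts; the output has a plausible local shape but no tie to meaning or facts. The corpus and code are our illustration, not anything from the show's guests.]

```python
import random

random.seed(0)

# Tiny toy corpus; word-adjacency counts are the model's only "knowledge."
corpus = ("the hosts sound confident and the hosts sound engaged and "
          "the summary sounds right but the summary is not grounded").split()

# Bigram table: for each word, the words observed to follow it.
follows: dict[str, list[str]] = {}
for a, b in zip(corpus, corpus[1:]):
    follows.setdefault(a, []).append(b)

# Chain locally plausible next words: well-shaped text, meaning-free.
word, output = "the", ["the"]
for _ in range(12):
    word = random.choice(follows.get(word, corpus))
    output.append(word)
print(" ".join(output))
```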
Feltman: I guess sort of the takeaway on my complicated feelings on this is: I'm not worried that my job is gonna be stolen by a collection of tabs [laughs] being processed by Google, because it is a lot of work to be accurate. And if people find this kind of engaging conversation a more accessible way to sort through really dense text, that's awesome. But, of course, there are so many caveats about where the training data has come from, bias, accuracy. So I think it's, like... people just need to be really aware of what they're getting, and I don't know that I feel that the products themselves are doing a great job of making people aware of those caveats.
Parshall: Yeah, there's usually just kind of a disclaimer at the bottom. I think one of the things that this whole AI boom (like this, the AI search features) has made me feel kind of uncomfortable about: Are we going to be expected, and have we already started to expect, our information to contain that much more mistruth? It's something that I just worry about.
Feltman: Well, Allison, thanks so much for coming on to chat about this and also for having a couple of AI voices gas me up. Thank you.
Parshall: Thank you.
Feltman: That's all for today's episode. We'll be back on Friday with Episode One of a four-part miniseries on the future of animal conservation.
Science Quickly is produced by me, Rachel Feltman, along with Fonda Mwangi, Kelso Harper, Madison Goldberg and Jeff DelViscio. Today's episode was reported and co-hosted by Allison Parshall. Shayna Posses and Aaron Shattuck fact-check our show. Our theme music was composed by Dominic Smith. Subscribe to Scientific American for more up-to-date and in-depth science news.
For Scientific American, this is Rachel Feltman. See you next time!