- Published on
From PDF to Podcast: Experimenting with Docling, GPT-4o, and ElevenLabs
Since Google launched NotebookLM, there's been a lot of excitement around generating podcasts using large language models (LLMs). Meta caught attention with the release of NotebookLama, and other open-source tools like Podcastify have also emerged.
Initially, I assumed NotebookLM was a sophisticated model for generating speech (and it still might be) since the hosts sound very natural, with interruptions, tone variation, and other subtleties. But after exploring open-source tools, I realized the process was much simpler than I had imagined.
With that in mind, I decided to create a podcast using Docling (to convert a PDF to Markdown), GPT-4o (to generate a script), and ElevenLabs (for text-to-speech conversion). The results were far better than expected.
Here’s the result for a podcast based on this paper:
My Version
NotebookLM Version
Final Thoughts
It’s easy to look at a major project or tool from a big company and assume it’s too complex to replicate. And sometimes, that’s true—large companies often have more resources, and incredibly skilled developers work on these projects. But other times, it’s more about creativity and the willingness to experiment. Although none of the tools I used were new, the way they were combined was novel when NotebookLM came out.
This experiment was a reminder that innovation often stems from creativity. These AI tools open up so many possibilities for developing new products and applications. In the coming years, we’ll see innovations we can’t even imagine today.
For instance, this podcast tool got me thinking about the future of learning. We might have highly personalized experiences that go beyond podcasts—perhaps entirely new formats, all powered by an ever-available AI assistant helping us learn faster than ever before. I can imagine kids in the future asking, "How did you learn without an AI assistant?" or "You had to Google things manually?"