PDFs (.pdf), markdown (.md, .mmd), or text files (.txt)

OpenAI API Key

Text Generation Model

Select the model to generate the dialogue text.

Reasoning effort (for reasoning models, e.g. o1, o3, o4)

Select reasoning effort used.

Audio Generation Model

Select the model to generate the audio.

Speaker 1 Voice

Select the voice for Speaker 1.

Speaker 2 Voice

Select the voice for Speaker 2.

Speaker 1 instructions

Speaker 1 instructions (used with gpt-4o-mini-tts only)

Speaker 2 instructions

Speaker 2 instructions (used with gpt-4o-mini-tts only)

Custom API Base

If you are using a custom or local model, provide the API base URL here, e.g.: http://localhost:8080/v1 for llama.cpp REST server.

When enabled, the LLM will call the web search tool during its reasoning.

Let the LLM search the web to complement the documents.

Instruction Template

Select the instruction template to use. You can also edit any of the fields for more tailored results.

Intro Instructions

Provide the introductory instructions for generating the dialogue.

Standard Text Analysis Instructions

Provide the instructions for analyzing the raw data and text.

Scratch Pad Instructions

Provide the scratch pad instructions for brainstorming presentation/dialogue content.

Brainstorm creative ways to discuss the main topics and key points you identified in the input text. Consider using analogies, examples, storytelling techniques, or hypothetical scenarios to make the content more relatable and engaging for listeners.

Keep in mind that your podcast should be accessible to a general audience, so avoid using too much jargon or assuming prior knowledge of the topic. If necessary, think of ways to briefly explain any complex concepts in simple terms.

Use your imagination to fill in any gaps in the input text or to come up with thought-provoking questions that could be explored in the podcast. The goal is to create an informative and entertaining dialogue, so feel free to be creative in your approach.

Define all terms used clearly and spend effort to explain the background.

Write your brainstorming ideas and a rough outline for the podcast dialogue here. Be sure to note the key insights and takeaways you want to reiterate at the end.

Make sure to make it fun and exciting.

Prelude Dialog

Provide the prelude instructions before the presentation/dialogue is developed.

Podcast Dialog Instructions

Provide the instructions for generating the presentation or podcast dialogue.

Write a very long, engaging, informative podcast dialogue here, based on the key points and creative ideas you came up with during the brainstorming session. Use a conversational tone and include any necessary context or explanations to make the content accessible to a general audience.

Never use made-up names for the hosts and guests, but make it an engaging and immersive experience for listeners. Do not include any bracketed placeholders like [Host] or [Guest]. Design your output to be read aloud -- it will be directly converted into audio.

Make the dialogue as long and detailed as possible, while still staying on topic and maintaining an engaging flow. Aim to use your full output capacity to create the longest podcast episode you can, while still communicating the key information from the input text in an entertaining way.

At the end of the dialogue, have the host and guest speakers naturally summarize the main insights and takeaways from their discussion. This should flow organically from the conversation, reiterating the key points in a casual, conversational manner. Avoid making it sound like an obvious recap - the goal is to reinforce the central ideas one last time before signing off.

The podcast should have around 20000 words.

Audio

Transcript

Use Edited Transcript (check if you want to make edits to the initially generated transcript)

Provide Feedback or Notes

Download .md file

PDF to Audio Converter

This Gradio app converts PDFs into audio podcasts, lectures, summaries, and more. It uses OpenAI's GPT models for text generation and text-to-speech conversion.

Features

Upload multiple PDF files
Choose from different instruction templates (podcast, lecture, summary, etc.)
Customize text generation and audio models
Select different voices for speakers

How to Use

Upload one or more PDF files
Select the desired instruction template
Customize the instructions if needed
Click "Generate Audio" to create your audio content

Use in Colab

Audio Example

Your browser does not support the audio element.

Note

This app requires an OpenAI API key to function.

Credits

This project was inspired by and based on the code available at https://github.com/knowsuchagency/pdf-to-podcast and https://github.com/knowsuchagency/promptic.

GitHub repo: lamm-mit/PDF2Audio

@article{ghafarollahi2024sciagentsautomatingscientificdiscovery,
    title={SciAgents: Automating scientific discovery through multi-agent intelligent graph reasoning}, 
    author={Alireza Ghafarollahi and Markus J. Buehler},
    year={2024},
    eprint={2409.05556},
    archivePrefix={arXiv},
    primaryClass={cs.AI},
    url={https://arxiv.org/abs/2409.05556}, 
}
@article{buehler2024graphreasoning,
    title={Accelerating Scientific Discovery with Generative Knowledge Extraction, Graph-Based Representation, and Multimodal Intelligent Graph Reasoning},
    author={Markus J. Buehler},
    journal={Machine Learning: Science and Technology},
    year={2024},
    url={http://iopscience.iop.org/article/10.1088/2632-2153/ad7228},
}