Often, we can't spare enough time to listen to a favorite podcast. Pod-Cli, built by technology enthusiasts, transforms lengthy audio podcasts into text and provides you with a concise summary. The best part is that you can enjoy listening to your favorite music while reading the summarized text of your podcast.
- It takes audio from YouTube or Spotify, or an audio file you provide, and uses Natural Language Processing and trained AI models to convert the speech into text files.
- First, the audio is sent to the AssemblyAI API, which converts it into a text file (speech-to-text).
- The raw file containing the freshly generated text is then processed with regex and packages such as nltk. These summarize the text by removing stopwords, repetition, and complexity, using approaches such as: (a) extractive summarization, (b) abstractive summarization, (c) a frequency algorithm, and English syntax analysis.
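The frequency approach mentioned in the last step can be sketched roughly as follows. This is a minimal illustration, not the project's actual summarizer: the function names, the naive sentence splitter, and the small hard-coded stopword set are all assumptions made so the example runs without downloading nltk data. In practice, nltk's stopword list and tokenizers would likely take their place.

```python
import re
from collections import Counter

# Tiny illustrative stopword set (an assumption); a real run would
# more likely use nltk's English stopword list.
STOPWORDS = {
    "a", "an", "the", "is", "are", "was", "were", "and", "or", "of",
    "to", "in", "on", "it", "this", "that", "for", "with", "as", "at",
    "by", "be", "we", "you", "i",
}


def content_words(text):
    """Lowercase the text and return its words with stopwords removed."""
    return [w for w in re.findall(r"[a-z']+", text.lower()) if w not in STOPWORDS]


def split_sentences(text):
    """Naively split on '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def summarize(text, max_sentences=2):
    """Keep the sentences whose words are most frequent in the whole text."""
    sentences = split_sentences(text)
    freq = Counter(content_words(text))

    def score(sentence):
        # A sentence scores the sum of the document-wide counts of its words.
        return sum(freq[w] for w in content_words(sentence))

    ranked = sorted(range(len(sentences)),
                    key=lambda i: score(sentences[i]), reverse=True)
    # Restore the original order so the summary reads naturally.
    keep = sorted(ranked[:max_sentences])
    return " ".join(sentences[i] for i in keep)
```

Because this is extractive, the summary only ever contains sentences copied from the transcript; abstractive summarization, which writes new sentences, would need a trained model instead.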
Initially, we tried an algorithm that split the audio files into smaller chunks and then processed each chunk individually to extract the text. The problem with this approach was that it required installing a number of dependencies, some of which were incompatible with the OS. We then shifted our focus to Whisper, an open-source speech recognition model from OpenAI, but that did not go well either because of accuracy issues. Finally, we found AssemblyAI. Handling mp3 files with AssemblyAI is not at all complex, and the accuracy of the generated text is almost perfect.

Users need to get their own access token from the AssemblyAI website. Following the commands given in the "commands.txt" file generates a transcript.txt file, which is then passed to the "txtsumry.py" file. That script instantly generates a short, crisp, and precise summary of the audio file. Generating the transcript takes about 10% to 15% of the duration of the original audio file.
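For orientation, the upload, submit, and poll cycle against AssemblyAI's REST API can be sketched as below. This is a hedged illustration that uses only the standard library; it is not the contents of "transcribe.py", and the function names and the `fetch` parameter (added so the polling loop can be exercised without network access) are hypothetical. The endpoints and response fields follow AssemblyAI's public v2 API as commonly documented, so check the current documentation before relying on them.

```python
import json
import time
import urllib.request

API_BASE = "https://api.assemblyai.com/v2"  # assumed base URL of the v2 REST API


def _request(method, path, token, data=None, content_type="application/json"):
    """Send one authenticated request and decode the JSON response."""
    req = urllib.request.Request(
        API_BASE + path,
        data=data,
        method=method,
        headers={"authorization": token, "content-type": content_type},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read().decode("utf-8"))


def upload_audio(path, token):
    """Upload a local audio file and return the temporary URL for it."""
    with open(path, "rb") as f:
        body = f.read()
    return _request("POST", "/upload", token, body,
                    "application/octet-stream")["upload_url"]


def start_transcript(audio_url, token):
    """Queue a transcription job and return its id."""
    payload = json.dumps({"audio_url": audio_url}).encode("utf-8")
    return _request("POST", "/transcript", token, payload)["id"]


def wait_for_transcript(transcript_id, token, fetch=None, interval=3.0):
    """Poll the job until it completes, then return the transcript text."""
    if fetch is None:
        fetch = lambda tid: _request("GET", f"/transcript/{tid}", token)
    while True:
        result = fetch(transcript_id)
        if result["status"] == "completed":
            return result["text"]
        if result["status"] == "error":
            raise RuntimeError(result.get("error", "transcription failed"))
        time.sleep(interval)  # job is still queued or processing
```

A typical call chain would be `wait_for_transcript(start_transcript(upload_audio("episode.mp3", token), token), token)`, where `episode.mp3` is a placeholder file name, with the returned text written to transcript.txt.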
- nltk
- AssemblyAI
The following text file is generated when the "transcribe.py" script is executed in the terminal. This is the original text generated from the audio.
The following text file is generated when the "txtsumry.py" script is executed in the terminal. This is the summarized version of the previous text file.