YouTube: Working with Auto-generated Transcripts

Finding the Transcript Feature

YouTube has a hidden transcript feature. Click on the 3 dots to the lower right of the video:

  • above the “Subscribe” button and
  • to the right of the “Save” button

 

Transcript Button

After clicking the 3 dots, click “Open Transcript”. The transcript opens to the right of the video.

 

Download Transcripts

These transcripts can be downloaded with youtube-dl, a command-line program written in Python:

youtube-dl --skip-download --write-auto-sub https://www.youtube.com/watch?v=iKvFlSedpNI
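
The same download can be scripted. The following is a minimal sketch, assuming youtube-dl is installed and available on the PATH; the function names build_command and download_auto_subs are made up for illustration and are not part of youtube-dl.

```python
import subprocess


def build_command(url):
    """Return the youtube-dl argument list that fetches only the auto-generated subtitles."""
    return [
        "youtube-dl",
        "--skip-download",   # do not download the video itself
        "--write-auto-sub",  # write the automatically generated subtitle file
        url,
    ]


def download_auto_subs(url):
    """Run youtube-dl (assumed to be on the PATH).

    Raises subprocess.CalledProcessError if youtube-dl exits with an error.
    """
    subprocess.run(build_command(url), check=True)
```

Calling download_auto_subs("https://www.youtube.com/watch?v=iKvFlSedpNI") should behave like the command above and leave the subtitle file in the current directory.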

 

Clean Up Transcripts

There is a site that provides a nice UI to make it easier to clean up the generated transcript.

 

Parse VTT Formatted Transcripts

The VTT format can be parsed with the webvtt-py Python module:

webvtt-py 0.4.2

webvtt-py is a Python module for reading/writing WebVTT caption files. It also features caption segmentation useful when captioning HLS videos.

Requires Python 3.4+.

Documentation is available at http://webvtt-py.readthedocs.io.

Installation

$ pip install webvtt-py

Usage

import webvtt

for caption in webvtt.read('captions.vtt'):
    print(caption.start)
    print(caption.end)
    print(caption.text)
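
Auto-generated captions often repeat the same line across consecutive cues, so printing every caption can produce a transcript full of duplicates. The sketch below is one way to collapse them; it assumes the repeats are exact and adjacent, and the helper names dedupe_lines and transcript_from_file are hypothetical, not part of webvtt-py.

```python
def dedupe_lines(caption_texts):
    """Flatten caption texts into lines, dropping blanks and consecutive repeats."""
    lines = []
    for text in caption_texts:
        for line in text.splitlines():
            line = line.strip()
            # Keep the line only if it is non-empty and differs from the last kept line.
            if line and (not lines or lines[-1] != line):
                lines.append(line)
    return lines


def transcript_from_file(path):
    """Read a .vtt file with webvtt-py and return the de-duplicated transcript text."""
    import webvtt  # third-party; imported here so dedupe_lines works without it

    return "\n".join(dedupe_lines(caption.text for caption in webvtt.read(path)))
```

For example, print(transcript_from_file('captions.vtt')) would print the transcript as plain text, without timestamps.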

Listen Notes is the best podcast search engine

Listen Notes is the best podcast search engine™. It’s like Google, but for podcasts.

Search the whole Internet’s podcasts.

  • Listeners find ALL podcast episodes interviewing or talking about a person.
  • Journalists do research and find information in podcasts.
  • Students learn specific topics from podcasts.
  • Podcasters find cross-promotion opportunities.
  • Developers use Listen API to build podcast apps.
  • More use cases of Listen Notes podcast search engine