abdojulari/transcript-editor

Open Transcript Editor

Notice: This codebase is currently in deep and rapid development and won't have a stable version until about April 2016. Please check back for updates and documentation.

This is an open-source, self-hosted, web-based tool for correcting transcripts that were automatically generated using speech-to-text software via auto-transcription services such as Pop Up Archive. It is being developed by NYPL Labs in partnership with The Moth and Pop Up Archive with generous support from the Knight Foundation.

You are in the right place if...

  • You have a collection of audio that you would like to produce quality transcripts for
  • You do not have a budget for human transcription services (~$60-$100 per hour of audio)
  • You either (1) have a budget for auto-transcription services (~$15 per hour of audio) such as Pop Up Archive, or (2) you are able to produce time-coded transcripts on your own using speech-to-text software
  • Automatically generated transcripts do not meet your standard of quality and need to be corrected by humans
  • You and your team do not have the capacity to correct the transcripts yourselves
  • You or a member of your team has basic web development experience, specifically with creating a Ruby on Rails web application
  • Bonus: You have an audience of users who would be interested in helping fix transcripts (this app is uniquely designed to enable multiple users working on transcripts at the same time)

Setting Up Your Own Project

Requirements

You will need to have the following installed to run this project on your machine.

  • Git
  • Ruby - this app has been developed using 2.3.0. Older versions may not work
  • PostgreSQL

Once everything is installed, clone this repository

cd /my/projects/folder
git clone https://github.com/NYPL/transcript-editor.git
cd transcript-editor

If you forked this repository, replace the URL with your repository

Configure Your Project

  1. Create config/database.yml based on config/database.sample.yml - update this file with your own database credentials
  2. Create config/application.yml based on config/application.sample.yml - this file contains all your private config credentials such as Pop Up Archive or Google accounts. The only required configuration to start is:
    • SECRET_KEY_BASE. You can generate this value by running rake secret
    • PROJECT_ID. A project id that will be used to identify this project (e.g. my-project). Must be alphanumeric; no spaces or periods; underscores and dashes okay
  3. Copy the folder project/sample-project and rename it to the PROJECT_ID from the previous step (e.g. project/my-project). This folder will contain all the configuration, content, and language for your project.
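
The PROJECT_ID rule above can be checked before you rename the folder. This is only a minimal sketch: `valid_project_id?` is a hypothetical helper that is not part of the app, and it assumes "alphanumeric" means ASCII letters and digits.

```ruby
# Hypothetical helper (not part of the app): checks a PROJECT_ID against the
# rule stated above. Assumes "alphanumeric" means ASCII letters and digits.
PROJECT_ID_PATTERN = /\A[A-Za-z0-9_-]+\z/

def valid_project_id?(id)
  # =~ returns a match position (truthy) or nil, so !! turns it into a boolean
  !!(id.is_a?(String) && id =~ PROJECT_ID_PATTERN)
end

puts valid_project_id?("my-project") # dashes are allowed
puts valid_project_id?("my project") # spaces are not
```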

Configure Your Project Details

Your project folder has the following structure:

 my-project/
 +-- assets/
 |  +-- css/
 |  +-- img/
 |  +-- js/
 +-- data/
 +-- layouts/
 +-- pages/
 +-- transcripts/
 +-- project.json

The primary place for project configuration is the file project.json. For now, we can keep everything as defaults. We will cover the details of this folder in later steps.

Setup and run the app

  1. Run bundle - this will install all the necessary gems for this app
  2. Run rake db:setup to set up the database based on config/database.yml
  3. Run rake project:load['my-project'] to load your project folder (replace my-project with your project name)
  4. Run rails s to start your server. Go to http://localhost:3000/ to view your project

Your project should load, but since there are no transcripts yet, all you'll see is a header and a blank screen. The next step is to seed the app with some transcripts.

Generating your transcripts

This section will assume that you do not have transcripts yet, just audio files that need transcripts. We will be using Pop Up Archive to automatically produce the transcripts that will need to be corrected. Other vendors and services may be documented in the future based on demand.

Requirements

  • A Pop Up Archive account
  • Your audio files must be uploaded to the web so they are accessible via a public URL (e.g. http://website.com/my-audio.mp3)
    • Here are some examples of file hosting services
    • The following file formats are supported: 'aac', 'aif', 'aiff', 'alac', 'flac', 'm4a', 'm4p', 'mp2', 'mp3', 'mp4', 'ogg', 'raw', 'spx', 'wav', 'wma'
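
A quick pre-check of audio URLs against the format list above can catch mistakes before anything is submitted. The sketch below is an illustration only: `supported_audio_url?` is a hypothetical helper, it assumes the format is identified by the file extension in the URL path, and it assumes extensions are compared case-insensitively. It does not test whether the file is actually reachable.

```ruby
require "uri"

# File extensions listed above as supported.
SUPPORTED_AUDIO_FORMATS = %w[aac aif aiff alac flac m4a m4p mp2 mp3 mp4 ogg raw spx wav wma].freeze

# Hypothetical helper: true when the URL is http(s) and its path ends in a
# supported extension (compared case-insensitively, which is an assumption).
def supported_audio_url?(url)
  uri = URI.parse(url)
  return false unless uri.is_a?(URI::HTTP) # URI::HTTPS is a subclass of URI::HTTP
  ext = File.extname(uri.path.to_s).delete(".").downcase
  SUPPORTED_AUDIO_FORMATS.include?(ext)
rescue URI::InvalidURIError
  false
end

puts supported_audio_url?("http://website.com/my-audio.mp3")
```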

Update your credentials

If you are using Pop Up Archive, you must update your account credentials in the config/application.yml file. There are two values (PUA_CLIENT_ID and PUA_CLIENT_SECRET) which refer to your Pop Up Archive Client ID and Client Secret respectively. You can find these values by logging into your Pop Up Archive account and visiting https://www.popuparchive.com/oauth/applications

Creating a manifest file

New audio files and transcripts can be added to this app by creating manifest files in .csv format. These manifest files will contain basic information about your audio, e.g. an internal id, title, description, url to audio file, etc. These files will be used to perform a number of tasks such as uploading new audio for transcription, downloading processed transcripts, and updating information about your audio.

In your project folder, you should find an empty .csv file: project/my-project/data/transcripts_seeds.csv. It contains the following columns:

| Column | Description | Required? | Example |
| --- | --- | --- | --- |
| uid | a unique identifier for your audio file. Must be alphanumeric; no spaces or periods; underscores and dashes okay. Case sensitive. | Yes | podcast-123, podcast_123, 123 |
| title | the title that will be displayed for this audio file | Yes | Podcast About Cats |
| description | a description that will be displayed for this audio file | No | This is basically the best podcast about cats; no dogs allowed |
| url | a URL that will link back to where the audio is being presented on your website | No | http://mywebsite.com/podcast-123 |
| audio_url | a public URL to your audio file | Yes, unless you already uploaded audio to Pop Up Archive or already have transcripts | http://mywebsite.com/podcast-123.mp3 |
| image_url | a public URL to an image representing your audio; square and ~400px is preferred | No | http://mywebsite.com/podcast-123.jpg |
| collection | the unique identifier for the collection this audio belongs to (see below for more on this) | No | cat-collection |
| vendor | the vendor that will be doing the transcription | Yes, unless you already produced transcripts yourself | pop_up_archive |
| vendor_identifier | if you already uploaded the audio to Pop Up Archive, put the item id here | Only if you already uploaded audio to Pop Up Archive | 41326 |
| notes | any extra notes that will only be used internally (not public) | No | this audio contains explicit material |

Populate at least the required fields of this file. You can load them into the app with this command:

rake transcripts:load['my-project','transcripts_seeds.csv']

Replace my-project with your project id, and replace transcripts_seeds.csv with your own file name if you are using a different manifest file. You can run this command any number of times after editing the manifest file or with new manifest files. The script will check if the transcript already exists using the uid column value.
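
Before loading a manifest, it may help to check it against the column rules described above. The following is a minimal sketch using Ruby's standard `csv` library. `manifest_problems` is a hypothetical helper, not something the app provides; it only checks the two unconditionally required columns (uid and title) and the uid character rule. Flagging repeated uids within one file is an extra assumption about what you would want to be warned about, not a documented rule.

```ruby
require "csv"

# Columns the manifest description marks as unconditionally required.
REQUIRED_COLUMNS = %w[uid title].freeze
UID_PATTERN = /\A[A-Za-z0-9_-]+\z/

# Hypothetical pre-flight check (not part of the app). Returns a list of
# human-readable problems; an empty list means none were found.
def manifest_problems(csv_text)
  problems = []
  seen = {}
  CSV.parse(csv_text, headers: true).each_with_index do |row, index|
    label = "row #{index + 1}" # counts data rows, not the header
    REQUIRED_COLUMNS.each do |column|
      problems << "#{label}: #{column} is missing" if row[column].to_s.strip.empty?
    end
    uid = row["uid"].to_s
    next if uid.strip.empty?
    problems << "#{label}: uid '#{uid}' has invalid characters" unless uid =~ UID_PATTERN
    problems << "#{label}: uid '#{uid}' appears more than once" if seen[uid]
    seen[uid] = true
  end
  problems
end

sample = "uid,title\npodcast-123,Podcast About Cats\npodcast 124,\n"
puts manifest_problems(sample)
```

You could run a check like this against `project/my-project/data/transcripts_seeds.csv` by passing `File.read(path)` as the argument.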

Making Collections/Groups

Sometimes you may want to group your audio in different ways for the user. If you are using Pop Up Archive, this step is required since Pop Up requires all your audio files to belong to a collection. You can create collections similar to how you create transcripts--with a manifest file.

In your project folder, you should find an empty .csv file: project/my-project/data/collections_seeds.csv. It contains almost the same columns as the transcript manifest file. If you are using Pop Up Archive, you must fill out the last two columns (vendor, vendor_identifier) as pop_up_archive and the Pop Up Archive collection id respectively. The collection id can be found by clicking on a collection in your Pop Up Archive dashboard and looking at the URL (e.g. https://www.popuparchive.com/collections/1234), in which case the collection id is 1234

Once you fill out the manifest file, you can load them into the app with this command:

rake collections:load['my-project','collections_seeds.csv']

As with transcripts, you can always re-run this script with new data and manifest files.
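
If you have many collections, copying ids out of dashboard URLs by hand is easy to get wrong. As a small illustration, the hypothetical helper below pulls the id out of a collection URL of the form mentioned above; it assumes the id is always the numeric segment directly after `/collections/`.

```ruby
require "uri"

# Hypothetical helper: extracts the numeric collection id from a Pop Up
# Archive collection URL. Returns nil for any other URL shape.
def pua_collection_id(url)
  path = URI.parse(url).path.to_s
  match = path.match(%r{\A/collections/(\d+)/?\z})
  match && match[1]
rescue URI::InvalidURIError
  nil
end

puts pua_collection_id("https://www.popuparchive.com/collections/1234") # prints 1234
```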

Uploading your files to Pop Up Archive

If you are using Pop Up Archive and have not yet created Pop Up Archive collection(s), you can run this command to create Pop Up collections from your manifest file:

rake pua:create_collections['my-project']

This will also update your database with the proper Pop Up Archive collection id in a column called vendor_identifier. It will also be useful for deployment later to update your manifest file with these identifiers. You can do that by running this command:

rake collections:update_file['my-project','collections_seeds.csv']

If you have not yet uploaded your audio to Pop Up Archive, run this command:

rake pua:upload['my-project']

This will look for any audio items (previously defined in your transcript manifest files) that have pop_up_archive as vendor but do not have a vendor_identifier (i.e. have not been uploaded to Pop Up Archive). For each of those items, it creates a Pop Up Archive item and submits your audio file for processing. It will populate the vendor_identifier in the app's database with the Pop Up Archive item id upon submission, so you may run this script any number of times if you add additional audio items. As with collections, you should update your manifest file with these identifiers:

rake transcripts:update_file['my-project','transcripts_seeds.csv']
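
The selection rule the upload task applies can be pictured as a simple filter. The sketch below is not the task's actual code: it uses plain hashes in place of the app's database records, and `pending_upload` is a hypothetical name. It only shows why re-running the task is safe, since items that already have a vendor_identifier are skipped.

```ruby
# Plain hashes stand in for the app's database records; the real task works
# against the database, so this only illustrates the selection rule.
def pending_upload(items)
  items.select do |item|
    item[:vendor] == "pop_up_archive" && item[:vendor_identifier].to_s.strip.empty?
  end
end

items = [
  { uid: "podcast-123", vendor: "pop_up_archive", vendor_identifier: nil },
  { uid: "podcast-124", vendor: "pop_up_archive", vendor_identifier: "41326" },
  { uid: "podcast-125", vendor: nil, vendor_identifier: nil }
]

pending_upload(items).each { |item| puts item[:uid] } # prints podcast-123
```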

Download processed transcripts from Pop Up Archive

Transcripts can generally take up to 24 hours to process. When you think they may be ready, you can run this script to download finished transcripts to the app:

rake pua:download['my-project']

This will look for any audio items that have been submitted to Pop Up Archive but do not yet have a transcript downloaded. If an item's transcript is ready, the script will download it and save it to the app's database, and it will become visible in the app. You can run this script any number of times until all transcripts have been downloaded.

Customizing your project

All project customization should happen within your project directory (e.g. /project/my-project/). Changes made anywhere else may result in code conflicts when updating your app with new code.

Whenever you make a change to your project directory, you must run the following rake task to see it in the app:

rake project:load['my-project']

Activating user accounts

This app currently supports logging in through Google or Facebook accounts (via OAuth2). You can activate this by following the steps below:

Instructions for Google Account activation

  1. Log in to your Google account and visit https://console.developers.google.com/; complete any registration steps required

  2. Once you are logged into your Developer dashboard, create a project

  3. In your project's dashboard click enable and manage Google APIs. You must enable at least Contacts API and Google+ API

  4. Click the Credentials tab of your project dashboard, Create credentials for an OAuth client ID and select Web application

  5. You should make at least two credentials for your Development and Production environments (you can also create one for a Test environment)

  6. For development, enter http://localhost:3000 (or whatever your development URI is) for your Authorized Javascript origins and http://localhost:3000/omniauth/google_oauth2/callback for your Authorized redirect URIs

  7. For production, enter the same values, but replace http://localhost:3000 with your production URI e.g. https://myproject.com

  8. Open up your config/application.yml

  9. For each development and production, copy the values listed for Client ID and Client secret into the appropriate key-value entry, e.g.

    development:
      GOOGLE_CLIENT_ID: 1234567890-abcdefghijklmnop.apps.googleusercontent.com
      GOOGLE_CLIENT_SECRET: aAbBcCdDeEfFgGhHiIjKlLmM
    production:
      GOOGLE_CLIENT_ID: 0987654321-ghijklmnopabcdef.apps.googleusercontent.com
      GOOGLE_CLIENT_SECRET: gGhHiIjKlLmMaAbBcCdDeEfF
    

10. Google login is now enabled in the Rails app. Now we need to enable it in the UI. Open up `project/my-project/project.json`. Add this entry under `authProviders`:

    "authProviders": [
      { "name": "google", "label": "Google", "path": "/auth/google_oauth2" }
    ],


11. Run `rake project:load['my-project']` to refresh this config in the interface
12. Finally, restart your server and visit `http://localhost:3000`.  Now you should see the option to sign in via Google.

#### Instructions for Facebook Account activation

1. Log in to your Facebook account and visit [this link](https://developers.facebook.com/quickstarts/?platform=web)
2. Follow the steps to create a new app and go to the app's Dashboard
3. In your project's dashboard click *Settings* on the left panel. Then click the *Advanced* tab.
4. Under *Client OAuth Settings*:
- make sure *Client OAuth Login* and *Web OAuth Login* are on
- enter `http://localhost:3000/omniauth/facebook/callback` in *Valid OAuth redirect URIs*. Also include your production or testing URLs here (e.g. `http://myapp.com/omniauth/facebook/callback`)
- Save your changes
5. On the left panel, select *Test Apps*. Click *Create a Test App* and go to its dashboard after you create it.
6. Note these two values: *App ID* and *App Secret*
7. Open up your `config/application.yml`
8. For each development and production, copy the values listed for *App ID* and *App Secret* into the appropriate key-value entry, e.g.

    development:
      FACEBOOK_APP_ID: "1234567890123456"
      FACEBOOK_APP_SECRET: abcdefghijklmnopqrstuvwxyz123456
    production:
      FACEBOOK_APP_ID: "7890123456123456"
      FACEBOOK_APP_SECRET: nopqrstuvwxyz123456abcdefghijklm

9. Facebook login is now enabled in the Rails app. Now we need to enable it in the UI. Open up `project/my-project/project.json`. Add this entry under `authProviders`:

    "authProviders": [
      { "name": "facebook", "label": "Facebook", "path": "/auth/facebook" }
    ],

10. Run `rake project:load['my-project']` to refresh this config in the interface
11. Finally, restart your server and visit `http://localhost:3000`.  Now you should see the option to sign in via Facebook.
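
Each set of instructions above shows the `authProviders` array with a single entry. Assuming the array can hold more than one provider (the instructions do not say so explicitly), enabling both logins would mean listing both entries in the same array rather than defining the key twice. The sketch below illustrates that merge on a parsed `project.json`; `add_auth_provider` is a hypothetical helper, not part of the app.

```ruby
require "json"

# Hypothetical helper: returns a copy of the parsed project.json hash with the
# given provider appended to "authProviders", skipping it if already present.
def add_auth_provider(config, provider)
  providers = config.fetch("authProviders", [])
  return config if providers.any? { |existing| existing["name"] == provider["name"] }
  config.merge("authProviders" => providers + [provider])
end

google = { "name" => "google", "label" => "Google", "path" => "/auth/google_oauth2" }
facebook = { "name" => "facebook", "label" => "Facebook", "path" => "/auth/facebook" }

config = JSON.parse('{"authProviders": []}')
config = add_auth_provider(config, google)
config = add_auth_provider(config, facebook)
puts JSON.pretty_generate(config)
```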


### Custom content

#### Pages

This app lets you create an arbitrary number of pages that you may link from the navigation menu or within other pages.  All pages are found within:

    project/
    +-- my-project/
    |  +-- pages/


- All pages are written in [Markdown](https://daringfireball.net/projects/markdown/syntax), but since Markdown supports HTML, you can use HTML syntax as well.
- If you create a page called `faq.md`, you can access it via URL `http://localhost:3000/page/faq`
- Subdirectories are supported, but the URL will always respond to just the filename, e.g. for the file `project/my-project/pages/misc/faq.md`, the URL will still be `http://localhost:3000/page/faq`
- You can embed assets in your markdown. For example
  - Place an image in assets folder like `project/my-project/assets/img/graphic.jpg`
  - You can refer to it in a page like this: `<img src="/my-project/assets/img/graphic.jpg" />`
- There are a few pages that the app comes with:
  - `home.md` - contains the content that shows up on the homepage
  - `transcript_edit.md` - contains the content that shows up on the top of all transcript editor pages
  - `transcript_conventions.md` - contains the transcript conventions that show up in the drop-down on all transcript editor pages

#### Menus

In your `project/my-project/project.json` file, there is an entry called `menus`.  These will contain all the available menus that will be displayed in the app.  Here are the available menus:

- `header` - this is the persistent menu that shows up on the top of all pages
- `transcript_edit` - this is the menu that shows up below the main header menu if you are on a transcript editor page
- `footer` - this is the persistent menu that shows up on the bottom of all pages

Each menu will contain a number of entries (or no entries). It may look like this:

    "header": [
      {"label": "Browse", "url": "/"},
      {"label": "About", "url": "/page/about"},
      {"label": "Main Website", "url": "http://otherwebsite.com/"}
    ],


The `label` is what will show up in the menu, and the URL is what that label links to. It can link to a page within the app or an external page.

Sometimes you only want to have a link show up on certain pages. You can accomplish this like so:

    "header": [
      {"label": "Browse", "url": "/"},
      {"label": "About", "url": "/page/about"},
      {"label": "Help", "url": "/page/help", "validRoutes": ["transcripts/:id"]}
    ],


In the above case, the `Help` link will only show up on transcript editor pages. You can see a list of available routes in the app's [router.js file](gulp/js/router.js)
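
The visibility rule can be read as a filter over menu entries. The sketch below is an interpretation in Ruby, not the app's code (the app handles this in its JavaScript front end): it assumes an entry without `validRoutes` is always shown and that route patterns are compared as exact strings. The route `page/:id` in the usage line is a made-up example.

```ruby
# Hypothetical sketch of the visibility rule: an entry without "validRoutes"
# is always shown; otherwise it is shown only when the current route pattern
# is listed.
def visible_entries(menu, current_route)
  menu.select do |entry|
    routes = entry["validRoutes"]
    routes.nil? || routes.include?(current_route)
  end
end

header = [
  { "label" => "Browse", "url" => "/" },
  { "label" => "About", "url" => "/page/about" },
  { "label" => "Help", "url" => "/page/help", "validRoutes" => ["transcripts/:id"] }
]

puts visible_entries(header, "transcripts/:id").map { |entry| entry["label"] }.join(", ")
puts visible_entries(header, "page/:id").map { |entry| entry["label"] }.join(", ")
```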

#### Modals

Sometimes you don't want to redirect a user to a different page, but want to have the content show up in a pop-up modal. You can define modals in your `project.json` file like this:

    "modals": {
      "help_modal": {
        "title": "A Brief Guide",
        "doneLabel": "Close",
        "page": {"file": "help.md"}
      },
      "tutorial_modal": {
        "title": "A Brief Tutorial",
        "doneLabel": "Finished",
        "pages": [
          {"label": "Editing", "file": "tutorial_1.md"},
          {"label": "Conventions", "file": "tutorial_2.md"},
          {"label": "Commands", "file": "tutorial_3.md"}
        ]
      }
    },


This will create two modals:

1. `help_modal` which contains the content of just one page: `project/my-project/pages/help.md`
2. `tutorial_modal` which contains tabbed content of three pages
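
Since a modal can use either the single `page` form or the tabbed `pages` form, it can be handy to list which page files each modal depends on, for example to confirm they exist in your pages folder. The helper below is a hypothetical sketch that works on a parsed `modals` hash; it is not part of the app.

```ruby
require "json"

# Hypothetical helper: lists the page files each modal refers to, handling
# both the single "page" form and the tabbed "pages" form.
def modal_page_files(modals)
  modals.each_with_object({}) do |(name, modal), files|
    pages = modal["pages"] || [modal["page"]].compact
    files[name] = pages.map { |page| page["file"] }
  end
end

modals = JSON.parse(<<~JSON)
  {
    "help_modal": {"title": "A Brief Guide", "doneLabel": "Close", "page": {"file": "help.md"}},
    "tutorial_modal": {
      "title": "A Brief Tutorial",
      "doneLabel": "Finished",
      "pages": [
        {"label": "Editing", "file": "tutorial_1.md"},
        {"label": "Conventions", "file": "tutorial_2.md"}
      ]
    }
  }
JSON

modal_page_files(modals).each { |name, files| puts "#{name}: #{files.join(', ')}" }
```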

You can invoke a modal from within a menu like so:

    "menus": {
      "header": [
        {"label": "Browse", "url": "/"},
        {"label": "About", "url": "/page/about"},
        {"label": "Help", "modal": "help_modal"}
      ],
      ...
    },


### Custom assets, styling, and functionality

You will probably want to customize the look and feel of your app. You can accomplish this by overriding the default CSS styling with a project CSS file:

    project/
    +-- my-project/
    |  +-- assets/
    |     +-- css/
    |        +-- styles.css


These styles will override any existing styles in the app. Similarly, you can add additional javascript functionality via custom js:

    project/
    +-- my-project/
    |  +-- assets/
    |     +-- js/
    |        +-- custom.js


Sometimes you may want to include additional files or tags in your app such as custom external font services, analytics, or meta tags. You can simply edit this page:

    project/
    +-- my-project/
    |  +-- layouts/
    |     +-- index.html


Be careful not to edit the existing app structure within the `#app` element. Also, there are a few javascript and css files that the app depends on that you shouldn't delete.

Be sure to run the project rake task if you make any changes:

rake project:load['my-project']


## Transcript Consensus

Coming soon. This section covers the rules for what makes a transcript or a transcript line "complete".

### What is consensus?

### The stages of consensus

### Configuring consensus

## Deploying your project to production

This example will use [Heroku](https://www.heroku.com/) to deploy the app to production, though the process would be similar for other hosting solutions. The commands assume you have [Heroku Toolbelt](https://toolbelt.heroku.com/) installed.

Before you start, if you used Pop Up Archive to generate your transcripts, make sure your manifest files are up-to-date so that your production server knows how to download the transcripts from Pop Up Archive.  Run these commands:

    rake collections:update_file['my-project','collections_seeds.csv']
    rake transcripts:update_file['my-project','transcripts_seeds.csv']


Replace `my-project` and `.csv` files with your project key and manifest files. Commit the updated manifest files to your repository and continue.

1. Create a new [Heroku](https://heroku.com) app:

    heroku apps:create my-app-name
    heroku git:remote -a my-app-name


(Only run the 2nd command if you already have an app set up)

2. Provision a PostgreSQL database:

    heroku addons:create heroku-postgresql:hobby-dev
    heroku pg:wait
    heroku config -s | grep HEROKU_POSTGRESQL


Replace `hobby-dev` with your [database plan of choice](https://devcenter.heroku.com/articles/heroku-postgres-plans). This example uses the free "Hobby Dev" plan. Note that you should choose a higher plan (e.g. `standard-0`) for production; Hobby Dev has a row limit of 10,000 and a maximum of 20 connections. You can always [upgrade](https://devcenter.heroku.com/articles/upgrading-heroku-postgres-databases) an existing database.

3. Update your environment variables

figaro heroku:set -e production


This sets environment variables from `config/application.yml` in your production environment
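
To see which values would apply to a given environment, it helps to picture how the environment sections in `config/application.yml` layer over the top-level keys. The sketch below approximates that layering with Ruby's standard `yaml` library, on the assumption that top-level scalar keys are shared and a matching environment section overrides them. It is an illustration, not Figaro's own code, and `settings_for` is a hypothetical name.

```ruby
require "yaml"

# Sketch of environment layering: top-level scalar keys are treated as shared,
# and keys under the chosen environment section override them. This
# approximates the behavior described above; it is not Figaro's own code.
def settings_for(yaml_text, environment)
  config = YAML.safe_load(yaml_text) || {}
  shared = config.reject { |_key, value| value.is_a?(Hash) }
  shared.merge(config[environment] || {})
end

sample = <<~YAML
  PROJECT_ID: my-project
  development:
    GOOGLE_CLIENT_ID: dev-client-id
  production:
    GOOGLE_CLIENT_ID: prod-client-id
YAML

settings_for(sample, "production").each { |key, value| puts "#{key}=#{value}" }
```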

4. Deploy the code and run rake tasks

    git push heroku master
    heroku run rake db:migrate
    heroku run rake db:seed


5. Next you'll need to populate your transcripts. The last command will download your transcripts from Pop Up Archive. You can run these commands however many times you like if you update your manifest file or transcripts become available.

    heroku run rake collections:load['my-project','collections_seeds.csv']
    heroku run rake transcripts:load['my-project','transcripts_seeds.csv']
    heroku run rake pua:download['my-project']


## Managing your project

Coming soon... this section will walk through admin and moderator functionality

### Updating your website

Coming soon... this section will walk through how to update your website with recent changes to the codebase

## Retrieving your finished transcripts

Coming soon... this section will walk through how you can download all your completed transcripts in a variety of formats for use elsewhere
