Tove is the API for the ALICE Text Editor tool. The text editor tool takes in reductions on transcription data from Caesar, and allows users to review, update and approve transcriptions.
- Install dependencies:
  - Rails 6.0.1
  - Ruby 2.6.5
  - PostgreSQL 9.5
- Clone the repo: git clone https://github.com/zooniverse/tove.git
- cd into the cloned folder
- Run bundle install
- Run rake db:setup to set up the database and generate test data
- Run rails s to start the app locally
Alternatively, to set up with Docker:
- Install Docker and Docker Compose
- Clone the repo: git clone https://github.com/zooniverse/tove.git
- cd into the cloned folder
- Prepare the Docker containers:
docker-compose build
docker-compose run --rm app bundle exec rails db:setup
docker-compose run --rm -e RAILS_ENV=test app bin/rails db:create
- Create and run the application containers with:
docker-compose up
- Run tests with:
docker-compose run --rm -e RAILS_ENV=test app bundle exec rspec
Or run them interactively from a shell inside the container:
docker-compose run --rm -e RAILS_ENV=test app bash
# from the bash prompt
bin/rspec
Transcription data files are generated and saved to storage when a transcription is approved, and removed from storage when a transcription is unapproved.
When a transcription is approved, four files are generated for the transcription:
- raw_data.json: raw, unparsed transcription data as JSON
- consensus_text.txt: transcription text only
- transcription_metadata.csv: datatable with metadata about the transcription
- transcription_line_metadata.csv: datatable with metadata about each line of the transcription
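As a rough illustration of the four files listed above, here is a minimal sketch in plain Ruby. It is not Tove's implementation: the input hash, the CSV column names, and the helpers `sample_transcription` and `write_export_files` are all hypothetical, chosen only to show what each file might hold.

```ruby
require "csv"
require "json"
require "tmpdir"

# Hypothetical input shape; Tove's real transcription records differ.
def sample_transcription
  {
    id: 4,
    status: "approved",
    lines: [
      { index: 0, text: "Dear Sir," },
      { index: 1, text: "I write to you today." }
    ]
  }
end

# Writes the four export files into dir and returns their paths.
# Column names below are illustrative assumptions, not Tove's schema.
def write_export_files(transcription, dir)
  lines = transcription[:lines]

  # One summary row for the whole transcription.
  metadata_csv = CSV.generate do |csv|
    csv << %w[transcription_id status line_count]
    csv << [transcription[:id], transcription[:status], lines.size]
  end

  # One row per transcribed line.
  line_metadata_csv = CSV.generate do |csv|
    csv << %w[line_index character_count]
    lines.each { |line| csv << [line[:index], line[:text].length] }
  end

  files = {
    "raw_data.json" => JSON.pretty_generate(transcription),
    "consensus_text.txt" => lines.map { |line| line[:text] }.join("\n"),
    "transcription_metadata.csv" => metadata_csv,
    "transcription_line_metadata.csv" => line_metadata_csv
  }

  files.map do |name, body|
    path = File.join(dir, name)
    File.write(path, body)
    path
  end
end
```

Calling `write_export_files(sample_transcription, some_dir)` leaves the four files in `some_dir`; in Tove the generated files are then saved to storage rather than kept on local disk.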
Users with edit permissions can download transcription data for a single transcription, or for a whole project, workflow, or transcription group. The files are delivered to the browser as a single zip file whose directory structure mirrors the way transcriptions are grouped in the app (e.g. project_a/workflow_b/group_c/transcription_4/files).
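The nested layout inside the zip file can be pictured with a small path-building sketch. The helper `export_path` and its keyword arguments are hypothetical and only mirror the example path given above; Tove's own code may build these paths differently.

```ruby
# Hypothetical helper: builds the path of one exported file inside the zip,
# following the project/workflow/group/transcription nesting.
def export_path(project:, workflow:, group:, transcription:, filename:)
  File.join(
    "project_#{project}",
    "workflow_#{workflow}",
    "group_#{group}",
    "transcription_#{transcription}",
    filename
  )
end

puts export_path(
  project: "a",
  workflow: "b",
  group: "c",
  transcription: 4,
  filename: "consensus_text.txt"
)
# prints "project_a/workflow_b/group_c/transcription_4/consensus_text.txt"
```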
Talking with Azure Blob Storage
Connections to Blob Storage in Tove are handled by Rails Active Storage. Calls to upload transcription data to storage, or to remove it from storage, occur within the Transcription Controller.
For future apps that want to set up Rails Active Storage, these are the steps that were taken here:
- Add the gems azure-storage and azure-storage-blob to the Gemfile.
- In the transcription model, add the line has_many_attached :export_files. We will now use export_files to handle the uploading and removing of files.
- Add methods on the transcription model to upload and remove files from storage. We have called them upload_files_to_storage and remove_files_from_storage. Within the upload method, the key line is: export_files.attach(io: temp_file, filename: filename). Within the remove method, the key line is: export_files.map(&:purge).
- The methods on the transcription model are called when a transcription is either approved (upload_files_to_storage) or unapproved (remove_files_from_storage). Note that files are not uploaded directly from the browser, which differs from how Panoptes uploads work.
Note that as of today (Feb 17, 2019), setup instructions for the current stable version of Rails (6.0.1) differ from the setup instructions for Rails Edge – be careful to look at the correct docs.
DataExports::DataStorage - Using Temp Directories
The process of downloading files from storage, zipping them, and sending the zip file to the client makes use of Ruby temp directories. All files generated during this process are written to the temp directory. When the block passed to Dir.mktmpdir closes, the temp directory is removed automatically, along with the generated files in it.
Hence, the step of sending the zip file to the client must happen within a yield block; see TranscriptionController#export for an example. This allows the file to be sent while still inside the block opened by Dir.mktmpdir.
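The pattern described above can be sketched in plain Ruby, outside of Rails. This is an illustration under assumptions, not Tove's code: `with_export_file` and `send_export` are hypothetical names, the zip file is a placeholder, and reading the bytes stands in for sending the file to the client.

```ruby
require "tmpdir"

# Hypothetical helper: builds a placeholder export file in a temp directory
# and yields its path while the directory still exists.
def with_export_file
  Dir.mktmpdir("tove-export") do |dir|
    path = File.join(dir, "export.zip")
    File.binwrite(path, "placeholder zip contents")
    yield path
  end
  # Dir.mktmpdir has removed the directory and its contents by this point.
end

# Stand-in for the controller action: the file is consumed inside the block,
# because it no longer exists once the block has closed.
def send_export
  payload = nil
  path_seen = nil
  with_export_file do |path|
    path_seen = path
    payload = File.binread(path)
  end
  { payload: payload, exists_after: File.exist?(path_seen) }
end

result = send_export
puts result[:payload].bytesize
puts result[:exists_after]
```

Because the helper yields from inside the Dir.mktmpdir block, the caller gets a valid path for exactly as long as it needs one, and cleanup needs no explicit code.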
Reference the ALICE Caesar Setup doc for instructions on configuring Caesar with ALICE/Tove.
“You seem very clever at explaining words, Sir,” said Alice. “Would you kindly tell me the meaning of the poem ‘Jabberwocky’?”
“Let’s hear it,” said Humpty Dumpty. “I can explain all the poems that ever were invented—and a good many that haven’t been invented just yet.”
This sounded very hopeful, so Alice repeated the first verse:
‘Twas brillig, and the slithy toves
Did gyre and gimble in the wabe:
All mimsy were the borogoves,
And the mome raths outgrabe.
“That’s enough to begin with,” Humpty Dumpty interrupted: “there are plenty of hard words there. ‘Brillig’ means four o'clock in the afternoon—the time when you begin broiling things for dinner.”
“That’ll do very well,” said Alice: “and ‘slithy’?”
“Well, ‘slithy’ means ‘lithe and slimy.’ ‘Lithe’ is the same as ‘active.’ You see it’s like a portmanteau—there are two meanings packed up into one word.”
“I see it now,” Alice remarked thoughtfully: “and what are ‘toves’?”
“Well, ‘toves’ are something like badgers—they’re something like lizards—and they’re something like corkscrews.”
“They must be very curious creatures.”
“They are that,” said Humpty Dumpty: “also they make their nests under sun-dials—also they live on cheese.”
--Lewis Carroll, "Through the Looking-Glass"