Research Transcription Service

About AWS Transcription Service

Social Sciences Computing Services (SSCS) provides a transcription service for members of the University of Chicago. The service is intended for research projects led by University of Chicago faculty, students, and staff. It offers built-in security features and custom dictionaries to support fast and reliable transcription file generation. The University of Chicago license is managed by Information Technology Services, and the service is hosted on the Amazon Web Services (AWS) platform.

IRB Requirements

Because of the sensitive nature of using generative artificial intelligence platforms, SSCS requires an IRB for all requested accounts, including research projects that qualify for exemption and those classified as non-research. The IRB will serve as a central record of the data submitted to the AWS transcription service. Please ensure that the Social Sciences Division Transcription Service is referenced in your IRB application.

For details on data security risks and their protection levels, please see the University’s Secure Research Data Strategy website.

*** This transcription service can only be used to transcribe data with a Secure Research Data Strategy (SRDS) protection level classification of Low or Moderate. The SBS-IRB Approval Letter for a Human Subjects research determination will identify the risk level as minimal or moderate. SBS-IRB projects with minimal risk are labeled as exempt or expedited. ***

An SBS-IRB Approval Letter with a research determination of “not research” will also be accepted for access to the service.

Data protected by HIPAA, FERPA, COPPA, or GDPR cannot be ingested into the transcription service.

Researchers

This transcription service is restricted to University of Chicago faculty, students, and staff conducting research under the university’s auspices.

Data Security Features

This implementation of the transcription service ensures that data is not used by AWS for its own purposes or shared with third parties. SSCS has enabled AWS’s opt-out policies to safeguard user data.

In addition, the University of Chicago license requires that all AWS storage be encrypted by default.

Users cannot view or access the files or transcription outputs of other users. File naming conventions are limited to CNet IDs and exclude project names or other metadata to maintain confidentiality.

Login

Connect to UChicago Wi-Fi or, when off campus, to the UChicago Virtual Private Network (cVPN), then use your CNet ID to log in at: https://scribe.ssd.uchicago.edu/

Data Deletion Policy

Input and output files in SSD Scribe are automatically deleted after 30 days. Please store your original and output files using the secure storage options specified in your IRB.
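
To plan when to copy files into your IRB-approved storage, the 30-day window can be worked out with a short calculation. The following is a minimal sketch, not part of SSD Scribe: the function names are hypothetical, and it assumes the 30 days are counted from the upload date, which this page does not specify. Treat the result as an estimate and save your files well before it.

```python
from datetime import date, timedelta

# Retention period stated in the Data Deletion Policy above.
RETENTION_DAYS = 30


def expected_deletion_date(upload_date: date) -> date:
    """Estimate the deletion date.

    Assumption: the 30-day clock starts on the upload date.
    """
    return upload_date + timedelta(days=RETENTION_DAYS)


def days_remaining(upload_date: date, today: date) -> int:
    """Return the number of days left before the estimated deletion (never negative)."""
    remaining = (expected_deletion_date(upload_date) - today).days
    return max(remaining, 0)


if __name__ == "__main__":
    uploaded = date(2025, 1, 1)  # example upload date
    print("Estimated deletion date:", expected_deletion_date(uploaded))
    print("Days remaining today:", days_remaining(uploaded, date.today()))
```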

Cost

Researchers with high-volume projects or those outside the Social Sciences Division may be required to provide a funding source to help offset the operational costs of using the service. High-volume projects usually involve multiple researchers (multiple accounts) working on one project. Individual thesis or individual course work research does not usually count as high-volume.

Request an SSCS Transcription Account

To request an account, please submit the SSCS ticket request here:
https://sscshelp.uchicago.edu/catalog_items/2609548-aws-transcription-service-request/service_requests/new

Account creation typically takes two to three business days.

***Only one IRB can be associated with each user account.***
To work on a project with a different IRB, please submit a new SSCS ticket request.

User Support

SSCS Teaching and Technology can help with introductory training and guidance on submitting a transcription job. For questions specific to using the service’s current features, please contact ssdtnt@uchicago.edu.

Account Feedback

If you need a feature or use that is not currently available, please contact ssdtnt@uchicago.edu to share your project’s transcription needs. We welcome feedback and ideas for future versions of the service.

Account Closure

Your account will close automatically on the date provided in the SSCS ticket.

If you wish to reactivate your account, please submit a new ticket request.

If you would like to modify your project end date, please contact ssdtnt@uchicago.edu.

User Guide and Features

The user guide provides an overview of how to use the transcription service and lists the available features. To access the complete user guide, go to https://sscs.uchicago.edu/transcription/userguide

The current version of the transcription service features a custom dictionary and a custom language model.

Custom Dictionary
SSD Scribe allows you to upload a custom dictionary to ensure technical and domain-specific terms are accurately represented in your transcripts. The custom dictionary lets you control how uncommon words or specific terms are transcribed. *Please note that using a custom dictionary disables automatic multi-language detection, so you will need to specify the processing language.*

Custom Language Model
SSD Scribe supports training a custom language model for transcription. You can upload text files as training data. To improve accuracy, we recommend uploading files containing at least 100,000 words. After you upload your files (e.g., text files or Word documents), training the language model typically takes 6–10 hours, depending on data size. *Currently supported languages include English, Spanish, German, Japanese, and Hindi.*
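
Before uploading training data, it can help to check how close your files are to the recommended 100,000 words. The sketch below is illustrative only and is not part of SSD Scribe. It assumes plain-text (.txt) files in a local folder, counts whitespace-separated words, and treats the recommendation as a total across all files. The function names and the folder name are hypothetical. Word documents would need to be converted to plain text first.

```python
from pathlib import Path

# Recommended minimum stated in the Custom Language Model description above.
RECOMMENDED_MIN_WORDS = 100_000


def count_words(text: str) -> int:
    """Count whitespace-separated words (a rough approximation)."""
    return len(text.split())


def total_words(texts) -> int:
    """Sum word counts over an iterable of strings."""
    return sum(count_words(t) for t in texts)


def meets_recommendation(texts, minimum: int = RECOMMENDED_MIN_WORDS) -> bool:
    """Return True if the combined texts reach the recommended word count.

    Assumption: the recommendation applies to the total across all files.
    """
    return total_words(texts) >= minimum


def read_text_files(folder: str):
    """Yield the contents of every .txt file in a folder (hypothetical layout)."""
    for path in sorted(Path(folder).glob("*.txt")):
        yield path.read_text(encoding="utf-8", errors="replace")


if __name__ == "__main__":
    texts = list(read_text_files("training_data"))  # hypothetical folder name
    print("Total words:", total_words(texts))
    print("Meets recommendation:", meets_recommendation(texts))
```

Whitespace-based counting works reasonably for English, Spanish, German, and Hindi text, but not for Japanese, which is not written with spaces between words, so the count there would be misleading.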

Terms of Use

Researcher Attestation:

  1. I will only use the transcription service for the files covered by the IRB number provided in this ticket, and I understand the University of Chicago’s Research Data Protection Policy.
  2. I understand that the transcription service is only for authorized users and authorized use of the service. The transcription service is monitored for use, auditing, and security purposes.
  3. Transcription files derived from an interview or data file are considered as sensitive as the input files and are governed by the same IRB protocol. By using the system, I acknowledge that I will abide by my IRB protocol requirements.
  4. If output files are printed, hard copies must be stored according to the practices specified in the project IRB.
  5. Any device connecting to this service must follow the University of Chicago’s End User Device Policy.