Research Transcription Service

About AWS Transcription Service

Social Sciences Computing Services (SSCS) is providing a pilot Transcription Service for use by Social Sciences Division faculty, students, and staff. The University of Chicago license is held by Information Technology Services, and the service is hosted on the Amazon Web Services (AWS) platform. The transcription service is limited to research led by the University of Chicago.

IRB Requirements

Due to the sensitive nature of using Generative Artificial Intelligence platforms, SSCS requires an IRB for all requested accounts, including accounts for exempt research and for projects classified as not-research. The IRB will serve as a central record of the data entered into the AWS transcription service.

Please identify the Social Sciences Division Transcription Service in your IRB application.

***The current transcription pilot can only ingest data with a Secure Research Data Strategy (SRDS) protection level classification of Low or Moderate.***

For details on data security risks and their protection levels, please see the University’s Secure Research Data Strategy website.

***The SBS-IRB Approval Letter will identify the risk level as minimal risk or moderate risk. SBS-IRB projects with minimal risk are labeled as exempt or expedited.***

The SBS-IRB Approval Letter with a research determination of not-research will also be accepted for use of the service.

Data protected by HIPAA, FERPA, COPPA, or GDPR cannot be ingested into the transcription service.

Researchers

This transcription service is targeted at the University of Chicago Social Sciences Division faculty, students, and staff who are conducting research under the University of Chicago.

Data Security Features

This build of the transcription service protects data from being used by AWS for its own purposes and from being used by third parties. SSCS has enabled AWS’s AI services opt-out policies.

The University of Chicago license also requires the AWS storage to be encrypted by default.

Users will not be able to see other users’ file content or access other users’ transcription output. Naming conventions will consist of CNet IDs and will not include project names or any other metadata. For users with multiple projects, projects can only be identified by number.

Best Use Features

AWS Transcription is ideal for an initial transcription. It can transcribe an audio file in a relatively short amount of time. Once a transcription job is complete, the researcher can download the output file (currently in Word and JSON formats). Researchers can then clean and analyze the files in separate text analysis software.
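As an illustration of the clean-up step, here is a minimal Python sketch for reading a downloaded JSON output file. It assumes the service returns the standard Amazon Transcribe layout (full text under `results.transcripts`, word-level entries under `results.items`); the SSCS output may differ, so check a real file first. The function names and the 0.8 threshold are hypothetical choices for this example.

```python
import json
from pathlib import Path


def load_output(json_path):
    """Read a downloaded JSON output file into a dictionary."""
    return json.loads(Path(json_path).read_text(encoding="utf-8"))


def transcript_text(data):
    """Return the full transcript text.

    Assumes the standard Amazon Transcribe layout, where the text
    sits under results -> transcripts.
    """
    return " ".join(t["transcript"] for t in data["results"]["transcripts"])


def low_confidence_words(data, threshold=0.8):
    """List (word, start_time, confidence) for words below the threshold.

    These are the words most worth checking against the audio.
    Punctuation entries are skipped because they carry no real score.
    """
    flagged = []
    for item in data["results"].get("items", []):
        if item.get("type") != "pronunciation":
            continue
        best = item["alternatives"][0]
        # Confidence is stored as a string in the assumed layout.
        confidence = float(best.get("confidence", 0.0))
        if confidence < threshold:
            flagged.append((best["content"], item.get("start_time"), confidence))
    return flagged
```

For example, `transcript_text(load_output("interview_01.json"))` would return the plain text of a (hypothetical) downloaded file, ready to paste into text analysis software.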

New Dictionary Feature

SSCS transcription jobs can now use user-defined vocabularies. You can upload your own dictionary, and the transcription service will prioritize those terms whenever you run a transcription job.

Why it matters: Higher accuracy for jargon, names, and acronyms – perfect for discipline-specific and branded content.

This release cuts down manual clean-up time by delivering more accurate text straight out of the pipeline.

How to Use It

  1. Upload your custom vocabulary dictionary to the dictionaries page.
  2. Wait for the dictionary to be registered (this takes 2-3 minutes).
  3. Once it is registered, you’ll have the option to select your dictionary every time you start a transcription job.

Sample dictionary files and guides are available on the newly created dictionaries page as a starter kit.
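If you keep your terms in a working list, a small script can tidy them before upload. The Python sketch below is only an illustration: it assumes a plain-text dictionary with one term per line, which may not match the format the service expects, so follow the sample files on the dictionaries page. The function names and file name are hypothetical.

```python
from pathlib import Path


def clean_terms(raw_terms):
    """Trim whitespace and drop blank or duplicate terms, keeping order.

    Duplicates are detected case-insensitively; the first spelling wins.
    """
    seen = set()
    cleaned = []
    for term in raw_terms:
        term = " ".join(term.split())  # trim and collapse inner whitespace
        key = term.casefold()
        if term and key not in seen:
            seen.add(key)
            cleaned.append(term)
    return cleaned


def write_dictionary(terms, path):
    """Write one term per line (an assumed format, see the lead-in)."""
    Path(path).write_text("\n".join(terms) + "\n", encoding="utf-8")
```

For example, `write_dictionary(clean_terms(my_terms), "vocabulary.txt")` would produce a de-duplicated file to compare against the sample files before uploading.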

Important Note on File Storage

Only use AWS storage as scratch space. Do not rely on AWS storage to hold your audio files and output files indefinitely. Once you have transcribed your audio files and downloaded the output, SSCS recommends deleting your files from AWS S3 storage.

Request a Transcription Account

To create an account, please complete the SSCS Ticket:
https://sscshelp.uchicago.edu/catalog_items/2609548-aws-transcription-service-request/service_requests/new

Account creation will take approximately two to three business days.

***Only one IRB is allowed per user account.***
Please submit a new SSCS ticket for projects with a new IRB.

User Support

SSCS Teaching and Technology can help with introductory training and guidance on submitting a transcription job. For questions specific to using the service’s current features, please contact ssdtnt@uchicago.edu.

Account Modifications

For specific uses not currently available, please contact ssdtnt@uchicago.edu to share your project’s transcription needs. We welcome feedback and ideas for future versions of the service.

Data Deletion Policy

Input and output files in AWS are automatically deleted after 30 days. Please store your original and output files as specified in your IRB.

Account Closure

Your account will automatically close out on the date provided in the SSCS ticket.

If you are not using the account and want to use it at a later date, SSCS can accommodate this request. SSCS will deactivate your account and delete the files held in the account. While the account is paused, it cannot hold any files, whether audio or output.

If you would like to modify your account closeout date, please contact ssdtnt@uchicago.edu.

Terms of Use

Researcher Attestation:

  1. I will only use the transcription service for the files covered by the IRB number provided in this ticket, and I understand the University of Chicago’s Research Data Protection Policy.
  2. I understand that the transcription service is only for authorized users and authorized use of the service. The transcription service is monitored for use, auditing, and security purposes.
  3. The transcription file derived from an interview or data file is at least as sensitive as the input file and is governed by the same IRB protocol. By using the system, you acknowledge that you will abide by your IRB protocol requirements.
  4. If output files are printed, hard copies must follow the storage practices specified in the project IRB.
  5. Any device connecting to this service must follow the University of Chicago’s End User Device Policy.