Recently, the Adelphi University Innovation Center has been exploring and writing about text-to-speech (TTS) systems. A natural complement to this technology is speech-to-text (STT), which performs the reverse task: converting spoken language into written text. Once again, there are significant accessibility implications: lectures, meetings, and conversations can be transcribed, supporting community members who are deaf or hard of hearing while also producing searchable written records of spoken material. Beyond accessibility, speech-to-text systems open new possibilities for hands-free computing, voice-driven interfaces, and rapid documentation of ideas. For many everyday tasks, dictating notes, drafting emails, or capturing research reflections verbally can be faster and more natural than typing.
As with other recent advances in artificial intelligence, speech-to-text systems can now run quickly on a standard laptop. In 2022, OpenAI released the STT system Whisper, which became the basis of the enhanced system Faster-Whisper, featuring 2x-4x faster transcription speeds and lower RAM usage. As always, we make simple, thoroughly documented sample code and demonstrations available through our GitHub repository, serving as a resource for the open-source community and a starting point for experimentation with these systems.
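To give a sense of how little code is involved, here is a minimal sketch of local transcription with Faster-Whisper. It assumes the library has been installed with `pip install faster-whisper`; the model name `"base"`, the file name `lecture.wav`, and the helper `join_segments` are illustrative choices, not fixed requirements.

```python
def join_segments(segments):
    """Concatenate the text of each transcribed segment into one transcript."""
    return " ".join(seg.text.strip() for seg in segments)

def transcribe(path: str) -> str:
    """Transcribe an audio file locally on the CPU."""
    from faster_whisper import WhisperModel  # pip install faster-whisper

    # "base" is one of the smaller Whisper models; int8 quantization
    # keeps memory usage low when running on an ordinary laptop CPU.
    model = WhisperModel("base", device="cpu", compute_type="int8")
    segments, _info = model.transcribe(path)
    return join_segments(segments)

if __name__ == "__main__":
    import os
    if os.path.exists("lecture.wav"):  # placeholder file name
        print(transcribe("lecture.wav"))
```

Larger models (such as `"small"` or `"medium"`) generally transcribe more accurately at the cost of speed and memory, so the model name is the main knob to turn when experimenting.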
Running these systems locally offers several important advantages. First, privacy: when recordings and transcripts are processed on your own computer, sensitive material, such as recordings of research discussions, classroom dialogue, or internal meetings, remains entirely under your control. Second, cost: no purchase or subscription is required, and there are none of the usage limits imposed by services that offer only a free trial. Third, customization: a local STT engine can be integrated with other tools running on the same machine, including note-taking systems or locally hosted language models that can summarize, analyze, or organize transcripts automatically.
Developing familiarity with technologies like these empowers our community with practical, low-cost tools that can support teaching, research, and experimentation across disciplines. As speech recognition becomes increasingly accurate and accessible, locally run systems allow faculty, students, and staff to build voice-enabled workflows that respect privacy while expanding the ways we capture, interact with, and share knowledge, advancing the mission of both the Innovation Center and the university as a whole.





