Internet Archive is using AI to extract words from 100-year-old records

The words from As We Parted At The Gate, a recording from 1915, were successfully extracted by OpenAI’s Whisper tool / Image: Internet Archive

Internet Archive, a digital platform with millions of free ebooks, audiobooks, movies, software, and other cultural artifacts, is actively exploring ways to use AI tools to improve the library.

Recently, the organization experimented with Whisper, a speech recognition tool from OpenAI, the company behind ChatGPT and DALL-E.

The testers wanted to learn whether Whisper would be able to extract spoken and sung words from old, noisy 78rpm records.

The results were promising. For instance, the tool found most of the words in As We Parted At The Gate, a recording from 1915.

All the extracted texts are now available online for free. They will make it easier to understand the 100-year-old Edison recordings that were donated to the Internet Archive by the University of California, Santa Barbara Library.

💬 The recordings and the transfers were so good that the automatic tools were able to make out many of the words.

All the 78rpm recordings are part of the Great 78 Project, a community project for the preservation and discovery of old records dating from 1898 to the 1950s. Currently, over 400,000 carefully remastered recordings are available.
