New AI-Powered Audio Codec From Meta is Promising 10x Compression Compared to MP3 Format

ODSC - Open Data Science
Dec 7, 2022

Meta, the parent company of Facebook, WhatsApp, and Instagram, announced in late October a new AI-powered audio codec. Called EnCodec, it can reportedly compress audio files to roughly one-tenth the size of an MP3 encoded at 64 kbps, with no loss in quality. If the technique proves reliable, Meta says it could dramatically improve the sound quality of speech on low-bandwidth connections, such as phone calls in areas with weak signals.
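To put the headline figure in concrete terms, here is a small back-of-the-envelope calculation. It takes the article's numbers (an MP3 at 64 kbps and a 10x reduction) at face value; the 3-minute clip length is a hypothetical value chosen only for illustration.

```python
def bitrate_to_bytes(kbps: float, seconds: float) -> float:
    """Size in bytes of an audio stream encoded at a constant bitrate."""
    return kbps * 1000 * seconds / 8


MP3_KBPS = 64.0          # MP3 bitrate quoted in the article
COMPRESSION_FACTOR = 10  # reduction reported by Meta
CLIP_SECONDS = 180       # hypothetical 3-minute clip

mp3_bytes = bitrate_to_bytes(MP3_KBPS, CLIP_SECONDS)
codec_kbps = MP3_KBPS / COMPRESSION_FACTOR
codec_bytes = bitrate_to_bytes(codec_kbps, CLIP_SECONDS)

print(f"MP3 at {MP3_KBPS:g} kbps: {mp3_bytes / 1e6:.2f} MB")
print(f"10x smaller ({codec_kbps:g} kbps): {codec_bytes / 1e6:.3f} MB")
```

Under these assumptions, the same clip that takes about 1.44 MB as an MP3 would need about 0.144 MB at one-tenth the bitrate.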

The technology debuted in a paper titled “High Fidelity Neural Audio Compression,” authored by Meta AI researchers Alexandre Défossez, Jade Copet, Gabriel Synnaeve, and Yossi Adi, who summarized their research in a blog post. In the post, the team emphasizes the importance of compression technology, particularly for video, image, and audio files. Compression enables higher-quality multimedia experiences online on platforms such as Netflix and YouTube. But how exactly does this work?

According to Meta, the method is a three-part system trained to compress audio to a desired target size. First, the encoder transforms the uncompressed data into a lower-frame-rate “latent space” representation. Next, the “quantizer” compresses that representation to the target size while keeping track of the most critical information needed to rebuild the original signal. Finally, the decoder, a neural network, turns the compressed data back into audio in real time on a single CPU.
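The shape of that three-stage pipeline can be sketched in a few lines of Python. This is only a toy illustration, not Meta's implementation: EnCodec's encoder and decoder are trained neural networks, whereas the stand-ins below use simple averaging, uniform rounding, and sample repetition. All function names and parameters (`hop`, `bits`) are hypothetical.

```python
import math
from typing import List


def encode(samples: List[float], hop: int) -> List[float]:
    """Toy encoder: average every `hop` samples into one latent value,
    producing a lower frame rate than the input."""
    latents = []
    for start in range(0, len(samples), hop):
        frame = samples[start:start + hop]
        latents.append(sum(frame) / len(frame))
    return latents


def quantize(latents: List[float], bits: int) -> List[int]:
    """Toy quantizer: map each latent in [-1, 1] to one of 2**bits integer codes."""
    levels = 2 ** bits - 1
    codes = []
    for x in latents:
        x = max(-1.0, min(1.0, x))  # clamp to the supported range
        codes.append(round((x + 1.0) / 2.0 * levels))
    return codes


def decode(codes: List[int], bits: int, hop: int) -> List[float]:
    """Toy decoder: turn codes back into values and repeat each one `hop` times."""
    levels = 2 ** bits - 1
    out = []
    for c in codes:
        value = c / levels * 2.0 - 1.0
        out.extend([value] * hop)
    return out


# A short sine wave stands in for uncompressed audio.
signal = [math.sin(2 * math.pi * i / 64) for i in range(256)]
latents = encode(signal, hop=4)
codes = quantize(latents, bits=4)
restored = decode(codes, bits=4, hop=4)
print(len(signal), "samples ->", len(codes), "codes of 4 bits each")
```

The toy version loses detail that a listener would notice; the point of training the real encoder, quantizer, and decoder together is to spend the limited bits on the information that matters most for reconstruction.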

The key to all of this is the codec’s use of discriminators. They allow files to be compressed as much as possible without losing the critical elements of the signal that make it both recognizable and distinctive. As the researchers put it: “The key to lossy compression is to identify changes that will not be perceivable by humans, as perfect reconstruction is impossible at low bit rates. To do so, we use discriminators to improve the perceptual quality of the generated samples. This creates a cat-and-mouse game where the discriminator’s job is to differentiate between real samples and reconstructed samples. The compression model attempts to generate samples to fool the discriminators by pushing the reconstructed samples to be more perceptually similar to the original samples.”
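The cat-and-mouse game can be made concrete with a pair of loss functions. The sketch below uses a hinge-style adversarial loss, which is one common choice for this kind of training; the article does not say which loss Meta uses, so treat the formulas and function names as illustrative assumptions. Here a "score" is the discriminator's output for one sample, with higher meaning "looks real."

```python
from typing import List


def discriminator_loss(real_scores: List[float], fake_scores: List[float]) -> float:
    """Hinge loss for the discriminator: it is penalized when real samples
    score below +1 or reconstructed samples score above -1."""
    real_term = sum(max(0.0, 1.0 - s) for s in real_scores) / len(real_scores)
    fake_term = sum(max(0.0, 1.0 + s) for s in fake_scores) / len(fake_scores)
    return real_term + fake_term


def generator_loss(fake_scores: List[float]) -> float:
    """Hinge loss for the compression model: it is penalized when its
    reconstructions fail to score at least +1 with the discriminator."""
    return sum(max(0.0, 1.0 - s) for s in fake_scores) / len(fake_scores)


# Discriminator is winning: real audio scores high, reconstructions score low.
print(discriminator_loss([2.0], [-2.0]), generator_loss([-2.0]))

# Compression model is winning: reconstructions now score as high as real audio.
print(discriminator_loss([2.0], [1.5]), generator_loss([1.5]))
```

Each side's loss falls only when the other side's rises, which is what pushes the reconstructed audio toward being perceptually indistinguishable from the original.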

Though compressing music and audio files isn’t exactly new, this AI-powered technology could, in theory, improve the quality of communication in regions where signal quality is poor. That would not only open the door to greater internet accessibility across the globe, but could also relieve networks that experience bottlenecks when they are overburdened.

Originally posted on OpenDataScience.com

