AI Voices: New Forms of Scams Are on the Rise

Scammers use artificial intelligence to impersonate family members and demand ransom money.


In 2022, the U.S. public received 50.3 billion unwanted scam and spam calls. For many of us, these are a daily occurrence, and these figures are only set to increase as technology advances. (Photo Credit: Lindsey LaMont / Unsplash)

AI voice technology and its use in malicious activity have been on the rise in recent months. From impersonating kidnapped relatives to demand ransom to posing as loved ones in need of emergency money, malicious actors use voice-cloning technology to artificially replicate the voices of people whose recordings appear online.

In April 2023, an Arizona family reported an unknown caller claiming he had kidnapped the family’s 15-year-old daughter, Brianna. The caller threatened to harm the girl if the family contacted any federal or local authorities, demanding a $1 million ransom in exchange for her return. The mother, Jennifer DeStefano, told CNN that she was confident the voice on the call was her daughter’s, and that what convinced her was how closely it matched the way Brianna cries.

“It was the crying, it was the sobbing. What really got to me is that she’s not a wailer. She’s not a screamer. She’s not a freak out. She’s more of an internal, try-to-contain, try-to-manage person. That’s what threw me off. It was the voice, matching with the crying,” said Jennifer DeStefano.

When DeStefano called 911 for help, dispatchers identified the call as a scam and a virtual kidnapping. Her worries finally came to an end when Brianna herself called her mother to confirm that she was fine and that the phone call had been fabricated.

In the case of this Arizona family, authorities have not yet verified whether AI technology was actually involved, but the likelihood of such attacks in the future is far from zero.

With social media being increasingly accessible, it is not difficult for malicious actors to download recordings of people’s voices online. By tracing a potential target’s digital footprint, a scammer can easily compile voice recordings to use as training data for a cloning model.

When it comes to AI and machine learning, the amount of data needed for training, along with the quality of the resulting output, varies widely. According to Altered, a company dedicated to developing AI voice software for commercial use, even a few seconds of recorded speech may be sufficient to clone someone’s voice. Larger datasets typically produce a more accurate imitation of an individual’s voice, but as the technology advances, smaller datasets can achieve comparable results.

With AI on the rise, more companies are allocating resources to developing better AI and machine learning products. (Photo Credit: Prompt by JPxG, model by Boris Dayma, upscaler by Xintao Wang, Liangbin Xie et al. (Apache License 2.0 <> or BSD <>), via Wikimedia Commons)

Beyond Altered, big tech companies including Microsoft and Google have been developing their own voice-recreation software. According to Microsoft researchers, the company’s VALL-E model can simulate a voice from a recording as short as three seconds. Google’s Text-to-Speech AI, meanwhile, is already available to cloud users with a variety of tuning features.

Although it may seem relatively easy to recognize an AI scam call, this may not be the case for those unfamiliar with AI technology and its potential uses. According to the Federal Trade Commission, elderly individuals may be targeted in family emergency schemes, essentially an AI-powered version of the classic grandparent scam.

In these cases, it may be useful for families to agree on a “safe word” to verify the legitimacy of calls. There are also simpler precautions to take first: consider whether the speaker’s phrasing sounds like something the person would actually say, and try to reach them through their usual means of communication before acting.

Scam calls are not the first instances of AI voice technology being misused. Earlier this year, ElevenLabs, another company developing AI voice technology, saw “an increasing number of voice cloning misuse cases.” Around the same time, 4chan members were found using the technology to create racist material with the voices of popular figures such as Joe Rogan and Emma Watson. ElevenLabs states on its website that its product may not be used with the intent of abuse, and that doing so may result in suspension or termination of the account.

The future of AI and its consequences remain a gray area. Whether governments will enact proper legislation is still unclear. (Photo Credit: mikemacmarketing, CC BY 2.0 <>, via Wikimedia Commons)

However, when used responsibly, AI voice technology can be an effective tool for entertainment. AI voice products can generate convincing recordings for storytelling or podcasts, and many examples can be found on the ElevenLabs website, where members of the community use the product to create their own content. In video games, AI can generate lines for characters without the need for extra recording sessions. Still, the positive and negative impacts of AI on these industries remain up for debate.

In terms of video games, another popular use of AI voice technology involves U.S. presidents. In this trend, AI is used to clone the voices of past and current presidents and feed them into text-to-speech services, making it sound as if the presidents themselves are playing video games. Multiple videos of cloned presidents have been posted to TikTok and YouTube, where figures such as Barack Obama, Donald Trump, and Joe Biden can be heard, and even seen, playing and arguing over games like Minecraft. Since the trend began on TikTok and Twitter earlier this year, the number of these videos has only increased, and the amusement of presidents seemingly playing games has even inspired fans to create petitions around the idea.

Other uses of AI voice technology include recreating music across various genres. With AI music generators such as MusicGen and Audoir, internet users can create original songs that closely imitate a real artist’s voice. Through this, individuals and even artists can produce new compositions at relatively low cost. Drake fans, for instance, have taken advantage of the trend, generating many songs in Drake’s voice from prompts describing their chosen subjects.

However, these AI music generators still have drawbacks. The models remain susceptible to abuse, and the use of existing music as training data may give rise to copyright infringement cases. In addition, their contribution to creativity only extends so far: they cannot replicate the unique creativity of a human artist, or at least not yet.

It is clear that understanding the potential of AI and its effect on society is something we cannot disregard. As it stands, AI remains an active research field, with new developments still to come. As these systems grow more complex, knowing what to expect from them, and how to respond, becomes increasingly important, especially when bad actors are involved.