Introduction

<aside>
Our Koios platform relies on a powerful artificial intelligence (AI) model that has been honed and improved over the last two years. Drawing on cutting-edge research into speech representation, speaker recognition, natural language processing, and, of course, psychology, we created the most powerful engine on the market for predicting someone's personality from their voice.

</aside>

As you surely know, our AI model leverages audio recordings to predict personality scores. While the underlying architecture is complex, it primarily works by extracting and processing two types of signals from the audio: linguistic and acoustic (see Figure 1 👇).

Figure 1: Magic 🔮 behind the Koios algorithm 🪄

Decoding Linguistic Signals: What is Being Said?

Our model's processing journey begins with linguistic signals. These are essentially the words and phrases used by the speaker in the audio recording. The process starts with a critical cleaning and preprocessing stage, where we eliminate background noise, pauses, and irrelevant sounds, ensuring the data is free from distortions or errors that could skew the subsequent analysis.
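
To make this concrete, here is a minimal sketch of the kind of pause and silence removal this stage performs, using the open-source librosa library. The threshold and the file name `interview.wav` are assumptions for illustration, not details of the production pipeline.

```python
import numpy as np
import librosa

def strip_silence(path: str, top_db: float = 30.0):
    """Load a recording and drop silent stretches (pauses, dead air)."""
    y, sr = librosa.load(path, sr=None)                   # keep the native sample rate
    intervals = librosa.effects.split(y, top_db=top_db)   # detect non-silent regions
    cleaned = np.concatenate([y[start:end] for start, end in intervals])
    return cleaned, sr

audio, sr = strip_silence("interview.wav")  # hypothetical input file
```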

The cleaned linguistic signals are then fed into an automatic speech recognition (ASR) layer. ASR technology is designed to convert spoken language into written text. It's a crucial component of our system as it allows us to analyse the spoken words on a granular level, examining the language usage and context in detail.
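
Koios's ASR layer itself is proprietary, but the step can be sketched with an open checkpoint via the Hugging Face transformers pipeline; the Whisper model named here is a stand-in, not our production model.

```python
from transformers import pipeline

# Any pre-trained ASR checkpoint would do; Whisper is only an example.
asr = pipeline("automatic-speech-recognition", model="openai/whisper-small")

transcript = asr("interview.wav")["text"]  # hypothetical input file
print(transcript)
```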

Upon the successful transcription of the audio into text, the next stage involves the extraction of linguistic and contextual features. This includes a broad spectrum of attributes such as the choice of words, the complexity of sentences, the use of active or passive voice, the presence of specific terminologies or phrases, and even the context inferred from the speech.

These features help us understand the speaker's language patterns, their semantic preferences, and how they use language in context. By analysing these elements, we can draw correlations between linguistic patterns and specific personality traits.
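
As an illustration, a handful of such features can be computed from a transcript with spaCy. The specific features below are examples chosen for clarity, not the actual Koios feature set.

```python
import spacy

nlp = spacy.load("en_core_web_sm")  # assumes the small English model is installed

def linguistic_features(transcript: str) -> dict:
    doc = nlp(transcript)
    sentences = list(doc.sents)
    words = [t for t in doc if t.is_alpha]
    return {
        "n_words": len(words),
        "avg_sentence_length": len(words) / max(len(sentences), 1),
        "type_token_ratio": len({t.lower_ for t in words}) / max(len(words), 1),
        "passive_markers": sum(t.dep_ == "nsubjpass" for t in doc),  # crude passive-voice proxy
    }
```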

The culmination of this process is the creation of a linguistic dataset. This dataset contains all the linguistic features extracted from the audio, serving as one of the key input sources for our final models that predict personality scores.
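
Conceptually, the dataset is just one feature row per recording. A toy assembly with pandas might look like this, where `transcripts` is a hypothetical mapping from recording IDs to ASR output:

```python
import pandas as pd

# transcripts: {recording_id: transcript_text}, produced by the ASR step above
rows = [{"recording_id": rid, **linguistic_features(text)}
        for rid, text in transcripts.items()]
linguistic_df = pd.DataFrame(rows).set_index("recording_id")
```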

The Art of Listening: How is it Being Said?

While what is being said is important, how it's being said carries equal weight. The way people speak, including their tone, pitch, volume, and speed, provides a wealth of information about their personalities. That's where our acoustic signal processing comes into play.

First, we convert the raw audio recording into a unified format by normalising it. This normalisation step keeps our analysis from being biased by the varying sound quality or volume levels of different recordings.
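
A minimal version of this step, assuming librosa and a 16 kHz target rate chosen purely for illustration, could resample each file and scale it to a unit peak:

```python
import numpy as np
import librosa

def normalise(path: str, target_sr: int = 16_000) -> np.ndarray:
    """Resample to a common rate and peak-normalise the waveform."""
    y, _ = librosa.load(path, sr=target_sr, mono=True)  # resamples on load
    peak = np.max(np.abs(y))
    return y / peak if peak > 0 else y
```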

Next, like linguistic signals, acoustic signals undergo thorough cleaning and preprocessing. Here we remove any unwanted sounds like background noise or distortions that might interfere with the extraction of our desired features. This step is crucial for maintaining the quality and accuracy of the signals that we will subsequently analyse.
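
One common way to approximate this step, shown purely as a sketch, is spectral-gating noise reduction with the open-source noisereduce package; our production cleaning is more involved than a single call.

```python
import noisereduce as nr

# Spectral gating; stationary=False lets the noise estimate adapt over time.
denoised = nr.reduce_noise(y=audio, sr=sr, stationary=False)
```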

Once cleaned, we extract voice biomarkers from the audio. These biomarkers are characteristics of the speaker's voice that have been scientifically validated as correlates of specific personality traits. For instance, the pace at which someone speaks, the range of their vocal pitch, their speech rhythm, and even their momentary pauses can reveal telling insights about their personality.
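
For instance, pitch-based biomarkers of this kind can be sketched with librosa's pYIN pitch tracker. The three quantities below are illustrative; the biomarkers Koios actually uses are broader than this trio.

```python
import numpy as np
import librosa

def voice_biomarkers(y: np.ndarray, sr: int) -> dict:
    f0, voiced, _ = librosa.pyin(
        y, fmin=librosa.note_to_hz("C2"), fmax=librosa.note_to_hz("C7"), sr=sr
    )
    f0 = f0[~np.isnan(f0)]  # keep voiced frames only
    return {
        "pitch_mean_hz": float(f0.mean()) if f0.size else 0.0,
        "pitch_range_hz": float(f0.max() - f0.min()) if f0.size else 0.0,
        "voiced_ratio": float(np.mean(voiced)),  # rough proxy for pausing behaviour
    }
```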

To extend our analysis beyond these biomarkers, we also feed the audio recording through a set of pre-trained audio models. These models are designed to extract a wide array of additional acoustical features, enriching our understanding of the speaker's vocal profile.
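
As a stand-in for those internal models, a self-supervised speech encoder such as wav2vec 2.0 can produce a fixed-length embedding per recording; the open checkpoint below is an example, not the model we deploy.

```python
import torch
from transformers import Wav2Vec2FeatureExtractor, Wav2Vec2Model

extractor = Wav2Vec2FeatureExtractor.from_pretrained("facebook/wav2vec2-base")
model = Wav2Vec2Model.from_pretrained("facebook/wav2vec2-base")

inputs = extractor(denoised, sampling_rate=16_000, return_tensors="pt")
with torch.no_grad():
    hidden = model(**inputs).last_hidden_state       # shape: (1, frames, 768)
embedding = hidden.mean(dim=1).squeeze(0).numpy()    # one vector per recording
```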

All of these elements collectively form an acoustic dataset, a treasure trove of voice-based insights that complement the linguistic dataset. Together, these datasets provide a comprehensive and nuanced understanding of the speaker, which feeds into our final models that predict personality scores.
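
In tabular terms, this amounts to joining the two feature tables on a shared recording identifier; `acoustic_df` here is a hypothetical frame built analogously to `linguistic_df` above.

```python
# Inner join keeps only recordings present in both feature tables.
features = linguistic_df.join(acoustic_df, how="inner")
```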

The Symphony of Signals: Crafting Personality Scores

After the diligent processing and extraction of both linguistic and acoustic signals, the resultant datasets serve as the input to our final models. These models represent the culmination of our unique approach to personality prediction, weaving together the linguistic and acoustic threads into a comprehensive profile of the speaker's personality.
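
While the final Koios models are proprietary, the overall shape of this last step can be sketched as supervised regression from the combined features to trait scores. The gradient-boosted regressor and the `y_openness` labels below are illustrative assumptions, not our actual architecture.

```python
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import cross_val_score

# One regressor per personality trait; y_openness is a hypothetical vector of
# questionnaire-derived scores aligned with the rows of `features`.
model = GradientBoostingRegressor()
scores = cross_val_score(model, features.values, y_openness, cv=5, scoring="r2")
print(f"cross-validated R^2 for openness: {scores.mean():.2f}")
```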