
Audio Quality

Snom Know-How: Audio

How does real-time digital audio transmission work?

One of the central basic needs of human beings is communication.

We communicate by means of gestures, facial expressions and (written) signs, but above all through spoken language. Spoken language has an advantage over gestures and signs: it consists of several components (choice of words, emphasis) and can therefore convey the meaning of what is said more clearly. Even a slight pause before a word or a slightly different emphasis can turn a statement into its opposite. It is therefore not so important to see the other person in a conversation: the content is primarily conveyed through the sound and the words.

It is therefore all the more important to transmit conversations exactly as they originally sound - without delay, without distortions - just as if the speaker were sitting next to you.

And this is precisely where the first hurdle lies in telephony, and even more so in digital telephony, because anything is possible here, for better or for worse!




Who thinks about what actually goes on inside the phone when they pick up the receiver to make a call? In this video, we would like to give you an understanding of the complicated processes involved in real-time communication. Everything starts with the spoken word. A microphone in the handset picks up the sound pressure of the spoken word and converts it into an analog electrical signal. The amplitude of this signal represents the volume of the spoken word, and the rate at which it oscillates represents the pitch (frequency).

In the next step, this signal must be digitized. To do this, the waveform is measured at regular intervals (sampled), and each measured amplitude is "quantized" (formed into quanta): it is rounded to the nearest of a fixed number of levels. Each level has a unique, specific value that is stored as a binary number, i.e. as a sequence of 1s and 0s.

In this way, the original wave is mapped onto a grid in which the sequence of values resembles the shape of the original analog waveform.

To lose as little information as possible when digitizing the signal, two things are crucial: the sampling rate, i.e. how finely the grid divides the signal in time, and the number of quantization levels, i.e. how finely it divides the amplitude.
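The digitization step described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the 1,000 Hz test tone, the 8 kHz sampling rate, the 8-bit resolution and the function name `sample_and_quantize` are chosen for the example and are not taken from any Snom product.

```python
import math

def sample_and_quantize(freq_hz, sample_rate_hz, bits, duration_s):
    """Sample a sine tone and map each sample to one of 2**bits levels."""
    levels = 2 ** bits
    n_samples = int(round(sample_rate_hz * duration_s))
    codes = []
    for n in range(n_samples):
        t = n / sample_rate_hz                    # time of the n-th sample
        x = math.sin(2 * math.pi * freq_hz * t)   # "analog" value in [-1, 1]
        # Map [-1, 1] onto the integer levels 0 .. levels-1 (clip the top edge).
        code = min(levels - 1, int((x + 1) / 2 * levels))
        codes.append(code)
    return codes

# One millisecond of a 1 kHz tone, sampled 8,000 times per second at 8 bits.
codes = sample_and_quantize(freq_hz=1000, sample_rate_hz=8000,
                            bits=8, duration_s=0.001)
print(len(codes), codes)
```

A higher sampling rate produces more samples per second, and more bits produce finer amplitude steps. Both increase fidelity and the amount of data.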

But digitization is not the end of the story - the signal must now be processed, compressed and, above all, made ready for transmission. Of course, all this must be done without interruption and in real time - and extremely efficiently.

The codec

The codec not only digitizes the signal, it also compresses it, reducing the amount of data without degrading the signal too much. At the remote station, the codec decodes the signal again and converts it back into an analog signal that we can understand.

Compression plays an important role in real-time communication. After all, the signal must arrive at the remote station in real time and without interruption. At the same time, however, the lossy compression used here always means some loss of quality: part of the original signal is cut away.

The example of MP3 shows that compression is not always a bad thing. With this type of compression, signals that are masked by other, louder signals are simply cut away. This means less information and therefore smaller, more efficient files.

Codecs in IP telephony work in a similar way to create the smallest and most efficient data packets possible. Here's how it works:

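The codec samples the audio signal at a sampling rate of 8,000 Hz, i.e. once every 125 microseconds, and creates a sample each time. Each sample is then compressed to 8 bits. In addition, the frequency range is reduced, for example from the original 15,000 hertz to 3,000 - 3,400 hertz, which is still sufficient for intelligible speech. At 8,000 samples per second and 8 bits per sample, this results in a data stream of 64 kbit/s.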
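The figures above can be checked with a short calculation. The sketch below also shows one common way of fitting a sample into 8 bits, namely logarithmic (mu-law style) companding. Note the assumptions: the text does not name a specific codec, the 20 ms packet length is a typical but assumed value, and the continuous mu-law formula used here is a simplification, not a bit-exact codec implementation.

```python
import math

SAMPLE_RATE_HZ = 8000
BITS_PER_SAMPLE = 8
PACKET_MS = 20   # assumed packet duration, not stated in the text

sample_interval_us = 1_000_000 / SAMPLE_RATE_HZ            # 125.0 microseconds
bitrate_bps = SAMPLE_RATE_HZ * BITS_PER_SAMPLE             # 64000 bit/s
samples_per_packet = SAMPLE_RATE_HZ * PACKET_MS // 1000    # 160 samples
payload_bytes = samples_per_packet * BITS_PER_SAMPLE // 8  # 160 bytes

MU = 255  # companding parameter of the mu-law curve

def mu_law_compress(x):
    """Map x in [-1, 1] to [-1, 1], giving quiet sounds finer resolution."""
    sign = -1.0 if x < 0 else 1.0
    return sign * math.log1p(MU * abs(x)) / math.log1p(MU)

def mu_law_expand(y):
    """Inverse of mu_law_compress."""
    sign = -1.0 if y < 0 else 1.0
    return sign * math.expm1(abs(y) * math.log1p(MU)) / MU

def encode_8bit(x):
    """Compress a sample and store it as an integer code 0..255."""
    return round((mu_law_compress(x) + 1) / 2 * 255)

def decode_8bit(code):
    """Turn an 8-bit code back into an approximate sample value."""
    return mu_law_expand(code / 255 * 2 - 1)

print(sample_interval_us, bitrate_bps, payload_bytes)
print(encode_8bit(0.01), decode_8bit(encode_8bit(0.01)))
```

The logarithmic curve spends more of the 256 available codes on quiet signals, where the ear is more sensitive to errors, than on loud ones.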


Real-time communication

After the signal has been compressed, optimized and divided into small data packets, the packets must be sent to the receiver as efficiently and quickly as possible. Several protocols work together for this purpose. SIP (Session Initiation Protocol) sets up and controls the communication session between two or more participants and negotiates the communication modalities, such as the codec to be used. The audio data itself is not carried by SIP: it is transported by other protocols, namely (S)RTP and UDP.
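As a rough illustration of what negotiating the communication modalities can look like: SIP messages commonly carry a session description in SDP format that lists the offered codecs. The sketch below parses such a description. The user name, addresses and port are made-up example values, and `offered_audio_codecs` is a hypothetical helper, not part of any SIP library.

```python
# Example session description; all values are illustrative.
SDP_OFFER = """\
v=0
o=alice 2890844526 2890844526 IN IP4 192.0.2.10
s=-
c=IN IP4 192.0.2.10
t=0 0
m=audio 49170 RTP/AVP 0 8
a=rtpmap:0 PCMU/8000
a=rtpmap:8 PCMA/8000
"""

def offered_audio_codecs(sdp):
    """Return (port, [(payload_type, encoding), ...]) for the audio stream."""
    port = None
    payload_types = []
    names = {}
    for line in sdp.splitlines():
        if line.startswith("m=audio "):
            # m=audio <port> <transport> <payload type> <payload type> ...
            fields = line.split()
            port = int(fields[1])
            payload_types = [int(pt) for pt in fields[3:]]
        elif line.startswith("a=rtpmap:"):
            # a=rtpmap:<payload type> <encoding name>/<clock rate>
            pt, encoding = line[len("a=rtpmap:"):].split(" ", 1)
            names[int(pt)] = encoding
    return port, [(pt, names.get(pt, "unknown")) for pt in payload_types]

port, codecs = offered_audio_codecs(SDP_OFFER)
print(port, codecs)
```

The answering side would pick one of the offered codecs, after which both ends know which port to send the audio packets to and how to decode them.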

Communication in Internet telephony must take place in real time, i.e. the previously created data packets must arrive at the recipient as quickly as possible and, above all, at the right time, so that they can be unpacked and converted back. The task of preparing the packets for transmission is performed by RTP (Real-time Transport Protocol). Similar to the postal service labelling a parcel, this protocol adds the information needed to decode the packets later, such as a sequence number and a timestamp, so that they can be sent as efficiently as possible.
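To make the "labelling" concrete, here is a minimal sketch of how an RTP packet can be assembled: a 12-byte fixed header (version, payload type, sequence number, timestamp, source identifier) followed by the audio payload. The helper names and the example values (payload type 0, SSRC `0x1234`, 160 bytes of audio) are assumptions for illustration. Header extensions, padding and contributing sources are left out.

```python
import struct

RTP_VERSION = 2

def build_rtp_packet(payload_type, seq, timestamp, ssrc, payload, marker=False):
    """Prepend a minimal 12-byte RTP header to an audio payload."""
    byte0 = RTP_VERSION << 6   # version 2, no padding, no extension, no CSRC
    byte1 = (int(marker) << 7) | (payload_type & 0x7F)
    header = struct.pack("!BBHII", byte0, byte1,
                         seq & 0xFFFF,            # 16-bit sequence number
                         timestamp & 0xFFFFFFFF,  # 32-bit media timestamp
                         ssrc & 0xFFFFFFFF)       # 32-bit source identifier
    return header + payload

def parse_rtp_packet(packet):
    """Split a packet built by build_rtp_packet back into its fields."""
    byte0, byte1, seq, timestamp, ssrc = struct.unpack("!BBHII", packet[:12])
    return {
        "version": byte0 >> 6,
        "marker": bool(byte1 >> 7),
        "payload_type": byte1 & 0x7F,
        "seq": seq,
        "timestamp": timestamp,
        "ssrc": ssrc,
        "payload": packet[12:],
    }

# 160 bytes = 20 ms of 8-bit audio at 8,000 samples per second (assumed).
packet = build_rtp_packet(payload_type=0, seq=1, timestamp=160,
                          ssrc=0x1234, payload=b"\xff" * 160)
fields = parse_rtp_packet(packet)
print(len(packet), fields["seq"], fields["timestamp"])
```

The sequence number lets the receiver put packets in order and notice gaps, and the timestamp tells it when each packet's audio should be played.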

Once all packets are prepared, UDP (User Datagram Protocol) takes over. UDP is a minimal, connectionless network protocol that belongs to the transport layer of the Internet protocol suite. It enables the fast transmission of data packets via computer networks and the Internet.
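What "connectionless" means in practice can be seen in the following sketch, which sends a single datagram from one UDP socket to another on the local machine using Python's standard `socket` module. The function name `send_over_loopback` and the payload are assumptions for the example. A real phone would send to the address and port negotiated for the call.

```python
import socket

def send_over_loopback(payload):
    """Send one UDP datagram to a local receiver and return what arrived."""
    receiver = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    receiver.bind(("127.0.0.1", 0))   # port 0: let the OS pick a free port
    receiver.settimeout(1.0)          # do not wait forever for a lost packet
    sender = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        # No connection setup and no handshake: the datagram is simply sent.
        sender.sendto(payload, receiver.getsockname())
        data, _addr = receiver.recvfrom(2048)
        return data
    finally:
        sender.close()
        receiver.close()

received = send_over_loopback(b"one audio packet")
print(received)
```

There is no step in which the sender learns whether the datagram arrived. That saves time, which is exactly what real-time audio needs.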

However, UDP works in one direction only, i.e. it sends data packets without waiting for confirmation and never learns whether they have arrived completely at the recipient. This is also why the packets were made as small and efficient as possible in the previous steps: if a packet cannot be sent in time or does not reach the recipient completely, it is considered lost and is not sent again.
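Because the transport itself does not report losses, the receiver has to notice them. A minimal sketch, assuming each packet carries a 16-bit sequence number that wraps around from 65535 to 0 (as RTP sequence numbers do): the hypothetical helper `find_missing` compares the numbers that arrived with the numbers that were expected.

```python
def find_missing(received_seqs, first_seq, count):
    """Return the expected sequence numbers that never arrived.

    first_seq is the first expected number, count the number of packets
    expected. Numbers wrap around after 65535 (16-bit counter).
    """
    expected = [(first_seq + i) % 65536 for i in range(count)]
    seen = set(received_seqs)
    return [seq for seq in expected if seq not in seen]

# Five packets were expected, starting at 65534; the one numbered 0 was lost,
# and the remaining packets arrived slightly out of order.
missing = find_missing([65535, 65534, 2, 1], first_seq=65534, count=5)
print(missing)
```

With small packets, each gap costs only a few milliseconds of audio, which is usually far less noticeable than waiting for a retransmission.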

If the data packets have arrived completely and on time at the recipient, the whole process is repeated in reverse order. The required data packets are combined and unpacked, the digital audio signal is converted back into an analog signal and output via the loudspeaker integrated in the handset.
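The "combine and unpack" step on the receiving side can be sketched as follows. The helper `reassemble` is hypothetical and makes several assumptions: packets are identified by a wrapping 16-bit sequence number, every packet carries the same number of payload bytes, and a lost packet is replaced by silence (here the byte `0xFF`, which represents near-zero amplitude in mu-law coded audio). Real devices use more elaborate buffering and loss concealment.

```python
SILENCE_BYTE = 0xFF  # assumed filler: near-zero amplitude in mu-law audio

def reassemble(packets, first_seq, count, frame_bytes):
    """Put received payloads in order and fill gaps with silence.

    packets is a list of (sequence_number, payload) pairs in arrival order.
    """
    by_seq = {seq: payload for seq, payload in packets}
    silence = bytes([SILENCE_BYTE]) * frame_bytes
    stream = bytearray()
    for i in range(count):
        seq = (first_seq + i) % 65536        # 16-bit sequence counter
        stream += by_seq.get(seq, silence)   # lost packet -> silence
    return bytes(stream)

# Packet 1 was lost and packet 2 arrived before packet 0.
arrived = [(2, b"cc"), (0, b"aa")]
audio = reassemble(arrived, first_seq=0, count=3, frame_bytes=2)
print(audio)
```

The resulting byte stream would then be decoded sample by sample and handed to the digital-to-analog converter that drives the loudspeaker.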
