About the author
Jyri Huopaniemi is the Head of Technology Licensing at Nokia Technologies.
Since the launch of the first smartphone over a decade ago, R&D teams, engineers and industrial designers have been locked in an ever-faster race of escalating innovation.
Consumers have watched the technologies in their devices change rapidly. They now have access to computing capabilities that, until recently, were thought impossible to achieve in such a thin form factor. We use artificial intelligence daily to find the best route to work, and we can stream the latest movies almost instantly.
Some of the greatest advances have been in the cameras integrated into these devices. From the grainy images of a little over a decade ago, we can now capture 4K-quality images and video, augmented with AI to produce professional-quality content.
R&D teams have also increasingly recognized the possibilities of integrating improved sensor technology into smartphones. This is evident in current trends in AR and gaming, but also, more and more, in traditional sensors such as microphones. In audio, one of today's goals is to match audio capabilities to the high-definition image and video capabilities of today's cameras.
Audio capabilities that unlock next-generation experiences
Beyond the removal of the traditional audio connector, audio innovation in smartphones has been limited in recent years.
In general, attention has focused on improving overall quality. The audio experience itself, however, has not changed significantly since the days of the Walkman or the MP3 player: stereo at best, often still mono, and a few settings to adjust playback.
Consumers also often rely on buying external hardware to improve playback clarity. In recent years we have seen a positive trend in smart speakers and in better voice and audio quality in smartphones, which is clearly a step in the right direction.
Most improvements in device audio, however, have been limited to the playback of professional content. Innovative R&D teams have the opportunity to rethink the audio experience and bring it in line with what can already be achieved in image and video capture.
By integrating sensory technologies and intelligent software, device manufacturers can radically redesign the audio experience, giving users more control over how they capture audio. Let's look at two examples:
- Smart audio algorithms that allow spatial audio capture can also enable audio zoom. Working like a telephoto lens for audio, the zoom capability lets users isolate and move closer to the desired sound source while suppressing unwanted noise.
- The same technologies can also unlock the ability to dynamically track moving sound sources and to automatically suppress unwanted sounds such as wind noise. Even post-capture editing of the sound scene can now be implemented, giving greater control over a captured scene and creating virtually unlimited possibilities for how we tell our stories.
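As a rough illustration of the "telephoto lens for audio" idea, the sketch below uses delay-and-sum beamforming, a textbook way of steering a microphone array toward one direction. It is not necessarily the method used in any shipping product. The microphone positions, the function names `steering_delays` and `delay_and_sum`, and the toy click signal are all assumptions made for illustration.

```python
import math

SPEED_OF_SOUND = 343.0  # metres per second in air, approximate


def steering_delays(mic_x, angle_deg, sample_rate):
    """Whole-sample delays that align a plane wave arriving from angle_deg.

    mic_x holds hypothetical microphone positions (metres) along one axis;
    the angle is measured from straight ahead, positive toward +x.
    """
    s = math.sin(math.radians(angle_deg))
    # A microphone further toward the source hears the wavefront earlier.
    arrival = [-x * s / SPEED_OF_SOUND for x in mic_x]
    latest = max(arrival)
    # Delay each channel so that it lines up with the latest arrival.
    return [round((latest - t) * sample_rate) for t in arrival]


def delay_and_sum(channels, delays):
    """Shift each channel by its delay, then average the shifted channels."""
    n = len(channels[0])
    out = [0.0] * n
    for samples, d in zip(channels, delays):
        for i in range(d, n):
            out[i] += samples[i - d]
    return [v / len(channels) for v in out]


# Toy capture: the same click reaches microphone 1 two samples before microphone 0.
mic0 = [0.0] * 8
mic1 = [0.0] * 8
mic0[5] = 1.0
mic1[3] = 1.0

focused = delay_and_sum([mic0, mic1], [0, 2])    # steered at the click
unfocused = delay_and_sum([mic0, mic1], [0, 0])  # no steering
print(max(focused), max(unfocused))
```

When the delays match the direction of the click, the two copies add up in phase and the output keeps the full amplitude. Without steering they land on different samples and each is halved by the averaging. Sounds from other directions are attenuated in the same way, which is the basic effect an audio zoom builds on.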
Imagine a father attending his son's school play. Historically, he would have had to contend with poor acoustics and distracting noise from other audience members, while accepting that the quality of the captured sound would be limited by his distance from the stage (not to mention the quiet delivery of nervous young actors).
Today, innovative audio technologies can mitigate these circumstances and give users capabilities they have never had before. Spatial audio capture will reproduce the sound scene during playback, but on its own it neither overcomes ambient noise nor lets the user get closer to the action. This is where audio zoom and tracking come into play.
Using the smartphone's camera interface, a user can now zoom the audio in along with the video while removing the ambient sound of the venue: shuffling chairs, conversations in the hallway and restless children in the audience. In addition, the user can dynamically select and track the key actor, capturing the performance in full, vibrant and crisp detail.
Marriage of hardware and software in the R&D stage
These capabilities are achieved through software that works closely with the hardware. They do not require a significant reinvention of current form factors, but they do require a close relationship between engineers and device designers.
By working with the design team, software engineers and R&D teams can tune the algorithms that capture spatial audio to the specific characteristics of the device's form factor. This close collaboration matters because the placement of the microphones in the smartphone affects the quality of the resulting features. It also determines which capabilities can be achieved at all.
While optimal placement is not always possible without compromising the form factor, this can largely be addressed in the initial stage of R&D. Using acoustic measurements made in the laboratory, the audio algorithms that analyze and process the multiple microphone signals can be calibrated for the specific microphone placement. This goes a long way toward preserving the integrity of the form factor.
These algorithms should also work in harmony with the computing power of the device. This may include integration with AI engines that recognize objects as sound sources, giving users the ability to focus on a particular sound or eliminate distracting background noise.
Democratizing access to immersive audio is only half the equation; making these capabilities easy to use is the other half. An effective user interface is therefore essential: it must be as intuitive as video capture is today. Once again, software designers must work closely with R&D teams and engineers to ensure that these capabilities are simple to use.
The need for truly immersive content
Device manufacturers should consider why and how people use their smartphones to communicate today. In a digital world full of social channels in which we all share our lives daily, the importance of the technology we use to capture and share key moments cannot be overstated.
This is illustrated by the fact that almost 60% of Internet users upload and share videos online today, while almost 80% of all digital video viewers consume this content through smartphones.
Delivering new experiences should not be about reacting to demand. It should be about setting the standard for innovation. The main focus for smartphone manufacturers should be to enable more meaningful ways of connecting with digital media, whether user-generated or professional.
Developing sensor technologies that capture the most realistic picture of our surroundings is key. When we are not immersed in streaming the latest television series, we are the storytellers ourselves. Devices that let us create new levels of immersion, deepening our connections with family, friends and wider audiences, empower us in that role.
Original equipment manufacturers that understand the role of audio in the advancement of digital content will probably be one step ahead of their competitors. They will take the lead in delivering products that offer true market differentiation.
This will become increasingly crucial as new forms of digital content and new technology trends emerge. New mobile technologies such as 5G, along with the evolving capabilities of virtual and augmented reality, are set to unlock ever more immersive experiences. Advanced audio technologies will be a key ingredient in delivering them.