Date: 2024-03-01

What happens when the best-known figure in the development of generative algorithms is in talks with the man who designed the iPhone and a banker obsessed with financing high-impact technological projects?

Media reports say Sam Altman, Jony Ive and Masayoshi Son are working on a project to create a ground-breaking “iPhone of AI”, a permanently connected assistant we can consult through an as-yet-undefined interface.

Jony Ive is apparently obsessed with abandoning the screen, feeling a moral obligation to mitigate the unintended consequences of the iPhone, such as app addiction: he even imposes screen-time limits on his own children. Masayoshi Son controls 90% of Arm Holdings, one of the leaders in the development of low-power microprocessor architectures used in many mobile devices. Finally, Sam Altman recently presented a new version of ChatGPT that can interact by voice and accept images as input.

What could the three be talking about? Getting rid of screens is a good idea: they demand our full attention, as the number of road accidents due to phone use testifies, never mind the annoyance of having to swerve past people staring into their screens as they walk. Given that glasses have enjoyed little success so far, I would imagine they will focus on hearing, on transmitting information through sound.

In which case, headphones are not going to work, because they tend to isolate us from the wider world. One solution would be bone conduction: devices that allow us to capture and receive information without blocking our ear canals, and that are reasonably discreet. They are not widely used, despite having been on the market for a relatively long time, and have mainly been aimed at sports.

A non-intrusive bone conduction headset that could interact with a generative AI assistant, equipped with a camera to accept images as potential inputs, is an interesting and potentially controversial proposition. Cameras are seen as an invasion of other people’s privacy, although we already have examples of glasses that incorporate them, such as Snap’s or Meta’s, not to mention Google Glass and the glassholes who wore them. But the ear, in principle, is reasonably discreet, and hypothetically, with the right microphones, you could almost “whisper” to your assistant and receive a discreet response through bone conduction without interrupting what you are listening to, and without disturbing those around you.

Such a device would be a revolution. How many functions currently entrusted to the smartphone could be efficiently transferred to a sound interface? Reading us messages, assuming the assistant is “smart” enough to know what to read and what not to read? How about instant messaging, giving us directions, reading us articles and news based on our criteria and interests, or even taking photos? Could it create a custom-made ecosystem or platform for apps? How many features could we pack into such a device that would free us from our dependence on the smartphone, without imposing too many restrictions on movement? Does such a device necessarily have to be connected to a smartphone, or could it be autonomous?

Of course, all this is merely speculation, joining up the dots: Altman, Ive and Son could be going in a completely different direction. But here and now, and at the technology’s current level of maturity, the possibility of an ear-worn, voice-activated generative assistant seems to me to have huge potential, even taking into account potential issues such as the addition of facial recognition or many people’s lack of critical judgment.

We may be on the cusp of a reinvention of the dominant interface of recent decades. Whether or not my speculations are correct, this is an interesting project, and one worth pursuing.

This post was previously published on Enrique Dans’ blog.

