
It's Time We Prefer Generative AI on Edge Devices

GenAI Nerds

Since generative AI is going to be around, I'd prefer that it ran on device. It would be more private, faster, and cheaper in both GPU spend and actual power than relying on everything running in data centers. So let's set aside the SHOULD-it-be-around debate and discuss HOW it should be run. Of course, we are already seeing GenAI on phones, but I would expand this to edge devices more broadly: IoT devices like single-board computers, puck appliances, hardware running in non-cloud server racks, and consumer "things". My argument:

1. More Private

Running GenAI on edge devices means the processing occurs locally, which dramatically increases privacy. Sensitive data does not need to be transmitted over the Internet to remote servers, reducing the risk of data breaches and unauthorized access. Cloud solutions routinely suffer security incidents and accidental mingling of multi-tenant data, so I think this one is obvious. But also consider that we can redact edge data at the source for the most sensitive human data: our voices, our faces, and our actions.
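To make the redaction point concrete, here is a minimal sketch of redacting data on the device before anything is uploaded. The patterns and pipeline are illustrative assumptions on my part, not a production redactor; a real deployment would run small on-device NER, speech, or vision models instead of regexes:

```python
import re

# Illustrative patterns for on-device redaction. The idea: scrub sensitive
# spans locally so only the redacted text ever leaves the edge device.
PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"),
    "PHONE": re.compile(r"\b\d{3}[-.\s]\d{3}[-.\s]\d{4}\b"),
}

def redact(text: str) -> str:
    """Replace sensitive spans with labels; upload only the result."""
    for label, pattern in PATTERNS.items():
        text = pattern.sub(f"[{label}]", text)
    return text

print(redact("Call 555-123-4567 or email jo@example.com"))
# → Call [PHONE] or email [EMAIL]
```

The same pattern generalizes: blur faces in frames or strip voiceprints locally, then ship only the sanitized payload to the cloud when a cloud hop is unavoidable.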

2. More Responsive

Edge devices have come a long way, and a fleet of new GenAI-capable chips is arriving. Inferencing on the edge eliminates transferring rather fat streaming data to cloud-based systems. This responsiveness is crucial for applications requiring immediate feedback, such as real-time language translation, autonomous driving, and interactive user interfaces.

3. Total Cost of All-Cloud-Based Solutions

Deploying GenAI on edge devices can be more cost-effective in the total picture. It distributes the workload between edge devices and what will assuredly still be done in the cloud. Cloud GPU costs keep rising, and until NVIDIA faces serious competition they will stay high. Additionally, the decreasing price points of advanced hardware make it increasingly feasible to run sophisticated AI models locally. Just this week I saw a demo of real-time speech to text + large language model + text to speech on a sub-$300 computer.

4. Reliability and Availability

Edge devices can operate independently of Internet connectivity, ensuring continuous functionality even in environments with limited or unreliable network access. This reliability is essential for critical applications in remote locations or in scenarios where consistent connectivity cannot be guaranteed.

Final Thoughts

I’m not saying that generative AI should or would ONLY run on edge devices. I’m saying that we are at an inflection point where privacy, responsiveness, and total cost of execution will push systems developers to embed generative AI workloads into edge devices. By running “front-end” tasks like speech to text, text to speech, and vector-style queries of local documents (i.e., the less intensive tasks) on device, we can address these crucial problems.
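One of those front-end tasks, a vector-style query of local documents, can be sketched in a few lines. This is a toy illustration I'm adding, not a specific product: the "embeddings" here are bag-of-words vectors compared by cosine similarity in pure Python, where a real device would use a small local embedding model, but the shape of the workload is the same and nothing leaves the edge:

```python
import math
from collections import Counter

def embed(text: str) -> Counter:
    # Toy bag-of-words "embedding"; a real edge deployment would swap in
    # a small on-device embedding model here.
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[w] * b[w] for w in a)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0

def query_local(docs: list[str], question: str) -> str:
    # Rank the user's local documents entirely on device.
    q = embed(question)
    return max(docs, key=lambda d: cosine(embed(d), q))

docs = [
    "warranty terms for the smart thermostat",
    "setup guide for the living room camera",
]
print(query_local(docs, "how do I set up the camera"))
# → setup guide for the living room camera
```

Only if a heavier generation step is truly needed would the (already retrieved, already redacted) context be handed to a larger model, local or cloud.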

I am a serial product creator and have started 7 technology companies to date. I go where I’m most inspired and of late that’s all about generative AI. Recently I created GenAI Nerds, a community of now 500K+ people that research and discuss the latest topics.
