21 - 25 April 2024
National Harbor, Maryland, US
Conference 13058 > Paper 13058-14

Latency-aware service placement for GenAI at the edge

On demand | Presented live 22 April 2024

Abstract

In the rapidly evolving landscape of artificial intelligence, Large Language Models (LLMs) and Generative AI (GenAI) have emerged as front-runners in shaping the next generation of intelligent applications that require human-like data generation. While their capabilities have shown transformative potential in centralized computing environments, there is a growing shift towards decentralized edge AI models, where computations are orchestrated closer to data sources to provide immediate insights, faster response times, and localized intelligence without the overhead of cloud communication. For latency-critical applications such as autonomous driving, GenAI at the edge is vital, allowing vehicles to instantly generate and adapt driving strategies based on ever-changing road conditions and traffic patterns. In this paper, we propose a latency-aware service placement approach designed for the seamless deployment of GenAI services on edge cloudlets. We represent a GenAI service as a Directed Acyclic Graph (DAG), where GenAI operations are the nodes and the dependencies between these operations are the edges. We propose an Ant Colony Optimization approach that guides the placement of GenAI services at the edge based on the capabilities of cloudlets and network conditions. Through experimental validation, we achieve notable GenAI performance at the edge with lower latency and efficient resource utilization. This advancement is expected to drive innovation in the field of GenAI, paving the way for more efficient and transformative applications at the edge.
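
The abstract describes the approach only at a high level, so the following is a minimal illustrative sketch of the general idea, not the authors' algorithm. It models a GenAI service as a DAG of operations and uses a basic Ant Colony Optimization loop to assign each operation to a cloudlet so that the critical-path latency is small. Every name and number here is an assumption made for illustration: the operation names, the `WORK`, `SPEED`, `CAPACITY`, and `LINK_MS` values, the latency model (compute time plus a fixed delay when dependent operations sit on different cloudlets), and the ACO parameters.

```python
import random

# Hypothetical GenAI service as a DAG: operation -> operations it depends on.
DEPS = {
    "tokenize": [],
    "encode": ["tokenize"],
    "retrieve": ["tokenize"],
    "decode": ["encode", "retrieve"],
}
WORK = {"tokenize": 1.0, "encode": 8.0, "retrieve": 3.0, "decode": 10.0}  # assumed compute units
SPEED = {"c0": 2.0, "c1": 4.0, "c2": 1.0}  # assumed cloudlet speed (units per ms)
CAPACITY = {"c0": 2, "c1": 2, "c2": 2}     # assumed max operations per cloudlet
LINK_MS = 5.0                              # assumed delay between two different cloudlets


def topo_order(deps):
    """Return operations ordered so that each one follows its dependencies."""
    order, seen = [], set()

    def visit(op):
        if op in seen:
            return
        seen.add(op)
        for dep in deps[op]:
            visit(dep)
        order.append(op)

    for op in deps:
        visit(op)
    return order


def latency(placement):
    """Critical-path latency (ms) of the DAG under a given op -> cloudlet placement."""
    finish = {}
    for op in topo_order(DEPS):
        # An operation starts once all inputs have arrived; crossing cloudlets costs LINK_MS.
        ready = max(
            (finish[d] + (LINK_MS if placement[d] != placement[op] else 0.0)
             for d in DEPS[op]),
            default=0.0,
        )
        finish[op] = ready + WORK[op] / SPEED[placement[op]]
    return max(finish.values())


def aco_place(n_ants=20, n_iters=50, rho=0.1, alpha=1.0, beta=2.0, seed=0):
    """Basic ACO: ants build placements guided by pheromone and cloudlet speed."""
    rng = random.Random(seed)
    ops = topo_order(DEPS)
    cloudlets = list(SPEED)
    tau = {(op, c): 1.0 for op in ops for c in cloudlets}  # pheromone per (op, cloudlet)
    best, best_lat = None, float("inf")

    for _ in range(n_iters):
        solutions = []
        for _ in range(n_ants):
            placement, load = {}, {c: 0 for c in cloudlets}
            for op in ops:
                # Only cloudlets with spare capacity are candidates.
                feasible = [c for c in cloudlets if load[c] < CAPACITY[c]]
                weights = [tau[op, c] ** alpha * SPEED[c] ** beta for c in feasible]
                choice = rng.choices(feasible, weights=weights)[0]
                placement[op] = choice
                load[choice] += 1
            lat = latency(placement)
            solutions.append((lat, placement))
            if lat < best_lat:
                best, best_lat = placement, lat
        # Evaporate pheromone, then reinforce each ant's choices in proportion to quality.
        for key in tau:
            tau[key] *= 1.0 - rho
        for lat, placement in solutions:
            for op, c in placement.items():
                tau[op, c] += 1.0 / lat
    return best, best_lat


if __name__ == "__main__":
    placement, lat = aco_place()
    print(placement, round(lat, 2))
```

This sketch ignores contention between operations that share a cloudlet and treats the network delay as a single constant; a realistic model would draw both from measured cloudlet capabilities and network conditions, as the abstract indicates.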

Presenter

Bipul Thapa
Univ. of Delaware (United States)
Bipul Bikram Thapa received a Bachelor of Engineering degree in Computer Engineering from Kathmandu University, Nepal, in 2019. He subsequently gained industry experience, rising to the position of Senior Software Engineer at Leapfrog Technology Inc., Nepal. He is currently working towards a PhD degree at the Department of Computer and Information Sciences, University of Delaware, Newark, Delaware. His research interests include edge computing, cloud computing, distributed systems, and the Internet of Things.
Application tracks: AI/ML
Presenter/Author
Bipul Thapa
Univ. of Delaware (United States)
Author
Lena Mashayekhy
Univ. of Delaware (United States)