Topic for Discussion Sessions
High Performance Computing for Generative Artificial Intelligence and Machine Learning
In the past few years, Artificial Intelligence (AI) has proven quite useful in day-to-day operations across many domains, and its use has grown significantly with the advent of Internet of Things (IoT) and drone technologies. In past ScalPerf meetings, we have discussed the nature of the AI problems of the day.
More recently, advances in Generative AI, particularly Large Language Models (LLMs), have lowered the barrier to productive use by professionals in many practices. The public has been deeply fascinated by applications like ChatGPT, and the scientific community has cited the original arXiv paper introducing the Transformer over 123,000 times (as of June 2024). Applications are helping practitioners in many apparently unrelated disciplines, such as law and computer programming.
The success of LLMs has come from the synergistic combination of advances in many areas of computer science, including machine learning, computer architecture (particularly GPUs and memory subsystems), linear algebra, stochastic optimization, symbolic processing, dynamic programming languages, and more. In parallel with these advances, LLMs and machine learning are being used to optimize computer systems and to automate tasks such as program generation from natural language descriptions. Many of these areas have been at the core of ScalPerf talks and discussions for two decades.
In the 2024 meeting, we propose focusing on these technologies, with particular attention to efficient execution. Just as in scientific and technical HPC, the compute power these applications require is vast, so even a small improvement in efficiency yields significant net benefits. Of course, in addition to HPC-scale compute power, major advances in information management, such as vector databases, and closer interaction with the system architecture are essential. Thus, for ScalPerf’24, we propose the following indicative, certainly not exhaustive, list of topics:
Computer and memory architectures, including GPUs, for primary training (pre-training), secondary training (fine-tuning), and inference.
Value of high-dimensional word/token encodings (e.g., Word2Vec); see the embedding sketch after this list.
Performance study of applications such as LLMs.
Potential execution improvements for linear algebra; see the matrix-multiplication sketch after this list.
High-performance runtimes for the relevant linear algebra and special functions.
Runtime support for managing information provenance.
Novel attempts to reduce the required computational load, such as Kolmogorov-Arnold Networks (KANs) replacing MLPs; see the KAN sketch after this list.
Automatic program generation from natural language descriptions.
System optimization using machine-learning techniques such as reinforcement learning.
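To make the discussion of high-dimensional token encodings concrete, below is a minimal sketch of nearest-neighbour lookup over embedding vectors in the spirit of Word2Vec. The vocabulary and the embedding matrix are toy placeholders (random vectors), not a trained model; the dimensionality of 300 simply mirrors a common Word2Vec setting.

```python
# Illustrative sketch: nearest-neighbour lookup over high-dimensional token
# embeddings, in the spirit of Word2Vec. The vocabulary and embedding matrix
# below are toy placeholders, not a trained model.
import numpy as np

rng = np.random.default_rng(0)

vocab = ["king", "queen", "man", "woman", "gpu", "memory"]   # hypothetical vocabulary
dim = 300                                                    # typical Word2Vec dimensionality
E = rng.standard_normal((len(vocab), dim))                   # stand-in for trained embeddings
E /= np.linalg.norm(E, axis=1, keepdims=True)                # unit-normalise the rows

def nearest(word: str, k: int = 3) -> list[tuple[str, float]]:
    """Return the k vocabulary items most cosine-similar to `word`."""
    q = E[vocab.index(word)]
    sims = E @ q                                             # cosine similarity (rows are unit norm)
    order = np.argsort(-sims)
    return [(vocab[i], float(sims[i])) for i in order if vocab[i] != word][:k]

print(nearest("king"))
```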
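For the linear-algebra topics, the following sketch contrasts a naive triple-loop matrix multiplication with NumPy's BLAS-backed matmul, illustrating why optimized kernels and runtimes matter; the problem size is kept small so the naive version finishes quickly.

```python
# Illustrative sketch: why optimised linear-algebra kernels matter. A naive
# triple-loop matrix multiply in pure Python is compared against NumPy's
# BLAS-backed matmul on the same operands.
import time
import numpy as np

n = 128
rng = np.random.default_rng(1)
A = rng.standard_normal((n, n))
B = rng.standard_normal((n, n))

def naive_matmul(A, B):
    """Textbook O(n^3) triple loop, with no blocking or vectorisation."""
    n, m, p = A.shape[0], A.shape[1], B.shape[1]
    C = np.zeros((n, p))
    for i in range(n):
        for j in range(p):
            s = 0.0
            for k in range(m):
                s += A[i, k] * B[k, j]
            C[i, j] = s
    return C

t0 = time.perf_counter(); C1 = naive_matmul(A, B); t1 = time.perf_counter()
C2 = A @ B; t2 = time.perf_counter()

print(f"naive: {t1 - t0:.3f} s, BLAS: {t2 - t1:.6f} s")
print("max abs difference:", float(np.max(np.abs(C1 - C2))))
```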
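For the Kolmogorov-Arnold Network item, here is a schematic, forward-pass-only sketch of a KAN-style layer; it is not the published reference implementation. Each edge applies a learnable univariate function, approximated here as a linear combination of fixed Gaussian bumps, and each output sums its incoming edge functions; a dense MLP layer of the same shape is used only for a rough parameter-count comparison.

```python
# Schematic sketch of a Kolmogorov-Arnold-style layer (forward pass only),
# NOT the reference KAN implementation. Each edge applies a learnable
# univariate function (a weighted sum of fixed Gaussian bumps), and each
# output sums the functions on its incoming edges.
import numpy as np

rng = np.random.default_rng(2)

def kan_layer(x, centers, width, coeffs):
    """x: (batch, d_in); coeffs: (d_in, d_out, n_basis) edge-wise weights."""
    # Evaluate the shared univariate basis on every scalar input:
    # result has shape (batch, d_in, n_basis).
    basis = np.exp(-((x[..., None] - centers) / width) ** 2)
    # Output j sums the per-edge functions phi_{i,j}(x_i) over inputs i.
    return np.einsum("bik,iok->bo", basis, coeffs)

d_in, d_out, n_basis, batch = 4, 3, 8, 5
centers = np.linspace(-2.0, 2.0, n_basis)       # fixed basis-function centres
width = 0.5                                     # fixed basis-function width
coeffs = rng.standard_normal((d_in, d_out, n_basis)) * 0.1

x = rng.standard_normal((batch, d_in))
y = kan_layer(x, centers, width, coeffs)
print(y.shape)  # (5, 3)

# Rough parameter-count comparison with a dense MLP layer of the same shape:
print("KAN-style edge parameters:", d_in * d_out * n_basis)   # 96
print("MLP weights + biases     :", d_in * d_out + d_out)     # 15
```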