Microsoft has been developing a custom artificial intelligence (AI) chip, known internally as Athena, since 2019, The Information reported today. The company is weighing making Athena available for internal use and sharing it with OpenAI as soon as next year.
While industry experts believe Nvidia is unlikely to feel threatened by these developments, they highlight a growing imperative for hyperscale companies to build their own bespoke silicon.
AI chip development in response to a GPU shortage
The chip, akin to those built in-house by tech giants such as Google (TPU) and Amazon (Trainium and Inferentia), is purpose-built for the demanding task of training large language models (LLMs). That matters because advanced generative AI models are growing faster than the compute capacity available to train them, Gartner analyst Chirag Dekate told VentureBeat by email.
Nvidia holds an overwhelmingly dominant position in the AI chip market, with an estimated 88% share, according to Jon Peddie Research. The intense competition to secure its premium A100 and H100 GPUs, each commanding a price tag in the tens of thousands of dollars, has given rise to what can fairly be called a GPU crisis.
Dekate elaborated, stating, “Cutting-edge generative AI models now demand hundreds of billions of parameters, necessitating exascale computational prowess. With next-generation models surpassing trillions of parameters, it comes as no surprise that leading technology innovators are exploring a diverse range of computational accelerators to expedite training while concurrently reducing the associated time and costs.”
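Dekate's point about scale can be roughed out with a widely used rule of thumb: training a dense transformer takes roughly 6 floating-point operations per parameter per token. The sketch below uses that heuristic with illustrative parameter and token counts that are not taken from the article.

```python
# Back-of-envelope training-compute estimate using the common heuristic
# of ~6 FLOPs per parameter per token for dense transformer training.
# Parameter and token counts below are illustrative assumptions.

def training_flops(params: float, tokens: float) -> float:
    """Approximate total training FLOPs for a dense transformer."""
    return 6 * params * tokens

# A hypothetical 175-billion-parameter model trained on 300 billion tokens:
flops = training_flops(175e9, 300e9)
print(f"{flops:.2e} FLOPs")           # 3.15e+23 FLOPs

# At a sustained exaFLOP/s (1e18 FLOPs per second), that workload takes:
days = flops / 1e18 / 86400
print(f"{days:.1f} days")             # 3.6 days
```

Even with exascale hardware, a single training run occupies the machine for days, which is why accelerator cost and availability dominate the economics Dekate describes.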
As Microsoft endeavors to expedite its generative AI strategy while trimming expenses, it is rational for the company to forge a distinctive custom AI accelerator strategy. Such an approach, Dekate suggested, “could enable them to deliver disruptive economies of scale that go beyond the confines of traditional, standardized technology methodologies.”
Tailored AI processors cater to the demand for rapid inference
The importance of acceleration extends to AI chips that support machine learning (ML) inference. Inference applies a trained model's weights to new data in real time to generate results. For instance, compute infrastructure performs inference whenever ChatGPT responds to a natural-language input.
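The training/inference split above can be sketched in a few lines: at serving time the weights are frozen, and each incoming request is simply scored against them. The weights and inputs here are made-up illustrative values, not anything from a real deployed model.

```python
import math

# Minimal sketch of ML inference: weights produced earlier by training
# are frozen, and each real-time request is scored against them.
WEIGHTS = [0.8, -0.3, 0.5]   # illustrative trained weights, fixed at serving time
BIAS = 0.1

def infer(features: list) -> float:
    """Apply the frozen weights to one input (logistic-regression scoring)."""
    z = BIAS + sum(w * x for w, x in zip(WEIGHTS, features))
    return 1 / (1 + math.exp(-z))  # squash to a score between 0 and 1

# Every request reuses the same weights; no learning happens at inference time.
print(infer([1.0, 2.0, 0.5]))
```

Each call is a small, fixed amount of arithmetic, which is why inference typically needs less raw compute per request than training, even though the aggregate volume of requests can be enormous.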
Nvidia is renowned for producing potent, versatile AI chips and provides its parallel computing platform, CUDA, and its derivatives for ML training purposes, as noted by analyst Jack Gold from J Gold Associates in an email to VentureBeat. However, it’s worth mentioning that inference typically demands less computational performance. Hyperscale companies perceive an opportunity to cater to their customers’ inference needs by developing customized silicon solutions.
Gold emphasized that inference is poised to become a significantly larger market than ML training, underscoring the importance for all vendors of offering products tailored to it.
Microsoft’s Athena poses a limited challenge to Nvidia
Gold stated that he doesn’t view Microsoft’s Athena as a significant threat to Nvidia’s dominant position in the field of AI/ML. Nvidia has held sway in this domain since its pivotal role in the deep learning revolution a decade ago. They’ve built a robust platform strategy and a software-centric approach, which has driven their stock value up during the era of GPU-centric generative AI.
He emphasized the importance of hyperscale companies like Microsoft developing their own AI chips tailored to their unique architectures and optimized algorithms (non-CUDA specific) as needs diversify and expand. This isn’t just about managing cloud operating costs but also offering cost-effective alternatives to a wide range of customers who may not require or prefer Nvidia’s higher-priced option. Gold anticipates that all hyperscale companies will continue to invest in developing their own silicon, not just to compete with Nvidia but also with Intel in the broader realm of general-purpose cloud computing.
Dekate also underscored that Nvidia remains a driving force in the world of extreme-scale generative AI development and engineering, and he expects Nvidia to further enhance its leadership in innovative technology, creating competitive distinctions as custom AI ASICs become more prevalent.
However, he pointed out that the last phase of Moore’s law will depend on heterogeneous acceleration, which involves GPUs and application-specific custom chips. This shift has significant implications for the broader semiconductor industry, particularly for technology providers that have yet to meaningfully address the evolving needs of the rapidly advancing AI market.