Cerebras challenges Nvidia by launching an AI inference service

Cryptopolitan · 2024/08/26 16:00
By Aamir Sheikh

In this post: Cerebras, an innovative chip maker, has introduced its own AI inference service. The company will use its latest Wafer Scale Engine chips, which it says are faster than traditional GPUs. Cerebras is offering the service at a much more affordable price of 10 cents per million tokens.

Cerebras Systems announced an AI inference solution for developers on Tuesday. The company claims it is 20 times faster than Nvidia’s offerings.

Cerebras will provide access to its outsized chips to run AI applications, which, according to the company, are also cheaper than Nvidia GPUs. Industry-standard Nvidia GPUs are often accessed through cloud service providers to run large language models such as ChatGPT; for many small firms, that access is expensive and hard to obtain.

Cerebras claims its new chips deliver performance beyond GPUs

AI inference is the process of running an already trained AI model to get an output, such as chatbot answers or the results of other tasks. Inference services are the backbone of today’s AI applications, which depend on them for day-to-day operation.
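To make the term concrete, here is a minimal sketch of inference using the open-source Hugging Face transformers library and a small pretrained model. This is not Cerebras’s service; it only illustrates what “running a trained model to get an output” means:

```python
# Minimal illustration of AI inference: run an already trained model
# to get an output. Uses the open-source Hugging Face "transformers"
# library and the small GPT-2 model; this is NOT Cerebras's service.
from transformers import pipeline

# Load a pretrained text-generation model (training already happened elsewhere).
generator = pipeline("text-generation", model="gpt2")

# The inference step: feed a prompt, receive generated text.
result = generator("AI inference is", max_new_tokens=20)
print(result[0]["generated_text"])
```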

Cerebras said inference is the fastest-growing segment of the AI industry, accounting for about 40% of all AI-related workloads in cloud computing. In an interview with Reuters, Cerebras CEO Andrew Feldman said the company’s outsized chips deliver a level of performance that GPUs cannot reach.

He added,

“We’re doing it at the highest accuracy, and we’re offering it at the lowest price.” Source: Reuters.

The CEO said that existing AI inference services do not satisfy every customer. He told a separate group of reporters in San Francisco that the company is “seeing all sorts of interest” in faster, more cost-effective solutions.

Until now, Nvidia has dominated the AI computing market with its gold-standard chips and its Compute Unified Device Architecture (CUDA) programming environment, whose vast array of tools has helped lock developers into Nvidia’s ecosystem.

Cerebras chips have 7,000 times more memory than Nvidia H100 GPUs

Cerebras said its high-speed inference service is a turning point for the AI industry. The company’s new chips, which are as big as dinner plates, are called Wafer Scale Engines. They can process 1,000 tokens per second, a leap the company compared to the introduction of broadband internet.

According to the company, the new chips deliver different throughput for different AI models: up to 1,800 tokens per second for Llama 3.1 8B and 450 tokens per second for Llama 3.1 70B.

Cerebras is offering inference at 10 cents per million tokens, less than GPU-based services typically charge. Alternative approaches are generally believed to trade accuracy for performance, but Cerebras claims its new chips maintain accuracy at this speed.
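A quick back-of-the-envelope check puts these figures in perspective. The throughput and price below are the article’s numbers; the workload size is an arbitrary example, not from the article:

```python
# Back-of-the-envelope math from the figures quoted in this article.
# Throughput and price come from Cerebras's claims; the workload size
# below is an arbitrary example.
PRICE_PER_MILLION_TOKENS = 0.10  # USD, per Cerebras's quoted pricing
TOKENS_PER_SECOND_70B = 450      # claimed throughput for Llama 3.1 70B

workload_tokens = 1_000_000      # example workload: one million tokens

cost = workload_tokens / 1_000_000 * PRICE_PER_MILLION_TOKENS
seconds = workload_tokens / TOKENS_PER_SECOND_70B

print(f"Cost: ${cost:.2f}")             # -> Cost: $0.10
print(f"Time: {seconds / 60:.1f} min")  # -> Time: 37.0 min
```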

Cerebras said it will offer its AI inference products in several forms. The company plans to introduce an inference service via its cloud, accessed with a developer key. The firm will also sell the new chips to data center customers and to those who want to operate their own systems.
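For developers, a cloud service accessed with a developer key typically looks like a standard HTTP API call. The sketch below assumes an OpenAI-style chat-completions endpoint; the base URL, environment variable, model name, and response shape are illustrative assumptions, not details confirmed by this article:

```python
# Hypothetical sketch of calling a cloud inference service with a
# developer key. The endpoint URL, model name, and response shape are
# assumptions modeled on common OpenAI-style APIs, not details taken
# from this article.
import os
import requests

API_KEY = os.environ["INFERENCE_API_KEY"]          # hypothetical developer key
BASE_URL = "https://api.example-inference.com/v1"  # placeholder URL

resp = requests.post(
    f"{BASE_URL}/chat/completions",
    headers={"Authorization": f"Bearer {API_KEY}"},
    json={
        "model": "llama3.1-70b",  # one of the models named in the article
        "messages": [{"role": "user", "content": "Hello"}],
    },
    timeout=30,
)
resp.raise_for_status()
print(resp.json()["choices"][0]["message"]["content"])
```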

The new Wafer Scale Engine chips have their own integrated cooling and power-delivery modules and come as part of a Cerebras data center system called the CS-3. According to various reports, the CS-3 system is the backbone of the company’s inference service.

The system boasts 7,000 times more memory capacity than Nvidia H100 GPUs. This also addresses the fundamental problem of memory bandwidth, which many chipmakers are trying to solve.

Cerebras is also working on becoming a publicly traded company; to that end, it filed a confidential prospectus with the Securities and Exchange Commission (SEC) this month.
