Etched is one of NVIDIA's competitors in the AI processor market. The startup takes a different approach to chip design, similar to the way ASICs were built for cryptocurrency mining: it specializes in a single type of generative AI model, the transformer. Its chips will not run other kinds of models, but on transformers the company claims performance that is orders of magnitude higher. The Sohu processors it has presented run Llama 70B, and a server built on them is claimed to process 500,000 tokens per second. According to Etched, a server with eight Sohu chips can replace 160 NVIDIA H100 GPUs.
Etched claims that Sohu is the first specialized chip for transformer models. It delivers much higher performance on them than any existing general-purpose hardware, but it cannot run CNNs, LSTMs, SSMs, or any other kind of AI model. The chip is manufactured on TSMC's 4nm process.
The company notes that every major AI product on the market today (ChatGPT, Claude, Gemini, Sora) is based on transformers, and predicts that within a few years every major AI model will run on specialized chips. Etched considers this shift inevitable.
Etched claims that the Sohu processor is more than 10 times faster and cheaper than NVIDIA's next-generation Blackwell (B200) chips. On Llama 70B, one Sohu server generates tokens about 20 times faster than an H100 server (23,000 tokens/s) and about 10 times faster than a B200 server (~45,000 tokens/s). The benchmarks were run in FP8 without sparsity, with 8x model parallelism, an input length of 2048 tokens, and an output length of 128 tokens. The 8xH100 figures were obtained with TensorRT-LLM 0.10.08 (the latest version at the time), and the 8xB200 figures are estimates. "This is the same benchmark used by NVIDIA and AMD," says Etched.
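The ratios can be checked against the quoted throughput figures. The sketch below is only back-of-the-envelope arithmetic on the numbers Etched reports; the function and constant names are illustrative, and it assumes all three figures refer to eight-chip servers.

```python
# Back-of-the-envelope check of the throughput figures quoted by Etched.
# Assumption: every figure is for an 8-chip server running Llama 70B.
SOHU_TOKENS_PER_S = 500_000
H100_TOKENS_PER_S = 23_000
B200_TOKENS_PER_S = 45_000  # approximate, per Etched

CHIPS_PER_SERVER = 8


def speedup(new_tokens_per_s: float, baseline_tokens_per_s: float) -> float:
    """Return how many times higher the new throughput is than the baseline."""
    return new_tokens_per_s / baseline_tokens_per_s


def equivalent_baseline_chips(new_tokens_per_s: float,
                              baseline_tokens_per_s: float,
                              chips_per_server: int = CHIPS_PER_SERVER) -> float:
    """Number of baseline chips needed to match one new server's throughput."""
    return speedup(new_tokens_per_s, baseline_tokens_per_s) * chips_per_server


if __name__ == "__main__":
    print(f"Sohu vs 8xH100: {speedup(SOHU_TOKENS_PER_S, H100_TOKENS_PER_S):.1f}x")
    print(f"Sohu vs 8xB200: {speedup(SOHU_TOKENS_PER_S, B200_TOKENS_PER_S):.1f}x")
    h100s = equivalent_baseline_chips(SOHU_TOKENS_PER_S, H100_TOKENS_PER_S)
    print(f"H100 GPUs matched by one Sohu server: {h100s:.0f}")
```

With these inputs the ratios come out at roughly 21.7x and 11.1x, in line with the rounded "20 times" and "10 times" claims. The 160-GPU figure corresponds to the rounded 20x multiplied by eight GPUs per server; the unrounded figures give about 174.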
Criticizing the general-purpose architecture of GPUs, Etched argues that they are not getting better, only bigger. Over the past four years, their compute density (TFLOPS/mm²) has improved by only about 15%. Next-generation accelerators (NVIDIA B200, AMD MI300X, Intel Gaudi 3, AWS Trainium2, etc.) count two chips as one to "double" their performance. According to the startup, with Moore's Law slowing down, the only way to improve performance is specialization.
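To put the 15% figure in perspective, it can be converted into a compound annual rate. This is a minimal sketch that assumes steady year-over-year growth across the four years, which the article does not state; the function name is illustrative.

```python
def annualized_rate(total_growth: float, years: float) -> float:
    """Compound annual growth rate implied by a total growth over `years`.

    total_growth is a fraction (0.15 means +15% overall).
    """
    return (1.0 + total_growth) ** (1.0 / years) - 1.0


if __name__ == "__main__":
    # Figure quoted by Etched: about 15% over four years.
    rate = annualized_rate(0.15, 4)
    print(f"Implied compute-density growth: {rate:.1%} per year")
```

Under that assumption, 15% over four years works out to roughly 3.6% per year.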
The economic case for specialized chips rests on their relatively low cost compared with the cost of training and running AI. Today, AI models cost over $1 billion to train and tens of billions of dollars to operate. At this scale, a 1% improvement would justify a $50-100 million chip project. According to Etched, ASICs are 10 to 100 times faster than GPUs.
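The arithmetic behind this argument is simple enough to sketch. The total-spend values of $5 billion and $10 billion below are assumptions, chosen because they are the amounts at which a 1% gain equals the quoted $50-100 million; the article itself gives only "over $1 billion" and "tens of billions". The function name is illustrative.

```python
def justified_chip_budget(total_spend_usd: int, improvement_percent: float) -> float:
    """Savings from an efficiency gain, i.e. the most a chip project could
    cost and still pay for itself on this spend alone."""
    return total_spend_usd * improvement_percent / 100


if __name__ == "__main__":
    # Assumed total spend levels (not stated in the article).
    for spend in (5_000_000_000, 10_000_000_000):
        budget = justified_chip_budget(spend, 1)
        print(f"1% of ${spend / 1e9:.0f}B = ${budget / 1e6:.0f}M")
```

If an ASIC really delivered a 10x gain rather than 1%, the same calculation would put the savings far above the cost of the chip project, which is the point Etched is making.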
“When [specialized] Bitcoin miners entered the market in 2014, it became cheaper to throw away GPUs than to use them for mining Bitcoin. With billions of dollars at stake, the same thing is happening with AI ... The architecture that works faster and cheaper on hardware wins.”
As the cost of training runs scales from $1 billion to $100 billion, the risk of testing a new architecture rises sharply. Etched believes that effort should go into improving the efficiency of transformers rather than into scaling alone.
“Once Sohu (and other ASICs) hit the market, we will reach the point of no return. Transformer killers will have to run faster on GPUs than transformers run on Sohu. If this happens, we will create an ASIC for that!”
Etched, which has existed for only two years, was founded by Harvard alumni Gavin Uberti (formerly of OctoML and Xnor.ai) and Chris Zhu. Together with Robert Wachen and Mark Ross, the former chief technology officer of Cypress Semiconductor, they set out to create a chip that would do only one thing: run AI models.
Many startups and tech giants are developing chips that work exclusively with AI models: Meta has MTIA, Amazon has Graviton and Inferentia, and so on. Etched's chips are unique in that they work with only one type of model, the transformer.
“In 2022, we predicted that transformers would take over the world. Now we have reached the point in the evolution of artificial intelligence where specialized chips that outperform general-purpose GPUs are inevitable, and the people making technical decisions in the world know this,” says Uberti, CEO of Etched.
How does Sohu achieve the claimed performance? In several ways, the most obvious being a simplified hardware and software pipeline. Because Sohu does not run non-transformer models, the Etched team can drop the hardware components that transformers do not need, and the same goes for the software stack.
“In short, our future customers will not be able to afford not to switch to Sohu. Companies are ready to bet on Etched because speed and cost are important for the AI products they are trying to create,” says Uberti.
For now, no competitor has gone as far as Etched, but the race is already beginning. If more efficient technologies emerge or other AI model architectures gain traction, the company says it will simply develop a new chip.
Sources: Etched, TechCrunch