Computer architecture is all about balance. Nail it, and you’ve got yourself a rock-solid, one-of-a-kind platform with some seriously cool features and a lot of developers. Maybe even a $1T market cap. But if you trip up even a little, you could end up sinking years of work into a pricey silicon paperweight. That’s why we need codesign.
Over the past six years, I’ve worn different hats as a hardware microarchitect, architect, and compiler lead. I’ve been in the trenches at two wildly different companies building datacenter AI accelerators: a big tech firm (Google) and a startup (SambaNova). And I’ve worked on two equally diverse AI accelerator architectures: Tensor Processing Units (TPUs) at Google’s CI2 team and Reconfigurable Dataflow Units (RDUs) at SambaNova.
This journey has hammered home my belief in the power of codesign and the need for a documentation-first mindset. It doesn’t matter if you’re a startup hustling in a garage or a multinational corporation: agile codesign and documentation go hand-in-hand. Doing it right can seriously raise the quality of your tech stack; getting it wrong is just another way for your domain-specific architecture (DSA) to fail in the market.
Traditional hardware is built using the good old waterfall development method. Your benchmarks are known up front and change slowly; your ISA is stable and has decades of legacy behind it; and you have baseline performance, power, and cost numbers from your existing chip and those of your competitors. Architects plan the feature set to take advantage of new semiconductor technologies and apply new microarchitecture tricks for their target workloads.
With a competent chip team, everything hopefully flows smoothly downstream from there, and eventually you tape out. The software folks continue to fine-tune their mature compiler and runtime stack, relying upon a stable hardware roadmap, an extensive community, and plentiful reference code. The chip gets brought up, bugs are worked out, and things are scaled into production. If you got 10%-20% better performance or battery life than your competitors, congratulations, you’ve got a winner!
But let’s face it: in the world of Domain-Specific Architectures (DSAs), which need to keep up with rapidly evolving workloads (particularly for AI), the waterfall method just doesn’t work very well anymore. New applications emerge weekly, every vendor has their own proprietary stack, and Dennard scaling is dead. You have to get clever about what applications to accelerate and be willing to make hard choices to get to market quickly.
It’s December 2023. Do you really think you can accurately predict today what AI will look like in 2027, and that you can build the best accelerator for it with the waterfall method? Look how much progress has happened in the year since the release of ChatGPT! The old rules just don’t apply, and we need to come up with some new ones. We need to be able to iterate on our stacks a lot more quickly and fail fast when we’re wrong.
That’s where good codesign methods and tooling come into play. DSAs are not just about building a flawless piece of hardware that packs a lot of TFLOPS and GB/s; lots of people in the industry know how to do that. You’ve also got to make it as easy as pie for your compiler and runtime engineers to build a top-notch stack, and quickly, because they are supporting applications that are in constant motion. And you’ve got to tape out faster than ever. If you do this, you’ll have a chip that is easy to program and you’ll get to market faster with a higher-quality stack than your competition.
In the next few posts, I’ll be describing how to drive better codesign using a documentation-as-code methodology.