TomVDB 2 years ago

With no virtual memory, no caches, and interface processors instead of direct access to external DRAM, this thing must be a programming nightmare?

Having tons of small CPUs with fast local SRAM is of course not a new idea. Back in 1998, I talked to a startup that believed it could replace standard cell ASIC design with tiny CPUs that had custom instruction sets. (I didn't believe it could: it's extremely area inefficient and way too power hungry for that kind of application. The startup went nowhere.) And the IBM Cell is indeed an obvious inspiration.

But AFAIK, the IBM Cell was hard to program. I've seen PS3 presentations where it was primarily used as a software defined GPU, because it was just too difficult to use as a general purpose processor.

Now NOT being a general purpose processor is the whole point of Dojo, so maybe they can make it work. But from my limited experience with CUDA, virtual memory and direct access to DRAM are a major plus, even if the high-performance compute routines make intensive use of shared memory. The fact that an interface processor is involved (how?) in managing your local SRAM must make synchronization much more complex than with CUDA, where everything is handled by the same SM that manages the calculations: your warp issues a load, it waits on a barrier, the calculation happens (sometimes in a side unit, in which case you again wait on a barrier), you offload the data and wait on a barrier. And while one warp waits on a barrier, another warp can take over. It's pretty straightforward.

The Dojo model suggests that "wait on a barrier" becomes "wait on the interface processor".
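
For concreteness, the pattern I'm describing looks roughly like this (a minimal sketch; the kernel name, buffer names and the trivial computation are made up for illustration):

    // Launched with 256 threads per block. Stage a tile from DRAM into
    // shared memory, barrier, compute, barrier, offload, all orchestrated
    // by the same SM that runs the math.
    __global__ void scale_tile(const float* in, float* out, int n, float k) {
        __shared__ float tile[256];            // fast on-chip SRAM
        int i = blockIdx.x * blockDim.x + threadIdx.x;

        if (i < n) tile[threadIdx.x] = in[i];  // the warp issues a load...
        __syncthreads();                       // ...and waits on a barrier

        float v = k * tile[threadIdx.x];       // the calculation happens

        __syncthreads();                       // wait on a barrier again
        if (i < n) out[i] = v;                 // offload the data
    }

While warps sit at a __syncthreads(), the SM simply schedules other warps; under the Dojo model, the load and offload steps would instead apparently have to be negotiated with an interface processor.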

  • modeless 2 years ago

    If it only ever runs one program, and that program is an implementation of vanilla Transformers, that might be all it needs to be useful. Sufficiently large Transformers can do an incredible variety of tasks. If someone invents something better than vanilla Transformers, then they can write a second program for that.

  • ThrownAllTheWay 2 years ago

    Also, investing in a branch predictor when the intended workload doesn't seem at all scalar is a confusing choice to me. And the 362 F16 TFLOPS sounds super impressive, except the memory bandwidth is I think 800 GB/s (or is it 5 times that? Or effectively less than that if data has to be passed along multiple hops? I'm a bit confused), which means having to do 1000 ops (or 200? or more?) on each 16-bit value loaded in. Maybe you could do that, but it feels like you'd probably end up bandwidth-bound most of the time.
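
    As a rough sanity check (assuming the 800 GB/s figure and 2 bytes per F16 value): $\frac{362 \times 10^{12}\ \text{FLOP/s}}{(800 \times 10^9\ \text{B/s}) / (2\ \text{B})} \approx 905$ ops per value loaded, which is where my "1000 ops" comes from; at five times the bandwidth it drops to roughly 180, hence the "or 200?".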

    • the8472 2 years ago

      My understanding is they occasionally load weights into SRAM, then pump training data in at the sides of the die and have multiple cores operate on a wavefront of data. So the cores don't compete for host memory bandwidth, because the same data flows (transformed) through multiple cores.

  • rowanG077 2 years ago

    You are right that this won't work well with any language that assumes a "normal" processor. But a small language that is written for it could be fine.

  • tibbydudeza 2 years ago

    From my understanding the Cell was meant to be the GPU for the PS3, but Sony ran into the same issues and could not produce a reasonably performing SDK for it within the time limits (set by the MS Xbox 360), so they added in an Nvidia RSX GPU.

    Another oddball architecture that went nowhere.

    • ethbr0 2 years ago

      > could not produce a reasonably performing SDK for it within the time limits

      It feels like "within the time limits" has always been the problem of difficult-to-program, software-dependent architectures: time vs competitors.

      E.g. in the time it takes to write an intelligent compiler (IA-64), your better-resourced competitor (because they're getting revenue from the current market) has surpassed your performance via brute evolutionary force.

      There are use cases out there (early supercomputing, NVIDIA) where radical, development-heavy architectures have been successful, but they generally either lacked a competitor (the former) or iterate ruthlessly themselves (the latter).

      • tibbydudeza 2 years ago

        "radical, development-heavy architectures" = niche use case

        Connection Machine = only had one customer afaik (NSA)

        Transmeta - interesting technology but nobody in that market wanted to run anything besides Windows+x86.

  • rwallace 2 years ago

    Sounds to me like a programming dream. The usual way of things these days is 'don't waste your employer's time trying to optimize; everything that can profitably be done has already been done by other people; you just have to accept that particular part of your skill set is useless'. Dojo would let you actually use a lot more of your skills.

  • pyinstallwoes 2 years ago

    What programming when it's a model being run?

buildbot 2 years ago

It is incredibly odd Tesla decided to invest in this level of engineering; even perfect hardware is, like, 1/5th the battle. Now you get to stand up the entire software stack, make it performant, and hope you can compete with whatever Nvidia has in the pipeline.

  • torginus 2 years ago

    Let me play the devil's advocate - when you have a $1T company, investing a couple million into something like this signals to investors that you have big plans for the future and have a leg up on the competition in the self-driving space.

    Even if that increases the stock price by 1%, this investment pays for itself a hundred times over.

    • lclarkmichalek 2 years ago

      Setting up a hardware platform of this kind that is usable in production costs significantly more than a couple of million.

      • ricardobeat 2 years ago

        Still a blip for a company that was toying with $1B in Bitcoin.

    • baq 2 years ago

      this is more like a couple hundred million TBH

      • dragontamer 2 years ago

        Hundred million for the chip alone, a few hundred million for software (compiler, assembler, router, protocols, communications, load balancing, server, supercomputer utilization, power management).

        The $100-million chip might very well be the easy part of this project...

        • sinenomine 2 years ago

          As a software person, it baffles me how we tend to overestimate our value compared to hardware. The chip design field is really demanding, especially with deep submicron processes and a rigorous testing & verification culture that blows almost everything software organizations do out of the water.

          I doubt developing a compiler for this chip will be as costly as your estimate, especially given mature open-source compiler frameworks like LLVM (already adapted to similar TPU-like architectures) and ML-specific compiler frameworks like the upcoming PyTorch Glow and Apache TVM. And especially given that in our industry there are people with advanced degrees who are willing to jump through additional hoops and even accept pay cuts, just to work on a cool compiler.

          I respect the hardware for what it is and what it could enable.

  • jillesvangurp 2 years ago

    I think they developed this during the years when all the Nvidia hardware was gobbled up by crypto miners. The resulting pricing probably did not help either.

    Tesla is big enough that they can afford to experiment a little. This sounds like an experiment that actually worked. Once they had it working, doing more of it and iterating on it is just business as usual for them.

    • cyber_kinetist 2 years ago

      > This sounds like an experiment that actually worked.

      Are you sure? We haven’t even seen the chip in any physical form yet. We haven’t seen it running any code.

      • seunosewa 2 years ago

        The self-driving AI they use this chip for appears to be less than impressive from my perspective. Its mistakes seem to come up in the news quite frequently.

        • johnsimer 2 years ago

          Dojo’s D1 chip for training AFAIK is not being used yet. Tesla still trains with their cluster of ~7,000 A100s.

          And the D1 is for their training machine anyway, not the inference chip in their cars.

        • jillesvangurp 2 years ago

          Are you aware of a better implementation though? It might not be perfect; it's still miles ahead of competing systems. Yes, it makes mistakes. But that's a rather poor argument to dismiss the whole strategy and technology.

          From my point of view, Tesla is converging on good enough more rapidly than anyone else in the industry. Courtesy of their AI strategy and technology, including the in-house developed chips.

          • friendzis 2 years ago

            > Tesla is converging on good enough

            This is a big problem, though. Any ADAS is a safety-critical system, and while "good enough" can help push volumes, it is extremely bad from a safety perspective. If you really want to read up on this, u/adamjosephcook has some awesome writing going into far deeper detail on this particular issue than I could.

            • mavhc 2 years ago

              In terms of most lives saved, I'd bet Tesla's AI is winning: all those people it automatically stopped from driving into rivers, etc.

  • SuperscalarMeme 2 years ago

    Not to mention that in-house silicon is all about economies of scale, which makes this an even more puzzling move.

    • kjksf 2 years ago

      Tesla has a path to economies of scale: they already announced that if Dojo works as expected they'll make it available to others as an AWS-style service.

      Which is brilliant: they might end up making money on this.

      AI is clearly here to stay. The demand for AI training will clearly explode in the future.

      Running training in-house is not easy or cheap. You don't just plug in 1000 NVIDIA GPUs. You need a massive up-front payment for the GPUs, and you're basically running your own extremely energy-hungry datacenter.

      Tesla might build and operate massive datacenters. They'll use as much as they need internally and sell the remaining capacity to others.

      This might take 5 years but the path to do it is clear.

      • scoopertrooper 2 years ago

        I don’t see how they’re going to commercialise this as a cloud compute service.

        For one, they’ve built a chip that operates in a fundamentally different way to other chips. So any other company that wanted to use it would have to invest a considerable amount of resources in building up the institutional knowledge to use it effectively.

        Additionally, the lack of virtual memory and multi-tasking support renders it pretty much impossible to divide up compute between multiple customers. So, commercialising this would require customers renting out the whole unit, which is contrary to how cloud computing usually works.

        Are there companies out there that have the capital and use cases necessary to fit into Dojo Cloud? Maybe, though not one I’ve worked for. Would they trust the stable genius currently heading up Tesla enough to make such an investment? Perhaps, but I wouldn’t, but what do I know?

        • avianlyric 2 years ago

          > Additionally, the lack of virtual memory and multi-tasking support renders it pretty much impossible to divide up compute between multiple customers. So, commercialising this would require customers renting out the whole unit, which is contrary to how cloud computing usually works.

          Only if you want to subdivide the compute on each Dojo chip. You can still provide multi-tenant support by allocating entire Dojo chips to a single customer at a time. Even traditional time-division multi-tasking is possible, as long as you’re happy to accept multi-second time slices. Then the overhead of clearing an entire Dojo chip (or batch of chips) and setting up a new application isn’t too high.

          If you’re doing AI workloads, then none of the above is an issue. Training a large net takes days to weeks of continuous, single-task computation. So selling Dojo access in whole one-hour blocks is a perfectly reasonable thing to do.

        • kjksf 2 years ago

          Part of the plan is to have PyTorch compatibility.

          Dojo has its own IR, but they also have a PyTorch-to-Dojo compiler.

          People's opinion of Musk won't matter: either Dojo will be a capable service at a good price or it won't.

          People will use it based on merits.

          • croes 2 years ago

            Reliability is an important factor here, and I don't mean the technology. Things don't look so good for anything that has to do with Musk: one thing today, another tomorrow.

            • rcMgD2BwE72F 2 years ago

              >Things don't look so good for anything that has to do with Musk: one thing today, another tomorrow

              Such as? Except for FSD, his record is unmatched AFAIK when you take into account the novelty / complexity / difficulty.

              One example, and certainly his main achievement: back in early 2014 he said Tesla would produce and sell half a million cars by 2020, and they hit that number with 93.6% accuracy. https://youtu.be/BwUuo6e10AM?t=156

              • michaelt 2 years ago

                Some of Musk's stuff is great - other stuff isn't.

                SpaceX? Great. Starlink? Sounds neat. Tesla? Pioneered electric cars with respectable performance and range.

                But on the other hand, where's the Hyperloop? Where's the affordable tunnelling? Where's the $35k Tesla - not available for order on the website, that's for sure. Where's the miniature submarine for rescuing children trapped in caves? Why has my buddy in Europe been waiting over a year for his Powerwall to be delivered? Why are these Norwegian Tesla owners on hunger strike? Where's the full self driving, with taxi service? Why on earth would anyone want to buy Twitter?

                Makes it very difficult to know which of Musk's statements are just spitballing, which are unrealistic timescale guesses and which can be relied on.

                Getting any serious project architecturally 'locked in' to a special type of CPU you can only get from Tesla would be a bold move.

                • croes 2 years ago

                  It's simple: SpaceX, Tesla and Starlink are evolutions of existing technology.

                  FSD, Hyperloop and such would be revolutions like out of a sci-fi movie. They all fail because Musk would like to have those things, but in reality they are much more complicated than he says.

              • croes 2 years ago

                How is the Las Vegas tunnel going? Or the brain implant?

                He is good at marketing and developing existing technology.

                But of his announced revolutions, none works.

                • baq 2 years ago

                  yes, you still write checks to pay online and rocket boosters to this day are single-use $100M pieces of hardware that we throw away into the ocean after each use.

                  seriously. musk is shady and weird, more so in the last couple of years, but come on.

                  • croes 2 years ago

                    It's about the perception of Musk. He built some successful companies, but he is not the Tony Stark people tend to see him as. He is a salesman, not a genius inventor.

                    Don't expect FSD in the near future and don't expect a Mars colony.

                    • baq 2 years ago

                      FSD, especially the way he was describing it, was a blunder. the Mars city is a pipe dream... but I absolutely understand why it makes people follow him - he's the only one who set out to build a private space company with the purpose to actually get to Mars, with the side effect of completely uprooting the space launch industry. the achievements are undeniable, but it's the vision that makes the perception be so surprisingly good still. there's literally nobody else who says things like that and has the means to even try.

            • alpaca128 2 years ago

              Reliability? Name one major cloud service where you can count on not getting randomly banned overnight. And yet people still use them.

              • croes 2 years ago

                Like I wrote, it's not about the technology but its chief.

                Next week he tweets he will take the service offline to buy AWS, and then calls it off. That kind of reliability.

        • rbanffy 2 years ago

          > I don’t see how they’re going to commercialise this as a cloud compute service.

          The simplest, most obvious would be “give me your datasets and we’ll train your model”.

      • justapassenger 2 years ago

        I assume you never actually built any cloud infrastructure yourself. Plus Tesla (aka Elon), well, says a lot of stuff, not all of it necessarily correct.

        An internal research product is super far from any actual production usage. Especially if you go against some established paradigms, which requires an enormous amount of effort (more than developing the silicon) to build tooling around, so people can design, program, debug, and monitor it.

        But that’s internal usage. Cloud is a totally different ballgame. You have to deal with thousands more requirements (and you cannot generally tell a customer to do something else instead, as you can with internal teams). And customers have operating procedures totally different from yours, zero access to your internal knowledge, and infinitely less tolerance for BS answers (they are paying customers, not someone in the same boat).

        Building cloud is extremely hard, and there’s a reason why Google is still losing money on it.

        Plus, let’s even say that your 5-year estimate is correct, Dojo is amazing and the future of tech, and they may have a viable product by then. Do you think that Nvidia won't advance their AI offering by then? Google TPU will stop being developed? Or will Tesla continue investing to churn out a new generation of Dojo every year?

        • rbanffy 2 years ago

          > You have to deal with thousands more requirements (and you cannot generally tell a customer to do something else instead, as you can with internal teams).

          You can. AWS started with S3 when everyone was using databases. As long as it’s cheaper than its competition, single use-case (you won’t serve a website on these) has a market.

          • arinlen 2 years ago

            > You can. AWS started with S3 when everyone was using databases.

            AWS started when there was no competitor.

            Google started with a ton of world-class expertise when AWS was already up and running, while operating a colossal network of server farms using special-purpose hardware (none of which Tesla has), and after all these years they barely got a 10% market share.

            • rbanffy 2 years ago

              What they want is a training engine that is cheaper than whatever AWS or Google (or anyone else) can offer. If I can point my PyTorch to it instead of an AWS GPU for less money, why not?

              • arinlen 2 years ago

                > What they want is a training engine that is cheaper than whatever AWS or Google (or anyone else) can offer.

                Bold assumption, considering Tesla's hardware does not exist, the market is limited, and Google already has years of experience providing machine learning services with special-purpose hardware.

                • robertlagrant 2 years ago

                  What doesn't exist?

                  • tsimionescu 2 years ago

                    Their hardware. They have, at best, 1 supercomputer (though it's not actually clear to me if they have more than some Dojo prototypes). That does not a cloud make.

                    • robertlagrant 2 years ago

                      Ah yes, I see that now. But assuming they make the computer, they could also lease it to one or more cloud providers as a service. They don't necessarily have to build the whole thing.

                      • rbanffy 2 years ago

                        Kind of what Cray/HPE does with Microsoft's Azure - you could get your very own Cray to run your workloads.

                        Sadly, not a very interesting one running UNICOS or NOS/VE...

      • arinlen 2 years ago

        > Tesla has a path to economies of scale: they already announced that if Dojo works as expected they'll make it available to others as an AWS-style service.

        "If we manage to put together a working processor, supporting hardware, OS, and possibly ad-hoc programming language, our next step is to also develop a bunch of web services to provide cloud hosting services."

        Not very credible. As if the key to offering competing cloud hosting services is developing the whole hardware and software stack.

        • avereveard 2 years ago

          And network infrastructure, isolation between customers, scheduling hardware allocation, etc. etc. Running one's own data center is quite different from inviting all sorts of third parties in.

          • doctor_eval 2 years ago

            Yeah, but it’s not like this is rocket science or anything.

            • arinlen 2 years ago

              > Yeah, but it’s not like this is rocket science or anything.

              The key difference between this goal and SpaceX is that Elon Musk bought a private space company that already had the know-how and the market presence in a market with virtually no competitor.

              In this case, Tesla is posting wild claims about vertically integrating the whole tech stack, stopping barely short of mining semiconductor raw materials, and with what goal? Competing with the likes of AWS, Google, and Microsoft in a very niche market?

              Digging holes in the ground is hardly rocket science as well.

              • jstclair 2 years ago

                He did what? The pre-existing know-how to build reusable rockets? Are you confusing SpaceX and Tesla?

                • gitfan86 2 years ago

                  Ever since Elon became the world's richest man, people like this have shown up. I don't know if they have been misinformed or if they just want to say negative things about billionaires. But the early history of SpaceX is very well documented in the book Liftoff, if anyone wants to know the truth.

      • FireBeyond 2 years ago

        You say that Tesla might do this for others, AWS style.

        Then you talk about the upfront in-house costs of setting up for GPU ops, but ignore that if an AWS-style model works for you, well, AWS is already capable of giving it to you in GPUs.

    • ip26 2 years ago

      They aren't going after economies of scale. If you look closely at their design choices, they are building a pure scale-out vector machine unlike anything else currently on the market. I'm guessing they expect it to be head & shoulders ahead for their in-house workload.

      • rbanffy 2 years ago

        Cerebras could decide to compete in that space.

  • dragontamer 2 years ago

    Dojo is rumored to be 7nm, to boot. They'll be competing with a process node disadvantage: TSMC is down to 3nm and 5nm already.

    AMD chips like MI250x are 5nm with tensor matrix multiplication units. NVidia Hopper will be like 4nm IIRC?

    • kjksf 2 years ago

      During the first AI Day, Tesla already said that they have plans and ideas for 10x improvements for Dojo v2. And I'm sure one of the improvements is to use a 5nm or 3nm process.

      They've already been working on this for almost 7 years. This is not some side project but a serious operation. They have to ship something sometime, and they decided to ship this now.

      I'm confident that they know about the 5nm process and chose 7nm for good reasons.

      There will be next version and next process.

      • sbierwagen 2 years ago

        >And I'm sure one of the improvements is to use a 5nm or 3nm process.

        Implementing a process shrink is not just scaling the masks by the appropriate percentage. It often is a completely different optical train, at a different wavelength, different pellicle, changing the refractive index of the immersion fluid, different multiple patterning. It takes months (years?) of work.

        For many applications it's worth it, but it's not at all a slam dunk. The vast majority of ASICs are designed for a particular node and never upgraded. It's the kind of crazy long-term speculative capital investment TSLA might have indulged in when the stock was at $414 a share, but it's nowhere remotely near that today.

        They're already engaging in layoffs, today. Why spend the money?

        • rbanffy 2 years ago

          > vast majority of ASICs are designed for a particular node and never upgraded

          They are, if the new node is cheap enough to justify the investment in shrinking the design. In most ASICs, being faster won’t make people rip out their embedded electronics for new models.

    • panick21_ 2 years ago

      Node is not everything; cheaper production and larger availability matter.

      You both have limited availability of the node and limited availability of the particular chip.

      Plus you are paying that company's margin.

      • dragontamer 2 years ago

        If you are running a machine at near 100% utilization (as supercomputers should be), then your biggest cost will be power.

        So power efficiency becomes king, and process node is the best way to minimize power consumption.

  • etaioinshrdlu 2 years ago

    As far as I can tell Tesla doesn't even use Dojo seriously yet. The real work is still done on NVIDIA hardware.

    • kjksf 2 years ago

      Yes, per the talk Dojo is at the stage of having made first chips.

      But your comment is neither here nor there.

      What do you think is a timeline for designing a chip like Dojo from scratch?

      Tesla has been working on this for almost 7 years (https://www.linkedin.com/in/ganesh-venkataramanan-99272a3/).

      They might not work fast enough for your satisfaction, but can you point to any other car company designing chips like Dojo?

      Hell, can you point to any company at all that can come out of the gate with a chip design competitive with the best design of NVIDIA, a company that makes nothing but chips?

      After 7 years of work they are clearly confident that this will work, or else they would scrap the project instead of giving talks at conferences.

      • FireBeyond 2 years ago

        > After 7 years of work they are clearly confident that this will work, or else they would scrap the project instead of giving talks at conferences.

        They are clearly confident that FSD will work, to the point of promising it "this year"...

        Coincidentally, they've been promising "this year" for the last 7 years. And FSD still fails in spectacular/hilarious/horrifying ways, and isn't _remotely_ close.

      • p1esk 2 years ago

        > Hell, can you point to any company at all that can come out of the gate with a chip design competitive with the best design of NVIDIA, a company that makes nothing but chips?

        Google.

        • TaylorAlexander 2 years ago

          Sounds like Tesla is in good company then.

          • p1esk 2 years ago

            Yes. In addition to Nvidia and Google it will also have to compete with Intel, Cerebras, SambaNova, Graphcore, and maybe even AMD.

            • TaylorAlexander 2 years ago

              The point was that very few companies can pull off a clean sheet chip design from nothing. Google has done it, but Google is an elite company. So saying Tesla isn't the only company that has done it because Google has done it only shows that Tesla is doing very well.

              But it seems clear to me that the point isn't to compete with these other companies, but to vertically integrate a critical component of their systems. They can discard a lot of legacy concerns and focus on raw power.

              The thing is that if Tesla just uses Nvidia, like everyone else, then Tesla's stuff is only differentiated by software. Everyone uses Nvidia and it becomes hard to set themselves apart. But if Nvidia is chasing a broad customer base and has all this extra stuff to think about, then Tesla could potentially be more nimble, and produce a holistic system design that solves exactly the problems they have with no cruft. This could result in a more advanced hardware platform, so their robotics products differentiate themselves with both hardware and software.

              I am also happy because I am a robotics engineer, and in my opinion we need hardware that is 1000x more powerful than today to do what we really need. Nvidia wants to move at a certain pace, but if Tesla is trying to beat them on raw power, then Nvidia will play catch up and accelerate development of more powerful systems. This is great for everyone.

              • p1esk 2 years ago

                Internally Tesla can use whatever they want, and it might even make sense in the long term. But if they want to sell their chips for general model training, they had better be much better than the future Nvidia cards they will be competing with when they start selling. Like twice as fast at half the price and half the power - with perfect framework support. That last part is extremely important: if I see any mentions of anyone who changed “cuda” to “dojo” in their PyTorch code and ran into any issues, I’m not going to touch it with a ten foot pole. Just like I avoid TPUs because 2 years ago I heard people were having issues with them. And I’m the guy who has decided which hardware to spend millions of dollars on at several companies.

                • TaylorAlexander 2 years ago

                  Yeah I just don't think that is really the main part of their strategy. Maybe they would sell chips or boards or servers if they are already making them, but I think it is mostly about internal use so their end products have a competitive advantage as complete robots. Robotics needs HUGE advances in compute and with their own chips Tesla won't have to be dependent on a third party for their success.

                  All the stuff you talked about needing perfect support before you will touch it is something that takes a lot of work for Nvidia and others, slowing them down. Tesla can ignore all that and focus on performance for their specific application, and I think this gives them the freedom to lead the pack on raw performance for their application.

                  I'm not sure if you've watched the presentations on how their self-driving system is trained, but basically they have a million vehicles out in the real world with camera systems on them, and they have a massive server farm that is collecting new data from vehicles all the time, and they train their neural net by running millions of scenarios in simulation and against real-world data collected from all those vehicles. And they have to re-train the system all the time with new data, and run it against old data to check for regressions. So they have this huge compute requirement to keep improving their system. They think that functional self driving will revolutionize their business (setting aside the valid criticism, this is what Tesla thinks), so they need to be able to handle an ever-growing compute load that they have to be running constantly. So raw compute power is critical to the success of their plan. It may not be enough, but they certainly can't succeed without it. But their needs are very specific, and it sounds like they've found an architecture which is simpler than most Nvidia chips, but has loads of power. So it sounds like they are making a good decision, based on their specific needs. It is a huge, risky bet, but then that's how Musk likes to do things.

                  • p1esk 2 years ago

                    > Robotics needs HUGE advances in compute

                    This is surprising to me. Robotics clearly needs huge advances in algorithms (RL or something better). Do you mean you need faster hardware to discover those algorithms?

                    • TaylorAlexander 2 years ago

                      Oh we definitely need better algorithms too! But I’ve imagined that we’d want something like GPT-3 but for sensor experiences. So the way GPT-3 can ingest lots of text and predict a reasonable paragraph as a continuation of a prompt, we could have a system ingest lots of simultaneous sensor data in the form of LiDAR, cameras, IMU data, and touch sensor/skin sensor data, and then given the current state of those sensors it could predict what in the world is going to occur next, and use this as input to an RL system to make a choice of action. This seems to me to be both a useful system and one that could require a lot of compute. And that’s still probably not a complete AI system so there’s probably many many pieces required.

                      Looking at it another way, the human brain has wayyy more compute power than any of our current portable computers (robotics really needs to do most of its compute at the edge, on robot). Every robot I’ve ever worked with has been maxing out its CPU and GPU and still needed more compute.

                      When you look at Tesla’s hydra network for their self driving system you get an idea for what is needed in robotics, but just as we saw GPT-3 improve with network size, I suspect a lot of the base systems involved in a hydra net could improve with network size. And I suspect that there’s still more advanced stuff required when you move beyond a simple task like driving a car to a more advanced general purpose AI system. For example the Tesla self driving system doesn’t need any language centers, and we know GPT-3 and similar networks are large.

                      • p1esk 2 years ago

                        > robotics really needs to do most of its compute at the edge, on robot

                        Why can't you hook your robot up to a GPU cluster? Tesla already has 7k+ A100 GPUs, the question is do they have algorithms which would clearly work better if only we could run them on 70k or 700k GPUs?

                        I mean, what you say makes sense, but have people actually tried all that and realized they need bigger GPU clusters? Is there a GPT-3 equivalent model in robotics which would probably work better if we scaled it up? If not, perhaps they should start with a small proof of concept before asking for thousands of GPUs. Original Transformer --> GPT1 --> GPT2 --> GPT3.

                        • TaylorAlexander 2 years ago

                          The problem with this is that autonomous robots need to function in the real world even without internet connectivity. For example, I am designing a solar-powered farming robot. We do have Wi-Fi and Starlink here, but Wi-Fi and internet go down. In general, we think it makes the most sense for it to be able to operate completely without a continuous internet connection. And take self-driving cars, where millisecond response times matter - those can’t rely on a continuous connection to function or someone will get killed. But as systems get more advanced, it is my opinion that edge compute will be an important function. And edge compute can’t handle the state-of-the-art networks that people are building for even text processing, let alone complete autonomous AGI.

                          And no, the models I’m talking about don’t exist yet, I am speculating on what I think will be required based on how I see things. But I’m not asking for thousands of GPUs. I’m just speculating in internet comments about what robotics will need for a fully embodied AGI to function. I believe more edge compute power is needed. Perhaps 1000x more power than the Nvidia Orin edge compute for robotics today.

                          • p1esk 2 years ago

                            Sure, I understand the need for the robot autonomy. The problem, as I see it, is that current robots (autonomous or not) suck. They suck primarily because we don't have good algorithms. Without such algorithms or models, it does not matter whether a robot is autonomous or not, or how much processing power it has. Only after we have the algorithms, the other issues you mentioned might become relevant. Finally, it's not clear yet if we need more compute than what's currently available (e.g. at Tesla) to develop the algorithms.

                            p.s. I don't think AGI is needed for robotics. I suspect that even an ant's level of intelligence (together with existing ML algorithms) might be enough for robots to do almost all the tasks we might want them to do today. It's ironic that robots today can see, read, speak, understand spoken commands, drive a car (more or less), play simple video games, and do a lot of other cool stuff, but still can't walk around an apartment without falling or getting stuck, do laundry, wash dishes, or cook food.

      • aunty_helen 2 years ago

        >After 7 years of work they are clearly confident that this will work, or else they would scrap the project instead of giving talks at conferences.

        Being real for a second, Tesla doesn't have an amazing track record of delivery on the AI software side of things.

      • panick21_ 2 years ago

        Also, even if the first version didn't make financial sense, building the team and the infrastructure required to make this work will likely translate into a next generation chip.

        Part of the Tesla vertical integration mantra is that you are building internal capacity.

      • croes 2 years ago

        >After 7 years of work they are clearly confident that this will work, or else they would scrap the project instead of giving talks at conferences.

        Doesn't the same apply to FSD? For how many years now has it been ready "next year"?

        • rbanffy 2 years ago

          Remember AI? AGI has seemed just around the corner since the '60s. There we learned that the things that seemed complicated (such as beating a champion at chess) were simple, and the simple things are still almost mind-blowingly hard.

    • cjbgkagh 2 years ago

      It seems to me like they took a big bet and it didn’t pay off. They may not have felt they could rely on Nvidia to deliver the promised Tensor Core performance.

      • kjksf 2 years ago

        It's a bit premature to call this a failure given they are barely at a stage of making first chips and validating them.

        You don't seem to understand the timelines of designing chips like that from scratch.

        Dojo has been in the works for almost 7 years. Tesla will continue to work on this for the next 20 years. There will be Dojo v2, Dojo v3, etc., just like at SpaceX there was Falcon 1, Falcon 9, Falcon Heavy.

        This still might end up a failure but they clearly feel confident enough to talk about Dojo publicly, which wasn't the case for the first 5 years of its development.

        • mupuff1234 2 years ago

          It's even more premature to predict what will happen 20 years in the future.

        • cjbgkagh 2 years ago

          Thanks for explaining to me what I don’t know. Clearly you’re a Tesla fanboy.

          There is no doubt that it is an amazing piece of tech, but I’m not confident Tesla will be able to pull off beating NVidia, especially compared to NVidia's Tensor Cores and economies of scale. I don’t like their whole approach to self-driving ML. I know a lot of people disagree with me, so I would rather not get into it.

          I think a lot of the talk of future performance is posturing in order to get NVidia chips at cheaper prices.

          • minhazm 2 years ago

            I'm sure that it's at least partially for them to have a better negotiating position with NVIDIA. But the reality is that Tesla actually has some very good expertise in chip design. Jim Keller worked there for several years and, along with Pete Bannon, designed their Autopilot HW3 computer, which is in 1+ million cars right now. At the time HW3 was released, it outperformed what NVIDIA had to offer. That said, it's not likely they'll beat NVIDIA outright, but they may be able to beat them for their hyper-domain-specific use cases. Additionally, NVIDIA chips are difficult to get and extremely expensive. Even if Tesla can get something that performs 80-90% as well as NVIDIA but at significantly lower cost, it may still be worth it.

            • cjbgkagh 2 years ago

              I know these things. I think some of the Dojo architecture was a reaction to their FSD chips being over-optimized for ResNet-style models. They’re targeting extremely large models, which is a new frontier for ML and, in my view, not the panacea they hoped it would be.

              I think with better ML they wouldn’t need so many chips, be it NVidia or Dojo.

  • pookeh 2 years ago

    Perhaps they do this so they can carry this tech over into their cars. If you are making millions of cars per year, with tens or hundreds of chips per car, it is easy to justify doing this on your own instead of outsourcing it.

TekMol 2 years ago

The main concern mentioned in the comments seems to be that programmers will find it hard to work with this hardware.

But is that really the intention? Wouldn't it be enough if there is one program written for it, one that takes a bunch of inputs and a bunch of outputs and then creates a set of NN weights that perform well?

So the AWS style offer would not be "rent our hardware and program on it" but "rent our hardware/software stack, throw your input/output pairs at it and get back a set (billions) of NN weights that perform well".

  • WithinReason 2 years ago

    But then why have e.g. a branch predictor instead of making it more specialised?

    • amelius 2 years ago

      Perhaps because they will still run multiple programs, just not at the same time.

      Or because the program behaves differently in different places, and it is too big to optimize.

astrange 2 years ago

This is meant to run in a datacenter, not a car, right? There’s some value to custom ASICs if you can get it - Google seems happy with TPUs, and I assume those aren’t on the latest Nvidia-competitive process.

  • MBCook 2 years ago

    Sounds like it’s designed to crunch data in a data center to do training for models that could be shipped to the cars.

  • dragontamer 2 years ago

    If they are spending this much money on the data center chip, they probably will port the assembly language over to an ASIC inference chip for their cars.

    I honestly can't see how they make (or save) enough money with just the data center chip here.

    Maybe if the inference chip is on a cheaper 12nm process or something... Maybe?

    • kjksf 2 years ago

      It's not about saving money.

      It's about accelerating NN training and winning the robotaxi market, which could make them the most profitable company ever.

      • dragontamer 2 years ago

        NVidia chips can do this job. (EDIT: The job of training a neural network I should say. I don't think that's enough to get self driving, but Dojo ain't anything but a faster NN training chip anyway)

        The question is if they are saving money compared to buying an off the shelf DGX system. I really doubt it.

        • pengaru 2 years ago

          > NVidia chips can do this job.

          Presumably Tesla, at the time they decided to pursue this option, thought they'd potentially have a competitive advantage with in-house designs.

          It's entirely possible that's just their hubris showing, time will tell if this was the right decision or not. After seeing the NVidia presentation announcing their latest datacenter-scale AI hardware, I'd be surprised if Tesla's in-house design is more than just a massive cost center vs. buying something from NVidia.

          But sometimes you do things that appear irrational in part to keep your talented engineers from seeking work elsewhere. Just look at NASA's SLS: how much of that is a jobs program, in part to prevent hordes of talented folks from building rockets for competing nations?

          • kjksf 2 years ago

            There's nothing irrational about it. It's a big, bold, costly bet. A risky bet but not irrational.

            People doing this are not some bored Tesla engineers.

            Both the FSD chip and Dojo are staffed by chip design veterans Tesla poached from AMD, Apple and others.

            Just read those resumes:

            https://www.linkedin.com/in/peterbannon/ : this is the guy leading FSD chip

            https://www.linkedin.com/in/ganesh-venkataramanan-99272a3/ : this is the guy leading Dojo

            People like this can only be poached with a combination of great project and great salary.

            It's a team that was built from the ground up because Tesla is (also) an AI company and Musk is thinking 10 years ahead.

            • pengaru 2 years ago

              > People doing this are not some bored Tesla engineers.

              The whole reason they're not bored is because they're doing this.

              You do realize we're basically in agreement, right? The more impressive the resumes the more important it is that you give them Real Work to do.

              But NVidia's state of the art in this space seems substantially better than Dojo. How much that matters in totality remains to be seen.

              • mastax 2 years ago

                If Tesla wasn't designing chips, they wouldn't have any chip designers to get bored.

                • pengaru 2 years ago

                  But once Tesla is designing chips for their in-vehicle inference needs, they need to keep those people interested and the large-scale training side is arguably more interesting to DIY.

            • astrange 2 years ago

              > It's a team that was built from grounds up because Tesla is (also) an AI company and Musk is thinking 10 years ahead.

              Well, also because Tesla has an unnaturally low cost of capital because of its meme stock status.

              • epgui 2 years ago

                Tesla makes mad money. It's not a meme stock.

                • baq 2 years ago

                  the biggest product of Tesla is its stock, there are no ifs and buts about it. this must change soon, since that mad money is barely enough to eke out a profitable quarter.

          • dragontamer 2 years ago

            > After seeing the NVidia presentation announcing their latest datacenter-scale AI hardware

            Did we watch the same presentation? NVidia knocked it out of the park.

            Thread block cluster is obviously amazing. Routing between SMs / compute units will be far faster with this level of software abstraction, and it will be exceptionally easy to write code for. NVidia always impresses me with their advanced software techniques and clear understanding of the fundamental model of SIMD compute.

            ------

            Ignoring those software details... the important thing is GH100 will be TSMC 4nm, which is 1.5 nodes ahead of the 7nm Dojo. A significant process advantage, representing 60+% less power usage and 300% the transistor density of the older 7nm tech.

            Even if NVidia's GPU had issues, there's something to be said about just being a raw process node (or 1.5 nodes) ahead.

            • pengaru 2 years ago

              > Did we watch the same presentation? NVidia knocked it out of the park.

              Perhaps I worded it poorly, I agree with you.

              My meaning was that, vs. NVidia's latest tech, it seems like Tesla's in-house datacenter NN chip could be nothing more than a huge cost center without even offering an advantage over what NVidia could sell them.

              But like I said, if you have a staff of folks capable of building such things you have to keep them satisfied with practicing their craft or they leave.

              • sufiyan 2 years ago

                This was the case when Google launched their TPU as well. Look where they are now

                • p1esk 2 years ago

                  Where?

            • atty 2 years ago

              The person you were replying to agreed with you, that it seems like Nvidia is doing a great job in the data center.

          • panick21_ 2 years ago

            > NASA's SLS, how much of that is a job program in part to prevent hoards of talented folks building rockets for competing nations

            Zero. NASA has no problem with those engineers working for another nation, as long as it isn't Russia or North Korea and co. And that wouldn't happen anyway.

            Those people would likely work at one of the huge number of space startups or just go to the usual ULA, Blue Origin, SpaceX and so on.

            You make the totally wrong assumption that SLS has anything to do with rational thought. It really doesn't.

        • jml78 2 years ago

          Tesla is very vertically integrated. This is just how they operate. You can make the argument that they shouldn't be so vertically integrated, but it has worked for them thus far.

          • dragontamer 2 years ago

            Except Tesla has 7,360 A100 GPUs.

            https://www.tomshardware.com/news/tesla-brags-about-in-house...

            So... no? They are clearly leveraging the NVidia ecosystem right now. Now maybe they have ambitions to get off of NVidia, but they're doing so in a rather asinine fashion. There's probably half-a-dozen groups trying to make a faster systolic matrix multiplication unit for the deep learning crowd. Tesla probably should have either worked with those groups and/or bought one out, for example.

            • kjksf 2 years ago

              This gives me flashbacks to "advice" that Tesla should outsource manufacturing of the Model 3 and focus on design.

              Tesla is (also) an AI company. Musk is thinking 10 years ahead.

              If you read the resumes of the people leading the FSD chip and Dojo: those are chip design industry veterans that Musk hired away from AMD, Apple and others.

              He built a stronger team than whatever startup you think he could buy.

              Those people have already been working on Dojo for over 6 years, and they'll be working for the next 20.

              https://www.linkedin.com/in/ganesh-venkataramanan-99272a3/ is the lead for Dojo and has been at Tesla for 6.7 years.

              Tesla can hire the best people and give them practically unlimited funding, certainly much more than any startup could afford.

              This is a big, bold, risky bet. Kind of like reusable rockets or going to Mars.

              • dragontamer 2 years ago

                Except... they already did outsource training chips and bought thousands of NVidia A100 GPUs.

                So now they'll need a porting effort to move their current GPU-based design to this new, very non-GPU, instruction set.

                • speedgoose 2 years ago

                  It doesn’t sound like an insurmountable task for such a company.

                • baq 2 years ago

                  if 99% of your compute is hidden behind pytorch/tensorflow... there isn't _that_ much work

                  • dragontamer 2 years ago

                    Sure. You only have to design a chip, design an assembly language, design a compiler, design the kernels, design a parallelization framework, design a server system to load-balance tasks, and then rework the pytorch/tensorflow code to use your new faster custom primitives that no one else has.

                    -----

                    Except step 1: "design a chip", is already something on the order of hundreds-of-megabucks of investment

                    • baq 2 years ago

                      oh absolutely.

                      but they're already there, looks like.

                      • dragontamer 2 years ago

                        If they were already there, they'd have ResNet 50 benchmarks to share.

                        And the other tidbit is that it has to be better than the off the shelf NVidia chip (A100 today, and GH100 in a few months).

                        I think Dojo might be faster than A100, but GH100 will be 1.5 nodes ahead, that's a big process advantage...

                        -------

                        If GH100 is faster in practice, then all the work has been somewhat wasted.

        • rbanffy 2 years ago

          > NVidia chips can do this job

          Yes, but anyone can get them - they aren’t a competitive advantage.

        • zeristor 2 years ago

          If I recall from the previous Dojo chip announcement, they were utilising SpaceX know-how to greatly increase the cooling of the chips.

          Although the article doesn’t seem to address thermal issues, from my summary skim of it.

      • FireBeyond 2 years ago

        The same robotaxi market that Elon promised in 2019, that would make "buying anything but a Tesla in 2020 and beyond financially irresponsible"?

      • SneakyTornado29 2 years ago

        No, the data is the valuable part. Not the model weights.

frozenport 2 years ago

   AMD pays for this flexibility by spending about 44% of “Zeppelin” die area on logic other than cores and cache. Dojo only spends 28.9% of die area on things other than SRAM and cores.
Doesn't sound worth it? You're basically spending a ton of money to fab a chip for what amounts to like a 30% training performance boost. Just buy more chips.
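
For what it's worth, that ~30% appears to come from the fraction of die area spent on cores plus SRAM: $(1 - 0.289)/(1 - 0.44) = 0.711/0.56 \approx 1.27$, i.e. roughly 27% more of each mm² doing useful work.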

Sirened 2 years ago

neat :) Love seeing other people building new and weird architectures. I wonder what their actual use case for SMT was; I don't really buy the "one thread for vector, one thread for pulling data into SRAM". I'm also a bit surprised to see they didn't go for a VLIW ISA; it always seemed like these tightly integrated data-processing chips were the ideal candidates, since binary compatibility isn't an issue when you're building your own HW.

mrb 2 years ago

In terms of absolute compute performance per chip, perf per watt, and perf per die area, it looks like Dojo matches or surpasses the best GPUs of today: «Tesla claims a die with 354 Dojo cores can hit 362 BF16 TFLOPS at 2 GHz»

For comparison, the fastest single-chip GPU today is the AMD MI250X which has 220 "cores" (compute units) totaling 383 BF16 TFLOPS at 1.7 GHz, and that's a monster 560 watt chip.

The Dojo chip is likely under this 560 W TDP, so more efficient. And Tesla provides roughly the same compute performance, but spread across 61% more cores, meaning it is far more suitable for handling branchy code. Also, Tesla claims the die measures only 645 mm², compared to 1540 mm² for AMD. So the wafer fabrication cost is roughly half(!) of AMD's!
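
Taking those claims at face value, perf per die area works out to $362/645 \approx 0.56$ TFLOPS/mm² for Dojo versus $383/1540 \approx 0.25$ TFLOPS/mm² for the MI250X, about a 2.3x advantage.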

If Tesla has truly managed to build that, I'm impressed.

Edit: I missed that Tesla claims "less than 600 watt" per chip. So we know it's comparable to or less than AMD's (560 watts).

Edit 2: 25 dies are packed on a single system-on-wafer. That's 15 kW on a disc of 30 cm (12 in) of diameter. Sheesh! That must require an ungodly liquid cooling system!

Edit 3: there is more info, including rendering of the host interface card at:

https://www.servethehome.com/tesla-dojo-custom-ai-supercompu... and

https://www.servethehome.com/tesla-dojo-ai-system-microarchi...

Edit 4: found a pic of the liquid cooling system - as expected ;) https://media.datacenterdynamics.com/media/images/training_t... source: https://www.datacenterdynamics.com/en/news/tesla-details-doj... And they say the first tile was "tested last week" as of August 20th... This confirms my suspicion that the system is barely (?) functional. Also see "Venkataramanan appeared to even surprised Andrej Karpathy, Tesla’s head of AI, on stage by revealing for the first time that Dojo training tile ran one of his neural networks" from https://electrek.co/2021/08/20/tesla-dojo-supercomputer-worl...

  • WithinReason 2 years ago

    A 2x efficiency improvement from a GPU to a specialised ASIC is not particularly impressive. How much would you gain by removing the graphics-related stuff from a GPU (texture pipelines, vertex processing, etc.)? In addition, they lose existing programming models like OpenCL and the compiler progress that happened in the last 10 years, and have to roll their own. The amount of SW work needed to get the same ease of use out of this as GPUs must be enormous. Maybe they made it more CPU-like to make it easier to program?

    • mrb 2 years ago

      FP16 AI workloads are very important to AMD and Nvidia (the AMD MI250X and Nvidia A100/H100 were really designed for this), and yet Tesla leapfrogged them with a more than 2x reduction in die area, and more features (e.g. out-of-order exec). This is what's impressive. AMD, Nvidia, and even Intel should have been leading this, but they weren't. Seems to be a classic innovator's dilemma.

rbanffy 2 years ago

The same way we like to talk about odd OSs, like Lisp Machines, Plan 9, Oberon, Smalltalk, I'd love to see one of these things used as a desktop CPU (since I never got to play with one of the desktoppable Xeon Phis), just to see the weird and interesting OSs and applications that would emerge from it.

I know it must be horrendously difficult to program for general purpose use, but that's the point - figuring out what CAN be done and what an OS for it would look like.

mannyv 2 years ago

Discussions about how this will fail to be productized as a cloud service are premature. There are many, many ways to do SaaS. If the benefit is there, people will find a way to use it.

In any case, the problem is moving the data first, then the compute side. If Tesla is really going to do this, they have someone working on that already. SaaS is not rocket science, and the issues are well known.

1958325146 2 years ago

Can anyone tell if this architecture makes sense to ship in their actual cars to replace whatever is doing the computation there now? That is the only way I can see this making any sense to develop.

  • kjksf 2 years ago

    Tesla has already designed, and manufactures, an FSD chip that runs NN inference in the car.

    This is a chip designed to accelerate nn training in their datacenter.

    The effort and expense makes perfect sense if you consider:

    - the costs are spread over millions of cars (3 million today, tens of millions in the future)

    - it helps them win the robotaxi market, which is potentially so lucrative that the cost of developing custom chips will be peanuts

  • WatchDog 2 years ago

    They have already developed and deployed custom silicon for their cars. There is probably some overlap in the IP between these chips, but Dojo is optimized for operating in a large data center cluster for training purposes, whereas their car chips are optimized for running pre-trained models.

  • epgui 2 years ago

    This is not for putting in cars. This is for building giant supercomputers for research and development, inside a building (no wheels).

happytiger 2 years ago

They’re onshoring chip production and the Tesla stack needs a fab for contracts and competitiveness. Makes sense.

hello639 2 years ago
  • leobg 2 years ago

    You made a new account to post a yawn?

slt2021 2 years ago

they want to get federal funding for chip manufacturing on US soil?

That would be typical of Elon Musk to get some federal funding $$$$$

But Elon can make it work on par with, if not better than, Nvidia, simply because Tesla can hardcode and over-optimize for one single task (FSD AI), while Nvidia will always be constrained by keeping its chips generic enough for all-purpose NN training/inference and even gaming, plus backwards compatibility for all the historical stuff.

Tesla can work from a blank slate and reap the performance benefits there.

Animats 2 years ago

OK, but why? Is this somehow supposed to make their flaky "self-driving" work? Or what?

Certainly you can build custom chips for deep learning. It's mostly a simple inner loop, replicated many times, after all. Ideal case for ASICs. But will this actually benefit Tesla as a car company?
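
For reference, the "simple inner loop" is essentially a multiply-accumulate over large matrices. A naive Python sketch of what such an ASIC ultimately hardwires (real kernels are tiled, vectorized, and run in low precision):

    # The multiply-accumulate loop that dominates deep learning, in naive form
    def matmul(A, B):
        m, k, n = len(A), len(B), len(B[0])
        C = [[0.0] * n for _ in range(m)]
        for i in range(m):
            for j in range(n):
                acc = 0.0
                for p in range(k):            # the hot loop: one MAC per step
                    acc += A[i][p] * B[p][j]
                C[i][j] = acc
        return C

    print(matmul([[1.0, 2.0]], [[3.0], [4.0]]))   # [[11.0]]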

keepquestioning 2 years ago

There was no reason to waste money on this. You don't want to focus on two incredibly resource-intensive and error-prone processes in the same company (automotive, semiconductors).

This is a hype project simply created to pump up the stock price.

  • panick21_ 2 years ago

    It's so hilarious when people think everything Tesla does is some 4D chess move about the stock price, when in reality Musk really doesn't care about the stock price at all. He literally thinks multiple years ahead, and doesn't change that strategy because of some short-term headwinds.

    > two incredibly resource-intense and error-prone processes in the same company (Automotive, Semiconductor).

    If you think making these chips is excessive, you don't know Tesla. Tesla is the most vertically integrated car company in the world (maybe BYD compares).

    Compare that to literally designing not only their own battery cells, but their own battery chemistries, their own battery materials processes, AND their own full battery manufacturing factories. They started doing this a decade ago, and only this year did the first cars roll off the line and get sold to customers. That is years and years of investment, planning, and internal capacity building.

    In 2017 people were laughing at them for attempting to do all their own manufacturing. Now they have industry-dominating automotive margins. They literally have their own materials team that makes their own aluminum for casting, and they have co-developed production machines the size of a whole house. This is now slowly being copied by other manufacturers.

    They also make their own glass, seats, and a whole host of other things that most car companies don't make. Doing these things in-house costs billions and billions in capex.

    Also, Dojo is not the only chip they make; they have other chips that are already being sold in their cars by the millions every year. Dojo is a pittance of an investment compared to that, and a fraction of the total investment in the overall self-driving stack. Just because you don't agree with the overall strategy (judging from your comment) doesn't mean that the many, many billions Tesla spends aren't real, or that it's all some sort of stock-price play.

    So, if you think Dojo is about the stock price, I don't know what to tell you; it's just wrong.

    • mgoetzke 2 years ago

      So many people here seem to be convinced that since Nvidia is the leader in this tech, any other attempt to build something comparable is a losing proposition, shouldn't even be attempted, is senseless, and everybody should just bow to them and buy from them.

      Weird. I thought competition, even when the leader is far ahead, is a good thing, and up to the investors.

      No idea whether they can make it work; it is really hard, especially adjusting their tools. But they have written most of those tools themselves, so adjusting them should be possible. Making those tools available to outsiders one day? It can happen, but it doesn't have to.

      • sinenomine 2 years ago

        > So many people here seem to be convinced that since Nvidia is the leader in this tech, any other attempt to build something comparable is a losing proposition

        A non-issue if you understand the design space of TPU-like AI accelerators. That understanding shouldn't even be necessary, because the major players (Google, and now Amazon) already build their own accelerators and have been for years.

        The Nvidia tax is too much.

    • dagmx 2 years ago

      Your first point is provably false because Musk thinks about stock price often, and posts on his Twitter accordingly.

      However, I do agree that Dojo isn't about the stock price, though I'm sure it is about reducing their costs long term.

      • fastball 2 years ago

        I guess they meant to say he doesn't prioritize the stock price going up.

  • kjksf 2 years ago

    There's a legion of people convinced of their superior intellect, with "helpful" advice about what Musk / Tesla / SpaceX should or should not do.

    After the first AI Day, Tesla stock dropped significantly. Wall Street doesn't understand Tesla as anything but a car company, and won't understand the AI work until it starts producing billions in revenue.

    It's an exceedingly poor stock pump.

    They've been working on Dojo for almost 7 years. They kept it secret for the first 5.

    Kind of contradicts the "stock pump" memes.

    They're talking about it now because those efforts are far along and they want to hire more people.

    • stephencanon 2 years ago

      It has not really been a secret; they just haven't been talking about details much. It was pretty clear to anyone who works in related areas what they were up to.

  • AlotOfReading 2 years ago

    While I agree that most custom silicon is a bit silly (and the costs dramatically underestimated), it's worth noting that most of the serious companies in the AV space have some amount of custom silicon for ML either in the works or already in production. Waymo is using TPUs, for example, and Cruise has its own chips as well. It's the cool thing to do in SV lately. It doesn't help that Nvidia is such a toxic partner that the prospect of not working with them makes custom silicon seem almost reasonable.

  • mgrund 2 years ago

    Given Musk's public statements on goals and strategies, I think you should view it more holistically.

    With Musk, everything is a step towards the end goal(s), like colonizing Mars (which might even be the only goal, since it ties everything together; it's much easier to send humanoids to prepare for colonization, and it's not like HCI seems to be a priority, unlike in many other AI efforts).

    My guess is that with Tesla, Musk aims to create AGI in a way he can control (as you may recall, he has previously talked about his fears of the tech, and control is a natural strategy for handling that). The cars are just the step that enables the next one, so he basically invests in the future over short-term gains, which is a very reasonable business decision.

    Custom silicon is likely unavoidable if you want to lead the AGI field, so he would just be investing in the next step of the plan. Tesla Bot is a good indicator of this direction.

    • tsimionescu 2 years ago

      > Tesla Bot is a good indicator of this direction.

      Tesla Bot was the human in a bot costume dancing on stage [0], right? If so, then yes, this is quite a good indicator of the seriousness of this direction.

      [0] https://www.youtube.com/watch?v=HUP6Z5voiS8&t=1m2s

      • relativ575 2 years ago

        Urgh, they explicitly mentioned that it was a real human in the same presentation. Your obsession with bashing Tesla is unhealthy. Pick a valid criticism.

        • tsimionescu 2 years ago

          Yes, I know that wasn't a lie, just a very corny joke.

          Did we ever get anything more substantive about Tesla Bot though?