adrian_b 2 years ago

As always in this blog, the article is well researched and it includes a lot of interesting information.

Nevertheless, there is an important point about 80960 that is not discussed at all. There still exists a legacy of 80960 that is incorporated in all modern Intel and AMD CPUs, which consists of the XADD (normally LOCK XADD) instruction.

Before 1980, the instruction sets of CPUs included three kinds of atomic read-modify-write instructions (I mean dedicated atomic instructions, not just locked variants of some standard instruction with a memory operand): test-and-set (introduced by UNIVAC 1108 in 1965; e.g. Motorola MC68000 had TAS), swap (proposed by Edsger W. Dijkstra in October 1971; e.g. Intel 8086 had it under the mnemonic LOCK XCHG), and compare-and-swap (both compare-and-swap and compare-double-and-swap were added to IBM System/370 in 1973; Motorola MC68020 added similar instructions in 1984).
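(Not from the original comment: a minimal C11 sketch of those three classic primitives, using `stdatomic.h` as a stand-in for the dedicated instructions named above.)

```c
#include <stdatomic.h>

/* test-and-set: atomically write 1 and return the previous value
   (the UNIVAC 1108 primitive; MC68000 TAS) */
static int test_and_set(_Atomic int *p) {
    return atomic_exchange(p, 1);
}

/* swap: atomically exchange a value with memory (8086 LOCK XCHG) */
static int swap(_Atomic int *p, int v) {
    return atomic_exchange(p, v);
}

/* compare-and-swap: store `desired` only if memory still holds
   `expected`; returns nonzero on success (System/370 CS; MC68020 CAS) */
static int compare_and_swap(_Atomic int *p, int expected, int desired) {
    return atomic_compare_exchange_strong(p, &expected, desired);
}
```

On x86 these typically compile down to XCHG (implicitly locked) and LOCK CMPXCHG.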

The next innovation in atomic instructions happened in 1981, when Allan Gottlieb and Clyde P. Kruskal proposed the fetch-and-add operation for the NYU Ultracomputer project.

For the next few years, the fetch-and-add operation remained available just in that academic project, but then somebody decided to implement it in 80960.

Fetch-and-add was first documented publicly by Intel in early 1988, in the BiiN description, under the mnemonic ATADD (atomic add). The ATADD instruction was also supported by 80960MC and 80960KB, which were launched later in 1988.

One year after the 80960, on 1989-04-10, Intel launched the 80486 CPU, whose most important ISA enhancements over the 80386 were extra atomic instructions, i.e. fetch-and-add from the 80960 (LOCK XADD) and compare-and-swap from IBM and Motorola (LOCK CMPXCHG).

Since 80486, all Intel and AMD CPUs include the fetch-and-add instruction inherited from 80960 (whose origin was in the NYU Ultracomputer).
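(Editorial illustration, not part of the comment: a C11 sketch of fetch-and-add; on x86, `atomic_fetch_add` on an `int` compiles to the LOCK XADD instruction discussed above.)

```c
#include <stdatomic.h>

/* fetch-and-add: atomically add `delta` and return the previous value.
   Unlike a CAS loop, every caller succeeds in one instruction with no
   retry -- the property that motivated the NYU Ultracomputer design. */
static int fetch_and_add(_Atomic int *p, int delta) {
    return atomic_fetch_add(p, delta);
}
```

A typical use is a ticket counter: each arriving thread gets a unique ticket number with a single atomic operation.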

  • alexjplant 2 years ago

    > Clyde P. Kruskal proposed the fetch-and-add operation for the NYU Ultracomputer project.

    I had no idea that my Discrete Structures professor from undergrad had such a storied CV! Kruskal was one of my favorite professors. One thing that sticks in my mind about him was what he said when he got to the Honor Code section of the syllabus while going over it on the first day of class. Instead of giving us the customary speech (humorous, menacing, or otherwise) about catching students cheating in the past and their academic records being ruined he simply said "I trust you all implicitly to do the right thing" and moved on. He was a great teacher and was similarly engaging for the rest of the semester.

    • bee_rider 2 years ago

      Apparently his uncle was the minimum-spanning-tree guy.

  • kens 2 years ago

    That's an interesting piece of history. Thank you for sharing it.

quercusa 2 years ago

Great info as always, Ken.

In the early and mid-90s, Intel was pushing their Intelligent I/O initiative (aka I2O with a subscript '2'). This had a split-driver model, the bottom half of which ran on an i960. It had no performance advantage and IHVs suspected it was an Intel plan to sell 960s and take away their secret-sauce advantages.

I2O didn't provide much advantage on 1-4 processor systems but provided much of the underpinnings for the NGIO and InfiniBand I/O model.

  • kens 2 years ago

    Yes, interest in I2O was lukewarm for the reasons you mention. I gave I2O a brief mention in the article but didn't want to go into too much of a tangent :-)

    I suspect that some people at Intel really wanted to have IBM System/360 mainframe style I/O channels. There was the 8089 I/O coprocessor for the 8086. The iAPX 432 had an I/O coprocessor (43203) that worked with another microprocessor to act as an I/O channel. And then there was I2O, using an i960 as an I/O processor.

kens 2 years ago

I'm sure there are a bunch of people on HN who have used the i960. Author here if anyone has questions about the article.

  • drmpeg 2 years ago

    I worked on a bunch of MPEG-2 encoder and decoder designs using the i960CF and i960RP when I was at Optivision. Here's a photo of an MPEG-2 encoder (found on E-bay) that used the i960CF and a PCI interface chip designed specifically for the i960CA/CF, the PLX PCI9060.

    https://www.w6rz.net/opti.jpg

    With all the interest in retro computing these days, I wish I had kept more stuff from the 80's and 90's. Here's all I have left, an i960CA-16 chip from a development board that I had replaced with an i960CF and a manual.

    https://www.w6rz.net/i960ca.png

    https://www.w6rz.net/i960camanual.png

    • drmpeg 2 years ago

      I found another item that I hung onto. A CD with the i960 tools including gcc (looks like it's gcc 2.7.2). Also has a bunch of manuals in PDFs.

      https://www.w6rz.net/IQ80960RXK5_2.iso

      105,078,784 bytes.

      • kens 2 years ago

        That's a nice collection of documents on the i960 RM and RN chips.

    • mips_r4300i 2 years ago

      That's pretty neat. I enjoy seeing first-gen designs like this, before everything gets integrated down to one chip.

      No shortage of PLDs and dual port ram on there. And not one but two CL4040 encoders from C-Cube.

      What was the reason for having both an i960 lacking an FPU and a probably much more powerful TMS320? Was PCI in general a weak point of the DSP chip and a strong point of the 960?

      • drmpeg 2 years ago

        The CL4040 had two CL4020 chips in the package. It took four chips to encode 1/2 D1 resolution (352x480), so this was the smallest MPEG-2 encoder possible at the time (1995).

        The TMS320 chip was the audio encoder and was only capable of MPEG-1 Layer 2 coding.

        The i960 was primarily used for bitstream multiplexing, either Transport Stream or Program Stream. The idea was to offload the host so that you could have many encoders in a cheap PC (with a big power supply).

  • jes 2 years ago

    I worked at Applied Microsystems (now defunct). We made in-circuit emulators and did one for the 80960CA variant. I remember that processor had an undocumented (IIRC) feature called "Incremental Trace" that would allow the emulator to keep track of what the processor was doing. We used it to develop an execution trace disassembler for the processor.

    I seem to remember HP/Boise was using a lot of the CA parts for use in printers. They had one of our emulators, and I remember that we once got a report of the emulator probe tip having caught fire (!) due to a short or some such.

    I enjoyed working with the CA part. I also worked with the 80960MX part. Now, that was an odd processor!

    • kps 2 years ago

      HP (Panacom) in Waterloo used them in X terminals. Edit: Specifically the HP 700/RX.

    • jes 2 years ago

      Following up my own post, I also remember the hardware guys having occasional phone calls with some guy known only as "Larry" at Intel. I'm guessing he was a microcode guy or some such. Would be interesting to know more about him.

  • neurotech1 2 years ago

    I'm not so sure the F-18 Stores Management Processor used the i960MX. Any sources to confirm that?

    There has been some interest recently in the F-22 Common Integrated Processor using the i960MX, as the USAF wishes to retire older F-22 Block 20 Aircraft.

    Apparently, newer F-22 Blocks use FPGAs to provide a functional soft core of the i960MX, without having to locate long-obsolete chip sources for legacy avionics designed in the 80s.

  • ggm 2 years ago

    Labtam x-terminals and ultimately edge routers did. I briefly worked doing regional network deployment and support with them, and some board replacements in the field. Tektronix bought out their IPR. (Australian company, one of the last manufacturing IT companies doing on shore assembly)

    They ran a Unix kernel and had a very good networking stack. I believe they coded early to the emerging new TCP flow control models.

  • jjtheblunt 2 years ago

    Was the i960 in laserprinters?

    • kens 2 years ago

      Yes, laser printers were one of the big markets for the i960. The Am29000 was their main competitor. (People nowadays probably don't realize how computationally intensive laser printers were.)

      • somat 2 years ago

        There is an apocryphal story* where the fancy new laser printer was having strange issues and would become unresponsive for long periods of time. The sysadmin tasked with figuring it out managed to trace it down to one user who kept sending strange print jobs that took forever and never printed anything. When asked about the jobs the programmer said that the processor in the printer was much more powerful than anything else in the company so he added a postscript target to his engineering programs so they ran simulations on the printer.

        * That is, I don't remember where I first heard it, probably a Usenet post somewhere

        • neolefty 2 years ago

          I heard it about Berthold K P Horn https://www.eecs.mit.edu/people/berthold-horn/, professor of machine vision. And it was extra convenient because he wanted to output the result of the computation as an image anyway, and what better way than with a high-quality printer?

          • bee_rider 2 years ago

            Hmm, that almost seems like fair-game.

        • kens 2 years ago

          My officemate Paul Heckbert wrote a ray tracer in PostScript after we realized that the laser printer's processor was much more powerful than our "real" computer. As you might expect, the interpreted PostScript ray tracer was extremely slow but it did eventually generate an image, I think at 32x32 resolution.

      • NoZebra120vClip 2 years ago

        SPARCprinters were like WinModems of their time: they offloaded all the Display PostScript processing to the host. The printer was essentially interfaced like an external monitor; the host would do all the rendering and then read out a finished image for the printer to immediately lay down on paper.

      • stevefolta 2 years ago

        I remember hearing that for a few years, the LaserWriter (or was it the LaserWriter II?) was the most powerful computer you could buy from Apple.

        • kalleboo 2 years ago

          The original LaserWriter came out in 1985 with a M68000 CPU running at 12 MHz, with 512 KB working RAM and a 1 MB framebuffer.

          The Macintosh at the time had the same M68000 CPU, only running at only 8 MHz and with 128 KB of RAM (or 512 KB for the Fat Mac) which was shared with screen RAM.

      • FullyFunctional 2 years ago

        That's misleading. The reason was not because it's a "laser printer" but because it was running PostScript (which is a full-fledged programming language). Had the "API" been a trivial image printer (like the NeXT laser printer), then the compute requirements would have been far less intense.

      • esafak 2 years ago

        Compared with dot matrix printers, sure. But more so than inkjets?

jecel 2 years ago

Great blog post. Some observations:

Ada was not originally an object-oriented language, though it eventually became so. But the 432 project predated Ada and pivoted to that language when it came out. Intel had to adapt Ada to work with objects (it was pretty close already).

An advantage of the 33 bit scheme to protect capabilities is that you can push them on the stack where they will get mixed with normal data and then pop them later such that everything still works. In the 432 style you would need two separate stacks or awkwardly work around not being able to push/pop capabilities.

The "AND NOT" instruction is interesting for using a mask to clear bits, which is why it is called "BIC" on the ARM.
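(Editorial aside: the bit-clear idiom in one line of C; compilers emit a single ANDNOT/BIC instruction for it on machines that have one.)

```c
#include <stdint.h>

/* bit-clear: x AND NOT mask -- clears exactly the bits set in `mask`
   (ARM BIC, PDP-11 BIC, and the i960's "AND NOT" form) */
static uint32_t bic(uint32_t x, uint32_t mask) {
    return x & ~mask;
}
```

For example, clearing the low nibble of a flags word is `bic(flags, 0x0F)`.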

While the 186 and 286 were flawed for making IBM PC compatible machines, for what they had been designed for they did a pretty good job.

  • brucehoult 2 years ago

    >The "AND NOT" instruction is interesting for using a mask to clear bits. which is why it is called "BIC" on the ARM.

    PDP-11 (1970) has BIC (dst & ~src) and BIS (dst | src) as its only boolean operations with full addressing modes, plus XOR in reg<-mem form only. VAX (1977) also has only BIC, BIS, and XOR.

pinewurst 2 years ago

I wouldn’t say BiiN sold fault-tolerant workstations. They were departmental servers, or perhaps the next step up.

I remember evaluating the CA for a smaller router and found it a perfectly reasonable part/pricing especially when compared with the other options (some weird AMD 29k variant and the local reps of the Transputer cult who never accepted the non-arrival of the T9000).

  • kens 2 years ago

    Thanks, I've updated the article.

    • pinewurst 2 years ago

      I also vaguely remember BiiN failed due to being too slow for their price (despite their claimed MIPS) and that their Ada OS was a disaster of a software development project, not even sure if it seriously shipped.

b800h 2 years ago

With regards to BiiN, one thing I'm not familiar with is how the serial connection to, say, 80 terminals was handled on a machine like this. If I had 80 Wyse terminals (alas, I only have two), and wanted to hook them up to, for example, a Linux server, how would I go about doing that?

  • adrian_b 2 years ago

    Thirty years ago ISA cards with 8 RS-232 interfaces were common, so connecting some 30 or even 50 terminals to a single PC was not difficult. I have used a few such computers for testing various devices with serial interfaces.

    Eighty terminals on a single PC might have been more difficult, but by that time the computers were already networked, so it would have been cheaper to just use two PCs than to use some kind of terminal aggregator.

  • Beermotor 2 years ago

    I'm going from 30+ year old memories as an intern, but from what I remember:

    Usually the RS232 was just on the terminal side. Somewhere along the path it got converted to twisted pair.

    All the twisted pair serial lines congregated in the server room at a punch-down box. Eighteen year old me wasn't allowed to mess with anything past that point, but from what I can remember those lines were concentrated by a multiplexer (mux) and sent on to the minicomputer.

    • AnimalMuppet 2 years ago

      I think you're right. I remember seeing a lot of ads for muxes in the Datamation magazines I read as a kid.

      I can't give any details about them, because well, I was a kid, and the details of what they were talking about were beyond me...

  • icedchai 2 years ago

    Back in the 90's, I remember working on an early ISP system that had a 16 port serial card, each connected to a modem. I forget who produced the card (Digi something?) There were revisions with at least 32 ports.

    We moved to terminal servers pretty quick. These were dedicated devices that converted serial to IP (Livingston Portmasters, Xylogics Annex, Telebit Netblazers are the ones I remember working with.)

  • SoftTalker 2 years ago

    You'd use a terminal server between the terminals and the linux box.

    DECserver was a popular one back in the day.

    https://en.wikipedia.org/wiki/DECserver

    These days it might be more economical to just have something like a Raspberry Pi attached at each terminal. Then the connections to your network could all be wireless.

    • NoZebra120vClip 2 years ago

      We didn't always use terminal servers in college. There was a cluster of 3B2s, and two were accessible to students from the terminal room. There were probably 40-50 terminals in the room, and I believe they were each hardwired to either "earth" or "wind". I don't know about the connectors or cabling which achieved that, because the servers were located on another floor of the building. But we were definitely hardwired in.

      The rest of the university had general-purpose terminal rooms that used port concentrators. These had a rudimentary CLI that enabled the user to specify a resource to connect to, so it was sort of a circuit-switch deal. Among the resources were VAXen running BSD, VMS machines, and Annex boxes which could be used for TCP/IP connections across campus, or to the outside world. These Annex boxes were also frontends for dial-up users coming in over modems.

  • kens 2 years ago

    The BiiN systems supported RS-232, RS-422, and TTY current loop. I assume you plugged a serial interface board into the system that provided multiple serial ports. But it might have been an external terminal concentrator (or establishment controller in IBM lingo) that hosted the terminals.

    • b800h 2 years ago

      I'm still unclear on this - I notice from online that Linux, for example, supports up to 256 TTY devices by default. How might one go about actually physically hooking up that many devices these days? Is a terminal concentrator still available?

      • marcus0x62 2 years ago

        In the mid 90’s we used Cyclades multi-port serial cards to do this. These were PCI cards that connected to one or more breakout boxes that contained (in our case) 16 RS-232 ports each. We had 48 total ports connected to a 60mhz Pentium with 32MB of RAM.

        This was a decidedly low budget solution, even at the time. There were also commercial offerings from vendors like Cisco and Livingston that provided a terminal server in a dedicated appliance form factor.

        edit the breakout boxes look like this: https://149707953.v2.pressablecdn.com/wp-content/uploads/imp...

      • johnklos 2 years ago

        Back then there were dedicated boards with large numbers of serial ports. These days you can get PCI(e) serial boards with, say, 16 serial ports.

        Personally, I'm using a Commodore A2232 seven port serial board in my Amiga 4000, which has its own 65CE02 CPU to offload work from the main CPU.

        • icedchai 2 years ago

          What are you using it for? When I was a teenager, I dreamed of getting one of those boards so I could have a multi-line BBS on my Amiga!

          • johnklos 2 years ago

            I use it to have serial consoles on a number of other machines for remote administration. The sessions get started automatically on boot with tmux and cu, so it's very convenient and easy - much easier than having a ton of USB-serial adapters hanging off of a newer machine.

      • yjftsjthsd-h 2 years ago

        If you asked me to do it, I'd probably see how many serial/USB adapters I could plug into how many USB hubs attached to a single box. I can't see any reason it wouldn't work, and the parts are cheap enough, and not even particularly obscure. I suspect the failure point would be something in the stack not expecting to have to operate like that and having insufficient performance, but this is all speculative.

      • EvanAnderson 2 years ago

        Perle continues to make the IOLAN device concentrators (they were Chase brand when I used them in the 90s). Today they’re marketed as “device servers” for connecting console ports on embedded devices to IP networks but you could still serve dumb terminals from them just fine.

      • convolvatron 2 years ago

        16 port isa cards were a thing. apparently they are still made. not fond memories.

  • Spooky23 2 years ago

    DEC or IBM terminal servers. When I was an intern, we migrated ~45k terminals from the legacy serial to IP.

eep_social 2 years ago

> Even though the SB didn't support memory management or objects, Intel didn't remove that circuitry.

I am curious about this detail. Was this due to production line convenience, an early example of binning, or probably something else entirely?

  • kens 2 years ago

    I asked various people about why the KA/KB/MC/XA all used the same die even though they supported vastly different functionality. I couldn't get a solid answer, but several people thought that Intel probably used binning (downgrading chips if the more advanced functionality didn't work).

    The SB was the next version, which used the old die layout with 16-bit bus circuitry wrapped around the outside. There wasn't any Sx version that used memory management or objects, so binning doesn't explain the unnecessary circuitry here. The SB was followed by the SA, which finally threw away the unused circuitry. The layout moved a few functional blocks around, so they did some work, but for the most part the die remained the same.

    My guess is that chip design was difficult and fairly manual at the time, so it was very hard and time-consuming to make different versions. By the time of the Hx and Jx versions in the mid-1990s, I think layout was much more automated. Looking at the die photos, I don't see structural consistency from one die to the next. It looks like they pushed the button and got something completely different each time.

    • dfox 2 years ago

      Somewhat famously the i686 designers boasted about the amount of automation used in the design (and the DEC's marketing department got a lot of traction from presenting Alpha as “hand designed”), so it makes sense to assume that earlier intel chips were mostly hand designed.

mumbel 2 years ago

https://github.com/mumbel/ghidra_i960

Added basic support for i960 in Ghidra. I didn't have a use for it myself, but some of the Sega Model 2 folks seemed interested. To some degree I think they used it for some of the House of the Dead remake

h2odragon 2 years ago

the bond wires on only 2 sides thing is so wild. Perhaps the original use was supposed to be in clusters of 4? How many of these things did the NSA eat, I wonder?

  • kens 2 years ago

    I think the bond wire pattern was just for layout convenience: if the buses end up on one side, why run wiring around to the other side?

    As far as the NSA, I haven't come across any mention of the NSA using the chip, but I guess that doesn't say much :-) The chip was popular with the military and it seems like it would be good for NSA-style applications (processing large quantities of data). The i960 has bit instructions that would be useful for the NSA, such as scanning for the most-significant set bit in a word, although I don't think it has population-count, which the NSA likes.
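    (Editorial illustration: the two operations named above, sketched in C using GCC/Clang builtins as stand-ins for the hardware instructions; the builtin names are the compiler's, not the i960's.)

    ```c
    #include <stdint.h>

    /* index of the most-significant set bit (the i960-style bit scan);
       returns -1 for zero, where __builtin_clz is undefined */
    static int msb_scan(uint32_t x) {
        return x ? 31 - __builtin_clz(x) : -1;
    }

    /* population count -- the operation the i960 apparently lacked */
    static int popcount(uint32_t x) {
        return __builtin_popcount(x);
    }
    ```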