The SPARC, or Scalable Processor ARChitecture, is a RISC architecture that grew out of design work done at UC Berkeley in the early 1980s, primarily by David Patterson and his group. The original Berkeley design was called the RISC I, which is thought to be the first VLSI RISC CPU design.
The design was picked up by Sun, which was shopping around for a new CPU on which to build a new line of workstations, replacing the Motorola 68k processors that the early Sun machines were built on. The first commercial SPARC-based machine was the Sun-4, released in 1987, and Sun continues to market and sell systems built on variations of the SPARC architecture, though they also sell machines based on various x86 and x86-64 CPUs.
The SPARC is in most respects a classic load/store RISC architecture: 32 integer registers, 32 floating point registers, and a hardcoded /dev/null register that always reads as zero and discards anything written to it. The first widely used version, v7, didn't even include a multiply instruction, just a simple multiply-step helper instruction from which a compiler could synthesize a multiplication. This was a case of the SPARC being a little too RISC for its own good. Certain operations, specifically public key crypto, need a lot of multiplications, which is why opening up an SSH or SSL connection with a SPARCv7 machine (or a later machine running code compiled for v7) takes forever as the CPU grinds through them. This particular failing was fixed in the v8 architecture; in fact the only changes between v7 and v8 were the addition of multiplication and division instructions.
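To give a feel for what synthesizing a multiplication costs, here is a minimal sketch of the classic shift-and-add method in Python. It is not an emulation of the actual v7 helper instruction or of any real compiler's output, and the function name and bits parameter are made up for illustration. The point is only that a 32-bit multiply turns into a loop of 32 test/add/shift steps instead of a single instruction.

```python
def mul_shift_add(a, b, bits=32):
    """Multiply two unsigned `bits`-wide integers using only test, add, and shift.

    Illustrative only: one loop iteration stands in for one "multiply step".
    """
    mask = (1 << bits) - 1
    a &= mask
    b &= mask
    product = 0
    for _ in range(bits):   # one step per bit of the multiplier
        if b & 1:           # is the low bit of the multiplier set?
            product += a    # then add the shifted multiplicand
        a <<= 1             # move the multiplicand up one bit position
        b >>= 1             # and consume one bit of the multiplier
    return product          # up to 2*bits wide


if __name__ == "__main__":
    print(mul_shift_add(6, 7))
```

An RSA operation does thousands of these word multiplications, so the per-multiply overhead adds up quickly.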
One interesting thing about the SPARC design is register windows. The basic idea is that the system can provide a nearly arbitrary number of actual hardware registers, and it cycles through sets of them during function calls (this is only visible to the operating system; user space code only ever sees 32 registers at a time). At design time, simulations had suggested that this would allow for better performance, but these simulations were flawed because they did not take context switching into account. A typical machine will perform context switches dozens or hundreds of times a second, and as the number of registers increases, each context switch becomes more and more expensive, since there is more register state to save and restore.
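The windowing scheme is easier to see in a toy model. The sketch below assumes the usual SPARC layout of 8 globals plus 8 "in", 8 "local", and 8 "out" registers per window, with a caller's outs doubling as the callee's ins. The class and method names are hypothetical, the direction in which the window pointer moves is simplified, and there is no handling of running out of windows (real hardware traps to the operating system for that).

```python
class WindowedRegisters:
    """Toy model of register windows; not a cycle-accurate SPARC emulation."""

    def __init__(self, nwindows=8):
        self.nwindows = nwindows
        self.globals_ = [0] * 8
        # Each window adds 16 fresh registers: 8 locals followed by 8 outs.
        self.file = [0] * (16 * nwindows)
        self.cwp = 0  # current window pointer

    def _index(self, reg):
        size = len(self.file)
        if 8 <= reg < 16:     # outs of the current window
            return (16 * self.cwp + 8 + (reg - 8)) % size
        if 16 <= reg < 24:    # locals of the current window
            return (16 * self.cwp + (reg - 16)) % size
        if 24 <= reg < 32:    # ins: the previous window's outs
            return (16 * (self.cwp - 1) + 8 + (reg - 24)) % size
        raise ValueError("register number out of range")

    def read(self, reg):
        if reg == 0:
            return 0          # the hardwired zero register
        if reg < 8:
            return self.globals_[reg]
        return self.file[self._index(reg)]

    def write(self, reg, value):
        if reg == 0:
            return            # writes to the zero register are discarded
        if reg < 8:
            self.globals_[reg] = value
        else:
            self.file[self._index(reg)] = value

    def save(self):
        """Enter a function: slide to the next window."""
        self.cwp = (self.cwp + 1) % self.nwindows

    def restore(self):
        """Leave a function: slide back to the caller's window."""
        self.cwp = (self.cwp - 1) % self.nwindows

    def physical_count(self):
        """Registers the OS may have to save on a context switch."""
        return 8 + 16 * self.nwindows


if __name__ == "__main__":
    regs = WindowedRegisters()
    regs.write(8, 99)     # caller puts an argument in its first out register
    regs.save()
    print(regs.read(24))  # callee sees it in its first in register
    print(regs.physical_count())
```

Under this model user code still only ever names registers 0 through 31, but physical_count() grows by 16 with every extra window, which is exactly the state a context switch may have to deal with.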
The SPARC v9 architecture adds 64-bit registers and operations, and was marketed as the UltraSPARC by Sun. This is an evolutionary upgrade, and in fact aside from the larger address space few programs take advantage of the features v9 offers. Other additions include more floating point registers as well as 128-bit floating point operations. In current systems the quadword floats are not implemented in hardware but are emulated by the operating system, so the performance hit negates most of the advantages. Some of the changes were made to support more advanced operating system constructs or to support high-end hardware with hundreds or thousands of CPUs, so they won't be seen outside the implementation of the kernel or libc. One interesting addition was VIS, which adds a handful of SIMD instructions, but outside of Sun-written software nobody uses it. Compared to MMX/SSE, VIS is extremely limited, more in line with the Alpha's MAX extension.
SPARCv9 has the deficiency that while most instructions were extended to support 64-bit operations, there is no way to multiply two 64-bit words to get a 128-bit result. This is an oddity among 64-bit machines; nearly every other 64-bit architecture in existence supports this operation, and I have no idea why the SPARCv9 designers thought leaving it out was a good idea. This negatively affects the performance of crypto algorithms, especially public key crypto like RSA and Diffie-Hellman, because these operations rely on multiplying large (typically 128-bit to 4096-bit) integers. If one integer is n words long and the other is m, basic multiplication algorithms take about n*m word multiplications. If you double the word size, you halve the number of words needed, so the multiplication takes only (n/2)*(m/2), or (n*m)/4, operations. This four-fold speedup is pretty noticeable, but it is difficult to get on the SPARC because of the lack of a full-word multiply instruction. You can synthesize such a multiplication out of four 32-bit multiplies, but this hurts quite a bit; after normalizing for clock speed, a SPARCv9 is several times slower than an Alpha or Opteron at public key crypto. Someone at work theorized that this was so Sun would make more money selling SSL accelerator cards.
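Both halves of that argument can be sketched in a few lines of Python. The first function just counts word multiplications for schoolbook multiplication; the second builds a 64x64 to 128-bit product out of four 32x32 to 64-bit partial products, which is the workaround described above. The function names are invented for this example, and real bignum libraries do this in assembly with far more care.

```python
MASK32 = (1 << 32) - 1
MASK64 = (1 << 64) - 1


def word_multiplies(bits_a, bits_b, word_bits):
    """Word multiplications needed by schoolbook multiplication (n * m)."""
    n = -(-bits_a // word_bits)  # ceiling division
    m = -(-bits_b // word_bits)
    return n * m


def mul64_via_32(a, b):
    """Return (hi, lo), the 128-bit product of two unsigned 64-bit values,
    using only four 32x32 -> 64-bit multiplications."""
    a &= MASK64
    b &= MASK64
    a_lo, a_hi = a & MASK32, a >> 32
    b_lo, b_hi = b & MASK32, b >> 32

    ll = a_lo * b_lo  # contributes at bit 0
    lh = a_lo * b_hi  # contributes at bit 32
    hl = a_hi * b_lo  # contributes at bit 32
    hh = a_hi * b_hi  # contributes at bit 64

    # Sum everything that lands in bits 32..63, keeping the carry.
    mid = (ll >> 32) + (lh & MASK32) + (hl & MASK32)
    lo = (ll & MASK32) | ((mid & MASK32) << 32)
    hi = hh + (lh >> 32) + (hl >> 32) + (mid >> 32)
    return hi, lo


if __name__ == "__main__":
    print(word_multiplies(1024, 1024, 32))  # 32-bit words
    print(word_multiplies(1024, 1024, 64))  # 64-bit words
    hi, lo = mul64_via_32(MASK64, MASK64)
    print(hex(hi), hex(lo))
```

Note that besides the four multiplies there is a pile of shifts, masks, and carry-propagating adds, all of which a native full-width multiply would do for free.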
Since 1989 the actual SPARC instruction set architecture has been defined by SPARC International, a consortium of SPARC vendors, which is primarily led by Sun and Fujitsu, but also includes OEMs such as Texas Instruments, LSI, and Tadpole.
An amusing side note: Tim May, the raving paranoid cypherpunk, made the following claim in 1993:
Sun Microsystems was ordered by the NSA to redesign their chips to capture keys, which is why the SPARC processor was introduced. SPARC stands for "Sun Processor Allowing Remote Capture."
Personally, I'm more worried about the orbital mind control lasers.
Sources:
http://www.cs.berkeley.edu/~pattrsn/bio.html
http://www.sparc.com/history.html
http://www.sparcproductdirectory.com/history.html
I would like to thank OldMiner for pointing out an error in an earlier version of this writeup.