How Cache Sizes Affect Yields
Looking back at the K6-III, yields were relatively low. This is partly due to the much larger die size (recall the massive number of transistors that 256KB of on-die L2 cache uses). "Backup" cache-lines can be added to allow defective cache-lines in the cache to be disabled, but AMD didn't use many, because adding backup cache-lines increases die size. So yields were relatively low, and to salvage chips with a totally defective L2 cache, AMD just disabled the cache and sold the chip as a K6-2 if the core was fine.
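To put rough numbers on the trade-off described above, here is a minimal sketch using the simple Poisson yield model (yield = exp(-area x defect density)). This is a textbook approximation, not AMD's actual yield model, and every figure in it (die areas, defect density, number of spare lines) is a made-up assumption chosen only for illustration. It also assumes each defect in the cache kills exactly one cache-line, and it ignores the extra die area the spare lines themselves would take up.

```python
import math

def poisson_yield(area_cm2, defects_per_cm2):
    """Fraction of dies with zero defects under a simple Poisson model."""
    return math.exp(-area_cm2 * defects_per_cm2)

def yield_with_spares(core_area_cm2, cache_area_cm2, defects_per_cm2, spare_lines):
    """Yield when the cache can tolerate up to `spare_lines` defects.

    A die counts as good if the core has no defects and the cache has at
    most `spare_lines` defects (each defect is assumed to hit one line).
    """
    core_ok = poisson_yield(core_area_cm2, defects_per_cm2)
    lam = cache_area_cm2 * defects_per_cm2  # expected defects in the cache
    cache_ok = sum(math.exp(-lam) * lam**k / math.factorial(k)
                   for k in range(spare_lines + 1))
    return core_ok * cache_ok

# Hypothetical numbers, for illustration only.
DEFECTS = 0.5   # defects per cm^2 (assumed)
CORE = 0.6      # core area in cm^2 (assumed)
CACHE = 0.6     # on-die L2 area in cm^2 (assumed)

print("core only:          ", poisson_yield(CORE, DEFECTS))
print("core + L2, 0 spares:", yield_with_spares(CORE, CACHE, DEFECTS, 0))
print("core + L2, 2 spares:", yield_with_spares(CORE, CACHE, DEFECTS, 2))
```

Under this model, adding the cache area drops the yield, and a couple of spare lines win much of it back, which is the trade-off AMD faced: spares help, but only if the area they add does not eat the gain.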
Intel has taken a different approach with the P3 Coppermine. If one half of the cache is bad but the other is fine, Intel just uses a laser to fuse off the bad half of the cache, disabling it. This saves the good part and allows Intel to sell the chip as a Celeron. Beyond salvaging, Intel often disables half of a fully working cache anyway, just to have a Celeron to sell. This is actually cheaper than designing a chip with half the L2 cache, because Intel does not have to create different masks and such for another chip.
One would think that AMD would do with the Duron and Athlon what Intel does with the Celeron and P3 Coppermine. However, they did not. Disabling a section of the cache also cuts the associativity in proportion to the fraction that was disabled. Thus the Celeron's L2 cache is only 4-way associative, half that of the P3, even though both come from the same core. This reduces the hit rate, and that's not something AMD was willing to accept. Also, disabling part of a working cache merely to have a product for another market isn't an efficient use of fabrication capacity. This is something AMD has to worry about much more than Intel does, because AMD has only two fabs, while Intel has many more.
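The associativity point is easy to see from the cache geometry. The sketch below is only an illustration: the 8-way figure for the full cache follows from the text (the Celeron keeps half of it and ends up 4-way), but the 32-byte line size is my assumption, and the `Cache` class and `disable_half_ways` helper are hypothetical names, not anything from Intel.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Cache:
    size_bytes: int
    ways: int        # associativity
    line_bytes: int  # assumed line size

    @property
    def sets(self):
        # Number of sets = total size / (ways * line size).
        return self.size_bytes // (self.ways * self.line_bytes)

def disable_half_ways(cache):
    """Model fusing off half of the ways: size and associativity both
    halve, while the number of sets stays the same."""
    return Cache(cache.size_bytes // 2, cache.ways // 2, cache.line_bytes)

# Full Coppermine L2: 256KB, 8-way; 32-byte lines are an assumption.
coppermine = Cache(size_bytes=256 * 1024, ways=8, line_bytes=32)
celeron = disable_half_ways(coppermine)

# A 128KB cache designed from scratch could instead keep 8 ways and
# halve the number of sets.
designed_128k = Cache(size_bytes=128 * 1024, ways=8, line_bytes=32)

for name, c in [("P3", coppermine), ("Celeron", celeron),
                ("designed 128KB", designed_128k)]:
    print(name, c.size_bytes // 1024, "KB,", c.ways, "ways,", c.sets, "sets")
```

The fused-off part keeps the same number of sets but has fewer places to put each line, while a purpose-built smaller cache can keep its associativity, which is the route AMD took with the Duron.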
What this amounts to is that the Duron and the Thunderbird have different cache sizes, yet the same architecture, latencies, and associativities. AMD implements more redundant cache-lines than they did on the K6-III, so yields are much higher. In addition, the die size of the Thunderbird is about the same as that of the K6-III thanks to the smaller process technology (with the Duron being about 20mm^2 smaller).
An Aside for What Might Have Been...
While the quote by Paul DeMone wasn't made with RISE in mind (this company attempted to join the x86 PC world, but failed miserably), it is interesting to apply this concept to the mp6, their first x86 processor. The mp6 (which, interestingly enough, has been introduced "for the first time" twice! [for reference, I read about that at JC's - http://www.jc-news.com/pc]) was introduced with a miserly 16KB of L1 cache. The last x86 processors with that little were the original Pentium and the Cyrix 6x86. RISE did, however, have plans to add 256KB of L2 cache on-die, and they even came out with a Socket 370 version (they had a GTL+ bus license from the manufacturer, as they run a fabless model), which never made it to market. The reason this is so interesting is that, while they followed Paul DeMone's statement about having a low-latency L1, they got carried away, as they didn't have the large and fast L2 cache to back it up.
The core of the mp6 (without going into the gory details, which can be found here and here) had the ability to execute many instructions at once (it had a fully pipelined FPU even before the Athlon came out). But it was stuck with its tiny L1 cache, which, though it had an incredibly low latency (1 cycle!), was backed only by whatever cache was on the motherboard until the second incarnation of the mp6, the mp6II, could come out with integrated L2. This is why they claimed their PR rating (yes, it used one...) scaled so well with the FSB: on socket 7 motherboards, the FSB speed is also the speed at which the L2 cache operates. If RISE had gotten the version with on-die L2 cache out, perhaps they would have had a very fast solution per clock, but alas, they did not, and they have been relegated to the appliance market (perhaps not a bad thing considering its growth potential) because of poor performance (see here).
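To see why a chip like this would scale with the FSB, here is a rough back-of-the-envelope sketch of average memory access time. Only the 1-cycle L1 latency comes from the text. The clock speeds, L1 miss rate, and L2 latency in bus cycles are hypothetical values picked for illustration, and the model assumes every L1 miss hits in the motherboard L2, which is a simplification.

```python
def l2_latency_core_cycles(core_mhz, fsb_mhz, l2_latency_bus_cycles):
    """Motherboard L2 latency expressed in core clock cycles.

    The L2 runs at FSB speed, so each bus cycle costs core_mhz / fsb_mhz
    core cycles.
    """
    return l2_latency_bus_cycles * core_mhz / fsb_mhz

def avg_access_cycles(l1_hit_cycles, l1_miss_rate, l2_core_cycles):
    """Average access time in core cycles, assuming all L1 misses hit in L2."""
    return l1_hit_cycles + l1_miss_rate * l2_core_cycles

# Hypothetical figures, for illustration only.
CORE_MHZ = 250
L1_HIT = 1          # 1-cycle L1, as stated in the text
L1_MISS_RATE = 0.10 # assumed; a tiny L1 misses often
L2_BUS_CYCLES = 3   # assumed motherboard L2 latency in bus cycles

for fsb in (66, 100):
    l2 = l2_latency_core_cycles(CORE_MHZ, fsb, L2_BUS_CYCLES)
    print(fsb, "MHz FSB:", avg_access_cycles(L1_HIT, L1_MISS_RATE, l2))
```

At the same core clock, a faster FSB shrinks the cost of every L1 miss, and with an L1 this small there are a lot of misses to pay for.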
PR rating and dismal cache architecture aside, it had another problem, namely the clock speed (which is indeed what necessitated the PR rating, much like AMD's K5 and Cyrix's 6x86 core derivatives). Given that it was bandwidth-starved, the addition of the L2 cache could have allowed it to continue to use the PR rating, but as it was, it depended on faster bus speeds, and a small company often has great difficulty in getting odd bus speeds to become standard (look at Cyrix - they did it, but it took a while, and it never truly became standard).