Hyper Pipelined - One of the secrets to a high-performing CPU is pipelining the execution of instructions. While this increases the latency of each individual instruction, and has other negative side effects, it greatly increases the throughput of a processor by allowing it to issue new instructions every cycle. Intel describes the Pentium 4 as having "Hyper Pipelined Technology." The Pentium III and Athlon, by comparison, are merely "Super Pipelined." All this means is that the Pentium 4 breaks the execution of each instruction into more, smaller stages, which allows the processor to reach higher clock speeds.
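To put some rough numbers on the latency/throughput trade-off, here is a minimal Python sketch of an idealized pipeline. It assumes every stage takes exactly one clock and nothing ever stalls (real processors are never this tidy); the function names and the stage count in the usage note are illustrative only.

```python
def pipelined_cycles(n_instructions, stages):
    """Cycles to finish n instructions on an ideal pipeline (one clock per stage, no stalls)."""
    if n_instructions == 0:
        return 0
    # The first instruction needs `stages` cycles to travel through the pipe;
    # after that, one more instruction completes every cycle.
    return stages + (n_instructions - 1)


def unpipelined_cycles(n_instructions, stages):
    """Cycles if each instruction must finish every stage before the next one starts."""
    return n_instructions * stages
```

With 20 stages, a single instruction still takes 20 cycles from start to finish, but 1,000 instructions finish in 1,019 cycles instead of 20,000. Splitting the same work into more, shorter stages keeps that throughput while letting each cycle, and therefore the clock period, get shorter.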
Better Branch Prediction - The longer the pipeline, the more branches in code hurt, because every wrong guess forces the processor to throw away the work already in flight behind the branch, and a longer pipeline holds more of it. On the Pentium III, the Branch Prediction Unit guessed correctly about 90-91% of the time. Because the Pentium 4's pipeline is twice as long as the Pentium III's (even longer on a Trace Cache miss), the penalty for a wrong guess is correspondingly greater. To help combat this, Intel improved the Pentium 4's Branch Prediction Unit over that of the Pentium III.
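A quick way to see why a longer pipeline makes prediction accuracy matter more is to compute the average number of cycles thrown away per branch. This Python sketch assumes, purely for illustration, that a misprediction costs roughly one full pipeline's worth of cycles (10 for a Pentium III-like pipeline, 20 for one twice as long); real penalties vary, and the function names are made up.

```python
def mispredict_cycles_per_branch(accuracy, flush_penalty_cycles):
    """Average cycles lost per branch: (fraction guessed wrong) * (cost of one flush)."""
    return (1.0 - accuracy) * flush_penalty_cycles


def accuracy_needed(target_cycles_per_branch, flush_penalty_cycles):
    """Prediction accuracy required to keep the average loss per branch at the target."""
    return 1.0 - target_cycles_per_branch / flush_penalty_cycles
```

At 90% accuracy, doubling the assumed penalty from 10 to 20 cycles doubles the average loss from about 1 to about 2 cycles per branch, and accuracy_needed(1.0, 20) shows the longer pipeline would need 95% accuracy just to break even.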
Rapid Execution Engine - The Arithmetic Logic Units (ALUs) run at twice the base clock speed of the processor. This means that a 1.7GHz Pentium 4 has two double-pumped ALUs running at 3.4GHz! The most basic and most frequently used instructions, such as add, subtract (which is mathematically an addition), logical AND, and logical OR, can be completed in one ALU cycle, which means they complete in half of one base clock cycle.
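The claim that subtraction is "mathematically addition" refers to two's complement arithmetic: a - b equals a + (NOT b) + 1, so one adder can handle both. The Python sketch below models a 32-bit register. It is a generic illustration of two's complement, not a description of the Pentium 4's actual ALU circuitry, and the function names are hypothetical.

```python
MASK32 = 0xFFFFFFFF  # model a 32-bit integer register


def alu_add(a, b, carry_in=0):
    """32-bit adder: the result wraps around modulo 2**32, like a hardware register."""
    return (a + b + carry_in) & MASK32


def alu_sub(a, b):
    """Subtract using the same adder: a - b == a + (NOT b) + 1 in two's complement."""
    return alu_add(a, (~b) & MASK32, carry_in=1)
```

alu_sub(10, 3) returns 7, and alu_sub(3, 10) returns 0xFFFFFFF9, which is -7 in 32-bit two's complement. As for timing, one base clock at 1.7GHz lasts about 0.59 nanoseconds, so a double-pumped ALU cycle lasts about 0.29 nanoseconds.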
Trace Cache - The Trace Cache, as implemented in the Pentium 4, can store approximately 12,000 micro-ops (RISC-like instructions that, unlike x86 instructions, are of uniform length and simpler to execute). This lowers the branch penalty the Pentium 4 would otherwise have, because on a Trace Cache hit the instructions don't have to be decoded all over again. The other major benefit is that code is stored contiguously in the order in which it was executed, so a taken branch does not break up the stream of micro-ops fed to the core.
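The core idea of a trace cache, reusing already-decoded micro-ops, can be sketched as a small Python class. This is a toy model under loose assumptions: `decoder` is a hypothetical stand-in for the x86 decoders, traces are keyed only by their starting address, and the oldest trace is evicted when the micro-op capacity fills up. The real Pentium 4 Trace Cache also builds traces along predicted branch paths, which this sketch ignores.

```python
from collections import OrderedDict


class TraceCache:
    """Toy trace cache: maps a starting address to an already-decoded list of micro-ops."""

    def __init__(self, decoder, capacity_uops=12_000):
        self.decoder = decoder            # hypothetical: address -> iterable of micro-ops
        self.capacity = capacity_uops     # roughly the 12,000 micro-ops cited above
        self.traces = OrderedDict()       # insertion order doubles as age for eviction
        self.used = 0                     # micro-ops currently stored
        self.decodes = 0                  # how many times the slow decode path ran

    def fetch(self, start_address):
        """Return the micro-op trace starting at start_address, decoding only on a miss."""
        if start_address in self.traces:
            return self.traces[start_address]   # hit: no decoding needed

        # Miss: run the (slow) x86 decoders.
        self.decodes += 1
        trace = list(self.decoder(start_address))
        if len(trace) > self.capacity:
            return trace                        # too big to cache at all

        # Evict the oldest traces until the new one fits.
        while self.used + len(trace) > self.capacity:
            _, old = self.traces.popitem(last=False)
            self.used -= len(old)

        self.traces[start_address] = trace
        self.used += len(trace)
        return trace
```

Fetching the same address twice runs the decoder only once; that saved decode work is what lets a hit skip part of the pipeline.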
Quad Pumped FSB (100MHz, 400 megatransfers per second) - The Pentium 4 has bandwidth like no other x86 CPU. A major reason for this massive bandwidth is that the Pentium 4 has a quad pumped front side bus. Though it's not as good as a true 400MHz FSB (its latencies aren't as low as a theoretical 400MHz FSB's would be), a quad pumped FSB provides 3.2 gigabytes per second of bandwidth. When paired with two channels of PC800 RDRAM, main memory bandwidth matches the bandwidth of the FSB perfectly. This is unlike the Pentium III with the i840 chipset, which has the same theoretical main memory bandwidth but is bottlenecked by a much slower, roughly 1 gigabyte per second front side bus.
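The bandwidth figures in this section all come from the same formula: bytes per transfer times transfers per clock times clock rate. In the Python sketch below, the bus widths are assumptions taken from well-known specifications rather than from this article: a 64-bit FSB, and a 16-bit PC800 RDRAM channel clocked at 400MHz with two transfers per clock. The 133MHz Pentium III FSB is likewise an assumed example, and the function name is illustrative.

```python
def bus_bandwidth_gb_s(width_bits, clock_mhz, transfers_per_clock):
    """Peak bandwidth in GB/s (10**9 bytes per second).

    bytes per transfer * transfers per clock * clocks per second
    """
    bytes_per_transfer = width_bits / 8
    transfers_per_second = clock_mhz * 1e6 * transfers_per_clock
    return bytes_per_transfer * transfers_per_second / 1e9
```

bus_bandwidth_gb_s(64, 100, 4) gives 3.2GB/s for the Pentium 4 FSB, two PC800 channels give 2 x 1.6 = 3.2GB/s, and a 64-bit, single-pumped 133MHz FSB gives about 1.06GB/s, the "1 gigabyte per second" bottleneck mentioned above.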
Advanced Transfer Cache - As stated above, Intel's Pentium 4 is a bandwidth hog. As such, Intel opted to beef up the L2 cache even beyond the remarkable "Advanced Transfer Cache" used in the Pentium III (256KB, 256-bit wide, 8-way set associative, non-blocking; for more info, see http://www.systemlogic.net/articles/00/10/cache ). The Pentium III's Advanced Transfer Cache was 256 bits wide, but could only transfer data every other cycle. The Pentium 4's Advanced Transfer Cache transfers data every cycle, providing twice the bandwidth per clock cycle of the Pentium III and a whopping 54 gigabytes per second of bandwidth to the core at 1.7GHz. Even if a Pentium III managed to reach 1.7GHz (which would be unbelievably difficult), it would only get about 27 gigabytes per second of bandwidth from its L2 cache.
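The L2 numbers follow the same arithmetic. Assuming a 256-bit (32-byte) data path, this Python sketch computes peak L2-to-core bandwidth for a cache that transfers every cycle versus every other cycle; the function name is illustrative.

```python
def l2_bandwidth_gb_s(width_bits, core_clock_ghz, cycles_per_transfer):
    """Peak L2-to-core bandwidth in GB/s for a cache that moves `width_bits`
    once every `cycles_per_transfer` core clock cycles."""
    bytes_per_transfer = width_bits / 8
    return bytes_per_transfer * core_clock_ghz / cycles_per_transfer
```

l2_bandwidth_gb_s(256, 1.7, 1) gives about 54.4GB/s, and l2_bandwidth_gb_s(256, 1.7, 2) gives about 27.2GB/s, matching the figures above.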
Hardware Prefetcher - Part of the reason the Pentium 4 needs such huge amounts of main memory bandwidth is that it implements a hardware-based prefetcher. The Pentium III's SSE instruction set introduced prefetch instructions that let a programmer pull data into the cache before it is actually needed. The downsides of this approach are that it requires software support, which isn't always easy to come by, and that it can introduce slight code bloat. The hardware prefetcher requires no such software support; it watches memory accesses and automatically fetches data before it is needed. This consumes bandwidth and, if there isn't enough to go around, can actually cause contention for main memory bandwidth. The massive main memory bandwidth afforded by the quad pumped FSB lets the hardware prefetcher essentially "trade" excess bandwidth for lower average latencies.
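This article doesn't describe how the Pentium 4's prefetcher decides what to fetch, so the Python sketch below is only a generic stride prefetcher, a common textbook technique, not Intel's design. It watches the stream of demand addresses and, once it sees the same distance between accesses twice in a row, predicts the next `degree` addresses along that stride. The class and parameter names are hypothetical.

```python
class StridePrefetcher:
    """Toy stride detector: after seeing the same address delta twice in a row,
    predict the next `degree` addresses along that stride."""

    def __init__(self, degree=1):
        self.degree = degree      # how many addresses ahead to prefetch
        self.last_addr = None
        self.last_stride = None

    def access(self, addr):
        """Record one demand access and return the addresses to prefetch."""
        prefetches = []
        if self.last_addr is not None:
            stride = addr - self.last_addr
            # Only act on a stable, non-zero stride.
            if stride != 0 and stride == self.last_stride:
                prefetches = [addr + stride * i for i in range(1, self.degree + 1)]
            self.last_stride = stride
        self.last_addr = addr
        return prefetches
```

Walking an array in 64-byte steps (0, 64, 128, ...) with degree=2 triggers prefetches of 192 and 256 after the third access. Every prefetched address costs bus bandwidth whether or not it is eventually used, which is exactly the bandwidth-for-latency trade described above.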
iSSE2 - Similar in concept to MMX, 3DNow! and iSSE, iSSE2 is a SIMD (Single Instruction, Multiple Data) technology. This means that if the same operation needs to be performed on multiple data elements, only one instruction is needed. The difference is that iSSE2 adds double-precision floating point and 64-bit integer SIMD, using the eight 128-bit wide XMM registers introduced with iSSE. Like its predecessor, it is also fully IEEE compliant (meaning you get the results one should expect according to a universal standard), while 3DNow! is not fully IEEE compliant (meaning the results of 3DNow! floating point operations can be ever so slightly off).
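To make the SIMD idea concrete, this Python sketch emulates two iSSE2 operations on a 128-bit value: a packed double-precision add (the ADDPD instruction) and a packed 64-bit integer add (PADDQ). It only models the results, treating a 16-byte string as one register holding two 64-bit lanes; it says nothing about how the hardware actually executes them, and the helper names are hypothetical.

```python
import struct


def pack_2xf64(lo, hi):
    """Pack two IEEE 754 doubles into one 16-byte (128-bit) value, low lane first."""
    return struct.pack("<2d", lo, hi)


def addpd(x, y):
    """Emulate a packed-double add: both 64-bit lanes are added by one 'instruction'."""
    a = struct.unpack("<2d", x)
    b = struct.unpack("<2d", y)
    return struct.pack("<2d", *(p + q for p, q in zip(a, b)))


def paddq(x, y):
    """Emulate a packed 64-bit integer add: each lane wraps around modulo 2**64."""
    a = struct.unpack("<2Q", x)
    b = struct.unpack("<2Q", y)
    return struct.pack("<2Q", *((p + q) % 2**64 for p, q in zip(a, b)))
```

One call to addpd updates both lanes at once, which is the "one instruction, multiple data elements" point; paddq shows each 64-bit integer lane wrapping around independently of the other.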
Thermal protection - The Pentium 4 has an internal thermal diode, and the processor will throttle itself back if it gets exceedingly hot (such as when the fan on the heatsink dies). This protects the CPU from overheating the way a Thunderbird Athlon can. Also, if the temperature exceeds 135 degrees Celsius, the CPU will shut itself down so that it won't burn out. The Palomino core has a thermal diode as well, so future Athlons shouldn't have the same growing pains that the Thunderbird had.
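The protection logic described here boils down to two thresholds. In this Python sketch, the 135 degree Celsius shutdown point comes from the text, while the throttling threshold is a hypothetical parameter, since the article doesn't say at what temperature throttling begins; the function name is also illustrative.

```python
SHUTDOWN_C = 135.0  # shutdown temperature given in the article


def thermal_action(temp_c, throttle_c):
    """Return what the (simplified) thermal logic does at a given die temperature.

    throttle_c is a hypothetical throttling threshold; it is not from the article.
    """
    if temp_c > SHUTDOWN_C:
        return "shutdown"   # exceeds 135 C: power down before the chip burns out
    if temp_c >= throttle_c:
        return "throttle"   # too hot: slow the CPU down until it cools
    return "normal"
```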
>> Should You Buy Now?