SLCentral - Your logical choice for computing and technology
    Intel Pentium 4: In-Depth Technical Overview
    Author: Paul Mazzucco
    Date Posted: August 3rd, 2001


    For the last couple of years, the top x86 manufacturers have been locked in a serious battle for performance dominance in the consumer PC market. This article covers the technical background of the Intel Pentium 4 processor: how it achieves its performance levels, and how that will affect future battles for x86 supremacy. It also discusses how different types of code influenced the Pentium 4's design.

    Hyper Pipelined

    Modern x86 microprocessors have been lengthening their pipelines since the 486. The reason is not that a longer pipeline increases performance all by itself; on its own it actually decreases performance (discussed in the next section). Pipelining reduces the amount of work done in each cycle, because the workload is spread out over more stages. However, a longer pipeline allows greater throughput and higher clock speeds (because each stage is less complex, each one can run faster; think of it as an assembly line). The Pentium 4 stretches the work over a staggering 20 stages! All this in the never-ending quest for more MHz.
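
    The trade-off can be sketched with a toy model. The delay figures below are illustrative assumptions rather than Pentium 4 measurements: a fixed amount of logic is split evenly across the stages, and each stage adds a small fixed overhead.

```python
# Toy model: splitting a fixed amount of logic across more pipeline stages.
# All numbers are illustrative assumptions, not Pentium 4 measurements.

LOGIC_DELAY_NS = 10.0    # assumed total logic delay for one instruction
LATCH_OVERHEAD_NS = 0.1  # assumed fixed overhead added by each stage


def cycle_time_ns(stages: int) -> float:
    """Clock period when the logic is divided evenly across `stages`."""
    return LOGIC_DELAY_NS / stages + LATCH_OVERHEAD_NS


def clock_mhz(stages: int) -> float:
    """Clock frequency implied by the cycle time (1 ns period = 1000 MHz)."""
    return 1000.0 / cycle_time_ns(stages)


def latency_ns(stages: int) -> float:
    """Time for one instruction to pass through the whole pipeline."""
    return stages * cycle_time_ns(stages)


if __name__ == "__main__":
    for stages in (5, 10, 20):
        print(f"{stages:2d} stages: {clock_mhz(stages):7.1f} MHz, "
              f"latency {latency_ns(stages):5.2f} ns")
```

    Under these assumptions the clock rate climbs as stages are added, while the time a single instruction spends in the pipeline grows slightly. That matches the point above: less work per cycle, but more cycles per second.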

    It will be explained later why the number of stages can sometimes be as many as 28.

    Branch Prediction

    The greater the number of stages, the more stages have to be "flushed" (cleared out) when a branch is mispredicted. When a processor encounters a conditional statement (such as an if/else statement), rather than simply waiting for the condition to be evaluated, modern processors use what's called "Branch Prediction": they guess the outcome and keep working along the predicted path. If the processor guesses right, it saves a lot of time (the time it takes to compute the condition). However, if it guesses wrong, it has to flush all the work it has already done down the wrong path and start over. In a worst-case scenario, the penalty for a mispredict is 19 cycles! This is greater than the Pentium III's, but that should be no surprise given the longer pipeline.
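
    To get a feel for what such a penalty costs on average, here is a back-of-the-envelope sketch. The 19-cycle penalty is the worst-case figure quoted above; the share of instructions that are branches and the sample accuracies are assumptions chosen only for illustration, and every miss is charged the full worst-case penalty.

```python
# Back-of-the-envelope cost of branch mispredictions.
MISPREDICT_PENALTY_CYCLES = 19  # worst-case figure quoted in the text
BRANCH_FRACTION = 0.20          # assumption: one instruction in five is a branch


def wasted_cycles_per_instruction(accuracy: float,
                                  penalty: int = MISPREDICT_PENALTY_CYCLES,
                                  branch_fraction: float = BRANCH_FRACTION) -> float:
    """Average cycles lost to pipeline flushes per instruction executed."""
    miss_rate = 1.0 - accuracy
    return branch_fraction * miss_rate * penalty


if __name__ == "__main__":
    for accuracy in (0.90, 0.95, 0.99):  # sample accuracies, chosen arbitrarily
        lost = wasted_cycles_per_instruction(accuracy)
        print(f"accuracy {accuracy:.0%}: {lost:.3f} cycles lost per instruction")
```

    The cost scales linearly with both the miss rate and the penalty, so a longer pipeline needs a proportionally better predictor just to break even.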

    On the Pentium III, Intel had a Branch Prediction Unit with an average accuracy of about 90% when predicting branches. This seems pretty good at first glance, no? I'd love to be able to guess right (with prior knowledge about how my guesses turned out, of course) 90% of the time on true or false exams! However, in processors with long pipelines, 90% simply isn't good enough. Intel has stated that approximately 30% of real-world performance is thrown out the window by the times the processor guesses wrong. Given that the penalty for the Pentium 4 is potentially longer than that of the Pentium III, it should come as no surprise that Intel opted to improve the accuracy of its Branch Prediction Unit! To that end, Intel has increased the number of entries in the history table eight-fold over the Pentium III (this happens to be the size of the history table of the K6-x family, though AMD has stated that it was overkill for a chip with a short pipeline)!
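
    Why would a larger history table help? One common textbook scheme, used here purely as an illustration and not as a description of Intel's actual predictor, keeps a table of 2-bit saturating counters indexed by branch address. When the table is too small, unrelated branches share a counter and can spoil each other's history. The workload below is synthetic and deliberately adversarial for the small table; the function name and parameters are hypothetical.

```python
# Toy branch history table of 2-bit saturating counters.
# A generic textbook scheme used for illustration; it is NOT the Pentium III
# or Pentium 4 predictor, whose internals the article does not describe.

def simulate(table_entries: int, num_branches: int = 64, rounds: int = 100) -> float:
    """Return prediction accuracy for a synthetic, hypothetical workload.

    Branch i is always taken when (i // 8) is even, otherwise never taken.
    Branches are visited round-robin and indexed into the table by
    i % table_entries, so small tables force unrelated branches to share
    a counter (aliasing).
    """
    counters = [1] * table_entries  # 0-1 predict not taken, 2-3 predict taken
    correct = 0
    total = 0
    for _ in range(rounds):
        for branch in range(num_branches):
            taken = (branch // 8) % 2 == 0
            slot = branch % table_entries
            predicted_taken = counters[slot] >= 2
            correct += predicted_taken == taken
            total += 1
            # Nudge the counter toward the actual outcome, saturating at 0 and 3.
            if taken:
                counters[slot] = min(3, counters[slot] + 1)
            else:
                counters[slot] = max(0, counters[slot] - 1)
    return correct / total


if __name__ == "__main__":
    for entries in (8, 64):
        print(f"{entries:3d}-entry table: {simulate(entries):.1%} correct")
```

    In this contrived case the 8-entry table is wrong every time, because each counter is shared by branches that go opposite ways, while the 64-entry table gives every branch its own counter and misses only during warm-up. Real code is far less hostile, but the same aliasing effect is one reason a bigger table can raise accuracy.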

    Intel has claimed that it has reduced the miss rate of the Branch Prediction Unit on the Pentium 4 by 30% relative to the Pentium III. Given that the Pentium III had an average prediction rate of about 90%, this puts the Pentium 4's branch prediction rate somewhere around 93% (because 30% of the 10% miss rate is 3 percentage points, leaving a miss rate of about 7%). Missing a branch is so costly on a processor with such a long pipeline that it was quite necessary to avoid guessing wrong as much as possible.
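
    The arithmetic can be worked through explicitly. This sketch takes the two figures in this section at face value (about 90% accuracy on the Pentium III, a 30% reduction in misses) and assumes the reduction applies uniformly to the miss rate; the function name is made up for the example.

```python
def improved_accuracy(old_accuracy: float, miss_reduction: float) -> float:
    """Accuracy after cutting the miss rate by the fraction `miss_reduction`."""
    old_miss_rate = 1.0 - old_accuracy
    new_miss_rate = old_miss_rate * (1.0 - miss_reduction)
    return 1.0 - new_miss_rate


if __name__ == "__main__":
    p3 = 0.90                         # Pentium III accuracy quoted in the text
    p4 = improved_accuracy(p3, 0.30)  # 30% fewer misses, per Intel's claim
    print(f"Pentium III accuracy (from the text): {p3:.1%}")
    print(f"Estimated Pentium 4 accuracy:         {p4:.1%}")
```

    Worked through this way, a 10% miss rate shrinks to roughly 7%, which puts the estimated accuracy near 93%.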

    In the next section we'll see how the Pentium 4's Trace Cache helps to alleviate some of the issues with having long pipelines.

    Article Navigation

    1. Introduction/Hyper Pipelined/Branch Prediction
    2. The P4's Caches
    3. Bandwidth And The Line-Sizes
    4. Hardware Prefetch/Some Of The "Guts"
    5. iSSE2
    6. Thermal Protection
    7. Conclusion
    8. Bibliography

    Copyright 1998-2007 SLCentral. All Rights Reserved.