Slashdot Mirror


Overclocker Pushes Intel Core i7-7700K Past 7GHz Using Liquid Nitrogen (hothardware.com)

MojoKid writes from a report via HotHardware: If you've had any doubts of Intel's upcoming Kaby Lake processor's capabilities with respect to overclocking, don't fret. It's looking like even the most dedicated overclockers are going to have a blast with this series. Someone recently got a hold of an Intel Core i7-7700K chip and decided to take it for an overclocking spin. Interestingly, the motherboard used is not one of the upcoming series designed for Kaby Lake, but the chip was instead overclocked on a Z170 motherboard from ASRock (Z170M OC Formula). That bodes well for those planning to snag a Kaby Lake CPU and would rather not have to upgrade their motherboard as well. With liquid nitrogen cooling the processor, this particular chip peaked at just over 7GHz, which helped deliver a SuperPi 32M time of 4m 20s, and a wPrime 1024M time of 1m 33s. It's encouraging to see the chip breaking this clock speed, even with extreme methods, since it's a potential relative indicator of how much headroom will be available for overclocking with more standard cooling solutions.

21 of 139 comments

  1. Most depressing thing I've read all week by phantomfive · · Score: 5, Interesting

    If I recall correctly, the first time someone got over 8 Ghz was back in ~2004, over a decade ago. I know clock speed isn't everything, but parallelism will only get you so far. I really hope before we get to 5nm chips, we can get some 20 Ghz clock speeds. The amount of work you'll be able to do on a single thread will be amazing.

    --
    "First they came for the slanderers and i said nothing."
    1. Re:Most depressing thing I've read all week by Anonymous Coward · · Score: 4, Informative

      If I recall correctly, the first time someone got over 8 Ghz was back in ~2004, over a decade ago. I know clock speed isn't everything, but parallelism will only get you so far. I really hope before we get to 5nm chips, we can get some 20 Ghz clock speeds. The amount of work you'll be able to do on a single thread will be amazing.

      The only thing inherently inefficient about parallel computing is the inefficiency created by the overhead required to keep the software consistent and coherent. The real problem with multi-core computing is that very little software is written in such a way that it can run on multiple CPUs. Hell, my professors were saying that in college 15 years ago, and it's still true today.

    2. Re:Most depressing thing I've read all week by phantomfive · · Score: 5, Interesting

      The only thing inherently inefficient about parallel computing

      Not all things are parallelizable. Sometimes you must wait for part A to finish before part B can begin. The obvious example is anything requiring user input (like games). Another example is databases: when they want to maintain ACID, they must do some things sequentially (which is unfortunately a huge bottleneck). See also Amdahl's law.
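      For readers who want the Amdahl's law reference spelled out, here is a minimal sketch in Python. The function name and the 90% parallel fraction are illustrative assumptions, not figures from the thread.

```python
def amdahl_speedup(parallel_fraction: float, workers: int) -> float:
    """Upper bound on speedup when only part of a job can run in parallel."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if workers < 1:
        raise ValueError("workers must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    # The serial part takes the same time no matter how many workers exist;
    # only the parallel part is divided among them.
    return 1.0 / (serial_fraction + parallel_fraction / workers)


if __name__ == "__main__":
    # ASSUMPTION: a hypothetical workload that is 90% parallelizable.
    for cores in (1, 2, 4, 16, 1024):
        print(f"{cores:>5} cores -> {amdahl_speedup(0.90, cores):.2f}x")
```

      With a 10% sequential portion, the bound never reaches 10x however many cores are added, which is the bottleneck being described.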

      --
      "First they came for the slanderers and i said nothing."
    3. Re:Most depressing thing I've read all week by phantomfive · · Score: 3, Insightful

      Not everyone is putting their crap on AWS. Some of us still like our crap local. Not everything is 'in the cloud' or even 'web based.'

      --
      "First they came for the slanderers and i said nothing."
    4. Re:Most depressing thing I've read all week by Ramze · · Score: 4, Interesting

      Intel gave up on increasing clock speeds way back when they hit 4Ghz. They hit a wall, and they're done, so I wouldn't expect them to revisit it. That's when they went to multi-core. Every computer does better with dual core over single core. Most do better with quad core than dual core. (because even if a single program isn't compiled for multi-core, different programs can be assigned different cores). With VR tech and GPUs added to the cores, multi-core is likely going to continue to be the area of development for some time. As always, expect new physics and graphics extensions as well as codecs.

      Multi-core means managing the power and speed of each core individually and allowing some to power down while ramping up one or two to keep the thermal and power envelopes within tolerances. The biggest metric for Intel is performance per Watt -- as data centers are concerned about power usage for the machines and the air conditioning systems.

      I don't think there is enough of a market for enthusiasts that want 20 Ghz clock speeds for Intel to bother even doing the research for new materials to pull that off... assuming it's even possible without extreme cooling.

    5. Re:Most depressing thing I've read all week by phantomfive · · Score: 2

      Intel gave up on increasing clock speeds way back when they hit 4Ghz

      Well yeah, they 'gave up' because the technology wouldn't allow them to increase the clock speed. Halving the die size didn't give an automatic speed boost like it had in previous generations.

      But if the next halving of the die size does give a speed boost, do you think Intel will reject it? Of course not. A doubling of clock speed (all else equal) gives you better performance than adding an extra core.

      --
      "First they came for the slanderers and i said nothing."
    6. Re:Most depressing thing I've read all week by Ramze · · Score: 5, Insightful

      This is true, but the CPU isn't the bottleneck for your examples. For user input (especially games), the user is the bottleneck. Games largely benefit from parallelization for rendering graphics. The logic isn't the bottleneck, and the latency for the response of user input is imperceptible to the human. For most instances, the RAM and CPU are waiting on the human and already have everything loaded to respond to the human. If a human's choice requires the loading of a different zone, the game could even predict which zone would load and pre-load a zone without human input, but dump it if the input wasn't what was predicted. Still, it's the I/O for the disk that's the bottleneck, not the CPU.

      As for databases, the biggest bottleneck is the storage medium. Depending on the database and how it's divided, one can even run many tasks on the same database simultaneously so long as the tables don't interact. Ramping up the CPU speed does little to nothing if the I/O to the storage medium of the database is slow b/c the db won't unlock the region of the database for the next transaction until the last transaction is written at least to a buffer if not the final storage medium.

      For that example, the best way to improve DB processing is to add RAM, add cache, and increase the clock speed of both... if possible, even let entire tables, if not the full database, exist in RAM and only write to disk periodically as a save-state. Even DDR4 2400 RAM only operates around 1.2 GHz, though with access on rising and falling edge, it's effectively 2.4 GHz. What is your 20 GHz CPU going to do with 10 cycles between every read and write to RAM? Current Intel CPUs have a 4 stage pipeline. Even with a sizable cache at a higher speed, it's going to choke on the RAM latency... especially for large sequential database transactions. RAM is already hot enough to fry eggs on, so it'll be until the next RAM replacement tech comes out before we see some real boosts there. Maybe in a year or two.

      I'm curious what exactly you'd like to run at 20 Ghz through the general purpose CPU registers that can't be done better/faster with extensions using specialized hardware. For instance, H.265 (HEVC) video playback can really heat up a CPU to nearly 100% usage, but if it has H.265 decoding hardware, the CPU barely breaks 1% playing the same video on a similar CPU architecture and speed. Seems if you have a single thread that you need to have repetitively run at very high speeds, you'd rather have an FPGA or some other hardware to accommodate whatever you're trying to do rather than a general purpose CPU.

    7. Re:Most depressing thing I've read all week by Joce640k · · Score: 2

      Learn C#6 and make your code multithreaded too. It isn't going away, and people like me will eat your lunch in the coming year. Nothing else scales.

      I don't know whether to LOL or facepalm.

      People like me have already eaten that particular lunch, many years ago.

      --
      No sig today...
    8. Re:Most depressing thing I've read all week by Anonymous Coward · · Score: 2, Informative

      Frequency is not an actual measure of CPU speed, as the average number of instructions per cycle was way smaller in 2004 (on a P4 CPU) than it is today.

      Today, processors have much shorter pipelines (14 stages vs. Prescott's 31). Pipelines contain instructions scheduled to be executed, each in a different stage of execution (e.g., decoded, having their necessary input ready, having their result computed in temporary storage but not yet committed). To run a CPU efficiently, you need to keep the pipeline full. When a branch instruction (e.g., "is the value of register CX 0?") is encountered, the CPU will predict its outcome ("no, it's not 0") and start loading and decoding the instructions for that path of the branch to keep the pipeline full. If CX turns out to be 0, it will have to discard the entire pipeline and start filling it with the path for the "yes, it's 0" case. Shorter pipelines reduce this penalty, resulting in a larger average number of instructions executed per cycle.

      Branch prediction also improved greatly, so you don't need to pay the already smaller penalty that many times.
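      The effect of pipeline depth and prediction accuracy on throughput can be sketched with a simple textbook-style model. Everything numeric here except the 31 and 14 stage counts is an assumption chosen for illustration, including the simplification that a misprediction costs roughly one full pipeline refill.

```python
def effective_cpi(base_cpi: float,
                  branch_fraction: float,
                  mispredict_rate: float,
                  flush_penalty_cycles: float) -> float:
    """Average cycles per instruction once misprediction flushes are included."""
    # Each mispredicted branch adds a pipeline-refill penalty on top of the
    # ideal cost of executing the instruction stream.
    return base_cpi + branch_fraction * mispredict_rate * flush_penalty_cycles


if __name__ == "__main__":
    # ASSUMPTIONS: ideal CPI of 1.0, 20% of instructions are branches,
    # 5% of branches mispredicted, penalty ~= pipeline depth in stages.
    deep = effective_cpi(1.0, 0.20, 0.05, 31)     # Prescott-like depth
    shallow = effective_cpi(1.0, 0.20, 0.05, 14)  # shorter modern pipeline
    print(f"31-stage pipeline: {deep:.2f} cycles per instruction")
    print(f"14-stage pipeline: {shallow:.2f} cycles per instruction")
```

      In this model a better predictor lowers mispredict_rate and a shorter pipeline lowers flush_penalty_cycles, so the two improvements multiply rather than merely add.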

      The number of cycles the execution of a single instruction takes also shrunk. An integer division took 34 cycles on a Prescott, takes 23-26 cycles on a Skylake. For simpler instructions the relative difference is even bigger, eg a MOV was at least 2 cycles for a Prescott, now it is usually 1 cycle.

      The average amount of data processed by an instruction also improved with new instructions (eg AVX, AES-NI).

      The cumulative result of these improvements (more data per instruction, more instructions per cycle, fewer cycles wasted in pipelines) is many times faster CPUs at the same clock rate, even without considering multiple cores.

    9. Re:Most depressing thing I've read all week by Opportunist · · Score: 2

      Ever heard the phrase "size doesn't matter, it's knowing how to use it"? Same for raw CPU speed.

      First of all, branch prediction and pipelining have become way better in those past 10 years. Pipelines are much shorter today, and coupled with near perfect branch prediction, this alone speeds up the CPU by a factor of 2 to 3. The reason for this is simply that a branch (a conditional jump, to be exact) used to mean that the CPU had to dump everything it had in its instruction pipeline and start anew from where the jump led to. Worst case, that was 30-40 cycles wasted. Per jump. I think I needn't explain how this can slow a CPU down, e.g. in tight loops where you have a handful of instructions that should (and now do) take a handful of cycles but took like 10 times as long to complete.

      That is, by the way, one of the reasons why encryption algorithms are so much faster on contemporary CPUs than they were on older models and why key sizes have quadrupled at least since 2005 to still be considered "secure".

      And of course, as many have pointed out already, the CPU isn't really the bottleneck in the current PCs. Applications we use today of course need calculations, but they are heavily dependent on peripherals and, if nothing else, RAM. And that is already dead slow compared to CPUs. Let's not even think about mass storage where access speeds are still measured in milliseconds. Ponder for a moment: You have a CPU running at 3 GHz, at 3 BILLION cycles per second. And you have a hard drive with an access time of 3ms (which is ... let's say faster than anything I've seen in a HDD). The CPU would have to wait for about 9 million cycles before the HD could even start to answer its request. We haven't even read anything yet, we have just waited for the HDD to spin to the point where we could start thinking about reading something from it.
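      A quick back-of-the-envelope check of that example, as a small Python sketch. The 3 GHz clock and 3 ms access time come from the comment; the DRAM latency is an assumed round number added only for comparison.

```python
def cycles_spent_waiting(clock_hz: float, latency_seconds: float) -> int:
    """Clock cycles that tick by while the CPU waits for one access."""
    return round(clock_hz * latency_seconds)


CLOCK_HZ = 3e9          # 3 GHz, from the comment above
HDD_ACCESS_S = 3e-3     # 3 ms, from the comment above
DRAM_ACCESS_S = 100e-9  # ASSUMPTION: ~100 ns, illustrative only

if __name__ == "__main__":
    print("HDD :", cycles_spent_waiting(CLOCK_HZ, HDD_ACCESS_S), "cycles")
    print("DRAM:", cycles_spent_waiting(CLOCK_HZ, DRAM_ACCESS_S), "cycles")
```

      At 3 GHz a 3 ms wait works out to about 9 million cycles, several orders of magnitude more than a trip to main memory under the assumed latency.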

      So no, the CPU isn't really the bottleneck in a contemporary PC. If you want to speed it up, get the rest of the box in gear.

      --
      We used to have a Bill of Rights. Now, with the rights gone, all we have left is the bill.
    10. Re:Most depressing thing I've read all week by Kjella · · Score: 2

      I'm curious what exactly you'd like to run at 20 Ghz through the general purpose CPU registers that can't be done better/faster with extensions using specialized hardware.

      Nothing, obviously. I'll just submit my DFS4ME (do funky shit for me) instruction to Intel and I'm sure they'll put it in the next stepping or create a special batch just for me. I can even pay $50 extra, though I need it next week. I'll also reverse engineer and patch that proprietary binary I got to use the new instruction, that totally won't be any work or void any support. Or I could buy that 20 GHz machine and have everything magically work much, much faster. Nah, I'll just do the first one.

      --
      Live today, because you never know what tomorrow brings
    11. Re:Most depressing thing I've read all week by m.dillon · · Score: 2

      Paging to a hard drive doesn't really work in this day and age; the demands of the VM system (due to the commensurate increase in scale of modern machines) are well in excess of what one or two HDDs can handle.

      However, virtual memory works quite well with an SSD. Sure, the SSD isn't as fast as memory, but the scale works similarly to how cpu caches vs main memory scaling works. It's back in the ballpark so the system as a whole works quite well.

      It depends on the workload of course... browsers are particularly bad if they exceed available ram but it's primarily because browsers fragment their memory space badly. Firefox, for example, with a 4GB VSZ, will keep 3GB in core no matter what due to fragmentary access, even though it might only be using 1GB worth of memory in its actual accesses.

      For example, one of our bulk builders has 128GB of ram and roughly 200GB of SSD swap configured. In order to configure enough parallelism to keep the 48 cores fully loaded at all times a portion of the build (when it gets to the larger C++ projects) will require more than 128GB of ram and start eating into the swap space to the tune of another ~100GB or so. However, the cpus are still able to load to 100% because there is enough parallelism to absorb the relatively fewer processes blocked on page-in.

      Similarly, my little chromebook with 4G of ram has 16GB of SSD swap configured (running DragonFly of course, not running Chrome), and has no problem with responsiveness despite digging into that swap quite extensively.

      So virtual memory does, in fact, work well. And it will work well in most use cases when configured properly with an SSD as backing store. One can also go beyond the SATA SSD and throw in an NVMe SSD for swap, which is even faster (~3GBytes/sec reading for a cheap one). Given that main memory typically has 25-50 GByte/sec of bandwidth, that's only an 8x to 16x difference in speed.

      -Matt

    12. Re:Most depressing thing I've read all week by m.dillon · · Score: 2

      The main bottleneck for a modern cpu is main memory access. What is amazing is that all of the prediction and the huge (192+) number of uOPS that can be on the deck at once are able to absorb enough of the massive latencies main memory accesses cause to bring the actual average IPC back towards roughly ~1.4. And this is with the cache misses causing *only* around ~6 GBytes/sec worth of main memory accesses per socket (with a maximum main memory bandwidth of around 50 GBytes/sec per socket, if I remember right).

      Without all of that stuff, a single cache miss can impose several hundred clock cycles of latency and destroy average IPC throughput.

      So, for example, here is a 16 core / 32 thread - dual socket E5-2620v4 @ 2.1 GHz system doing a bunch of parallel compiles, using Intel's PCM infrastructure to measure what the cpu threads are actually doing:

      http://apollo.backplane.com/DF...

      Remember, 32 hyperthreads here so two hyperthreads per core. Actual physical core IPC (shown at the bottom) is roughly 1.39. At 2.1-2.4 GHz this system is retiring a total of 55 billion instructions per second.

      In this particular case, being mostly integer math, the bottleneck is almost entirely memory-related. It doesn't take much to stall out a core. If I were running FP-intensive programs instead it would more likely be bottlenecked in the FP unit and not so much on main memory. Also note the temperature... barely ~40C with a standard copper heatsink and fan. Different workloads will cause different levels of cpu and memory loading.

      -Matt

    13. Re:Most depressing thing I've read all week by ChrisMaple · · Score: 2

      Intel is still cheaping out on the thermal interface goo. People who want top performance are having to de-lid the CPUs and use good stuff.

      In order to save 10 cents on a $200+ CPU, Intel is deeply cutting potential performance and increasing the thermal stress on the chips. More thermal stress means shorter time to failure.

      --
      Contribute to civilization: ari.aynrand.org/donate
  2. umpf... by 4im · · Score: 2

    I googled for "highest cpu clock speed" and got e.g. http://valid.x86.fr/records.html

    It seems this is a far cry from what's been done elsewhere, with numbers there showing over 8.5GHz.

    Anyway, my priorities are low-energy, low-noise computers rather than extreme clock frequencies, even if I could make use of them.

  3. Meanwhile, the bus speed... by inflex · · Score: 4, Informative

    ... lumbers along at 100MHz still.

    Time we started working on that side of the hardware some more.

  4. Slightly offtopic by Artem S. Tashkinov · · Score: 2

    Let me copy this comment since I believe it's kinda relevant.

    Kaby Lake is an embarrassment and should have never seen the light of day, at least not as a "new" architecture. It's anything but "new". The CPU core is completely the same, the GPU core is minimally improved. Intel should have released it as Skylake with an XX50 suffix (6750/6650/etc) because that's what it is.

    I'm quite sure it's Intel's marketing department and investors who insisted that Intel should release something "new". I want to believe Ryzen will pan out to give a good blow to Intel and then we'll have some real improvement/competition.

  5. Re:Back in Reality by wbr1 · · Score: 2

    Wait for Zen. Should be good. Intel's repeated 5% gains are not enough to drive an upgrade cycle for me or anyone I recommend to. 3rd or 4th gen is still more than adequate for most tasks.

    --
    Silence is a state of mime.
  6. Computers are still too slow by Bruinwar · · Score: 2

    My dream (fantasy?!) is for computers (& networks!!) that are faster than me. It should be waiting on me, not me waiting on it. All day long I wait on my computer. They are not even close to being fast enough. It doesn't help at all that the software seems to be slower than ever.

    --
    SLOWER TRAFFIC KEEP RIGHT
    1. Re:Computers are still too slow by yuvcifjt · · Score: 2

      Erm, pretty stupid comment if you just think for a few seconds...
      Unless you're encoding video or 3D-rendering all day long, the CPU is almost always idle for most people, just wasting power waiting for human input.

      Even when compiling code, or rendering a web page, or doing a database transaction, the CPU wakes a couple of cores for a few seconds before it returns to almost idle.

      And by the way, if you see the CPU meter at 100%, it's likely that it's not the CPU that's the bottleneck, but rather the memory (or the bus), especially when gaming.

  7. Re:parallelism vs raw clock speed by m.dillon · · Score: 2

    The unfortunate result of VLIW was that the cpu caches became ineffective, causing long latencies or requiring a much larger cache. In other words, non-competitive given the same cache size. This is also one of the reasons why ARM has been having such a tough time catching up to Intel (though maybe there is light at the end of the tunnel there, finally, after years and years). Even though Intel's instruction set requires some significant decoding to convert to uOPS internally, it's actually highly compact in terms of the L2/L3 cache footprint. That turned out to matter more.

    People often misinterpret the effects of serialization. It's actually a matrix. When a portion of a problem has to be serialized it winds up adding latency, but requiring a portion of a problem to be serialized does not necessarily mean that the larger program cannot run with parallelism. There will often be many individual threads each having to run serially which in aggregate can run in parallel on a machine, and in such cases (really the vast majority of cases) one can utilize however many cores the machine has relatively efficiently.

    This is true for databases, video processing, sound processing, and many other work loads. For example, if one cannot parallelize video compression on a frame by frame basis that doesn't mean that one cannot use all available cpus by having each cpu encode a different portion of the video.

    Same with sound processing. If one is mixing 40 channels in an inherently serialized process this does not prevent the program from using all available cpus by having each one mix a different portion of the overall piece.

    For databases there will often be many clients. Even if the query from one particular client cannot be parallelized, if one has 1000 queries running on a 72-core system one gains scale from those 72 cores. And at that point it just comes down to making sure the caches are large enough (including main memory) such that all the cpus can remain fully loaded.
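    The pattern described above (many individually serial jobs that run side by side) can be sketched in Python as follows. The smoothing filter, function names, and segment size are hypothetical stand-ins for a real encoder, mixer, or query, and the sketch assumes the segments really are independent of one another.

```python
from concurrent.futures import ThreadPoolExecutor


def smooth_segment(samples):
    """Inherently serial step: each output depends on the previous output."""
    out = []
    prev = 0.0
    for s in samples:
        prev = 0.5 * prev + 0.5 * s
        out.append(prev)
    return out


def split_into_segments(samples, segment_len):
    """Cut one long stream into independent chunks."""
    return [samples[i:i + segment_len]
            for i in range(0, len(samples), segment_len)]


def process_all(segments, max_workers=4):
    """Run the serial step on every segment, several segments at a time."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map() returns results in input order, so segments stay aligned.
        return list(pool.map(smooth_segment, segments))


if __name__ == "__main__":
    stream = [float(i % 7) for i in range(40)]
    segments = split_into_segments(stream, 10)
    print(len(process_all(segments)), "segments processed")
```

    Threads keep the sketch short, but CPython's global interpreter lock limits how much pure-Python CPU work they actually overlap; concurrent.futures.ProcessPoolExecutor offers the same map interface for spreading such work across cores.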

    -Matt