
Nvidia Kepler GeForce GPU architecture explained


High above the sun-baked streets of San Francisco, in a hotel bar - which bears more than a passing resemblance to the Emperor's throne room on the unfinished Death Star - I find myself talking to a vice president at Nvidia (who shall remain nameless).



When I challenged him that Kepler was essentially just a die-shrink of the Fermi architecture, his reaction was not the one I expected. The amount of alcohol imbibed by us both may have had something to do with the vehemence with which he countered the argument. Suffice to say, he referred to the previous generation in less than glowing terms - derogatory terms that probably shouldn't find their way onto a site as innocent or polite as TechRadar.

He's right though; while there are definite similarities to the last Nvidia graphics technology, Kepler is still a different beast to Fermi, and it's about much more than just the silicon too. But let's cover that die-shrink first.

Like AMD, Nvidia has taken the plunge and chosen a 28nm production process for this generation of its graphics cards. That means that, compared with the GTX 580, this new GTX 680 can achieve a far smaller die size and still manage to cram in more transistors than ever before.

The GTX 680's GK104 is a 295mm² chip, compared with the 520mm² GF110 in the GTX 580. That makes it smaller than the GTX 560's GPU, yet it still packs in around 500 million more transistors than the GTX 580. It's also far smaller than AMD's 28nm Tahiti GPU at 352mm², although AMD claims to have packed over a billion more transistors into its top-end chip.

From those simple figures, you could easily infer that the Nvidia architecture is considerably less power-hungry than either its predecessor or the competition, and you'd be right. The GTX 680 is actually a sub-200W card, drawing 195W under full load at its 1,006MHz base clock, while the GTX 580 and HD 7970 are 244W and 230W cards respectively.

So many shaders...


Epic's purdy Samaritan demo shows what the GTX 680 can do

Those numbers might look impressive on the surface, but what's actually going on inside, and why are we now talking about a GPU's base clock as if it were a CPU's?

Despite the ever-bombastic Gearbox CEO Randy Pitchford referring to it as 'a simulation CPU' because of the advanced PhysX and Apex effects, this is still very much a gamer's GPU. But Nvidia has taken more than a leaf out of Intel's book - more on that later.

First of all, let's take a look at the make-up of the GK104 GPU. Like Fermi, the Kepler GPU is made up of multiple CUDA cores jammed into multiple Streaming Multiprocessors (SMs). These SMs act like simple processors, each taking on work concurrently, making for the impressive parallelism that's a cornerstone of GP-GPU computing.

But these are no longer called plain ol' SMs. Oh no, they're now called SMXs, which by today's nomenclature probably stands for Streaming Multiprocessor Xtreme. Compared with the old SMs of the Fermi days, though, they could easily be deemed 'Xtreme'.

Previously each contained a total of 32 CUDA cores; in Kepler that figure stands at a whopping 192. Even with half the SM blocks of the GTX 580, you're still looking at 1,536 CUDA cores/shaders spread out over 8 SMXs.

Nvidia is claiming a 2x improvement in terms of performance/watt compared with the GTX 580. That figure seems a little conservative considering the GTX 680 comes with three times the CUDA cores, but while they are physically identical to the cores in the Fermi architecture, they are clocked much slower. In fact, they're half as fast because Nvidia has decided not to have a separate shader clock (which has historically been set twice as fast as the GPU clock). Instead we have one solitary base clock covering everything.
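
As a rough sanity check on that claim, here's some back-of-the-envelope maths in Python. It's a sketch only: it assumes the GTX 580's published 1,544MHz shader clock alongside the GTX 680's 1,006MHz base clock, counts two FP32 operations per core per cycle for both architectures, and ignores any per-core efficiency differences.

```python
# Napkin maths for the performance-per-watt claim (illustrative only).
# Assumed figures: GTX 580's published 1,544MHz shader clock, GTX 680's
# 1,006MHz base clock, and the TDPs quoted earlier in this article.

gtx580_cores, gtx580_clock_mhz, gtx580_tdp_w = 512, 1544, 244   # Fermi GF110
gtx680_cores, gtx680_clock_mhz, gtx680_tdp_w = 1536, 1006, 195  # Kepler GK104

# Peak FP32 throughput scales with cores x clock (two FLOPs per core per
# cycle, thanks to fused multiply-add).
gtx580_gflops = gtx580_cores * 2 * gtx580_clock_mhz / 1000
gtx680_gflops = gtx680_cores * 2 * gtx680_clock_mhz / 1000

print(f"GTX 580: {gtx580_gflops:.0f} GFLOPS, {gtx580_gflops / gtx580_tdp_w:.2f} GFLOPS/W")
print(f"GTX 680: {gtx680_gflops:.0f} GFLOPS, {gtx680_gflops / gtx680_tdp_w:.2f} GFLOPS/W")
print(f"Perf per watt ratio: {(gtx680_gflops / gtx680_tdp_w) / (gtx580_gflops / gtx580_tdp_w):.2f}x")
```

On that very crude measure the GTX 680 comes out at roughly 2.4 times the GFLOPS-per-watt of the GTX 580, so the official 2x figure does indeed look conservative on paper.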

Boostin'

And there it is again - that base clock. For every Kepler-based graphics card we're now going to be quoting two separate frequencies: one is the base clock and the other is the Boost clock.

Nvidia has been totally honest and admitted that it copied the idea from Intel - that it's "standing on the shoulders of a great company," as Drew Henry, general manager of GeForce's desktop unit, puts it. So we now have Turbo Boost for GPUs - the snappily titled GPU Boost.

In previous GPU generations, the final clockspeed was determined by the worst-case application power usage, which typically meant taking the most power-hungry app around and setting the clockspeed to match that power draw. But the draw on the GPU varies massively between different programs. Taking Fermi as an example, the power required by different apps could vary by as much as 50 per cent, so with lower-powered apps there's a lot of GPU headroom going unused.

GPU Boost analyses the amount of power an application is using and boosts the GPU frequency according to the amount of extra headroom at its disposal. It's also completely application-independent - you won't need a new Kepler profile when a game is released to take advantage of GPU Boost - it's all based on measurements coming directly from the GPU's sensors in real time. Kepler can dynamically alter its clock and voltage every 100ms; essentially, every few frames the GPU has a decision to make about clocks and voltage settings.
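
To picture what that 100ms decision loop looks like, here's a heavily simplified Python sketch. The power model, clock step and limits are invented placeholders to show the concept, not anything taken from Nvidia's actual firmware.

```python
import random
import time

# A toy GPU Boost-style control loop (illustrative, not Nvidia's algorithm).
# Every 100ms, compare measured board power with the power target and nudge
# the clock up or down within its allowed range.

POWER_TARGET_W = 195      # board power budget
BASE_CLOCK_MHZ = 1006     # guaranteed clock
MAX_BOOST_MHZ = 1110      # hypothetical upper limit
CLOCK_STEP_MHZ = 13       # hypothetical boost step

def read_board_power(clock_mhz):
    """Stand-in for the GPU's power sensors: pretend draw scales with clock."""
    return clock_mhz * 0.18 + random.uniform(-10, 10)

clock = BASE_CLOCK_MHZ
for _ in range(20):                      # a couple of seconds of 'gameplay'
    power = read_board_power(clock)
    if power < POWER_TARGET_W and clock < MAX_BOOST_MHZ:
        clock += CLOCK_STEP_MHZ          # headroom spare: boost
    elif power > POWER_TARGET_W and clock > BASE_CLOCK_MHZ:
        clock -= CLOCK_STEP_MHZ          # over budget: back off
    print(f"power {power:5.1f}W -> clock {clock}MHz")
    time.sleep(0.1)                      # re-evaluate every 100ms
```

The real hardware is reacting to genuine sensor data and juggling voltage as well as frequency, but the shape of the loop - measure, compare against the power target, adjust, repeat every 100ms - is the same idea.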

Since this is just the first generation of GPU Boost, you can expect that to be done more quickly over time. This auto-overclocking doesn't mean an end for traditional overclocking though. "GPU Boost is like overclocking's little buddy… our GTX 680 is a monster overclocker," says Tom Petersen, Nvidia's director of technical marketing.

Each of Nvidia's partner card manufacturers has been given all the API goodness with which to prepare its own overclocking apps; so far we've had the chance to try EVGA's Precision tool. There are two sliders: one for altering the base clock and another for allowing the GPU to use extra power - up to just over 260W. GPU Boost will still work when you've slammed both those sliders to the maximum too. You can see just how far in the GTX 680 review.

Just as interesting is the frame rate targeting. You can set the software to target a particular frame rate. This only sets a target - it doesn't force the GPU to overclock to get there - but if you give the GPU the frequency headroom and let it draw as much power as it can, it will push the clocks up to make sure it hits your chosen frame rate; the sketch below shows the basic idea.
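
Conceptually, frame rate targeting just layers one more check on top of that Boost loop: only push the clocks while you're short of the target and within the power budget. Here's a speculative Python sketch - the function name and all the numbers are invented for illustration, not lifted from EVGA's tool.

```python
# Sketch of frame-rate targeting layered on a Boost-style loop (illustrative;
# the function and all figures here are invented for the example).

TARGET_FPS = 30

def frame_rate_targeted_clock(current_fps, clock, base_clock, max_clock,
                              power, power_target, step=13):
    """Raise clocks only while below the fps target and under the power budget."""
    if current_fps < TARGET_FPS and power < power_target and clock < max_clock:
        return clock + step      # short of the target with headroom spare: boost
    if current_fps >= TARGET_FPS and clock > base_clock:
        return clock - step      # target met: drift back towards the base clock
    return clock

# Example: 24fps at the 1,006MHz base clock with power to spare, so boost.
print(frame_rate_targeted_clock(24, 1006, 1006, 1110, 170, 195))   # -> 1019
```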

This is arguably more interesting at the mid-range than the high-end. The prospect of being able to tell your £200 card to hit 30 frames per second in all games will give even the least savvy PC gamer console-like performance with minimal effort. It's a tantalising prospect and shows that Nvidia is really focusing this release on the gamer, not the GPU compute all-rounder that Fermi had been touted as.

