Hardware Considerations For Redshift
Redshift is a CUDA application, which means it currently only works with NVidia GPUs. Among the gaming-grade GPUs, we recommend the last-gen TitanX Pascal 12GB or the GTX1070/GTX1070Ti/GTX1080/GTX1080Ti GPUs, or the current-gen RTX2070, RTX2080 or RTX2080Ti GPUs. Among the professional-grade GPUs, we recommend the last-gen Quadro P5000, P6000, GP100 and GV100 GPUs or the next-gen Quadro RTX GPUs. With the exception of the Quadro GV100 and Quadro RTX6000/RTX8000 (which are the fastest GPUs available on the market today), there are no performance differences between the GeForces and the Quadros as far as Redshift is concerned. The Quadros can render viewport OpenGL faster than the GeForces but that doesn't affect Redshift’s rendering performance. The one key benefit Quadros have over GeForces is that they often have more onboard VRAM. As an example, the only NVidia GPUs offering 24GB of VRAM are the Quadro M6000, Quadro P6000 and Quadro RTX6000. The Quadro GV100 offers up to 32GB per GPU while the Quadro RTX8000 offers 48GB of VRAM. That’s a lot of VRAM! :-)
With Redshift, it’s possible to mix GeForce and Quadro GPUs on the same computer.
One important difference between GTX GPUs and Titan/Quadro/Tesla GPUs is TCC driver availability. TCC means "Tesla Compute Cluster". It is a special driver developed by NVidia for Windows. It bypasses the Windows Display Driver Model (WDDM) and allows the GPU to communicate with the CPU at greater speeds. The drawback of TCC is that, once you enable it, the GPU becomes ‘invisible’ to Windows and 3D apps (such as Maya, Houdini, etc). It becomes exclusive to CUDA applications, like Redshift. Only Quadro, Tesla and Titan GPUs can enable TCC. The GeForce GTX cards cannot use it. As mentioned above, TCC is only useful on Windows. Linux operating systems don't need it because the Linux display driver doesn’t suffer from the latencies typically associated with WDDM. In other words, CPU-GPU communication on Linux is, by default, faster than on Windows (with WDDM) across all NVidia GPUs, be it GTX cards or Quadro/Tesla/Titan.
Considering that, at the time of writing, a single TitanX costs about twice as much as a GTX1080, a question users often ask is “which one is better: a single TitanX or two GTX1080s?”. Well, in terms of raw computing power, two GTX1080s will beat a single TitanX. But if the scenes you’ll be rendering are polygon-heavy (more than 150 million unique polygons), we recommend getting an 11-12GB GPU or greater. Please see the next section regarding VRAM and its benefits.
If you install multiple GPUs on the same computer, Redshift will render faster. Having multiple GPUs requires special motherboard/CPU/setup considerations which will be outlined later in this document.
Do you need more VRAM? If so, Titan/Quadro/Tesla is the right choice for you
Do you need TCC (i.e. faster rendering on Windows)? If so, Titan/Quadro/Tesla is the right choice for you
If you don’t need either of the above, multiple GTX GPUs (for the same cost) will offer more raw compute power
How much VRAM (i.e. video RAM) is enough and what difference does it make to performance?
NVidia GPUs come in configurations of 4GB/6GB/8GB/11GB/12GB/24GB/48GB of VRAM. It’s safe to assume that future GPUs will feature even more VRAM. So what’s the right amount of VRAM for a particular user?
The general rule of thumb for Redshift is “the more VRAM, the better”. However, video cards with more VRAM are also more expensive. The text below explains how Redshift uses VRAM so that users can make an informed decision when choosing a GPU.
Redshift is efficient when it comes to VRAM utilization. It is able to fit around 20-30 million unique triangles within approximately 1GB of video memory. If a scene contained 300 million triangles, Redshift would normally need approximately 10GB of VRAM. But even GPUs with 8GB of VRAM can render such high poly scenes with Redshift due to its out-of-core architecture (please see our online FAQ for an explanation of “out of core”). However, excessive out-of-core data access can sometimes introduce considerable performance penalties. It is, therefore, preferable to have plenty of VRAM available when rendering high-poly scenes.
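The triangle-count arithmetic above can be sketched as a small Python helper. This is only a back-of-the-envelope estimate based on the "20-30 million unique triangles per 1GB" figure; the function name and its parameters are made up for this illustration and are not part of Redshift. It also ignores textures, volume grids and other data that share the same VRAM.

```python
def estimate_geometry_vram_gb(unique_triangles, triangles_per_gb=(20e6, 30e6)):
    """Return a (low, high) estimate, in GB, of the VRAM needed for geometry.

    triangles_per_gb is the assumed packing density range: roughly
    20-30 million unique triangles per 1GB, as quoted in the text.
    """
    low_density, high_density = triangles_per_gb
    # A higher packing density means less memory, so the high density
    # produces the low end of the estimate and vice versa.
    return unique_triangles / high_density, unique_triangles / low_density


low, high = estimate_geometry_vram_gb(300e6)
print(f"300 million triangles: roughly {low:.0f}-{high:.0f} GB of VRAM")
```

For the 300-million-triangle scene this gives a range of 10 to 15GB; the approximate 10GB figure quoted above corresponds to the optimistic end of that range.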
Redshift’s out-of-core tech doesn’t cover all possible types of data. Currently, Redshift cannot store volume grids (such as OpenVDB) in an out-of-core fashion. This means that scenes using many hundreds of megabytes of OpenVDB data might require a GPU with more VRAM, otherwise the frame rendering will be aborted.
Another benefit of having lots of VRAM available is Redshift’s “automatic memory management” feature. If your scene isn’t using too many polygons, you can enable the “automatic memory management” setting and allow Redshift to render even faster. The setting is in Redshift’s “Memory” tab. It improves rendering performance by allowing Redshift to communicate less frequently with the CPU. For a more thorough explanation of this setting, please refer to Redshift’s online documentation or the forums.
Yet another benefit of having lots of VRAM is being able to run multiple GPU applications at once. Apps like Maya’s OpenGL viewport, Chrome (the web browser) and Windows itself can consume considerable amounts of VRAM and leave little memory for Redshift to work with. This is, obviously, less of an issue on GPUs containing plenty of VRAM. For users that can’t afford a GPU with lots of VRAM, a possible workaround is installing an additional (cheaper) GPU which will be used for everything except Redshift. Then, the remaining GPU(s) can be disconnected from the monitor(s) and will, therefore, have the entire VRAM available for rendering with Redshift. Disconnecting a GPU from the monitor is called “headless mode”.
The topic of VRAM capacity is often the deciding factor between going for a more expensive 11-12GB GPU versus a cheaper 8GB one.
Finally, it should be noted that the VRAM of multiple GPUs is not combined! I.e. if you have a 4GB GPU and an 8GB GPU installed on your system, these do not add up to 12GB! Each GPU can only use its own VRAM. This might change in the future, though, with the introduction of NVLink. NVLink is a “bridge” that can connect two GPUs together so that they can share each other’s memory. This sharing comes with a performance penalty, which may or may not be significant depending on the scene. Redshift does not support NVLink today but we are planning to implement it (and have started work on it).
Are you going to be using an extra GPU for OpenGL/2D rendering? If not, then prefer a GPU with more VRAM
More VRAM can also mean faster rendering
Are you going to be rendering heavy scenes (150+ million polygons, or lots of OpenVDB or particles)? If so, prefer a GPU with more VRAM
VRAM is not combined across multiple GPUs
A cost-effective solution for speeding up rendering is adding more GPUs to your computer. This is one of the reasons why GPU rendering is more cost-effective compared to CPU rendering solutions. Adding an extra GPU (or more!) is cheaper compared to buying a whole extra computer plus software licenses - including Redshift licenses!
If you’re building a computer for Redshift today and anticipate adding more GPUs to it in the future, we recommend choosing a motherboard that has 4 PCIe3.0 x16 slots or more. Please note that some motherboards will claim to have 4 PCIe3.0 x16 slots but their specifications will say something like (x16, x16), (x8, x8, x8, x8). This means "if you have two GPUs, they’ll both run at x16 speed but if you have 4 GPUs, each will run at x8 speed". In other words, even though the motherboard has 4 slots, they can't all be running at full x16 speed at once.
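One way to read a specification such as "(x16, x16), (x8, x8, x8, x8)" is as a lookup table from the number of installed GPUs to the link width each one gets. The Python sketch below illustrates that reading. The `SLOT_CONFIGS` table and the `lanes_per_gpu` function are hypothetical, and the rule "the board picks the smallest listed configuration that fits" is an assumption; a real motherboard manual is the authority on how its slots behave.

```python
# Hypothetical motherboard spec, mirroring "(x16, x16), (x8, x8, x8, x8)".
SLOT_CONFIGS = [
    (16, 16),
    (8, 8, 8, 8),
]


def lanes_per_gpu(num_gpus, configs=SLOT_CONFIGS):
    """Return the PCIe link width of each GPU, or None if they don't fit.

    Assumption: the board uses the smallest listed configuration that has
    enough slots for the installed GPUs.
    """
    for config in sorted(configs, key=len):
        if num_gpus <= len(config):
            return config[:num_gpus]
    return None


for n in range(1, 6):
    print(n, "GPU(s):", lanes_per_gpu(n))
```

Under these assumptions, two GPUs each run at x16, while three or four GPUs each drop to x8, and a fifth GPU does not fit at all.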
Do you absolutely need (x16, x16, x16, x16)? No! Redshift will run just fine with (x8, x8, x8, x8) but there can be some cases where x16 speed would help performance a bit. These include DeepEXR rendering or rendering scenes that do a lot of out-of-core access, i.e. cases where the GPU needs to access CPU memory. Even in those cases, don't expect a huge performance difference between an x16 and an x8 slot. Future NVidia GPUs (Pascal) will be able to use CPU memory more efficiently so, in those cases, the extra x16 speed might make more of a difference. In other words, planning for PCIe x16 is more of a future-proofing choice at the moment than a real-world-benefit one.
Please note that, even if the chosen motherboard claims lots of PCIe x16-capable slots, you need an appropriate CPU to achieve this performance! (see below)
If you’ll be adding multiple GPUs per computer, prefer a motherboard with multiple fast PCIe x16 slots
We recommend CPUs that have plenty of single-threaded performance. It’s better to have a CPU with fewer cores but more GHz than more cores and low GHz. I.e. an 8 core 2.5GHz CPU will be considerably worse for Redshift compared to a 6 core 3.5 GHz CPU. We recommend CPUs with operating frequencies of 3.5GHz or more.
Not all CPUs can drive 4 GPUs at full PCIe x16 speed. CPUs have a feature called "PCIe lanes" which describes how fast data can be communicated between the CPU and the GPU. Some CPUs have fewer PCIe lanes than others. For example, the Core i7-5820K 3.3GHz has 28 PCIe lanes while the i7-5930K 3.5GHz has 40 PCIe lanes. This means the 5930K can drive more GPUs and at higher speeds. We recommend CPUs with more PCIe lanes. We do not recommend Core i5, Core i3 or lower-end CPUs.
When you have multiple CPUs on the same motherboard (like with Xeons), the PCIe lanes from the CPUs are combined. Dual-Xeon systems can easily drive 8 GPUs at full speed.
Redshift cares more about GHz than number of cores
If you’ll be installing multiple GPUs, look at a higher-end Core i7
If you’ll be installing more than 4 GPUs, you might want to look into dual-Xeon solutions
Avoid i5, i3 and lower-end CPUs
External GPU enclosures
The only external enclosure we’ve ever tested Redshift with was the Cubix Xpander Elite and it performed great! We tested it with 1, 2, 3 and 4 GPUs at once. We found it to be stable and, very importantly, we weren’t able to measure a performance hit compared to installing the GPUs directly on the computer’s motherboard. GPU expanders can be beneficial if your computer doesn’t have enough PCIe slots and also when you want your GPUs to be portable.
Please note that not all external enclosures might be suitable for Redshift! Some might introduce PCIe communication latencies which could negatively affect Redshift’s performance! We recommend testing Redshift with your chosen enclosure before buying, even if a different GPU renderer might run great with it! Redshift’s software architecture requires the GPU to communicate more frequently with the CPU compared to other GPU renderers so the performance (latency) of the enclosure is very important!
Please observe the wattage requirements of your CPU/GPU and choose an appropriate PSU. Installing 4 GPUs in a computer might require a 1000W PSU - or an even more powerful one! Lower-quality PSUs or PSUs without sufficient power might cause GPU instability and crashes, or even GPU damage!
Please note that having 4 GPUs in one computer will generate considerable amounts of heat so make sure the case is well-cooled/ventilated. If ventilation isn't sufficient, the GPUs might thermal-throttle and down-clock themselves in order to not burn out. Throttling/down-clocking means slower rendering! And high heat, of course, means a shorter lifespan for your electronics. So cooling is important!
Multiple GPU scaling
When rendering with Redshift and multiple GPUs, you have two options: you can render a single frame using all your GPUs or you can render multiple frames at once using a combination of the GPUs.
Rendering a single frame with all available GPUs can, in some cases, produce a non-linear performance gain. For example: 4 GPUs might not render exactly 4 times faster compared to rendering with 1 GPU. They might render something like 3 times faster. That’s because there’s a certain amount of per-frame CPU processing involved which cannot be sped up by adding extra GPUs.
To better explain this, consider the following example. Let’s assume that extracting scene data from Maya (which happens solely on the CPU) takes 10 seconds and rendering takes 60 seconds to execute with 1 GPU. So the total rendering time is 70 seconds. Now, if you were to add another 3 GPUs (for a total of 4 GPUs), you would be dividing the 60 seconds of pure rendering time by 4, which is 15 seconds. But you wouldn’t be dividing the 10 seconds of extraction time at all, because all of that is done on the CPU! So the total rendering time would be 10 seconds + 15 seconds = 25 seconds vs the original 70 seconds. That is 70 / 25 = 2.8, i.e. roughly 3 times faster instead of 4.
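The same arithmetic can be written as a short Python sketch, which makes it easy to try other GPU counts. It uses a deliberately simplified model in which the CPU part of a frame is fixed and the GPU part divides perfectly across GPUs; the function names are illustrative only.

```python
def render_time(cpu_seconds, gpu_seconds, num_gpus):
    """Total frame time: fixed CPU work plus GPU work split across GPUs."""
    return cpu_seconds + gpu_seconds / num_gpus


def speedup(cpu_seconds, gpu_seconds, num_gpus):
    """How many times faster num_gpus GPUs are compared to a single GPU."""
    single = render_time(cpu_seconds, gpu_seconds, 1)
    return single / render_time(cpu_seconds, gpu_seconds, num_gpus)


# The example from the text: 10s of CPU extraction, 60s of GPU rendering.
for n in (1, 2, 4, 8):
    total = render_time(10, 60, n)
    print(f"{n} GPU(s): {total:.1f}s total, {speedup(10, 60, n):.2f}x faster")
```

With 4 GPUs the model gives 25 seconds and a speedup of 2.8x, i.e. roughly 3 times rather than 4. However many GPUs are added, the total can never drop below the 10 seconds of CPU work.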
There are other cases where more GPUs don't help, such as loading data from disk. To make things even worse, certain CPU processing stages are single-threaded. This means that installing a CPU with many cores also wouldn’t help!
The solution to the above problem is to render multiple frames at once. If a computer has 4 GPUs, you could render two frames at once, each frame using 2 GPUs. This helps because, when you render multiple frames at once, you’re forcing your CPU to do more work in parallel (extracting multiple frames at once, for example) which, quite often, improves the CPU-GPU performance ratio.
Some render managers (like Deadline) support this Redshift feature out-of-the-box. In Deadline the feature is called "GPU affinity". Alternatively, if you’re not using a render manager and prefer to use your own batch render scripts, please read this forum post for info on how to render from the command line and use a subset of GPUs: https://www.redshift3d.com/forums/viewthread/1713/. This is basically what Deadline and other render managers do behind the scenes to select GPUs in Redshift.
To get best multi-GPU scaling performance, render multiple frames at once
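The idea of splitting a machine's GPUs into groups and giving each group its own frames can be sketched in Python as follows. This only builds the plan; it does not launch any renderer, and it does not reproduce the actual command-line options for selecting GPUs (those are described in the forum post linked above). The function names `split_gpus` and `assign_frames` are invented for this illustration.

```python
def split_gpus(gpu_ids, gpus_per_job):
    """Partition GPU ids into consecutive groups of gpus_per_job each."""
    if gpus_per_job <= 0 or len(gpu_ids) % gpus_per_job != 0:
        raise ValueError("GPU count must be a positive multiple of gpus_per_job")
    return [gpu_ids[i:i + gpus_per_job]
            for i in range(0, len(gpu_ids), gpus_per_job)]


def assign_frames(frames, gpu_groups):
    """Deal frames round-robin to the GPU groups."""
    plan = {tuple(group): [] for group in gpu_groups}
    keys = list(plan)
    for index, frame in enumerate(frames):
        plan[keys[index % len(keys)]].append(frame)
    return plan


# 4 GPUs, two frames rendered at once, each frame using 2 GPUs.
groups = split_gpus([0, 1, 2, 3], gpus_per_job=2)
for gpus, frames in assign_frames(range(1, 9), groups).items():
    print(f"GPUs {list(gpus)} render frames {frames}")
```

Each group would then be handed to a separate render process restricted to those GPUs, which is the job a render manager's GPU affinity setting takes care of.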
We recommend having at least twice as much memory as the largest GPUs installed on the system. I.e. if the system is using one or more TitanX 12GB, the system should have at least 24GB of RAM.
If you’ll be rendering multiple frames at once (as explained in the previous section), the memory should be multiplied accordingly. I.e. if rendering 1 frame needs 16GB, rendering two frames simultaneously will need roughly 32GB.
If you’ll be installing multiple GPUs per computer, add plenty of CPU RAM
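The two memory rules above can be combined into a rough sizing helper, sketched here in Python. The function name is hypothetical, and multiplying the "twice the largest GPU" baseline by the number of simultaneous frames is an assumption made for this sketch; actual memory use depends on the scenes being rendered.

```python
def recommended_system_ram_gb(gpu_vram_gb, simultaneous_frames=1):
    """Rough minimum system RAM, in GB.

    gpu_vram_gb: VRAM size of each installed GPU, in GB.
    simultaneous_frames: how many frames are rendered at the same time.
    """
    if not gpu_vram_gb or simultaneous_frames < 1:
        raise ValueError("need at least one GPU and one frame")
    # Rule of thumb from the text: at least twice the VRAM of the largest GPU.
    baseline = 2 * max(gpu_vram_gb)
    # Assumption: each simultaneously rendered frame needs its own share.
    return baseline * simultaneous_frames


print(recommended_system_ram_gb([12]))                    # one TitanX 12GB
print(recommended_system_ram_gb([12, 12, 12, 12], 2))     # 4 GPUs, 2 frames at once
```

The first call reproduces the 24GB figure from the TitanX example; the second suggests about 48GB for a 4-GPU machine rendering two frames at once.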
We recommend using fast SSD drives. Redshift automatically converts textures (JPG, EXR, PNG, TIFF, etc) to its own texture format which is faster to load and use during rendering. Those converted textures are stored in a local drive folder. We recommend using an SSD for that texture cache folder so that, during rendering, the converted texture files can be opened fast. Redshift can optionally not do any of this caching and simply open textures from their original location (even if that is a network folder), but we don't recommend this. For more information on the texture cache folder, please read the online documentation.
Prefer SSDs to mechanical hard disks
Network and NAS
Redshift can render several times faster than CPU renderers. This means that the burden on your network can be higher too, just like it would be if you were adding lots more render nodes! As mentioned above, Redshift caches textures to the local disk so it won't try to load textures through the network over and over again (it will only do it if the texture changes). However, other files (like Redshift proxies) are not locally cached so they will be accessed over the network repeatedly. Fast networks and network-attached storage (NAS) typically work fine in this scenario.
However, there have been a few cases where users reported extremely low performance with certain NAS solutions. Since there are many NAS products available in the market, we strongly recommend thoroughly testing your chosen NAS with large Redshift proxies over the network. For example, try exporting a large Redshift proxy containing 30 million triangles or so (a tessellated sphere would do), save it in a network folder and then try using it in a scene both through a network path and also through a local file - and measure the rendering performance difference between the two.
Rendering with Redshift is like rendering with lots of machines. It might put a strain on your network.
Thoroughly test your network storage solution! Some of them have performance issues!
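As a quick first check before running the full render comparison described above, the raw read speed of the same large proxy file from a network path and from a local copy can be compared. The Python sketch below does only that; it is a rough proxy and not a substitute for measuring actual render times. The function name is illustrative, the paths are supplied by the user, and operating-system file caching can inflate the numbers on repeated runs.

```python
import sys
import time


def read_throughput_mb_s(path, chunk_size=8 * 1024 * 1024):
    """Read the whole file sequentially and return the throughput in MB/s."""
    total_bytes = 0
    start = time.perf_counter()
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            total_bytes += len(chunk)
    # Guard against a zero elapsed time on very small files.
    elapsed = max(time.perf_counter() - start, 1e-9)
    return total_bytes / (1024 * 1024) / elapsed


if __name__ == "__main__":
    # Usage: pass the network path and the local path of the same file.
    for path in sys.argv[1:]:
        print(f"{path}: {read_throughput_mb_s(path):.1f} MB/s")
```

A large gap between the two numbers suggests the storage or network is worth investigating before it is trusted with heavy proxy files.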