Icare3D: Fermi output queues and L2 write combining experiments


Jun 1, 2010 at 9:35 AM Labels: CUDA

A guy from Los Alamos compared the performance of output queues on Tesla 2 and on Fermi, using an atomic add on an integer index per queue. First result: a 16x speedup on Fermi!
http://forums.nvidia.com/index.php?showtopic=170125
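To make the output-queue pattern concrete, here is a minimal CPU-side sketch, assuming the usual scheme where each producer reserves a slot by atomically incrementing the queue's integer index and then writes into that slot. On the GPU the reservation would be an `atomicAdd(&count[q], 1)` inside a kernel. Here `std::atomic<int>::fetch_add` plays that role and `std::thread` workers stand in for GPU threads. The names `OutputQueue` and `route_to_queues`, and the routing rule `value % numQueues`, are hypothetical and are not taken from the Los Alamos benchmark.

```cpp
#include <algorithm>
#include <atomic>
#include <cassert>
#include <memory>
#include <thread>
#include <vector>

// Hypothetical fixed-capacity output queue with one integer index (like one GPU queue).
struct OutputQueue {
    std::vector<int> items;      // storage for queued values
    std::atomic<int> count{0};   // next free slot, bumped atomically by producers

    explicit OutputQueue(int capacity) : items(capacity) {}

    // Reserve a slot with an atomic add, then write into it.
    // Returns false if the queue was already full.
    bool push(int value) {
        // Relaxed ordering is enough: each slot index is handed out exactly once,
        // and readers only look at the queue after the producers are joined.
        int slot = count.fetch_add(1, std::memory_order_relaxed);
        if (slot >= static_cast<int>(items.size())) return false;
        items[slot] = value;
        return true;
    }

    // Number of valid elements (count can overshoot capacity on overflow).
    int size() const {
        return std::min(count.load(), static_cast<int>(items.size()));
    }
};

// Each worker emits perThread values and routes each one to queue (value % numQueues),
// mimicking many GPU threads appending to a few shared queues.
std::vector<std::unique_ptr<OutputQueue>> route_to_queues(int numQueues, int numThreads,
                                                          int perThread) {
    std::vector<std::unique_ptr<OutputQueue>> queues;
    for (int q = 0; q < numQueues; ++q)
        queues.push_back(std::make_unique<OutputQueue>(numThreads * perThread));

    std::vector<std::thread> workers;
    for (int t = 0; t < numThreads; ++t) {
        workers.emplace_back([&queues, numQueues, t, perThread] {
            for (int i = 0; i < perThread; ++i) {
                int value = t * perThread + i;
                queues[value % numQueues]->push(value);
            }
        });
    }
    for (auto& w : workers) w.join();  // join also makes all slot writes visible
    return queues;
}
```

Every queue ends up with exactly the values routed to it, in some arbitrary order. All the contention sits on the few `count` variables, which is exactly where Fermi's faster atomics pay off.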

This is supposedly thanks to the coalescing of atomic operations, which may be done in the L2 cache.
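As a rough mental model of what "coalescing atomics" could mean, here is a sketch under stated assumptions. It is not NVIDIA's documented hardware design. The idea is that a batch of atomic adds that target the same address can be serviced by a single memory update of the summed increment. Each requester then gets back the base value plus the increments of the requests that precede it in the batch. The code checks that this gives the same results as applying the adds one by one. `AtomicAddRequest`, `serial_atomic_add`, and `combined_atomic_add` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <vector>

// Hypothetical request: add `increment` to memory[address], return the old value.
struct AtomicAddRequest { int address; int increment; };

// Reference: apply requests one at a time, one memory update each.
std::vector<int> serial_atomic_add(std::vector<int>& memory,
                                   const std::vector<AtomicAddRequest>& reqs) {
    std::vector<int> old(reqs.size());
    for (std::size_t i = 0; i < reqs.size(); ++i) {
        old[i] = memory[reqs[i].address];
        memory[reqs[i].address] += reqs[i].increment;
    }
    return old;
}

struct CombineResult {
    std::vector<int> old;  // value each request "saw"
    int memoryUpdates;     // how many times memory was actually written
};

// Combined: group requests by address, hand out base + running offset,
// then perform a single update per distinct address.
CombineResult combined_atomic_add(std::vector<int>& memory,
                                  const std::vector<AtomicAddRequest>& reqs) {
    std::map<int, int> running;  // address -> sum of increments seen so far in the batch
    std::vector<int> old(reqs.size());
    for (std::size_t i = 0; i < reqs.size(); ++i) {
        int& offset = running[reqs[i].address];  // zero on first use
        old[i] = memory[reqs[i].address] + offset;
        offset += reqs[i].increment;
    }
    int updates = 0;
    for (const auto& entry : running) {
        memory[entry.first] += entry.second;  // one write per address
        ++updates;
    }
    return {old, updates};
}
```

For five `+1` requests spread over two queue counters, both versions return the same old values and leave memory identical. The combined version, however, touches memory only twice instead of five times.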

He also ran another experiment to see whether the L2 cache can combine writes to global memory coming from different blocks. It appears to, when consecutive blocks write to the same cache line at the same time. Result: a 3.25x speedup on Fermi.
http://forums.nvidia.com/index.php?showtopic=170127
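Write combining only helps when writes from different blocks actually land in the same cache line. The small sketch below counts how many distinct lines a run of consecutive blocks touches. It assumes a hypothetical layout where block `b` writes `bytesPerBlock` contiguous bytes starting at `b * bytesPerBlock`. It also assumes a 128-byte line, the size usually quoted for Fermi's caches, but `lineBytes` is a parameter in case that assumption is wrong. `lines_touched` is a hypothetical helper.

```cpp
#include <cassert>
#include <cstddef>
#include <set>

// Count distinct cache lines written by blocks [firstBlock, firstBlock + numBlocks),
// assuming block b writes bytesPerBlock (> 0) contiguous bytes at offset b * bytesPerBlock.
std::size_t lines_touched(std::size_t firstBlock, std::size_t numBlocks,
                          std::size_t bytesPerBlock, std::size_t lineBytes) {
    std::set<std::size_t> lines;
    for (std::size_t b = firstBlock; b < firstBlock + numBlocks; ++b) {
        std::size_t begin = b * bytesPerBlock;
        std::size_t end = begin + bytesPerBlock;  // exclusive
        for (std::size_t line = begin / lineBytes; line <= (end - 1) / lineBytes; ++line)
            lines.insert(line);
    }
    return lines.size();
}
```

With one 4-byte word per block and 128-byte lines, 32 consecutive blocks share a single line. If they write at about the same time, L2 could in principle merge 32 separate writes into one line write. Blocks that each write a full line, or consecutive blocks that straddle a line boundary, leave little or nothing to combine.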
