While searching for how to estimate GPU performance, I found this answer on Stack Overflow, which contains the following code:

```python
import os
import sys
import tensorflow as tf
import time

n = 8192
dtype = tf.float32
with tf.device("/gpu"):
    matrix1 = tf.Variable(tf.ones((n, n), dtype=dtype))
    matrix2 = tf.Variable(tf.ones((n, n), dtype=dtype))
    product = tf.matmul(matrix1, matrix2)

# avoid optimizing away redundant nodes
config = tf.ConfigProto(graph_options=tf.GraphOptions(
    optimizer_options=tf.OptimizerOptions(opt_level=tf.OptimizerOptions.L0)))
sess = tf.Session(config=config)

sess.run(tf.global_variables_initializer())
iters = 10

# pre-warming
sess.run(product.op)

start = time.time()
for i in range(iters):
    sess.run(product.op)
end = time.time()
ops = n**3 + (n-1)*n**2  # n^2*(n-1) additions, n^3 multiplications
elapsed = (end - start)
rate = iters*ops/elapsed/10**9
print('\n %d x %d matmul took: %.2f sec, %.2f G ops/sec' %
      (n, n, elapsed/iters, rate,))
```
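As the comment in the code notes, multiplying two n×n matrices takes n³ multiplications and (n−1)·n² additions, i.e. 2n³ − n² operations in total. A quick sanity check of the count used by the benchmark:

```python
n = 8192

# n^3 multiplications plus (n-1)*n^2 additions, as in the benchmark above
ops = n**3 + (n - 1) * n**2

# equivalent closed form: 2*n^3 - n^2
assert ops == 2 * n**3 - n**2

# roughly 1.1e12 operations per 8192 x 8192 matmul
print("%.4e ops per matmul" % ops)
```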

After Nvidia released a batch of new-generation GPUs, I wanted to compare their performance.

To measure fp16 performance, `dtype` was changed to `tf.float16`.

To benchmark matrix multiplication on TensorFlow 2, compatibility mode was used. It can be enabled by replacing

```python
import tensorflow as tf
```

with

```python
import tensorflow.compat.v1 as tf
tf.disable_v2_behavior()
```
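For reference, a roughly equivalent TF2-native version can be written with `tf.function` instead of falling back to the v1 session API. This is a minimal sketch, not validated on the GPUs below; the timing loop mirrors the original, and `.numpy()` is used to force execution to complete before the clock stops:

```python
import time
import tensorflow as tf

def benchmark_matmul(n=8192, dtype=tf.float32, iters=10):
    """Time n x n matmuls, mirroring the v1 benchmark above."""
    matrix1 = tf.Variable(tf.ones((n, n), dtype=dtype))
    matrix2 = tf.Variable(tf.ones((n, n), dtype=dtype))

    @tf.function
    def step():
        return tf.matmul(matrix1, matrix2)

    step().numpy()  # pre-warming: traces the function and runs one matmul

    start = time.time()
    for _ in range(iters):
        result = step()
    result.numpy()  # block until the last matmul has actually finished
    elapsed = time.time() - start

    ops = n**3 + (n - 1) * n**2  # same op count as the v1 version
    rate = iters * ops / elapsed / 10**9
    print('%d x %d matmul took: %.2f sec, %.2f G ops/sec'
          % (n, n, elapsed / iters, rate))
    return rate

# smaller n so the sketch also finishes quickly on a CPU-only machine
benchmark_matmul(n=1024)
```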

The final results for TensorFlow 2.4.0 are in the table below:

| GPU      | fp32 performance   | fp16 performance    |
|----------|--------------------|---------------------|
| RTX 2080 | 10877.23 G ops/sec | 42471.64 G ops/sec  |
| V100     | 14743.50 G ops/sec | 89348.57 G ops/sec  |
| RTX 3090 | 35958.73 G ops/sec | 69669.73 G ops/sec  |
| A100     | 79158.13 G ops/sec | 232681.81 G ops/sec |
| RTX 4090 | 80802.89 G ops/sec | 162852.21 G ops/sec |
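For easier comparison, the measured rates can be converted to TFLOPS and to speedups relative to the RTX 2080 with a short script (all numbers are taken from the table above):

```python
# Measured matmul throughput from the table above, in G ops/sec
results = {
    "RTX 2080": {"fp32": 10877.23, "fp16": 42471.64},
    "V100":     {"fp32": 14743.50, "fp16": 89348.57},
    "RTX 3090": {"fp32": 35958.73, "fp16": 69669.73},
    "A100":     {"fp32": 79158.13, "fp16": 232681.81},
    "RTX 4090": {"fp32": 80802.89, "fp16": 162852.21},
}

baseline = results["RTX 2080"]
for gpu, perf in results.items():
    # 1 TFLOPS = 1000 G ops/sec
    print("%-9s fp32: %6.1f TFLOPS (%.2fx), fp16: %6.1f TFLOPS (%.2fx)" % (
        gpu,
        perf["fp32"] / 1000, perf["fp32"] / baseline["fp32"],
        perf["fp16"] / 1000, perf["fp16"] / baseline["fp16"],
    ))
```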