RTX 3090 vs A100 TensorFlow performance

When I searched for a way to estimate GPU performance, I found this answer on Stack Overflow, which contains the following code:
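
As a rough illustration of how a matrix multiplication benchmark arrives at a G ops/sec figure, here is a framework-agnostic sketch. It is not the Stack Overflow code: the function names are hypothetical, the operation count (n^3 multiplications plus n^2 * (n - 1) additions) is an assumed convention, and a tiny pure-Python matmul stands in for the GPU kernel so that the example runs anywhere.

```python
import time


def matmul_op_count(n):
    """Operations in one (n x n) @ (n x n) matmul, assuming
    n^3 multiplications plus n^2 * (n - 1) additions."""
    return n**3 + (n - 1) * n**2


def gops_per_sec(n, iters, elapsed_sec):
    """Throughput in G ops/sec for `iters` matmuls taking `elapsed_sec` in total."""
    return iters * matmul_op_count(n) / elapsed_sec / 1e9


def benchmark(run_matmul, n, iters=10):
    """Time `run_matmul`, a zero-argument callable that performs one
    n x n matmul and returns only once the result is ready."""
    run_matmul()  # warm-up run, excluded from the timing
    start = time.perf_counter()
    for _ in range(iters):
        run_matmul()
    elapsed = time.perf_counter() - start
    return gops_per_sec(n, iters, elapsed)


def naive_matmul(a, b):
    """Pure-Python stand-in for the real GPU matmul (illustration only)."""
    cols_b = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols_b] for row in a]


if __name__ == "__main__":
    n = 32
    ones = [[1.0] * n for _ in range(n)]
    rate = benchmark(lambda: naive_matmul(ones, ones), n, iters=3)
    print(f"{n} x {n} matmul: {rate:.6f} G ops/sec")
```

For a real GPU measurement, the callable passed to `benchmark` would run the framework's matmul on the device and wait for it to finish; otherwise the timer only measures how long it takes to enqueue the work.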

After Nvidia released a new generation of GPUs, I wanted to compare their performance.
To measure fp16 performance, I changed dtype to tf.float16.
To run the matrix multiplication benchmark under TensorFlow 2, I used compatibility mode. It can be enabled by replacing


The final results for TensorFlow 2.4.0 are in the table below:

GPU         fp32 performance       fp16 performance
RTX 2080    10877.23 G ops/sec     42471.64 G ops/sec
V100        14743.50 G ops/sec     89348.57 G ops/sec
RTX 3090    35958.73 G ops/sec     69669.73 G ops/sec
A100        79158.13 G ops/sec     232681.81 G ops/sec
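
To read these figures as relative speedups, the ratios can be computed directly from the table. The sketch below only does arithmetic on the measured numbers; the dictionary and function names are made up for illustration.

```python
# G ops/sec figures copied from the results table: (fp32, fp16)
RESULTS = {
    "RTX 2080": (10877.23, 42471.64),
    "V100": (14743.50, 89348.57),
    "RTX 3090": (35958.73, 69669.73),
    "A100": (79158.13, 232681.81),
}


def fp16_speedup(gpu):
    """How many times faster fp16 is than fp32 on the same GPU."""
    fp32, fp16 = RESULTS[gpu]
    return fp16 / fp32


def relative(gpu, baseline, precision):
    """Throughput of `gpu` divided by that of `baseline` at 'fp32' or 'fp16'."""
    idx = {"fp32": 0, "fp16": 1}[precision]
    return RESULTS[gpu][idx] / RESULTS[baseline][idx]


if __name__ == "__main__":
    for gpu in RESULTS:
        print(f"{gpu}: fp16 is {fp16_speedup(gpu):.2f}x fp32")
    for precision in ("fp32", "fp16"):
        ratio = relative("A100", "RTX 3090", precision)
        print(f"A100 vs RTX 3090 ({precision}): {ratio:.2f}x")
```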

Yuriy Nazarov
Software engineer
Love machine learning