Category Archives: hardware


RTX 3090 vs A100 tensorflow performance

When I searched for a way to estimate GPU performance, I found this answer on Stack Overflow, which contains the following code:

After NVIDIA released a batch of new-generation GPUs, I wanted to compare their performance.
To measure fp16 performance, I changed dtype to tf.float16.
To benchmark matrix multiplication in TensorFlow 2, I used compatibility mode. It can be enabled by replacing


The final results for TensorFlow 2.4.0 are in the table:

GPU        fp32 performance      fp16 performance
RTX 2080   10877.23 G ops/sec    42471.64 G ops/sec
V100       14743.50 G ops/sec    89348.57 G ops/sec
RTX 3090   35958.73 G ops/sec    69669.73 G ops/sec
A100       79158.13 G ops/sec    232681.81 G ops/sec
RTX 4090   80802.89 G ops/sec    162852.21 G ops/sec
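
To make the numbers easier to compare, the fp16-to-fp32 speedup of each card can be computed directly from the table. This is a minimal sketch in plain Python. The `RESULTS` dictionary and the `fp16_speedup` helper are names made up for this illustration, and only the figures come from the table.

```python
# Measured throughput in G ops/sec, copied from the table: (fp32, fp16).
RESULTS = {
    "RTX 2080": (10877.23, 42471.64),
    "V100": (14743.50, 89348.57),
    "RTX 3090": (35958.73, 69669.73),
    "A100": (79158.13, 232681.81),
    "RTX 4090": (80802.89, 162852.21),
}


def fp16_speedup(results):
    """Return {gpu: fp16 throughput divided by fp32 throughput}."""
    return {gpu: fp16 / fp32 for gpu, (fp32, fp16) in results.items()}


if __name__ == "__main__":
    for gpu, ratio in fp16_speedup(RESULTS).items():
        print(f"{gpu:<9} fp16 is {ratio:.2f}x the fp32 rate")
```

On these numbers the V100 gains the most from fp16 (about 6x), while the RTX 3090 and RTX 4090 gain roughly 2x.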

hardware linux

How to disable NVIDIA card in X server

I actively use CUDA-capable NVIDIA cards for machine learning applications.
When the GPU is under heavy load from ML tasks, it’s almost impossible to do anything in the graphical interface because the screen refreshes too slowly.
So I switched to the integrated video controller. That didn’t fix the slow refresh right away, because the X server was still using the NVIDIA card.
After some trial and error I found a quick and dirty solution: disable the graphics card driver during boot to prevent the X server from using the card.
To accomplish this, you need to blacklist the driver by adding the following line:

to /etc/modprobe.d/blacklist.conf
And don’t forget to update the initramfs:

After the system boots, you can load the driver manually with the following command:
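
To confirm that the driver really stayed out of the way after boot, or that it came back after loading it by hand, one option is to look at /proc/modules, where the kernel lists the loaded modules. This is a minimal sketch under assumptions: the helper names are hypothetical, and `nvidia` is an assumed module name that may differ depending on the driver in use.

```python
import os


def loaded_modules(proc_modules_text):
    """Parse the contents of /proc/modules into a set of module names."""
    return {
        line.split()[0]
        for line in proc_modules_text.splitlines()
        if line.strip()
    }


def is_module_loaded(name, path="/proc/modules"):
    """Return True if the kernel module `name` is currently loaded."""
    with open(path) as f:
        return name in loaded_modules(f.read())


if __name__ == "__main__":
    # "nvidia" is an assumed module name; adjust it to the driver in use.
    if os.path.exists("/proc/modules"):
        print("nvidia loaded:", is_module_loaded("nvidia"))
```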

AVR electronic hardware

Wireless Arduino thermometer with data logger

There is a bunch of stuff in my closet, so all it takes to build a hardware device is time and the right mood. It’s summer and the weather is rather hot, so building a thermometer is not a surprising idea.
The DS1820 is a great thermometer and I started with it, but I have a spare HDPM01 barometer module with a built-in thermometer, so let’s use that.

CPU hardware

Single threaded performance got stuck

I fell in love with the md5 hash algorithm because it exposes some very interesting characteristics of a system I want to benchmark. Almost all of the computations needed to produce an md5 hash sum lie on the critical path, which means it’s almost impossible to parallelize. And I’m not talking about execution in multiple threads, but about instruction-level parallelism (superscalar and vector computing). This property takes most of the modern tricks used in CPU cores (like out-of-order execution and specialized instruction sets) out of the equation and makes md5 a perfect single-thread benchmark.
Let’s see some numbers:
Calculate md5 of 10 GiB of zeroes on an i5-760 (Turbo frequency: 3.33 GHz, launch date Q3’10) with Ubuntu 14.04

And then do the same on an i7-6700 (Turbo frequency: 4.0 GHz, launch date Q3’15) with Ubuntu 15.10

So we have 140 and 155 MB/s per GHz respectively. That is a 10.7% performance boost after 5 years of CPU evolution, which looks rather frustrating.
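
The same kind of measurement can be sketched in Python with `hashlib`. This is a rough stand-in and not the command used for the numbers above: the function names are hypothetical, 1 MB is assumed to mean 10**6 bytes, the clock value is a placeholder, and interpreter overhead may shift the absolute figures.

```python
import hashlib
import time


def md5_of_zeroes(total_bytes, chunk_size=1 << 20):
    """Hash `total_bytes` zero bytes; return (hex digest, seconds elapsed)."""
    chunk = bytes(chunk_size)  # a reusable buffer of zero bytes
    md5 = hashlib.md5()
    remaining = total_bytes
    start = time.perf_counter()
    while remaining > 0:
        n = min(remaining, chunk_size)
        md5.update(chunk[:n])
        remaining -= n
    elapsed = time.perf_counter() - start
    return md5.hexdigest(), elapsed


def mb_per_s_per_ghz(total_bytes, seconds, ghz):
    """Throughput in MB/s (assuming 1 MB = 10**6 bytes) divided by clock in GHz."""
    return total_bytes / 1e6 / seconds / ghz


def relative_gain_percent(old, new):
    """Percentage improvement of `new` over `old`."""
    return (new / old - 1.0) * 100.0


if __name__ == "__main__":
    size = 256 * 1024 * 1024  # 256 MiB keeps the run short; the post used 10 GiB
    ghz = 4.0  # placeholder: use the turbo frequency of the CPU under test
    digest, seconds = md5_of_zeroes(size)
    print(digest, f"{mb_per_s_per_ghz(size, seconds, ghz):.1f} MB/s per GHz")
    # Per-GHz figures from the post: 140 (i5-760) and 155 (i7-6700).
    print(f"{relative_gain_percent(140, 155):.1f}% gain")  # prints "10.7% gain"
```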
p.s. Yep, I know that CPUs are now much smarter than 5 years ago and have a rich set of specialized instructions (like AES-NI, which is responsible for +2200% ghash calculation speed). But any software developer should be ready for the fact that unparallelizable algorithms will not get even a bit faster in the near future.

CPU hardware power

Few words about power management

Not so long ago I faced a problem: on the same Linux distribution some machines used Intel Turbo Boost but others didn’t.
To investigate this problem I read a number of articles about power management, and I want to summarize the key aspects below.
The Holy Grail of power management is ACPI (Advanced Configuration and Power Interface). It describes sleep (Sx), processor (Cx) and performance (Px) states.
Performance states came to replace the legacy throttling (Tx) states.

  • S5 (“Soft-Off”): all hardware is off.
  • S4 (“Suspend to disk”): like S5, but the bootloader can detect this state and WOL is available.
  • S3 (“Suspend to RAM”): RAM state is preserved, as well as S3-capable devices.
  • S2 (“Standby”): almost the same as S3.
  • S1 (“Power On Suspend/Stopgrant”): power stays on, but hard drives and devices that are not S1-capable are off, and the CPU is stopped.
  • S0: the system is turned on. Cx states are substates of S0.
    • C3 (“Sleep”): cache is not preserved.
    • C2 (“Stop-Clock”): everything is preserved, but the clock is off.
    • C1 (“Halt”): everything is preserved, but the CPU does nothing.
    • C0: the operating state. Px states are substates of C0.
      • Pn: minimum frequency.
      • P1: maximum base frequency.
      • P0: Turbo Boost enabled.

As for the Turbo Boost issue: it was just a BIOS bug (or feature, who knows?) that didn’t move the CPU to the P0 state on some boards but did on others.
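
When comparing machines like this, a quick first check is what the Linux kernel itself reports about turbo. This is a minimal sketch, assuming a system that exposes either the `intel_pstate/no_turbo` or the `cpufreq/boost` sysfs file; the function name is hypothetical. It only shows the kernel-side switch, so it may not reveal a BIOS-level problem like the one described above.

```python
from pathlib import Path


def turbo_boost_enabled(cpu_root="/sys/devices/system/cpu"):
    """Return True/False if the kernel reports a turbo setting, None if unknown."""
    root = Path(cpu_root)
    no_turbo = root / "intel_pstate" / "no_turbo"  # intel_pstate: "1" means disabled
    boost = root / "cpufreq" / "boost"  # acpi-cpufreq: "1" means enabled
    if no_turbo.is_file():
        return no_turbo.read_text().strip() == "0"
    if boost.is_file():
        return boost.read_text().strip() == "1"
    return None  # neither file present: driver or platform does not expose it


if __name__ == "__main__":
    print("turbo enabled:", turbo_boost_enabled())
```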