balenaOS on the Advantech AIR-020X

Last week I posted my review of the Advantech AIR-020X, the device I used to create the labSentinel 2 system.

I remarked that the hardware was great, but that the software support and update capability of the system were severely lagging behind for an "industrial floor, always on" type of machine. Luckily, that's exactly what balena was created for.

Even better, their environment already supports Nvidia Jetson devices - including Nvidia Jetson Xavier NX modules. With the AIR-020X being a really nice carrier board (and housing) for this module, I went to work.

Spoiler Alert - it works!

Installing balenaOS on the AIR-020X

1.) I set up an Ubuntu 20.04 LTS machine, installed npm and set up jetson-flash
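
For reference, the setup boils down to something like this - a minimal sketch following the balena-os/jetson-flash README; exact package names and Node.js versions may differ on your system:

sudo apt-get install -y git nodejs npm
git clone https://github.com/balena-os/jetson-flash.git
cd jetson-flash
npm install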

2.) I went to https://www.balena.io/os and downloaded the latest NVIDIA JETSON XAVIER NX DEVKIT EMMC image (2.107.10) in the development version.

3.) After setting up jetson-flash, unzip the downloaded file and get your AIR-020X into recovery mode. This means opening the bottom of the case by unscrewing the 4 Phillips head screws, connecting the Micro USB port of the AIR-020X to your Ubuntu host computer and applying power to the AIR-020X - but do not press the power switch yet.
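
The unzip itself is a one-liner - the archive name below is an assumption, derived from the image file flashed later:

unzip ./*jetson-xavier-nx-devkit-emmc-2.107.10*.zip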

4.) There is a foil/recovery switch next to the Micro USB connector and LAN port. You need to press and hold this switch and, at the same time, press the power on button of the unit for about 4 seconds.

5.) On Ubuntu, run lsusb | grep -i nvidia - this should return a line similar to this:

Bus 003 Device 005: ID 0955:7023 NVIDIA Corp. APX

The important part is the trailing "APX", which means the unit is in recovery mode.

6.) Now you can start the flash process

user@balenaTest:~/jetson-flash$ ./bin/cmd.js -f ./jetson-xavier-nx-devkit-emmc-2.107.10-v14.4.4.img -m jetson-xavier-nx-devkit-emmc

The -f flag points to the unzipped image file; -m tells the jetson-flash tool that we are targeting a Xavier NX system and want to install balenaOS on the internal eMMC module.

7.) This starts the flashing process, which takes a few minutes and will also ask for your sudo password. At the end you should see something like this:

[ 255.8670 ] Flashing completed

[ 255.8670 ] Coldbooting the device
[ 255.8696 ] tegrarcm_v2 --ismb2
[ 255.9454 ]
[ 255.9502 ] tegradevflash_v2 --reboot coldboot
[ 255.9530 ] Bootloader version 01.00.0000
[ 255.9984 ]
*** The target t186ref has been flashed successfully. ***
Reset the board to boot from internal eMMC.

8.) As soon as you reboot the device, you will be greeted by the balenaOS logo and can use it like any other balenaOS device.
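
Since we flashed the development variant, you can also log into the host OS right away - balenaOS development images expose a passwordless SSH daemon on port 22222:

ssh root@<device-ip> -p 22222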

Adding the AIR-020X to a fleet

If you want to use it, e.g. in a fleet, I would recommend creating a new one with the device type Nvidia Jetson Xavier. This is important for sample projects to work correctly: it is basically the same thing as the more specialized "jetson-xavier-nx-devkit-emmc" device type - but most demo projects only implement the former :).

To join the installed device to your new fleet, download and install balenaCLI, log in to your balena Cloud account and do a balena scan to find your AIR-020X on the network.
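
In practice this boils down to the following two commands (the scan needs root privileges to probe the local network):

balena login
sudo balena scan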

-
  host:          56e1ef3.local
  address:       192.168.178.112
  osVariant:     development
  dockerInfo:
    Containers:        1
    ContainersRunning: 1
    ContainersPaused:  0
    ContainersStopped: 0
    Images:            1
    Driver:            overlay2
    SystemTime:        2023-03-10T14:05:52.568438957Z
    KernelVersion:     4.9.253-l4t-r32.7
    OperatingSystem:   balenaOS 2.107.10
    Architecture:      aarch64
  dockerVersion:
    Version:    20.10.17
    ApiVersion: 1.41

After that, you can easily join this device with

.\balena join 192.168.178.112
? Select fleet <yourFleetNameToSelect>
? Check for updates every X minutes 10
[Success] Device successfully joined balena-cloud.com!

... and voilà, it's online!
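
You can double check from any machine with balenaCLI by listing the devices of your fleets:

balena devices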

What does work?

Testing GPIO pins with a multimeter

The AIR-020X has a lot of custom GPIO chips, 2x RS485/RS232 interfaces, 1x CANbus interface, a second network interface and even an NVMe slot. Luckily, everything just works out of the box.

- HDMI works
- USB works
- onboard network card (dmesg + DHCP test, gets an IP / works)
[   29.231807] eqos 2490000.ether_qos eth0: Link is Up - 1Gbps/Full - flow control rx/tx

- 2nd network card (dmesg + DHCP test, gets an IP / works)
[  104.307175] igb 0004:05:00.0 enP4p5s0: igb: enP4p5s0 NIC Link is Up 1000 Mbps Full Duplex, Flow 
- NVMe is recognized (lsblk)
nvme0n1      259:0    0 119.2G  0 disk
|-nvme0n1p1  259:1    0    96G  0 part
|-nvme0n1p2  259:2    0    64M  0 part
|-nvme0n1p3  259:3    0    64M  0 part
|-nvme0n1p4  259:4    0   448K  0 part
|-nvme0n1p5  259:5    0   448K  0 part
|-nvme0n1p6  259:6    0    63M  0 part
|-nvme0n1p7  259:7    0   512K  0 part
|-nvme0n1p8  259:8    0   256K  0 part
|-nvme0n1p9  259:9    0   256K  0 part
|-nvme0n1p10 259:10   0   300M  0 part
`-nvme0n1p11 259:11   0  22.8G  0 part
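A quick non-destructive way to confirm the drive is actually readable (my addition, not part of the original test - this only reads the first gigabyte):
dd if=/dev/nvme0n1 of=/dev/null bs=1M count=1024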
- CAN bus interface is auto-loaded on boot (see ifconfig -a)
can0      Link encap:UNSPEC  HWaddr 00-00-00-00-00-00-00-00-00-00-00-00-00-00-00-00
          NOARP  MTU:16  Metric:1
          RX packets:0 errors:0 dropped:0 overruns:0 frame:0
          TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:10
          RX bytes:0 (0.0 B)  TX bytes:0 (0.0 B)
          Interrupt:63
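I did not push data over the bus, but bringing the interface up for a test would look like this (standard iproute2 CAN setup; the bitrate is an assumption and must match your bus):
ip link set can0 type can bitrate 500000
ip link set can0 up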
- GPIO/DIO works, but bit 3 sadly does not
( more info: http://ess-wiki.advantech.com.tw/view/File:AIR-020-nVidia_GPIO.docx )
Pin number   AIR-020X   AIR-020T   AIR-020N
GPIO bit1    393        269        38
GPIO bit2    421        425        149
GPIO bit3    265        411        65
GPIO bit4    424        264        168
GPIO bit5    418        476        202
GPIO bit6    436        396        246
GPIO bit7    417        337        169
GPIO bit8    268        338        194

# set bit 1 as GPIO pin
echo 393 > /sys/class/gpio/export
# get value 0=low, 1=high
cat /sys/class/gpio/gpio393/value
# set direction out or in
echo out > /sys/class/gpio/gpio393/direction
# get direction
cat /sys/class/gpio/gpio393/direction
out
# set value on out pin
echo 1 > /sys/class/gpio/gpio393/value

test:
# 265, bit3 did not work on export
echo 393 > /sys/class/gpio/export
echo 421 > /sys/class/gpio/export
echo 265 > /sys/class/gpio/export
echo 424 > /sys/class/gpio/export
echo 418 > /sys/class/gpio/export
echo 436 > /sys/class/gpio/export
echo 417 > /sys/class/gpio/export
echo 268 > /sys/class/gpio/export

echo out > /sys/class/gpio/gpio393/direction
echo out > /sys/class/gpio/gpio421/direction
echo out > /sys/class/gpio/gpio265/direction
echo out > /sys/class/gpio/gpio424/direction
echo out > /sys/class/gpio/gpio418/direction
echo out > /sys/class/gpio/gpio436/direction
echo out > /sys/class/gpio/gpio417/direction
echo out > /sys/class/gpio/gpio268/direction

echo 1 > /sys/class/gpio/gpio393/value
echo 1 > /sys/class/gpio/gpio421/value
echo 1 > /sys/class/gpio/gpio265/value
echo 1 > /sys/class/gpio/gpio424/value
echo 1 > /sys/class/gpio/gpio418/value
echo 1 > /sys/class/gpio/gpio436/value
echo 1 > /sys/class/gpio/gpio417/value
echo 1 > /sys/class/gpio/gpio268/value
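
# after testing, pins can be handed back to the kernel via unexport
# (standard sysfs behaviour - not part of my original test run)
echo 393 > /sys/class/gpio/unexport
# ...repeat for the other exported pins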
- COM ports, running as RS-232 or RS-485 (not tested, but recognized)
( more info: http://ess-wiki.advantech.com.tw/view/AIR-020-RS-485 )
root@56e8bf3:/# ls /dev/ | grep ttyTH
ttyTHS0 <- COM1
ttyTHS1 <- COM2
ttyTHS4
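
A minimal smoke test would be to configure a port and push a few bytes out - untested on my side, and wiring/baud rate are assumptions:

stty -F /dev/ttyTHS0 115200 cs8 -cstopb -parenb
echo "hello" > /dev/ttyTHS0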

More information about the hardware can be found in the Advantech Wiki.

GPU Demos

Last but not least, I want to point you towards the nice balena Jetson tutorial, which can be found here.

It will help you get started with the Jetson samples that are hosted here.

In the end I was able to also get CUDA acceleration to work and see this smoke demo:

Nothing like some GPU accelerated smoke

With that I am closing this post. It was surprisingly easy to get this device working - the only thing left would be to get it to boot and run from its internal NVMe storage. Other than that, it's a nice tool for GPU workloads like Edge Impulse.

Advantech AIR-020X Review

Normally, I do not get review units. That is because I only run this small weblog, along with some conference talks - and most companies would probably be better off sending their units to someone with the reach of Linus Tech Tips, or similar.

On the other hand, when I do get the chance to do a review, it can be a bit worrisome for the company as well, as I am a very honest person. I have been working in tech for some time now and have had the honor of building things that went to space - and came back to tell the tale. I know what I want in a unit - and what could be a problem.

That said: I was one lucky winner of the Advantech Edge AI Challenge 2022 and got an AIR-020X-S9A1 unit at no charge to realize my labSentinel 2 project. Through this project I learned a bit about the box and thought it would not be a bad idea to share my findings with the readers of my blog - and with Advantech, so that they can improve their product. This review is not paid for and reflects my own thoughts; I received the mentioned unit for my project, but the review was not part of that deal. With that out of the way, let's get started.

The hardware

The AIR-020X comes very well packaged, in its own foam jacket which will save it from all but the most horrible abuse by postal services. Not that it would matter: the compact unit, roughly 14 cm x 12 cm x 4.5 cm, weighs in at nearly 850 g and is built sturdy and robust - like a tank:

The most obvious part of the unit is its heatsink, which it puts to good use - but more on that topic later. Along with the computer itself comes a starting guide printed in Chinese and a short USB A to Micro B cable, which is needed to factory reset and reflash the unit.

All in all, the AIR-020X is an impressive unit, built around an Nvidia Jetson Xavier NX module with 8 GB RAM and offering 16 GB onboard eMMC, 128 GB M.2 flash, 2x RS232/422/485, 1x CANbus, 1x DIO ("GPIO"), 2x 1 Gbit Ethernet, 1x full-size mPCIe with nano SIM holder, 1x 4K HDMI output, 2x USB 3.0 Type A and 1x USB Type C. The unit is powered by a 12-24 V DC power supply, which is an optional accessory.

Being an industrial unit, it uses an industrial-type connector for power, an HT5.08 2-pole type:

As this connector is not part of the base package either, and the USB C port does not accept Power Delivery (nor does it work in DisplayPort mode), it becomes a bit hard to power up the unit after receiving it. Finding a usable power supply within the sizable 12-24 V voltage range (e.g. from an old laptop) is fairly easy, but without the connector it is a dead end until the next delivery arrives. It would be useful to at least include one connector with the base unit. The USB cable is a nice addition, but could be left out (even though it is of very high quality) - along with the Chinese manual. Both could be replaced with a small card carrying direct links to the English and Chinese PDF versions of the manual.

Opening up the unit reveals the internals - but not without a fight:

The screws are perfectly fixed to the structure with blue Loctite - a touch I cannot recommend enough for the vibration resistance of the overall unit - but the screws themselves are made from extremely soft metal, so that, even using the correct screwdriver, I stripped nearly all of them and had real trouble removing them. This problem seems to affect all the external black screws; the internal silver ones were of much higher quality. In my case I solved the issue by replacing the screws with new ones and have had no problems with them since.

The internal structure is very well laid out, raising the M.2 drive onto a standoff to keep it a bit further from the heat source: the Xavier NX module sits on the other side of the PCB and is sandwiched directly against the big heatsink.

Also very welcome is the addition of two Raspberry Pi style camera connectors, although they are a bit hidden by the serial console cables. I understand that the unit should be as closed as possible for use in factories, but I would have loved to see two small slits (possibly even with IP/EMC gaskets to allow for protective shielding of those entry points) so that cameras on the outside of the case can be attached easily.

The mPCIe slot gives the system an additional expansion option, e.g. for UMTS or LoRaWAN modules, and the internal CR2032 cell for the RTC is a small but valuable detail.

The AIR-020X has mounting points on both sides for additional wall mounting rails. Looking at these mounting points and the obvious use of the AIR-020 series in lab and factory settings, offering a DIN rail mount as an accessory could prove very useful for mounting this small computer directly in an electrical cabinet.

The software

Booting up the system greets one with a very familiar picture: Ubuntu 18.04 is running on the machine, in the form of a tailored version of Nvidia JetPack. Advantech's version only uses the eMMC of the Xavier NX module to start the bootloader; the actual data is kept on the M.2. This is a great idea for the longevity of the eMMC on the (currently hard to find) Xavier NX module - but it comes at the cost of additional customization, on top of the PCB, included hardware, drivers and other changes Advantech made compared to an Nvidia developer board for the same module.

This is a problem I learned about the hard way: I realized that the board was delivered with L4T 32.5.2 - not the then-current 32.7.x (JetPack 4.6.1) - so I updated it by hand. Just to have the board bootloop. This was the moment I took a closer look at Advantech's online presence and the manual - only to learn that the recovery process was neither described, nor was the image available for download. I eventually got the needed recovery file as well as the documentation (which also included vital information on how to use the DIO (GPIO), RS422 and CANbus interfaces) and was able to restore the board to working order. Obviously there were multiple problems with this: first, the online available manual should contain all needed information regarding settings, ports, recovery, etc.; second, the current (and ideally previous) images need to be available online on their website - with checksums, so that these images can be deployed safely.

I also voiced my concerns regarding the high-impact security issues / CVEs found in 32.5.2 - which would make the use of the AIR-020 series an absolute liability in a production environment. I am glad to report that Advantech reacted to these concerns by providing a beta version of a new JetPack 4.6.1 image. A short time afterwards, Advantech added some information to their wiki:

On the download page you can find the AIR020A2AIM20UIV00004 entry for the Jetson NX JetPack 4.6.1 from 2022-07-20. This links to a Dropbox folder containing the latest image (AIR020A2AIM20UIV00004_194.tar.gz / 2022-09-16).
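
Verifying such a download before flashing would then be a one-liner - provided Advantech publishes a reference checksum to compare against:

sha256sum AIR020A2AIM20UIV00004_194.tar.gz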

With this latest image I was able to upgrade the AIR-020X to JetPack 4.6.1 and even do an apt upgrade to L4T 32.7.2, at the time the latest L4T. However, this did not go as planned: after the upgrade and a reboot, the device got caught in a bootloop. The bootloop kept repeating for about 10 minutes, until the device mysteriously started working again and came back up without issues. Obviously this is not a graceful upgrade, and the fact that it was reproducible raised some concerns.

I am glad to report that Advantech has provided the latest image - which eliminates several security issues. However, the needed changes to the manual, the provision of recovery images (now via Dropbox?) and the secure delivery of security updates to the unit remain open. Maybe Advantech should think about using balena.io to handle these issues?

Verdict

The Advantech AIR-020X is an extremely capable unit in a small form factor, sturdily built and highly reliable. Even with the latest JetPack 4.6.1 and abuse of the formerly unavailable 20 Watt mode, I could not get this unit to heat up too much in my testing with labSentinel 2. There is still enough headroom to use it in any kind of environment, which makes it a perfect choice for labs and factories - if Advantech can tackle the presented issues, especially those regarding timely and secure availability of security patches and software updates. That means availability of the images, fast adoption of official Nvidia updates after release, and all needed documentation in one publicly downloadable manual. With these exceptions and some small kinks aside, Advantech is very close to building the perfect unit for their envisioned use case. I really hope they can close that last (security/software/manual) gap in an otherwise nearly perfect piece of hardware - and with that create a recommendable product.

Edit: balenaOS

I got balenaOS working on the device - see here.

USB C power for the Nvidia Jetson Nano 4 GB dev board

The best way to power a Jetson Nano 4 GB dev board is with a center-positive, 5 V, at least 4 A barrel connector power adapter. However, these are often bulky and not the best travel companions - while USB C power bricks are becoming more common and USB C sockets are being built into nearly every device (maybe yours too, Apple?).

So I set out to build a USB C power adapter for the Jetson board.

Using an inexpensive USB C "trigger" board combined with two 5V@3A step-down converters, this actually worked.

The trick is setting the USB C trigger to request 20 V, using the two 5 V converters in parallel to step the 20 V down to 5 V - and then feeding the resulting output, again in parallel, into the barrel plug, like so.

For the curious among you asking why I did not just set the trigger to 5 V and use it on its own: I tried that first, but it did not work. It was not able to provide enough current for the Jetson in "MAXN mode" - it constantly ran into overcurrent protection messages when pushed too hard.
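
The numbers behind this are simple - a rough budget, assuming the brick actually delivers the 20 V / 3 A profile and ignoring converter losses:

20 V x 3 A = 60 W available from the USB C brick
2 x (5 V x 3 A) = up to 6 A at 5 V after the parallel step-down
Jetson Nano at the barrel jack: 5 V x 4 A = 20 W - comfortably within budget
plain USB C at 5 V tops out at 3 A = 15 W - which explains the overcurrent trips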

I am happy with the result and shortened the wires after testing, putting everything into a neat, small form factor.

With this change I can finally replace my old Jetson Nano power supply - this chunky unit, gifted to me back in the day by the awesome Morlac :) - with something smaller.

labSentinel 2

Nearly a year ago, I wrote the labSentinel project for my Nvidia Jetson AI Specialist certification. The basic idea of the project is to supervise old lab equipment that does not possess any kind of log output or interface other than a graphical user interface running on a Windows 3.11 / 95 / NT - maybe even XP - system. I solved this by attaching a video grabber to a Jetson Nano and grabbing the screen output of the experiment computer "out-of-band". I then trained good and bad system states with Nvidia's inference tools and finally got the system to report via MQTT as soon as something went wrong. (As a test system, I designed a flashy GUI application to mimic the old interfaces - specifically a lab power supply with multiple outputs - with the ability to simulate errors.) (https://developer.nvidia.com/embedded/community/jetson-projects#labsentinel / https://github.com/nmaas87/labsentinel)

While the project did work, there was still a lot left to be desired:

  • The system captured the complete screen in full size. Running inference on a 1024x768 or even higher resolution picture is not efficient and has a high failure rate.
  • Training, testing and improving the model was time consuming and did not yield the precision and results I was hoping for.
  • The system could differentiate between "good" and "error" states - however, if an error occurred, I would have loved to get more information by "reading the GUI" and its output. For example, in the lab power supply use case: getting the specific voltages of the different lines to see which line failed or what is wrong - maybe even with the possibility to cross-check whether the detected error is an error at all.
  • While the Nvidia Jetson Nano development board is an awesome tool for development, it is not hardened enough for a lab or factory floor environment.

These were all points I wanted to address, but as time was lacking, I did not take up the project again - until, at the start of this year, Advantech and Edge Impulse launched their Advantech Edge AI Challenge 2022. They wanted to hear about specific use cases and how to solve them with factory-hardened Jetson products (e.g. Advantech's AIR-020 series) and Edge Impulse Studio.

Well, that reminded me of the first labSentinel - and I thought I'd give it a try. As luck would have it, I was one of the two lucky people picked to realize their project. Advantech sent me one of their AIR-020X boards (the review is here :)) and I was good to go:

Let me introduce you to labSentinel 2:

Built from the ground up, it solves the above-mentioned issues:

  • The actual GUI window is found and extracted from the full-size desktop screenshots via OpenCV - and resized to 320x320 pixels to neatly fit the inference model
  • All model training, testing and optimization is done with Edge Impulse, which makes handling a breeze
  • If an error is detected, an included OCR module using Tesseract can extract text from predesignated / labeled areas on the non-resized GUI and send this information along with the MQTT alert
  • The AIR-020X board is more than robust enough for all normal lab and factory floors

All source code is freely available, with a demo project and documentation, on GitHub ( https://github.com/nmaas87/labSentinel2 ), along with a video instruction on how to use it ( https://youtu.be/KEN_HT20exs ).

Thanks again to Gary Lin (Advantech) as well as Louis Moreau and David Tischler (Edge Impulse) for their support :)!

Update: I added a review of the Advantech AIR-020X and got balenaOS working on it.

CUDA and Tensorflow in Docker

In this howto we will get CUDA working in Docker - and, as a bonus, add TensorFlow on top! However, please note that you'll need the following prerequisites:

GNU/Linux x86_64 with kernel version > 3.10
Docker >= 1.9 (official docker-engine, docker-ce or docker-ee only)
NVIDIA GPU with Architecture > Fermi (2.1)
NVIDIA drivers >= 340.29 with binary nvidia-modprobe

We will install the NVIDIA drivers in this tutorial, so you only need the right kernel and Docker version already installed; we're using an Ubuntu 15.05 x64 machine here. For CUDA you'll need a Fermi 2.1 CUDA card (or better); for TensorFlow, a card with CUDA compute capability >= 3.0...

Which graphics card model do I own?
lspci | grep VGA
sudo lshw -C video

Example output:

product: GF108 [GeForce GT 430]
vendor: NVIDIA Corporation

You should look up whether it works with CUDA / Fermi 2.1, e.g. on https://developer.nvidia.com/cuda-gpus

GeForce GT 430 - Compute: 2.1

Ok, that one works!

I got additional info from: https://www.geforce.com/hardware/desktop-gpus/geforce-gt-430/specifications

CUDA and Docker?

You can find out more about that topic on https://github.com/NVIDIA/nvidia-docker

Getting it to work will be the next step:

Download the right CUDA / NVIDIA driver

from http://www.nvidia.com/object/unix.html
I chose Linux x86_64/AMD64/EM64T, Latest Long Lived Branch version 375.66 - but please check in the description of the file whether your graphics card is supported!

After the download, install the driver:
chmod +x NVIDIA-Linux-x86_64-375.66.run
sudo ./NVIDIA-Linux-x86_64-375.66.run

It will ask for permission; accept it. If it tells you that the nouveau driver needs to be disabled, accept that as well - it will then generate a blacklist file and exit the setup. Afterwards, run

sudo update-initramfs -u

and reboot your server. Then, rerun the setup with

sudo ./NVIDIA-Linux-x86_64-375.66.run

You can check the installation with

nvidia-smi

and get an output similar to this one:

Mon Jul 24 09:03:47 2017
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 375.66                 Driver Version: 375.66                    |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce GT 430      Off  | 0000:01:00.0     N/A |                  N/A |
| N/A   40C    P0    N/A /  N/A |      0MiB /   963MiB |     N/A      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID  Type  Process name                               Usage      |
|=============================================================================|
|    0                  Not Supported                                         |
+-----------------------------------------------------------------------------+

which means that it worked!

Install nvidia-docker and nvidia-docker-plugin
wget -P /tmp https://github.com/NVIDIA/nvidia-docker/releases/download/v1.0.1/nvidia-docker_1.0.1-1_amd64.deb
sudo dpkg -i /tmp/nvidia-docker*.deb && rm /tmp/nvidia-docker*.deb
Test nvidia-smi from Docker
nvidia-docker run --rm nvidia/cuda nvidia-smi

should output:

Using default tag: latest
latest: Pulling from nvidia/cuda
e0a742c2abfd: Pull complete
486cb8339a27: Pull complete
dc6f0d824617: Pull complete
4f7a5649a30e: Pull complete
672363445ad2: Pull complete
ba1240a1e18b: Pull complete
e875cd2ab63c: Pull complete
e87b2e3b4b38: Pull complete
17f7df84dc83: Pull complete
6c05bfef6324: Pull complete
Digest: sha256:c8c492ec656ecd4472891cd01d61ed3628d195459d967f833d83ffc3770a9d80
Status: Downloaded newer image for nvidia/cuda:latest
Mon Jul 24 07:07:12 2017
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 375.66                 Driver Version: 375.66                    |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce GT 430      Off  | 0000:01:00.0     N/A |                  N/A |
| N/A   40C    P8    N/A /  N/A |      0MiB /   963MiB |     N/A      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID  Type  Process name                               Usage      |
|=============================================================================|
|    0                  Not Supported                                         |
+-----------------------------------------------------------------------------+

Yep, you got it working in Docker!

Running an interactive CUDA session isolating the first GPU
NV_GPU=0 nvidia-docker run -ti --rm nvidia/cuda
Enter our first Hello World program
echo '#include <stdio.h>

// Kernel definition with __global__: an empty function at this point
__global__ void kernel(void) {
    // printf("Hello, Cuda!\n");
}

int main(void) {
    // Kernel launch with <<<1,1>>>: one block, one thread
    kernel<<<1,1>>>();
    printf("Hello, World!\n");
    return 0;
}' > helloWorld.cu
Compile it within the Docker container
nvcc helloWorld.cu -o helloWorld
Execute it...
./helloWorld
and you get...
Hello, World!

Congrats, you got it working!

Encore: TensorFlow
Getting TensorFlow to work is straightforward:
nvidia-docker run -it -p 8888:8888 tensorflow/tensorflow:latest-gpu

It will output something like:

Copy/paste this URL into your browser when you connect for the first time, to login with a token:
http://localhost:8888/?token=d747247b33023883c1a929bc97d9a115e8b2dd0db9437620

you should do that 🙂

Then open the 1_hello_tensorflow notebook and run the first sample:

from __future__ import print_function
import tensorflow as tf
with tf.Session():
    input1 = tf.constant([1.0, 1.0, 1.0, 1.0])
    input2 = tf.constant([2.0, 2.0, 2.0, 2.0])
    output = tf.add(input1, input2)
    result = output.eval()
    print("result: ", result)

by selecting it and clicking the >| (run cell, select below) button.
This worked for me:

result: [ 3. 3. 3. 3.]

However... sadly the GPU was not the one calculating the results, as shown by the Docker CLI output:

Kernel started: 2bc4c3b0-61f3-4ec8-b95b-88ed06379d85
[I 07:31:45.544 NotebookApp] Adapting to protocol v5.1 for kernel 2bc4c3b0-61f3-4ec8-b95b-88ed06379d85
2017-07-24 07:32:17.780122: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.1 instructions, but these are available on your machine and could speed up CPU computations.
2017-07-24 07:32:17.837112: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:893] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2017-07-24 07:32:17.837440: I tensorflow/core/common_runtime/gpu/gpu_device.cc:940] Found device 0 with properties:
name: GeForce GT 430
major: 2 minor: 1 memoryClockRate (GHz) 1.4
pciBusID 0000:01:00.0
Total memory: 963.19MiB
Free memory: 954.56MiB
2017-07-24 07:32:17.837498: I tensorflow/core/common_runtime/gpu/gpu_device.cc:961] DMA: 0
2017-07-24 07:32:17.837522: I tensorflow/core/common_runtime/gpu/gpu_device.cc:971] 0:   Y
2017-07-24 07:32:17.837549: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1003] Ignoring visible gpu device (device: 0, name: GeForce GT 430, pci bus id: 0000:01:00.0) with Cuda compute capability 2.1. The minimum required Cuda capability is 3.0.

So, only CUDA compute capability >= 3.0 devices for TensorFlow 🙁 - but it still works, as it falls back to the CPU (though not as fast as it could be :/)

Info taken from:

https://github.com/NVIDIA/nvidia-docker
https://developer.nvidia.com/cuda-gpus
https://hub.docker.com/r/tensorflow/tensorflow/