Planet Collabora

August 16, 2017

Simon McVittie

DebConf 17: Flatpak and Debian

The indoor garden at Collège de Maisonneuve, the DebConf 17 venue
Decorative photo of the indoor garden

I'm currently at DebConf 17 in Montréal, back at DebConf for the first time in 10 years (last time was DebConf 7 in Edinburgh). It's great to put names to faces and meet more of my co-developers in person!

On Monday I gave a talk entitled “A Debian maintainer's guide to Flatpak”, aiming to introduce Debian developers to Flatpak, and show how Flatpak and Debian (and Debian derivatives like SteamOS) can help each other. It seems to have been quite well received, with people generally positive about the idea of using Flatpak to deliver backports and faster-moving leaf packages (games!) onto the stable base platform that Debian is so good at providing.

A video of the talk is available from the Debian Meetings Archive. I've also put up my slides in the DebConf git-annex repository, with some small edits to link to more source code: A Debian maintainer's guide to Flatpak. Source code for the slides is also available from Collabora's git server.

The next step is to take my proof-of-concept for building Flatpak runtimes and apps from Debian and SteamOS packages, flatdeb, get it a bit more production-ready, and perhaps start publishing some sample runtimes from a cron job on a Debian or Collabora server. (By the way, if you downloaded that source right after my talk, please update - I've now pushed some late changes that were necessary to fix the 3D drivers for my OpenArena demo.)

I don't think Debian will be going quite as far as Endless any time soon: as Cosimo outlined in the talk right before mine, they deploy their Debian derivative as an immutable base OS with libOSTree, with all the user-installable modules above that coming from Flatpak. That model is certainly an interesting thing to think about for Debian derivatives, though: at Collabora we work on a lot of appliance-like embedded Debian derivatives, with a lot of flexibility during development but very limited state on deployed systems, and Endless' approach seems a perfect fit for those situations.

[Edited 2017-08-16 to fix the link for the slides, and add links for the video]

August 16, 2017 08:50 PM

July 20, 2017

memcpy.io - Robert Foss

Android: NXP i.MX6 on Etnaviv Update

Since the last post a lot work has gone into upstreaming and stabilizing the etnaviv on Android ecosystem. This has involved Android, kernel and Mesa changes. Many of which are available upstream now. A How-To for getting you up and running on an iMX6 dev board is available here.

Improvements

Modifiers support

Modifiers support has been accepted into Mesa, GBM and gbm_gralloc. Modifiers were mentioned in a previous post.

Etnaviv driver support for Android

Patches enabling the etnaviv Mesa driver being built for Android have now landed upstream.

Stability on Android

A number for small stability issues present while running Android on i.MX6 hardware have now been fixed, and the platform is now relatively stable.

Performance diagnostics

We have a decent understanding that the platform is slow when running the desktop and other apps that have multiple surfaces due to rendering using CPU instead of GPU.

Etnaviv improvements

Etnaviv performance and feature set have both been increased since Mesa v17.1.

EGL support

A number of games using EGL have been successfully run on Android, some minor graphical issues still remain, but overall games run well and fast.

Thanks

This work is built on efforts by a lot people:

This post has been a part of work undertaken by my employer Collabora, and has been funded by Zodiac Inflight Innovations.

by Robert Foss at July 20, 2017 10:00 PM

July 12, 2017

Alexandros Frantzis

vkmark: more than a Vulkan benchmark

Ever since Vulkan was announced a few years ago, the idea of creating a Vulkan benchmarking tool in the spirit of glmark2 had been floating in my mind. Recently, thanks to my employer, Collabora, this idea has materialized! The result is the vkmark Vulkan benchmark, hosted on github:

https://github.com/vkmark/vkmark

Like its glmark2 sibling project, vkmark’s goals are different from the goals of big, monolithic and usually proprietary benchmarks. Instead of providing a single, complex benchmark, vkmark aims to provide an extensible suite of targeted, configurable benchmarking scenes. Most scenes exercise specific Vulkan features or usage patterns (e.g., desktop 2.5D scenarios), although we are also happy to have more complex, visually intriguing scenes.

Benchmarking scenes can be configured with options that affect various aspects of their rendering. We hope that the ease with which developers can use different options will make it painless to perform targeted tests and eventually provide best practices advice.

A few years ago we were pleasantly surprised to learn that developers were using glmark2 as a testing tool for driver development, especially in free (as in freedom) software projects. This is a goal that we want to actively pursue for vkmark, too. The flexible benchmarking approach is a natural fit for this kind of development; the developer can start with getting the simple scenes working and then, as the driver matures, move to scenes that use more advanced features. vkmark has already proved useful in this regard, being an valuable testing aid for my own experiments in the Mesa Vulkan WSI implementation.

With vkmark we also want to be on the cutting edge of software development practices and tools. vkmark is a modern, C++14 codebase, using the vulkan-hpp bindings, the Meson build system and the Catch test framework. To ensure a high quality codebase, the core of vkmark is developed using test-driven development.

It is still early days, but vkmark already has support for X11, Wayland and DRM/KMS, and provides two simple scenes: a “clear” scene, and a “cube” scene that renders a simple colored cube based on the vkcube example (which is itself based on kmscube). The future looks bright!

We are looking forward to getting more feedback on vkmark and, of course, contributions are always welcome!


by afrantzis at July 12, 2017 04:21 PM

June 27, 2017

memcpy.io - Robert Foss

Performance debugging Linux graphics on Mesa

GALLIUM_HUD

GALLIUM_HUD is a feature that adds performance graphs to applications that describe various aspects like FPS, CPU usage, etc in realtime.

It is enabled using an environment variable, GALLIUM_HUD, that can be set for GL/EGL/etc applications. It only works for Mesa drivers that are Gallium based, which means that the most drivers (with the notable exception of some Intel drivers) support GALLIUM_HUD.

See GALLIUM_HUD options:

export GALLIUM_HUD=help
glxgears

Android

If you're building Android, you can supply system-wide environment values by doing an export in the init.rc file of the device you are using, like this.

# Go to android source code checkout
cd android

# Add export to init.rc (linaro/generic is the device I use)
nano device/linaro/generic/init.rc
export GALLIUM_HUD cpu,cpu0+cpu1+cpu2+cpu3;pixels-rendered,fps,primitives-generated

Linux

If you're using one of the usual Linux distros, GALLIUM_HUD can be enabled by setting the environtment variable in a place that it loaded early.

# Add export to /etc/environment
nano /etc/environment 
export GALLIUM_HUD cpu,cpu0+cpu1+cpu2+cpu3;pixels-rendered,fps,primitives-generated

Thanks

This post has been a part of work undertaken by my employer Collabora.

by Robert Foss at June 27, 2017 10:00 PM

GALLIUM_HUD: Debug Mesa Graphics Performance

GALLIUM_HUD

GALLIUM_HUD is a feature that adds performance graphs to applications that describe various aspects like FPS, CPU usage, etc in realtime.

It is enabled using an environment variable, GALLIUM_HUD, that can be set for GL/EGL/etc applications. It only works for Mesa drivers that are Gallium based, which means that the most drivers (with the notable exception of some Intel drivers) support GALLIUM_HUD.

See GALLIUM_HUD options:

export GALLIUM_HUD=help
glxgears

Android

If you're building Android, you can supply system-wide environment values by doing an export in the init.rc file of the device you are using, like this.

# Go to android source code checkout
cd android

# Add export to init.rc (linaro/generic is the device I use)
nano device/linaro/generic/init.rc
export GALLIUM_HUD cpu,cpu0+cpu1+cpu2+cpu3;pixels-rendered,fps,primitives-generated

Linux

If you're using one of the usual Linux distros, GALLIUM_HUD can be enabled by setting the environtment variable in a place that it loaded early.

# Add export to /etc/environment
nano /etc/environment 
export GALLIUM_HUD cpu,cpu0+cpu1+cpu2+cpu3;pixels-rendered,fps,primitives-generated

Thanks

This post has been a part of work undertaken by my employer Collabora.

by Robert Foss at June 27, 2017 10:00 PM

June 14, 2017

Sjoerd Simons

Debian armhf VM on arm64 server

At Collabora one of the many things we do is build Debian derivatives/overlays for customers on a variety of architectures including 32 bit and 64 bit ARM systems. And just as Debian does, our OBS system builds on native systems rather than emulators.

Luckily with the advent of ARM server systems some years ago building natively for those systems has been a lot less painful than it used to be. For 32 bit ARM we've been relying on Calxeda blade servers, however Calxeda unfortunately tanked ages ago and the hardware is starting to show its age (though luckily Debian Stretch does support it properly, so at least the software is still fresh).

On the 64 bit ARM side, we're running on Gigabyte MP30-AR1 based servers which can run 32 bit arm code (As opposed to e.g. ThunderX based servers which can only run 64 bit code). As such running armhf VMs on them to act as build slaves seems a good choice, but setting that up is a bit more involved than it might appear.

The first pitfall is that there is no standard bootloader or boot firmware available in Debian to boot on the "virt" machine emulated by qemu (I didn't want to use an emulation of a real machine). That also means there is nothing to pick the kernel inside the guest at boot nor something which can e.g. have the guest network boot, which means direct kernel booting needs to be used.

The second pitfall was that the current Debian Stretch armhf kernel isn't built with support for the generic PCI host controller which the qemu virtual machine exposes, which means no storage and no network shows up in the guest. Hopefully that will get solved soonish (Debian bug 864726) and can be in a Stretch update, until then a custom kernel package is required using the patch attach to the bug report is required but I won't go into that any further in this post.

So on the happy assumption that we have a kernel that works, the challenge left is to nicely manage direct kernel loading. Or more specifically, how ensure the hosts boots the kernel the guest has installed via the standard apt tools without having to copy kernels around between guest/host, which essentially comes down to exposing /boot from the guest to the host. The solution we picked is to use qemu's 9pfs support to share a folder from the host and use that as /boot of the guest. For the 9p folder the "mapped" security mode seems needed as the "none" mode seems to get confused by dpkg (Debian bug 864718).

As we're using libvirt as our virtual machine manager the remainder of how to glue it all together will be mostly specific to that.

First step is to install the system, mostly as normal. One can directly boot into the vmlinuz and initrd.gz provided by normal Stretch armhf netboot installer (downloaded into e.g /tmp). The setup overall is straight-forward with a few small tweaks:

  • /srv/armhf-vm-boot is setup to be the 9p shared folder (this should exist and owned by the libvirt-qemu user) that will be used for sharing /boot later
  • the kernel args are setup to setup root= with root partition intended to be used in the VM, adjust for your usage.
  • The image file to use the virtio bus, which doesn't seem the default.

Apart from those tweaks the resulting example command is similar to the one that can be found in the virt-install man-page:

virt-install --name armhf-vm --arch armv7l --memory 512 \
  --disk /srv/armhf-vm.img,bus=virtio
  --filesystem /srv/armhf-vm-boot,virtio-boot,mode=mapped \
  --boot=kernel=/tmp/vmlinuz,initrd=/tmp/initrd.gz,kernel_args="console=ttyAMA0,root=/dev/vda1"

Run through the install as you'd normally would. Towards the end the installer will likely complain it can't figure out how to install a bootloader, which is fine. Just before ending the install/reboot, switch to the shell and copy the /boot/vmlinuz and /boot/initrd.img from the target system to the host in some fashion (e.g. chroot into /target and use scp from the installed system). This is required as the installer doesn't support 9p, but to boot the system an initramfs will be needed with the modules needed to mount the root fs, which is provided by the installed initramfs :). Once that's all moved around, the installer can be finished.

Next, booting the installed system. For that adjust the libvirt config (e.g. using virsh edit and tuning the xml) to use the kernel and initramfs copied from the installer rather then the installer ones. Spool up the VM again and it should happily boot into a freshly installed Debian system.

To finalize on the guest side /boot should be moved onto the shared 9pfs, the fstab entry for the new /boot should look something like:

virtio-boot /boot  9p trans=virtio,version=9p2000.L,x-systemd.automount 0 0

With that setup, it's just a matter of shuffling the files in /boot around to the new filesystem and the guest is done (make sure vmlinuz/initrd.img stay symlinks). Kernel upgrades will work as normal and visible to the host.

Now on the host side there is one extra hoop to jump through, as the guest uses the 9p mapped security model symlinks in the guest will be normal files on the host containing the symlink target. To resolve that one, we've used libvirt's qemu hook support to setup a proper symlink before the guest is started. Below is the script we ended up using as an example (/etc/libvirt/hooks/qemu):

vm=$1
action=$2
bootdir=/srv/${vm}-boot

if [ ${action} != "prepare" ] ; then
  exit 0
fi

if [ ! -d ${bootdir} ] ; then
  exit 0
fi

ln -sf $(basename $(cat ${bootdir}/vmlinuz))  ${bootdir}/virtio-vmlinuz
ln -sf $(basename $(cat ${bootdir}/initrd.img))  ${bootdir}/virtio-initrd.img

With that in place, we can simply point the libvirt definition to use /srv/${vm}-boot/virtio-{vmlinuz,initrd.img} as the kernel/initramfs for the machine and it'll automatically get the latest kernel/initramfs as installed by the guest when the VM is started.

Just one final rough edge remains, when doing reboot from the VM libvirt leaves qemu to handle that rather than restarting qemu. This unfortunately means a reboot won't pick up a new kernel if any, for now we've solved this by configuring libvirt to stop the VM on reboot instead. As we typically only reboot VMs on kernel (security) upgrades, while a bit tedious, this avoid rebooting with an older kernel/initramfs than intended.

by Sjoerd Simons at June 14, 2017 01:10 PM

Debian Jessie on Raspberry Pi 2

Apart from being somewhat slow, one of the downsides of the original Raspberry Pi SoC was that it had an old ARM11 core which implements the ARMv6 architecture. This was particularly unfortunate as most common distributions (Debian, Ubuntu, Fedora, etc) standardized on the ARMv7-A architecture as a minimum for their ARM hardfloat ports. Which is one of the reasons for Raspbian and the various other RPI specific distributions.

Happily, with the new Raspberry Pi 2 using Cortex-A7 Cores (which implement the ARMv7-A architecture) this issue is out of the way, which means that a a standard Debian hardfloat userland will run just fine. So the obvious first thing to do when an RPI 2 appeared on my desk was to put together a quick Debian Jessie image for it.

The result of which can be found at: https://images.collabora.co.uk/rpi2/

Login as root with password debian (Obviously do change the password and create a normal user after booting). The image is 3G, so should fit on any SD card marketed as 4G or bigger. Using bmap-tools for flashing is recommended, otherwise you'll be waiting for 2.5G of zeros to be written to the card, which tends to be rather boring. Note that the image is really basic and will just get you to a login prompt on either serial or hdmi, batteries are very much not included, but can be apt-getted :).

Technically, this image is simply a Debian Jessie debootstrap with a extra packages for hardware support. Unlike Raspbian the first partition (which contains the firmware & kernel files to boot the system) is mounted on /boot/firmware rather then on /boot. This is because the VideoCore expects the first partition to be a FAT filesystem, but mounting FAT on /boot really doesn't work right on Debian systems as it contains files managed by dpkg (e.g. the kernel package) which requires a POSIX compatible filesystem. Essentially the same reason why Debian is using /boot/efi for the ESP partition on Intel systems rather the mounting it on /boot directly.

For reference, the RPI2 specific packages in this image are from https://repositories.collabora.co.uk/debian/ in the jessie distribution and rpi2 component (this repository is enabled by default on the image). The relevant packages there are:

  • linux: Current 3.18 based package from Debian experimental (3.18.5-1~exp1 at the time of this writing) with a stack of patches on top from the raspberrypi github repository and tweaked to build an rpi2 flavour as the patchset isn't multiplatform capable :(
  • raspberrypi-firmware-nokernel: Firmware package and misc libraries packages taken from Raspbian, with a slight tweak to install in /boot/firmware rather then /boot.
  • flash-kernel: Current flash-kernel package from debian experimental, with a small addition to detect the RPI 2 and "flash" the kernel to /boot/firmware/kernel7.img (which is what the GPU will try to boot on this board).

For the future, it would be nice to see the Raspberry Pi 2 support out of the box on Debian. For that to happen, the most important thing would be to have some mainline kernel support for this board (supporting multiplatform!) so it can be build as part of debians armmp kernel flavour. And ideally, having the firmware load a bootloader (such as u-boot) rather than a kernel directly to allow for a much more flexible boot sequence and support for using an initramfs (u-boot has some support for the original Raspberry Pi, so adding Raspberry Pi 2 support should hopefully not be too tricky)

Update: An updated image (20150705) is available with the latest packages from Jessie and a GPG key that's not expired :).

by Sjoerd Simons at June 14, 2017 01:10 PM

June 13, 2017

Helen Koike

NVMe: Officially faster for emulated controllers!

The Doorbell Buffer Config command

When I last wrote about NVMe, the feature to improve NVMe performance over emulated environments was just a living discussion and a work in progress patch. However, it has now been officially released in the NVMe Specification Revision 1.3 under the name "Doorbell Buffer Config command", along with an implementation that is already in the mainline Linux Kernel! \o/

You can already feel the difference in performance if you compile Kernel 4.12-rc1 (or later) and run it over a virtual machine hosted on Google Compute Engine. Google actually updated their hypervisor as soon as the feature was ratified by the NVMe working group, even before it was publicly released.

There were very few changes from the original proposal, I.e. opcodes, return values and now fancy names; the buffers (as described in my last post) are now called Shadow Doorbell and EventIdx buffers.

In short, the first one mimics the Doorbell registers in memory, allowing the emulated controller to fetch the Doorbell value when convenient instead of waiting for the Doorbell register to be written. For its part, the EventIdx provides a hint given by the emulated controller to tell the host if the Doorbell register needs to be updated (in case the emulated controller is not fetching the Doorbell value from the Shadow Doorbell buffer). You can check section 7.13 of the specification for an example of usage.

Results

The following test results were obtained in a machine of type n1-standard-4 (4 vCPUs, 15 GB memory) at Google Cloud Engine platform with Kernel 4.12.0-rc5 using the following command:

$ sudo fio --time_based --name=benchmark --runtime=30 \ --filename=/dev/nvme0n1 --nrfiles=1 --ioengine=libaio --iodepth=32 \ --direct=1 --invalidate=1 --verify=0 --verify_fatal=0 --numjobs=1 \ --rw=randread --blocksize=4k --randrepeat=0

Results (in Input/Ouput Operations per Second):Without Shadow Doorbell and EventIdx buffers: 43.9K IOPSWith Shadow Doorbell and EventIdx buffers: 184K IOPSGain ~= 4 times

Screenshot - Without Shadow Doorbell and EventIdx buffers

Screenshot - With Shadow Doorbell and EventIdx buffers

Enjoy your enhanced numbers of IOPS! :D

by Helen M. Koike Fornazier (noreply@blogger.com) at June 13, 2017 04:12 PM

June 01, 2017

memcpy.io - Robert Foss

Android: NXP i.MX6 Buffer Modifier Support

With modifier support added to Mesa and gbm_gralloc, it is now possible to boot Android on iMX6 platforms using no proprietary blobs at all. This makes iMX6 one of the very few embedded SOCs that needs no blobs at all to run a full graphics stack.

Not only is that a great win for Open Source in general, but it also makes the iMX6 more attractive as a platform. A further positive point is that this lays the groundwork for the iMX8 platform, and supporting it will come much easier.

What are modifiers used for?

Modifiers are used to represent different properties of buffers. These properties can cover a range of different information about a buffer, for example compression and tiling.

For the case of the iMX6 and the Vivante GPU which it is equipped with, the modifiers are related to tiling. The reason being that buffers can be tiled in different ways (Tiled, Super Tiled, etc.) or not at all (Linear). Before sending buffers out to a display, they need to have the associated tiling information made available, so that the actual image that is being sent out is not tiled.

How was support added?

Support was added in two places; Mesa and gbm_gralloc. Mesa has had support added to many of the buffer allocation functions and to GBM (which is the API provided by Mesa, that gbm_gralloc uses).

gbm_gralloc in turn had support added for using a new GBM API call, GBM_BO_IMPORT_FD_MODIFIER, which imports a buffer object as well as accompanying information like modifier used by the buffer object in question.

Getting up and running

Currently the modifiers work is in the process of being upstreamed, but in the meantime it can be found here. If you'd like to test this out yourself a How-To can be found here.

Thanks

This work is built on the efforts of a lot people:

This post has been a part of work undertaken by my employer Collabora, and has been funded by Zodiac Inflight Innovations.

by Robert Foss at June 01, 2017 10:00 PM

Android: iMX6 buffer modifier support

With modifier support added to Mesa and gbm_gralloc, it is now possible to boot Android on iMX6 platforms using no proprietary blobs at all. This makes iMX6 one of the very few embedded SOCs that needs no blobs at all to run. Not only is that a great win for Open Source in general, but it also makes the iMX6 more attractive as a platform.

Currently the modifiers work is in the process of being upstreamed, but in the meantime it can be found here. If you'd like to test this out yourself I maintain a how-to.

What are modifiers used for?

Modifiers are used to represent different properties of buffers. These properties can cover a range of different information about a buffer, for example compression and tiling.

For the case of the iMX6 and the Vivante GPU which it is equipped with, the modifiers are related to tiling. The reason being that buffers can be tiled in different ways (Tiled, Super Tiled, etc.) or not at all (Linear). Before sending buffers out to a display, they need to have the associated tiling information made available, so that the actual image that is being sent out is not tiled.

This of course raises the question "Why use tiling at all?", to which the short answer is power efficiency, which is very desirable in the embedded as well as the mobile space.

How was support added?

Support was added in two places; Mesa and gbm_gralloc. Mesa has had support added to many of the buffer allocation functions and to GBM (which is the API provided by Mesa, that gbm_gralloc uses).

gbm_gralloc in turn had support added for using a new GBM API call, GBM_BO_IMPORT_FD_MODIFIER, which imports a buffer object as well as accompanying information like modifier used by the buffer object in question.

Thanks

This work is built on efforts by a lot people:

This post has been a part of work undertaken by my employer Collabora.

by Robert Foss at June 01, 2017 10:00 PM

May 04, 2017

Sebastian Reichel

I2C kernel driver testing using virtme

The last few days I worked on the MCP23017 kernel driver and wondered about a good method to test my changes in a comfortable way. Fortunately I built myself an i2c-tiny-usb adapter some time ago, which is supported by mainline Linux. Thus any system with USB host support could be used for testing the above chip. My minimal test-setup can be seen in the image below. Basically I supplied 5V, Ground, SCL & SDA from the adapter to MCP23017, connected the low-active reset pin to 5V and the address-selection pins to Ground.

I2C-Tiny-USB with MCP23017

The obvious platform would be my development notebook, but rebooting for every test is a annoying. Kernel developers working on scheduler e.t.c. often use qemu for testing, so I wondered if I could do the same by forwarding the USB device. Also I do not want to spend time for configuring a rootfs for the emulated system. Thankfully Andy Lutomirski wrote virtme, which takes care of this by using the normal rootfs. I created a Debian package, which can be found on the following git repository at collabora.

$ cd ~/src/collabora/linux
$ # edit .config for usage with qemu (enable VIRTIO & 9P related options)
$ make -j4
...
Kernel: arch/x86/boot/bzImage is ready  (#13)
  MODPOST 3326 modules
$ # ensure your user has permissions for USB device, so that qemu can grab it
$ virtme-run --kdir ~/src/collabora/linux --qemu-opts -usb -usbdevice host:0403:c631
kernel boot messages...
[    3.044289] virtme-init: udevd not found
[    3.126059] usb 1-1: New USB device found, idVendor=0403, idProduct=c631
[    3.127210] usb 1-1: New USB device strings: Mfr=1, Product=2, SerialNumber=0
[    3.127837] usb 1-1: Product: i2c-tiny-usb
[    3.128336] usb 1-1: Manufacturer: Till Harbaum
[    3.131808] i2c-tiny-usb 1-1:1.0: version 1.05 found at bus 001 address 002
[    3.138173] i2c i2c-0: connected i2c-tiny-usb device
virtme-init: console is ttyS0
root@(none):/# i2cdetect -l
i2c-0   i2c         i2c-tiny-usb at bus 001 device 003  I2C adapter
root@(none):/# i2cdetect -y 0
     0  1  2  3  4  5  6  7  8  9  a  b  c  d  e  f
00:          -- -- -- -- -- -- -- -- -- -- -- -- -- 
10: -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- 
20: 20 -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- 
30: -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- 
40: -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- 
50: -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- 
60: -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- 
70: -- -- -- -- -- -- -- --                       
root@(none):/# echo mcp23017 0x20 > /sys/bus/i2c/devices/i2c-0/new_device
[   71.762916] i2c i2c-0: new_device: Instantiated device mcp23017 at 0x20
root@(none):/# cat /sys/kernel/debug/gpio 
gpiochip0: GPIOs 496-511, parent: i2c/0-0020, mcp23017, can sleep:
root@(none):/# echo 496 > /sys/class/gpio/export
root@(none):/# echo 497 > /sys/class/gpio/export
root@(none):/# echo 498 > /sys/class/gpio/export
root@(none):/# echo out > /sys/class/gpio/gpio497/direction
root@(none):/# echo out > /sys/class/gpio/gpio498/direction
root@(none):/# echo 1 > /sys/class/gpio/gpio498/value
root@(none):/# cat /sys/kernel/debug/gpio
gpiochip0: GPIOs 496-511, parent: i2c/0-0020, mcp23017, can sleep:
 gpio-496 P0.0 (sysfs       ) in  lo   
 gpio-497 P0.1 (sysfs       ) out lo   
 gpio-498 P0.2 (sysfs       ) out hi 
root@(none):/# cat /sys/kernel/debug/regmap/0-0020/registers
00: fff9
02: 0000
04: 0000
06: 0000
08: 0000
0a: 0808
0c: 0000
0e: 0000
10: 0000
14: 0004
# emulating short power-loss by pulling reset to low for some seconds
root@(none):/# cat /sys/kernel/debug/regmap/0-0020/registers
00: fff9
02: 0000
04: 0000
06: 0000
08: 0000
0a: 0808
0c: 0000
0e: 0000
10: 0000
14: 0004
# that's not what is in the registers after power-loss, its from regmap
# cache, since regmap does not know about the power-loss.
root@(none):/# cat /sys/kernel/debug/gpio
[  170.011018] mcp230xx 0-0020: restoring reg 0x00 from 0xffff to 0xfff9 (power-loss?)
[  170.098648] mcp230xx 0-0020: restoring reg 0x05 from 0x0000 to 0x0808 (power-loss?)
gpiochip0: GPIOs 496-511, parent: i2c/0-0020, mcp23017, can sleep:
 gpio-496 P0.0 (sysfs       ) in  lo   
 gpio-497 P0.1 (sysfs       ) out lo   
 gpio-498 P0.2 (sysfs       ) out lo

So my patch moving to regmap based caching work as expected and surpass the original driver, since it can detect & fix context-loss scenarios in debugfs :) While those are quite unlikely to appear on production hardware, MCP23xxx is commonly used by hobbyists and their hardware is often less stable.

As a by-product of my work on mcp23xxx I also had to fix i2c-tiny-usb, though. Before kernel 4.9 the driver worked flawlessly, but since then it will complain, that the transfer buffers are not DMA capable. A patch for that has been sent and will hopefully be backported.

All in all I must say the virtme based testing was really nice and I hope I can use it again.

– Sebastian

May 04, 2017 01:30 PM

May 03, 2017

Helen Koike

Collabora Contributions to Linux Kernel 4.11

Linux Kernel v4.11 was released, and 9 different Collabora developers contributed a total of 44 patches, an increase of 5 patches from version 4.10. The majority of Collabora's work this time was around fixes and clean ups in the DRM. In addition to our contributions as authors, Collabora also added 22 Reviewed-by tags for patches reviewed by our engineers. You can learn more information about the v4.11 merge window in LWN.net's extensive coverage: part 1, part 2 and part 3.

Now here is a look at the specific changes made by Collaborans. To begin with, Enric Balletbo fixed an issue and improved documentation of IIO regarding sensors and also added support for several buses and peripherals for the Toby-Churchill SL50 board. Romain Perier added an ASoC machine driver for Rockchip rk3288-based boards that have an HDMI and analog audio output, and also added support for slave mode in the Everest Semi ES8328 audio codec, while including ES8388 as a compatible device in the ES8328's codec driver.

For his part, Tomeu Vizoso fixed the sink display error in DRM EDID when no deep color is available for Rotel RSX-1058 and also fixed several issues and code cleanups regarding CRC in DRM and integrated the new CRC debugfs API in i915. Gabriel Krisman Bertazi, who wrote a great article recently on tracing the user space and Operating System interactions, made several improvements to DRM by adding documentation, cleaning up the code and fixing several issues, including allowing QXL build when FBDEV_EMULATION is disabled.

Lastly, Daniel Stone, Collabora's Graphics Lead, fixed an important issue in DRM regarding the use of Atomic State in legacy ioctls, while Fabien Lahoudere cleaned up the code for the Epson RTC removing an unnecessary spinlock, and Robert Foss fixed a copy of uninitialized memory in Ethernet qed code.

Here is the complete list of Collabora contributions:

Daniel Stone (1):

Enric Balletbo i Serra (10):

Fabien Lahoudere (1):

Gabriel Krisman Bertazi (19):

Gustavo Padovan (1):

Martyn Welch (1):

Romain Perier (4):

Tomeu Vizoso (6):

Robert Foss (1):

Reviewed-by:

Gustavo Padovan (8):

Emil Velikov (5):

Gabriel Krisman Bertazi (3):

Robert Foss (3):

Daniel Stone (2):

Tomeu Vizoso (1):

by Helen M. Koike Fornazier (noreply@blogger.com) at May 03, 2017 01:17 PM

April 26, 2017

memcpy.io - Robert Foss

Android: Getting up and running on the iMX6

Since the hardware very much matters this is going to be divided into a few parts, the common steps and the hardware specific ones.

This post is a bit of a living document and will be changed over time, and if you have any questions about it, please reach out through email (robert.foss at collabora.com) or irc (tomeu or robertfoss on #dri-devel on freenode).

Changelog:
2017-04-27: build_android.sh, setup_sdcard.sh: Added -b [device] support to build_android.sh and setup_sdcard.sh
2017-05-02: build_android.sh: Don't write to SD-card without -b option
2017-05-04: Switch git repo urls to shared repository
2017-05-09: Add compiler installation to apt-get
2017-05-09: Re-ordered some instructions
2017-05-10: Add java installation to apt-get
2017-05-19: Change paths to android-etnaviv instead of client
2017-05-19: Change kernel branch
2017-05-19: Add -d, device flag and info about it
2017-05-19: Add sabrelite board info
2017-06-03: Add more apt-get packages
2017-06-10: Add lzop apt-get packages

Common steps

# Install dependencies
sudo apt install u-boot-tools gcc-arm-linux-gnueabihf openjdk-8-jdk android-tools-fsutils python-mako git-core gnupg flex bison gperf build-essential zip curl zlib1g-dev gcc-multilib g++-multilib libc6-dev-i386 lib32ncurses5-dev x11proto-core-dev libx11-dev lib32z-dev ccache libgl1-mesa-dev libxml2-utils xsltproc unzip lzop

mkdir /opt/android
cd /opt/android
repo init -u https://android.googlesource.com/platform/manifest -b android-7.1.1_r28
cd /opt/android/.repo
git clone https://customer-git.collabora.com/git/android-etnaviv/android_manifest.git local_manifests -b android-etnaviv
repo sync -j10

mkdir /opt/imx6_android
cd /opt/imx6_android

# Fetch Kconfig, bootloaders and some scripts
git clone https://customer-git.collabora.com/git/android-etnaviv/android-etnaviv.git .

# Fetch the Linux Kernel
git clone https://customer-git.collabora.com/git/android-etnaviv/linux.git -b android-etnaviv

# This will destroy all data on /dev/mmcblk0 and
# create boot/system/cache/data partitions
./setup_sdcard.sh -b /dev/mmcblk0

Hardware: iMX6 Sabre

Build Android and Linux

# Build android, the kernel, and flash it onto an SD-card
# Run build_android with the correct -d flag
./build_android.sh -b /dev/mmcblk0 -d imx6q-sabre
./build_android.sh -b /dev/mmcblk0 -d imx6qp-sabre

Start Android

The SD-card can now be put into the SD3 slot and the device can be restarted.

Hardware: iMX6 Sabrelite

Build Android and Linux

# Build android, the kernel, and flash it onto an SD-card
# Run build_android with the correct -d flag
./build_android.sh -b /dev/mmcblk0 -d imx6q-sabrelite

Start Android

The micro-sd card can now be put into the micro-sd slot and the device can be restarted.

Hardware: RDU2

Build Android and Linux

# Build android, the kernel, and flash it onto an SD-card
# Run build_android with the correct -d flag
./build_android.sh -b /dev/mmcblk0 -d imx6q-zii-rdu2
./build_android.sh -b /dev/mmcblk0 -d imx6qp-zii-rdu2

Install the bootloader

# Depending if you have a >=13" version of the RDU2
# use the imx6qp, if <13" then use the imx6q

IMX6_TYPE=imx6q
IMX6_TYPE=imx6qp
BAREBOX="zodiac/barebox-zii-${IMX6_TYPE}-rdu2.img"

# Flash bootloader to SD-card
dd if=${BAREBOX} of=/dev/mmcblk0 bs=1k
sync

# Put SD-card in the middle-most slot on the RDU2

# Install lrzsz, since it is used for a ymodem upload
sudo apt install lrzsz

# Connect to serial device /dev/ttyUSB2 and
# /dev/ttyUSB3 with minicom
# The numbering assumes the RDU2 is the only serial
# serial device connected
sudo minicom -s
    +------------------------------------------+
    | A -    Serial Device      : /dev/ttyUSB3
    | B - Lockfile Location     : /var/lock
    | C -   Callin Program      :
    | D -  Callout Program      :
    | E -    Bps/Par/Bits       : 115200 8N1
    | F - Hardware Flow Control : No
    | G - Software Flow Control : No
    |
    |    Change which setting?
    +------------------------------------------+

# Connect to Quark console on /dev/ttyUSB3
# Set boot SD-card as boot source 
#HostBoot s 0
reset

# Restart device, connect to barebox loaded just loaded
# from the SD-card on /dev/ttyUSB2
pic_setwdt 0 60
loady

# Using the minicom quickly initiate a ymodem file
# of the same barebox image you wrote to the SD-card
# Be quick, the upload will timeout after a few seconds

# Write the bootloader to SPI NOR
erase /dev/m25p0.barebox
# Depending on your RDU2 type flash one of the following
cp barebox-zii-imx6q-rdu2.img /dev/m25p0.barebox
# Or
cp barebox-zii-imx6qp-rdu2.img /dev/m25p0.barebox

# Connect to the Quark console on /dev/ttyUSB3 again
# Set SPI NOR as the boot source
#HostBoot s 2
reset

# Connect to the barebox console on /dev/ttyUSB2 again
# Edit configuration to automatically boot from mmc:
sedit /env/config
export global.boot.default=/env/boot/mmc
export global.bootm.image=/mnt/mmc1.0/android_zImage
export global.bootm.initrd=/mnt/mmc1.0/android_ramdisk.img.gz
export global.bootm.oftree=/mnt/mmc1.0/imx6qp-zii-rdu2.dtb
export global.linux.bootargs.base="console=ttymxc0,115200 console=tty0 rw rootwait ip=dhcp buildvariant=userdebug debug ignore_loglevel root=/dev/mmcblk0p2 rootfstype=ext4 rootwait init=/init printk.devkmsg=on verbose enforcing=0 androidboot.selinux=permissive drm.debug=0x00"

sedit /env/boot/mmc
#!/bin/sh
detect mmc1
mkdir -p /mnt/mmc1.0
automount -d /mnt/mmc1.0 'mount /dev/mmc1.0 /mnt/mmc1.0'
bootm

pic_setwdt 0 60     # Disable watchdog

exit

Start Android

The SD-card created in the common steps can now be put into the middlemost slot and the device can be restarted.

Thanks

This work is built on efforts by a lot people:

  • Pengutronix who's been doing i.MX6 platform work.
  • Christian Gmeiner, Wladimir Van Der Laan, and the other etanviv developers.
  • Rob Herring at Linaro for getting the ball rolling with AOSP for Zii.
  • Andrey Smirnov for driver support for the RDU2 such as i.MX6 PCI, ARM PL310 L2 Cache controller, RTC, and other i.MX6qp driver fixups.

This post has been a part of work undertaken by my employer Collabora.

by Robert Foss at April 26, 2017 10:00 PM

April 23, 2017

memcpy.io - Robert Foss

Android: Getting up and running on the iMX6

Since the hardware very much matters this is going to be divided into a few parts, the common steps and the hardware specific ones.

Common steps

mkdir /opt/android
repo init -u https://android.googlesource.com/platform/manifest -b android-7.1.1_r28
cd /opt/android/.repo
git clone git://git.collabora.com/git/user/robertfoss/android_manifest.git local_manifests -b etnaviv-android
repo sync -j75

mkdir /opt/imx6_android
cp /opt/imx6_android
git clone git://git.collabora.com/git/user/robertfoss/linux.git -b imx_rdu2_v4.11-rc3

# The mkimage tool is used even if you're not
# using u-boot it as a bootloader
sudo apt install u-boot-tools

# Fetch Kconfig, bootloaders and some scripts
git clone git://git.collabora.com/git/user/robertfoss/rdu2.git .

# This will destroy all data on /dev/mmcblk0 and
# create boot/system/cache/data ext4 partitions 
./setup_sdcard.sh /dev/mmcblk0

# Build android, the kernel, and flash it onto an SD-card
./build_android.sh

Hardware: iMX6 Sabre

Select correct dtb file for u-boot

# Uncomment the correct dtb file for your platform
nano uboot_android_boot.scr
setenv fdt_file imx6q-sabresd.dtb
#setenv fdt_file imx6qp-sabresd.dtb

# Run build script again, to make sure boot.scr
# is created and moved to the SD-card
./build_android.sh

Start Android

The SD-card can now be put into the middlemost slot and the device can be restarted.

Hardware: RDU2

Install the bootloader

# Depending if you have a >=13" version of the RDU2
# use the imx6qp, if <13" then use the imx6q

IMX6_TYPE=imx6q
IMX6_TYPE=imx6qp
BAREBOX="zodiac/barebox-zii-${IMX6_TYPE}-rdu2.img"

# Flash bootloader to SD-card
dd if=${BAREBOX} of=/dev/mmcblk0 bs=1k
sync

# Put SD-card in the middle-most slot on the RDU2

# Install lrzsz, since it is used for a ymodem upload
sudo apt install lrzsz

# Connect to serial device /dev/ttyUSB2 and
# /dev/ttyUSB3 with minicom
# The numbering assumes the RDU2 is the only serial
# serial device connected
sudo minicom -s
    +------------------------------------------+
    | A -    Serial Device      : /dev/ttyUSB3
    | B - Lockfile Location     : /var/lock
    | C -   Callin Program      :
    | D -  Callout Program      :
    | E -    Bps/Par/Bits       : 115200 8N1
    | F - Hardware Flow Control : No
    | G - Software Flow Control : No
    |
    |    Change which setting?
    +------------------------------------------+

# Connect to Quark console on /dev/ttyUSB3
# Set boot SD-card as boot source 
#HostBoot s 0
reset

# Restart device, connect to barebox loaded just loaded
# from the SD-card on /dev/ttyUSB2
pic_setwdt 0 60
loady

# Using the minicom quickly initiate a ymodem file
# of the same barebox image you wrote to the SD-card
# Be quick, the upload will timeout after a few seconds

# Write the bootloader to SPI NOR
erase /dev/m25p0.barebox
# Depending on your RDU2 type flash one of the following
cp barebox-zii-imx6q-rdu2.img /dev/m25p0.barebox
# Or
cp barebox-zii-imx6qp-rdu2.img /dev/m25p0.barebox

# Connect to the Quark console on /dev/ttyUSB3 again
# Set SPI NOR as the boot source
#HostBoot s 2
reset

# Edit configuration to automatically boot from mmc:
sedit /env/config
export global.boot.default=/env/boot/mmc
export global.bootm.image=/mnt/mmc1.0/android_zImage
export global.bootm.initrd=/mnt/mmc1.0/android_ramdisk.img.gz
export global.bootm.oftree=/mnt/mmc1.0/imx6qp-zii-rdu2.dtb
export global.linux.bootargs.base="console=ttymxc0,115200 console=tty0 rw rootwait ip=dhcp buildvariant=userdebug debug ignore_loglevel root=/dev/mmcblk0p2 rootfstype=ext4 rootwait init=/init printk.devkmsg=on verbose enforcing=0 androidboot.selinux=permissive drm.debug=0x00"

sedit /env/boot/mmc
#!/bin/sh
detect mmc1
mkdir -p /mnt/mmc1.0
automount -d /mnt/mmc1.0 'mount /dev/mmc1.0 /mnt/mmc1.0'
bootm

pic_setwdt 0 60     # Disable watchdog

exit

Start Android

The SD-card created in the common steps can now be put into the middlemost slot and the device can be restarted.

Thanks

This work is built on efforts by a lot people:

  • Pengutronix who's been doing i.MX6 platform work.
  • Christian Gmeiner, Wladimir Van Der Laan, and the other etnviv developers.
  • Rob Herring at Linaro for getting the ball rolling with AOSP for Zii.
  • Andrey Smirnov for driver support for the RDU2 such as i.MX6 PCI, ARM PL310 L2 Cache controller, RTC, and other i.MX6qp driver fixups.

This post has been a part of work undertaken by my employer Collabora.

by Robert Foss at April 23, 2017 10:00 PM

Pekka Paalanen

Improved appearance for the simplest Wayland client

Of the Wayland demo clients in the Weston repository, simple-shm is the simplest. All the related code is in that one file, and it interfaces directly with libwayland. It does not use GL or EGL, so it can be ran on systems where the EGL stack does not support the Wayland platform nor extensions. However, what it renders, is surprising:
The original simple-shm client on a Weston desktop.

The square with apparently garbage texture is the original simple-shm. To any graphics developer, who does not know any better, that immediately looks like something is wrong with the image stride somewhere in the graphics stack. That really is what it was supposed to look like, not a bug.

I decided to propose a different rendering, that would not look so much like a bug, and had some real diagnostic value.
The proposed appearance of simple-shm, the way it is supposed to look like.
The new appearance has some vertical bars moving from left to right, some horizontal bars moving upwards, and some circles that shrink into the center. With these, you can actually see if there is a stride bug somewhere, or non-uniform scaling. There is one more diagnostic feature.
This is how the proposed simple-shm looks like when the X-channel is mistaken as alpha.
Simple-shm uses XRGB buffers. If the compositor does not properly ignore the X-channel, and uses it as alpha, you will see a cross over the image. Depending on whether the compositor repaints what is below simple-shm or not, the cross will either saturate to white or show the background through. It is best to have a bright background picture to clearly see it.

I do hope no-one gets hypnotized by the animation. ;-)

by pq (noreply@blogger.com) at April 23, 2017 08:07 AM

What does EGL do in the Wayland stack

Recently I drew some diagrams of how an EGL library relates to the Wayland stack. Here I am presenting the Mesa EGL version of them with the details explained.
Mesa EGL with Wayland, and simplified X as comparison.


X11 part

The X11 part of the diagram is very much simplified. It completely ignores indirect rendering, DRI1, details of DRI2, and others. It only shows, that a direct rendering X11 EGL application uses the X11 protocol to create an X11 window, and the Mesa EGL X11 platform uses the DRI2 protocol in some way to communicate with the X server. Naturally the application also uses one of the OpenGL interfaces. The X server has hardware or platform specific drivers that are generally referred to as DDX. On the Linux DRI stack, these call into libdrm and the various driver specific sub-libraries. In the end they use the kernel DRM services, like kernel mode setting (KMS). All this in the diagram is just for comparison with a Wayland stack.

Wayland server

The Wayland server in the diagram is Weston with the DRM backend. The server does its rendering using GL ES 2, which it initialises by calling EGL. Since the server runs on "bare KMS", it uses the EGL DRM platform, which could really be called as the GBM platform, since it relies on the Mesa GBM interface. Mesa GBM is an abstraction of the graphics driver specific buffer management APIs (for instance the various libdrm_* libraries), implemented internally by calling into the Mesa GPU drivers.

Mesa GBM provides graphics memory buffers to Weston. Weston then uses EGL calls to bind them into GL objects, and renders into them with GL ES 2. A rendered buffer is shown on an output (monitor) by queuing a page flip via the libdrm KMS API.

If the EGL implementation offers the extension EGL_WL_bind_wayland_display, Weston will use it to register its wl_display object (facing the clients) to EGL. In practice, the Mesa EGL then adds a new global Wayland object to the wl_display. That object (or interface) is called wl_drm, and the server will automatically advertise that to all clients. Clients will use wl_drm for DRM authentication, getting the right DRM device node, and sharing graphics buffers with the server without copying pixels.

Wayland client

A Wayland client, naturally, connects to a Wayland server, and gets the main Wayland protocol object wl_display. The client creates a window, which is a Wayland object of type wl_surface. All what follows is enabled by the Wayland platform support in Mesa EGL.

The client passes the wl_display object to eglGetDisplay() and receives an EGLDisplay to be used with EGL calls. Then comes the trick that is denoted by the double-arrowed blue line from Wayland client to Mesa EGL in the diagram. The client calls the wayland-egl API (implemented in Mesa) function wl_egl_window_create() to get the native window handle. Normally you would just use the "real" native window object wl_surface (or an X11 Window if you were using X). The native window handle is used to create the EGLSurface EGL handle. Wayland has this extra step and the wayland-egl API because a wl_surface carries no information of its size. When the EGL library allocates buffers, it needs to know the size, and wayland-egl API is the only way to tell that.

Once EGL Wayland platform knows the size, it can allocate a graphics buffer by calling the Mesa GPU driver. Then this graphics buffer needs to be mapped into a Wayland protocol object wl_buffer. A wl_buffer object is created by sending a request through the wl_drm interface carrying the name of the (DRM) graphics buffer. In the server side, wl_drm requests are handled in the Mesa EGL library, where the corresponding server side part of the wl_buffer object is created. In the diagram this is shown as the blue dotted arrow from EGL Wayland platform to itself. Now, whenever the wl_buffer object is referenced in the Wayland protocol, the server knows exactly what it is.

The client now has an EGLSurface ready, and renders into it by using one of the GL APIs or OpenVG offered by Mesa. Finally, the client calls eglSwapBuffers() to show the result in its Wayland window.

The buffer swap in Mesa EGL Wayland platform uses the Wayland core protocol and sends an attach request to the wl_surface, with the wl_buffer as an argument. This is the blue dotted arrow from EGL Wayland platform to Wayland server.

Weston itself processes the attach request. It knows the buffer is not a shm buffer, so it passes the wl_buffer object to the Mesa EGL library in an eglCreateImageKHR() function call. In return Weston gets an EGLImage handle, which is then turned into a 2D texture, and used in drawing the surface (window). This operation is enabled by EGL_WL_bind_wayland_display extension.

Summary

The important facts, that should be apparent in the diagram, are:
  • There are two different EGL platforms in play: one for the server, and one for the clients.
  • A Wayland server does not contain any graphics hardware or driver specific code, it is all in the generic libraries of DRM, EGL and GL (libdrm and Mesa).
  • Everything about wl_drm is an implementation detail internal to the EGL library in use.
The system dependent part of Weston is the backend, which somehow must be able to drive the outputs. The new abstractions in Mesa (GBM API) make it completely hardware agnostic on standard Linux systems. Therefore every Wayland server implementation does not need its own set of graphics drivers, like X does.

It is also worth to note, that 3D graphics on X uses very much the same drivers as Wayland. However, due to the Wayland requirements from the EGL framework (extensions, EGL Wayland platform), proprietary driver stacks need to specifically implement Wayland support, or they need to be wrapped into a meta-EGL-library, that glues Wayland support on top. Proprietary drivers also need to provide a way to use accelerated graphics without X, for a Wayland server to run without X beneath. Therefore the desktop proprietary drivers like Nvidia's have a long way to go, as currently nvidia does not implement EGL at all, no support for Wayland, and no support for running without X, or even setting a video mode without X.

Due to the way wl_drm is totally encapsulated into Mesa EGL and how the interfaces are defined for the EGL Wayland platform and the EGL extension, another EGL implementor can choose their very own way of sharing graphics buffers instead of using wl_drm.

There are already plans to change to some of the architecture described in this article, so it is possible that details in the diagram become outdated fairly soon. This article also does not consider a purely software rendered Wayland stack, which certainly would lift all these requirements, but quite likely be too slow in practice for the desktop.

See also: the authoritative description of the Wayland architecture

by pq (noreply@blogger.com) at April 23, 2017 08:07 AM

Wayland R&D at Collabora

While being contracted by Collabora, I started a Wayland R&D project in October 2011 with the primary goal of getting to know Wayland, and strengthening Wayland expertise in Collabora. During the four months I started the wl_shell_surface protocol for desktops, added screen locking, ported an X screensaver to Wayland with new protocol, and most recently implemented surface transformations in Weston (the reference compositor, originally the wayland-demos compositor). All this sponsored by Collabora.

The project started by getting wayland-demos running under X, and then looking into the bugs I hit. To rule out problems in hardware GL renderer, I also got the demos running with softpipe and llvmpipe. Trying to fix segmentation faults and other obvious problems was my stepping stone into the Wayland code base.

My first real piece of work was screen locking. That included adding special protocol for it, having a way to have privileged Wayland clients, implementing locking in the shell plugin in the compositor, and writing an unlock dialog for the desktop-shell client. Those are the obvious parts. I also had to extend the shell plugin interface, find a way to hide surfaces so they do not render while the screen is locked, and of course bug hunting and patch set rebasing and rewriting, before screen locking landed upstream.

Next was porting an X screensaver as a regular Wayland client. Once that worked, I extended the protocol by adding a screensaver interface, and made the shell plugin automatically start the screensaver application. Handling screensavers would have been a walk in the park, except I needed shell-specific data to be attached to all surfaces. I wrote a hacky solution, but in the end, Kristian Høgsberg wanted me to add a whole new interface into the shell protocol for this. It became the wl_shell_surface interface, and all demo clients needed to adopt it. Yet that was not all. Since we are used to have per-monitor screensavers, I needed my screensaver to set different instances for each monitor. Hence I had to add output event callbacks in the toytoolkit.

A cleanup phase came next, I took Valgrind and ran. I fixed a pile of memory leaks and wrote missing destructor functions all over, in compositor, clients and the toytoolkit, at the same time collecting a Valgrind suppressions list to ease Valgrinding in the future. This work included adding some ad hoc way of cleanly exiting demo clients.

In January there were some discussions on maximised and full-screen surfaces, what they are and how they should be implemented. Surface scaling was raised as one point. Weston already had the zoom effect, and full-screen scaling would be another surface transformation, so I decided to write a transformation matrix stack for supporting any number of simultaneous transformations. It turned out to be a three week task.

Implementing surface transformations required changes all over Weston. First, I needed a way to invert the transformation which is a 4-by-4 matrix. After searching in vain for a MIT-licenced C implementation I wrote one myself, based on LU-decomposition. I believe LU-decomposition is more efficient on a 4x4 matrix than the cofactor method. Along the inversion routines, I wrote a unit test application for testing the speed and precision of the inversion. Detecting and dealing with non-invertible transformations is also important.

Going through the transformation stack every time you need to transform a point might be costly, so I added a cached total transform and its inverse. Implementing input redirection was a simple matter of applying the inverse total transform to pointer coordinates. Needing a way to test transformations, I added a Weston key binding for rotating surfaces, and modified an existing demo application to mark the clicked point. Adding functions for explicitly converting between display global coordinates and surface local coordinates (surface local are the only ones a client knows of) clarified some of the coordinate computations.

Surface painting and damage region tracking needed fixes, too. Previously, a zoomed surface was repainted as a whole, and it forced a full display redraw, i.e. damaging the whole display. Transformed surface repaint needed to start honoring the repaint regions, so we could avoid excessive repainting. Damage and repaint regions are tracked as global coordinate axis aligned rectangles. Whenever a transformed surface is damaged (requires repainting), we need to compute the bounding box for the damage instead of simply using the global x, y of the top-left corner and the surface width, height. Then during surface painting, we take the list of damage rectangles, and render only those. Surface local coordinates (texture coordinates) are computed via the inverse transformation. This method may result in sampling outside of a surface's buffer (texture), so those samples need to be discarded in the fragment shader.

Other things that needed fixing after the surface transformations were window move and resize. Before fixing, moving a surface would not follow the pointer but move in the surface local orientation. Resize needed the same orientation fix, and another fix in relative surface motion that a client can set in the surface's attach request.

What you mostly see as the result of the surface transformations work is, that you can rotate any normal window, no application support needed. The pointer position on screen, over a window, accurately corresponds to what the application receives as the local pointer location. I did not realise it at the time, but this input redirection working flawlessly became an appreciated feature. Apparently it is hard or impossible to do in X, I would not know. In Wayland, and for me, it was just another relatively easy bug to be fixed. The window rotation feature was meant purely for debugging surface transformations.

Two rotated windows and some flowers.
There are still further issues to be fixed with surface transformations. Relative surfaces, like pop-up windows and menus, are not transformed and appear at a wrong location. Pointer cursors are not transformed; you would want the text bar cursor to be aligned with the text orientation. Continuously resizing a transformed window from its (locally) top-left corner makes the window drift away. We are probably still damaging larger regions than absolutely necessary for repaints. Repaint optimisation of opaque surfaces does not work with transformations.

During all this work of four months there were also the usual bug hunts, enhancements and fixes all over. For example, decorationless EGL apps, which turned out to have been a bug in Cairo, and moving the configuration file parser into a helper library that is shared between clients and the compositor.

Now, I am done with the Wayland R&D project and moving into another project at Collabora. In the new project I will continue working on Wayland, Weston, and the demos.

by pq (noreply@blogger.com) at April 23, 2017 08:07 AM

Nokia N9 Music Player and Album Cover Art

I recently got a Nokia N9 phone. One of the first things I did was copy my music collection into it. Since the player shows also album cover images, if such are stored, I started adding them -- not by embedding them into ID3v2 tags but as separate files, to avoid useless copies of images.

Usually it is as simple as putting a cover.jpg file into a directory, that contains a single album. Sometimes and in some cases, though, that does not work. I found out, that the N9's default music player is supposed to follow Media Art Storage specification. That gave me hints.

If a directory contains more than one album, you can name the cover image files according to the album, for example 'Back in Black.jpg' and 'Flick of the Switch.jpg', as long as the names correspond the ID3 tag album name (somehow?).

My real problem case was a directory full of songs downloaded from Nectarine. I edited them all (EasyTAG is a wonderful tool) to make the ID3 album tag "Nectarine" because I wanted to have them all under the same "album", and there are over 50 songs in that single directory. Simply adding a cover.jpg or Nectarine.jpg did not work.

There are two possible reasons that I found. First, the directory contains too many files, according to the Media Art Storage spec. Second, apparently the cover art is not taken into use, unless at least one song file, which would use that cover art, is touched (modification date updated).

I created a new directory, moved one Nectarine song into it, and put Nectarine.jpg there, too. And it started to work, for all my Nectarine songs.

There is software called Tracker in the N9, which maintains some sort of database of all media. Also album cover art gets used via Tracker. If you ssh into your phone, and move around your media files, Tracker update is not automatically triggered. You could use the command tracker-control -r to force a full rebuild when you launch e.g. the music player the next time, but the rebuild will take a long time. An easy way to force a faster rebuild is to plug the N9 into a computer via USB, and then unplug it.

by pq (noreply@blogger.com) at April 23, 2017 08:07 AM

A Wayland screensaver

Now that screen locking is done in Wayland demos, it is time to go for the eye-candy: full-screen idle animations, also known as screensavers. The first step was to port an existing screensaver to Wayland. I chose glmatrix from XScreenSaver, because it is cool, and it renders with OpenGL. This way I did not have to port Xlib based rendering to Cairo (yay!).

Here is GLMatrix running as a regular, windowed application on Wayland, using the toytoolkit:
GLMatrix on the Wayland demo compositor.
On Wayland, screensavers can be reduced to pure animation applications, while the compositor handles everything about locking. Next, we need a Wayland protocol extension to actually use this idle-animation in a screensaver'y way.

GLMatrix is already in the Wayland demo repository as a client called wscreensaver, and it requires cairo-gl, just like gears does.

by pq (noreply@blogger.com) at April 23, 2017 08:06 AM

From pre-history to beyond the global thermonuclear war

This is a short and vague glimpse to the interfaces that the Linux kernel offers to user space for display and graphics management, from the history to what is hot and new, to what might perhaps be coming after. The topic came current for me when I started preparing Weston for global thermonuclear war.

The pre-history


In the age of dragons, kernel mode setting did not exist. There was only user space mode setting, where the job of the kernel driver (if any) was simply to give user space direct access to the graphics card registers. A user space driver (well, Xorg video DDX, really, err... or what it was at the time of XFree86) would then poke the card registers to set a mode. The kernel had no idea of anything.

The kernel DRM infrastructure was started as an out-of-tree kernel module for cooperating between multiple programs wanting to access the graphics card's resources. Later it was (partially?) merged into the kernel tree (the year is a lie, 2.3.18 came out in 1999), and much much later it was finally deleted from the libdrm repository.

The middle age


For some time, the kernel DRM existed alongside user space mode setting. It was a dark time full of crazy hacks to keep it all together with duct tape, barbwire and luck. GPUs and hardware accelerated OpenGL started to come up.

The new age


With the invent of kernel mode setting (KMS), the DRM kernel drivers got in charge of the graphics card resources: outputs, video modes, memory allocations, hotplug! User space mode setting became obsolete and was eventually killed. The kernel driver was finally actually in control of the graphics hardware.

KMS probably started with just setting the main framebuffer (primary plane) for each "CRTC" and programming the video mode. A CRTC is for "cathode-ray tube controller", but essentially means a block that reads memory (a framebuffer) and produces a bitstream according to video mode timings. The bitstream is directed into an "encoder", which turns it into a proper physical/analogue signal, like VGA or digital DVI. The signal then exits the graphics card though a "connector". CRTC, encoder, and connector are the basic concepts in KMS API. Quite often these can be combined in some restricted ways, like a single CRTC feeding two encoders for clone mode.

Even ancient hardware supported hardware cursors: a small sprite that was composited into the outgoing video signal on the fly, which meant that it was very cheap to move around. Cursor being so special, and often with funny color format (alpha!), got its very own DRM ioctl.

There were also hardware overlays (additional or secondary planes) on some hardware. While the primary framebuffer covers the whole display, an overlay is another buffer (just like the cursor) that gets mixed into the bitstream at the CRTC level. It is like basic compositing done on the scanout hardware level. Overlays usually had additional benefits, for example they could apply scaling or color space conversion (hello, video players) very efficiently. Overlays being different, they too got their very own DRM ioctls.

The KMS user space ABI was anything but atomic. With the X11 tradition, it wasn't too important how to update the displays, as long as the end result eventually was what you wanted. Race conditions in content updates didn't matter too much either, as X was racy as hell anyway. You update the CRTC. Then you update each overlay. You might update the cursor, too. By luck, all these updates could hit the same vblank. Or not. Or you don't hit vblank at all, and get tearing. No big deal, as X was essentially all about front-buffer rendering anyway. (And then there were huge efforts in trying to fix it all up with X, GLX, Mesa and GL-compositors, and avoid tearing, and it ended up complicated.)

With the advent of X compositing managers, that did not play well with the  awkward X11 protocol (Xv) or the hardware overlays, and with rise of the  GPU power and OpenGL, it was thought that hardware overlays would  eventually die out. Turned out the benefits of hardware overlays were too great to abandon, and with Wayland we again have a decent chance to make the most of them while still enjoying compositing.

The global thermonuclear war (named after a git branch by Rob Clark)


The quality of display updates became important. People do not like tearing. Someone actually wanted to update the primary framebuffer and the overlays on the same vblank, guaranteed. And the cursor as the cherry on top.

We needed one ABI to rule them all.

Universal planes brings framebuffers (primary planes), overlays (secondary planes) and cursors (cursor planes) together under the same API. No more type specific ioctls, but common ioctls shared by them all. As these objects are still somewhat different, overlays having wildly differing features and vendors wanting to expose their own stuff, object properties were invented.

An object property is essentially a {key, value} pair. In the API, the name of a key is a string. Each object has its own set of keys. To use a key, you must know it by name, fetch the handle, and then use the handle when setting the value. Handles seem to be per-object, so make sure to fetch them separately for each.

Atomic mode setting and nuclear pageflip are two sides of the same feature. Atomicity is achieved by gathering a set of property changes, and then pushing them all into the kernel in a single ioctl call. Then that call either succeeds or fails as a whole. Libdrm offers a drmModePropertySet for gathering the changes. Everything is exposed as properties: the attached FB, overlay position, video mode, etc.

Atomic mode setting means setting the output modes of a single graphics device, more or less. Devices may have hard to express limitations. A simple example is the available scanout memory bandwidth: You can drive either two mid-resolution outputs, or one high-resolution output. Or maybe some crtc-encoder-connector combination is not possible with a particular other combination for another output. Collecting the video mode, encoder and connector setup over the whole grahics card into a single operation avoids flicker. Either the whole set succeeds, or it fails. Without atomic mode setting, changing multiple outputs would not only take longer, but if some step failed, you'd have to undo all earlier steps (and hope the undo steps don't fail). Plus, there would be no way to easily test if a certain combination is possible. Atomic mode setting fixes all this.

Nuclear pageflip is about synchronizing the update of a single output (monitor) and making that atomic. This means that when user space wants to update the primary framebuffer, move the cursor, and update a couple of overlays, all those changes happen at the same vblank. Again it all either succeeds or fails. "Every frame is perfect."

And then there shall be ponies (at the end of the rainbow)


Once the global thermonuclear war is over, we have the perfect ABI for driving display updates.

Well, almost. Enter NVidia G-Sync, or AMD's FreeSync which is actually backed by a VESA standard. Dynamically variable refresh rate. We have no way yet for timing display updates in DRM. All we can do is kick out a display update, and it will hopefully land on the next vblank, whenever that is. But we can't tell the DRM when we would like it to be. Everything so far assumes, that the display refresh rate is a constant, apart from an explicit mode switch. Though I have heard that e.g. Chrome for Intel (i915, LVDS/eDP reclocking) has some hacks that opportunistically drops the refresh rate to save power.

There is also a culprit in the DRM of today (Jun 3rd, 2014). You can schedule a pageflip, but if you have pending rendering on that framebuffer for the same GPU as were you are presenting it, the pageflip will not happen until the rendering completes. And you do not know when it will complete, which means you do not know if you will hit the very next vblank or something later.

If the rendering GPU is not the same graphics device that presents the framebuffer, you do not get synchronization at all. That means that you may be scanning out an incomplete rendering for a frame or two, or you have to stall the GPU to make sure it is done before scheduling the page flip. This should be fixed with the fences related to dma-bufs (Hi, Maarten Lankhorst).

And so the unicorn keeps on running.

by pq (noreply@blogger.com) at April 23, 2017 08:06 AM

Wayland protocol design: object lifespan

Now that we have a few years of experience with the Wayland protocol, I thought I would put some of my observations in writing. This, what will hopefully become a series rather than just one post, considers how to design Wayland protocol extensions the right way.

This first post considers protocol object lifespan and the related races between the compositor/server and the client. I assume that the reader is already aware of the Wayland protocol basics. If not, I suggest reading Chapter 4. Wayland Protocol and Model of Operation.

How protocol objects are created

On a new Wayland connection, the only object that exists is the wl_display which is a specially constructed object. You always have it, and there is no wire protocol for creating it.

The only thing the client can create next is a wl_registry through the wl_display. Registry is the root of the whole interface (class) hierarchy. Wl_registry advertises the global objects by numerical name, and using wl_registry.bind request to bind to a global is the first normal way to create a protocol object.

Binding is slightly special still, as the protocol specification in XML for wl_registry uses the new_id argument type, but does not specify the interface (class) for the new object. In the wire protocol, this special argument gets turned into three arguments: interface name (string), interface version (uint32_t), and the new object ID (uint32_t). This is unique in the Wayland core protocol.

The usual way to create a new protocol object is for the client to send a request that has a new_id type of argument. The protocol specification (XML) defines what the interface is, so there is no need to communicate the interface type over the wire. All that is needed on the wire is the new object ID. Almost all object creation happens this way.

Although rare, also the server may create protocol objects for the client. This happens by having a new_id type of argument in an event. Every time the client receives this event, it receives a new protocol object.

As all requests and events are always part of some interface (like a member of a class), this creates an interface hierarchy. For example, wl_compositor objects are created from wl_registry, and wl_surface objects are created from wl_compositor.

Object creation never fails. Once the request or event is sent, the new objects it creates exists, period. This keeps the protocol asynchronous, as there is no need to reply or check that the creation succeeded.

How protocol objects are destroyed

There are two ways to destroy a protocol object. By far the most common one is to have a request in the interface that is specified to be a destructor. Most often this request is called "destroy". When the client code calls the function wl_foobar_destroy(), the request is sent to the server and the client side proxy (struct wl_proxy) for the object gets destroyed. The server then handles the destructor request at some point in the future.

The other way is to destroy the object by an event. In that case, no destructor must be defined in the interface's protocol specification, and the event must be clearly documented to be destructive as there is no automation nor safeties for this. This is for cases where the server decides when an object dies, and requires extreme care in protocol design to work right in all cases. When a client receives such an event, all it can do is destroy the proxy. The (in)famous example of an interface like this is wl_callback.

Enter the boogeyman: races

It is very important that both the client and the server agree on which protocol objects exist. If the client sends a request on, or references as an argument, an object that does not exist in the server's opinion, the server raises a protocol error, and disconnects the client. Obviously this should never happen, nor should it happen that the server sends an event to an object that the client destroyed.

Wayland being a completely asynchronous protocol, we have no implicit guarantees. The server may send an event at the same time as the client destroys the object, and now the event targets an object the client does not know about anymore. Rather than the client shooting itself dead (that's the server's job), we have a trick in libwayland-client: it silently ignores events to destroyed objects, until the server confirms that the object is truly gone.

This works very well for interfaces where the destructor is a request. If the client first sends the destructor request and then sends another request on the destroyed object, it just shot its own head off - no race needed.

Things get tricky for the other case, destructor events. The server may send the destructor event at the same time the client is sending a request on the same object. When the server finally gets the request, the object is already gone, and the client gets taken behind the shed and shot. Therefore pretty much the only safe way to use destructor events is if the interface does not define any requests at all. Ever, not even in future extensions. Furthermore, objects with that interface should not be used as arguments anywhere, or you may hit the race. That is why destructor events are difficult to use right.

The boogeyman's brother

There is yet another nasty race with events that create objects, i.e. server-created objects. If the client is destroying the (parent) object at the same time as the server is sending an event on that object, creating a new (child) object, the server cannot know if the client actually handled the event or not. If the client ignored the event, it will never tell the server to destroy that new object, and you leak in the server.

You could try to make your way out of that pitfall by writing in your protocol specification, that when the (parent) object is destroyed, all the child objects will be destroyed implicitly. But then the client must not send the destructor request for the child objects after it has destroyed the parent, because otherwise the server sees requests on objects it does not know about, and kicks you in the groin, hard. If the child interface defines a destructor, the client cannot destroy its proxies after destroying the parent object. If the child interface does not define a destructor, you can never free the server-side resources until the parent gets destroyed.

The client could destroy all the child objects with a defined destructor in one go, and then immediately destroy the parent object. I am not sure if that works, but it might. If it does not, you have to specify a whole tear-down protocol sequence. The client tells the server it wants to destroy the parent object, the server acks and guarantees it no longer sends any events on it, then the client actually destroys the parent object. Hey, you have a round-trip and just turned a beautiful asynchronous protocol into synchronous, congratulations!

Concluding with recommendations

Here are my recommendations when designing Wayland protocol extensions:
  • Always make sure there is a guaranteed way to destroy all objects. This may sound obvious, but we have fixed several cases in the Wayland core protocol where there was no way to destroy a created protocol object such, that all resources on both server and client side could be freed. And there are still some cases not fixed.
  • Always define a destructor request. If you have any doubt whether your new interface needs a destructor request, just put it there. It is more awkward to add later than normal requests. If you do not have one, the client cannot tell the server to free those protocol object resources.
  • Do not use destructor events. They are hard to design right, and extending the interface later will be a bitch. The client cannot tell the server to free the resources, so objects with destructor events should be short-lived, and the destruction must be guaranteed.
  • Do not use server-side created objects without a serious thought. Designing the destruction sequence such that it never leaks nor explodes is tricky.

by pq (noreply@blogger.com) at April 23, 2017 08:05 AM

Weston repaint scheduling

Now that Presentation feedback has finally landed in Weston (feedback, flags), people are starting to pay attention to the output timings as now you can better measure them. I have seen a couple of complaints already that Weston has an extra frame of latency, and this is true. I also have a patch series to fix it that I am going to propose.

To explain how the patch series affects Weston's repaint loop, I made some JSON-timeline recordings before and after, and produced some graphs with Wesgr. Here I will explain how the repaint loop works timing-wise.

Original post Feb 11, 2015.
Update Mar 20, 2015: the patches have landed in Weston.


The old algorithm

The old repaint scheduling algorithm in Weston repaints immediately on receiving the pageflip completion event. This maximizes the time available for the compositor itself to repaint, but it also means that clients can never hit the very next vblank / pageflip.

Figure 1. The old algorithm, the client paints as response to frame callbacks.

Frame callback events are sent at the "post repaint" step. This gives clients almost a full frame's time to draw and send their content before the compositor goes to "begin repaint" again. In Figure 1. you see, that if a client paints extremely fast, the latency to screen is almost two frame periods. The frame latency can never be less than one frame period, because the compositor samples the surface contents (the "repaint flush" point) immediately after the previous vblank.

Figure 2. The old algorithm, the client paints as response to Presentation feedback events.

While frame callback driven clients still get to the full frame rate, the situation is worse if the client painting is driven by presentation_feedback.presented events. The intent is to draw and show a new frame as soon as the old frame was shown. Because Weston starts repaint immediately on the pageflip completion, which is essentially the same time when Presentation feedback is sent, the client cannot hit the repaint of this frame and gets postponed to the next. This is the same two frame latency as with frame callbacks, but here the framerate is halved because the client waits for the frame to be actually shown before continuing, as is evident in Figure 2.

Figure 3. The old algorithm, client posts a frame while the compositor is idle.

Figure 3. shows a less relevant case, where the compositor is idle while a client posts a new frame ("damage commit"). When the compositor is idle graphics-wise (the gray background in the figure), it is not repainting continuously according to the output scanout cycle. To start painting again, Weston waits for an extra vblank first, then repaints, and then the new frame is shown on the next vblank. This is also a 1-2 frame period latency, but it is unrelated to the other two cases, and is not changed by the patches.

The modification to the algorithm

The modification is simple, yet perhaps counter-intuitive at first. We reduce the latency by adding a delay. The "delay before repaint" is in all the figures, and the old algorithm is essentially using a zero delay. The compositor's repaint is delayed so that clients have a chance to post a new frame before the compositor samples the surface contents.

A good amount of delay is a hard question. Too small delay and clients do not have time to act. Too long delay and the compositor itself will be in danger of missing the vblank deadline. I do not know what a good amount is or how to derive it, so I just made it configurable. You can set the repaint window length in milliseconds in weston.ini. The repaint window is the time from starting repaint to the deadline, so the delay is the frame period minus the repaint window. If the repaint window is too long for a frame period, the algorithm will reduce to the old behaviour.

The new algorithm

The following figures are made with a 60 Hz refresh and a 7 millisecond repaint window.

Figure 4. The new algorithm, the client paints as response to frame callback.

When a client paints as response to the frame callback (Figure 4), it still has a whole frame period of time to paint and post the frame. The total latency to screen is a little shorter now, by the length of the delay before compositor's repaint. It is a slight improvement.

Figure 5. The new algorithm, the client paints as response to Presentation feedback.

A significant improvement can be seen in Figure 5. A client that uses the Presentation extension to wait for a frame to be actually shown before painting again is now able to reach the full output frame rate. It just needs to paint and post a new frame during the delay before compositor's repaint. This mode of operation provides the shortest possible latency to screen as the client is able to target the very next vblank. The latency is below one frame period if the deadlines are met.

Discussion

This is a relatively simple change that should reduce display latency, but analyzing how exactly it affects things is not trivial. That is why Wesgr was born.

This change does not really allow clients to wait some additional time before painting to reduce the latency even more, because nothing tells clients when the compositor will repaint exactly. The risk of missing an unknown deadline grows the later a client paints. Would knowing the deadline have practical applications? I'm not sure.

These figures also show the difference between the frame callback and Presentation feedback. When a client's repaint loop is driven by frame callbacks, it maximizes the time available for repainting, which reduces the possibility to miss the deadline. If a client drives its repaint loop by Presentation feedback events, it minimizes the display latency at the cost of increased risk of missing the deadline.

All the above ignores a few things. First, we assume that the time of display is the point of vblank which starts to scan out the new frame. Scanning out a frame actually takes most of the frame period, it's not instantaneous. Going deeper, updating the framebuffer during scanout period instead of vblank could allow reducing latency even more, but the matter becomes complicated and even somewhat subjective. I hear some people prefer tearing to reduce the latency further. Second, we also ignore any fencing issues that might come up in practise. If a client submits a GPU job that takes a long while, there is a good chance it will cause everything to miss a deadline or more.

As usual, this work and most of the development of JSON-timeline and Wesgr were sponsored by Collabora.

PS. Latency and timing issues are nothing new. Owen Taylor has several excellent posts on related subjects in his blog.

by pq (noreply@blogger.com) at April 23, 2017 08:05 AM

A programmer's view on digital images: the essentials

How is an uncompressed raster image laid out in computer memory? How is a pixel represented? What are stride and pitch and what do you need them for? How do you address a pixel in memory? How do you describe an image in memory?

I tried to find a web page for dummies explaining all that, and all I could find was this. So, I decided to write it down myself with the things I see as essential.


An image and a pixel

Wikipedia explains the concept of raster graphics, so let us take that idea as a given. An image, or more precisely, an uncompressed raster image, consists of a rectangular grid of pixels. An image has a width and height measured in pixels, and the total number of pixels in an image is obviously width×height.

A pixel can be addressed with coordinates x,y after you have decided where the origin is and which way the coordinate axes go.

A pixel has a property called color, and it may or may not have opacity (or occupancy). Color is usually described as three numerical values, let us call them "red", "green", and "blue", or R, G, and B. If opacity (or occupancy) exists, it is usually called "alpha" or A. What R, G, B, and A actually mean is irrelevant when looking at how they are stored in memory. The relevant thing is that each of them is encoded with a certain number of bits. Each of R, G, B, and A is called a channel.

When describing how much memory a pixel takes, one can use units of bits or bytes per pixel. Both can be abbreviated as "bpp", so be careful which one it is and favour more explicit names in code. Also bits per channel is used sometimes, and channels can have a different number of bits per pixel each. For example, rgb565 format is 16 bits per pixel, 2 bytes per pixel, 5 bits per R and B channels, and 6 bits per G channel.

A pixel in memory

Pixels do not come in arbitrary sizes. A pixel is usually 32 or 16 bits, or 8 or even 1 bit. 32 and 16 bit quantities are easy and efficient to process on 32 and 64 bit CPUs. Your usual RGB-image with 8 bits per channel is most likely in memory with 32 bit pixels, the extra 8 bits per pixel are simply unused (often marked with X in pixel format names). True 24 bits per pixel formats are rarely used in memory because trading some memory for simpler and more efficient code or circuitry is almost always a net win in image processing. The term "depth" is often used to describe how many significant bits a pixel uses, to distinguish from how many bits or bytes it occupies in memory. The usual RGB-image therefore has 32 bits per pixel and a depth of 24 bits.

How channels are packed in a pixel is specified by the pixel format. There are dozens of pixel formats. When decoding a pixel format, you first have to understand if it is referring to an array of bytes (particularly used when each channel is 8 bits) or bits in a unit. A 32 bits per pixel format has a unit of 32 bits, that is uint32_t in C parlance, for instance.

The difference between an array of bytes and bits in a unit is the CPU architecture endianess. If you have two pixel formats, one written in array of bytes form and one written in bits in a unit form, and they are equivalent on big-endian architecture, then they will not be equivalent on little-endian architecture. And vice versa. This is important to remember when you are mapping one set of pixel formats to another, between OpenGL and anything else, for instance. Figure 1 shows three different pixel format definitions that produce identical binary data in memory.

Figure 1. Three equivalent pixel formats with 8 bits for each channel. The writing convention here is to list channels from highest to lowest bits in a unit. That is, abgr8888 has r in bits 0-7, g in bits 8-15, etc.

It is also possible, though extremely rare, that architecture endianess also affects the order of bits in a byte. Pixman, undoubtedly inheriting it from X11 pixel format definitions, is the only place where I have seen that.

An image in memory

The usual way to store an image in memory is to store its pixels one by one, row by row. The origin of the coordinates is chosen to be the top-left corner, so that the leftmost pixel of the topmost row has coordinates 0,0. First there are all the pixels of the first row, then the second row, and so on, including the last row. A two-dimensional image has been laid out as a one-dimensional array of pixels in memory. This is shown in Figure 2.

Image layout in memory.
Figure 2. The usual layout of pixels of an image in memory.
There are not only the width×height number of pixels, but each row also has some padding. The padding area is not used for storing anything, it only aligns the length of the row. Having padding requires a new concept: image stride.

Padding is often necessary due to hardware reasons. The more specialized and efficient hardware for pixel manipulation, the more likely it is that it has specific requirements on the row start and length alignment. For example, Pixman and therefore also Cairo (image backend particularly) require that rows are aligned to 4 byte boundaries. This makes it easier to write efficient image manipulations using vectorized or other instructions that may even process multiple pixels at the same time.

Stride or pitch

Image width is practically always measured in pixels. Stride on the other hand is related to memory addresses and therefore it is often given in bytes. Pitch is another name for the same concept as stride, but can be in different units.

You may have heard rules of thumb that stride is in bytes and pitch is in pixels, or vice versa. Stride and pitch are used interchangeably, so be sure of the conventions used in the code base you might be working on. Do not trust your instinct on bytes vs. pixels here.

Addressing a pixel

How do you compute the memory address of a given pixel x,y? The canonical formula is:
pixel_address = data_begin + y * stride_bytes + x * bytes_per_pixel.
The formula stars with the address of the first pixel in memory data_begin, then skips to row y while each row is stride_bytes long, and finally skips to pixel x on that row.

In C code, if we have 32 bit pixels, we can write
uint32_t *p = data_begin;
p += y * stride_bytes / sizeof(uint32_t);
p += x;
Notice, how the type of p affects the computations, counting in units of uint32_t instead of bytes.

Let us assume the pixel format in this example is argb8888 which is defined in bits of a unit form, and we want to extract the R value:
uint32_t v = *p;
uint8_t r = (v >> 16) & 0xff;
Finally, Figure 3 gives a cheat sheet.

Figure 3. How to compute the address of a pixel.

Now we have covered the essentials, and you can stop reading. The rest is just good to know.

Not everyone has the "right" way up

In the above we have assumed that the image origin is the top-left corner, and rows are stored top-most first. The most notable exception to this is the OpenGL API, which defines image data to be in bottom-most row first. (Traditionally also BMP file format does this.)

Multi-planar formats

In the above, we have talked about single-planar formats. That means that there is only a single two-dimensional array of pixels forming an image. Multi-planar formats use two or more two-dimensional arrays for forming an image.

A simple example with an RGB-image would be to store R channel in the first plane (2D-array) and GB channels in the second plane. Pixels on the first plane have only R value, while pixels on the second plane have G and B values. However, this example is not used in practice.

Common and real use cases for multi-planar images are various YUV color formats. Y channel is stored on the first plane, and UV channels are stored on the second plane, for instance. A benefit of this is that e.g. the UV plane can be sub-sampled - its resolution could be only half of the plane with Y, saving some memory.

Tiled formats

If you have read about GPUs, you may have heard of tiling or tiled formats (tiled renderer is a different thing). These are special pixel layouts, where an image is not stored row by row but a rectangular block by block. Tiled formats are far too wild and various to explain here, but if you want a taste, take a look at Nouveau's documentation on G80 surface formats.

by pq (noreply@blogger.com) at April 23, 2017 08:05 AM

Wayland has been accepted as a Google Summer of Code organization

Now is a high time to start discussing what you might want to do, for both student candidates and possible mentors.

Students, have a look at our project idea examples to get a feeling of what kind of projects you could propose. First you will need to contribute at least a small but significant patch to show that you understand the workflow, we have put some first task ideas together.

There are our application instructions for students. Of course all the pages are reachable from the Wayland GSoC wiki page and also the Wayland organization page.

If you want to become a mentor, please contact me or Kat, the contact details are on the Wayland GSoC wiki page.

Note, that students can also apply under the X.Org Foundation organization since Wayland is within their scope too and they also have other excellent graphics project ideas. You are welcome to submit your Wayland proposals to both projects.

by pq (noreply@blogger.com) at April 23, 2017 08:05 AM

Waltham: a generic Wayland-style IPC over network

I have recently been occupied with a new project (and being with a cold all this week), so I have not been much present in the Wayland community. Now I can finally say what I and Emilio have been up to: Waltham! For more information, please see our annoucement.

by pq (noreply@blogger.com) at April 23, 2017 08:05 AM

Screen locking in Wayland

This is continuation to my Wayland desktop-shell post.

My goal was to implement a simple screen locking feature, a similar idea to what xlockmore does for X. In Wayland it is much simpler and more reliable to implement than in X, because the implementation will be in the display server (compositor). While the "lock" itself is in the compositor, also an unlock dialog is required. The unlock dialog usually asks the user to input his password, but I settled for "click the green ball". Screenshots below...

First a protocol (commit) is needed to drive the compositor locking and unlocking, since the unlock dialog is exported to the desktop-shell client. When the compositor hits the idle timeout, it fades out to black, and then locks itself in shell plugin. The compositor is woken up by input events, and sends prepare_lock_surface event to desktop-shell. The client replies with set_lock_surface request, with the unlock dialog's surface as an argument. Only on getting the surface, the compositor fades in, to have a nice transition to the dialog. The dialog then runs like any other application on screen, and when the user has dismissed it, desktop-shell sends unlock request to the compositor. On unlock, compositor brings all windows (surfaces) back to the desktop.

The shell plugin implements screen locking by stealing all the surfaces from the compositor's rendering list. Only the background surface and pointer cursor surfaces are left. This has the side-effect that none of the stolen (hidden) surfaces can be activated nor receive input. The compositor-side surface objects still continue living as usual. New surfaces can be created and they are automatically hidden. Output assignment of the hidden surfaces is set to NULL, which prevents sending any frame events for them, effectively also stopping any animations that might have been running. On unlock, the surfaces are simply put back into the compositor's list, and assigned to outputs.

After the last commit in the screen locking series, you can enjoy automatic screen locking in the Wayland demo compositor:
Normal desktop.




Locked, with the unlock dialog.



Note, that locking does not imply a fancy animated screensaver. The black screen is the screensaver ;-)

Thanks to Kristian Høgsberg for his reviews, comments and bug fixes.

This feature is sponsored by Collabora, Ltd.

by pq (noreply@blogger.com) at April 23, 2017 08:03 AM

Wayland screensaver integration

Continuing on the Wayland screensaver track, I sent a branch for review. The screensaver interface is now fully implemented in both the demo compositor and the demo screensaver. Screenshots below...

The compositor shell plugin of desktop-shell now implements the screensaver interface. This allows a client to register a surface as an idle animation for a given output (monitor). These surfaces remain hidden until the compositor's idle timer triggers, and the compositor fades to black. If there are any screensaver surfaces, the compositor will fade them in, showing the idle animation. The compositor can also be configured to automatically start a screensaver client.

While an idle animation is running, if the shell implements screen locking, the unlock dialog will appear on the first input event, for instance when moving the mouse. The idle animation continues as the background for the unlock screen.

There is another idle timeout running with the idle animation. When that timeout triggers, the compositor fades to black and will seize updating the screen. This also causes properly written animating clients to stop rendering, and we can hit zero CPU usage, even when there is a screensaver active. The compositor will wake from this sleep as usual, and fade in either the desktop directly, or the unlock dialog with the animation in the background.

On returning to the normal desktop, the compositor (the shell plugin, really) will kill the screensaver client if it started it in the first place.

The demo implementation also supports multiple outputs, which is convenient to demonstrate on X. The three Wayland compositor windows are the outputs of a single demo compositor running.

Normal desktop, spread over three outputs, with a few flower clients and a terminal.

The idle animations running on each output with separate state. There is only one screensaver client running.

Idle animation as the background for the unlock screen.

by pq (noreply@blogger.com) at April 23, 2017 08:03 AM

Wayland on Android snapshot release: input

It is time to announce the android-4.0.1_r1.2-b snapshot release of the Wayland on Android project at Collabora! We give you: input support in Weston and a finger-painting demo!

Collabora will have people at GUADEC demoing this on real devices, though not me personally.

Click to see the video!



This release provides ports of the following projects (git repositories, really) to Android 4.0.1 on Samsung Galaxy Nexus:
It also includes some changes to Android internals, and the aggregate for building it all.

This is just a snapshot release of a work in progress, and you cannot do much with it. Everything an end user would have known about Android is still gone.

In Weston, the three device buttons are working, and the touchscreen is working. Unfortunately, the only application really supporting touch devices is simple-touch, but I turned that into a demo that is automatically launched. If you install this release into a Galaxy Nexus device, it will boot into Weston and you can play with simple-touch. The power button is hooked up in Weston to power off the device immediately, so a computer is not necessary to show and exit the demo.

The main advancement compared to my previous posts is that the touchscreen is fully working now. Also, this time I am providing a proper release:
You can get the fastboot tool needed for flashing the images from the Android SDK, I think. I have never used the SDK myself, I have always gone with the full AOSP tree.

Please, if you try this on your device, let me know how it went. If you find problems that I can fix, I might push the fixes to the android-4.0.1_r1.2-b release branches, and update the ChangeLog for this release, but I will not provide new images. Before August I probably won't react, though.

If you look at the histories of the git branches mentioned towards the top, you will find many ugly hacky commits. All commits marked as HACK will be replaced by the proper changes during the course of this project. We are planning to send almost all changes to respective upstream projects, too. The input enablement patch series in Weston needs a rewrite, before it gets upstream.


Thanks to the whole Android team at Collabora for making this happen!

by pq (noreply@blogger.com) at April 23, 2017 08:00 AM

Wayland on Android: upgrade to 4.0.4 and new build integration

We at Collabora have been working on a new Android build system integration with autotools projects, still based on Androgenizer (git). Now we have our own repo manifest repository, and a tool called anagrman for managing optional feature packages (aggregates). Wayland on Android is one feature package, and the first to become available. We also upgraded to Ice Cream Sandwhich 4.0.4_r2.1. Instead of a snapshot release, this particular announcement is about live branches.

Weston (git) was upgraded to upstream as of Sept. 18th, 2012, though there are no user visible changes on Android. This brings the 0.95 protocol, and the new evdev input rework from upstream, which went through some changes since the 4.0.1_r1.2-b Wayland on Android release. Weston now has its GLESv2 renderer separated in code, and shaders are faster and simpler (I have not done any shader benchmarks myself).

Libxkbcommon lost its final dependencies to kbproto and xproto, and our Android build files are upstream. Thanks to Daniel Stone, we can use libxkbcommon straight from upstream.

The new Android build system integration requires you to manually download only anagrman in addition to Android repo. There is no make wayland-aggregate-configure step anymore, all generated Android makefiles are created during the full build. Also androgenizer and wayland-scanner are built automatically as needed. All this is possible using the makefile update feature of GNU Make. If there is a rule that can update a makefile, GNU Make will update the makefile as needed. If any makefiles were updated, Make will then start from scratch, reading in all makefiles again, before continuing to the actual build phase. This causes Make to reload all Android makefiles 2-3 times during the first build. It should also solve any dependency issues of the explicit configure step, like when one project's configure depends on another project's fully built library. A big thank you to Helio Chissini de Castro for doing most of the build system work.

This work is available in two ways:
The ready-made image is configured to launch Weston at boot instead of SurfaceFlinger (the setting is in device/samsung/maguro/system.prop in the source tree). The source code, however, does not start Weston nor SurfaceFlinger automatically. You have to use the commands # adb root and # adb shell to log into the phone, and run one of:
  • # setprop service.compositor surfaceflinger
  • # setprop service.compositor weston
You can also run Weston by just # weston & and start other demo clients manually. Weston will automatically start simple-touch finger drawing demo, and the power button will cause Weston to power off the phone. The available demo apps are: simple-touch, simple-shm, flower, and clickdot. Any GL based demos are not included, since the EGL Wayland platform for clients is still unimplemented.

Even though this is a live release, i.e. not tagged to a specific revision, I do not expect much changes in the near future. We are now researching other ways to enable Wayland on Android and other embedded-like devices.

by pq (noreply@blogger.com) at April 23, 2017 07:59 AM

On supporting Wayland GL clients and proprietary embedded platforms

How would one start implementing support for graphics hardware accelerated Wayland clients on an embedded platform that only has proprietary drivers?

This is a question I have answered more than once recently. Presumably you already have some ways to implement a Wayland compositor, some APIs you can use to composite and get images on screen. You may have wl_shm based clients already working, and now you want hardware rendered clients to work. Where to start?

First I will explain the architecture a bit. There are basically three things related to Wayland graphics:
  • the client side support for graphics hardware acceleration
  • the compositor's support for hardware accelerated clients
  • the compositor's own rendering or compositing, and output to screen

Usually the graphics hardware accelerated applications (clients) use EGL for the window system glue, and GL ES (2) for rendering.  The client side is the EGL Wayland platform, plus wayland-egl API. The EGL Wayland platform means, that you can pass Wayland types as EGL native types, for instance a struct wl_display * as the EGLNativeDisplayType parameter of the eglGetDisplay() function.

The compositor's support for hardware accelerated clients is the server side of Wayland-enabled libEGL. Normally it consists of the EGL_WL_bind_wayland_display extension. For a compositor programmer, this extension allows you to create an EGLImageKHR object from a struct wl_buffer *, and then bind that as a GL texture, so you can composite it.

The compositor's own rendering mechanisms are largely irrelevant to the client support. The only requirement is, that the compositor can effectively use the kinds of buffers the clients send to it. If you turn a wl_buffer object via EGLImageKHR into a GL texture, you would better be compositing with a GL API, naturally. Apart from that, it does not matter what APIs the compositor uses for compositing and displaying.

Now, what do we actually need for supporting hardware accelerated clients?

First, forget about using wl_shm buffers, they are not suitable for hardware accelerated clients. Buffers that GPUs render into are often badly suited for CPU usage, or not directly accessible by the CPU at all. Due to GPU requirements, you likely cannot make a GPU to render into an shm buffer, either. Therefore to get the pixel data into an shm buffer you would need to do a copy, like glReadPixels(). Then you send the shm buffer to the server, and the server needs to copy the pixels again to make them accessible to the GPU for compositing, e.g. by calling glTexImage2D(). That is two copies between CPU and GPU domains, and that is slow. I would say unusably slow. It is far better to not move the pixels into CPU domain at all, and avoid all copying.

Therefore, the most important thing is graphics buffer sharing or passing. Buffer sharing works by creating a handle for a buffer, and passing that handle to another process which then uses the handle to make the GPU access again the same buffer. On your graphics platform, find out:
  • Do such handles exist at all?
  • How do you create a buffer and a handle?
  • How do you direct GL ES rendering into that buffer?
  • What is the handle? Does it contain integers, open file descriptors, or opaque pointers? Integers and file descriptors are not a problem, but you cannot pass (host virtual) pointers from one process to another.
  • How do you create something usable, like an EGLImageKHR or a GL texture, from the handle?
It would be good to test that the buffer passing actually works, too.

Once you know what the handle is, and whether clients can allocate their own buffers (preferred), or must the compositor hand out buffers to clients for some obscure security reasons, you can think about how to use the Wayland protocol to pass buffers around. You must invent a new Wayland protocol extension. The extension should allow a client to create a wl_buffer object from the handle. All the standard Wayland interfaces deal with wl_buffer objects, and the server will detect the type of each wl_buffer object when used by a client. Examples of the protocol extension are wl_drm of Mesa, and my experimental android_wlegl.

I recommend you do the first implementation of the protocol extension completely ad hoc. Hack the server to work with your buffer types, and write a custom client that directly uses the protocol without any fancy API like wayland-egl. Once you confirm it works, you can design the real implementation, whether it should be in a wrapper library around the proprietary libEGL or something else.

EGL is the standard interface to implement accelerated Wayland client support and it conveniently hides the details from both servers and clients, but it is not the only way. If you control both server and client code, you can use any other API, or create your own. That is a big if, though.

The key point is buffer sharing, copying will kill your system performance. After you know how to share graphics buffers, your work has only begun. Good luck!

by pq (noreply@blogger.com) at April 23, 2017 07:59 AM