Planet Collabora

April 16, 2014

Marco Barisione

Maynard: a Wayland desktop shell for the Raspberry Pi

In the last year or so, Collabora has been working with the Raspberry Pi Foundation on a web browser and on Wayland. See Daniel’s and Pekka’s blog posts about their Wayland work.

To make Wayland on the Raspberry Pi actually usable, we needed a shell, but lightweight desktop environments (like LXDE) don’t support Wayland and normal desktops (like Gnome and KDE) are just too heavy.
This meant we ended up writing our own shell based on Tiago Vignatti’s gtk-shell, so Maynard was born!

No video displayed here? Watch the video on Youtube.
Maynard running on my laptop (webm video file)

No video displayed here? Watch the video on Youtube.
Maynard running on a Pi (mp4 video file)

Maynard is far from complete, but it’s already starting to take shape nicely. Its goals are to be functional, light and pretty, so it will never see some of the features one might expect from Gnome or KDE for instance.

The main current limitations are:

  • No XWayland support, so non-Wayland applications cannot run (issue #1).
  • GTK applications take too long to start (issue #2).
  • Active apps are not shown in the panel (issue #3).
  • No configurability (issue #7). I hope you like the background from kdewallpapers we use as you cannot change it for now ;)

Interested in the project? Follow these links:

by barisione at April 16, 2014 12:37 PM

April 06, 2014

Philip Withnall

Rate limiting of asynchronous calls in Vala

Following on from single-thread synchronisation in Vala, I just used a similar technique to easily implement rate limiting of concurrent calls to the same function in Vala. The use case for this was bug #705742, caused by attempting to write out many avatars to libfolks’ on-disk avatar cache simultaneously. So many files were opened simultaneously that the process’ file descriptor limit was hit and subsequent writes failed.

The principle for fixing this is to maintain a counter of the number of ongoing operations. If this hits a limit, operations which are started subsequently are added to a work queue and yielded. Any ongoing operation which finishes will pop a yielded operation off the work queue (if it’s non-empty) and resume that operation. This forms a FIFO queue which is guaranteed to progress due to each completed operation causing the next to resume (which will, in turn, cause the next to resume, etc., until the queue is empty).

A code example:

using GLib;

public class AvatarCache : Object
  private uint _n_ongoing_stores = 0;
  private Queue<DelegateWrapper> _pending_stores =
      new Queue<DelegateWrapper> ();

  /* Change this to change the cap on the number of concurrent
   * operations. */
  private const uint _max_n_ongoing_stores = 10;

  public async string store_avatar (string id, LoadableIcon avatar)
      throws GLib.Error
      string retval = "";

      /* If the number of ongoing operations is at the limit,
       * queue this operation and yield. */
      if (this._n_ongoing_stores >
          /* Add to the pending queue. */
          var wrapper = new DelegateWrapper ();
          wrapper.cb = store_avatar.callback;
          this._pending_stores.push_tail ((owned) wrapper);

      /* Do the actual store operation. */
          retval = yield this._store_avatar_unlimited (id, avatar);

          /* If there is a store operation pending, resume it,
           * FIFO-style. */
          var wrapper = this._pending_stores.pop_head ();
          if (wrapper != null)
              wrapper.cb ();

      return retval;

  private async string _store_avatar_unlimited (string id,
                                                LoadableIcon avatar)
      throws GLib.Error
      return /* the actual computation goes here */;

/* See:
 * */
private class DelegateWrapper
  public SourceFunc cb;

This is all done using Vala’s asynchronous operation support, so runs in a single thread using the global default main context, and no locking or thread synchronisation is needed. If you were to use AvatarCache.store_avatar() from multiple threads, locking would have to be added and things would become more complex.

As with the single-thread synchronisation example from before, the key lines are: wrapper.cb = store_avatar.callback; yield, which stores the current function pointer and its closure (in Vala terminology, a delegate with target for the current method); and wrapper.cb (), which calls that function pointer with the stored closure (in Vala terminology, executes the delegate), effectively resuming computation from the yield statement.

So that’s it: a way to rate-limit concurrent method calls in Vala, ensuring they all block correctly (i.e. calls which are waiting for earlier ones to complete continue to block until they themselves are resumed and complete computation). By changing the scheduling function applied to the GQueue, priorities can be applied to queued calls if desired.

by Philip Withnall at April 06, 2014 11:03 AM

April 02, 2014

Philip Withnall

What’s the difference between int and size_t?

Passing variable-sized buffers around is a common task in C, especially in networking code. Ignoring GLib convenience API like GBytes, GInputVector and GOutputVector, a buffer is simply an offset and length describing a block of memory. This is just a void* and int, right? What’s the problem?

tl;dr: I’m suggesting to use (uint8_t*, size_t) to describe buffers in C code, or (guint8*, gsize) if you’re using GLib. This article is aimed at those who are still getting to grips with C idioms. See also: (const gchar*) vs. (gchar*) and other memory management stories.

There are two problems here: this effectively describes an array of elements, but void* doesn’t describe the width of each element; and (assuming each element is a byte) an int isn’t wide enough to index every element in memory on modern systems.

But void* could be assumed to refer to an array of bytes, and an int is probably big enough for any reasonable situation, right? Probably, but not quite: a 32-bit signed integer can address 2 GiB (assuming negative indices are ignored) and an unsigned integer can address 4 GiB1. Fine for network processing, but not when handling large files.

How is size_t better? It’s defined as being big enough to refer to every addressable byte of memory in the current computer system (caveat: this means it’s architecture-specific and not suitable for direct use in network protocols or file formats). Better yet, it’s unsigned, so negative indices aren’t wasted.

What about the offset of the buffer — the void*? Better to use a uint8_t*, I think. This has two advantages: it explicitly defines the width of each element to be one byte; and it’s distinct from char* (it’s unsigned rather than signed2), which makes it more obvious that the data being handled is not necessarily human-readable or nul-terminated.

Why is it important to define the width of each element? This is a requirement of C: it’s impossible to do pointer arithmetic without knowing the size of an element, and hence the C standard forbids arithmetic on void* pointers, since void doesn’t have a width (C11 standard, §6.2.5¶¶19,1). So if you use void* as your offset type, your code will end up casting to uint8_t* as soon as an offset into the buffer is needed anyway.

A note about GLib: guint8 and uint8_t are equivalent, as are gsize and size_t — so if you’re using GLib, you may want to use those type aliases instead.

For a more detailed explanation with some further arguments, see this nice article about size_t and ptrdiff_t by Karpov Andrey.

  1. int doesn’t have a guaranteed width (C11 standard, §6.2.5¶5), which is another reason to use size_t. However, on any relevant modern platform it is at least 32 bits. 

  2. Technically, it’s architecture-dependent whether char is signed or unsigned (C11 standard, §6.2.5¶15), and whether it’s actually eight bits wide (though in practice it is really always eight bits wide), which is another reason to use uint8_t

by Philip Withnall at April 02, 2014 08:05 AM

March 27, 2014

Luis de Bethencourt

snappy has arrived to 1.0

snappy is an open source media player that gathers the power and flexibility of GStreamer inside the comfort of a minimalistic Clutter interface.

The snappy development team is proud to announce it's 1.0 release.
Codename: "I'll be back", Terminator

We think the project has achieved the maturity worthy of a 1.0 release. It does one thing and it does it well.

Some of the changes you will notice are:
  • It’s been given some needed visual polish
  • Playback speed adjustable
  • Video and audio synchronization tweeking
  • Time left of stream viewer
  • Better drag and drop
  • Better media history handling
  • More features accessible from Clutter interface
  • Bug fixes

Features already included from previous releases:
  • Subtitle support
  • Desktop launcher
  • Video and audio synchronization tweeking.
  • Multi-screen full-screen
  • Media queues
  • History of played media
  • Seeking/muting/cycling through languages (audio streams)
  • Frame stepping
  • Much more

Download a tarball: xz
Clone the git repo
Packages in distributions will be updated soon

Thanks to all who helped in snappy's creation!

Disclaimer: No moose were harmed during the making of this release. One got homesick and an other disappeared for days in a Breaking Bad marathon, but that's about it.

March 27, 2014 08:00 PM

Philip Withnall

What is GMainContext?

GMainContext is at the core of almost every GLib application, yet it was only recently that I took the time to fully explore it — the details of it have always been a mystery. Doing some I/O work required me to look a little closer and try to get my head around the ins and outs of GMainContext, GMainLoop and GSources. Here I’ll try and write down a bit of what I’ve learned. If you want to skip to the conclusion, there’s a list of key points for using GMainContexts in libraries at the bottom of the post.

What is GMainContext? It’s a generalised implementation of an event loop, useful for implementing polled file I/O or event-based widget systems (i.e. GTK+). If you don’t know what poll() does, read about that first, since GMainContext can’t be properly understood without understanding polled I/O. A GMainContext has a set of GSources which are ‘attached’ to it, each of which can be thought of as an expected event with an associated callback function which will be invoked when that event is received; or equivalently as a set of file descriptors (FDs) to check. An event could be a timeout or data being received on a socket, for example. One iteration of the event loop will:

  1. Prepare sources, determining if any of them are ready to dispatch immediately.
  2. Poll the sources, blocking the current thread until an event is received for one of the sources.
  3. Check which of the sources received an event (several could have).
  4. Dispatch callbacks from those sources.

This is explained very well in the GLib documentation.

At its core, GMainContext is just a poll() loop, with the preparation, check and dispatch stages of the loop corresponding to the normal preamble and postamble in a typical poll() loop implementation, such as listing 1 from Typically, some complexity is needed in non-trivial poll()-using applications to track the lists of FDs which are being polled. Additionally, GMainContext adds a lot of useful functionality which vanilla poll() doesn’t support. Most importantly, it adds thread safety.

GMainContext is completely thread safe, meaning that a GSource can be created in one thread and attached to a GMainContext running in another thread. A typical use for this might be to allow worker threads to control which sockets are being listened to by a GMainContext in a central I/O thread. Each GMainContext is ‘acquired’ by a thread for each iteration it’s put through. Other threads cannot iterate a GMainContext without acquiring it, which guarantees that a GSource and its FDs will only be polled by one thread at once (since each GSource is attached to at most one GMainContext). A GMainContext can be swapped between threads across iterations, but this is expensive.

Why use GMainContext instead of poll()? Mostly for convenience, as it takes all the grunt work out of dynamically managing the array of FDs to pass to poll(), especially when operating over multiple threads. This is done by encapsulating FDs in GSources, which decide whether those FDs should be passed to the poll() call on each ‘prepare’ stage of the main context iteration.

So if that’s GMainContext, what’s GMainLoop? Ignoring reference counting and locking gubbins, it is essentially just the following three lines of code (in g_main_loop_run()):

loop->is_running = TRUE;
while (loop->is_running)
	g_main_context_iteration (context, TRUE);

Plus a fourth line in g_main_loop_quit() which sets loop->is_running = FALSE and which will cause the loop to terminate once the current main context iteration ends. i.e. GMainLoop is a convenient, thread-safe way of running a GMainContext to process events until a desired exit condition is met, at which point you call g_main_loop_quit(). Typically, in a UI program, this will be the user clicking ‘exit’. In a socket handling program, this might be the final socket closing.

It is important not to confuse main contexts with main loops. Main contexts do the bulk of the work: preparing source lists, waiting for events, and dispatching callbacks. A main loop just iterates a context.

One of the important features of GMainContext is its support for ‘default’ contexts. There are two levels of default context: the thread-default, and the global-default. The global-default (accessed using g_main_context_default()) is what’s run by GTK+ when you call gtk_main(). It’s also used for timeouts (g_timeout_add()) and idle callbacks (g_idle_add()) — these won’t be dispatched unless the default context is running!

What are the thread-default contexts then? These are a later addition to GLib (since version 2.22), and are generally used for I/O operations which need to run and dispatch callbacks in a thread. By calling g_main_context_push_thread_default() before starting an I/O operation, the thread-default context has been set, and the I/O operation can add its sources to that context. The context can then be run in a new main loop in an I/O thread, causing the callbacks to be dispatched on that thread’s stack rather than on the stack of the thread running the global-default main context. This allows I/O operations to be run entirely in a separate thread without explicitly passing a specific GMainContext pointer around everywhere.

Conversely, by starting a long-running operation with a specific thread-default context set, your code can guarantee that the operation’s callbacks will be emitted in that context, even if the operation itself runs in a worker thread. This is the principle behind GTask: when a new GTask is created, it stores a reference to the current thread-default context, and dispatches its completion callback in that context, even if the task itself is run using g_task_run_in_thread().

For example, the code below will run a GTask which performs two writes in parallel from a thread. The callbacks for the writes will be dispatched in the worker thread, whereas the callback from the task as a whole will be dispatched in the interesting context.

typedef struct {
	GMainLoop *main_loop;
	guint n_remaining;
} WriteData;

/* This is always called in the same thread as thread_cb() because
 * it’s always dispatched in the @worker_context. */
static void
write_cb (GObject *source_object, GAsyncResult *result,
          gpointer user_data)
	WriteData *data = user_data;
	GOutputStream *stream = G_OUTPUT_STREAM (source_object);
	GError *error = NULL;
	gssize len;

	/* Finish the write. */
	len = g_output_stream_write_finish (stream, result, &error);
	if (error != NULL) {
		g_error ("Error: %s", error->message);
		g_error_free (error);

	/* Check whether all parallel operations have finished. */

	if (write_data->n_remaining == 0) {
		g_main_loop_quit (write_data->main_loop);

/* This is called in a new thread. */
static void
thread_cb (GTask *task, gpointer source_object, gpointer task_data,
           GCancellable *cancellable)
	/* These streams come from somewhere else in the program: */
	GOutputStream *output_stream1, *output_stream;
	GMainContext *worker_context;
	GBytes *data;
	const guint8 *buf;
	gsize len;

	/* Set up a worker context for the writes’ callbacks. */
	worker_context = g_main_context_new ();
	g_main_context_push_thread_default (worker_context);

	/* Set up the writes. */
	write_data.n_remaining = 2;
	write_data.main_loop = g_main_loop_new (worker_context, FALSE);

	data = g_task_get_task_data (task);
	buf = g_bytes_get_data (data, &len);

	g_output_stream_write_async (output_stream1, buf, len,
	                             G_PRIORITY_DEFAULT, NULL, write_cb,
	g_output_stream_write_async (output_stream2, buf, len,
	                             G_PRIORITY_DEFAULT, NULL, write_cb,

	/* Run the main loop until both writes have finished. */
	g_main_loop_run (write_data.main_loop);
	g_task_return_boolean (task, TRUE);  /* ignore errors */

	g_main_loop_unref (write_data.main_loop);

	g_main_context_pop_thread_default (worker_context);
	g_main_context_unref (worker_context);

/* This can be called from any thread. Its @callback will always be
 * dispatched in the thread which currently owns
 * @interesting_context. */
parallel_writes_async (GBytes *data,
                       GMainContext *interesting_context,
                       GCancellable *cancellable,
                       GAsyncReadyCallback callback,
                       gpointer user_data)
	GTask *task;

	task = g_task_new (NULL, cancellable, callback, user_data);
	g_task_set_task_data (task, data,
	                      (GDestroyNotify) g_bytes_unref);
	g_task_run_in_thread (task, thread_cb);
	g_object_unref (task);

From the work I’ve been doing recently with GMainContext, here are a few rules of thumb for using main contexts in libraries which I’m going to follow in future:

  • Never iterate a context you don’t own, including the global-default or thread-default contexts, or you can cause the user’s sources to be dispatched unexpectedly and cause re-entrancy problems.
  • Always remove GSources from a main context once you’re done with them, especially if that context may have been exposed to the user (e.g. as a thread-default). Otherwise the user may keep a reference to the main context and continue iterating it after your code expects it to have been destroyed, potentially causing unexpected source dispatches in your code.
  • If your API is designed to be used in threads, or in a context-aware fashion, always document which context callbacks will be dispatched in. For example, “callbacks will always be dispatched in the context which is the thread-default at the time of the object’s construction”. Users of your API need to know this information.
  • Use g_main_context_invoke() to ensure callbacks are dispatched in the right context. It’s much easier than manually using g_idle_source_new().
  • Libraries should never use g_main_context_default() (or, equivalently, pass NULL to a GMainContext-typed parameter). Always store and explicitly use a specific GMainContext, even if that reduces to being some default context. This makes your code easier to split out into threads in future, if needed, without causing hard-to-debug problems with callbacks being invoked in the wrong context.
  • Always write things asynchronously internally (using the amazing GTask where appropriate), and keep synchronous wrappers to the very top level, where they can be implemented by calling g_main_context_iteration() on a specific GMainContext. Again, this makes future refactoring easier. You can see it in the above example: the thread uses g_output_stream_write_async() rather than g_output_stream_write().
  • Always match pushes and pops of the thread-default main context.

In a future post, I hope to explain in detail what’s in a GSource, and how to implement one, plus do some more in-depth comparison of poll() and GMainContext. Any feedback or corrections are gratefully received!

by Philip Withnall at March 27, 2014 11:06 AM

March 24, 2014

Luis de Bethencourt


"It always seems impossible until its done."
Nelson Mandela

My first Soyuz simulator! Summer 1964, nearly 5 years old."
Chris Hadfield

March 24, 2014 07:00 PM

March 19, 2014

Alban Crequy

Traffic control for multimedia devices

Multimedia devices traditionally don't manage the network bandwidth required by applications. This causes a problem when users try to watch a streaming video or listen to a web radio seamlessly while other applications are downloading other content in the background. The background downloads can use too much bandwidth for the streaming video or web radio to keep up and users notice unnecessary interruptions in the playback.

I have been working on an approach to improve this using traffic control on Linux. This work was sponsored by Collabora.

What is traffic control

Traffic control is a technique to control network traffic in order to optimise or guarantee performance, low-latency, and/or bandwidth. This includes deciding which packets to accept at what rate in an input interface and determining which packets to transmit in what order at what rate on an output interface. 

On Linux, applications can send the traffic control configuration to the kernel using a Netlink socket with  the NETLINK_ROUTE protocol. By default, traffic control on Linux consists of a single queue which collects entering packets and dequeues them as quickly as the underlying device can accept them. The tc tool (from the iproute2 package) or the more recent "nl-*" tools (part of libnl) are different implementations but they can both be used to configure traffic control. Libnl has an incomplete support for traffic control but is in active development and progressing quickly.

Difficulty of shaping ingress traffic

Traffic control and shaping comes in two forms, the control of packets being received by the system (ingress) and the control of packets being sent out by the system (egress). Shaping outgoing traffic is reasonably straight-forward, as the system is in direct control of the traffic sent out through its interfaces. Shaping incoming traffic is however much harder as the decision on which packets to sent over the medium is controlled by the sending side and can't be directly controlled by the system itself.

However, for multimedia devices, control over incoming traffic is far more important then controlling outgoing traffic. Our use-case is ensuring glitch-free playback of a media stream (e.g. internet radio). In such a case, essentially, a minimal amount of incoming bandwidth needs to be reserved for the media stream.

For shaping (or rather influencing or policing) incoming traffic, the only practical approach is to put a fake bottleneck in place on the local system and rely on TCP congestion control to adjust its rate to match the intended rate as enforced by this bottleneck. With such a system it's possible to, for example, implement a policy where traffic that is not important for the current media stream (background traffic) can be limited, leaving the remaining available bandwidth for the more critical streams.

On Linux, ingress traffic control ("ingress qdisc" on the graph) happens before the Netfilter subsystem:
By Jengelh (Own work, Origin SVG PNG) [CC-BY-SA-3.0], via Wikimedia Commons

Difficulty of shaping on mobile networks

However, to complicate matters further, in mobile systems which are connected wirelessly to the internet and have a tendency to move around it's not possible to know the total amount of available bandwidth at any specific time as it's constantly changing. Which means, a simple strategy of capping background traffic at a static limit simply can't work.

The implemented solution

To cope with the dynamic nature, a traffic control daemon (tcmmd) has been implemented which can dynamically update the kernel configuration to match the current needs of the playback applications and adapt to the current network conditions. Furthermore to address the issues mentioned above, the implementation will use the following strategy:

  • Split the traffic streams into critical traffic and background traffic. Police the incoming traffic by limiting the bandwidth available to background traffic with the goal of leaving enough bandwidth available for critical streams.
  • Instead of having static configuration, let applications (e.g. a media player) indicate when the current traffic rate is too low for their purposes. This both means the daemon doesn't have to actively measure the traffic rate and allows it cope with streams that don't have a constant bitrate more naturally.
Communication between the traffic control daemon and the applications is done via D-Bus. The  D-Bus interface allow applications to register critical streams by passing the standard 5-tuple (source ip and port, destination ip and port and protocol) which uniquely identify a stream and indicate when a particular stream bandwidth is too low.

To allow the daemon to effectively control the incoming traffic, a so-called Intermediate Functional Block device (ifb0) is used to provide a virtual network device to provide an artificial bottleneck. This is done by transparently redirecting the incoming traffic from the physical network device through the virtual network device and shape the traffic as it leaves the virtual device again. The reason for the traffic redirection is to allow the usage of the kernels egress traffic control to effectively be used on incoming traffic. The results in the example setup shown below (with eth0 being a physical interface and ifb0 the accompanying virtual interface).

To demonstrate the functionality as described above, a simple demonstration media application using Gstreamer (tcdemo) has been written that communicates with the Traffic control daemon in the manner described.

Testing, the set-up

The traffic control feature in tcdemo can be enabled or disabled on the command line. This allowed me to compare the behaviour in both cases. 

On my left, I have a web server serving both the files for a video stream and the files for background downloads. On my right, I have a multimedia device rendering a video stream while downloading other files on the same web server.

Traffic control is only useful when the available bandwidth is limited. In order to have meaningful tests, I simulated a low bandwidth with the following commands on the web server:
tc qdisc add dev wlan0 root handle 1: cbq avpkt 1000 bandwidth 10Mbit
tc class add dev wlan0 parent 1: classid 1:1 cbq rate 3Mbit allot 1500 prio 3 bounded isolated
tc filter add dev wlan0 parent 1: protocol ip u32 match ip protocol 6 0xff match ip sport 80 0xffff flowid 1:1

Only the traffic from port 80/http was limited. It is important to note that the background traffic and the stream traffic were both going through the same bottleneck.

Tcdemo was playing a video file streamed over http while 8 wgets were downloading the same file continuously. The 9 connections were competing for the limited bandwidth. Without traffic control, tcdemo would not have got enough bandwidth.

The following graph shows what happened with traffic control. The video streaming is composed of several phases:
  1. tcdemo opened the HTTP connection and its GStreamer pipeline started downloading. At the same time, tcmmd was notified there was a new stream connection and it restricted any potential background traffic to a very low limit. As long as the initial GStreamer queue was buffering, the background traffic limit did not change.
  2. The GStreamer queue became full at t=4s and the video started to be played on the screen. The daemon increased the limit on the background traffic exponentially and the stream bandwidth got reduced as a consequence.
  3. Despite the stream bandwidth degrading slowly, GStreamer managed to keep its queue over 75% full until t=25s. When the queue is more than 75% full, Gstreamer does not report it because tcdemo chose that threshold with the low-percent property on GstQueue2 (the graph shows 100% in this case). 
  4. At t=30s, the GStreamer queue was less than 70% full and that threshold triggered tcmmd to restrict the background traffic to its minimum.
  5. The stream could use most of the bandwidth and the GStreamer queue became full quickly at t=31s. The background traffic could start its exponential growth again.
traffic control stats
Thanks to traffic control, the GStreamer queue never got empty in my test.

Get the sources

git clone git://
git clone


Q: Do I need any privileges to run this?
A: No privileges required for tcdemo, the GStreamer application. But tcmmd needs CAP_NET_ADMIN to change the TC rules.

Q: The 5-tuple contains the TCP source port. How does the application know that number?
A: The application can either call bind(2) before connect(2) to choose a TCP source port, or call getsockname(2) after connect(2) to retrieve the TCP source port assigned automatically by the kernel. The former allows to install the traffic control rules before the call to connect(2) triggers the emission and reception of the first packets on the network. The latter means the first few packets will be exchanged without being shaped by the traffic control. Tcdemo implements the latter to avoid more invasive changes in the souphttpsrc GStreamer element and libsoup. See bgo#721807.

Q: What happens if an application forgets to unregister a 5-tuple when the video stream finishes?
A: That would be bad manners from the application. The current traffic control settings would remain.  And if the application notifies tcmmd that its buffer was empty and forgets to notify any changes, the background traffic would be severely throttled. However, if the application just terminates or crashes, tcmmd would notice it immediately on D-Bus and the traffic control rules would be removed.

Q: Does tcmmd remove its traffic control rules when terminated?
A: It depends how it is terminated. Tcmmd removes its traffic control rules on SIGINT and SIGTERM. But the rules remain in other cases (SIGSEGV, SIGKILL, etc.). If it is a problem in case of crash, tcmmd initialisation properly removes previous rules, so you could start tcmmd and interrupt it with ctrl-c.

Q: Instead of using the 5-tuple, why not using setsockopt-SO_MARK?
A: First, SO_MARK requires CAP_NET_ADMIN which is not something that media player should have. It could be worked around by fd-passing the socket to a more privileged daemon to call setsockopt-SO_MARK but it's not elegant. More importantly, tcmmd's goal is not to shape the egress traffic but the ingress traffic. The shaping of incoming packets is performed very early in the Linux network stack: it happens before Netfilter, and before the packet is associated to a socket. So we can't check the SO_MARK of a socket to shape incoming packets.

Q: Instead of using the 5-tuple, why not using cgroups?
A: The granularity of cgroups are only per-process. So the traffic control would not be able to distinguish between different HTTP connections in a web browser used to render a video stream and used for background downloads. And for the same reason as setsockopt-SO_MARK, it would not work for shaping ingress traffic: we would not be able to link the packet to any process or cgroup.

Q: Instead of sending the 5-tuple to tcmmd, why not set the IP type-of-service (TOS) on outgoing packets with setsockopt-SO_PRIORITY to avoid changes in the application and have an iptables target to feed that information about connections back to the ingress traffic control?
A: It could be possible if the bandwidth was fixed, but on mobile networks, the application needs to be changed anyway to give feedback when the queue in the GStreamer pipeline get emptied.

Q: Why not play with the TCP windows instead shaping the ingress traffic?
A: As far as I know, Linux does not have the infrastructure for that. The TCP windows to manipulate would not be from the GStreamer application but from all other connections, so it can't be done from userspace.

Q: Does tcdemo require any new feature in GStreamer?
A: Yes, souphttpsrc needs this patch: bgo#721807

Q: Does tcmmd require any new feature in the Linux kernel?
A: No.

Q: Does tcmmd work on several network interfaces (e.g. eth0 + wlan0)?
A: No, at the moment tcmmd only support one interface and it has to be started after the interface is up. Patches welcome!

Q: Tcmmd uses both libnl and /sbin/tc via system() calls. Why?
A: My goal is to use libnl and avoid spawning processes to call /sbin/tc. I just didn't have time to finish this. It will involve checking that libnl has the right features. Some needed features such as u32 action support were implemented recently in the last version.

Q: How did you get the graphs?
A: I used tcmmd's --save-stats option and the script tests/

Q: Why is there so frequent Netlink communication between tcmmd and the kernel?
A: One part of this is to gather regular statistics in order to generate graphs if the option --save-stats was used. The other part is for implementing the exponential progression of the bandwidth allocated to the background traffic: at regular interval, tcmmd changes the rate assigned to a qdisc. It could be avoided by implementing a specialised qdisc in the kernel for our use case. It would require more thinking how to design the API for that new qdisc.

Q: Does it work with IPv6?
A: No. The architecture is not specific to IPv4 but it is just not implemented yet for IPv6. Tcmmd would need to generate new TC rules because the IP headers are different between IPv4 and IPv6.

Thanks Sjoerd for the architecture diagram and proof-reading.

by Alban Crequy ( at March 19, 2014 02:11 PM

March 12, 2014

Neil McGovern

Entering the fray – 2014 DPL election

After a number of years in Debian, in various roles, I decided to put my name forward for election to Debian Project Leader. My platform has now been published, and campaigning is now under way!

Featured image is CC-BY by State Library of Victoria Collections

by Neil McGovern at March 12, 2014 12:42 PM

March 04, 2014

Luis de Bethencourt

Why Coding Is Fun

"First is the sheer joy of making things. As the child delights in his mud pie, so the adult enjoys building things, especially things of his own design. [...]

Second is the pleasure of making things that are useful to other people. Deep within, we want others to use our work and to find it helpful. [...]

Third is the fascination of fashioning complex puzzle-like objects of interlocking moving parts and watching them work in subtle cycles, playing out the consequences of principles built in from the beginning. [...]

Fourth is the joy of always learning, which springs from the non-repeating nature of the task. In one way or another the problem is ever new, and its solver learns something; sometimes practical, sometimes theoretical, and sometimes both.

Finally, there is the delight of working in such a tractable medium. The programmer [...] works only slightly removed from pure thought-stuff. He builds his castles in the air, from air, creating by exertion of the imagination. Few media of creation are so flexible, so easy to polish and rework,so readily capable of realizing gran conceptual structures.

Yet the program construct [...] is real in the sense that it moves and works, producing visible outputs separate from the construct itself.

From Fred Brooke's "Mythical Man Month"
Image from Hello Ruby

March 04, 2014 05:30 PM

February 26, 2014

Vincent Sanders

There are only two hard things in Computer Science: cache invalidation and naming things.

It is the first of these which I have recently been attempting and I think Phil Karlton might have a good point.

What are we talking about?

Web browsers are pretty complex beasties but the basic concept is pretty easy to understand. They fetch a bunch of files that make up a web page and render those source files into something suitable for human consumption.

It is also intuitive that fetching all the files that make up a web page, every time you browse to a new page, might be wasteful, especially if most of the files had not changed from the previous page.

In order to address this browsers hang onto copies of these files in case your browsing needs them again, this is known as a source file cache. Caches are a widely used technique throughout many aspects of computer technology but the basic idea is that they trade one resource for another.

In this case we are trading local storage space (memory or disc) for access time. This type of trade may give large benefits due to the large differences in access times (sometimes known as latency) between local and network resources.

For example, if the source files that make up a web page are a Megabyte in size accessed with a 2013 era PC with a 10Mbit connection to the Internet:
  • Accessing the data from main memory will take approximately 20µs (millionths of a second).
  • The same data retrieved from a hard disc drive will take 4,000µs to arrive, around 200 times longer.
  • If we were to retrieve the data from a fast web server and assuming the network connection is in perfect working order, the data will roll in 1,200,000µs later, or around 300 times more slowly than disc and a unfathomable 60,000 times slower than memory.
From this example it becomes startlingly obvious why a source cache is so desirable. I have greatly simplified the browsers operation here as they usually implement many layers of additional caching in other areas to make the browsing experience subjectively smoother.

And before the observant reader suggests that we could just make the network faster it ought to be mentioned that due to fundamental physical constraints, like the speed of light, it would be unlikely that the average practical network reply will ever drop much below 150,000µs [1] or some 40 times slower than disc. Still a worthwhile gain.

Knowing the cost of everything and the value of nothing

Having shown the benefits it might be nice to think there are no downsides to caching, this is not the case, as I mentioned we are trading storage for time and as in any trade one must expect there to be overheads.

In the case of a browser we face several obstacles. Firstly not every file that goes to make up a web page can be cached. The HTTP specification contains an entire section on caching which contains numerous rules and operations to ensure that only objects that should be cached are and determines for how long they can be kept before going "stale".

A great deal of this complexity comes from just how desirable caching is to network providers. Many ISPs will have web proxy servers to cache requests for all users to reduce their bandwidth usage, or increasingly, to implement content filtering according to the local government censorship rules. The browsers cache must know how to interact with these proxies without getting erroneous results.

Then comes the problem alluded to in the section title. A browsers cache cannot grow indefinitely. Eventually a system would run out of memory and disc if a browser kept every file. To deal with this the cache is limited sometimes by the number of files it holds, but more commonly by size both in memory and on disc.

The caching structure used by a web browser is hierarchical in that it will first evict (move) files from memory to a backing store (disc) and when the disc cache size limit is exceeded files are evicted to the next tier. Which in this case, will involve deleting the local copy (invalidating) and requiring performing a fetch from the network if the file is needed again.

Tardis at the Beeb by Sarah G from flikrWhen the cache size is exceeded the decision is made about what to remove from the cache using a cache strategy or algorithm. The aim of this decision is to discard the data that will not be required again for the longest time.

If we had a blue box with a madman inside, who we could persuade to bring our browsing history back from the future, implementing bélády's algorithm might be possible. Failing a perfect solution we are reduced to making a judgement based on the information available to us.

This involves computing a cost for replacing each entry in the cache and evicting those with the lowest values. The selected algorithm has a large impact on the effectiveness of the cache and often needs fine tuning for the task at hand.

Never has so little been measured so much

Having established that caching is useful and that we need to make decisions on what to keep it follows that those decisions must be based on information. This information consists of two main groups:
  • A series of metrics about the cache state such as the number of objects and their cumulative size currently being stored in memory.
  • Information about each object, known as metadata, this includes things like the objects individual size, how long until it becomes stale and when it was last used.
Maintaining all of this information is a significant part of the overheads involved with caching and a compromise must be struck between having the necessary information to make good caching choices and the cost maintaining that data.

Cache metrics

Most cache implementations maintain at least some basic metrics, at the very least how much storage is being used so a decision on if evicting entries from the cache to reclaim space is necessary.

It is also common to keep a record of how well the cache is performing. The cache efficiency (the hit rate) is usually measured as the ratio of successful accesses provided from the cache (a cache hit) to the total number of requests to the cache.

A perfect cache would have all hits and no misses (a hit rate of 1) while using as little storage as possible (but as already noted that requires a special madman). The hit rate can be used to automatically change the behaviour of the eviction algorithm, perhaps using more storage if the hit rate drops too low or changing the eviction frequency.


Each object must carry with it the data necessary to regain its state when retrieved from the cache. This information is essential for the correct operation of the cache but in of itself provides no direct value.

In the case of browser source data this includes all the headers sent in the original network fetch. These headers are themselves just metadata sent along with the object by the web server which must be preserved.

Two particularly useful value for making a good eviction decision are how long it has been since the object was last used and how many times it has been used. This is because an object that was cached a long time ago and then never used again is much less valuable than one that has recently been repeatedly used.

An Implementation

It is all very well talking about the general solution but a full understanding can only be gained by examining a real implementation. The NetSurf cache was chosen as it is a suitably small amount of code but still demonstrates all the major design features.


The cache deals with all source data and provides the browser with a single unified object request interface regardless of if the source data can be cached or not. This is useful to hide the implementation details allowing for changes without affecting the rest of the browsers code.

The implementation effectively consists of two lists of objects, those that can be cached and those that cannot. Each object on these lists is reference counted against one or more users and while there is at least one user the source data is maintained in memory.

Objects determined to be non cached are always directly fetched from their source and are immediately released once they have no users remaining.

For cacheable objects an attempt is made to fulfil the request in descending order:
  1. from objects already in memory.
  2. from the backing store.
  3. from a network fetch.
Irrespective of the source of the data the remaining operations on the object (determining freshness etc.) are exactly as if the request had been fulfilled from the network. This approach means the use of the cache is transparent to the object users.


Objects are placed in the backing store asynchronously to any other operations. When an object has been fetched from the network a background task is scheduled for when the browser is otherwise idle.

This writeout task constructs a list of all cacheable objects not yet in the backing store, associates a cost value with each object and then proceeds to write those objects to the backing store in highest to lowest value order.


A periodically scheduled task deals with ensuring the cache is cleaned. This consists of destroying stale objects and arranging fresh objects are not using more memory than the configuration permits.

Memory usage is reduced within the cleaning task by discarding the source data for objects already held in the backing store (where the data can be retrieved relatively cheaply)

Additional memory may be recovered by simply discarding unused objects held only in memory. These objects are usually those which have only recently become unused otherwise the writeout task would have committed them to the backing store.

The most recently used objects are statistically likely to be used again soon, because of this there is a high risk of a cache miss associated with discarding these objects. In order to mitigate this undesirable effect the cleaning heuristic favours using more memory than configured for short periods rather than discarding from memory.

Backing store

The backing store is a simple key-value database. Each source data object and its metadata (its headers etc.) is associated with a unique key (the URL used to fetch it). The backing store is simply required to store and retrieve the object data associated with the given URL.

The backing store implementation in NetSurf is pluggable, this flexibility is required for dealing with the greatly varying capabilities and limits of the various systems the browser is required to execute on.

A novel aspect of this backing store interface is that if an implementation returns the wrong object it is not considered an error and is simply treated as a cache miss. This is possible because the metadata contains the URL of the stored object enabling object verification.

This behaviour is permitted because techniques which significantly improve key-value store performance (principally key hashing) become available if they are not required to always give the correct answer.

The backing store is also required to manage its size within the configured limits and deal with any filesystem behaviour details.

The reference backing store is trivial in that it performs a hash operation on the input URL, encodes the result with base64url encoding and uses that as the object filename. The length of the hash can be configured allowing use of the reference implementation in any situation where using the filesystem as a database is acceptable.

The reference backing store default hash is SHA-1 yielding almost unique [2] 160 bit key values stored in much the same way as the git DVCS. Note we are not using this hash for its cryptographic properties and at worst a collision will result in the cache being a tiny amount less efficient.


Hopefully this has shown what a browser source file cache is, why it is useful and a basic understanding of how they are implemented.

I will admit I have glossed over some of the more challenging aspects especially in relation to actually implementing the cache strategy but I hope that the reader will forgive those omissions of detail in a quest for a little more clarity of the general principles.

I would like to thank John-Mark Bell for implementing the NetSurf cache and to Melodie Parry for proof reading and providing feedback.

[1] Speed of light is 3x10^8m/s earth circumference is 4x10^7m, divide circumference by speed gives 133,000µs which gives longest round trip time from anywhere on surface of earth to point furthest from. So assuming a whole megabyte can be transferred in one round trip and allowing for some overhead the lower bound estimate of 150,000µs looks reasonable.

[2] Mathematically speaking this hash would allow us to say that with the current estimated 1x10^11 URLs on the internet the likelihood of a collision is still vanishingly small (at least 1x10^-15)

by Vincent Sanders ( at February 26, 2014 09:04 PM

Neil McGovern

Valve reveal Portal 2 Linux beta, sponsor DebConf14, public issue tracking, oh my!

Valve have announced the availability of Portal 2 beta for Linux! It’s exciting to see such a fantastic game coming to the platform. Luckily, some developers can get it for free :) Interestingly, there’s also a public bug tracker, which also hosts the public bug tracker for a number of their linux engagements, and also quite a bit of code for things like voglperf, steam-runtime, halflife and the source-sdk.

Additionally, Valve’s logo also now appears as a gold sponsor for DebConf14, in Portland. A huge thanks I think to Valve for putting in the effort!

by Neil McGovern at February 26, 2014 12:18 PM

February 20, 2014

Luis de Bethencourt

OpenStack Swift and Temporary URLs How To

One of the many amazing features of Swift, OpenStack's object storage component, is how it creates URLs to provide temporary public access to objects. Thanks to the middleware component called tempurl. For example, imagine a website wants to provide a link to a media file stored in Swift for HTML5 playback. Instead of needing a Swift account with public access, and all the problems that would bring, Swift can generate a URL with limited access time to the resource. This way the browser can access the object directly instead of needing the website to act as a proxy and objects can selectively be made publicly accessible without compromising the rest or using expensive copies between private to public accounts. If the link is accidentally leaked, the access is time limited until the expiration time set.

So how do you use this temporary urls? Glad you asked.

Let's assume you have the following Swift installation:

Account: AUTH_data
Container: media
Object: example_obj

The first thing we need to do is add temporary URL secret keys to the Swift account. Since tempurl will look at the Temp-URL-Key and Temp-URL-Key-2 metas when an object is requested through a temporary URL to decide if access is allowed. Only one key is neccessary, but a second one is useful to rotate through keys while keeping existing temporary urls validated. Any arbitrary string can serve as a secret key.

We add the temporary urls secret keys with the command:
$ swift post -m "Temp-URL-Key:secrete_key_a"
$ swift post -m "Temp-URL-Key-2:secrete_key_b"

Once we have the keys we can allow public access to objects. The easiest way is to use the tool swift-temp-url which returns a temporary URL. The arguments it needs are; HTTP method, availability period in seconds, object path and one temp-url-key. For example to GET the object we mentioned above:

$ swift-temp-url GET 60 /v1/AUTH_data/media/example_obj secret_key_a


Now you can retrieve the file via this temporary URL, you just need to add the hostname to have the full URL.

$ curl

Eventhough you probably already guessed it, let's note the format of the temporary URL. Typical object path, plus temp_url_sig, and temp_url_expires.
In this example:

There you have it. Notice we set it up for a GET method. You could do the same with a PUT method to allow a user to push data to a specified path. Perhaps using this in combination with browser form post translation middleware to allow direct-from-browser uploads to specific locations in Swift. Besides GET and PUSH, you can use HEAD to retrieve the header of the object.

At this stage you must be happy to see how easy and convenient this is, but wondering how you can integrate it with your code. You can replicate swift-temp-url with the following block of Python code:

import hmac
from hashlib import sha1
from time import time

# Get a temporary public url for the object
method = 'GET'
duration_in_seconds = 60*60*3
expires = int(time() + duration_in_seconds)
path = '/v1/AUTH_test/%s/%s' % (self.container, track)
key = self.temp_url_key
hmac_body = '%s\n%s\n%s' % (method, expires, path)
sig =, hmac_body, sha1).hexdigest()
s = 'https://{host}/{path}?temp_url_sig={sig}&temp_url_expires={expires}'
url = s.format(host=self.url, path=path, sig=sig, expires=expires)

Have in mind that any alteration of the resource path or query arguments results in a 401 Unauthorized error. Similarly, a PUT where GET was the allowed method returns a 401. HEAD is allowed if GET or PUT is allowed. Changing the X-Account-Meta-Temp-URL-Key invalidates any previously generated temporary URLs within 60 seconds (the memcache time for the key).

That's all for now. What else would you like to learn regarding Swift?

February 20, 2014 08:17 PM

February 19, 2014

Neil McGovern

+++ BREAKING Debian 9.0 codename

A source close to the Debian Project has informed me that the project will break from it’s long running tradition of Pixar codenames for their releases.
Going further than this, in what is surely a contentious move, the project will also use multi-worded code names. This is as they feel jealous of the extra exposure that Ubuntu and Fedora have with double barrelled release names.
Thus, the next version of Debian will actually be called Debian 8dot0 “phoronoix-will-believe-anything”.

For those without a satirical radar, this may all be a complete lie

by Neil McGovern at February 19, 2014 05:34 PM

February 07, 2014

Trever Fischer

An experiment in space building

Greetings, fellow humans! Some people may have noticed a lack of FOSS-y contributions from me over the last year or so. A lot of that time went towards running the first full year of SYNHAK, the Hackerspace in Akron, OH that I helped found in 2011.

As it began to grow and take off, my interests in FOSS moved towards the infrastructure that actually makes a Hackerspace tick. Hackerspaces are wonderful things: autonomous collections of people where even the organization structure itself is designed to be resiliant and hackable. From the outset, I wanted SYNHAK to have a minimal bus factor. Being the first person to come with any kind of workable plan and convince some others to help it out, I had a significant hand in constructing the foundations that we maintain today.

Before SYNHAK, I was very involved in KDE and Linux spheres for a good number of years. FOSS is an amazing environment for new ideas to sprout and take hold, and a lot of that is due to the infrastructure supporting it: the tools, public and open documentation, instantly hackable sources, community governance, and more. I wanted to bring that to SYNHAK. I feel that for the most part, it has been highly successful on all of those fronts. I would argue that SYNHAK bears the right to be called an Open Source Hackerspace.

Forking SYNHAK

SYNHAK’s infrastructure is easily forkable. You can have your own local copy of SYNHAK up and running in a day, if you’ve got the time. I drew a lot of my design decisions from the FOSS world. Almost all of SYNHAK is available online in git. The rest is in a wiki. We are incredibly transparent with heaps of documentation, discussion, and documentation of discussion.

In particular, the ideas I brought from FOSS were:

  • The community governs itself
  • Release early, release often
  • Transparency in operations
  • If it didn’t happen on the mailing list, it didn’t happen
  • Meritocracy

Over the next few days, I will be publishing a series of articles I wrote on the various pieces of infrastructure I contributed to for SYNHAK during it’s first two full years of existence in an attempt to build an open source hackerspace from the ground up.

by Torrie Fischer at February 07, 2014 04:10 PM

February 04, 2014

Tomeu Vizoso

Multi-touch gestures in Mutter-based compositors

A customer has asked for documentation on handling multi-touch gestures in their Mutter-based compositor (see my previous post) and I thought that it could be a good idea to have it in the GNOME wiki, in case it helps when they are added to GNOME Shell:

Multi-touch gestures in Mutter-based compositors

I'm not sure of what would be the best use for multi-touch gestures in the Shell, probably for resizing windows (with a 4-finger pinch gesture) or for switching desktops (with a 3 or 4 finger swipe). Probably some ideas can be taken from the multi-tasking gestures in recent versions of iOS, such as using pinch gestures to activate hidden panels or to switch to another views.

Something I feel strongly is about restricting system-wide gestures to more than 3 fingers, because the user experience and the implementation gets quite complicated if the compositor and the applications need to compete for touch sequences in similar gestures.

It's currently a bit convoluted due to zero support in Mutter for touch events, but once the Shell starts using touch events, I think it will make sense to move some of the setup and boilerplate into Mutter.

Once more, thanks to my employer Collabora for sponsoring this work:

by Tomeu Vizoso ( at February 04, 2014 02:59 PM

January 31, 2014

Pekka Paalanen

Improving presentation on Wayland

In the last two (or three?) weeks at Collabora I have been looking into a Wayland protocol extension that would allow accurately timed presentation. Accurate timing is essential for two quite different use cases: video playback with audio/video synchronization, and interactive GUI with just-in-time redrawing. Video playback with A/V sync was our primary goal when we started working on this, and there is Frederic Plourde's first proposal from October 2013. Since then I have realized that also other kinds of applications need timings, especially feedback on when their content updates were shown, and when is the next chance to show an update (vblank). Primarily my re-design started with the aim to improve resizing performance when I got the assignment from Daniel Stone to push Wayland presentation protocol forward. The RFC v2 of Wayland presentation extension is now out for review and discussion.

I looked at various timing and content posting related APIs, like EGL and its extensions, GLX_OML_sync_control, and the new X11 Present and Keith's blog posts about it. I found a couple of things I could not understand what they were for and asked about them on a few mailing lists. The replies and further pondering resulted in a conclusion that I do not have to support the MSC modulus matching logic, and eglSwapInterval for intervals greater than one could be implemented if anyone really needs it.

I took a more detailed look at X11 Present for XWayland purposes. The Wayland presentation protocol extension I am proposing is not meant to solve all problems in supporting X11 Present in XWayland, but the investigation gave me some faith that with small additional XWayland extensions it could be done. Axel Davy is already implementing Present on core Wayland protocol as far as it is possible anyway, and we had lots of interesting discussions.

I am not going into any details of the RFC v2 proposal here, as the email should contain exhaustive documentation on the design. If no significant flaws are found, the next steps would be to implement this in Weston and see how it works.

by pq ( at January 31, 2014 12:03 PM

January 26, 2014

Javier Martinez Canillas

An update on Device Tree support for IGEP boards

Linux v3.13 has been released a couple of days ago and this is the first kernel version that has complete Device Tree support for IGEP boards.

Since the initial DT support that I talked before, the following peripherals were added:

- Ethernet
- NAND flash
- Wifi/BT combo (support added by Enric Balletbo)

The only remaining bit in DT is video since the DT bindings for the OMAP Display Sub-System (DSS) have not landed in mainline yet so display support is still provided using the OMAP platform data quirk infrastructure.

So, this make the IGEP the first OMAP3 device to complete the transition to Device Tree based booting and the board file could finally be removed.

The migration from board files to Device Tree booting was not trivial as I initially thought since OMAP GPMC and GPIO DT support was not mature enough so I had to add support for GPMC DT ethernet and fix some annoying bugs.

by Javier Martinez Canillas at January 26, 2014 06:30 PM

January 07, 2014

Vincent Sanders

Healthy discontent is the prelude to progress

Back in the mists of Internet time (2002) when spirits were brave, the stakes were high, men were real men, women were real women and small furry creatures from Alpha Centauri were real small furry creatures from Alpha Centauri there were few hosting options for a budding open source project.

Despite the meteoric rise of many dot com companies in the early noughties a project either had to go it alone by running everything themselves or pick from a small selection of companies that had not yet worked out how to turn free hosting into money.

One such company, arguably the first, was VA Research with their SourceForge system. Many projects used their platform including a small niche web browser called NetSurf. For years the service was great, there were rough edges but nothing awful.

Netsurf issue tracker in the new SourceForge interface
Over time the NetSurf projects requirements grew beyond what SourceForge could provide and service after service was migrated away eventually all that was left was the bug tracking system. This remained the state of affairs until mid 2013 when SourceForge forced a migration to their new platform which made them unsuitable for the projects use case.

Aside from the issue trackers questionable user interface SourceForge had started aggressively placing advertising throughout their platform. Some of the placements so inadvisable that projects started taking the decision to leave.

While I appreciate that SourceForge had to make money to provide a service they appear to have sown discontent within a large part of their user base without understanding that there are a number of alternatives solutions with a much less onerous funding model.

NetSurf used this as an opportunity to move the remaining issue tracking service to our own infrastructure. Rob Kendrick proceeded to evaluate several solutions and in December 2013 I finally found the time to migrate an XML dump of the old data from SourceForge into MantisBT.

Migrating data from one database into another via incompatible formats took me back to my roots. My early career started with programming tasks moving historical business data from ancient large systems which were about to be scrapped to modern Sparc based systems. Later I would be in a role where financial data needed to be retrieved from obsolete proprietary systems and moved into databases on x86 servers.

My experience in this field was not really stretched as it turns out that modern systems can process a few tens of megabytes in seconds rather than the days for a run in my youth! So some ugly perl scripts and a few hours later I had a nice shiny SQL database filled with NetSurfs bugs and MantisBT instance configured to use them.

NetSurf MantisBT instance showing most recently updated open bugs
Then came the hard bit, triaging all the open bugs, fixing up all the bugs submitted as an anonymous user but with email addresses in, removing duplicates and checking every open bug was still valid took almost two weeks of tedious drudge.

I set up an initial bug workflow within the system that the project developers are still fine tuning to better suit their needs but overall Mantis is proving a very flexible tool. The main deficiencies center around configuration for the projects useage, especially removing unused fields from filters and making the workflow more intuitive..

The resulting system is now getting bug reports submitted again where the sourceforge system had had three in the six months since the forced migration.

The issue tracker is once more a useful tool for the developers allowing us to focus on areas actually causing problems for our users and allowing us to see progress we are making fixing issues.

Overall this was a successful migration and provides a platform the NetSurf project can control where we can offer guarantees to our users about their personal information usage and having a clean rapid interface with no advertisements.

by Vincent Sanders ( at January 07, 2014 06:10 PM

January 06, 2014

Vincent Sanders

NetSurf Developer Workshop Redux

The NetSurf Developers bringing you alternative LOLcat viewing software since 2002Once again the NetSurf developers congregated in Cambridge at the Collabora offices where we were made welcome in a nice environment for the event.

Five developers managed to attend from around the UK Rob Kendrick, Vincent Sanders, Daniel Silverstone, John-Mark Bell and Michael Drake. We also had Chris Young and François Revol providing some bug fixes remotely.

This was the first time we had all met since the previous event towards the end of 2012 and we took full advantage of this to discuss a pretty extensive agenda in addition to the practical programming tasks.

From Friday lunchtime through to Sunday evening we managed 30 hours of work consisting of over 70 commits to over 100 files.

The whiteboard of our notes

Our main focus was working towards a 3.1 release which is scheduled for early April. Along with the source the release will have binary builds for RISC OS, AmigaOS, Windows and Mac OS X (x86 and ppc). Although the NetSurf project will not be directly releasing binaries for the GTK and Framebuffer frontends we will be ensuring the Debian packages are updated which is our prefered method of distribution for those targets.

We analysed the 3.0 release and formulated an improved process for the future. The 3.1 release will be generated automatically by the CI system ensuring constant results and removing the problems we encountered previously.

A set of release blocking issues was derived which we used as a task list during the workshop.The majority of these were completed including:
DOM based forms
Web forms are a feature Netsurf has supported for a long time and their implementation has not kept up with the rest of the browser. This is a long standing problem area which has resulted in numerous strange bugs with form submission. With this change the form system has been reworked to correctly operate directly from the DOM resulting in the squashing of a large number of bugs and a much improved user experience.

DOM based image loading
Up to now image fetching was performed only during the rendering of a page. With this change when the image link is placed into the DOM during the page parse it is scheduled to be fetched, this should give an improved user experience as images should be available earlier in a pages render.

Removal of MNG support
NetSurf has supported MNG since the 1.0 release, indeed the MNG library used to provide the PNG support too though we have long ago transitioned to libPNG. Alas the web has moved on and MNG has been largely forgotten, the libMNG library that performs the image decoding is old and generally unsupported specifically lacking security updates.

The build issues with libMNG (lack of pkg-config, reliance on libcms1 etc.) were causing maintenance issues in code nobody was actually using (there were crash bugs discovered during its removal!). Because of these issues it was decided to join the vast majority of browsers and remove support for this format.
The developers also addressed several issues with toolchain construction and a number of annoying usability bugs.

Plans for how to improve printing support we made. Initially we intend to fix the existing haru based pdf generation using this to print via pdf and in future have correct css styled page paginated printing render output.

The perennial issue of javascript was discussed however, while efforts to improve the existing support are ongoing, our usage of the spidermonkey library continues to raise various challenges including platform support and API changes between versions.

Due to these issues it has been suggested that we might add support for using the duktape JS engine instead, initial results are promising but given the size of the task of implementing an additional javascript engine binding further investigation is necessary before making a commitment.

Amongst the other discussions the group has also agreed that we will once again apply to be a GSoC organisation for a single student with some very focused projects:
  • Improving our HTML5 parser (hubbub)
  • Improving the DOM library implementing missing functionality.
While neither of these projects are as fashionable as some of our previous proposals they are well defined enough that as a group we believe we could offer enough support to the student to make their experience a pleasurable one and get the resulting code reviewed and merged promptly.

This event was very successful with a great deal achieved, the project is much more likely to be in a good shape to release 3.1 by April now and the meeting has given the developers a much welcomed boost.

I would like to extend the groups thanks to Robert McQueen for letting us use the Collabora offices, Dorée Carrier for organising all the administrative things and to Vivek Dasmohapatra for coming out on his Sunday afternoon to let us back in after locking ourselves out.

by Vincent Sanders ( at January 06, 2014 05:43 PM

December 14, 2013

Philip Withnall

Notes on Vala and automake

Here are some brief notes about getting automake’s Vala support1 to play nicely with you when writing a library, mostly for my own future benefit.

Firstly, use target-specific *_VALAFLAGS variables (rather than AM_VALAFLAGS) to specify the flags to pass to the valac compiler. Doing this enables a couple of dist and clean features in automake which wouldn’t otherwise be enabled. It’s also means you can specify multiple targets in a single, which is good for non-recursive automake.

When building a library, explicitly specify --vapi, --header, --internal-vapi, --internal-header and --gir (as appropriate) in the *_VALAFLAGS variable directly, rather than including them through another variable. automake looks for these flags in order to build dist and clean rules for the generated VAPI, C header and GIR files.

automake will automatically delete .stamp, VAPI and generated C header and source files on maintainer-clean, but not as part of any other clean target — this is because automake is designed to ensure generated C files are included in the distribution tarball. Failing to specify --header, --vapi (etc.) directly in *_VALAFLAGS will cause the appropriate maintainer-clean rules to not be generated.

automake knows nothing about .deps files (which accompany .vapi files), and will not generate dist rules for them; that must be done manually using:

vapidir = $(datadir)/vala/vapi
dist_vapi_DATA = my-library.deps

Source: lang_vala_finish_target() in the automake-1.13 script. Read it: it’s quite clear and informative.

  1. Note: That documentation seems to be a bit out of date regarding VALAFLAGS

by Philip Withnall at December 14, 2013 10:34 PM

December 11, 2013

Gustavo Noronha Silva

WebKitGTK+ hackfest 5.0 (2013)!

For the fifth year in a row the fearless WebKitGTK+ hackers have gathered in A Coruña to bring GNOME and the web closer. Igalia has organized and hosted it as usual, welcoming a record 30 people to its office. The GNOME foundation has sponsored my trip allowing me to fly the cool 18 seats propeller airplane from Lisbon to A Coruña, which is a nice adventure, and have pulpo a feira for dinner, which I simply love! That in addition to enjoying the company of so many great hackers.

Web with wider tabs and the new prefs dialog

Web with wider tabs and the new prefs dialog

The goals for the hackfest have been ambitious, as usual, but we made good headway on them. Web the browser (AKA Epiphany) has seen a ton of little improvements, with Carlos splitting the shell search provider to a separate binary, which allowed us to remove some hacks from the session management code from the browser. It also makes testing changes to Web more convenient again. Jon McCan has been pounding at Web’s UI making it more sleek, with tabs that expand to make better use of available horizontal space in the tab bar, new dialogs for preferences, cookies and password handling. I have made my tiny contribution by making it not keep tabs that were created just for what turned out to be a download around. For this last day of hackfest I plan to also fix an issue with text encoding detection and help track down a hang that happens upon page load.

Martin Robinson and Dan Winship hack

Martin Robinson and Dan Winship hack

Martin Robinson and myself have as usual dived into the more disgusting and wide-reaching maintainership tasks that we have lots of trouble pushing forward on our day-to-day lives. Porting our build system to CMake has been one of these long-term goals, not because we love CMake (we don’t) or because we hate autotools (we do), but because it should make people’s lives easier when adding new files to the build, and should also make our build less hacky and quicker – it is sad to see how slow our build can be when compared to something like Chromium, and we think a big part of the problem lies on how complex and dumb autotools and make can be. We have picked up a few of our old branches, brought them up-to-date and landed, which now lets us build the main WebKit2GTK+ library through cmake in trunk. This is an important first step, but there’s plenty to do.

Hackers take advantage of the icecream network for faster builds

Hackers take advantage of the icecream network for faster builds

Under the hood, Dan Winship has been pushing HTTP2 support for libsoup forward, with a dead-tree version of the spec by his side. He is refactoring libsoup internals to accomodate the new code paths. Still on the HTTP front, I have been updating soup’s MIME type sniffing support to match the newest living specification, which includes specification for several new types and a new security feature introduced by Internet Explorer and later adopted by other browsers. The huge task of preparing the ground for a one process per tab (or other kinds of process separation, this will still be topic for discussion for a while) has been pushed forward by several hackers, with Carlos Garcia and Andy Wingo leading the charge.

Jon and Guillaume battling code

Jon and Guillaume battling code

Other than that I have been putting in some more work on improving the integration of the new Web Inspector with WebKitGTK+. Carlos has reviewed the patch to allow attaching the inspector to the right side of the window, but we have decided to split it in two, one providing the functionality and one the API that will allow browsers to customize how that is done. There’s a lot of work to be done here, I plan to land at least this first patch durign the hackfest. I have also fought one more battle in the never-ending User-Agent sniffing war, in which we cannot win, it looks like.

Hackers chillin' at A Coruña

Hackers chillin’ at A Coruña

I am very happy to be here for the fifth year in a row, and I hope we will be meeting here for many more years to come! Thanks a lot to Igalia for sponsoring and hosting the hackfest, and to the GNOME foundation for making it possible for me to attend! See you in 2014!

by kov at December 11, 2013 09:47 AM

December 09, 2013

Luis de Bethencourt

Happy Birthday Grace Hopper

Link to video for people in planets.

"She developed the first compiler for a computer programming language. She conceptualized the idea of machine-independent programming languages, which led to the development of COBOL, one of the first modern programming languages. She is credited with popularizing the term "debugging" for fixing computer glitches (inspired by an actual moth removed from the computer)." Wikipedia

By the way, for anybody wondering why the COBOL in the Google Doodle for her today:
"SUBTRACT CurrentYear from BirthYear GIVING Age
Display Age"
doesn't return a negative value, it is because age is defined unsigned in the data division. Since ages can't be negative, and COBOL handles this negative return to an unsigned value Gracefully.

December 09, 2013 07:30 PM

Philip Withnall

Clang plugin for GLib and GNOME

This past week, I’ve been working on a Clang plugin for GLib and GNOME, with the aim of improving static analysis of GLib-based C projects by integrating GIR metadata and assumptions about common GLib and GObject coding practices. The idea is that if running Clang for static analysis on a project, the plugin will suppress common GLib-based false positives and emit warnings for common problems which Clang wouldn’t previously have known about. Using it should be as simple as enabling scan-build, with the appropriate plugin, in your JHBuild configuration file.

I’m really excited about this project, which I’m working on as R&D at Collabora. It’s still early days, and the project has a huge scope, but so far I have a Clang plugin with two major features. It can:

  • Load GIR metadata1 and use it to add nonnull attributes based on the presence (or absence) of (allow-none) annotations.
  • Detect g_return_if_fail(), assert(), g_return_val_if_fail() (etc.) preconditions in function bodies and add nonnull attributes if the assertions are non-NULL checks (or are GObject type checks which also assert non-nullability).

Additionally, the plugin can use GIR metadata to add other attributes:

  • Add deprecated attributes based on Deprecated: gtk-doc tags.
  • Add warn_unused_result attributes to functions which have a (transfer container) or (transfer full) return type.
  • Add malloc attributes to functions which are marked as (constructor).
  • Constify the return types of (transfer none) functions which return a pointer type.

By modifying Clang’s abstract syntax tree (AST) for a file while it’s being scanned, the plugin can take advantage of the existing static analysis for NULL propagation to function calls with nonnull attributes, plus its analysis for variable constness, use of deprecated functions, etc.

So far, all the code is in two ‘annotaters’, which modify the AST but don’t emit any new warnings of their own. The next step is to implement some ‘checkers’, which examine (but don’t modify) the AST and emit warnings for common problems. Some of the problems checked for will be similar to those handled above (e.g. warning about a missing deprecated attribute if a function has a Deprecated: gtk-doc tag), but others can be completely new, such as checking that if a function has a GError parameter then it also has a g_return_if_fail(error == NULL || *error == NULL) precondition (or something along those lines).

I’ve got several big ideas for features to add once this initial work is complete, but the next thing to work on is making the plugin effortless to use (or, realistically, nobody will use it). Currently, the plugin is injected by a wrapper script around Clang, e.g.:

cd /path/to/glib/source
GNOME_CLANG_GIRS="GLib-2.0 Gio-2.0" gnome-clang -cc1 -analyze … /files/to/analyse

This injects the -load $PREFIX/lib64/clang/3.5/ -add-plugin gnome arguments which load and enable the plugin. By specifying GIR namespaces and versions in GNOME_CLANG_GIRS, those GIRs will be loaded and their metadata used by the plugin.

This can be configured in your ~/.jhbuildrc with:

static_analyzer = True
static_analyzer_template = 'scan-build --use-analyzer=$JHBUILD_PREFIX/bin/gnome-clang -v -o %(outputdir)s/%(module)s'
os.environ['GNOME_CLANG_GIRS'] = 'GLib-2.0 Gio-2.0 GnomeDesktop-3.0'

Please note that it’s alpha-quality code at the moment, and may introduce program bugs if used during compilation, so should only be used with Clang’s -analyze option. Furthermore, it still produces some false positives, so please be cautious about submitting bug reports to upstream projects ‘fixing’ problems detected by static analysis.

The code so far is here: and there’s a primitive website here: If you’ve got any ideas, bug reports, patches, or just want to criticise my appalling C++, please e-mail me: philip {at}

  1. Actually, it loads the metadata from GIR typelib files, rather than using the source code annotations themselves, so there is a degree of data loss incurred due to the limitations of the typelib binary format. 

by Philip Withnall at December 09, 2013 04:04 PM

November 29, 2013

Tollef Fog Heen

Redirect loop with (and how to fix it)

I'm running a local unbound instance on my laptop to get working DNSSEC. It turns out that with the captive portal NSB (the Norwegian national rail company), this doesn't work too well and you get into an endless series of redirects. Changing resolv.conf so you use the DHCP-provided resolver stops the redirect loop and you can then log in. Afterwards, you're free to switch back to using your own local resolver.

November 29, 2013 07:37 AM

November 24, 2013

Xavier Claessens

Folks2 progress

I am working on Folks2 when I have a bit of free time. In short it is a rewrite from scratch of Folks in C with a daemon/db to remember who is merged with who. The goal is to be faster and being able to load individuals separately instead of having to load them all.

Recently, I’ve been writing libfolks2-roster, a GTK library providing widgets to display a contact list and contact details. It is using awesome GTK 3.10 GtkListBox, GtkStack and GtkSearchBar. The main issue I’m facing now is GtkListBox slowness, it really needs a model to create only visible widgets.

(Yes it is ubuntu theme, I’m usually using unity but video recording didn’t work)

by xclaesse at November 24, 2013 05:48 PM

November 22, 2013

Luis de Bethencourt

Basic Data Structures and Algorithms in the Linux Kernel

Thanks to Vijay D'Silva's brilliant answer in I have been reading some of the famous data structures and algorithms used in the Linux Kernel. And so can you.

Links based on linux 2.6:

November 22, 2013 08:00 PM

Vincent Sanders

Error analysis is the sweet spot for improvement

Although Don Norman was discussing designers attitude to user errors I assert the same is true for programmers when we use static program analysis tools.

The errors, or rather defects in the jargon, that a static analysis tools produce can be considered low cost well formed bug reports available very early in the development process.

When I say low cost it is because they can be found by a machine without a user or fellow developer wasting their time finding them. Well formed comes because the machine can describe exactly how it came to the logical deduction leading to the defect.


Static analysis is in general terms using the computer to examine a program for logic errors beyond those of pure syntax before it is executed. Examining a running program for defects is known as dynamic program analysis and while a powerful tool in its own right is not the topic of discussion.

This analysis has historically been confined to compiled languages as their compilers already had the Abstract Syntax Tree (AST) of the code available for analysis. As an example the C language (released in 1972) had the lint tool (released in 1979) based on the PCC compiler.

Practical early compilers (I am generalising here as the 19070s were a time of white hot innovation in computing and examples of just about any innovation in the field could probably be found) were pretty primitive and produced executables which were less good than hand written assembler output. Due to practical constraints the progress of optimising compilers was not as rapid as might be desired so static analysis was largely used as an external process.

Before progressing I ought to explain why I just mixed the concept of an optimising compiler and static analysis. The act of optimisation within those compilers requires program analysis, from which they can generate defect reports which we all know and love as compiler warnings, also explaining why many warnings only appear at higher optimisation levels where deeper analysis is required.

The attentive reader may now enquire as to why we would need external analysis tools when our compilers already perform the task. The answer stems from the issue that a compiler is trying to reconcile many desirable traits including:
  • Produce correct (expected) output from the source code for the target processor 
  • Produce output which will execute using the smallest amount of resources possible (not just CPU time but memory access and cache usage)
  • Generate output in a reasonable amount of time. 
  • Have a reasonable cost (both developer time and research into new methods) to implement the compiler itself.
  • Produce useful diagnostics
The slow progress in creating optimising compilers initially centred around the problem of getting the compiled output in a reasonable time to allow for a practical edit-compile-run-debug cycle although the issues more recently have moved more towards the compiler implementation costs.

Because the output generation time is still a significant factor compilers limit the level of static analysis performed to that strictly required to produce good output. In standard operation optimising compilers do not do the extended analysis necessary to find all the defects that might be detectable. 

An example: compiling one 200,000 line C program with the clang (v3.3) compiler producing x86 instruction binaries at optimisation level 2 takes 70 seconds but using the clang based scan-build static analysis tool took 517 seconds or more than seven times as long.

Using static analysis

As already described warnings are a by-product of an optimising compilers analysis and most good programmers will endeavour to remove all warnings from a project. Thus almost all programmers are already using static analysis to some degree.

The external analysis tools available can produce many more defect reports than the compiler alone as long as the developer is prepared to wait for the output. Because of this delay static analysis is often done outside the usual developers cycle and often integrated into a projects Continuous Integration (CI) system.

The resulting defects are usually presented as annotated source code with a numbered list of logical steps which shows how the defect can present. For example the steps might highlight where a line of code allocates memory from the heap and then an exit path where no reference to the allocated memory is kept resulting in a resource leak.

Once the analysis has been performed and a list of defects generated the main problem with this technology rears its ugly head, that of so called "false positives". The analysis is fundamentally an undecidable problem (it is a variation of the halting problem) and relies on algorithms to generate approximate solutions. Because of this some of the identified defects are erroneous.

The level of erroneous defect reports varies depending on the codebase being analysed and how good the analysis tool being used is. It is not uncommon to see false positive rates, even with the best tools, in excess of 10%

Good tools allow for this and provide ways to supply additional context through model files or hints in the source code to suppress the incorrect defect reports. This is analogous to using asserts to explicitly constrain  variable values or a type cast to suppress a type warning.

Even once the false positives have been dealt with there comes the problem of defects which while they may be theoretically possible take so many steps to achieve that their probability is remote at best. These defects are often better categorized as a missing constraint and the better analysis tools generate fewer than the more naive implementations.

An issue with some defect reports is that often defects will appear in a small number of modules within programs, generally where the developers already know the code is of poor quality, thus not adding useful knowledge about a project.

As with all code quality tools static analysis can be helpful but is not a panacea code may be completely defect free but still fail to function correctly.

Defect Density

A term that is often used as a metric for code quality is the defect density. This is nothing more than the ratio of defect to thousands of lines of code e.g. a defect density of 0.9 means that there is approximately one defect found in every 1100 lines of code.

The often quoted industry average defect density value is 1, as with all software metrics this can be a useful indicator but should not be used without understanding.

The value will be affected by improvements in the tool as well as how lines of code are counted so is exceptionally susceptible to gaming and long term trends must be treated with scepticism.

Practical examples

I have integrated two distinct static analysis tools into the development workflow for the NetSurf project which I shall present as case studies. These examples show a good open source solution and a commercial offering highlighting the issues with each.

Several other solutions, both open source and commercial, exist many of which have been examined and discarded as either impractical or proving less useful than those selected. However the investigation was not comprehensive and only considered what was practical for the project at the time.


The clang project is a frontend to the LLVM project providing an optimising compiler for the C, C++ and objective C languages. As part of this project the compiler has been enhanced to run a collection of "checkers" which implement various methods of analysis on the code being compiled.

The "scan-build" tool is provided to make the using these features straightforward. This tool generates defect reports as a series of html files which show the analysis results.

NetSurf CI system scan-build overview
Because the scan-build takes in excess of eight minutes on powerful hardware the NetSurf developers are not going to run this tool themselves as a matter of course. To get the useful output without the downsides it was decided to integrate the scan into the CI system code quality checks.

NetSurf CI system scan-build result list
Whenever a git commit happens to the mainline branch and the standard check build completes successfully on all target architectures the scan is performed and the results are published as a list of defects.

The list is accessible directly through the CI interface and also incorporates a trend graph showing how many defects were detected in each build.

A scan-build report showing an extremely unlikely path to a defect
Each defect listed has a detail link which reveals the full analysis and logic necessary to cause the defect to occur.

Unfortunately even NetSurf which is a relatively small piece of software (around 200,000 lines of code at time of writing) causes 107 defects to be emitted by scan-build.

All but 25 of the defects are however "Dead Store" where the code has a value assigned but is never checked. These errors are simply not interesting to the developers and are occurring in code generated by a tool.

Of the remaining defects identified the majority are false positives and several (like the example in the image above) are simply improbable requiring a large number of steps to reach.

This shows up the main problem with the scan-build tool in that there is no way to suppress certain checks, mark defects as erroneous or avoid false positives using a model file. This reduces the usefulness of these builds because the developers all need to remember that this list of defects is not relevant.

Most of the NetSurf developers know that the project currently has 107 outstanding issues and if a code change or tool improvement were to change that value we have to manually work through the defect list one by one to check what had changed.


The coverity SAVE tool is a commercial offering from a company founded in the Computer Systems Laboratory at Stanford University in Palo Alto, California. The results of the original novel research has produced a good solution which improved on analysis tools previously available.

Coverity Interface showing summary of NetSurf analysis. Layout issues are a NetSurf bug
The company hosts a gratis service for open source projects, they even provide scans for the Linux kernel so project size does not appear to be an issue.

The challenges faced integrating the coverity tool into the build process differed from clang however the issue of execution time remained and the CI service was used.

The coverity scanning tool is a binary executable which collects data on the build which is then submitted to the coverity service to be analysed. This tool obviously relies upon the developer running the executable to trust coverity to some degree.

A basic examination of the binary was performed and determined the executable was not establishing network connections or performing and observably undesirable behaviour. From this investigation the decision was made that running the tool inside a sandbox environment on a CI build slave was safe. The CI system also submits the collected results in a compressed form directly to the coverity scan service.

Care must be taken to only submit builds according to the services Acceptable Use Policy which limits the submission frequency of NetSurf scans to every other day. To ensure the project stays within the rules the build performed by the CI system is manually controlled and confined to a subset of NetSurf developers.

Coverity connect defect management console for NetSurfThe results are presented using the coverity connect web technology based defect management tool. Access to the coverity connect interface is controlled by a user management system which precludes publicly publishing the results within the CI system.

Unfortunately NetSurf itself does not currently have good enough JavaScript DOM bindings to support this interface so another browser must be used to view it.

Despite the drawbacks the quality of the analysis results is greatly superior to the clang solution. The false positive rate is very low while finding many real issues which had not been previously detected.

The analysis can be enhanced by use of collection configuration and modelling files which remove intended constructions from consideration reducing the false positive rate to very low levels. The ability to easily and persistently suppress false positives through the web interface is also available.

The false positive management capabilities coupled with a user interface that makes understanding the defect path simple make this solution very practical and indeed the NetSurf developers have removed over 50 actual issues within a relatively short period since the introduction of the tool.

Not all of those defects could be considered serious but they had the effect of encouraging deeper inspection of some very dubious smelling source.


The principle conclusions of implementing and using static analysis have been:

  • It is a powerful tool which aids programmers in improving their software. 
  • It is not a panacea and bad code can have no defects.
  • It can suggest possible defects early in the development cycle.
  • It can highlight possibly problematic areas well before they affect a programs users.
  • The tool and the infrastructure around it have a large impact on the usefulness of the results.
  • The way results are presented has disproportionately significant impact on the usability of the defect reports.
  • The open source tools are good, and improving, but coverity currently provides a superior experience.
  • Integration into a projects CI system is beneficial.
When I started looking at this technology I was somewhat dubious about its usefulness but I have definitely changed my mind. It is a useful addition to any non-trivial project and the return on time and effort should be repaid handsomely in all but already perfect code (if you believe you have such code I have a bridge to sell you).

by Vincent Sanders ( at November 22, 2013 03:25 PM

November 18, 2013

Pekka Paalanen

Sub-surfaces. Now.

Wayland sub-surfaces is a feature that has been brewing for a long long time, and finally it has made it into Wayland core in a recent commit + the Weston commit. The design for sub-surfaces started some time in December 2012, when the task was given to me at Collabora. It went through several RFCs and was finally merged into Weston in May 2013. After that there have been only small changes if any, and sub-surfaces matured (Or was forgotten? I had other things to do.) over several months. Now it is coming out in Wayland 1.4 (plan), but what is it really?


The basic visual (and UI) building block in Wayland (the protocol) is a wl_surface. Basically everything on screen is represented as wl_surfaces in the protocol: mouse cursors, windows, icons, etc. A surface gets its content and size by attaching a wl_buffer to it, which is a handle to a pixel container. A surface has many attributes, like the input region: the region of the surface where it can receive input events. Input events, e.g. pointer motion, that happen on the surface but outside of the input region get directed to what is below the surface. The input region can be empty, but it cannot extend beyond the surface dimensions.

It so happens, that cursor, shell surface (window), and drag icon are also surface roles. Under a desktop shell, a surface cannot become visible (mapped) unless it has a role, and it fills the requirements of that particular role. For example, a client can set a cursor surface only when it has the pointer focus. Without a role the compositor would not know what do with a surface. Roles are exclusive: a surface can have only one role at a time. How a role is assigned depends on the protocol for the particular role, there is no generic set_role-interface.

A window is a wl_surface with a suitable shell role, there is no separate object type "window" in the protocol. A window being a single wl_surface means that its contents must come from a single wl_buffer at a time. For most applications that is just fine, but there are few exceptions where it makes things less than optimal when you want to take advantage of hardware acceleration features to the fullest.

The problem

Let us consider a video player in a window. Window decorations and GUI elements are usually rendered in an RGB color format on the CPU. Video usually decodes into some YUV color format. To create one complete wl_buffer for the window, the application must merge these: convert the video into RGB and combine it with the GUI elements. And it has to do that for every single video frame, whether the GUI elements change or not. This causes several performance penalties. If your graphics card is capable of showing YUV-formatted content directly in an overlay, you cannot take advantage of that. If you have video decoding hardware, you probably have to access and copy the produced YUV images with the CPU, while doing a color conversion. Getting CPU access to a hardware rendered buffer may be expensive to begin with, and then color conversion means you are doing a copy. When you finally have that wl_buffer finished and send it to the compositor, the compositor will likely just have to upload it to the GPU again, making another expensive copy. All this hassle and pain is just to get the GUI elements and the video drawn into the same wl_buffer.

Another example is an OpenGL window, or an OpenGL canvas in a window. You definitely do not want to make the GL rendered buffer CPU-accessible, as that can be very expensive. The obvious workaround is to upload your other GUI elements into textures, and combine them with the GL canvas in GL. That could be fairly performant, but it is also very painful to achieve, especially if your toolkit has not been designed to work like that.

A more complex example is a Web browser, where you can have any number of video and GL widgets around the page.

Enter sub-surfaces

Sub-surface is a wl_surface role, that means the surface is an integral sub-part of a window. A sub-surface must always have a parent surface, and the parent surface can have any role. Therefore a window can be constructed from any number of wl_surface objects by choosing one of them to be the main surface which gets a role from the shell, and others are sub-surfaces. Also nesting is allowed, so you can have sub-sub-surfaces etc.

The tree of sub-surfaces starting from the main surface defines a window. The application sets the sub-surface's position on the parent surface, and the compositor will keep the sub-surface glued to the parent. The compositor does not clip sub-surfaces to the parent surface. This means you could implement decorations as four surfaces around the content surface, and compared to one big surface for decorations, you avoid wasting memory for the part that will always be behind the content surface. (This approach may have a visual downside, though.) It also means, that for window management purposes, the size of the window comes from the union of the whole (sub-)surface tree.

In the windowed video player example, the video can be put on a wl_surface of its own, and the decorations into another. If there are sub-titles on top of the video, that could be a third wl_surface. If the compositor accepts the YUV color format the video decoder produces, you can decode straight into a wl_buffer's storage, and attach that wl_buffer to the wl_surface. No more copying or color conversions in the application. When the compositor gets the YUV buffer, it could use GLSL shaders to convert it into RGBA while it composites, or put the buffer into a hardware overlay directly. In the overlay case, the data produced by the (hardware) video decoder gets scanned out on the graphics chip zero-copy! After decoding, the data is not copied or converted even once, which is the optimal path. Of course, in practice there are many implementation details to get right before reaching the optimal path.


Updates to one wl_surface are made atomic with the commit request. A tree of sub-surfaces needs to be updated atomically, too. This is important especially in resizing a window.

A sub-surface's commit request acts specially, when the sub-surface is in synchronized mode. A commit on the sub-wl_surface does not immediately apply the pending surface state, but instead the pending state is cached. The cache is just another copy of the surface state, in addition to the pending and current sets of state. The cached state gets applied when the parent wl_surface gets new state applied (Note: not straight on the parent surface's commit, but when it gets new state applied.) Relying on the cache mechanism, an application can submit new state for the whole tree of surfaces, and then apply it all with a single request: commit on the main surface.

Input handling considerations

When a window has sub-surfaces completely overlapping with its main surface, it is often easiest to set the input region of all sub-surfaces to empty. This will cause all input events to be reported on the main surface, and in the main surface coordinates. Otherwise the input events on a sub-surface are reported in the sub-surface's coordinates.

Independent application sub-modules

A use case than was strongly affecting the design of the sub-surface protocol was application plugin level embedding. An application creates a wl_surface, turns it into a sub-surface, and gives control of that wl_surface to a sub-module or a plugin.

Let us say the plugin is a video sink running in its own thread, and the host application is a Web browser. The browser initializes the video sink and gives it the wl_surface to play on. The video sink decodes the video and pushes frames to the wl_surface. To avoid waking up the browser for every video frame and requiring it to commit on its main surface to let each video frame become visible, the browser can set the sub-surface to desynchronized mode. In desynchronized mode, commits on the sub-surface apply the pending state directly, just like without the sub-surface role. The video sink can run on its own. The browser is still able to control the sub-surface's position on the main surface, glitch-free.

However, resizing gets more complicated, which was also a cause for some criticism. When the browser decides it needs to resize the sub-surface the video sink is using, it sets the sub-surface to synchronized mode temporarily, which means the video on screen stops updating, as all surface state updates now go into the cache. Then the browser signals the new size to the video sink, and the sink acknowledges when it has committed the first buffer with the new size. In the mean time, the browser has repainted its other window parts as needed, and then commits on its main surface. This produces an atomic window update on screen. Finally the browser sets the sub-surface back to the free-running mode. If all goes fast, the result is a glitch-free resize without missing a frame. If things take time, the user still sees a window resize without any flickers, but the video content may freeze for a moment.

Multiple input handlers

It is possible that sub-modules want to handle input on their wl_surfaces, which happen to be sub-surfaces. Sub-modules may even create new wl_surfaces, regardless whether they will be part of the sub-surface tree of a window or not. In such cases, there are a couple of catches.

The first catch is, that when input focus moves to a sub-surface, the input events are given in that surfaces coordinates, like said before.

The bigger catch is how input actually targets surfaces in the client side code. Actual input events for keyboards and pointer devices do not carry the target wl_surface as a parameter. The targeted surface is given by enter events, wl_pointer.enter(surface) for instance. In C code, it means a callback with the following signature gets called:
void pointer_enter(void *data, struct wl_pointer *wl_pointer, uint32_t serial, struct wl_surface *surface, wl_fixed_t surface_x, wl_fixed_t surface_y)
You get a struct wl_surface* saying which surface the following pointer events will target. I assume, that toolkits will call wl_surface_get_user_data(surface) to get a pointer to their internal structure, and then continue with that.

What if the wl_surface is not created by the toolkit to begin with? What if the surface was created by a sub-module, or a sub-module unexpectedly set a non-empty input region on a sub-surface? Then, get_user_data will give you a pointer which points to something else that you thought, and the application likely crashes.

When a toolkit gets an enter event for a surface it does not know about, it must not try to use the user_data pointer. I see two obvious ways to detect such surfaces: maintain a hash table of known wl_surface pointers, or use a magic value in the beginning of the struct used as user_data. Neither is nice, but I do not see a way around it, and this is not limited to sub-surfaces or sub-sub-surfaces. Enter events may refer to any wl_surface objects created through the Wayland connection.

Therefore I would propose the following:
  • Always be prepared to receive an unknown wl_surface on enter and similar events.
  • When writing sub-modules and plugin interfaces, specify whether input is allowed, and whose responsibility is to set the input region to empty.

Out of scope

When I started designing the sub-surface protocol, a huge question was what to leave out of it. The following are not provided by sub-surfaces:
  • Embedding content from other Wayland clients. The sub-surface extension does not implement any "foreign surface" interfaces, or anything like what X allows by just taking the Window XID and passing it to another client to use. The current consensus seems to be that this should be solved by implementing a mini-compositor in the hosting application.
  • Clipping or scaling. The buffer you attach to a sub-surface will decide the size of the sub-surface. There is another extension coming for clipping and scaling.
  • Any kind of message passing between application components. That is better solved in application specific ways.


Sub-surfaces are intended for special cases, where you need to build a window from several buffers that are composited together, to make efficient use of the hardware resources. They are not meant for widgets in general, nor for pushing parts of application rendering to the compositor. Sub-surfaces are also not meant for things that are not integral parts of a window, like tooltips, menus, or drop-down boxes. These "transient" surface types should be offered by the shell protocol.

Thanks to Collabora, reviewers on wayland-devel@ and in IRC, my work colleagues, and everyone who has helped me with this. Special thanks to Giulio Camuffo for testing the decorations in 4 sub-surfaces use case. I hope I didn't forget anyone.

by pq ( at November 18, 2013 03:28 PM

November 13, 2013

Luis de Bethencourt

Minimalist illustrations reflective of Modern time

Technology Pride

Inner Child

Icons that screw up our day

more at

November 13, 2013 07:16 PM

November 04, 2013

Philip Withnall

libfolks: BlueZ backend for phone address book access

libfolks now has a backend for accessing address book data from Bluetooth-enabled phones, making use of BlueZ 5. This has been a long time in development, with various developers from Collabora contributing to it (thanks to Matthieu Bouron, Gustavo Padovan, Arun Raghavan and Jeremy Whiting!).

It works using the Bluetooth Phone Book Access Profile (PBAP), which allows a computer to download a phone’s address book as an array of vCards. libfolks will automatically do this for any phone which is paired with the computer (and which isn’t blocked), establishing a connection with any (paired, non-blocked) phone in range.

We’ve only been able to test this with a limited number of phones, so further real-world testing and feedback would be greatly appreciated. To do so, grab the latest version of libfolks from git (, build it and run folks-inspect (making sure that the ‘bluez’ backend is not disabled in ~/.local/share/folks/backends.ini, if you’ve created and customised that file).

As is the design of libfolks, this is not intended as a solution for synchronising contacts between your phone and your computer, but it does provide a handy way of accessing contacts on your phone from GNOME Shell or Contacts, or from other consumers of libfolks data.

Note that the backend only supports BlueZ 5 and explicitly doesn’t support BlueZ 4.

by Philip Withnall at November 04, 2013 02:14 PM