Planet Collabora

October 31, 2014

Philip Withnall

Introduction to ICE and libnice

As part of the series of tea time talks we do within Collabora, I recently got to refresh my knowledge of STUN, TURN and ICE (the protocols for NAT traversal) and give an introductory talk on how they all fit together within the context of libnice.

Since the talk might be useful (and perhaps even interesting) to a wider audience, I’ve made it available: slides, handout and source (git). It’s under CC-BY-SA 4.0. Please leave comments if anything is unclear, incorrect, or could do with more in-depth coverage!

by Philip Withnall at October 31, 2014 10:02 AM

Recent improvements in libnice

For the past several months, Olivier Crête and I have been working on a project using libnice at Collabora, which is now coming to a close. Through the project we’ve managed to add a number of large, new features to libnice, and implement hundreds (no exaggeration) of cleanups and bug fixes. All of this work was done upstream, and is available in libnice 0.1.8, released recently! GLib has also gained a number of networking fixes, API additions and documentation improvements.

tl;dr: libnice now has a GIOStream implementation, support for scatter–gather send and receive, and more mature pseudo-TCP support — so its API should be much nicer to integrate; GLib has gained a number of fixes.

Firstly, what is libnice? It’s a GLib implementation of ICE, the standard protocol for NAT traversal. Briefly, NAT traversal is needed when two hosts want to communicate peer-to-peer in a network where there is at least one NAT translator between them, meaning that at least one of the hosts cannot directly address the other until a mapping is created in the NAT translator. This is a very common situation (due to the shortage of IPv4 addresses, and the consequence that most home routers act as NAT translators) and affects virtually all peer-to-peer communications. It’s well covered in the literature, and the rest of this post will assume a basic understanding of NAT and ICE, a topic about which I recently gave a talk.

Conceptually, libnice exists just to create a reliable (TCP-like) or unreliable (UDP-like) socket which connects your host with a remote one in a manner that traverses any intervening NATs. At its core, it is effectively an implementation of send(), recv(), and some ancillary functions to negotiate the ICE stream at startup time.
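As a concrete sketch of that startup negotiation (heavily simplified: the STUN server address is made up, error handling is omitted, and the out-of-band candidate exchange with the peer is elided), bringing up an agent looks roughly like this:

#include <agent.h>    /* libnice */

GMainContext *main_context = NULL;  /* i.e. the default main context */
NiceAgent *agent;
guint stream_id;

/* Create an agent and point it at a (hypothetical) STUN server. */
agent = nice_agent_new (main_context, NICE_COMPATIBILITY_RFC5245);
g_object_set (agent,
              "stun-server", "198.51.100.1",
              "stun-server-port", 3478,
              NULL);

/* One stream with two components, as in the examples below. */
stream_id = nice_agent_add_stream (agent, 2);
nice_agent_gather_candidates (agent, stream_id);

/* …exchange local and remote candidates with the peer out of band, wait
 * for the component states to reach READY, then send and receive. */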

The biggest change is the addition of nice_agent_get_io_stream(), and the GIOStream subclass it returns. This allows reliable ICE streams to be used via GIOStream, with all the API sugar which comes with GIO streams — for example, g_output_stream_splice(). Unreliable (UDP-like) ICE streams can’t be used this way because they’re not technically streams.
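For instance (a sketch reusing the agent above, assuming its reliable stream is established and eliding error handling), pushing a whole local file out over component 1 becomes a splice:

GIOStream *stream = nice_agent_get_io_stream (agent, stream_id, 1);
GFile *file = g_file_new_for_path ("data.bin");  /* hypothetical file */
GFileInputStream *file_istream = g_file_read (file, NULL, &error);

/* Splice the file into the ICE stream’s output stream. */
g_output_stream_splice (g_io_stream_get_output_stream (stream),
                        G_INPUT_STREAM (file_istream),
                        G_OUTPUT_STREAM_SPLICE_CLOSE_SOURCE,
                        NULL, &error);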

Highly related, the original receive API has been augmented with scatter–gather support in the form of a recvmmsg()-like API: nice_agent_recv_messages(). Along with appropriate improvements to libnice’s underlying socket implementations (the most obscure of which are still to be plumbed in), this allows performance improvements by batching messages, reducing the number of system calls needed for communication. Furthermore (perhaps more importantly) it reduces memory copies when assembling and parsing packets, by allowing the packets to be split across multiple non-contiguous buffers. This is a well-studied and long-known performance technique in networking, and it’s nice that libnice now supports it.

So, if you have an ICE connection (stream 1 on the agent, with 2 components) exchanging packets with 20-byte headers and variable-length payloads, instead of:

nice_agent_attach_recv (agent, 1, 1, main_context, recv_cb, NULL);
nice_agent_attach_recv (agent, 1, 2, main_context, recv_cb, NULL);

…

static void
recv_cb (NiceAgent *agent, guint stream_id, guint component_id,
         guint len, const gchar *buf, gpointer user_data)
{
    if (stream_id != 1 ||
        (component_id != 1 && component_id != 2)) {
        g_assert_not_reached ();
    }

    if (parse_header (buf)) {
        if (component_id == 1)
            parse_component1_data (buf + 20, len - 20);
        else
            parse_component2_data (buf + 20, len - 20);
    }
}

…

static void
send_to_component (guint component_id,
                   const gchar *data_buf, gsize data_len)
{
    gsize len = 20 + data_len;
    guint8 *buf = g_malloc (len);

    build_header (buf);
    memcpy (buf + 20, data_buf, data_len);

    if (nice_agent_send (agent, 1, component_id,
                         len, (const gchar *) buf) != len) {
        /* Handle the error */
    }

    g_free (buf);
}

you can now do:

/* Only set up 1 NiceInputMessage as an illustration. */

static guint8 buf1_1[20];  /* header */
static guint8 buf1_2[1024];  /* payload size limit */
static GInputVector buffers1[2] = {
    { &buf1_1, sizeof (buf1_1) },  /* header */
    { &buf1_2, sizeof (buf1_2) },  /* payload */
};
static NiceInputMessage messages[1] = {
    { buffers1, G_N_ELEMENTS (buffers1), NULL, 0 },
};
gint n_messages, i;
GError *error = NULL;

n_messages = nice_agent_recv_messages (agent, 1, 1, messages,
                                       G_N_ELEMENTS (messages),
                                       NULL, &error);
if (n_messages <= 0) {
    /* Handle the EOS or error. */
    if (error != NULL)
        g_error ("Error: %s", error->message);
    return;
}

/* Component 2 can be handled similarly and code paths combined. */
for (i = 0; i < n_messages; i++) {
    NiceInputMessage *message = &messages[i];

    if (parse_header (message->buffers[0].buffer)) {
        parse_component1_data (message->buffers[1].buffer,
                               message->buffers[1].size);
    }
}

…

static void
send_to_component (guint component_id, const gchar *data_buf,
                   gsize data_len)
{
    GError *error = NULL;
    guint8 header_buf[20];
    GOutputVector vec[2] = {
        { header_buf, sizeof (header_buf) },
        { data_buf, data_len },
    };
    NiceOutputMessage message = { vec, G_N_ELEMENTS (vec) };

    build_header (header_buf);

    if (nice_agent_send_messages_nonblocking (agent, 1, component_id,
                                              &message, 1, NULL,
                                              &error) != 1) {
        /* Handle the error */
        g_error ("Error: %s", error->message);
    }
}

libnice has also gained non-blocking variants of its I/O functions. Previously, you had to explicitly attach a libnice stream to a GMainContext to start receiving packets, and packets were delivered individually via a callback function (set with nice_agent_attach_recv()), which was inefficient and made for awkward control flow. Now, the non-blocking I/O functions can be used with a custom GSource from g_pollable_input_stream_create_source(), following the standard GLib pattern: attach the GSource to the GMainContext, and in its callback call g_pollable_input_stream_read_nonblocking() until all pending packets have been read. libnice’s internal timers (used for retransmit timeouts, etc.) are automatically added to the GMainContext passed to nice_agent_new() at construction time, which you must keep running at all times, as before.

GIOStream *stream = nice_agent_get_io_stream (agent, 1, 1);
GInputStream *istream;
GPollableInputStream *pollable_istream;
GSource *source;

istream = g_io_stream_get_input_stream (stream);
pollable_istream = G_POLLABLE_INPUT_STREAM (istream);

source = g_pollable_input_stream_create_source (pollable_istream, NULL);
g_source_set_callback (source, (GSourceFunc) readable_cb,
                       pollable_istream, NULL);
g_source_attach (source, main_context);

static gboolean
readable_cb (gpointer user_data)
{
    GPollableInputStream *pollable_istream = user_data;
    GError *error = NULL;
    gssize len;
    guint8 buf[1024];  /* whatever the maximum packet size is */

    /* Read packets until the queue is empty. */
    while ((len = g_pollable_input_stream_read_nonblocking (pollable_istream,
                                                            buf, sizeof (buf),
                                                            NULL,
                                                            &error)) > 0) {
        /* Do something with the received packet. */
    }

    if (error != NULL) {
        /* Handle the error. */
        g_clear_error (&error);
    }

    return G_SOURCE_CONTINUE;
}

libnice also gained much-improved support for restarting individual streams using ICE restarts, with the addition of nice_agent_restart_stream(), and for switching TURN relays, with nice_agent_forget_relays(), plus a number of bug fixes.

Finally, FIN/ACK support has been added to libnice’s pseudo-TCP implementation. The code was originally based on Google’s libjingle pseudo-TCP, establishing a reliable connection over UDP by encapsulating TCP-like packets within UDP. This implemented the basics of TCP, but left things like the closing FIN/ACK handshake to higher-level protocols. Fine for Google, but not for our use case, so we added support for that. Furthermore, we needed to layer TLS over a pseudo-TCP connection using GTlsConnection, which required implementing half-duplex close support and fixing a few nasty leaks in GTlsConnection.
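That combination looks roughly like this (a sketch: certificate validation, error handling and the server side are all elided, and the stream is assumed to be a reliable one):

GIOStream *base = nice_agent_get_io_stream (agent, stream_id, 1);
GIOStream *tls;
GError *error = NULL;

/* Wrap the pseudo-TCP stream in TLS; NULL server identity for brevity. */
tls = g_tls_client_connection_new (base, NULL, &error);

if (tls != NULL &&
    g_tls_connection_handshake (G_TLS_CONNECTION (tls), NULL, &error)) {
    /* Use g_io_stream_get_input_stream() and
     * g_io_stream_get_output_stream() on tls as with any GIOStream. */
}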

Thanks to the libnice community for testing out the changes, and thanks to the GLib developers for patiently reviewing the stream of tiny documentation fixes and several larger GLib patches! All of the libnice API changes are shown on the handy upstream-tracker.org tool.

by Philip Withnall at October 31, 2014 09:50 AM

October 28, 2014

Jeremy Whiting

Accessibility is alive (QtSpeech progress, Jovie's deprecation)

For some time I've been considering what to do about Jovie, which was previously known as ktts (KDE Text To Speech). I've been considering it since before the first KDE Frameworks release, actually, as kdelibs used to host the definition of the KSpeech D-Bus interface that ktts, and then Jovie, implemented. I have a Qt 5 frameworks branch of Jovie, but it didn't make much sense to port it, since a lot of it is, or could become, part of the upcoming QtSpeech module. So Jovie has no official Qt 5 port, and won't be getting one either.

What will Okular, KNotify, and other applications that want to speak to users do instead? The answer is QtSpeech. QtSpeech is a project started by Frederik Gladhorn to bring speech APIs to all the platforms that Qt supports. It is still in its infancy, but it is quickly improving. A few weeks ago, when I built my KDE Frameworks 5 stack with kdesrc-build, I noticed that kdepim(libs?) was depending on it even though it hasn't been released yet, so I got motivated to send some improvements to qt-project. Frederik and Laurent Montel have been pushing fixes and improving it also. It is as easy to use as the KSpeech D-Bus API, if not easier (and it doesn't require D-Bus either), and it can already be used to speak text on Linux/Unix, OS X, Windows, and Android. If you are an expert on any of these platforms, please send patches to implement the API in their backends; the more eyes on this project, the faster we can get it solidified and released.

You may be asking but what about feature X in Jovie that I will miss desperately. Yes there are a few things that QtSpeech will not do that Jovie did. These will either need to be done in individual applications or we can create a small framework to add these features (or possibly add them to QtSpeech itself if they make sense there). The features I'm thinking of are:

1. Filtering - Changing "jpwhiting: Hey QtSpeech is really coming along now" to "jpwhiting says 'Hey QtSpeech is really coming along now'" for KNotifications and the like. This could likely be implemented easily in knotify itself and exposed in the notifications configuration dialog.
2. Voice switching - Changing which voice to use based on the text, or the application it is coming from or anything else. This might make sense in QtSpeech itself, but time will tell if it's a wanted/needed feature.
3. User configuration - Jovie had a decent (not ideal, but functional) UI to set some voice preferences, such as which voice you want to use, and which pitch, volume, speed, gender, etc. This will become the only part of Jovie that gets ported: a KDE Control Module for speech-dispatcher settings. This may also change over time, as speech-dispatcher itself possibly grows a UI for its settings.

All in all, progress is being made. I expect QtSpeech to be ready for release with Qt 5.5, but we'll see what happens.

by Jeremy Whiting (noreply@blogger.com) at October 28, 2014 10:05 PM

Jonny Lamb

Sciopero

screenshot

Public transport strikes in Rome are so frequent that it’s hard to remember when they are. I wrote a Gnome Shell extension to help remind me when there’s one either coming up or in progress. Find it on extensions.gnome.org. It gets its data from another little service I just made.


In Rome, public transport strikes are so frequent that it's often easy to forget when they happen. I wrote a GNOME Shell extension to warn you when an Atac strike is in progress or coming up. You can find it on extensions.gnome.org. It works thanks to another little service I created.

by Jonny Lamb at October 28, 2014 11:11 AM

October 18, 2014

Jeremy Whiting

Upcoming KDE Applications 14.12 release prep

Hello,

In preparing for the upcoming releases of KDE Applications 14.12 (2014, month 12), I realized the other day that we have an interesting situation. For Qt 4 based applications there's libkdeedu, which contains the kvtml parsing and manipulation code and also a handful of .kvtml files that KAnagram and KHangMan use to get their word lists. KAnagram has been ported to Qt 5 and KDE Frameworks for some time now, and will have its first Qt 5 based release at the end of this year. It uses libkeduvocdocument, which was ported to Qt 5 at about the same time this year (libkdeedu was split up for the Qt 5 based releases). libkeduvocdocument uses Qt 5 and KDE Frameworks 5, and also ships the same handful of kvtml files that libkdeedu ships. KHangMan hasn't yet been ported to Qt 5 and Frameworks (or at least the port isn't stable yet), so it will still depend on libkdeedu, as will KWordQuiz and Parley, from what I understand.

So we have two libraries that ship the same files, which makes them not co-installable. To solve this, we'll be moving the kvtml files out of the libraries and into kdeedu-data soonish. The moral of this story is to look around and see what will be released using Qt 5 in the upcoming release, and what will still be using Qt 4. https://community.kde.org/Frameworks/Application-release-status-December-2014 may help too. If you maintain an application and haven't put it on that page under the Qt 4 or Qt 5 tables yet, please do; the more we coordinate, the better this release will be.

Thanks, and keep up the good work all.

by Jeremy Whiting (noreply@blogger.com) at October 18, 2014 09:22 PM

October 11, 2014

Jeremy Whiting

Simple Elegance

I just noticed a couple of features today and yesterday in plasma next and kwin that I appreciate and wanted to thank whoever thought of adding them. Both are simple but very handy to have. I'm talking about the little X buttons on both the wallpaper configuration dialog and the kwin present windows effect. I don't use either of these features very often, but yesterday when I was testing knewstuff with the wallpaper config it was very handy to be able to delete the installed wallpaper from the wallpaper selection dialog. Then just now it was very handy to be able to close extra windows I had open that I no longer need when I was in the present windows effect looking at what I need to be doing next. Makes it very simple to clean up a workspace.

Just throwing this out there, thanks whoever added these simple nice features.

by Jeremy Whiting (noreply@blogger.com) at October 11, 2014 08:18 PM

October 07, 2014

Jeremy Whiting

Plasma Next Improvements and KApplication -> QApplication gotchas

tl;dr: if you port from KApplication to QApplication, remove the %i from the Exec lines of your .desktop files.

Hey all, so I've been running Plasma Next on my main development machine for a month or two now and have definitely enjoyed the speed at which improvements come. Just in time for the Plasma 5.1 release, timezone support was added to the digital clock, so you can choose which timezones you want to see, use your mouse scrollwheel to change which one is visible, and so on. Also, because our translators are awesome, we got a last-minute change in after the string freeze (with their permission) to show all the configured timezones in the clock's tooltip (similar, but not identical, to how it worked in kdelibs4-based Plasma). I enjoy the new Breeze theme and icons, and the alternatives system for switching between different k-menus, different task managers (I'm currently enjoying the icons-only one), etc. is so handy.

In other news, something that had been bothering me for a while is that Okular just didn't want to be launched from any visual launcher. Clicking on a PDF in Dolphin acted like it was launching, but no Okular UI would appear. Clicking "view book" on PDF books in Calibre would do likewise. So I spent a couple of days adding debug messages to kinit and klauncher, trying to figure out what was going on. Kate launches just fine, so I tried copying kate.desktop to okularApplication_pdf.desktop, replacing kate with okular, etc., and that worked fine also.

So today I asked Albert if he had any ideas, and got thinking it had to be something in the .desktop files themselves. So I uncommented another qDebug line in klauncher that said exactly what it was asking kinit to start, and found it was using "/usr/local/bin/okular blah.pdf --icon okular". So I tried the same from a terminal and found that Okular's binary failed to launch because it doesn't understand the --icon parameter. A bit of digging found that KApplication handled that argument, while QApplication doesn't, and the frameworks port of Okular, like a good example, has already been ported from KApplication to QApplication.
Klauncher puts --icon blah in when you have %i in the Exec line of the .desktop file. So if you port from KApplication to QApplication, be sure to remove the %i from the .desktop files of your application also.
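For example (an illustrative Exec line only, not a complete .desktop file):

# Before: klauncher expands %i to "--icon okular", which KApplication parsed
Exec=okular %U %i
# After porting to QApplication, which doesn't understand --icon
Exec=okular %U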

by Jeremy Whiting (noreply@blogger.com) at October 07, 2014 10:56 PM

October 06, 2014

Alban Crequy

Improving the security of D-Bus

In the last months, I have been working on improving the security of D-Bus, mainly to make it more resistant to denial of service attacks. This work was sponsored by Collabora.

Eight security issues were discovered, fixed and assigned a CVE. They were found by looking at the source code (in D-Bus and in Linux's af_unix implementation), checking existing issues in the D-Bus bugzilla, and a bit of luck.

Security issues fixed in D-Bus

  • CVE-2014-3477 (Bug #78979): dbus-daemon sent an AccessDenied error to the service instead of the client when the client was prohibited from accessing the service, which allowed local users to cause a denial of service (initialization failure and exit) or possibly conduct a side-channel attack via a D-Bus message to an inactive service.
  • CVE-2014-3532 (Bug #80163): when running on Linux 2.6.37-rc4 or later, local users could cause a denial of service (system-bus disconnect of other services or applications) by sending a message containing a file descriptor, then exceeding the maximum recursion depth before the initial message is forwarded.
  • CVE-2014-3533 (Bug #80469): dbus-daemon allowed local users to cause a denial of service (disconnect) via a certain sequence of crafted messages that caused the dbus-daemon to forward a message containing an invalid file descriptor.
  • CVE-2014-3635 (Bug #83622): an off-by-one error in dbus-daemon allowed remote attackers to cause a denial of service (dbus-daemon crash) or possibly execute arbitrary code by sending one more file descriptor than the limit, which triggered a heap-based buffer overflow or an assertion failure.
  • CVE-2014-3636 (Bug #82820): a denial-of-service vulnerability in dbus-daemon allowed local attackers to prevent new connections to dbus-daemon, or to disconnect existing clients, by exhausting descriptor limits.
  • CVE-2014-3637 (Bug #80559): malicious local users could create D-Bus connections to dbus-daemon which could not be terminated by killing the participating processes, resulting in a denial-of-service vulnerability.
  • CVE-2014-3638 (Bug #81053): dbus-daemon suffered from a denial-of-service vulnerability in the code which tracks which messages expect a reply, allowing local attackers to reduce the performance of dbus-daemon.
  • CVE-2014-3639 (Bug #80919): dbus-daemon did not properly reject malicious connections from local users, resulting in a denial-of-service vulnerability.

Other fixes

In addition to fixing specific bugs, I also explored ideas to restrict the number of D-Bus connections a process or a cgroup could create. After discussions, those ideas were not retained upstream. But while working on cgroups, my patch for parsing /proc/pid/cgroup was accepted in Linux 3.17.

Identify bogus D-Bus match rules

D-Bus security issues are not all in dbus-daemon: they could be in applications misusing D-Bus. One common mistake made by applications is to receive a D-Bus signal and handle it without checking that it was really sent by the expected sender. It seems impossible to check the code of all applications potentially using D-Bus in order to see if such a mistake is made. Instead of looking at the code of random applications, my approach was to add a new method, GetAllMatchRules, to dbus-daemon to retrieve all match rules, and to look for suspicious patterns. For example, a match rule for NameOwnerChanged signals that does not filter on the sender of such signals is suspicious, and it is worth checking the source code of the application to see if it is legitimate. With this method, I was able to fix bugs in BlueZ, ConnMan, Pacrunner, Ofono and Avahi.

GetAllMatchRules is released in dbus 1.9.0, and it is now possible to try it without recompiling D-Bus to enable the feature. I have used a script to tell me which processes register suspicious match rules. I would like there to be a way to do that in a graphical interface. It's not ready yet, but I have started a patch in D-Feet.
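If you want to poke at it yourself, a minimal GDBus client looks something like this (a sketch: I am assuming the method lives on the org.freedesktop.DBus.Debug.Stats interface alongside the other statistics methods, and I print the raw reply rather than relying on its exact signature):

#include <gio/gio.h>

int
main (void)
{
  GError *error = NULL;
  GDBusConnection *bus;
  GVariant *reply;
  gchar *str;

  bus = g_bus_get_sync (G_BUS_TYPE_SESSION, NULL, &error);
  if (bus == NULL)
    g_error ("Could not connect to the bus: %s", error->message);

  reply = g_dbus_connection_call_sync (bus,
                                       "org.freedesktop.DBus",
                                       "/org/freedesktop/DBus",
                                       "org.freedesktop.DBus.Debug.Stats",
                                       "GetAllMatchRules",
                                       NULL,  /* no arguments */
                                       NULL,  /* any reply type */
                                       G_DBUS_CALL_FLAGS_NONE,
                                       -1, NULL, &error);
  if (reply == NULL)
    g_error ("GetAllMatchRules failed: %s", error->message);

  /* Dump the whole reply; each entry maps a connection to its rules. */
  str = g_variant_print (reply, TRUE);
  g_print ("%s\n", str);

  g_free (str);
  g_variant_unref (reply);
  g_object_unref (bus);
  return 0;
}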

by Alban Crequy (noreply@blogger.com) at October 06, 2014 04:19 PM

October 01, 2014

Philip Withnall

Dynamic relocs, runtime overflows and -fPIC

While merrily compiling something a little while ago, my linker threw me this gem of an error message (using GNU gold):

error: libmumble.a(libmumble.o): requires dynamic R_X86_64_PC32 reloc against 'g_strdup' which may overflow at runtime; recompile with -fPIC

or, if you’re using GNU ld (the two linkers have different error messages for the same problem):

error: mumble.o: relocation R_X86_64_PC32 against symbol `g_strdup' can not be used when making a shared object; recompile with -fPIC

I recompiled everything with -fPIC, and magically the problem went away. But I didn’t understand why. I finally got a bit of time to investigate, so here we go.

tl;dr: This is caused by linking a shared library (which requires position-independent code, PIC) to a static library (which has not been compiled with PIC). You need to either link the shared library against a shared version of the static code (such as is produced automatically by libtool), or re-compile the static library with PIC enabled (-fPIC or -fpic).

To understand this, we need a brief introduction to the different types of linking, and how static objects and libraries differ from shared (or dynamic) objects and libraries. Let’s run with a minimal working example: two C files, shared.c and static.c. static.c is compiled to a static archive, libstatic.a (without position-independent code, PIC), and shared.c is compiled to a shared object, libshared.so, which links against libstatic.a.
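The two source files can be as trivial as the following (a plausible reconstruction; the exact bodies don't matter, only that shared.c calls into static.c, as the disassembly below shows):

/* static.c — built without PIC, archived into libstatic.a */
void
my_static_function (void)
{
}

/* shared.c — built into libshared.so, linked against libstatic.a */
void my_static_function (void);

void
my_shared_function (void)
{
    my_static_function ();
}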

What is a static object? It’s one where all symbol references are resolved at compile time. What’s a dynamic object? One where symbol references can be resolved at runtime. This means that dynamic objects have to have relocations performed as they’re loaded, which incurs a load-time penalty, but allows for shared libraries and symbol interpositing.

It is these relocations which cause the problem hinted at by the error message above. Each relocation is effectively a note to the runtime loader instructing it to replace a symbol reference in the dynamic object being loaded, with an address calculated at load time.

There are various types of relocations, defined by the platform ABI, as they are specific to the processor’s instruction set. For a more in-depth account of them, see Relocations, Relocations by Michael Guyver. In this case, the R_X86_64_PC32 relocation was chosen by the compiler, which is defined by the AMD64 ABI (Table 4.10). What does that mean? Each relocation type is essentially a mathematical function to define the address of a relocated symbol, given the information in various symbol, section and relocation tables in the dynamic object. The ABI defines R_X86_64_PC32 as \(S+A-P\). Less succinctly, it is the offset of the referenced symbol, plus a constant adjustment (the addend) minus the offset of the relocation. This is all explained brilliantly by Michael Guyver on his blog.

So, with our example, we get the error:

$ make libshared.so
cc -Wall -c -o shared.o shared.c
cc -Wall -c -o static.o static.c
ar rcs libstatic.a static.o
cc -shared -o libshared.so shared.o libstatic.a
/usr/bin/ld: error: shared.o: requires dynamic R_X86_64_PC32 reloc against 'my_static_function' which may overflow at runtime; recompile with -fPIC
collect2: error: ld returned 1 exit status
make: *** [libshared.so] Error 1

If we look at the disassembly of the shared object:

$ objdump -d shared.o

shared.o:     file format elf64-x86-64


Disassembly of section .text:

0000000000000000 <my_shared_function>:
   0:    55                       push   %rbp
   1:    48 89 e5                 mov    %rsp,%rbp
   4:    e8 00 00 00 00           callq  9 <my_shared_function+0x9>
   9:    5d                       pop    %rbp
   a:    c3                       retq

we can see at offset 4 that the callq instruction (calling my_static_function()) leaves 4 bytes for the address of the function to call (actually, callq is instruction-pointer-relative, so the 4 bytes are for the offset of the function from the RIP register).

As the code in libstatic.a is not PIC, it has to be loaded at a fixed offset in a process’ address space. The shared library, libshared.so, must be capable of being loaded anywhere in an address space. This would be fine if the callq instruction could take an absolute address to call, as the linker could substitute in the absolute address of my_static_function() (as is done on 32-bit systems). However, it cannot – it only has 4 bytes of operand to play with, rather than the 8 needed for a 64-bit address – so linking has to fail. And that’s why we get an error which talks about overflow.

What happens if libstatic.a is compiled with PIC enabled? Not a whole lot changes, actually. The disassembly of libstatic.a remains unchanged. shared.o gains a global offset table (GOT) section and its relocation for the my_static_function() call changes from R_X86_64_PC32 to R_X86_64_PLT32 — a procedure linkage table (PLT) relocation using the GOT. We can see that in action in the disassembly of the successfully-linked libshared.so (with irrelevant bits omitted):

$ objdump --disassemble libshared.so 

libshared.so:     file format elf64-x86-64


Disassembly of section .plt:

00000000000005f0 <my_static_function@plt>:
 5f0:    ff 25 fa 13 00 00        jmpq   *0x13fa(%rip)        # 19f0 <_GLOBAL_OFFSET_TABLE_+0x28>
 5f6:    68 02 00 00 00           pushq  $0x2
 5fb:    e9 c0 ff ff ff           jmpq   5c0 <_init+0x20>

Disassembly of section .text:

00000000000006e8 <my_shared_function>:
 6e8:    55                       push   %rbp
 6e9:    48 89 e5                 mov    %rsp,%rbp
 6ec:    e8 ff fe ff ff           callq  5f0 <my_static_function@plt>
 6f1:    5d                       pop    %rbp
 6f2:    c3                       retq   
 6f3:    90                       nop

00000000000006f4 <my_static_function>:
 6f4:    55                       push   %rbp
 6f5:    48 89 e5                 mov    %rsp,%rbp
 6f8:    5d                       pop    %rbp
 6f9:    c3                       retq   
 6fa:    66 90                    xchg   %ax,%ax

Firstly, the callq instruction in my_shared_function() has acquired a non-zero operand. This is a constant offset from the instruction pointer at that instruction which references the entry for my_static_function() in the PLT, which we can see as my_static_function@plt in the .plt section. Rather than being the code for the my_static_function(), this is actually a ‘trampoline’ which loads the address of my_static_function() from the GOT, then jumps to it. The GOT is set up by the runtime loader, and allows for the address of my_static_function() to be changed; for example when relocating it, or when interpositing a different version using LD_PRELOAD. By default, the GOT entry for my_static_function() will point to the implementation in the .text section, as linked in from libstatic.a.
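As a quick sanity check of the relocation arithmetic (my own working, applying the ABI's formula for R_X86_64_PLT32, \(L+A-P\), to the disassembly above): the PLT entry is at \(L = \mathtt{0x5f0}\); the 4-byte operand field sits at \(P = \mathtt{0x6ed}\) (one opcode byte after the instruction start at 0x6ec); and the addend is \(A = -4\), because the displacement is relative to the next instruction rather than to the operand field. That gives

\(L + A - P = \mathtt{0x5f0} - 4 - \mathtt{0x6ed} = -257 = \mathtt{0xfffffeff}\)

which is exactly the little-endian operand ff fe ff ff of the callq at 0x6ec: at runtime the processor computes 0x6f1 - 257 = 0x5f0 and lands on my_static_function@plt.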

This trampolining through a PLT and GOT is the standard solution for producing position independent code, and demonstrates three things:

  • Exported functions incur a runtime cost (in the PLT) on every call. This can be eliminated for private symbols (or internal calls to public symbols, with -Bsymbolic), but not (easily) for public ones, as explained by Ian Lance Taylor. This cost is only three instructions; as they change control flow, they could be relatively expensive, but are probably also catered specifically for in modern superscalar 64-bit processors, as the majority of the code they execute will do indirect function calls this way. So the cost can be safely ignored for all but rather specific use cases.
  • Position independent code is easy to achieve, and the indirection it requires brings other benefits like the ever-useful LD_PRELOAD, used by developer tools everywhere.
  • Marking internal functions as static is important, because ELF exports functions by default, so internal function calls end up being indirected through the PLT if you omit the static modifier. (Though note that none of the functions here could have been marked as such, as they were all in different compilation units.)

So in summary:

  • The “requires dynamic R_X86_64_PC32 reloc against ‘mumble’ which may overflow at runtime; recompile with -fPIC” error is caused by attempting to link a shared library against a static object.
    • One solution is to compile a position-independent version of the static object. libtool does this automatically, so why aren’t you using libtool?
    • Another (highly related) solution is to link against a shared version of the static object.
  • This isn’t an issue on 32-bit systems because PIC is possible by default on those systems, since instruction operands are wide enough to contain absolute symbol addresses.
  • Compiling with position independent code introduces a procedure linkage table (PLT) and global offset table (GOT) for each object file, which are very hard to eliminate if you want to avoid the (small) function call overhead they introduce.
    • So you should avoid PIC if compiling for constrained targets like embedded devices.
    • But use it otherwise (e.g. on desktop systems) for the flexibility (the use of shared libraries!) and security (address space layout randomisation) it affords.

Source code for the example here is available on gitorious in the public domain.

by Philip Withnall at October 01, 2014 07:44 AM

September 17, 2014

Philip Withnall

Long live gnome-common? Macro deprecation

gnome-common is shrinking, as we’ve decided to push as much of it as possible upstream. We have too many layers in our build systems, and adding an arbitrary dependency on gnome-common to pull in some macros once at configure time is not helpful — there are many cases where someone new has tried to build a module and failed with some weird autotools error about an undefined macro, purely because they didn’t have gnome-common installed.

So, for starters:

What does this mean for you, a module maintainer? Nothing, if you don’t want it to. gnome-common now contains copies of the autoconf-archive macros, and has compatibility wrappers for them.

In the long term, you should consider porting your build system to use the new, upstreamed macros. That means, for each macro:

  1. Downloading the macro to the m4/ directory in your project and adding it to git.
  2. Adding the macro to EXTRA_DIST in Makefile.am.
  3. Ensuring you have ACLOCAL_AMFLAGS = -I m4 ${ACLOCAL_FLAGS} in your top-level Makefile.am.
  4. Updating the macro invocation in configure.ac; just copy out the shim from gnome-common.m4 and tidy everything up.

Here’s an example change for GNOME_CODE_COVERAGE → AX_CODE_COVERAGE in libgdata.

It seems from the comments that there’s more discussion to be had about the best way to implement this. So hold off on these changes for the moment!

This is the beginning of a (probably) long road to deprecating a lot of gnome-common. Macros like GNOME_COMPILE_WARNINGS and GNOME_COMMON_INIT are next in the firing line — they do nothing GNOME-specific, and should be part of a wider set of reusable free software build macros, like the autoconf-archive. gnome-common’s support for legacy documentation systems (DocBook, anyone?) is also getting deprecated next cycle.

Comments? Get in touch with me or David King (amigadave). This work is the (long overdue) result of a bit of discussion we had at the Berlin DX hackfest in May.

by Philip Withnall at September 17, 2014 11:22 PM

September 08, 2014

Xavier Claessens

OpenGlucose: Again

Made progress this weekend on OpenGlucose. The GUI is still ugly but it has the info I want.

Important things on my wish-list, when I have time:

  1. Handling the units. My only device is mg/dL, but other countries use mmol/L. Since I’m living in Canada, where they use mmol/L, I should grab a new device with those units; then I’ll be able to compare logs and work out how to detect which unit a device is configured to use.
  2. Make printable reports; I’ve heard doctors like that. OTOH, I shouldn’t encourage using an unofficial app for medical purposes.
  3. Support other FreeStyle devices. I’m pretty sure they all have the same kind of format so most of the parser should be reusable, I hope. I should be able to get a spare FreeStyle Freedom Lite in a few weeks.
  4. Publish an ubuntu package on a PPA.
  5. CSV export.
  6. Ideas?

I’m curious: has anyone else who owns an InsuLinx tried OpenGlucose yet? Or even tried to add support for another device?

openglucose

by xclaesse at September 08, 2014 04:14 PM

September 04, 2014

Nick Richards

First as tragedy

Nine years ago I drew some diagrams attempting to reverse engineer the coffee, milk and foam proportions available in the beverages served at Eat. I'm not going to pretend that this was particularly original work, but when walking past one of their branches the other day I noticed some uncannily familiar design.

Eat Coffee Chart

Actually pretty decent communication, well done. Avoid the actual drinks there though.

by Nick Richards at September 04, 2014 04:58 PM

September 01, 2014

Marco Barisione

A web browser for the Raspberry Pi

As I previously mentioned, Collabora has been working with the Raspberry Pi Foundation on various projects including a web browser optimised for the Raspberry Pi.
Since the first beta release we have made huge improvements; now the browser is more responsive, it’s faster, and videos work much better (the first beta could play 640×360 videos at 0.5fps, now we can play 25fps 1280×720 videos smoothly). Some web sites are still a bit slow (if they are heavy on the JavaScript side), but there’s not much we can do for web sites that, even on my laptop with an Intel Core i7, use 100% of one of the cores for more than ten seconds!

The browser is based on Gnome Web (Epiphany) using WebKit 1 (i.e. the non-multi-process version of WebKit).

Our main achievements are:

  • More responsive UI and scrolling, even under heavy load (like when loading a page)
  • Progressive tiled rendering for smoother scrolling (as mobile browsers do)
  • Startup is three times faster
  • Avoid useless image format conversions
  • Better YouTube support, including on-demand load of embedded YouTube videos to make page load much faster
  • Hardware decoding of videos (through gst-omx)
  • Hardware scaling of videos (again, through gst-omx)
  • Reduction of the number of memory copies to play videos
  • Faster fullscreen playback using dispmanx directly (a bit buggy at the moment, we are working on it)
  • Memory and CPU friendly tab management
  • JavaScript JIT fixes for ARMv6
  • Disk image cache (decoded images are kept in memory mapped files in a cache, saving CPU)
  • Memory pressure handler support

No video displayed here? Watch the video on YouTube.
The Raspberry Pi web browser (mp4 video file)

To install the browser, just update your Raspbian and install the “epiphany-browser” package:

sudo apt-get update
sudo apt-get upgrade
sudo apt-get install epiphany-browser

Thanks to all the people at Collabora that, at some point or another, helped on this project: Julien Isorce, Emanuele Aina, ChangSeok Oh, Tomeu Vizoso, Pekka Paalanen, André Moreira Magalhaes, Derek Foreman, Gustavo Noronha, Danilo Cesar, Emilio Pozuelo Monfort and Jonny Lamb (I hope I haven’t forgotten anybody!).
Also thanks to the Raspberry Pi Foundation, and in particular to Eben Upton, for their commitment to making browsing on the Pi better, and to Ben Avison for his work on optimising pixman and libav for ARMv6.

Update: people have reported a few bugs since the release, in particular a problem with Raspbian configured to use 24-bit or 32-bit mode for graphics. We should be able to fix this in a week or so.
Another problem is that Vimeo videos stopped working. This seems to be due to a change made by Vimeo that broke playback also on other browsers and on Android.

by barisione at September 01, 2014 01:00 PM

August 27, 2014

Nicolas Dufresne

GStreamer gains V4L2 Mem-2-Mem support

Two years since my last post; that seems like a long time, but I was busy becoming a GStreamer developer. This story started late in 2009, when a team at Samsung (and many core Linux contributors) started adding a new type of driver to the Linux Media Infrastructure API (also known as Video4Linux 2). They introduced video decoding, encoding and video post-processing support through a class of drivers called memory-to-memory.

At the end of 2012, my employer, Collabora, was chosen to implement a proof of concept, enabling hardware decoding support on the Cotton Candy, a USB-stick-sized computer based on the Samsung Exynos 4412 and built by FXI. The new element was developed by Sebastian Dröge and was called mfcdec. All this being demonstration code, it never got close to being useful in production.

At the end of 2013, we got contracted again, to bring the demonstration code toward production quality. At this point, we took the decision that we were no longer going to build an Exynos-specific decoder, but would instead re-use the existing GStreamer V4L2 support and do it the “right” way.

It took nearly three months, but with the help of my colleague Julien Isorce, we managed to upstream and ship hardware decoding support for the Cotton Candy. The new element is called v4l2videoNdec, where videoN is the name of the driver node (to allow having multiple decoders at the same time). The element was well suited to static pipelines and embedded applications, but not as flexible as software decoders on the desktop.
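As an illustration (a sketch, not from the original project: the element name v4l2video8dec and the input file are hypothetical, since the N in v4l2videoNdec depends on your driver node), such a decoder slots into a pipeline like any other GStreamer decoder:

#include <gst/gst.h>

int
main (int argc, char *argv[])
{
  GstElement *pipeline;
  GError *error = NULL;

  gst_init (&argc, &argv);

  /* v4l2video8dec stands in for whatever v4l2videoNdec element your
   * hardware registered. */
  pipeline = gst_parse_launch ("filesrc location=test.mp4 ! qtdemux ! "
                               "h264parse ! v4l2video8dec ! autovideosink",
                               &error);
  if (pipeline == NULL)
    g_error ("Failed to build pipeline: %s", error->message);

  gst_element_set_state (pipeline, GST_STATE_PLAYING);
  /* …run a GMainLoop here, then clean up. */
  return 0;
}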

At the beginning of 2014, we started a new project with Endless Mobile. This time, the goal was to do hardware-accelerated decoding, again on an Exynos 4412 platform, but in a desktop environment based on GNOME Shell. Two main issues had to be addressed: the buffer pool in GstV4l2 did not track its memory, and the color format produced by this decoder could not be color-converted using GLES2 shaders (not enough coordinate precision). We had to implement a custom memory allocator and rewrite most of the v4l2 buffer pool code. To handle the color format, we had to implement an element that wraps the hardware video converter in order to obtain video frames in a format that can be uploaded to GLES2.

As of today, all this effort has landed in GStreamer and is part of the GStreamer 1.4 release. Some of my colleagues went even further by demonstrating during SIGGRAPH the benefits of using a V4L2 decoder when combining DMABUF and Wayland. Other teams, including Pengutronix (on the Freescale CODA) and STE, have started testing against this new, promising decoder, which finally brings a standard and low-level way of decoding media on Linux.

by Nicolas at August 27, 2014 07:55 PM

August 26, 2014

Xavier Claessens

OpenGlucose: continued

I started working on the UI to display the results:

openglucose

It is made using a GtkApplicationWindow containing a WebKitWebView; the content is made with HTML/CSS/JS using jQuery, and the chart is made using jqplot.

To make testing easier, I also added a dummy device that has random data, it can be enabled by setting OPENGLUCOSE_DUMMY_DEVICE=1 in your env.

A lot more work is needed, but that’s a start.

by xclaesse at August 26, 2014 04:58 AM

August 17, 2014

Xavier Claessens

New project: OpenGlucose

Hello there,

I recently got diagnosed with type 1 diabetes. Like all diabetics, I got a glucometer, a device that comes with a closed-source Windows/Mac application. That’s clearly not acceptable for a freedom lover! So here is my new challenge: reverse-engineer the USB protocol of my Abbott FreeStyle InsuLinx device, and write an open source Linux application for it.

And here it is: https://github.com/xclaesse/OpenGlucose

So far it only fetches the bare minimum of information from the device and prints it in the terminal. More GUI/features will come later.

If you’re a geek diabetic, your help is welcome!

by xclaesse at August 17, 2014 01:59 AM

August 15, 2014

Olivier Crête

GNOME.Asia Summit 2014

Everyone has been blogging about GUADEC, but I’d like to talk about my other favorite conference of the year, which is GNOME.Asia. This year, it was in Beijing, a mightily interesting place. It is a giant megalopolis with grandiose architecture, but at the same time surprisingly easy to navigate, with its efficient metro system and affordable taxis. But the air quality is as bad as they say, at least during the incredibly hot summer days when we visited.

The conference itself was great. This year it was co-hosted with FUDCon’s Asian edition, and it was interesting to see a crowd that’s really different from those who attend GUADEC: many more people involved in evangelising, deploying and using GNOME, as opposed to just developing it, which allows me to get a different perspective.

On a related note, I was happy to see a healthy delegation from Asia at GUADEC this year!

Sponsored by the GNOME Foundation

by ocrete at August 15, 2014 04:50 AM

August 13, 2014

Tomeu Vizoso

Dynamic scaling of the memory bus


The problem


These days there's quite good support for CPU scaling in the mainline kernel, and many ARM SoCs are making use of it already. But in modern hardware with lots of very fast external memory, running the memory bus at its maximum frequency drastically reduces the amount of time that the device can run when on battery.

A problem that many teams are finding when trying to upstream their power management code is that there's currently no way for several clock consumers to influence the frequency of the memory bus. There have been a few tries to upstream the solutions currently in vendor trees, but so far no acceptable solution has been found.

I'm helping to upstream some of the stuff in the ChromeOS tree, and this issue is currently blocking very interesting work from reaching mainline.

The past


In the vendor tree for Tegra, this is addressed by creating virtual clocks that are children of the clock to be influenced. Depending on the type of the virtual clock, setting its rate will influence the rate of its parent clock by setting a floor or a ceiling value.

In Qualcomm's vendor tree for the Snapdragon family of SoCs, the concept of a voter clock is introduced. Drivers can vote on the rate of a given clock by "voting" through a child clock, so not that different to how Tegra does it.

Both approaches have the critical disadvantage of adding clk instances for things that aren't real clocks, thus making the API considerably more confusing for relatively little gain.

Both vendor trees have additional APIs for registering bandwidth needs: tegra_isomgr and msm_bus_scale. They bear quite some resemblance to each other and to pm_qos_interface, but both are tightly tied to the specifics of their platforms.

The discussion was brought back to life a couple of months ago when a patch was posted for allowing the tegra-drm driver to set the frequency rate of the external memory controller based on the amount of bandwidth that was needed by the display controller for refreshing the display. Of course, that patch was rejected because there are other components that need to have a say in the frequency rate of the memory bus.

But in that discussion some kind of plan took form and I have been working on making something from it that can be merged upstream.

A possible future


There's so far two main additions to existing frameworks, with the rationale being explained further below:
  • Add per-user floor and ceiling constraints to the Common Clock Framework, so drivers can set maximum and minimum frequency rates that the clock should respect. Patchset here.
  • Add a PM_QOS_MEMORY_BANDWIDTH class to pm_qos, for drivers to register their expected bandwidth needs. Patchset here.
The idea is for the following agents to be able to influence the current frequency of the memory bus:
  • Thermal: a cooling device would call clk_set_ceiling_rate to cap the memory bus to a frequency based on the current temperature.
  • Power: a battery driver would set a ceiling in the same way, based on the remaining capacity.
  • Devfreq: a devfreq driver wrapping a power management unit such as the ACTMON on Tegra or the PPMU on Exynos would set a floor frequency based on the current load stats.
  • Cpufreq: a cpufreq driver would set a floor frequency based on the current CPU frequency.
  • Devices that can anticipate how much memory bandwidth they will need (such as the display controller, the camera, multimedia codecs, an ISP, USB, etc.) would register their requirements in the PM_QOS_MEMORY_BANDWIDTH class. The EMC driver would be listening for notifications and setting a floor frequency based on the aggregated bandwidth that is needed.
The impression so far is that this approach matches the needs of the Tegra and Exynos SoCs, and people working on Rockchip upstreaming are evaluating it. Others working on other SoCs are very welcome to look at it and comment, so the result is also useful to them and they can improve their power management in mainline without having to refactor things later.
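To make the bandwidth-registration idea concrete, here is a sketch of how a display driver might use the proposed PM_QOS_MEMORY_BANDWIDTH class from the patchset (the function names and the unit of the value are illustrative, not from an existing driver):

#include <linux/pm_qos.h>

static struct pm_qos_request display_bw_req;

/* Called when the display pipeline is brought up: register the
 * bandwidth this scanout configuration will need. The memory bus
 * driver aggregates all requests in the class into a floor frequency. */
static void display_bandwidth_request(s32 bandwidth)
{
	pm_qos_add_request(&display_bw_req, PM_QOS_MEMORY_BANDWIDTH, bandwidth);
}

/* Called when the pipeline is torn down. */
static void display_bandwidth_release(void)
{
	pm_qos_remove_request(&display_bw_req);
}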

by Tomeu Vizoso (noreply@blogger.com) at August 13, 2014 03:36 PM

August 08, 2014

Frederic Plourde

Gecko on Wayland

At Collabora, we’re always on the lookout for cool opportunities involving Wayland, and we noticed recently that Mozilla had started to show some interest in porting Firefox to Wayland. In short, the Wayland display server is becoming very popular for being lightweight, versatile yet powerful, and is designed to be a replacement for X11. Chrome and WebKit already have Wayland ports, and we think that Firefox should have one too.

Some months ago, we wrote a simple proof of concept, basically starting from Gecko’s existing GTK3 paths and stripping all the MOZ_X11 ifdefs out of the way. We did a bunch of quick hacks fixing broken stuff, but rather easily and quickly (a couple of days) we got Firefox to run on Weston (Wayland’s official reference compositor). OK, because of hard X11 dependencies, keyboard input was broken and decorations suffered a little, but that’s a very good start! Take a look at the screenshot below :)

firefox-on-wayland


by fredinfinite23 at August 08, 2014 01:44 PM

August 07, 2014

Frederic Plourde

Firefox/Gecko : Getting rid of Xlib surfaces

Over the past few months, working at Collabora, I have helped Mozilla get rid of Xlib surfaces for content on the Linux platform. This task was the primary problem keeping Mozilla from turning OpenGL layers on by default on Linux, which is one of their long-term goals. I’ll briefly explain this long-term goal and thereafter give details about how I got rid of Xlib surfaces.

LONG-TERM GOAL – Enabling Skia layers by default on Linux

My work fits into a wider, long-term goal that Mozilla currently has: to enable Skia layers by default on Linux (Bug 1038800). For a glimpse into how Mozilla initially made Skia layers work on Linux, see bug 740200. At the time of writing this article, Skia layers are still not enabled by default because there are some open bugs about failing Skia reftests, and OMTC (off-main-thread compositing) is not fully stable on Linux at the moment (Bug 722012). Why is OMTC needed to get Skia layers on by default on Linux? Simply because, by design, users that choose OpenGL layers are being grandfathered into OMTC on Linux… and since the MTC (main-thread compositing) path has been dropped lately, we must tackle the OMTC bugs before we can dream about turning Skia layers on by default on Linux.

For a more detailed explanation of issues and design considerations pertaining turning Skia layers on by default on Linux, see this wiki page.

MY TASK – Getting rid of Xlib surfaces for content

Xlib surfaces for content rendering have been used extensively for a long time now, but when OpenGL got attention as a means to accelerate layers, we quickly ran into interoperability issues between XRender and the Texture_From_Pixmap OpenGL extension… issues that were assumed insurmountable after initial analysis. Also, and I quote Roc here, “We [had] lots of problems with X fallbacks, crappy X servers, pixmap usage, weird performance problems in certain setups, etc. In particular we [seemed] to be more sensitive to Xrender implementation quality than, say, Opera or Webkit/GTK+.” (Bug 496204)

So for all those reasons, someone had to get rid of Xlib surfaces, and that someone was… me ;)

The Problem

So the problem was to get rid of Xlib surfaces (gfxXlibSurface) for content under the Linux/GTK platform and, implicitly of course, replace them with image surfaces (gfxImageSurface), so they become regular memory buffers into which we can render with GL/GLES and from which we can composite using the GPU. Now, it’s pretty easy to force the creation of image surfaces (instead of Xlib ones) for all content layers in Gecko’s gfx/layers framework: just force gfxPlatformGTK::CreateOffscreenSurfaces(…) to create gfxImageSurfaces in any case.

The problem is, naively doing so gives rise to a series of performance regressions and sub-optimal paths being taken: for example, copying image buffers around when passing them across process boundaries, or unnecessary copying when compositing under X11 with Xrender support. So the real work was to fix everything after having pulled the gfxXlibSurface plug ;)

The Solution

The first glitch on the way was that GTK2 theme rendering, by design, *had* to happen on Xlib surfaces. We didn’t have much choice but to narrow our efforts down to the GTK3 branch alone. What’s nice with GTK3 on that front is that it makes integral use of cairo, thus letting theme rendering happen on any type of cairo_surface_t. For more detail on that decision, read this.

Upfront, we noticed that the already implemented GL compositor was properly managing and buffering image layer contents, which is a good thing, but on the way, we saw that the ‘basic’ compositor did not. So we started streamlining basic compositor under OMTC for GTK3.

The core of the solution here was implementing server-side buffering of layer contents that were using image backends. Since the targeted platform was Linux/GTK3, and since Xrender support is rather common, the most intuitive thing to do was to subclass BasicCompositor into a new X11BasicCompositor and make it use a new specialized DataTextureSource (which we called X11DataTextureSourceBasic) that buffers upcoming layer content in ::Update() to a gfxXlibSurface that we keep alive for the TextureSource lifetime (unless the surface changes size and/or format).

Performance results were satisfying. For 64-bit systems, we had around a 75% boost in tp5o_shutdown_paint, a 6% gain for ‘cart’, 14% for ‘tresize’, 33% for tscrollx and a 12% gain on tcanvasmark.

For complete details about this effort, design decisions and resulting performance numbers, please read the corresponding bugzilla ticket.

To see the code that we checked in to solve this, look at these two patches:

https://hg.mozilla.org/mozilla-central/rev/a500c62330d4

https://hg.mozilla.org/mozilla-central/rev/6e532c9826e7

Cheers !

 


by fredinfinite23 at August 07, 2014 08:21 PM

July 25, 2014

Pekka Paalanen

Wayland protocol design: object lifespan

Now that we have a few years of experience with the Wayland protocol, I thought I would put some of my observations in writing. This, what will hopefully become a series rather than just one post, considers how to design Wayland protocol extensions the right way.

This first post considers protocol object lifespan and the related races between the compositor/server and the client. I assume that the reader is already aware of the Wayland protocol basics. If not, I suggest reading Chapter 4. Wayland Protocol and Model of Operation.

How protocol objects are created

On a new Wayland connection, the only object that exists is the wl_display which is a specially constructed object. You always have it, and there is no wire protocol for creating it.

The only thing the client can create next is a wl_registry through the wl_display. Registry is the root of the whole interface (class) hierarchy. Wl_registry advertises the global objects by numerical name, and using wl_registry.bind request to bind to a global is the first normal way to create a protocol object.

Binding is slightly special still, as the protocol specification in XML for wl_registry uses the new_id argument type, but does not specify the interface (class) for the new object. In the wire protocol, this special argument gets turned into three arguments: interface name (string), interface version (uint32_t), and the new object ID (uint32_t). This is unique in the Wayland core protocol.

The usual way to create a new protocol object is for the client to send a request that has a new_id type of argument. The protocol specification (XML) defines what the interface is, so there is no need to communicate the interface type over the wire. All that is needed on the wire is the new object ID. Almost all object creation happens this way.

Although rare, also the server may create protocol objects for the client. This happens by having a new_id type of argument in an event. Every time the client receives this event, it receives a new protocol object.

As all requests and events are always part of some interface (like a member of a class), this creates an interface hierarchy. For example, wl_compositor objects are created from wl_registry, and wl_surface objects are created from wl_compositor.
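In client code against libwayland-client, that chain looks like the following sketch (error handling is omitted; it also shows wl_surface.destroy, a destructor request, which the next section covers):

#include <string.h>
#include <wayland-client.h>

static struct wl_compositor *compositor;

static void
registry_global (void *data, struct wl_registry *registry, uint32_t name,
                 const char *interface, uint32_t version)
{
    /* wl_registry.bind: the special new_id whose interface travels on
     * the wire as (interface name, version, new object ID). */
    if (strcmp (interface, "wl_compositor") == 0)
        compositor = wl_registry_bind (registry, name,
                                       &wl_compositor_interface, 1);
}

static void
registry_global_remove (void *data, struct wl_registry *registry,
                        uint32_t name)
{
}

static const struct wl_registry_listener registry_listener = {
    registry_global,
    registry_global_remove
};

int
main (void)
{
    struct wl_display *display = wl_display_connect (NULL);
    struct wl_registry *registry = wl_display_get_registry (display);

    wl_registry_add_listener (registry, &registry_listener, NULL);
    wl_display_roundtrip (display);  /* wait for the global advertisements */

    /* An ordinary new_id request: a wl_surface created from the
     * wl_compositor… */
    struct wl_surface *surface = wl_compositor_create_surface (compositor);

    /* …and its destructor request. */
    wl_surface_destroy (surface);

    wl_display_disconnect (display);
    return 0;
}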

Object creation never fails. Once the request or event is sent, the new objects it creates exist, period. This keeps the protocol asynchronous, as there is no need to reply or check that the creation succeeded.

How protocol objects are destroyed

There are two ways to destroy a protocol object. By far the most common one is to have a request in the interface that is specified to be a destructor. Most often this request is called "destroy". When the client code calls the function wl_foobar_destroy(), the request is sent to the server and the client side proxy (struct wl_proxy) for the object gets destroyed. The server then handles the destructor request at some point in the future.

The other way is to destroy the object by an event. In that case, no destructor must be defined in the interface's protocol specification, and the event must be clearly documented to be destructive as there is no automation nor safeties for this. This is for cases where the server decides when an object dies, and requires extreme care in protocol design to work right in all cases. When a client receives such an event, all it can do is destroy the proxy. The (in)famous example of an interface like this is wl_callback.

Enter the boogeyman: races

It is very important that both the client and the server agree on which protocol objects exist. If the client sends a request on, or references as an argument, an object that does not exist in the server's opinion, the server raises a protocol error, and disconnects the client. Obviously this should never happen, nor should it happen that the server sends an event to an object that the client destroyed.

Wayland being a completely asynchronous protocol, we have no implicit guarantees. The server may send an event at the same time as the client destroys the object, and now the event targets an object the client no longer knows about. Rather than the client shooting itself dead (that's the server's job), libwayland-client has a trick: it silently ignores events to destroyed objects, until the server confirms that the object is truly gone.

This works very well for interfaces where the destructor is a request. If the client first sends the destructor request and then sends another request on the destroyed object, it just shot its own head off - no race needed.

Things get tricky for the other case, destructor events. The server may send the destructor event at the same time the client is sending a request on the same object. When the server finally gets the request, the object is already gone, and the client gets taken behind the shed and shot. Therefore pretty much the only safe way to use destructor events is if the interface does not define any requests at all. Ever, not even in future extensions. Furthermore, objects with that interface should not be used as arguments anywhere, or you may hit the race. That is why destructor events are difficult to use right.

The boogeyman's brother

There is yet another nasty race with events that create objects, i.e. server-created objects. If the client is destroying the (parent) object at the same time as the server is sending an event on that object creating a new (child) object, the server cannot know whether the client actually handled the event. If the client ignored the event, it will never tell the server to destroy the new object, and you leak it in the server.

You could try to make your way out of that pitfall by writing in your protocol specification that when the (parent) object is destroyed, all its child objects are destroyed implicitly. But then the client must not send the destructor request for the child objects after it has destroyed the parent, because otherwise the server sees requests on objects it does not know about, and kicks you in the groin, hard. If the child interface defines a destructor, the client cannot destroy its proxies after destroying the parent object. If the child interface does not define a destructor, you can never free the server-side resources until the parent gets destroyed.

The client could destroy all the child objects with a defined destructor in one go, and then immediately destroy the parent object. I am not sure if that works, but it might. If it does not, you have to specify a whole tear-down sequence in the protocol: the client tells the server it wants to destroy the parent object, the server acks and guarantees it will no longer send any events on it, and then the client actually destroys the parent object. Hey, you have a round-trip and have just turned a beautiful asynchronous protocol into a synchronous one. Congratulations!

Concluding with recommendations

Here are my recommendations when designing Wayland protocol extensions:
  • Always make sure there is a guaranteed way to destroy all objects. This may sound obvious, but we have fixed several cases in the Wayland core protocol where there was no way to destroy a created protocol object in such a way that all resources on both the server and the client side could be freed. And there are still some cases not fixed.
  • Always define a destructor request. If you have any doubt whether your new interface needs a destructor request, just put it there. It is more awkward to add later than normal requests. If you do not have one, the client cannot tell the server to free those protocol object resources.
  • Do not use destructor events. They are hard to design right, and extending the interface later will be a bitch. The client cannot tell the server to free the resources, so objects with destructor events should be short-lived, and the destruction must be guaranteed.
  • Do not use server-side created objects without a serious thought. Designing the destruction sequence such that it never leaks nor explodes is tricky.

by pq (noreply@blogger.com) at July 25, 2014 07:01 PM

July 10, 2014

Jeremy Whiting

Plasma Next is pretty darn stable

Today I wanted to share some of my experiences with using Plasma Next for the past couple of weeks. Since I had been working on some frameworks development (just a small bit here and there), I thought I'd try running Plasma Next a couple of weeks ago to see how things were coming along, and to be able to work on and test some of the things I had helped with back in the KDE 4.0 days.

I have to say I'm very impressed with the stability. I have hit only two issues since then, and one of them has already been fixed: a bug in Yakuake and Konsole where closing a tab caused the whole application to crash. I looked into the Konsole codebase with Eike Hein's guidance, but Argonel ultimately found the best patch to fix the problem.
The second issue I hit with Plasma Next has to do with disconnecting and reconnecting an external monitor. I don't do that very often at all, but when I tried last weekend I ran into a variety of problems. For example, sometimes when disconnecting, Plasma (or maybe ksmserver?) crashes and I am taken back to the sddm login screen. Other times, when connecting my external screen, my panel ends up floating on the external monitor, but nothing on it is clickable.

I just realized this post probably sounds like a rant or complaining about Plasma Next, but that's not what I intended at all. The main point I wanted to get across is that Plasma Next is stable enough for my daily usage: I have had to fall back to the old Plasma no more than once in the past two weeks.

Now for the obligatory desktop screenshot:

So good job to all the people that have worked on this new iteration of the Plasma Desktop.

P.S. One other minor thing I miss from the old Plasma is the ability to show the times for multiple timezones in the clock's tooltip. I'll see if I can get that fixed though. :)

by Jeremy Whiting (noreply@blogger.com) at July 10, 2014 04:21 AM

July 09, 2014

Jeremy Whiting

Qt5/KDE Frameworks porting steps

As promised in my last post, here is how the porting of libkeduvocdocument (name currently pending) from Qt4 and kdelibs4 to Qt5 and KDE Frameworks went.

Commits can be seen here but it went like this:
1. Change CMakeLists.txt to look for frameworks and Qt5 packages.
2. Try to build, fixing any errors, all while checking the Porting Notes.
3. Port away from deprecated methods.
4. Port away from kdelibs4support.

I forget which part of the above involved each of these, but this is what was changed:
  • Ported from KUrl to QUrl.
  • Ported from KStandardDirs to QStandardPaths.
  • Ported from KGlobal::locale() to QLocale.
  • Ported away from other deprecated methods and classes.

So rinse and repeat until it's in a state you are happy with. Note that step 4 above isn't strictly necessary; it is similar to porting Qt4 applications away from Qt3Support (some KDE 4 applications sadly never were ported away from Qt3Support... yes, KMouth, I'm looking at you).
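
To give a flavour of the mechanical part of steps 3 and 4, here is a small sketch of a typical change; the function is hypothetical, but the APIs on both sides are the real kdelibs4 and Qt5 ones:

    #include <QStandardPaths>
    #include <QUrl>

    // Hypothetical helper, showing the typical mechanical substitutions:
    //   KUrl                               -> QUrl
    //   KStandardDirs::locate("data", ...) -> QStandardPaths::locate(...)
    //   KGlobal::locale()->language()      -> QLocale::system().name()
    QString findWordsFile(const QUrl &url)
    {
        // The kdelibs4 version was roughly:
        //   KStandardDirs::locate("data", "kvtml/" + url.fileName());
        return QStandardPaths::locate(QStandardPaths::GenericDataLocation,
                                      QStringLiteral("kvtml/") + url.fileName());
    }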

by Jeremy Whiting (noreply@blogger.com) at July 09, 2014 05:42 AM

July 07, 2014

Jeremy Whiting

Libkeduvocdocument Qt5/KDE Frameworks port

Hello all. Yes I'm still alive. Yes I'm still doing KDE stuff as I find time or make time. I'll report in the next few posts about what is happening and where we are going.
One of the things that happened recently was the port of libkeduvocdocument to Qt5 and Frameworks. Vishesh started the effort, and I completed it with some review by Aleix Pol. It was decided, as documented here, that since libkdeedu only contains libkeduvocdocument, it should be split up. Upon further investigation we realized that the other parts of libkdeedu are not used anymore. Besides the icons subfolder, which is still looking for a home, the rest of the git repo is only libkeduvocdocument related, so we decided to simply rename the git repository for the Frameworks-based releases going forward. The libkdeedu git repository thus holds the KDE SC 4 codebase, while the libkeduvocdocument git repo holds the Qt5 and Frameworks based code. Both contain the full history, so nothing is lost.

I'll write next time about the steps taken to port the library to Qt5 and KDE Frameworks.

by Jeremy Whiting (noreply@blogger.com) at July 07, 2014 01:49 AM

June 25, 2014

Emilio Pozuelo Monfort

Firefox and GTK+ 3

Lately at Collabora I have been working on helping Mozilla with the GTK+ 3 port of Firefox.

The problem

The issue we had to solve is that GTK+ 2 and GTK+ 3 cannot be loaded in the same address space. Moving Firefox from GTK+ 2 to GTK+ 3 isn’t a problem, as only GTK+ 3 gets loaded in its address space, and everything is fine. The problem comes when you load a plugin that links to GTK+ 2, e.g. Flash. Then, GTK+ 2 and GTK+ 3 get both loaded, GTK+ detects that, and aborts to avoid bigger problems. This was tracked as bug #624422.

More specifically, Firefox links to libxul.so, which in turn links to GTK+. These days, the plugins are loaded in a separate process, plugin-container, which communicates with the Firefox process through IPC. If plugin-container didn’t link to GTK+, there would be absolutely no problem, as the browser (Firefox) process could link to GTK+ 3 and plugin-container could load any plugin, including GTK+ 2 ones. However, although plugin-container doesn’t directly use GTK+, it links to libxul.so for IPC, which brings GTK+ into its address space.

The solution

In order to solve this, we evaluated various options. The first was to split libxul.so in two: one part with the IPC code and lower-level stuff, which wouldn't link to GTK+, and another with the rest of the code, including all the widget and toolkit integration, which would obviously link to GTK+. However, this turned out not to be possible, as the libxul code was too intricate.

In the end, we decided to add a thin layer between libxul and GTK+, which we called libmozgtk.so. This small layer links to GTK+ 3, and provides stubs for GTK+ 2 specific symbols. Additionally, there is a libmozgtk2.so with SONAME "libmozgtk.so", which links to GTK+ 2 and provides stubs for GTK+ 3 symbols. We made libxul link against libmozgtk.so, so when Firefox runs, libxul.so, libmozgtk.so, and GTK+ 3 are loaded, and Firefox uses GTK+ 3. When plugin-container is executed, however, we add LD_PRELOAD=libmozgtk2.so to the environment. Since libmozgtk2.so has a libmozgtk.so SONAME, the libxul.so dependency is satisfied, and the plugin-container process ends up with GTK+ 2. Since plugin-container doesn't make use of the GTK+ code in libxul, this is safe, and we end up with a GTK+ 3 Firefox that can load GTK+ 2 plugins. The end result is that you can watch YouTube videos again!
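
The trick is easier to see in miniature. Below is a hypothetical sketch of the idea, not Mozilla's actual libmozgtk source; the stubbed symbol is just one example of a GTK+ 3-only entry point:

    /* mozgtk2_stub.cc: hypothetical miniature of the shim. It links
     * against GTK+ 2 and provides do-nothing stubs for GTK+ 3-only
     * symbols that libxul references but plugin-container never calls.
     *
     * Build (illustrative):
     *   g++ -shared -fPIC -Wl,-soname,libmozgtk.so mozgtk2_stub.cc \
     *       -o libmozgtk2.so $(pkg-config --libs gtk+-2.0)
     *
     * Run (illustrative):
     *   LD_PRELOAD=libmozgtk2.so plugin-container ...
     */
    #include <stdlib.h>

    /* gtk_widget_get_preferred_size() exists only in GTK+ 3. The real
     * function has a different prototype, but only the symbol name
     * matters to the dynamic linker. Reaching the stub would mean a
     * GTK+ 3 code path ran inside the GTK+ 2 process, so bail out. */
    extern "C" void gtk_widget_get_preferred_size(void)
    {
        abort();
    }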

While this solution is somewhat hacky, it means we didn't need to mess with libxul, splitting it in two just for the Linux/GTK+ port's sake. And when GTK+ 2 plugins become irrelevant, or NPAPI support is removed (as recently happened in Chrome), we should be able to easily revert this and use GTK+ 3 everywhere.

Wayland

On an unrelated note, we have looked a bit at porting Firefox to Wayland. Wayland is designed to be a replacement for X11, and is becoming very popular in the digital TV and set-top box space. Those obviously need HTML engines and web browsers, and with WebKit and Chrome already having Wayland ports, we think Firefox shouldn't fall behind.

For this, the GTK+ 3 port was a prerequisite, but that isn't enough on its own. There are many X11 uses in the Firefox codebase, most of which are guarded by #ifdef MOZ_X11, though not all of them are. We got Firefox to start on Weston (the Wayland reference compositor) with a bunch of hacks, one of which broke keyboard input (but avoided a segfault). As you can see from the screenshot, things aren't perfect, but it's at least a good start!

Firefox running on Weston

by Emilio Pozuelo Monfort at June 25, 2014 09:25 AM

June 21, 2014

Nick Richards

Pinpoint COPR Repo

A few years ago I worked with a number of my former colleagues to create Pinpoint, a quick hack that made it easier for us to give presentations that didn't suck. Now that I'm at Collabora I have a couple of presentations to make, and using Pinpoint was a natural choice. I've been updating our internal templates to use our shiny new brand and wanted to use some newer features that weren't available in Fedora's version of Pinpoint.

There hasn't been an official release for a little while and a few useful patches have built up on the master branch. I've packaged a git snapshot and created a COPR repo for Fedora so you can use these snapshots yourself. They're good.

by Nick Richards at June 21, 2014 08:59 PM

June 20, 2014

Philip Withnall

gobject-introspection gets a development mailing list

Public service announcement: if you’re a bindings author, or are otherwise interested in the development of GIR annotations, the GIR format or typelib format, please subscribe to the gir-devel-list mailing list. It’s shiny and new, and will hopefully serve as a useful way to announce and discuss changes to GIR so that they’re suitable for all bindings.

Currently under discussion (mostly in bug #719966): changes to the default nullability of gpointer nodes, and the addition of a (never-null) annotation to complement (nullable).

by Philip Withnall at June 20, 2014 01:13 PM

June 10, 2014

George Kiagiadakis

GStreamer on wayland with GTK+

During the past few months I’ve been occasionally working on integrating GStreamer better with wayland. GStreamer already has a ‘waylandsink’ element in gst-plugins-bad, but the implementation is very limited. One of the things I’ve been working on was to add GstVideoOverlay support in it, and recently, I managed to make this work nicely embedded in a GTK+ window.

GStreamer Wayland GTK Demo: gtk+ video player demo running on weston

I’m happy to say that it works pretty well, even though GTK does not support wayland sub-surfaces, which was initially thought to be a problem. It turns out there is no problem with that, and the GTK and GstVideoOverlay APIs are sufficient to make this work. However, a small addition to GstVideoOverlay is needed to allow smooth resizing; currently, I have a GstWaylandVideo API that includes those additions.
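
Roughly, the embedding follows the usual GstVideoOverlay pattern, with the window handle being the wl_surface that GDK exposes on its Wayland backend. A minimal sketch, not the exact demo code (error handling omitted; a real player would fetch the surface on the main thread up front):

    #include <gdk/gdkwayland.h>
    #include <gst/video/videooverlay.h>
    #include <gtk/gtk.h>

    /* Runs on the GStreamer streaming thread: when the sink asks for a
     * window handle, hand it the wl_surface of the GTK drawing area. */
    static GstBusSyncReply
    bus_sync_handler(GstBus *bus, GstMessage *message, gpointer user_data)
    {
        if (!gst_is_video_overlay_prepare_window_handle_message(message))
            return GST_BUS_PASS;

        GtkWidget *video_area = GTK_WIDGET(user_data);
        GdkWindow *window = gtk_widget_get_window(video_area);
        struct wl_surface *surface = gdk_wayland_window_get_wl_surface(window);

        gst_video_overlay_set_window_handle(
            GST_VIDEO_OVERLAY(GST_MESSAGE_SRC(message)), (guintptr)surface);

        gst_message_unref(message);
        return GST_BUS_DROP;
    }

    /* Wiring: gst_bus_set_sync_handler(bus, bus_sync_handler, area, NULL); */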

This essentially means that as soon as this work is merged (hopefully soon), there is nothing stopping applications like totem from being ported to wayland :D

I believe embedding waylandsink in Qt should also work without any problems; I just haven’t tested it.

If you are interested, check the code. The code of the above running demo can also be found here, and the ticket for merging this branch is being tracked here.

I should say many thanks here to my employer, Collabora, for sponsoring this work.


by gkiagia at June 10, 2014 12:24 PM

June 05, 2014

Nick Richards

Suburbs Driven by Trains Not Cars

I was reading Maciej Cegłowski's excellent talk 'The Internet With A Human Face' and, aside from its persuasive argument, found something interesting in the tension between his case, rooted in American reality, and his German audience. I've also recently been reading 'Concretopia', a book about the post-war rebuilding of Britain, and realised that there, in the gap between the wars, was the suburbia he wasn't aware of. I am of course talking of Metro-land, where I grew up.

Clouds, Pond, Trees

This environment was enabled by a fascinating business model in which the railway company created its own demand. It built houses in the middle of nowhere near London, where getting to work required a train journey, giving itself ongoing revenue and financing the building of the railway (and more). Most recently, London has seen a similar exercise in commercial demand creation, with Arsenal financing their new stadium (and a bit more) through property development on the old stadium site.

As such, Metro-land was inherently commuter-based from the beginning, and unsurprisingly it remains so today, although the nearby presence of the M25 has drawn off some of the rail traffic that existed before. But what's different about a train-driven suburbia? It's still strip development, as the logic of the tracks wants a direct and ideally flat route. However, it also leads to a more freeway-like style of node location, due to the distance required between stations if the network is to be efficient. There is a greater density of housing than you'd expect of car-driven suburbia, too, since most houses needed to be within walking or cycling range of a station. This distinctive landscape of small towns and villages where there's nothing to do is characteristic of Metro-land. A parade is different to a strip mall, but they're two sides of the same coin.

All that aside, the artistic flowering of this railway suburbia wasn't just in the lovely graphic design and dead-on branding of the Metropolitan Railway company; the poetry of Betjeman, the music of Elton John and the writing of J. G. Ballard are just as distinctively suburban and just as meaningful. In the end, railway suburbia is fundamentally different, and it comes down to a key phrase in Maciej's talk:

When everyone has a car, it means you can't get anywhere without one. Instead of freeing you, the car becomes a cage.

The railway isn't a cage, it's a highway to possibility.

by Nick Richards at June 05, 2014 07:40 PM

Pekka Paalanen

From pre-history to beyond the global thermonuclear war

This is a short and vague glimpse of the interfaces that the Linux kernel offers to user space for display and graphics management, from their history to what is hot and new, and to what might perhaps come after. The topic became current for me when I started preparing Weston for global thermonuclear war.

The pre-history


In the age of dragons, kernel mode setting did not exist. There was only user space mode setting, where the job of the kernel driver (if any) was simply to give user space direct access to the graphics card registers. A user space driver (well, Xorg video DDX, really, err... or what it was at the time of XFree86) would then poke the card registers to set a mode. The kernel had no idea of anything.

The kernel DRM infrastructure was started as an out-of-tree kernel module for coordinating between multiple programs wanting to access the graphics card's resources. Later it was (partially?) merged into the kernel tree (the year is a lie, 2.3.18 came out in 1999), and much, much later it was finally deleted from the libdrm repository.

The middle age


For some time, the kernel DRM existed alongside user space mode setting. It was a dark time, full of crazy hacks to keep it all together with duct tape, barbed wire and luck. GPUs and hardware-accelerated OpenGL started to come up.

The new age


With the advent of kernel mode setting (KMS), the DRM kernel drivers took charge of the graphics card's resources: outputs, video modes, memory allocations, hotplug! User space mode setting became obsolete and was eventually killed. The kernel driver was finally actually in control of the graphics hardware.

KMS probably started with just setting the main framebuffer (primary plane) for each "CRTC" and programming the video mode. CRTC stands for "cathode-ray tube controller", but essentially means a block that reads memory (a framebuffer) and produces a bitstream according to video mode timings. The bitstream is directed into an "encoder", which turns it into a proper physical/analogue signal, like VGA or digital DVI. The signal then exits the graphics card through a "connector". CRTC, encoder, and connector are the basic concepts of the KMS API. Quite often these can be combined in some restricted ways, like a single CRTC feeding two encoders for clone mode.
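
These objects are easiest to see through libdrm's enumeration API. A minimal sketch, assuming fd is an open DRM device node such as /dev/dri/card0 (error handling omitted):

    #include <stdio.h>
    #include <xf86drmMode.h>

    /* Walk the KMS objects a DRM device exposes. */
    static void
    list_kms_objects(int fd)
    {
        drmModeRes *res = drmModeGetResources(fd);

        printf("%d CRTCs, %d encoders, %d connectors\n",
               res->count_crtcs, res->count_encoders, res->count_connectors);

        for (int i = 0; i < res->count_connectors; i++) {
            drmModeConnector *conn =
                drmModeGetConnector(fd, res->connectors[i]);
            printf("connector %u: %s, %d modes\n", conn->connector_id,
                   conn->connection == DRM_MODE_CONNECTED ? "connected"
                                                          : "disconnected",
                   conn->count_modes);
            drmModeFreeConnector(conn);
        }
        drmModeFreeResources(res);
    }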

Even ancient hardware supported hardware cursors: a small sprite that was composited into the outgoing video signal on the fly, which made it very cheap to move around. The cursor, being so special, and often having a funny color format (alpha!), got its very own DRM ioctl.

There were also hardware overlays (additional or secondary planes) on some hardware. While the primary framebuffer covers the whole display, an overlay is another buffer (just like the cursor) that gets mixed into the bitstream at the CRTC level. It is like basic compositing done at the scanout hardware level. Overlays usually had additional benefits: for example, they could apply scaling or color space conversion (hello, video players) very efficiently. Overlays being different, they too got their very own DRM ioctls.

The KMS user space ABI was anything but atomic. In the X11 tradition, it wasn't too important how the displays got updated, as long as the end result eventually was what you wanted. Race conditions in content updates didn't matter too much either, as X was racy as hell anyway. You update the CRTC. Then you update each overlay. You might update the cursor, too. With luck, all these updates hit the same vblank. Or not. Or you don't hit vblank at all, and get tearing. No big deal, as X was essentially all about front-buffer rendering anyway. (And then there were huge efforts to fix it all up with X, GLX, Mesa and GL compositors, and to avoid tearing, and it ended up complicated.)
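
In libdrm terms, such a "full" update was a series of independent ioctls that might or might not land on the same vblank. A sketch, with hypothetical object IDs:

    #include <xf86drmMode.h>

    /* A pre-atomic 'full' update: three separate ioctls, with no
     * guarantee they all take effect on the same vblank. */
    static void
    legacy_update(int fd, uint32_t crtc_id, uint32_t conn_id,
                  uint32_t plane_id, uint32_t primary_fb,
                  uint32_t overlay_fb, drmModeModeInfo *mode)
    {
        /* 1. Primary framebuffer and video mode, per CRTC. */
        drmModeSetCrtc(fd, crtc_id, primary_fb, 0, 0, &conn_id, 1, mode);

        /* 2. Each overlay; source coordinates are 16.16 fixed point. */
        drmModeSetPlane(fd, plane_id, crtc_id, overlay_fb, 0,
                        0, 0, 640, 480,
                        0, 0, 640 << 16, 480 << 16);

        /* 3. And the cursor, via its very own ioctl. */
        drmModeMoveCursor(fd, crtc_id, 100, 100);
    }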

With the advent of X compositing managers, which did not play well with the awkward X11 protocol (Xv) or the hardware overlays, and with the rise of GPU power and OpenGL, it was thought that hardware overlays would eventually die out. It turned out that the benefits of hardware overlays were too great to abandon, and with Wayland we again have a decent chance to make the most of them while still enjoying compositing.

The global thermonuclear war (named after a git branch by Rob Clark)


The quality of display updates became important. People do not like tearing. Someone actually wanted to update the primary framebuffer and the overlays on the same vblank, guaranteed. And the cursor as the cherry on top.

We needed one ABI to rule them all.

Universal planes bring framebuffers (primary planes), overlays (secondary planes) and cursors (cursor planes) together under the same API. No more type-specific ioctls; instead there are common ioctls shared by them all. As these objects are still somewhat different, with overlays having wildly differing features and vendors wanting to expose their own stuff, object properties were invented.

An object property is essentially a {key, value} pair. In the API, the name of a key is a string. Each object has its own set of keys. To use a key, you must know it by name, fetch the handle, and then use the handle when setting the value. Handles seem to be per-object, so make sure to fetch them separately for each.
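
In code, the name-to-handle dance looks roughly like this (real libdrm calls, error handling omitted):

    #include <string.h>
    #include <xf86drmMode.h>

    /* Look up the handle for a property name on one KMS object;
     * object_type is e.g. DRM_MODE_OBJECT_PLANE or DRM_MODE_OBJECT_CRTC. */
    static uint32_t
    find_property_id(int fd, uint32_t object_id, uint32_t object_type,
                     const char *name)
    {
        uint32_t id = 0;
        drmModeObjectProperties *props =
            drmModeObjectGetProperties(fd, object_id, object_type);

        for (uint32_t i = 0; i < props->count_props; i++) {
            drmModePropertyRes *prop = drmModeGetProperty(fd, props->props[i]);
            if (strcmp(prop->name, name) == 0)
                id = prop->prop_id;   /* the handle used when setting values */
            drmModeFreeProperty(prop);
        }
        drmModeFreeObjectProperties(props);
        return id;
    }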

Atomic mode setting and nuclear pageflip are two sides of the same feature. Atomicity is achieved by gathering a set of property changes and then pushing them all into the kernel in a single ioctl call. That call then either succeeds or fails as a whole. Libdrm offers a drmModePropertySet for gathering the changes. Everything is exposed as properties: the attached FB, overlay position, video mode, etc.
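
For the record, drmModePropertySet was an interim name; the interface that eventually stabilized in libdrm is drmModeAtomicReq, which works the same way conceptually. A sketch using that stabilized API, with the property handles fetched as in the previous sketch (for a plane, e.g. the "FB_ID" and "CRTC_X" properties):

    #include <xf86drmMode.h>

    /* Gather property changes and push them to the kernel in one ioctl
     * that succeeds or fails as a whole. prop_fb_id and prop_crtc_x are
     * handles obtained with find_property_id() above. */
    static int
    atomic_flip(int fd, uint32_t plane_id, uint32_t prop_fb_id,
                uint32_t prop_crtc_x, uint32_t fb_id, int32_t x)
    {
        drmModeAtomicReq *req = drmModeAtomicAlloc();

        drmModeAtomicAddProperty(req, plane_id, prop_fb_id, fb_id);
        drmModeAtomicAddProperty(req, plane_id, prop_crtc_x, (uint64_t)x);

        /* All changes land on the same vblank, or none of them do. */
        int ret = drmModeAtomicCommit(fd, req, 0, NULL);

        drmModeAtomicFree(req);
        return ret;
    }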

Atomic mode setting means setting the output modes of a single graphics device, more or less. Devices may have hard-to-express limitations. A simple example is the available scanout memory bandwidth: you can drive either two mid-resolution outputs, or one high-resolution output. Or maybe some CRTC-encoder-connector combination is not possible together with some particular combination for another output. Collecting the video mode, encoder and connector setup of the whole graphics card into a single operation avoids flicker: either the whole set succeeds, or it fails. Without atomic mode setting, changing multiple outputs would not only take longer, but if some step failed, you'd have to undo all the earlier steps (and hope the undo steps don't fail). Plus, there would be no way to easily test whether a certain combination is possible. Atomic mode setting fixes all of this.

Nuclear pageflip is about synchronizing the update of a single output (monitor) and making that atomic. This means that when user space wants to update the primary framebuffer, move the cursor, and update a couple of overlays, all those changes happen at the same vblank. Again it all either succeeds or fails. "Every frame is perfect."

And then there shall be ponies (at the end of the rainbow)


Once the global thermonuclear war is over, we have the perfect ABI for driving display updates.

Well, almost. Enter NVidia G-Sync, and AMD's FreeSync, which is actually backed by a VESA standard: dynamically variable refresh rates. We have no way yet to time display updates in DRM. All we can do is kick out a display update, and it will hopefully land on the next vblank, whenever that is. But we cannot tell DRM when we would like it to happen. Everything so far assumes that the display refresh rate is a constant, apart from explicit mode switches. Though I have heard that e.g. Chrome on Intel (i915, LVDS/eDP reclocking) has some hacks that opportunistically drop the refresh rate to save power.

There is also a catch in the DRM of today (June 3rd, 2014). You can schedule a pageflip, but if you have pending rendering on that framebuffer on the same GPU as the one presenting it, the pageflip will not happen until the rendering completes. And you do not know when it will complete, which means you do not know whether you will hit the very next vblank or something later.

If the rendering GPU is not the same graphics device as the one presenting the framebuffer, you do not get any synchronization at all. That means you may be scanning out an incomplete rendering for a frame or two, or you have to stall the GPU to make sure rendering is done before scheduling the page flip. This should be fixed with the fences related to dma-bufs (hi, Maarten Lankhorst).

And so the unicorn keeps on running.

by pq (noreply@blogger.com) at June 05, 2014 11:00 AM