Planet Collabora

January 31, 2020

Andrew Shadura

Using gcc sanitisers to get a nasty bug fixed

A couple of days ago a colleague at Collabora asked me to help create a Debian package for the tool he needed to complete his task. The tool happened to be an NXP code signing tool, used to sign OS images to be run on i.MX systems using ‘High Assurance Boot’.

As it often happens, the tool was distributed in a manner typical for corporations like NXP: no direct link to the tarball, custom buildsystem, compiled binaries in the same tarball as the sources. A big relief was that the tool has been distributed under a three-clause BSD license since version 3.3.0 (the sources were not provided at all before that).

The custom buildsystem meant it was not using the Debian build flags (CFLAGS := …, CC := gcc), so I had to plug a custom ‘toolchain’ Makefile into the right place with definitions relevant to Debian:


include /usr/share/dpkg/

LD = $(CC)

COPTIONS += -std=c99 -D_POSIX_C_SOURCE=200809L -fPIC $(shell dpkg-buildflags --get CPPFLAGS) $(shell dpkg-buildflags --get CFLAGS)

LDOPTIONS += $(shell dpkg-buildflags --get LDFLAGS)

After some time of playing with the build system, the package was done and sent to a colleague pending an upload to Debian. What did worry me a bit was a weird warning the compiler gave me every time:

../../code/front_end/src/acst.c: In function ‘encrypt_images’:
../../code/front_end/src/acst.c:791:25: warning: argument 1 value ‘18446744073709551615’ exceeds maximum object size 9223372036854775807 [-Walloc-size-larger-than=]
  791 |         uint8_t *hash = malloc(hash_size);
      |                         ^~~~~~~~~~~~~~~~~

We didn’t use encryption, so I didn’t feel motivated to investigate.

Bad move.

Next day, the colleague comes back to me with this:

I did test builds in Apertis SDK, Debian buster, Fedora 31, ALT Linux 9 -- only the latter environment produced the cst tool which generated correct signature for the file.

The rest produce a bit different signature with zeroes for the current kernel:

000007E9 00 30
000007EA 00 82
000007EB 00 01
000007EC 00 F9
00000D41 00 30
00000D42 00 82
00000D43 00 01
00000D44 00 F9

I found that the problem is related to compilation of code/front_end/src/cst.c.

The colleague sent me a tarball with the working binary and a simple test case.

First attempt

Okay, let’s see. What are those eight different bytes? The signing tool comes with a utility, csf_parser to look inside of the signed file and extract bits and pieces. With debug logs on, for the correctly signed file the logs say this:

Parsing following values
0xD8 0x02 0x01 0x40 0x30 0x82 0x01 0xF9 0x06 …

The incorrectly signed one has this:

Parsing following values
0xD8 0x02 0x01 0x40 0x00 0x00 0x00 0x00 0x06 …

What’s 0xD8? hab_types.h says:

#define HAB_TAG_SIG  0xd8         /**< Signature */

Apparently, the first four bytes are a header of the data chunk, so the differing bytes are part of the signature itself.

Let’s add some debug prints to see what’s going on. Oh no: now the difference is not just 8 bytes, but much more. Looks like there’s some memory corruption or undefined behaviour or maybe both.

Meanwhile, the colleague chimes in: he tried linking cst.o from ALT Linux with the rest of the program built on Debian, and produced a working binary.

The dead end: disassembly

My first thought was: oh, it’s just some code relying on undefined behaviour, what if I decompiled both object files and compared them?

In the past, when I needed to disassemble something I used IDA, but since it’s proprietary, I never had a license for it, and I don’t have it anymore anyway, it was not an option. It did, however, have a very good decompilation module which produced mostly readable C-like pseudocode.

When a couple of years ago I looked for a free software replacement for IDA, radare seemed like it had some potential, but it didn’t seem ready, and it wasn’t fully packaged for Debian. The things have changed, and not only radare is in Debian, but it also comes with a GUI, Cutter, which is very good, even though there’s always toom for improvement.

To my surprise, it turned out radare also has some decompilation functionality which worked really well.

Anyway, I won’t go further into it since this experiment didn’t give me any useful data to work with.

Trying it with gdb

Good, so the code at fault is somewhere close to the signing functions, so let’s try and debug it.

I rarely use the command-line gdb for anything else than printing backtraces, so here’s a small cheatsheet for the next time I need it:

  • break -func create_sig_file: set breakpoint at a function
  • finish: ‘step out’, run to the end of the current function and stop right after it returns
  • x/10xb pointer: print 10 hexadecimal bytes

This is what I have figured during my debugging session: the data gets corrupted at the end of the signing sequence, inside or before the cms_to_buf function invocation, which gets the signature data copied from the internal OpenSSL data structure into the output buffer:

(gdb) finish
Run till exit from #0  cms_to_buf (cms=cms@entry=0x555555598a60, bio_in=bio_in@entry=0x555555597ff0, 
    data_buffer=data_buffer@entry=0x7fffffffd380 "\003", data_buffer_size=data_buffer_size@entry=0x7fffffffd37c, 
    flags=flags@entry=17090) at ../../code/back_end/src/adapt_layer_openssl.c:364
0x0000555555564399 in gen_sig_data_cms (sig_buf_bytes=0x7fffffffd37c, sig_buf=0x7fffffffd380 "", 
    hash_alg=<optimised out>, key_file=0x555555578f70 "IMG1_1_sha256_2048_65537_v3_usr_key.pem", 
    cert_file=0x555555591440 "IMG1_1_sha256_2048_65537_v3_usr_crt.pem", in_file=<optimised out>)
    at ../../code/back_end/src/adapt_layer_openssl.c:487
487    in ../../code/back_end/src/adapt_layer_openssl.c
Value returned is $2 = 0
(gdb) x/10xb 0x7fffffffd380
0x7fffffffd380:    0x00    0x00    0x00    0x00    0x06    0x09    0x2a    0x86
0x7fffffffd388:    0x48    0x86

(gdb) finish
Run till exit from #0  cms_to_buf (cms=0x445290, bio_in=0x444800, 
    data_buffer=0x7fffffffcbe0 "P\320\377\377\377\177", data_buffer_size=0x7fffffffcbcc, flags=17090)
    at ../../code/back_end/src/adapt_layer_openssl.c:365
0x00000000004109f6 in gen_sig_data_cms (in_file=0x416553 "imgsig.bin", 
    cert_file=0x43dc50 "IMG1_1_sha256_2048_65537_v3_usr_crt.pem", 
    key_file=0x43d640 "IMG1_1_sha256_2048_65537_v3_usr_key.pem", hash_alg=SHA_256, 
    sig_buf=0x7fffffffcbe0 "0\202\001\371\006\t*"..., 
    sig_buf_bytes=0x7fffffffcbcc) at ../../code/back_end/src/adapt_layer_openssl.c:487
487    in ../../code/back_end/src/adapt_layer_openssl.c
Value returned is $5 = 0
(gdb) x/10xb 0x7fffffffcbe0
0x7fffffffcbe0:    0x30    0x82    0x01    0xf9    0x06    0x09    0x2a    0x86
0x7fffffffcbe8:    0x48    0x86

Trying it with splint and valgrind

Having failed to find the data corruption by manually inspecting the source, I tried enabling every compiler warning I could, which gave me no results, and then decided to use external tools to help me with that.

First comes splint, a C linter. It produced about a thousand warnings and errors which, while probably were at least partially valid, completely overwhelmed me.

Next came valgrind, which, even at the most aggressive settings only pointed me at uninitialised padding data in some structures:

number:  NUMBER
            $$ = malloc(sizeof(number_t));
            $$->next = NULL;
            $$->num_value = $1;

And, of course, OpenSSL was reading uninitialised data to improve the randomness.

Not very helpful.

Almost giving up

Since I have spent most of the day debugging, valgrinding, reading code and worrying about my soon to be missed connection to Brussels in Frankfurt, I became very frustrated about having no useful results and was close to giving up and calling it a day.

As a last chance attempt, I decided to try all hardening options I could find in the GCC manual page and on the Debian wiki.

First, DEB_BUILD_MAINT_OPTIONS = hardening=+all, -fPIE -pie. Not much changed. Or rather nothing at all.

Then, I found sanitisers. Okay, dump it all into the makefile: -fsanitize=address -fsanitize=undefined -fsanitize=leak -fno-omit-frame-pointer. Compile, link, run: nothing works. Apparently, they need link options as well: -static-libasan -static-libubsan -static-liblsan

Colourful address sanitiser outputColourful address sanitiser output

Cracked it!

 * sig_size as input to gen_sig_data shows the size of buffer for
 * signature data and gen_sig_data returns actual size of signature
 * data in this argument
uint32_t sig_size = SIGNATURE_BUFFER_SIZE;


 * Calling gen_sig_data to generate signature for data in file using
 * certificate in cert_file. The signature data will be returned in
 * sig and size of signature data in sig_size
ret_val = gen_sig_data(file, cert_file, hash, sig_fmt,
    sig, (size_t *)&sig_size, g_mode);

Of course, size_t is 32-bit on 32-bit systems, but significantly wider on 64-bit ones, so no surprise this code would overwrite something else on the stack, in this case the signature data.

One of the lessons I learnt from this way: never underestimate the compiler’s built-in tools. At least, not the sanitisers!

Oh, just in case you want to see more: the patches currently live at the Apertis GitLab.

by Andrej Shadura at January 31, 2020 11:35 AM

January 30, 2020

Sebastian Reichel

Collabora & Linux Kernel 5.5

Linus tagged the 5.5 kernel release, so let’s have a look at Collabora’s participation. Adrian, Martyn and Gustavo Padovan worked on the Broadcom brcmfmac WiFi driver fixing suspend support for some platforms which cut power during sleep to save battery and removing an incorrect warning message. These were done as part of a broader effort to improve upstream support of peripherals used together with the i.MX 6 family of processors.

January 30, 2020 02:42 PM

January 28, 2020

Andrew Shadura

FOSDEM by train

I’ve always loved train journeys, but with flygskam changing people’s travel preferences across Europe (and possibly worldwide, though probably not that much), I decided to take train to FOSDEM this time.

When I first went to FOSDEM which, just in case you don’t know, happens each February in Brussels at ULB, I flew with Ryanair from Bratislava to Charleroi because it was cheaper. After repeating the same journey a couple of times and I once nearly missed the last bus coach to Brussels because of a late flight, and decided to rather pay more but travel with more comfort to Brussels Zaventem, the main airport of Brussels. It’s well-connected with Brussels, trains run fast and run often, which is a significant upgrade in comparison to Charleroi, where the options were limited to bus coaches and a slow train connection from Charleroi the town.

As some of my readers may know, my backpack was stolen from me after FOSDEM two years ago, and with it were gone, among other things, my passport and my residence permit card. With my flight home having been planned two and half hours from the moment when I realised my things are gone, I couldn’t get a replacement travel document quickly enough from the embassy, so I had to stay at my friends in Vilvoorde (thanks a lot again, Jurgen!) and travel with the cheapest ground transportation I could find. In my case, it was a night RegioJet coach to Prague with a connection to (again) RegioJet train to Bratislava. (I couldn’t fly even though I already had my temporary travel document since I might need to somehow prove that I’m allowed to be in the Schengen zone, which is difficult to do without a valid residence permit.) Sleeping on a bus isn’t the best way to travel for long distances, and I was knackered when I finally dropped on my sofa in Bratislava next morning. However, what I learnt was that it was possible, and were it a bit more comfortable, I wouldn’t mind something like this again.

Here I must admit that I’ve travelled by long-distance trains quite a fair bit: in my childhood we went by train to summer holidays to Eupatoria in Crimea and Obzor in Bulgaria (through Varna). Both journeys took days, and the latter also involved a long process of changing the bogies on the Moldovan-Romanian border (Giurgiulești/Galați). Since I moved to Slovakia, I many times took the night train from Minsk to Warsaw with a connection to Bratislava or Žilina, a journey which usually takes at least 18 hours. Which is to say, I’m not exactly new to this mode of travel.

With the Austrian railway company ÖBB expaning their night train services as a part of their Nightjet brand, the Vienna to Brussels sleeper returned to service last week. Prices are still a bit higher than I would have preferred (at the time of me writing this, ticket in a compartment with 6 couchettes start at €79, but it’s not as bad as it could be (apparently the last minute price is more than €200). Anyway, when I decided to go to Brussels by train, this service didn’t exist yet, so instead I followed the very useful tips from the Man in the Seat 61 and booked a day-time connection: Bratislava to Vienna to Frankfurt to Brussels.

Train route from Bratislava to Vienna to Frankfurt to BrusselsTrain route from Bratislava to Vienna to Frankfurt to Brussels

Date Station Arrival Departure Train
30.1 Bratislava 9:38 REX 2511
Wien Hbf 10:44 11:15 ICE 26
Frankfurt am Main Hbf 17:36 18:29 ICE 10
Bruxelles-Nord / Brussel Noord 21:26 21:37 IC 3321
Vilvoorde 21:45
Date Station Arrival Departure Train
4.2 Bruxelles-Midi 8:23 ICE 13
Frankfurt am Main Hbf 11:31 12:22 ICE 27
Wien Hbf 18:45 19:16 R 2530
Bratislava 20:22

I’ve booked two through tickets since in this case the Super Sparschiene discount offered lower prices than normally an international return ticket would offer. For some reason neither ZSSK (Slovak railways) nor ÖBB offered ticket online (or for a comparable price in a ticket office anyway), so I booked online with Deutsche Bahn for €59.90 each way. This sort of ticket, while bookable online, had to be posted for a €5.90 extra.

Since I’m staying the first night at friend’s in Vilvoorde again, I also had buy a ticket for this small stretch of the track from the Belgian railways directly.

See you at FOSDEM!

Building AWBuilding AW
Inside the K buildingInside the K building
Grand PlaceGrand Place

by Andrej Shadura at January 28, 2020 07:18 PM

December 16, 2019

Alexandros Frantzis

Efficient saving of multi-source buffers


Several years ago I was working on libbls, a library implementing editable buffers that can efficiently hold contents originating from both memory and file sources. Typical clients of this libraries are programs that need to edit arbitrary portions of very large files, like hex editors or wave editors, while minimizing the required memory usage.

One of the more interesting problems I encountered was devising an efficient way to save buffers to their corresponding files in place, while minimizing the extra memory (or disk space) used in the process. This post describes how this problem, which turns out to be NP-hard, can be modeled as a graph, and how we can provide a reasonable solution by using a variation of a standard graph algorithm.


Buffers in libbls are described in terms of segments that contain data from either memory or file sources. In order to conserve memory (and sometimes to even make the editing feasible) the data from file sources is not loaded into main memory. Instead, the buffer keeps information about the contained files and ranges and reads the data in vm-page-sized chunks when needed. So, a buffer could look like:

B:  |F1:0-9   |M1:10-18|F1:10-19 |M2:10-19|F2:0-6|

Where F1 and F2 are file sources, M1 and M2 are memory sources and S:B-E denotes a segment representing bytes from the range [B, E] of source S.

In the simple case, saving a buffer to a file consists of just reading data from the various sources and writing them out to the target file. Things become more complicated when the target file happens to be one of the file sources used in the buffer.

This post aims to illustrate the issues that can arise in such cases and propose an elegant way to resolve them.

Illustrating the problem

Saving the buffer to a file typically involves reading each segment in order and writing it out to the file. However, if the segment source is the same as the file that we are writing to, and depending on the ordering of segments, we may face a serious problem.

To illustrate the problem, consider a buffer B containing segment [0-9] of file F at range [10-19] of the buffer, and assume that we want to write the buffer back to file F:

F:  |0-9      |.........|

B:  |.........|F:0-9    |
    ^         ^         ^
    0         10        20

Writing the buffer segments in order leads to a situation where we first write out buffer range [0-9] to file F, overwriting the contents of the file at that range. So, when moving on to the next segment, trying to write out the range [10-19] of the buffer, the data we are going to read from the file (and therefore write) is not the original data of [F:0-9], but the data belonging to the previous segment of the buffer we just wrote at positions [0-9].

The solution in this case is to first write out range [10-19] of the buffer to the file, so that the data we use is the original file data, and only then write out any other segments.

The problem can become even more complicated when we have more segments from the target file. A particularly interesting case arises when a segment from the buffer which belongs to the target file happens to overlap with another segment of F which is also present in the buffer:

    S                   T
F:  |0-9      |.........|20-29    |.........|
            T                     S 
B:  |.......|F:20-29  |...........|F:0-9    |
    ^       ^         ^           ^
    0       8         18          30

In this case segment S in F overlaps with segment T in B. We can simply try to adapt the strategy used in the previous case and first write out the two target file segments. This works but only if we are extra careful. In this case, if we first write segment T then when we try to read the data of segment S we will read wrong data (file range 8-9 will contain data from segment T). If we do it the other way around everything works wonderfully.

Taking the last example one step further, consider what happens if we have cyclic overlaps:

    S                   T
F:  |0-9      |.........|20-29    |....|

            T                S 
B:  |.......|F:20-29  |......|F:0-9    |
    ^       ^         ^      ^     ^
    0       8         18     25    30

Now segment S in F overlaps with segment T in B, and segment T in F overlaps with segment S in B. In this case there is no way to order the writes of segments S and T so that we end up with the correct data.

A straightforward but wasteful way of tackling this problem is to save the whole buffer to a temporary file and then move the file to its final location. This works, and in some cases is preferred since it has the benefit of atomicity, but requires extra space for the temporary file. When buffers reach many GiB in size this method may prove to be unfeasible.

A more efficient method is to try to eliminate the cyclic overlaps by removing at least one of the overlaps involved. In the previous case we can replace segment T in B with two segments M and T1 so that no overlap occurs. Segment M will contain a part of T's data but will actually store them in memory (or a temporary file if necessary) and the T1 segment will just be a T minus the range stored in M. This approach still requires some amount of extra memory, but in most cases this amount is much smaller than the size of the whole buffer. Using this scheme we have:

    S                     T1
F:  |0-9      |...........|22-29  |....|

            M T1              S 
B:  |.......|.|F:22-29|......|F:0-9    |
    ^       ^         ^      ^     ^
    0       8         18     25    30

We have managed to eliminate the first overlap and now we can safely save the buffer by first writing T1 then S and then the remaining segments of B.

Modeling the problem

The examples presented in the previous section show representative but simple cases of the problems we can run into. In order to be able to handle cases of arbitrary complexity it is useful to create a model of the problem. We can then use this model to come up with algorithms that provide generic solutions.

The model we are going to present uses graphs. In these overlap graphs vertices represent segments from the file we are going to save to, that are also present in the buffer we are going to save. Edges between vertices denote segment overlap: an edge from vertex P to vertex Q denotes that segment P in the target file overlaps with segment Q in the buffer. Every edge carries with it the size of the overlapping range.

Here are some buffers and their respective overlap graphs (P* denotes a self-loop on P):

    P              Q                   R
F:  |0-9      |....|15-24    |.........|35-42  |

B1: |........................|F:0-9    |.......|   P

              P              Q
B2: |.........|F:0-9     |...|F:15-24  |.......|   Q --> P

         P                Q
B3: |....|F:0-9    |......|F:20-29  |..........|   P* Q*

    R        P                    Q
B4: |F:35-42|F:0-9    |...........|F:20-29  |..|   R <-- P* <-- Q

Solving the problem

Using the overlap graph we can now answer the question: in what order should the vertices (segments) be written to the file so that the correct data is written?

If the graph has no cycles then we can process the vertices in topological order. Topological order guarantees that each vertex is processed only when its dependencies have already been processed and written to file, and thus that no unwritten file segment in the buffer overlaps with the destination range of that vertex.

As a special case, we can process a vertex that has a self-loop but no other incoming edge. This is achieved by first writing to the file the non-overlapping part of the vertex and then the rest.

In order to handle graphs with cycles (except self-loops) we must find a way to break the cycles. This can be achieved by removing the correct edges. An edge from P to Q can be removed by replacing Q in the buffer by the segments Q1 and M as described previously. M contains an in memory copy of data of the overlapping range. Q1 is the part of Q that remains. This technique eliminates the overlap (because the overlapping part is now stored in memory) and removes the edge. By choosing the edges to remove wisely, we can minimize the amount of extra memory we are going to need.

Once we have an acyclic graph (self-loops allowed) we can process it as described previously.

Let's manually apply this method to an example:

    P              Q                   R
F:  |0-9      |....|15-24    |.........|35-42  |

    R        P                    Q
B4: |F:35-42|F:0-9    |...........|F:20-29  |..|   R <-- P* <-- Q

We first have to make the graph acyclic. We have a few choices for which edge to break:

  • Q->P, a 3 byte overlap
  • P->R, a 7 byte overlap
  • R->Q, a 5 byte overlap

To minimize the required extra memory we break the Q->P edge by storing the last three bytes of P in memory as segment M, so we get:

    P1             Q                   R
F:  |0-6   |.......|15-24    |.........|35-42  |

    R       P1     M              Q
B4: |F:35-42|F:0-6 |..|...........|F:20-29  |..|   Q <-- R <-- P1

Note that coincidentally we also resolved the self-loop of P. Now that we have an acyclic graph we can process the buffer segments in topological order, thus we first store P1 in range [8-14] of the file, next is R at [0-7] and finally Q at range [30-40] followed by all other portions of the buffer.

Implementation Details

libbls implements the solution described above when saving a file in place. The high level logic for the algorithm resides in bless_buffer_save. The implementation of the algorithm within that function can be conceptually split into three steps. Some implementation details for each step are given below, along with links to the functions that provide the relevant functionality.

Step 1: Creating the overlap graph

Relevant functions:

buffer_file.c: create_overlap_graph
overlap_graph.c: overlap_graph_add_segment

To create the overlap graph we have to scan the buffer segment collection for segments that belong to the file we want to save to. Each new segment that is found must be checked against all previous found file segments for overlap. This is an O(n + k*f(k)) algorithm where n is the number of segments in the segment collection and k the number of segments belonging to the file. If we check for overlap naively then f(k) = O(k) and therefore the algorithm is O(n + k*k). We can use an interval tree so that f(k) = O(logk) and the algorithm becomes O(n + k*logk). As we don't expect k to grow too large, the implementation currently uses the naive approach for simplicity.

Step 2: Making the graph acyclic

Relevant functions:

overlap_graph.c: overlap_graph_remove_cycles

In the second step the goal is to remove cycles from the graph, while trying to minimize the total weight of the removed edges in order to minimize extra memory usage. However, it turns out that the problem we are trying to solve, minimum feedback arc set, is NP-Hard! Don't despair though; we can get reasonable results with only little extra work.

The undirected counterpart of the minimum feedback arc set problem is the minimum (or maximum) spanning tree (MST) problem, which can be efficiently solved by using various methods. We can use these methods to give a solution (but not the best) for our problem, too. We can do that because MST algorithms remove all undirected cycles and therefore all directed ones, too. The caveat is that they remove more that we need them to; undirected cycles that are not directed.

MST algorithms work by processing edges in non-decreasing or non-increasing order of weight and adding them to the already formed tree (Prim) or trees (Kruskal) if they aren't already connected to them by an undirected path. Because our goal is to avoid directed cycles we can be more lax with the rules of adding an edge. A simple but useful observation is that if the destination node of an edge has no outgoing edges then a directed cycle cannot be formed, regardless of whether there is a pre-existing undirected path between the two nodes. Similarly, if the source node of an edge has no incoming edges a directed cycle cannot be formed either, regardless of the pre-existence of an undirected path between the source and destination nodes.

We can use the previous observations to decrease the number of edges that are incorrectly deleted by the MST algorithms when acting on directed graphs. Although we don't completely eliminate the erroneous deletions, these rules give reasonable results while keeping the complexity the same as in the undirected case.

Of the MST algorithms, Kruskal's algorithm can be used unchanged in directed graphs because it works directly on edges and doesn't care about their direction. It also works on disconnected graphs, since it finds a spanning forest. For these reasons, it was selected as the basis for the algorithm used by libbls.

Step 3: Removing edges and saving to file

Relevant functions:

overlap_graph.c: overlap_graph_get_removed_edges
buffer_file.c: break_edge
buffer_util.c: segcol_store_in_memory, segcol_store_in_file
overlap_graph.c: overlap_graph_get_vertices_topo

In the previous step we calculated which edges would need to be removed to create an acyclic graph. In this step we first perform the actual removal of these edges by storing the associated data in memory or a temporary file as described previously.

Removing the edges has the unfortunate side effect of further altering the graph as segments are removed, added or changed in the segment collection. This means that after removing the edges we must re-calculate the overlap graph for the segment collection. There may be some way to avoid re-calculating the whole graph, making only local changes to the existing graph as vertices are altered but we will use the simple (although more costly) way for now.

We then use this new overlap graph, which is guaranteed not to have any cycles, to get the vertices in topological order and save them to the target file.

by Alexandros Frantzis at December 16, 2019 12:00 AM

October 28, 2019

Guillaume Tucker

KernelCI and the Linux Foundation

A KernelCI developer's view on the project joining the Linux Foundation


As announced today at the Open Source Summit in Lyon (see the official press release), the KernelCI project which powers with automated testing for the upstream Linux kernel has now joined the Linux Foundation. This is the new home the project has finally found after sailing through uncharted waters for over five years.

What does it actually mean for the project?

Given the huge scale at which the Linux kernel is being used, achieving comprehensive test coverage for it is an incredibly challenging goal. Based on the open source philosophy principles, KernelCI's distributed architecture makes this possible by enabling the whole kernel community to collaborate around a single upstream CI system. Becoming part of the Linux Foundation will let the project flourish and become in turn integral part of the Linux kernel development workflow.

Some actual results of this move can already be seen with the new common database for storing test results from KernelCI and other related projects that share a common goal such as Red Hat's CKI and Linaro's LKFT. It is an experimental step towards expanding KernelCI in a modular way to interact with other existing systems. There are as many different ways to test the kernel as there are use-cases for it, and many types of specialised systems to cover: a CPU architecture such as 0-day, a Linux distribution such as CKI, a range of reference platforms such as LKFT...

What happens next?

This is only a new beginning, many things are yet to come with many decisions to be made: how to use the budget available from the membership scheme, how to make the infrastructure more sustainable and scalable, how to compromise and address the needs of all the members joining the project... Answers to all these questions are likely to appear as the coming months and years unfold.

One thing we can be sure of is that there is no reason for the current development plans to stop or be impacted in any way - on the contrary. There is indeed a strong need to extend test coverage and the capabilities of KernelCI at large, with a huge potential to improve upstream kernel development as a direct result. Becoming part of the Linux Foundation should be about facilitating progress in this direction above all.

What does Collabora plan to do?

Collabora has been involved with KernelCI since almost the beginning, providing some of the servers running the services and a large LAVA test lab. We've also become the biggest contributor to KernelCI development with myself as de facto maintainer for the core repositories on Github and in charge of the weekly updates.

Among all the things we have done in the last couple of years, these are probably the most significant ones:

  • Enabled automated bisection for boot regressions

  • Added v4l2-compliance test suite and a few more minor ones (suspend/resume...)

  • Enabled "Depthcharge" bootloader in LAVA to run KernelCI tests on Chromebooks

  • Portable command-line tools and YAML-based configuration to perform all the KernelCI steps in any environment

Going forward, here are some clear items on the project's current roadmap which will carry on full steam ahead with a growing team of developers:

  • Further improve handling of test results: web interface, email reports, regression tracking

  • Extend coverage of test suites: kselftest, LTP, KUnit, i-gpu-tools and many more

  • Deploy new bisection tool to cover complex test suite results

  • Improve modular core architecture to enable inter-operability between the various elements that form the overall KernelCI pipeline

Then like with every open source project, other contributors will also add more features, enable new platforms in the global test hardware pool or bring new ideas. The combined efforts of all parties is what makes the project unique and mature enought to reach its ultimate goal. We will make sure we continue to be among the top active players within the growing KernelCI community, as a founding member of the Linux Foundation project as well as with myself being both a maintainer and on the technical steering committee.

by Guillaume Tucker ( at October 28, 2019 12:00 PM

October 24, 2019

Erik Faye-Lund

Zink: Fall Update

I recently went to XDC 2019, where I gave yet another talk about Zink. I kinda forgot to write a blog-post about it, so here’s me trying to make up for it… or something like that. I’ll also go into some more recent developments as well.

My presentation was somewhat similar to the talk I did at SIGGRAPH this year, but with a bit more emphasis on the technical aspect, as the XDC audience is more familiar with Mesa internals.

If you’re interested, you can find the slides for the talk here. The talk goes through the motivation and basic approach. I don’t think I need to go through this again, as I’ve already covered that before.

As for the status, Zink currently supports OpenGL 2.1 and OpenGL ES 2.0. And there’s no immediate plans on working on OpenGL 3.0 and OpenGL ES 3.0 until Zink is upstream.

Which gets us to the more interesting bit; that I started working on upstreaming Zink. So let’s talk about that for a bit.

Upstreaming Zink

So, my current goal is to get Zink upstream in Mesa. The plan outline in my XDC talk is slightly outdated by now, so here I’ll instead say what’s actually happened so far, and what I hope will follow.

Before I could add the driver itself, I decided to send all changes outside of the driver as a set of merge-requests:

The last one is probably the most interesting one, as it moves a lot of fixed-function operations into the state-tracker, so individual drivers won’t have to deal with them. Unless they choose to, of course.

But all of these has already been merged, and there’s just one final merge-request left:

This merge-request adds the driver in its current state. It consists of 163 commits at the time of writing, so it’s not a thing of beauty. But new drivers usually aren’t, so I’m not too worried.

When this is merged, Zink will finally be a “normal” part of Mesa… Well, sort of anyway. I don’t think we’ll enable Zink to be built by default for a while. But that’ll just be a matter of adding zink to the -Dgallium-drivers meson-option.

Testing on CI

The current branch only adds building of Zink to the CI. There’s no testing being done yet. The reasons for this is two-fold:

  1. We need to get a running environment on CI. Rather of bringing up some hardware-enabled test-runner, I intend to try to set up SwiftShader as a software rasterizer instead, as that supports Vulkan 1.1 these days.
  2. We need some tests to run. Zink currently only supports OpenGL 2.1, and sadly the conformance test suite doesn’t have any tests for OpenGL versions older than 3.0. Piglit has some, but a full piglit-run takes significantly more time, which makes it tricky for CI. So right now, it seems the OpenGL ES 2.0 conformance tests are our best bet. We’ll of course add more test-suites as we add more features.

So, there’s some work to be done here, but it seems like we should be able to get something working without too much hassle.

Next steps

After Zink is upstream, it will be maintained similarly to other Mesa drivers. Practically speaking, this means that patches are sent to the upstream repo rather than my fork. It shouldn’t make a huge difference for most users.

The good thing is that if I go away on vacation, or are for some other reason unavailable, other people can still merge patches, so we’d slightly reduce the technical bus-factor.

I’m not stopping developing Zink at all, but I have other things going on in my life that means I might be busy with other things at times. As is the case for everyone! :wink:

In fact, I’m very excited to start working on OpenGL 3.x and 4.x level features; we still have a few patches for some 3.x features in some older branches.

The future is bright! :sunny:

by Erik Faye-Lund at October 24, 2019 02:22 PM

October 07, 2019

Alexandros Frantzis

mda-rs: Custom Mail Delivery Agents in Rust

I had been stubbornly using procmail for two decades to filter my email, before deciding that it's finally time for a change. Procmail served me well through the years, but, little by little, the annoyances started adding up. The fact that procmail is unmaintained since 2001 inspired less and less confidence as years passed, and the requirement in many cases for external processing before any useful matching could be performed (e.g., dealing with MIME content transfer encoded data, or non us-ascii character sets) was a constant pain point. As the complexity of my rules grew, I even had to resort to an external program (a custom C++ program in my case) to perform some of the mailbox selection logic, using the AUTOFOLDER feature of procmail.

At that point, and given all the functionality that I had to implement myself, I seriously started questioning the value procmail was providing to me and started looking for alternatives. I evaluated fdm and maildrop, finding in both a lot that I liked; first and foremost a less arcane filtering language. In the end, I found maildrop to be a closer match to my preferences and requirements, and I especially appreciated the automatic MIME content decoding.

I briefly considered switching to maildrop, but a few experiments on my set of filtering rules indicated that maildrop's performance was significantly worse compared to procmail, even though for procmail I had to call out to a few more external programs to achieve similar functionality. I also gave fdm a go, and it was even slower than maildrop. I receive a lot of emails each day, mostly from various FOSS mailing lists, and the performance penalty adds up. While waiting for a few dozen seconds daily wouldn't have been the end of the world, the thought that my original and much older solution was faster, disturbed me.

Meanwhile, I noticed that the control flow statements in maildrop's filtering language were similar to that of a general purpose programming language, and, in fact, with the integration with regular expressions, they looked a bit like perl. So, I thought, what if instead of using a domain specific language, I could use a general purpose language, augmented by some kind of library to implement the same filtering and delivery functionality. My thoughts then drifted to other things, but the seed of that idea had already been planted in my mind it seems, as a few years later I found myself implementing the code that would become mda-rs.

With mda-rs I wanted to create an experience as close as possible to using an interpreted domain specific language, the approach follow by typical MDAs, while still having the performance and power of a full, compiled language at the fingertips. One aspect of this experience was providing an API that feels like natural fit for the intended purpose. The other aspect was providing a straightforward way to build a custom MDA. For this second aspect, the simplicity of Rust's cargo was one of the reasons I decided to use Rust for this project.

Another decision catering to a similar ease-of-use goal was that the user shouldn't be required to use external programs to transform the data just so they could perform basic matching. To this end, mda-rs, like maildrop, normalizes the email before processing, by decoding and converting all text MIME content parts (including MIME encoded-words in headers) to plain UTF-8.

Of course, I also wanted the resulting custom MDAs to be fast; performance was my main disappointment with other MDAs after all. Performance requires an efficient implementation, but also and an API that encourages performant use. An example of the effect of such a concern on the mda-rs API, is that the API provides access to the header fields by name, so that one can perform targeted per-header-field string searches, which can be much faster than regular expression searches of the whole header.

Finally, an important requirement for all MDAs is that the email delivery is durable, i.e., that no email will be lost in case of a system failure. In other words, at any point in time at least one of the local filesystem and the email server should have a complete and fully accessible copy of the email.

While investigating the best way to provide such durability guarantees, I noticed that all three MDAs mentioned previously are fsyncing the written email file, as expected. However, they are not fsyncing the containing directory, which could potentially lead to the file not appearing on the filesystem 1. The exact guarantees in this scenario seem to be dependent on the filesystem, but, to provide maximum durability, mda-rs fsyncs both the file and the containing directory by default. This does incur a performance penalty, so I have also included an option to use the file-sync-only approach if preferred.

Since mda-rs was written to cater primarily to my personal needs, it currently has some limitations, or unimplemented functionality, if you will. The most prominent one is that it delivers to maildir only, since that's the only mailbox format I use. The second is that there is no built-in way to change the email data (e.g., add a header field) except by filtering through an external tool, since this is another feature I don't use.

Here is a small taste of how a custom MDA would look like with mda-rs:

use std::path::PathBuf;

use mda::{Email, EmailRegex, Result, DeliveryDurability};

fn main() -> Result<()> {
    let root = PathBuf::from("/tmp/my-personal-mail");

    let mut email = Email::from_stdin_filtered(&["/usr/bin/bogofilter", "-ep"])?;
    // Use quicker (but possibly less durable) delivery.

    let from = email.header_field("From").unwrap_or("");
    let bogosity = email.header_field("X-Bogosity").unwrap_or("");

    if bogosity.contains("Spam, tests=bogofilter") ||
       from.contains("") {
        return Ok(());

    let cc = email.header_field("Cc").unwrap_or("");
    let to = email.header_field("To").unwrap_or("");

    if to.contains("") ||
       cc.contains("") {
        if email.body().search("URGENCY RATING: (CRITICAL|URGENT)")? {
        } else {
        return Ok(());



and a corresponding minimal Cargo.toml:

name = "my-mda"
version = "0.1.0"
edition = "2018"

mda = "0.1"

To provide an idea of the performance gains to expect, I benchmarked a us-ascii only version of my personal mail filtering rules on a sample of 250 of my recently received emails using all the aforementioned MDAs. Of course, the performance results will vary depending on both the rules and the email themselves, but in my experience what is presented below seems to be a typical outcome. I'd be very interested to know how mda-rs worked for others on the performance front.

In the benchmark I included runs both with and without filtering for spam, since such filtering takes up a significant amount of processing and affects the relative results. For mda-rs I included two versions, one that mostly uses per-header-field searches, and a second one that uses regular expressions exclusively, and thus is a bit closer to how the other MDAs work. For each mda-rs version I ran the benchmark both in file sync only mode, which is how the others MDAs work, and full sync (file and directory) mode, which is the default for mda-rs. Note that, for this benchmark, both maildrop and mda-rs performed MIME decoding and translation to UTF-8 (internally and automatically), whereas neither procmail nor fdm did so, and no external program was used to provide such functionality. I verified that the delivery results were the same for all MDAs.


mda-rs wins hands down when operating in file sync mode, while at the same time doing more work (normalizing the email) compared to the next faster contestant, procmail. In full sync mode, the extra directory syncs prove to be expensive enough to overshadow other mda-rs performance gains, and procmail takes the lead. Still, even in this case, the performance is acceptable and much better compared to both maildrop and fdm.

If you would like to learn more about how to use mda-rs, you can find detailed documentation at:

The mda-rs project repository is at:

The mda-rs entry is at:


1 See

by Alexandros Frantzis at October 07, 2019 12:00 AM

September 25, 2019

Andrew Shadura

Rust-like enums in Kotlin

Rust has an exciting concept of enumeration types, which is much more powerful than enums in other languages. Notably C has the weakest type of enum, since there’s no type checking of any kind, and enum values can be used interchangeably with integers:

enum JobState {

You can opt for manually assigning integers instead of leaving this to the compiler, but that’s about it.

Higher level languages like Python and Java treat enumeration types as classes, bringing stricted type checking and better flexibility, since they can be extended nearly as any other classes. In both Python and Java individual enumerated values are singleton instances of the enumeration class.

class JobState(Enum):
    PENDING = auto()
    STARTED = auto()
    FAILED = auto()
    COMPLETED = auto()
enum JobState {

Since enumerations are classes, they can define extra methods, but because the enum values are singletons, they can’t be coupled with any extra data, and no new instances of the enum class can be created.

In contrast with Python and Java, Rust allows attaching data to enumerations:

enum JobState {

This allows us to store the error message in the same value as the job state, without having to declare a structure with an extra field which would be used only when the state in Failed.

So, what Kotlin has to offer? Kotlin has a language feature called sealed classes. A sealed class is an abstract class with limited interitance: all of its subclasses have to be declated in the same file. In a way, this is quite close to the Rust enums, even though sealed classed look and behave a bit differently.

sealed class JobState {
    object Pending : JobState()
    object Started : JobState()
    object Completed : JobState()
    data class Failed(val errorMessage: String) : JobState()

Declared this way, JobState can be used in a way similar to Rust’s enums: a single variable of this type can be assigned singletons Pending, Started or Completed, or any instance of Failed with a mandatory String member:

val state: JobState = JobState.Failed("I/O error")

when (state) {
    is JobState.Completed ->
        println("Job completed")
    is JobState.Failed ->
        println("Job failed with an error: ${state.errorMessage}")

This usage resembles the regular Java/Kotlin enums quite a bit, but alternatively, Pending and friends can be declared outside of the sealed class, allowing them to be used directly without the need to add a JobState qualifier.

A slightly simplified real life example from a Kotlin project I’m working on, where a separate coroutine handles I/O with a Bluetooth or a USB device:

sealed class Result
object Connected : Result()
data class Failed(val error: String) : Result()

sealed class CommServiceMsg
data class Connect(val response: CompletableDeferred<Result>) : CommServiceMsg()
object Disconnect : CommServiceMsg()
data class Write(val data: ByteArray) : CommServiceMsg()

fun CoroutineScope.bluetoothServiceActor(device: BluetoothDevice) = actor<CommServiceMsg>(Dispatchers.IO) {
    val socket: BluetoothSocket = device.createSocket()

    process@ for (msg in channel) {
        when (msg) {
            is Connect -> {
                with(socket) {
                    msg.response.complete(try {
                    } catch (e: IOException) {
                        val error = e.message ?: ""
            is Disconnect -> break@process
            is Write -> {

Here, we can talk to bluetoothServiceActor using messages each carrying extra data; if the coroutine needs to talk back (in this example, the result of a connection attempt), it uses a CompletableDeferred<> value of the Result type, which can hold an error message when needed.

With that in place, we can write something like this:

val bluetoothService = bluetoothServiceActor(device)
val response = CompletableDeferred<Result>()

var result = response.await()
when (result) {
    is Connected -> {
        bluetoothService.send(Write(byteArrayOf(42, 0x1e, 0x17)))
    is Failed ->
        println("error occurred: ${result.error}")

by Andrej Shadura at September 25, 2019 01:31 PM

July 25, 2019

Erik Faye-Lund

Zink: Summer Update and SIGGRAPH 2019

Here’s an overview of the recent changes in Zink since the previous post, as well as an exciting announcement!

What’s new in the world of Zink?

OK, so I haven’t been as good at making frequent updates on as I was hoping, but let’s try to make up for it:

Since last time, there’s quite a lot of things that has been resolved:

  • We now do proper control-flow. This means things like if-statements, for-loops etc. There might be some control-flow primitives missing still, but that’s because I haven’t encountered any use yet.
  • Alpha testing has been implemented.
  • Client-defined clip-planes has been implemented.
  • Support for gl_FrontFacing has been implemented.
  • Lowering of glPointSize() to gl_PointSize has been implemented. This means you can use glPointSize() to specify sizes instead of having to write the gl_PointSize-output from the vertex shader.
  • Support for gl_FragDepth has been implemented.
  • Two-sided lighting has been implemented.
  • Shadow-samplers has been implemented.
  • Support for 8-bit primitive indices has been implemented.
  • Occlusion queries has been implemented correctly across command buffers. This includes the ability to pause / restore queries.
  • The compiler has been ported to C.
  • The compiler no longer lowers IO, but instead process derefs directly.
  • The compiler now handles booleans properly. It’s no longer lowering them to floats.
  • David Airlie has contributed lowering of glShadeModel(GL_FLAT) to flat interpolation qualifiers. This still doesn’t give us the right result, because Vulkan only supports the first vertex as the provoking vertex, and OpenGL defaults to the last one. To resolve this in a reasonable way, we need to inject a geometry shader that reorders the vertices, but this hasn’t been implemented yet. You can read more about this in this ticket
  • … and a boat-load of smaller fixes. There’s just too many to mention, really.

In addition to this, there’s been a pretty significant rewrite, changing the overall design of Zink. The reason for this, was that I made some early design-mistakes, and after having piled a bit too many features on top of this, I decided that it would be better to get the fundamentals right first.

Sadly, not all features have been brought forward since the rewrite, so we’re currently back to OpenGL 2.1 support. Fixing this is on my list of things I want to do, but I suspect that cleaning things up and upstreaming will take presedence over OpenGL 3.0 support.

And just to give you something neat to look at, here’s Blender running using Zink:

Blender on Zink Blender on Zink

Khronos Vulkan BoF at SIGGRAPH 2019

Khronos has been nice enough to give me a slot in their Vulkan Sessions at the Khronos BoFs during SIGGRAPH 2019!

The talk will be a slightly less tech-heavy introduction to Zink, what it does and what the future holds. It will focus more on the motivation and use cases than the underlying technical difficulties compared to my previous talks.

So, if you’re in the area please drop by! A conference pass is not required to attend the BoF, as it’s not part of the official SIGGRAPH program.

by Erik Faye-Lund at July 25, 2019 07:38 PM

July 11, 2019

Helen Koike

Como começar a contribuir com Open Source

Collabora is participating in the Linux Developer Conference Brazil and this post is written in Portuguese to serve as a guide for the attendees to learn more about how to start contributing to Open Source Software.

Muita gente pensa que para começar a contribuir com um projeto de FOSS (Free and Open Source Software) tem que saber codar, isso é um mito, as pessoas precisam conhecer como o projeto é estruturado como uma comunidade, e muitas vezes para contribuir nem é necessário saber escrever código, pois reportar bugs, realizar testes, contribuir com o design da interface, revisar e arquivar bug reports obsoletos, traduzir o software ou ajudar a organizar times ou conferência são contribuições muito bem-vindas.

Compilando o software a partir do código-fonte

Geralmente o projeto possui alguma página web com diversas informações, inclusive com instruções de como fazer o download do código-fonte. A maioria dos projetos usam algum sistema de controle de versão, atualmente o Git é mais popular, mas pode ser Svn, Cvs, Mercurial e outros. Entre no site do projeto desejado, verifique qual sistema é usado e familiarize-se com as ferramentas necessárias.

Todo projeto é diferente, mas provavelmente você irá encontrar alguns dos seguintes arquivos na base do projeto:

  • README (txt): Contém explicações iniciais do projeto. Comece por aqui, já que usualmente esse arquivo possui informações de como compilar e instalar o software do código-fonte.
  • LICENSE ou COPYING (txt): Possui informações sobre a licença na qual o projeto é distribuído.
  • MAINTAINERS (txt): Descreve quais pessoas são responsáveis por qual parte do projeto.
  • CONTRIBUTING (txt): Descreve mais informações de como participar da comunidade e contribuir.
  • Documents ou docs (pasta): Contém diversas documentações sobre o projeto, tanto de usabilidade quanto sobre a parte técnica.

Quando nos referimos ao projeto "mainline" ou "upstream", significa que é o projeto oficial, onde o desenvolvimento de novas funcionalidades está acontecendo e o que contém as modificações mais recentes. Em geral, ao fazer alguma modificação (patch) no código, ela só será considerada oficial depois de entrar na versão mainline. Por exemplo, o browser que vem em uma distribuição de Linux não é a mainline. Apesar de se basear em uma versão específica do projeto mainline do Linux, a comunidade da distribuição usualmente aplica diversas modificações tanto no código-fonte quanto nas configurações do kernel para atender as necessidades específicas daquela comunidade . Portanto, ao encontrar um bug, é importante testar o código mainline para ver se ele já foi corrigido ou se afeta apenas a versão da sua distribuição.

Como buscar ajuda

Ao buscar ajuda, tenha em mente que a maioria das pessoas da comunidade também são voluntárias como você e não são obrigadas a atender às suas demandas, logo seja educado, verifique se o projeto define algum tipo de código de conduta, mas não tenha medo de perguntar, mostre que você fez uma rápida investigação, isso indica que você está correndo atrás e as pessoas em geral gostam de incentivar gente nova e interessada, por exemplo:

"Olá, eu sou novo no projeto, queria entender sobre X, achei o artigo Y mas ele não parece explicar o que eu gostaria de saber, alguém poderia me explicar ou me indicar onde posso ver essa informação?"

"Olá, estou tentando entender como o código X funciona, me parece que faz a tarefa Y mas estou incerta, existe alguma documentação sobre isso? Procurei e não encontrei. Agradeço se alguém me ajudar"

Se você não tiver certeza que está perguntando na lista de email certa, ou no canal de IRC certo, pergunte onde seria mais apropriado postar a sua pergunta. Caso você não obtenha resposta, não assuma que está sendo ignorado por ter começado agora, as pessoas são ocupadas, espere um pouco (algumas horas no IRC ou uma semana no email) e refaça a pergunta.

Onde buscar ajuda

IRC: Muitos projetos possuem um canal de bate-papo no IRC para a comunidade se coordenar e se ajudar. Veja se o seu projeto possui um canal, baixe um cliente de IRC, conecte no servidor e se junte ao canal. Por exemplo, no servidor da Freenode você pode encontrar os canais #freebsd, #ubuntu, #debian, #python, #docker, há também canais mais específicos, por exemplo, o Debian se organiza por times, logo você pode encontrar no servidor da OFTC os canais #debian-cloud, #debian-mirrors, #debian-ftp entre outros. Muitas vezes o projeto possui canais específicos para quem está começando, como o #kernelnewbies na OFTC.

Guia para configurar o seu IRC

Listas de emails / fóruns: Procure se o projeto possui alguma lista de email ou fórum para discussão, por exemplo, o Kernel possui uma lista para cada subsistema. Procure a lista apropriada e se inscreva, muitas listas disponibilizam os archives dos emails passados, útil quando está procurando sobre algum tópico que já foi discutido. A dica aqui é fazer bottom post (responder emails em baixo ou entre a cópia) utilizado pela maioria dos projetos. Caso não obtenha resposta em uma ou duas semanas, verifique se mandou a sua pergunta para a lista de email mais apropriada ou as vezes as pessoas estão simplesmente ocupadas, eu geralmente respondo a mesma thread the email com a palavra "ping" para relembrar as pessoas de responderem.

Discussões em algum sistema: alguns projetos usam o GitHub diretamente para perguntas e discussões, verifique se o projeto usa algum sistema específico para discussões e participe.

Pull requests

Pull request é quando você requisita que suas mudanças seja incluído na mainline. Cada projeto possui a sua maneira de enviar modificações (patches) de código para o projeto, no Linux Kernel por exemplo, você deve mandar os patches no texto do email no formato do git-format-patch, já no FreeBSD, você deve anexar o patch em formato diff unified no sistema de controle de bugs, alguns outros projetos aceitam pull requests pelo sistema do GitHub, verifique com o seu projeto como você deve enviar os patches para a comunidade.

Estrutura da comunidade

Cada comunidade se organiza de uma forma diferente, podemos encontrar os diferentes papeis dentro da comunidade

  • Autor: quem começou o projeto
  • Commiter: quem possui o acesso de commit na mainline
  • Mantenedor: o responsável por revisar e aplicar patches de alguma subparte do projeto ou no projeto todo
  • Colaboradores: que ajudam o projeto em diversos aspectos
  • Time: um subgrupo de colaboradores que fazem alguma tarefa específica do projeto, podendo até fazer o papel de um mantenedor
  • Usuários

É importante conhecer a estrutura da comunidade para saber pra quem fazer perguntas, pedir revisões ou mandar contribuições para o time ou grupo de pessoas trabalhando na área relacionada. Lista de emails ou canais de IRC com escopo muito genérico será mais difícil encontrar alguém que revise e aplique um patch, ou responda uma pergunta muito específica sobre algum assunto.

Exemplos de como algumas comunidades funcionam


A comunidade é organizada de maneira bem democrática, o líder do projeto é eleito por voto anual, os trabalhos são divididos por times (Ex. time de mirrors, time DSA para a infraestrutura, time de release que coordena o lançamento da próxima versão), e cada pacote no Debian pode ter como responsável um mantenedor específico ou um time. Logo ao encontrar um bug em um determinado pacote, verifique quem é o responsável, entre em contato e envie seus patches para a pessoa, time ou lista de email certa.

Linux Kernel:

O projeto é mantido por Git, o único commiter da mainline é o Linus Torvalds, o projeto é dividido em diversos subsistemas, cada subsistema possui um mantenedor em que o Linus Torvalds confia e aceita seus pull requests. A organização do desenvolvimento de cada subsistema é bem variado e cada um tem suas regras, tem subsistemas que possuem co-mantenedores e outros que não, cada subsistema normalmente tem um canal de IRC e uma lista de email e documentação.

Aprofundando no código

A maioria das pessoas começam contribuindo com algo tão simples quanto corrigir um erro ortográfico, essa simples contribuição trará conhecimento do workflow completo de como trabalhar com a comunidade, mas muitas vezes encontrar problemas técnicos à serem resolvidos nem sempre é fácil e exige um conhecimento maior do projeto.

Ao se aprofundar no código, verifique quais são os métodos de debug que o projeto utiliza, essas técnicas vai ajudá-lo a entender melhor o código e à informar com mais detalhes o seu problema para outras pessoas. Pesquise onde você consegue visualizar os logs de erro, como incluir no código alguma mensagem de log, veja se consegue executar o projeto passo à passo com ferramentas como GDB, Python Trace. Alguns projetos já possuem testes inclusos, veja também se a comunidade usa alguma ferramenta externa para teste, aprenda como reproduzir os testes e à depurar o código.

Achar um problema a ser resolvido

Caso você tenha encontrado um mal funcionamento no projeto de interesse, comece por aí, verifique se alguém já reportou o bug em alguma lista de email, fórum ou no próprio sistema de controle de bugs, entre em contato com as pessoas envolvidas e peça mais informações. Caso não saiba por onde começar a olhar no código, pergunte às listas de email ou canais de IRC, normalmente as pessoas te apontarão para onde olhar a grosso modo, e assim comece a sua investigação do problema. Reporte o bug para a comunidade para que saibam que o problema já está sendo investigado e que podem te contactar para trabalhar em conjunto, evitando assim retrabalho.

Caso você não tenha dado a "sorte" de encontrar um bug, muitos projetos já possuem uma lista de bugs conhecidos só esperando alguém para adotá-los, procure onde está lista se encontra, analise algum bug que consiga reproduzir e não tenha medo de fazer perguntas.

Dependendo do projeto, muitas vezes dar os passos acima é muito complicado e exige muito conhecimento prévio para entender um bug, no Linux Kernel por exemplo, essa lista de problemas já conhecidos mal existe, as que existem só possuem problemas difíceis para um iniciante. O que eu sugiro nesse caso é que você mude a abordagem, ao invés de tentar achar um problema, estude o código, quando você estiver familiarizado o suficiente vai provavelmente visualizar que o código não é perfeito e ver vários pontos de melhorias. Uma dica é pegar algum código (alguma função, classe, módulo ou driver do projeto) e tente reescrever esse código do zero, utilizando o código original apenas como referência, fazendo perguntas para a comunidade das partes que não entende. O conhecimento adquirido neste exercício vai proporcionar uma visão melhor do código, das APIs internas, expor possíveis problemas, além de integrá-lo melhor na comunidade.

Estágios pagos com mentoria

Uma ótima forma de começar à contribuir com FOSS é através de um estágio direcionado. Há algumas empresas ou fundações de FOSS que financiam programas de estágio remoto de aproximadamente 3 meses, onde o mentor é normalmente um voluntário que propõe uma determinada tarefa dentro do projeto. Com isso você já tem uma direção no que contribuir, ter alguém que você possa fazer perguntas e assumir que é sim o papel delas te responder, acompanhar o seu progresso periodicamente, além de ser pago por isso.

Google Summer of Code (GSoC): Estágio remoto em algum projeto de FOSS pago pela Google durante 3 meses de Maio à Julho para estudantes, confira quais projetos participam, se interessar por algum, verifique as propostas feitas pelos mentores voluntários, veja o processo de seleção, normalmente há algumas tarefas que você precisa realizar na aplicação.

Outreachy: Organizado pela Software Freedom Conservancy, similar ao GSoC para grupos sub-representados na comunidade, não precisa ser estudante, acontece duas vezes ao ano (Maio à Junho, e Dezembro à Fevereiro).

Endless Vacation of Code (EVoC): A Fundação tem o próprio programa pra universitários que querem começar a contribuir. O EVoC pode começar em qualquer mês do ano.


Muitos projetos de FOSS organizam conferências para reunirem a comunidade e discutirem problemas atuais de forma colaborativa. Ir à conferências é uma ótima forma de se familiarizar com o projeto e conhecer pessoalmente as pessoas com quem você interage online. Verifique quais são as conferências que o projeto no qual você se interessa realiza, ou quais as principais conferências que as pessoas que você interage participa.

Ajuda de custo para conferências

O problema é que a maioria dessas conferências são fora do Brasil e a viagem fica muito cara, principalmente para estudantes. Felizmente, muita dessas conferências distribuem bolsas para ajuda de custo, a Linux Foundation por exemplo disponibiliza um formulário para requisitar ajuda de custo com passagem de avião e hotel, também há ajuda para grupos sub-representados para incentivar a diversidade na comunidade, e as vezes o próprio projeto possui algum fundo para bolsa. O Debian por exemplo, pode pagar a sua viagem, principalmente se você é uma pessoa que já está ajudando a comunidade, mas mesmos novatos podem conseguir.

Outra forma de conseguir ajuda de custo é se voluntariar par ajudar na organização da conferência, mande um email para a equipe de organização e pergunte se há essa possibilidade.

Espero que essas dicas ajudem, caso tenha alguma dúvida entre em contato ou deixe um comentário

by Helen M. Koike Fornazier ( at July 11, 2019 02:01 AM

June 26, 2019

Alyssa Rosenzweig

GNOME Meets Panfrost

In my last Panfrost blog, I announced my internship goal: improve Panfrost to run GNOME3. GNOME is a popular Linux desktop making heavy use of OpenGL; to use GNOME with only free and open-source software on a machine with Mali graphics, Panfrost is necessary.

Two months ahead-of-schedule, here I am, drafting this blog post from GNOME on my laptop running Panfrost!

A tiled architecture

Bring-up of GNOME required improving the driver’s robustness and performance, focused on Mali’s tiled architecture. Typically found in mobile devices, tiling GPU architectures divide the screen into many small tiles, like a kitchen floor, rendering each tile separately. This allows for unique optimizations but also poses unique challenges.

One natural question is: how big should tiles be? If the tiles are too big, there’s no point to tiling, but if the tiles are too small, the GPU will repeat unnecessary work. Mali offers a hybrid answer: allow lots of different sizes! Mali’s technique of “hierarchical tiling” allows the GPU to use tiles as small as 16x16 pixels all the way up to 2048x2048 pixels. This “sliding scale” allows different types of content to be optimized in different ways. The tiling needs of a 3D game like SuperTuxKart are different from those of a user interface like GNOME Shell, so this technique gets us the best of both worlds!

Although primarily handled in hardware, hierarchical tiling is configured by the driver; I researched this configuration mechanism in order to understand it and improve our configuration with respect to performance and memory usage.

Tiled architectures additionally present an optimization opportunity: if the driver can figure out a priori which 16x16 tiles will definitely not change, those tiles can be culled from rendering entirely, saving both read and write bandwidth. As a conceptual example, if the GPU composites your entire desktop while you’re writing an email, there’s no need to re-render your web browser in the other window, since that hasn’t changed. I implemented an initial version of this optimization in Panfrost, accumulating the scissor state across draws within a frame, rendering only to the largest bounding box of the scissors. This optimization is particularly helpful for desktop composition, ideally improving performance on workloads like GNOME, Sway, and Weston.

…Of course, theory aside, mostly what GNOME needed was a good, old-fashioned bugfixing spree, because the answer is always under your nose. Turns out what really broke the desktop was a trivial bug in the viewport specification code. Alas.


Looking forward to sophisticated workloads as this open driver matures, I researched job “scoreboarding”. For some background, the Mali hardware divides a frame into many small “jobs”. For instance, a “vertex job” executes a vertex shader; a “tiler job” executes tiling (sorting geometry job into tiles at varying hierarchy levels). Many of these jobs have to execute in a specific order; for instance, geometry has to be output by a vertex job before a tiler job can read that geometry. Previously, these relationships were hard-coded into the driver, which was okay for simple workloads but does not scale well. 

I have since replaced this code with an elegant dependency management system, based on the hardware’s scoreboarding. Instead of hard-coding relationships, the driver can now specify high level dependencies, and a generic algorithm (based on toplogical sorting) works out the order of submission and scoreboard flags necessary to actualize the given requirements. The new scoreboarding implementation has enabled new features, like rasterizer discard, to be implemented with ease.

With these improvements and more, several new features have landed in the driver, fixing hundreds of failing dEQP tests since my last blog post, bringing us nearer to conformance on OpenGL ES 2.0 and beyond.

Originally posted on Collabora’s blog

June 26, 2019 04:00 AM

June 05, 2019

Alyssa Rosenzweig

Joining Collabora for a summer of Panfrost

Hello, I’m Alyssa Rosenzweig, a student, the lead developer of the open-source Panfrost graphics driver… and now a Collaboran!

Years ago, I joined the open-source community with a passion and a mission: to enable equal access to high-quality computing via open-source software. With this mission, I co-founded Panfrost, aiming to create an open-source driver for the Mali GPU. Before Panfrost, users of Mali GPUs required a proprietary blob, restricting their ability to use their machines as they saw fit. Some users were unable to run Linux, their operating system of choice, with the display system of their choosing, simply because there were not blobs available for their particular configuration. Others wished to use an upstream kernel; yet others held a deep philosophical belief in free and open-source software. To each users’ driver problem, Panfrost seeks to provide a solution.

Days ago, I joined Collabora with the same passion and the same mission. Collabora was founded on an “open first” model, sharing my personal open source conviction. Collabora’s long-term vision is to let open-source software blossom throughout computing, fulfilling my own dream of an open-source utopia.

With respect to graphics, Collabora has shared my concerns. After all, we’re all on “Team Open Source” together! Collabora’s partners make awesome technology, often containing a Mali GPU, and they need equally awesome graphics drivers to power their products and empower their users. Our partners and our users asked, and Panfrost answered.

At Collabora, I am now a full-time Software Engineering Intern, continuing throughout the summer to work on Panfrost. I’m working alongside other veteran Panfrost contributors like Collaboran Tomeu Vizoso, united with open-source community members like Ryan Houdek. My focus will be improving Panfrost’s OpenGL ES 2.0 userspace, to deliver a better experience to Panfrost users. By the end of the summer, we aim to bring the driver to near conformance, to close any performance gaps, and through this work, to get GNOME Shell working fluidly on supported Mali hardware with only upstream, open-source software!

Supporting GNOME in Panfrost is a task of epic proportions, a project dream since day #1, yet ever distant in the horizon. But at Collabora, we’re always up for the challenge.

Originally posted on Collabora’s blog

June 05, 2019 04:00 AM

April 20, 2019

Alexandros Frantzis

Dynamic programming for fun and profit

Imagine that you decide to make an investment. You have 1720 currency units (CU) at your disposal and the investment options that are available are:

Option#min amountmax amountinterestsign up bonus

You may use multiple investment options, each one possibly multiple times. Each option accepts a minimum and maximum amount of CUs, and provides an interest on that amount plus an additional bonus amount which does not depend on the invested amount. How will you invest your money? Take your time and continue reading when ready...

Let see how you did! If you invested all 1720 CU in option 5 then... you chose poorly. Your profit is going to be 38.4 CU — not bad, but you can do better. If you invested 1020 in option 5 and 700 in option 4, that's a bit better at 39.5 CU, but still not optimal. What about 1020 in option 5, 500 in option 4 and 200 in option 2? That's actually a bit worse at 39.1 CU! I'll spare you the agony and reveal the best solution: 1020 in option 1, 300 in option 3 twice, and 100 in option 2, giving a profit of 40.5 CU. This is about 5.5% better than the straightforward solution of assigning everything to option 5.

As this short exercise has hopefully exhibited, finding the optimal portofolio of options is far from trivial. Is there a way to programmatically find the optimal solution?

The road to dynamic programming

First, let's define what are trying to find. A portofolio is a multiset of ( o,n ) pairs, each pair representing that we are assigning n CU to option o . For this solution to be acceptable, all the pairs need to be valid and the total sum of all pairs must not exceed the total amount of money we want to use. We are trying to find the portofolio which provides the maximum profit.

Brute force

To understand the solution space a bit more let's devise a way to enumerate all the portofolios, which is in fact not trivial.

One way to think about this is that to produce a portofolio we first have to split our total amount in some way, and assign an option to each element in the split. Each such split is formally called a partition of n . The number of partitions for a number n is given by the partition function p ( n ) which becomes spectacular very quickly. For each such partition we also need to factor in all the different acceptable option permutations.

Another way to think about this, which turns out to be more helpful, is with a recursive, constructive view of the problem. Let P n be the set of portfolios for n total CU. From this set, the subset of porfolios that use option o with k CU consists of all portofolios with n k total CU combined with the option ( o,k ) . Expressing this with set notation we have ( is the multiset sum operator, which in our case is used to add items to the multiset):

P n ( o,k )= { Q { ( o,k ) }| Q P n k }

The whole set P n is the union of all P n ( o,k ) sets for all valid combinations of o and k :

O k = { Options that accept k }
P n = 1 k n o O k P n ( o,k )

Using this recursive set description of the solution space we can write a program to enumerate all the solutions for a particular amount and set of options. This description provides a straightforward, albeit rather inefficient, way to solve this problem: by brute force — evaluate all solutions and use the best. In pseudocode this would look like:

    solutions = {}

    for i in [1..n]:
        for o in OPTIONS:
            if o accepts i:
                for s in (solve_brute_force(n-i) ∪ {∅}):
                    solutions = solutions ∪ (s ⊎ {(o,k)})

    return solutions

    return argmax(solve_brute_force(n), profit)

Optimal substructure

Let's see if this problem has any interesting properties we can take advantage of to find the optimal solution more efficiently.

If we look at P n ( o,k ) we see that we are combining ( o,k ) with all portofolios for n k , and we ultimately check all these combinations to find the best solution. This is wasteful, however, since the choice of ( o,k ) doesn't affect the profit of any of the members of P n k . To put it more formally, the profit function is an additive multiset function:

Pr of it ( P n ( o,k ))= Pr of it ( Q { ( o,k ) } )= Pr of it ( Q )+ Pr of it (( o,k )) , Q P n k

We can thus get the best total value by using one of the members of the P n ( o,k ) set with the highest profit. Any other choice from that set would yield a lower total profit. We have thus shown that our problem exhibits the optimal substructure property: an optimal solution includes optimal solutions to subproblems.

Armed with this revelation we can now produce a much more efficient formulation:

P n ( o,k )= Q { ( o,k ) } ,Q is any one el ement of ar g max S P n k Pr of it ( S )

This leads us to the following pseudocode:

    solutions = {}

    for i in [1..n]:
        for o in OPTIONS:
            if o accepts i:
                solutions = solutions ∪
                            (solve_optimal_substructure(n-i) ⊎

    return argmax(solutions, profit)

    return solve_optimal_substructure(n)

Overlapping subproblems

Another interesting thing to note is that to solve for n we need to solve for n 1 .. . 1 . Similarly, to solve for n 1 we need to solve for n 2 .. . 1 and so on. We are solving the same problems over and over again! Our problem thus exhibits the property of overlapping subproblems. We can take advantage of this property by storing results in a cache and reusing them instead of recalculating them.

Updating our original pseudocode to use a cache we have:

    if n in CACHE: return CACHE[n]

    solutions = {}

    for i in [1..n]:
        for o in OPTIONS:
            if o accepts i:
                for s in (solve_overlapping_subproblems(n-i) ∪ {∅}):
                    solutions = solutions ∪ (s ⊎ {(o,k)})

    CACHE[n] = solutions
    return CACHE[n]

    return argmax(solve_overlapping_subproblems(n), profit)

Dynamic programming

If we combine both optimizations, taking advantage of both the optimal substructure property and the overlapping subproblems property, we reach the dynamic programming solution:

    if n in CACHE: return CACHE[n]
    solutions = {}

    for i in [1..n]:
        for o in OPTIONS:
            if o accepts i:
                solutions = solutions ∪
                            (solve_dp_recursive(n-i) ⊎ {(o,k)})

    CACHE[n] = argmax(solutions, profit)
    return CACHE[n]

    return solve_dp_recursive(n)

Alternatively, we can express this without recursion:

    solutions = {}

    for i in [1..n]:
        for o in OPTIONS:
            if o accepts i:
                solutions = solutions ∪ (CACHE[n-i] ⊎ {(o,k)})

    return argmax(solutions, profit)

    for i in [1..n]:
        CACHE[i] = solve_dp_iterative(i)
    return CACHE[n]

Scaling down

Instances of this problem with a large n can still be prohibitive to solve even when using the dynamic programming approach I just described. Not all is lost, however, if we are willing to relax our requirement for a guaranteed optimal solution.

One approach to cutting down our solution time, while still producing a possibly good solution, is to scale down our problem. We do this by dividing the start, end and bonus values of all options, and also the target number for which we solve for by a particular scale factor. Another way to view this is that scaling down by scale corresponds to performing the algorithms with a minimum step of value scale instead of value 1.

For example if we scaled down our original problem by 10 we would solve for 172 (= 1720 / 10) and use the following scaled down options:

Option#min amountmax amountinterestsign up bonus

When we find the optimal solution we can scale it back up using the same scale factor.

The expectation is that the optimal solution for the scaled down version of our problem is going to be in the proximity of the optimal solution for the original version, and assuming that the Profit function is generally smooth, the profit is also going to be near the optimal one.

Getting greedy

Another approach to solving this problem faster is to explore whether there is greedy solution for it. Below I will describe an approach that works in many cases, but is not guaranteed to provide the optimal solution.

Taking inspiration from the greedy solution to the fractional knapsack problem, at each step we greedily select to put as many CUs as possible in the option with the best marginal profit. The key observation here is that in our case each option has two different marginal profits. The first one involves investing in a new instance of an option using its minimum acceptable amount. The second one involves investing more in an instance of an option we have already invested in. In the first case the sign up bonus kicks in and increases the marginal profit. In the second case the marginal profit is simply the interest.

For our original scenario we have:

Option#min amountmax amountmin marginal profitadd marginal profit

For a total amount of 1720 this method works flawlessly. We first select to add ( o 5 , 1000) to our portofolio for a marginal profit of 2.4%. From the remaining 720 we add ( o 3 , 300) for a marginal profit of 2.333%. From the remaining 420 we again choose ( o 3 , 300) . We now have 120 left, for which we choose ( o 3 , 100) , and the final 20 we add to the ( o 5 , 1000) instance we already have.

Unfortunately, this method doesn't find the optimal solution in all cases. Take for example a total amount of 500. The greedy algorithm chooses ( o 3 , 300) , ( o 2 , 100) , ( o 2 , 100) for a total profit of 11.2. Alas, the optimal solution is simply ( o 4 , 500) for a total profit of 11.4.

Perhaps there is some other greedy approach that provides optimal solutions in all cases. Still, the described greedy approach is useful if we want to quickly get a possibly good solution. We could also try both the greedy approach and the scaling approach to raise our confidence level in the generated solution.

Can we do better?

We have mostly focused on solving our investment problem using a dynamic programming approach. Taking a step back, we may ask if this is indeed the best way to get an optimal solution. Can we approach this problem from a completely different angle to improve performance?

It turns out that the answer is yes! With some creativity our problem can be expressed as an integer linear program (ILP), for which we have algorithms that work quite efficiently in practice. I plan to present this approach in more detail in a future post.

An ounce of action...

If you want to try out the ideas mentioned in this post, I have created a program implementing all the algorithms and variations. You can find it at:


by Alexandros Frantzis at April 20, 2019 03:00 PM

April 16, 2019

Jeremy Whiting

Uninitialized member variables

Dear lazyweb,

In the past week or so I've been bitten twice by failing to initialize member variables in a C++ class's constructor. So I went looking for compiler options, static analyzers, etc. to tell me when I fail to do this. So far I've found nothing that correctly reports to me that I forgot to add m_foobar initialization to my constructor. /Wall on msvc -Weff-c++, cppcheck, etc. all fail me here. Isn't there something out there that will say "Jeremy, you dork, you forgot to initialize m_startCount as 0, you'll get garbage" (sometimes and only on M$ Windows, but still) ?

by Jeremy Whiting ( at April 16, 2019 03:55 PM

April 01, 2019

Alyssa Rosenzweig

Kodi and SuperTuxKart on Panfrost

Back in October, Panfrost ran some simple benchmarks, like glmark. Five months later, Panfrost has grown from running benchmarks to real-world apps, like Kodi, and 3D games like SuperTuxKart and Neverball.

Since the previous post, there have been major improvements across every part of the aspect culminating in this milestone. On the kernel side, my co-contributors Tomeu Vizoso and Rob Herring have created a modern kernel driver, suitable for mainline inclusion. Panfrost now uses this upstream-friendly driver, rather than relying on a modified legacy kernel as in the past. The new kernel module is currently under review for mainline inclusion. You can read more about this progress on Tomeu’s blog.

Outside the kernel, however, the changes have been no less significant. Early development was constrained to our own project repositories, as the code was not yet ready to general users. In early February, thanks in part to the progress on the kernel-space, we flew out of our nest, and Panfrost was merged into upstream Mesa, the central repository for free software graphics drivers. Development now occurs in-tree in Mesa.

We have continued decoding new aspects of the hardware and implementing support in the driver. A few miscellaneous additions include cube maps, gl_PointSize and gl_PointCoord, linear depth rendering, performance counters, and new shader instructions.

One area of particular improvement has been our understanding of the hardware formats (like “4-element vector of 32-bit floats” or “single 16-bit unsigned normalized integer”). In Panfrost’s early days, we knew magic numbers to distinguish a few of the most common formats, but the underlying meanings of the voodoo patterns were elusive. Further, the format bits for textures and attributes were not unified, further hindering the diversity of supported formats available. However, Connor Abbott and I have since identified the underlying meaning of the format codes for textures, attributes, and framebuffers. This new understanding allows for the magic numbers to be replaced by a streamlined format selection routine, mapping Gallium’s formats to the hardware’s and supporting the full spectrum of formats required for a conformant driver. Panfrost is now passing texture format tests for OpenGL ES 2.0.

From a performance standpoint, various optimizations have been added. In particular, a fast path likely relating to the “tiler” in the hardware was discovered. When this fast path is used, performance on geometry heavy scenes skyrockets. In one extreme demo (shading the Stanford bunny), performance more than tripled, and these gains trickle down to real-world games.

Features aside, one of the key issues with an early driver is the brittleness and instability. Accordingly, to guarantee robustness, I now test with the drawElements Quality Program (dEQP), which includes comprehensive code correctness tests. Although we’re still a while away from conformance, I now systematically step through identified issues and resolved the bugs, translating to fixes across every aspect of the driver.

One real-world benefactor of these fixes is the Kodi media center, which today works well using Panfrost to achieve a fluid interface on Midgard devices. For standalone installations of Kodi, today there are experimental images featuring Kodi and Panfrost. To further improve fluidity, Kodi and Panfrost can even interoperate with the video decoding acceleration, contingent on cooperative kernel drivers.

For users more inclined to gaming, some 3D games are beginning to show signs of life with Panfrost. For instance, the classic (OpenGL ES 2.0) backend of the ever-popular kart racing game, SuperTuxKart, now renders with some minor glitches with Panfrost. Performance is playable on simple tracks, though we have many opportunities for optimization. To bring up this racing game, I added support for complex control flow in the compiler. Traditionally, control flow is discouraged in graphics, due to the architecture of desktop GPUs (thread “warps”). However, Midgard does not feature these traditional optimizations, negating the performance penalty for branching from control flow. The implementation required new bookkeeping in the compiler, as well as an investigation into long jumps due to the size of the game’s “uber-shader”. In total, this compiler improvement – paired with assorted bug fixes – allows SuperTuxKart to run.

Likewise, Neverball is playable (and fun!) with Panfrost, although there are rendering anomalies relating to the currently unimplemented legacy feature “point sprites”. In contrast to Kodi and SuperTuxKart, which make liberal use of custom shaders, Neverball is implemented with purely fixed-function desktop OpenGL. This poses an interesting challenge, as Midgard is designed specifically for embedded graphics; the blob does not support this desktop superset. But that’s no reason we can’t!

Like most modern free software OpenGL drivers, Panfrost is built atop the modular “Gallium” architecture. This architecture abstracts away interface details, like desktop versus embedded OpenGL, normalizing differences to allow drivers to focus on the hardware itself. This abstraction means that by implementing Panfrost as an embedded driver atop Gallium, we get a partial desktop OpenGL implementation “free”.

Of course, there is functionality in the desktop superset that does not exist in the embedded profile. While Gallium tries to paper over these differences, the driver is required to implement features like point sprites and alpha testing to expose the corresponding desktop functions. So, the bring-up of desktop OpenGL applications like Neverball has led me to implement some of this additional functionality. Translating the “alpha test” to a conditional discard instruction in the fragment shader works. Similarly, translating “point sprites” to the modern equivalent, gl_PointCoord, is planned.

Interestingly, the hardware does support some functionality only available through the full desktop profile. It is unknown how many “hidden features” of this type are supported; as the blob does not appear to use them, these features were discovered purely by accident on our part. For instance, in addition to the familiar set of “points, lines, and triangles”, Midgard can natively render quadrilaterals and polygons. The existence of this feature will suggested by the corresponding performance counters, and the driver-side mechanics were determined by manual bruteforce of the primitive selection bits. Nevertheless, now that these bonus features are understood, quads can be drawn from desktop applications without first translating to indexed triangles in software. Similarly, it appears in addition to the embedded standard of boolean occlusion queries, setting a chicken bit enable the hardware’s hidden support for precise occlusion counters, a useful desktop feature.

Going forward, although the implementation of OpenGL ES 2.0 is approaching feature-completeness, we will continue to polish the driver, guided by dEQP. Orthogonal to conformance, further optimization to improve performance and lower memory usage is on the roadmap.

It’s incredible to reflect back and realise just one year ago, work had not even begun on writing a real OpenGL driver. Yet here we are today with an increasingly usable, exclusive free software, hardware-accelerated desktop with Mali Midgard graphics.

Frost on.

April 01, 2019 04:00 AM

March 05, 2019

Tomeu Vizoso

Panfrost update: a new kernel driver

The video

Below you can see the same scene that I recorded in January, which was rendered by Panfrost in Mesa but using Arm's kernel driver. This time, Panfrost is using a new kernel driver that is in a form close to be acceptable in the mainline kernel:

The history behind it

During the past two months Rob Herring and I have been working on a new driver for Midgard and Bifrost GPUs that could be accepted mainline.

Arm already maintains a driver out of tree with an acceptable open source license, but it doesn't implement the DRM ABI and several design considerations make it unsuitable for inclusion in mainline Linux.

The absence of a driver in mainline prevents users from keeping their kernels up-to-date and hurts integration with other parts of the free software stack. It also discourages SoC and BSP vendors from submitting their code to mainline, and hurts their ability to track mainline closely.

Besides the code of the driver itself, there's one more condition for mainline inclusion: an open source implementation of the userspace library needs to exist, so other kernel contributors can help verifying, debugging and maintaining the kernel driver. It's an enormous pile of difficult work to reverse engineer the inner workings of a GPU and then implement a compiler and command submission infrastructure, so big thanks to Alyssa Rosenzweig for leading that effort.

Upstream status

Most of the Panfrost code is already part of mainline Mesa, with the code that directly interacts with the new DRM driver being in the review stage. Currently targeted GPUs are T760 and T860, with the RK3399 being the SoC more often used for testing.

The kernel driver is being developed in the open and though we are trying to follow the best practices as displayed by other DRM drivers, there's a number of tasks that need to be done before we consider it ready for submission.

The work ahead

In the kernel:
- Make MMU code more complete for correctness and better performance
- Handle errors and hangs and correctly reset the GPU
- Improve fence handling
- Test with compute shaders (to check completeness of the ABI)
- Lots of cleanups and bug fixing!

In Mesa:
- Get GNOME Shell working
- Get Chromium working with accelerated WebGL
- Get all of glmark2 working
- Get a decent subset of dEQP passing and use it in CI
- Keep refactoring the code
- Support more hardware

Get the code

The exact bits used for the demo recorded above are in various stages of getting upstreamed to the various upstreams, but here are in branches for easier reproduction:

by Unknown ( at March 05, 2019 06:33 AM

February 23, 2019

Martyn Welch

Modeling Parts for 3D Printing with Shortcuts

A while back the kerf plate on my chop saw got damaged. It’s not a well known make (at least, not to me, it’s a “Rexon M2500AE” that I got second hand), so sourcing a replacement wouldn’t be trivial. As time has gone on, it’s got more and more damaged, to the point that the saw is getting dangerous to use. So rather than use it in it’s increasingly dangerous state or replacing the saw, I decided to try and make a replacement on my 3D printer.

February 23, 2019 10:55 PM

January 24, 2019

Andrew Shadura

Bug Squashing Party: Bratislava, 9/10 February 2019

As I previously announced, I’m organising a BSP in Bratislava during the weekend following FOSDEM. It will be happening at the same time as the BSP in Berlin, so if it’s not practical or possible for you to come to Berlin, consider coming around here.

The venue this time is, café/coworking space/maker space in the centre of Bratislava in the building of the old market hall.

The old market hall in the winter


The venue located at the boundary of the Old Town, so while it is easily accessible by foot or public transport, it is a bit complicated to reach by car.

If you’re not from Bratislava, you will likely be arriving by train (to the main station, Bratislava or bus/coach (to the bus station). From the train station, take tram number 1 to the stop Námestie SNP; the tram will stop in from of the market hall. If arriving by bus, it’s best to walk, but if you prefer buses, take either 205 or X72 to the final stop Nemocnica sv. Michala.

If you’re coming from Vienna, both train and coach timetables are available from, e.g. by using this link.

If you decide you’d like to attend, please sign up here: Please indicate the which days you can attend, and whether you will be attending in before the lunch time or after or both.

For more up-to-date information, see the wiki page:

See you at the bug squashing party!

by Andrej Shadura at January 24, 2019 10:49 AM

January 09, 2019

Andrew Shadura

resvg: worth having in Debian?

Yesterday I have discovered resvg, an MPL 2.0-licensed SVG rendering and optimisation library and a tool, written in Rust. It is said to be faster than some SVG renderers while currently slower than librsvg:

Speed comparison for the Elementary Icon Theme with resvg/cairo being faster than QtSvg but slower than librsvg and resvg/qt being slower than QtSvg Speed comparison for the Oxygen Icon Theme with resvg/cairo being faster than QtSvg but slower than librsvg and resvg/qt being slower than QtSvg

It aims to support the static subset of SVG better than other libraries:

SVG test suite results: resvg 249, Inkscape 253, librsvg 222

The author writes:

One of the major differences from other rendering libraries is that resvg does a lot of preprocessing before rendering. It converts shapes to paths, resolves attributes, removes groups and invisible elements, fixes a lot of issues in malformed SVG files. Then it creates a simple render tree with all elements and attributes resolved. And only then it starts to render. So it's very easy to implement a new rendering backend.

  • librsvg, currently, is heavily tied to the cairo library, unlike resvg
  • librsvg is heavily tied to GNOME which makes it painful to distribute outside the Linux ecosystem
  • librsvg doesn't really preprocess input files, rendering them as is
  • librsvg has a minimal support of the edge-cases, which leads to rendering errors

I’m thinking of packaging this for Debian, but I would be interested to know what others think of this.

by Andrej Shadura at January 09, 2019 10:05 AM

Help the Conservancy raise the remaining $14 000

The Software Freedom Conservancy is having the last 7 days to collect the remaining less than $14 000 of the fundraiser generously matched by Private Internet Access. All donations up to $90 000 will be matched until 15 January.

Conservancy is an organisation sponsoring nearly 50 free software projects helping them, most importantly with accounting, paying developers and defending their trademarks and ensuring license compliance.

Conservancy is currently home to almost fifty member projects

Read more about what the Conservancy does on their website. The matching funds expire soon, so if you can, please donate directly before 15 January.

by Andrej Shadura at January 09, 2019 09:37 AM

January 07, 2019

Tomeu Vizoso

A Panfrost milestone

The video

Below you can see glmark2 running as a Wayland client in Weston, on a NanoPC -T4 (so a RK3399 SoC with a Mali T-864 GPU)). It's much smoother than on the video, which is limited to 5FPS by the webcam.

Weston is running with the DRM backend and the GL renderer.

The history behind it

For more than 10 years, at Collabora we have been happily helping our customers to make the most of their hardware by running free software.

One area some of us have specially enjoyed working on has been open drivers for GPUs, which for a long time have been considered the next frontier in the quest to have a full software platform that companies and individuals can understand, improve and fix without having to ask for permission first.

Something that has saddened me a bit has been our reduced ability to help those customers that for one reason or another had chosen a hardware platform with ARM Mali GPUs, as no open driver was available for those.

While our biggest customers were able to get a high level of support from the vendors in order to have the Mali graphics stack well integrated with the rest of their product, the smaller ones had a much harder time in achieving that level of integration, which manifested in reduced performance, increased power consumption and slipped milestones.

That's why we have been following with great interest the several efforts that aimed to come up with an open driver for GPUs in the Mali family, one similar to those already existing for Qualcomm, NVIDIA and Vivante.

At XDC last year we had the chance of meeting the people involved in the latest effort to develop such a driver: Panfrost. And in the months that followed I made some room in my backlog to come up with a plan to give the effort a boost.

At that point, Panfrost was only able to get its bits in the screen by an elaborate hack that involved copying each frame into a X11 SHM buffer, which besides making the setup of the development environment much more cumbersome, invalidated any performance analysis. It also limited testing to demos such as glmark2.

Due to my previous work on Etnaviv I was already familiar with the abstractions in Mesa for setups in which the display of buffers is performed by a device different from the GPU so it was just a matter of seeing how we could get the kernel driver for the Mali GPU to play well with the rest of the stack.

So during the past month or so I have come up with a proper implementation of the winsys abstraction that makes use of ARM's kernel driver. The result is that now developers have a better base on which to work on the rendering side of things.

By properly creating, exporting and importing buffers, we can now run applications on GBM, from demos such as kmscube and glmark2 to compositors such as Weston, but also big applications such as Kodi. We are also supporting zero-copy display of GPU-rendered clients in Weston.

This should make it much easier to work on the rendering side of things, and work on a proper DRM driver in the mainline kernel can proceed in parallel.

For those interested in joining to the effort, Alyssa has graciously taken the time to update the instructions to build and test Panfrost. You can join us at #panfrost in Freenode and can start sending merge requests to Gitlab.

Thanks to Collabora for sponsoring this work and to Alyssa Rosenzweig and Lyude Paul for their previous work and for answering my questions.

by Unknown ( at January 07, 2019 12:33 PM

January 02, 2019

Andrew Shadura

wpa-supplicant and hostapd 2.7 in Debian

Hostapd and wpa-supplicant 2.7 have been in Debian experimental for some time already, with snapshots available since May 2018, and the official release since 3 December 2018. I’ve been using those 2.7 snapshots myself since May, but I do realise my x250 with an Intel Wi-Fi card is probably not the most representative example of hardware wpa-supplicant would often run on, so before I upload 2.7 to unstable, it would be great if more people tested it. So please try to install it from experimental and see if it works for your use cases. In the latest upload, I have enabled a bunch of new upstream features which previously didn’t exist or were still experimental, so it would be great to give them a go.

by Andrej Shadura at January 02, 2019 12:29 PM

Bye-bye binary vconfig(1)

This morning I have decided that this is the time. The time to finally remove the binary vconfig utility (which used to help people configure VLANs) from Debian. But fear not, the command isn’t going anywhere (yet), since almost six years ago I’ve written a shell script that replaces it, using ip(8) instead of the old and deprecated API.

If you’re still using vconfig, please give it a test and consider moving to better, newer ways of configuring your VLANs.

If you’re not sure whether you’re using it or not, mostly likely not only you aren’t, but it’s quite possible that you may not even need the vlan package that ships vconfig, since the most important functionality of it has since been implemented in ifupdown, networkd and NetworkManager.

by Andrej Shadura at January 02, 2019 12:01 PM

December 27, 2018

Martyn Welch

Long Overdue Website Update Complete

I’ve had a couple of domains for quite a few years now and have been hosting websites on them of some description since I got them. Over time my requirements and importantly the time, energy and enthusiasm I have available to maintain them has changed. I haven’t been a care-free uni student for well over a decade now and other interests and responsibilities now demand time I’d once happily devote to them.

December 27, 2018 06:13 PM

November 15, 2018

Martyn Welch

Management Hacking

In an ideal world, everyone would implicitly understand that it just makes good business sense to upstream some of the modifications made when creating your Linux powered devices. However this is a long way from being common knowledge and is still something that a lot of managers that will need convincing that this infact in their best interests. Just so that we are clear, I’m not suggesting here that your next Linux powered device should be an entirely open design.

November 15, 2018 04:51 PM

November 06, 2018

Xavier Claessens

Speed up your GitLab CI

GNOME GitLab has AWS runners, but they are used only when pushing code into a GNOME upstream repository, not when you push into your personal fork. For personal forks there is only one (AFAIK) shared runner and you could be waiting for hours before it picks your job.

But did you know you can register your own PC, or a spare laptop collecting dust in a drawer, to get instant continuous integration (CI) going? It’s really easy to setup!

1. Install docker

apt install

2. Install gitlab-runner

Follow the instructions here:

(Note: The Ubuntu 18.04 package doesn’t seem to work.)

3. Install & start the GitLab runner service

sudo gitlab-runner install
sudo gitlab-runner start

4. Find the registration token

Go to your gitlab project page, settings -> CI/CD -> expand “runners”

5. Register your runner

sudo gitlab-runner register --non-interactive --url --executor docker --docker-image fedora:27 --registration-token **

You can repeat step 5 with the registration token of all your personal forks in the same GitLab instance. To make this easier, here’s a snippet I wrote in my ~/.bashrc to register my “builder.local” machine on a new project. Use it as gitlab-register .

function gitlab-register {

  case "$host" in

  cmd="sudo gitlab-runner register --non-interactive --url $host --executor docker --docker-image fedora:27 --registration-token $token"


  ssh builder.local -t "$cmd"

Not only will you now get faster CI, but you’ll also reduce the queue on the shared runner for others!

by xclaesse at November 06, 2018 09:15 PM

November 03, 2018

Alexandros Frantzis

Metrics for test suite comprehensiveness

In a previous post I discussed a few FOSS specific mentalities and practices that I believe play a role in discouraging adoption of comprehensive automated testing in FOSS. One of the points that came up in discussions, is whether the basic premise of the post, that FOSS projects don't typically employ comprehensive automated testing, including not having any tests at all, is actually true. That's a valid concern, given that the post was motivated by my long-term observations working on and with FOSS and didn't provide any further data. In this post will try to address this concern.

The main question is how we can measure the comprehensiveness of a test suite. Code coverage is the standard metric used in the industry and makes intuitive sense. However, it presents some difficulties for large scale surveys, since it's not computationally cheap to produce and often requires per project changes or arrangements.

I would like to propose and explore two alternative metrics that are easier to produce, and are therefore better suited to large scale surveys.

The first metric is the test commit ratio of the codebase — the number of commits that affect test code as a percentage of all commits. Ideally, every change that adds a feature or fixes a bug in the production code should be accompanied by a corresponding change in the test code. The more we depart from this ideal, and, hence, the less often we update the test code, the less comprehensive our test suite tends to be. This metric is affected by the project's particular commit practices, so some amount of variance is expected even between projects considered to be well tested.

The second metric is the test code size ratio of the codebase — the size of the test code as a percentage of the size of all the code. It makes sense intuitively that, typically, more test code will be able to test more production code. That being said, the size itself does not provide the whole picture. Depending on the project, a compact test suite may be adequately comprehensive, or, conversely, large test data files may skew this metric.

Neither of these metrics is failproof, but my hypothesis is that when combined and aggregated over many projects they can provide a good indication about the overall trend of the comprehensiveness of test suites, which is the main goal of this post.

Let's see what these metrics give us for FOSS projects. I chose two software suites that are considered quite prominent in the FOSS world, namely GNOME and KDE, which together consist of over 1500 projects.

A small number of these projects are not good candidates for this survey, because, for example, they are empty, or are pure documentation. Although they are included in the survey, their count is low enough to not affect the overall trend reported below.

Here is the distribution of the percentages of commits affecting test code in GNOME and KDE projects:


Here is the distribution of the percentages of test code size in GNOME and KDE projects:


The first thing to notice is the tall lines in the left part of both graphs. For the second graph this shows that a very large percentage of the projects, roughly 55%, have either no tests at all, or so few as to be practically non-existent. Another interesting observation is the position of the 80% percentile lines, which show that 80% of the projects have test commit ratios less than 11.2%, and test code size ratios less than 8.8%.

In other words, out of ten commits that change the code base, only about one (or fewer) touches the tests in the majority of the projects. Although this doesn't constitute indisputable proof that tests are not comprehensive, it is nevertheless a big red flag, especially when combined with low test code size percentages. Each project may have different needs and development patterns, and these numbers need to be interpreted with care, but as a general trend this is not encouraging.

On the bright side, there are some projects with higher values in this distribution. It's no surprise that this set consists mainly of core libraries from these software suites, but does not include many end-user applications.

Going off on a slight tangent, one may argue that the distribution is unfairly skewed since many of these projects are GUI applications which, according to conventional wisdom, are not easy to test. However, this argument fails on multiple fronts. First, it's not unfair to include these programs, because we expect no less of them in terms of quality compared to other types of programs. They don't get a free pass because they have a GUI. In addition, being a GUI program is not a valid excuse for inadequate testing, because although testing the UI itself, or the functionality through the UI, may not be easy, there is typically a lot more we can test. Programs provide some core domain functionality, which we should be able to test independently if we decouple our core domain logic from the UI, often by using a different architecture, for example, the Hexagonal Architecture.

After having seen some general trends, let's see some examples of individual codebases that do better in these metrics:


This graph displays quite a diverse collection of projects including a database, graphics libraries, a GUI toolkit, a display compositor, system tools and even a GUI application. These projects are considered to be relatively well tested, each in its own particular way, so these higher numbers provide some evidence that the metrics correlate with test comprehensiveness.

If we accept this correlation, this collection also shows that we can achieve more comprehensive testing in a wide variety of projects. Depending on project, the trade-offs we need to make will differ, but it is almost always possible to do well in this area.

The interpretation of individual results varies with the project, but, in general, I have found that the test commit ratio is typically a more reliable indicator of test comprehensiveness, since it's less sensitive to test specifics compared to test code size ratio.

Tools and Data

In order to produce the data, I developed the git-test-annotate program which produces a list of files and commits from a git repository and marks them as related to testing or not.

git-test-annotate decides whether a file is a test file by checking for the string "test" anywhere in the file's path within the repository. This is a very simple heurestic, but works surprisingly well. In order to make test code size calculation more meaningful, the tool ignores files that are not typically considered testable sources, for example, non-text files and translations, both in the production and the test code.

For commit annotations, only mainline commits are taken into account, To check if a commit affects the tests the tool checks if it changes at least one file with "test" in its path.

To get the stats for KDE and GNOME I downloaded all their projects from their github organizations/mirrors and ran the git-test-annotate tool on each project. All the annotated data and a python script to process them are available in the foss-test-annotations repository.


I hope this post has provided some useful information about the utility of the proposed metrics, and some evidence that there is ample room for improvement in automated testing of FOSS projects. It would certainly be interesting to perform a more rigorous investigation to evaluate how well these metrics correlate with code coverage.

Before closing, I would like to mention that there are cases where projects are primarily tested through external test suites. Examples in the FOSS world are the piglit suite for Mesa, and various tests suites for the Linux kernel. In such cases, project test comprehensiveness metrics don't provide the complete picture, since they don't take into account the external tests. These metrics are still useful though, because external suites typically perform functional or conformance testing only, and the metrics can provide information about internal testing, for example unit testing, done within the projects themselves.

by Alexandros Frantzis at November 03, 2018 02:00 PM

October 31, 2018

Erik Faye-Lund

Introducing Zink: OpenGL on Vulkan

For the last month or so, I’ve been playing with a new project during my work at Collabora, and as I’ve already briefly talked about at XDC 2018, it’s about time to talk about it to a wider audience.

What is Zink?

Zink is an OpenGL implementation on top of Vulkan. Or to be a bit more specific, Zink is a Mesa Gallium driver that leverage the existing OpenGL implementation in Mesa to provide hardware accelerated OpenGL when only a Vulkan driver is available.

glxgears on Zink glxgears on Zink

Here’s an overview of how this fits into the Mesa architecture, for those unfamiliar with it:

Gallium OpenGL State Tracker
Other Gallium drivers
Architectural overview

Why implement OpenGL on top of Vulkan?

There’s several motivation behind this project, but let’s list a few:

  1. Simplify the graphics stack
  2. Lessen the work-load for future GPU drivers
  3. Enable more integration
  4. Support application porting to Vulkan

I’ll go through each of these points in more detail below.

But there’s another, less concrete reason; someone had to do this. I was waiting for someone else to do it before me, but nobody seemed to actually go ahead. At least as long as you don’t count solutions who only implement some variation of OpenGL ES (which in my opinion doesn’t solve the problem; we need full OpenGL for this to be really valuable).

1. Simplifying the graphics stack

One problem is that OpenGL is a big API with a lot of legacy stuff that has accumulated since its initial release in 1992. OpenGL is well-established as a requirement for applications and desktop compositors.

But since the very successful release of Vulkan, we now have two main-stream APIs for essentially the same hardware functionality.

It’s not looking like neither OpenGL nor Vulkan is going away, and the software-world is now hard at work implementing Vulkan support everywhere, which is great. But this leads to complexity. So my hope is that we can simplify things here, by only require things like desktop compositors to support one API down the road. We’re not there yet, though; not all hardware has a Vulkan-driver, and some older hardware can’t even support it. But at some point in the not too far future, we’ll probably get there.

This means there might be a future where OpenGL’s role could purely be one of legacy application compatibility. Perhaps Zink can help making that future a bit closer?

2. Lessen the work-load for future GPU drivers

The amount of drivers to maintain is only growing, and we want the amount of code to maintain for legacy hardware to be as little as possible. And since Vulkan is a requirement already, maybe we can get good enough performance through emulation?

Besides, in the Open Source world, there’s even new drivers being written for old hardware, and if the hardware is capable of supporting Vulkan, it could make sense to only support Vulkan “natively”, and do OpenGL through Zink.

It all comes down to the economics here. There aren’t infinite programmers out there that can maintain every GPU driver forever. But if we can make it easier and cheaper, maybe we can get better driver-support in the long run?

3. Enable more integration

Because Zink is implemented as a Gallium driver in Mesa, there’s some interesting side-benefits that comes “for free”. For instance, projects like Gallium Nine or Clover could in theory work on top of the i965 Vulkan driver through Zink. Please note that this hasn’t really been tested, though.

It should also be possible to run Zink on top of a closed-source Vulkan driver, and still get proper window system integration. Not that I promote the idea of using a closed-source Vulkan driver.

4. Support application porting to Vulkan

This might sound a bit strange, but it might be possible to extend Zink in ways where it can act as a cooperation-layer between OpenGL and Vulkan code in the same application.

The thing is, big CAD applications etc won’t realistically rewrite all of their rendering-code to Vulkan in a wave of a hand. So if they can for instance prototype some Vulkan-code inside an OpenGL application, it might be easier to figure out if Vulkan is worth it or not for them.

What does Zink require?

Zink currently requires a Vulkan 1.0 implementation, with the following extensions (there’s a few more, due to extensions requiring other extensions, but I’ve decided to omit those for simplicity):

  • VK_KHR_maintenance1: This is required for the viewport flipping. It’s also possible to do without this extension, and we have some experimental patches for that. I would certainly love to require as few extensions as possible.
  • VK_KHR_external_memory_fd: This is required as a way of getting the rendered result on screen. This isn’t technically a hard requirement, as we also have a copy-based approach, but that’s almost unusably slow. And I’m not sure if we’ll bother keeping it around.

Zink has to my knowledge only been tested on Linux. I don’t think there’s any major reasons why it wouldn’t run on any other operating system supporting Vulkan, apart from the fact that some window-system integration code might have to be written.

What does Zink support?

Right now, it’s not super-impressive: we implement OpenGL 2.1, and OpenGL ES 1.1 and 2.0 plus some extensions. Please note that the list of extensions might depend on the Vulkan implementation backing this, as we forward capabilities from that.

The list of extensions is too long to include here in a sane way, but here’s a link to the output of glxinfo as of today on top of i965.

Here’s some screenshots of applications and games we’ve tested that renders more or less correctly:

OpenArena on Zink OpenArena on Zink
Weston on Zink Weston on Zink
Quake 3 on Zink Quake 3 on Zink
Extreme Tux Racer on Zink Extreme Tux Racer on Zink

What doesn’t work?

Yeah, so when I say OpenGL 2.1, I’m ignoring some features that we simply do not support yet:

  • glPointSize() is currently not supported. Writing to gl_PointSize from the vertex shader does work. We need to write some code to plumb this through the vertex shader to make it work.
  • Texture borders are currently always black. This will also need some emulation code, due to Vulkan’s lack of arbitrary border-color support. Since a lot of hardware actually support this, perhaps we can introduce some extension to add it back to the API?
  • No control-flow is supported in the shaders at the moment. This is just because of lacking implementation for those opcodes. It’s coming.
  • No GL_ALPHA_TEST support yet. There’s some support code in NIR for this, we just need to start using it. This will depend on control-flow, though.
  • glShadeModel(GL_FLAT) isn’t supported yet. This isn’t particularly hard or anything, but we currently emit the SPIR-V before knowing the drawing-state. We should probably change this. Another alternative is to patch in a flat-decoration on the fly.
  • Different settings for glPolygonMode(GL_FRONT, ...) and glPolygonMode(GL_BACK, ...). This one is tricky to do correct, at least if we want to support newer shader-stages like geometry and tessellation at the same time. It’s also hard to do performant, even without these shader-stages, as we need to draw these primitives in the same order as they were specified but with different primitive types. Luckily, Vulkan can do pretty fast geometry submission, so there might be some hope for some compromise-solution, at least. It might also be possible to combine stream-out and a geometry-shader or something here if we really end up caring about this use-case.

And most importantly, we are not a conformant OpenGL implementation. I’m not saying we will never be, but as it currently stands, we do not do conformance testing, and as such we neither submit conformance results to Khronos.

It’s also worth noting that at this point, we tend to care more about applications than theoretical use-cases and synthetic tests. That of course doesn’t mean we do not care about correctness at all, it just means that we have plenty of work ahead of us, and the work that gets us most real-world benefit tends to take precedence. If you think otherwise, please send some patches! :wink:

What’s the performance-hit compared to a “native” OpenGL driver?

One thing should be very clear; a “native” OpenGL driver will always have a better performance-potential, simply because anything clever we do, they can do as well. So I don’t expect to beat any serious OpenGL drivers on performance any time soon.

But the performance loss is already kinda less than I feared, especially since we haven’t done anything particularly fancy with performance yet.

I don’t yet have any systematic benchmark-numbers, and we currently have some kinda stupid bottlenecks that should be very possible to solve. So I’m reluctant to spend much time on benchmarking until those are fixed. Let’s just say that I can play Quake 3 at tolerable frame rates right now ;)

But OK, I will say this: I currently get around 475 FPS on glxgears on top of Zink on my system. The i965 driver gives me around 1750 FPS. Don’t read too much into those results, though; glxgears isn’t a good benchmark. But for that particular workload, we’re about a quarter of the performance. As I said, I don’t think glxgears is a very good benchmark, but it’s the only thing somewhat reproducible that I’ve run so far, so it’s the only numbers I have. I’ll certainly be doing some proper benchmarking in the future.

In the end, I suspect that the pipeline-caching is going to be the big hot-spot. There’s a lot of state to hash, and finally compare once a hit has been found. We have some decent ideas on how to speed it up, but there’s probably going to be some point where we simply can’t get it any better.

But even then, perhaps we could introduce some OpenGL extension that allows an application to “freeze” the render-state into some objects, similar to Vertex Array Objects, and that way completely bypass this problem for applications willing to do a bit of support-code? The future will tell…

All in all, I’m not too worried about this yet. We’re still early in the project, and I don’t see any major, impenetrable walls.

How to use Zink

Zink is only available as source code at the moment. No distro-packages exits yet.


In order to build Zink, you need the following:

  • Git
  • Build dependencies to compile Mesa
  • Vulkan headers and libraries
  • Meson and Ninja


The code currently lives in the zink-branch in my Mesa fork.

The first thing you have to do, is to clone the repository and build the zink-branch. Even though Mesa has an autotools build-system, Zink only supports the Meson build-system. Remember to enable the zink gallium-driver (-Dgallium-drivers=zink) when configuring the build.

Install the driver somewhere appropriate, and use the $MESA_LOADER_DRIVER_OVERRIDE environment variable to force the zink-driver. From here you should be able to run many OpenGL applications using Zink.

Here’s a rough recipe:

git clone mesa-zink
Cloning into 'mesa-zink'...
Checking out files: 100% (5982/5982), done.
$ cd mesa-zink
$ git checkout zink
Branch 'zink' set up to track remote branch 'zink' from 'origin'.
Switched to a new branch 'zink'
$ meson --prefix=/tmp/zink -Dgallium-drivers=zink build-zink
The Meson build system
Found ninja-X.Y.Z at /usr/bin/ninja
$ ninja -C build-zink install
ninja: Entering directory `build-zink'
installing /home/kusma/temp/mesa-zink/build-zink/src/gallium/targets/dri/ to /tmp/zink/lib64/dri/
$ LIBGL_DRIVERS_PATH=/tmp/zink/lib64/dri/ MESA_LOADER_DRIVER_OVERRIDE=zink glxgears -info
GL_RENDERER   = zink (Intel(R) UHD Graphics 620 (Kabylake GT2))
GL_VERSION    = 2.1 Mesa 18.3.0-devel (git-395b12c2d7)
GL_VENDOR     = Collabora Ltd
GL_EXTENSIONS = GL_ARB_multisample GL_EXT_abgr ...

Submitting patches

Currently, the development happens on #dri-devel on Freenode. Ping me (my handle is kusma) with a link your branch, and I’ll take a look.

Where do we go from here?

Well, I think “forwards” is the only way to move :wink:. I’m currently working 1-2 days per week on this at Collabora, so things will keep moving forward on my end. In addition, Dave Airlie seems to have a high momentum at the moment also. He has a work-in-progress branch that hints at GL 3.3 being around the corner!

I also don’t think there’s any fundamental reason why we shouldn’t be able to get to full OpenGL 4.6 eventually.

Besides the features, I also want to try to get this upstream in Mesa in some not-too-distant future. I think we’re already beyond the point where Zink is useful.

I also would like to point out that David Airlie of RedHat has contributed a lot of great patches, greatly advancing Zink from what it was before his help! At this point, he has implemented at least as many features as I have. So this is very much his accomplishment as well.

by Erik Faye-Lund at October 31, 2018 04:52 PM

October 25, 2018

Tomeu Vizoso

How continuous integration can help you keep pace with the Linux kernel

Almost all of Collabora's customers use the Linux kernel on their products. Often they will use the exact code as delivered by the SBC vendors and we'll work with them in other parts of their software stack. But it's becoming increasingly common for our customers to adapt the kernel sources to the specific needs of their particular products.

A very big problem most of them have is that the kernel version they based on isn't getting security updates any more because it's already several years old. And the reason why companies are shipping kernels so old is that they have been so heavily modified compared to the upstream versions, that rebasing their trees on top of newer mainline releases is so expensive that is very hard to budget and plan for it.

To avoid that, we always recommend our customers to stay close to their upstreams, which implies rebasing often on top of new releases (typically LTS releases, with long term support). For the budgeting of that work to become possible, the size of the delta between mainline and downstream sources needs to be manageable, which is why we recommend contributing back any changes that aren't strictly specific to their products.

But even for those few companies that already have processes in place for upstreaming their changes and are rebasing regularly on top of new LTS releases, keeping up with mainline can be a substantial disruption of their production schedules. This is in part because new bugs will be in the new mainline release, and new bugs will be in the downstream changes as they get applied to the new version.

Those companies that are already keeping close to their upstreams typically have advanced QA infrastructure that will detect those bugs long before production, but a long stabilization phase after every rebase can significantly slow product development.

To improve this situation and encourage more companies to keep their efforts close to upstream we at Collabora have been working for a few years already in continuous integration of FOSS components across a diverse array of hardware. The initial work was sponsored by Bosch for one of their automotive projects, and since the start of 2016 Google has been sponsoring work on continuous integration of the mainline kernel.

One of the major efforts to continuously integrate the mainline Linux kernel codebase is, which builds several configurations of different trees and submits boot jobs to several labs around the world, collating the results. This is being of great help already in detecting at a very early stage any changes that either break the builds, or prevent a specific piece of hardware from completing the boot stage.

Though can easily detect when an update to a source code repository has introduced a bug, such updates can have several dozens of new commits, and without knowing which specific commit introduced the bug, we cannot identify culprits to notify of the problem. This means that either someone needs to monitor the dashboard for problems, or email notifications are sent to the owners of the repositories who then have to manually look for suspicious commits before getting in contact with their author.

To address this limitation, Google has asked us to look into improving the existing code for automatic bisection so it can be used right away when a regression is detected, so the possible culprits are notified right away without any manual intervention.

Another area in which is currently lacking is in the coverage of the testing. Build and boot regressions are very annoying for developers because they impact negatively everybody who work in the affected configurations and hardware, but the consequences of regressions in peripheral support or other subsystems that aren't involved critically during boot can still make rebases much costlier.

At Collabora we have had a strong interest in having the DRM subsystem under continuous integration and some time ago started a R&D project for making the test suite in IGT generically useful for all the DRM drivers. IGT started out being i915-specific, but as most of the tests exercise the generic DRM ABI, they could as well test other drivers with a moderate amount of effort. Early in 2016 Google started sponsoring this work and as of today submitters of new drivers are using it to validate their code.

Another related effort has been the addition to DRM of a generic ABI for retrieving CRCs of frames from different components in the graphics pipeline, so two frames can be compared when we know that they should match. And another one is adding support to IGT for the Chamelium board, which can simulate several display connections and hotplug events.

A side-effect of having continuous integration of changes in mainline is that when downstreams are sending back changes to reduce their delta, the risk of introducing regressions is much smaller and their contributions can be accepted faster and with less effort.

We believe that improved QA of FOSS components will expand the base of companies that can benefit from involvement in development upstream and are very excited by the changes that this will bring to the industry. If you are an engineer who cares about QA and FOSS, and would like to work with us on projects such as, LAVA, IGT and Chamelium, get in touch!

by Unknown ( at October 25, 2018 07:46 AM

October 20, 2018

Alyssa Rosenzweig

A Panfrostian October

In the past month-and-a-half since the previous update on the free (open source) Panfrost driver for Mali GPUs, progress has been fervent! During the interim, Lyude Paul and I presented about the driver at the X.Org Developer’s Conference. Check out the talk (slides). Benchmarks included!

Since the talk, we’ve been working on Panfrost like mad. Literally – I was pretty angry by that distorted texture bug! The culprit proved to be the compiler emitting imov (integer move) instructions rather than fmov (floating-point move), apparently causing precision degradation while processing texture coordinates.

In any event, implementing a few new features culminated in the above jellyfish demo. This jellyfish scene is a part of glmark2-es2, a set of OpenGL ES programs used to test and benchmark GPU drivers. For Panfrost, I use glmark as a set of milestones to work towards while developing the driver. glmark is comprehensive and can do just about anything I need while developing graphics drivers, except for making me a peanut butter and jellyfish sandwich. Er, never mind, it looks like they just added an OpenGL sandwich scene. Bonus!

Aside from the texturing issue, the next major challenge faced by the jellyfish scene was “uniform spilling”. Many GPUs require a special “load uniform” instruction in the shader to access a uniform; Midgard does not… usually. Instead, up to 16 (32-bit vec4) uniforms can be preloaded into registers, which makes them free to access in shaders. Yay performance!

The problem with this uniform loading scheme is when a shader needs more than 16 uniforms, which is frequent in vertex shaders loading large matrices. A single 4x4 matrix eats up 4 uniforms; 4 such matrices and there are no uniforms left for driver-internal state. Oops.

The solution is surprisingly simple: rather than using this register-based fast path, use the traditional “load uniform” instructions which have no practical limit on the number of uniforms, at the expense of a nominal performance hit. Specifically, the driver apparently switches from using uniforms to internally generating their crazy cousin, uniform buffer objects. This technique correctly handles these complex shaders.

However, Panfrost goes one step further than the blob appears to! Since uniforms and uniform buffer objects can be used simultaneously, we can use the same memory block for both kinds of uniforms – two for the price one! Mapping the uniform memory like this, the compiler is free to use the register fast path for the first 16 uniforms and seamlessly fallback on the general slow path for the rest, shaving off sixteen “load uniform” instructions on a shader that otherwise has to “spill” to dedicated instructions. A win!

Another major change prompted by a bug bringing up jellyfish surrounded register allocation. Previously, Panfrost used an ersatz, home-rolled register allocator, good enough for spinning gears but a little clunky. Debugging jellyfish uncovered a buggy corner case; the naive solution unfortunately increased register pressure, a performance issue. But there’s no need to settle for naive!

“Bayes”ing the work on Mesa’s classy register allocation library, I scrapped the old code and wrote a more robust, sophisticated register allocator. The new allocator, using graph coloring rather than ad hoc liveness checks, already generates better code with lower register pressure, neatly resolving the issue found in jellyfish. Plus, it will permit further register-related progress when we tackle register spilling and parallelism in stores and texture instructions. The new code is considerably longer, but it will pay off as shader complexity creeps up.

A few miscellaneous bugs later, et voila, Jellyfish running on GPU-accelerated using only free software on a Rockchip RK3399 laptop.


It’s a few hops away from the jellyfish, yet I carrot believe Panfrost is running glmark’s refract demo!

Like jellyfish, uniform spilling is needed for refract. We have been able to render the bunny itself for quite a while, as demoed at the X.Org Developer’s Conference, linked above. Where’s the issue?

Framebuffer objects.

Remember how I mentioned back in April that framebuffer objects were one of the few major features missing in Panfrost?

We can strike that off the list now!

For those unfamiliar, by default, a 3D application renders to the screen. When the application instead renders off-screen, to be later read as a texture, it’s said to render to a “framebuffer object” in graphics lingo. Since Mali is a “render-only” GPU, independent of the display, we already have the freedom to render to arbitrary buffers. Theoretically, rather than telling the GPU to render on-screen, we simply it to render literally anywhere else but the screen. Pretty simple!

Of course, it’s never that simple. To reduce power consumption by saving memory bandwidth, Mali GPUs feature “ARM Framebuffer Compression”, or AFBC for short. As advertised, AFBC is a lossless compression scheme for compressing the framebuffer in-flight from one part of the chip to another.

Only across parts of the chip? Always interested in reducing power, it turns out the blob is aggressive about enabling AFBC anywhere it can. Previously, we could ignore AFBC entirely, since the X11 blob is apparently unable to use AFBC for display. However, since framebuffer objects are off-screen and stored internal to the driver, it doesn’t matter to the rest of the system what format is used. Accordingly, we have never seen a framebuffer object that doesn’t use AFBC, so I had to implement AFBC support in the driver.

Furthermore, framebuffer objects mean juggling multiple framebuffers at once. Together with AFBC support, this additional complexity required a considerable refactor of the command stream driver code to support the extra bookkeeping. Many coding afternoons and a transatlantic flight later, Panfrost implemented preliminary (color) framebuffer objects.

However, to add to the joy, refract does not only use a color attachment to a framebuffer object, but also a depth attachment. Conceptually, the two modes of operation are identical; in practice, they require a different encoding in the command stream and an additional increase in bookkeeping leading to yet another (small) refactor. Plenty of keyboard mashing and brain freezing later, Panfrost supported depth attachments to framebuffer objects as well as color. And poof, a refraction bunny!

Now, for the piece de resistance, rather than a glmark demo, we have the reference Wayland compositor, implementing a simple GPU-accelerated desktop. With a small workaround added to the driver, Panfrost is able to run …


October 20, 2018 04:00 AM