Return of the Bad Guys: A tale about a little interface

Once upon a time, The Fathers were dissatisfied with the then-current state of graphics support on Linux. In those ancient times, there were basically two options: svgalib or X. It was an or, not an and, because these two didn't mix terribly well.

So The Fathers set out to fix this situation, and created a sophisticated scheme for a General Graphics Interface. It was designed to encompass the existing options in a nice overall framework, allowing the mixed use of both svgalib and X. It was to allow running svgalib applications in X, and X on top of svgalib; even nesting them at will. In short, it was to allow seamlessly running any graphical application in any environment, all reliably and securely in parallel.

Fulfilling those goals basically required two major components. The environment independence would be implemented by a library abstracting the different backend targets behind a common interface, plus a set of associated frontends to allow using all the existing applications on top of it. (A svgalib wrapper for running svgalib applications, and XGGI for running X.) Of course, applications could also use the native GGI interface directly, making use of the backend abstraction it provides.

libggi as a mere abstraction over the X and svgalib targets, however, couldn't fulfill all of the desired goals: with no provisions for secure sharing in svgalib, and X sharing only between its own clients but not with the outside world, reliably running an X server and some svgalib programs on the console in parallel is not possible. Doing this required a more sophisticated, native target implemented mostly in libggi, but requiring some kernel support for secure sharing. That support would be accessed through a Kernel Graphics Interface -- the second major component -- which would implement the actual hardware access and sharing in a kernel driver, just like other drivers do. It would cover only the things really necessary for secure sharing (mode setting, framebuffer setup, acceleration pipe access), but none of the higher-level logic that can do without kernel support. (It would also require some rework of the console system, to work with KGI.)

So what happened when The Fathers introduced GGI/KGI? Well, libggi found its niche as a nice multi-target graphics library when used directly through the GGI API; it is still active and becoming more and more powerful to this day. The little sister KGI however had less luck: It was immediately faced with strong backlash, from several sides. The X window folks considered X to be The One And Only (TM) graphics interface, to be used by everything. The kernel guys opposed the idea of integrating graphics drivers into the kernel -- maybe because "putting graphics in the kernel" is considered a Windows thing or something, and people suggesting it were considered the Bad Guys; maybe because people didn't realize the fundamental difference between low-level graphics drivers -- which, like other drivers, belong in the kernel -- and higher-level graphics handling that obviously does not.

Maybe the time wasn't right for an idea that seemed so radical back then, when graphics support was still considered something very special that is best handled by an external entity; when people still believed X could be integrated better into the system over time, removing the entry barriers and making other approaches unnecessary.

While The Fathers struggled on afterwards, the steam was mostly out; KGI soon lost momentum.

Today, things are quite different. In the meantime, the simple fact that some platforms just do not have such a thing as a text mode forced the addition of framebuffer support in the kernel; but once there, it was warmly received -- it turned out many users *want* that kernel support, as it avoids the considerable problems associated with the pure userspace implementation done by X. In fact, people even created a complete graphics system called DirectFB, which hacked accelerated graphics support on top of the kernel framebuffer interface. (But lacking kernel support for the acceleration features, it inherited many of the problems of svgalib.)

Also, it turned out that even X requires some kernel support for efficient 3D acceleration, which introduced the DRI/DRM interface.

In short, today kernel graphics is a widely accepted fact. Today, people no longer concentrate on preventing graphics support from entering the kernel, but on how to implement a *clean* interface; discussing a mode setting API and everything.

Maybe it's time to realize that the little baby child KGI, while quite lifeless from the bad treatment it received, is still around; that it's not that ugly after all, but on the contrary offers pretty much exactly what people are looking for now: a clean, generic, well-thought-out interface for supporting mode setting, framebuffers, and acceleration feature access in the kernel; and that in fact it has done so all along, though never recognized for it.

Maybe it's time to give this little child a hand, to let it grow, shape it a bit maybe -- so it can become really great.

The next step

Most people consider the Hurd only a project to replace monolithic kernels. IMHO, it is more.

On the irc.freenode.net##hurd channel, we just had another discussion on the X window system. (You can read it in the channel log.) Which seems a good occasion to summarize some of my thoughts here.

While there are a number of other things that are flawed about X (which I may touch on in other posts sooner or later), there is one really fundamental problem: The X server is basically a gigantic monolithic beast, suffering from much the same problems as monolithic kernels. (Flexibility, extensibility, robustness, usability, security, etc.) And it needs to be fixed in much the same manner.

The nice thing is that the underlying Hurd concepts (RPC, translators, etc.) not only give the foundation for reimplementing the functionality of monolithic kernels with a multi-server system in userspace, but also for refactoring monolithic higher-level infrastructure components like X -- just like the Hurd is splitting monolithic kernels into a set of interacting servers handling individual parts of the functionality, a hurdish windowing system will split the functionality of X into individual servers. (And just like the Hurd uses libc to implement POSIX interfaces on top of the multi-server system, allowing for a smooth and flexible transition to more powerful concepts, we will need a replacement X library implementing X interfaces on top of the multi-server windowing system.)

Unite and Conquer

Returning to my POSIX level driver proposal: Many people expressed concerns that the standard filesystem semantics I want to use for (most) driver communication are too slow and not really appropriate.

The proposal explicitly mentions the possibility of using shortcuts wherever we experience serious performance problems with FS semantics. Now recently I had some initial discussion with Peter de Schrijver (p2-mate) at freenode.net#hug about what specifically the problems with the POSIX interfaces are. It turns out that the drivers mostly have some generic, quite similar requirements. This means that instead of creating specific shortcut protocols only for some extremely demanding drivers, we should probably rather focus on a few generally useful extensions. I like this :-)

For one, drivers often serve quite a large number of very small requests. Pure POSIX semantics would introduce considerable overhead here, because each single request needs to establish a session (open()/close()), unless it already has a permanent one; do addressing and other setup (seek()/ioctl()); and finally do the actual data transfer (read()/write()). In POSIX semantics, we need an extra RPC for each of those steps, plus some bookkeeping overhead. (If we want to avoid ioctl()s for the setup -- because they aren't very transparent, killing the major advantage of filesystem semantics -- there is even more overhead, as we need to introduce an additional file descriptor for setting request options.) A more appropriate protocol would wrap all of this in a single RPC.
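
Just to illustrate the shape such a combined request might take, here is a minimal sketch in C. All of the names and types are made up for illustration; nothing like this exists in the current Hurd interfaces.

    #include <sys/types.h>  /* off_t, size_t, ssize_t */

    /* Hypothetical sketch only: fold session setup, addressing, and data
       transfer into one request, instead of separate open()/lseek()/read()/
       close() round trips. */
    struct io_request {
        const char *path;    /* which node to talk to (replaces open()) */
        off_t       offset;  /* addressing (replaces lseek()) */
        int         flags;   /* request options (instead of an ioctl()) */
        void       *buffer;  /* where the transferred data goes */
        size_t      length;  /* how much to transfer */
    };

    /* One RPC doing the work of open()/lseek()/read()/close(). */
    ssize_t io_read_once(const struct io_request *req);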

Well, there is an important observation to make here: The optimization is useful not because we are dealing with drivers here, but because the drivers have a specific requirement (efficient handling of many small requests) which could also emerge in any other program -- I'm pretty sure many higher-level translators will profit from such an optimization just as much. Which confirms my view that drivers aren't fundamentally different from other programs, and we really want generic extensions to the POSIX/Hurd interfaces, rather than a special interface for drivers.

I guess we could implement this extension mostly transparently, with servers implementing it optionally as an optimization. Maybe even handle it in the FS server libraries, so translators not aware of the shortcut will just get the single RPC presented as a number of independent callbacks. Not sure about the exact implications, though.

Another property of drivers is that they typically work with data that is structured in blocks, which doesn't really fit the POSIX assumption of all data being represented as sequential streams. Serializing the stuff (using XML or whatever) would be tremendously expensive, considering that drivers usually aren't processing the data at all, but only working with some status information, and passing the data on. What we really want is a memory container for the actual data, and a second independent memory container with per-block status information. (Including block boundaries in the case of variable-sized blocks.)

Again, we have a requirement that isn't really specific to drivers at all: Quite a lot of high-level programs are actually working with similar non-serial data, and would greatly profit from a generic extension allowing for passing several memory containers in a single read()/write() RPC.
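
As a rough illustration of what such a two-container transfer could look like, here is another hedged C sketch; again, all of the types and the function are invented for this example and are not part of any existing interface.

    #include <stddef.h>
    #include <sys/types.h>

    /* Hypothetical sketch: one container carries the raw payload, a second
       one carries per-block status, including block boundaries so that
       variable-sized blocks work too. */
    struct block_info {
        size_t offset;   /* where this block starts in the data container */
        size_t length;   /* size of this block */
        int    status;   /* per-block status flags */
    };

    struct block_transfer {
        const void              *data;         /* payload container */
        size_t                   data_len;
        const struct block_info *blocks;       /* status container */
        size_t                   block_count;
    };

    /* A single RPC carrying both containers, instead of a serialized stream. */
    ssize_t io_write_blocks(int fd, const struct block_transfer *req);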

Another possibility I'm considering, instead of adding a number of independent extensions handling different aspects that need optimization, could be just creating some generic method for merging several POSIX calls into a single RPC. This would be extremely flexible and powerful; however, I'm not sure it could be implemented without being too awkward. Also, the fact that it would need to handle the requests in a very generic fashion might skyrocket complexity and nullify some of the performance gains we are striving for.
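
To make the trade-off a bit more concrete, a generic batching facility might look roughly like the following sketch (purely hypothetical names once more); even this stripped-down version hints at the bookkeeping a server would have to do for arbitrary sequences.

    #include <stddef.h>
    #include <sys/types.h>

    /* Hypothetical sketch of batching arbitrary POSIX-level operations into
       one RPC; nothing like this exists in the Hurd interfaces. */
    enum op_kind { OP_OPEN, OP_SEEK, OP_READ, OP_WRITE, OP_CLOSE };

    struct batched_op {
        enum op_kind kind;
        union {
            struct { const char *path; int flags; }  open;
            struct { off_t offset; int whence; }     seek;
            struct { void *buf; size_t len; }        read;
            struct { const void *buf; size_t len; }  write;
        } u;
        int result;   /* filled in by the server for each step */
    };

    /* Run the whole sequence server-side, stopping at the first error;
       returns the number of operations actually performed. */
    int io_run_batch(struct batched_op *ops, size_t count);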

And Now For Something Completely Different

The last two days I spent a considerable amount of time thinking about acoustic processors. Not sure this really fits here, but maybe some consider it interesting nevertheless.

An acoustic processor is normally a box you integrate into your stereo to improve the sound. Specifically, by adjusting the audio signal in such a manner as to balance out anomalies created by your listening room and/or your equipment.

Now acoustic processors aren't terribly popular, so far. There are probably several reasons for that. For one, high-end audio purists are generally sceptical about any modifications to the audio signal. Also, such a box isn't easy to integrate: For several reasons (price, calibration, distortion), such a processor can feasibly only work on a digital signal. However, while CDs have been around for quite a while now, only recently have most other audio sources become digital. And even with digital sources like CD players, the signal is usually already converted in the source and passed on as analog. No home for the poor little acoustic processor.

Last but not least, the price tag: Such a box is anything but simple, and consequently anything but cheap. Those who need it most -- with cheap equipment and inappropriate listening rooms -- can't afford it; those who could, have fairly little need for it.

Being one of those who'd need it but can't afford it (my equipment isn't bad, but far from perfect in the low bass area; and my current listening room is terrible), I've been playing for quite a while with the idea of going a route even I could afford: Use software to do the acoustic processing offline. Grab all my CDs, torture them in the offline acoustic processor, and burn the result again -- producing CDs that will sound terrible on any other setup, but should be perfect with my equipment and listening room.

So, how does such an acoustic processor work? Well, the simplest variant is just an equalizer with a lot of bands (128 or so), and an auto-calibration system. (A measurement microphone in conjunction with a program to run a test and adjust the parameters.)

However, these simplistic variants don't work terribly well. The problem is that just adjusting absolute volumes doesn't help too much. Temporal effects play a big role: For one, if the speakers or the listening room generate resonances, the effective volume may depend on the length of the sound. Even more importantly, psychoacoustic effects make sounds prolonged due to resonances/reverberation seem relatively louder.

(Furthermore, it's desirable to correct phase discrepancies between channels and frequency bands, for improved positioning and naturalness; to correct dynamics for improved vitality and resolution; and so forth... But that's definitely beyond my amateur means.)

So what we want to do is adjust the volumes of individual frequency bands depending on the signal levels. When a sound sets in, the relevant frequency band's volume is adjusted by the stored volume factor for short sounds in this band; when it persists for a longer time, we successively move to the factors for longer sounds. Well, at least that's my idea on how it should work.

My major problem is my very limited knowledge of acoustics, psychoacoustics and digital signal processing. As a layman, I believe we first need a frequency analyzer, continuously tracking the signal level per frequency band in the input signal over time. This frequency analyzer needs to have similar properties to our hearing, I guess. Now, using some function involving the different adjustment factors and the signal level history, we can determine the necessary current volume adjustment for each band. These levels are continuously fed into an equalizer, processing the input audio signal.
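
For what it's worth, here is a very rough C sketch of the per-band adjustment idea as I imagine it -- a layman's sketch, not a working room-correction system; the band analysis step, all constants, and the blending rule are pure assumptions.

    #include <stddef.h>

    #define BANDS 128  /* number of equalizer bands, as assumed above */

    /* Smoothed per-band level: a rough measure of how long a sound has
       already been sustained in each band. */
    static double level[BANDS];

    /* Calibration data: correction factors for short attacks and for
       sustained sounds, determined per band (by measurement, or by ear). */
    static double gain_short[BANDS];
    static double gain_long[BANDS];

    /* Given the current per-band magnitudes (from some frequency analyzer,
       e.g. a filter bank run over the input), compute the gains to feed
       into the equalizer for this time slice. */
    void update_band_gains(const double magnitude[BANDS], double gain[BANDS],
                           double decay /* smoothing factor, e.g. 0.99 */)
    {
        for (size_t b = 0; b < BANDS; b++) {
            /* Track how sustained the sound in this band currently is. */
            level[b] = decay * level[b] + (1.0 - decay) * magnitude[b];

            /* Near 0.0 right after a sound sets in, approaching 1.0 as it
               persists; used to blend between the two correction factors. */
            double sustained = level[b] / (magnitude[b] + 1e-9);
            if (sustained > 1.0)
                sustained = 1.0;

            gain[b] = gain_short[b]
                      + sustained * (gain_long[b] - gain_short[b]);
        }
    }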

Well, so much for the theory; now if someone could tell me exactly how to implement this...

Another complication is that, having no measurement equipment, I'm trying to determine all the necessary volume adjustment factors by hand/ear, using various test sounds. So far, my experiments have been rather discouraging; but I still have hope... (For the first time in my life, I'm considering a wireless keyboard.)

And well, while remastering all my CDs in this manner, I'd like to use the occasion to also fix some evident recording errors... Most notably, undo this abominable, moronic dynamic compression most CDs are fucked up with. How do we do that, again?...

Design by Bulldozer

In the previous post, I mentioned there are some very fundamental advantages in my POSIX level driver proposal, related to usability.

Now I want to pick up on one of those, which will be a recurring theme in this blog, as it is an extremely important issue: Accessibility.

I do not mean accessibility in the usual interface-related meaning of posing no barriers to people with disabilities. I mean accessibility in the sense of not posing barriers to pretty much everyone.

Most developers, and even many UI designers, seem completely unaware of how extremely important accessibility is. Somehow they assume that if something makes sense to them, it is good. The thought that it might mean considerable work for others to learn the concept doesn't even cross their minds.

Look at which technologies are successful. The WWW is popular because it has low entry barriers. People are more likely to participate in wikis than contribute to static pages because they have a lower entry barrier. And so on.

Look at Firefox. Why is it so popular? Because it focuses on those features that are easily accessible. Popup blocking is a good feature, because it is obvious. So are tabs. Or the search bar.

Compare this to Opera. If you configure it to ask about setting cookies, for example, you get a dialog that presents you with more than half a dozen options for how to handle each cookie, some of which I don't even understand. (At least in the versions I tried.) And Firefox? It presents you with a very simple dialog, having only a few obvious options. Maybe it's slightly less powerful; but it still covers what you want in about 99% of all situations, and is at least three times simpler. Meaning about an order of magnitude more useful.

There are many other examples of features in Opera that are quite interesting, but so hard to use that they have no practical value. Features so complicated or obscure that hardly anybody will bother to learn them are just useless. Sure, there are always a few nuts taking considerable pains to learn even the most obscure feature of some program. However, if it takes more trouble to discover, learn, configure and get used to some feature than it saves in the long run, it is just an end in itself. You can boast how powerful your program is and/or how well you master it. But that's about all the value you will ever get out of it.

Accessibility is important not only for GUIs, but really at all levels of the system. Let's take one example from the driver proposal: Among many other possibilities, it allows control of who is allowed to run what drivers, simply by changing file permissions on the underlying device nodes. Now, of course, you could implement some kind of permission system in any other driver framework... But requiring some obscure special mechanism, with some kind of config files in the background or something, is not only considerably less flexible, but actually much, much harder to set up in the first place. Unix file permissions, on the other hand, are an obvious and well-known concept to every Unix admin -- there is nothing you need to learn or remember; just by looking at the nodes you can guess what to do.

And it goes even further down. All of this is true for the system internals, programming interfaces, everything. Making functionality accessible, tearing down entry barriers, is among the most important design principles in about any kind of software development. (I'll show how this applies in various contexts in other posts on more specific topics.)

You will be assimilated

I thought about a number of things today, but had no time to write them up, because I had to write a long answer mail regarding my POSIX level driver proposal for Hurd on L4.

The foundation of this proposal is the fact that on a multi-server microkernel system, we have quite a lot of freedom in how to implement hardware drivers -- my take on it being to treat them just like ordinary applications, with only a minimal set of special mechanisms for driver-specific stuff, on the premise that drivers actually aren't that special in their nature, and shouldn't be in the implementation. This offers a large number of advantages over other approaches: To users, admins, and system distributors, and even application and driver developers. (Of course, the advantages mostly relate to usability :-) )

The most important of those advantages are of a very generic nature -- stuff that I will cover in other posts sooner or later. (Though maybe in different contexts.)

The linked document is a longish, very technical description of my proposal, explaining what it is about, trying to outline some of the advantages (though probably not very well), and describing my ideas on a possible implementation on Hurd/L4 in quite a lot of detail. Especially the last section requires some knowledge of the Hurd, L4, and the Hurd port to L4.

Now I will probably be dreaming about competing driver framework proposals for Hurd/L4. Not sure yet whether these will be pleasant dreams or nightmares.

Exposition

After explaining why I consider weblogs a good idea in general, it's probably about time I explain what I started this particular one for.

My personal problem is that I'm spending a lot of my time thinking. I mean not thinking about concrete stuff I'm working on right now; I mean thinking about various tricky problems I just happen to be interested in generally. And I don't mean just letting my thoughts wander a bit from time to time, like most everyone does sometimes. I mean really thinking a lot. Usually for hours and hours each day.

In fact I'm thinking so much it actually distracts me and makes it nearly impossible to get anything concrete done, preventing me from implementing any of the stuff I spend so much time thinking about...

In other words, all the time I spend thinking is really plainly wasted. What good does thinking about stuff do, if my ideas will never see the light of day?

As this realization slowly trickled into my awareness, I became increasingly inclined to find a way to share my ideas. I was thinking about writing essays; but there are considerable drawbacks to this. With the aforeposted discovery of the usefulness of blogs, it didn't take me long to realize that a weblog is what I want. A weblog doesn't require well thought out ideas, which would make writing essays a considerable effort, and exclude many "smaller" thoughts. On the contrary, by its very nature a weblog embraces brainstorming. It doesn't require covering a topic in any completeness. Just dump the ideas as they come. And pick them up later, adding new thoughts. Or move along and let others pick them up.

So what to expect here? Most of my thoughts revolve more or less around user interface/usability issues -- in a very broad sense, ranging from the actual design of specific GUIs, all the way down to the internals of operating system design. After all, some users like to or have to work close to the OS core (think of admins, shell users etc.), and even if you don't, the core design indirectly affects what you can do in the end in very fundamental ways. So this blog will mostly revolve around UIs and OS design, and related topics.

I'm still not sure whether I should keep it strictly on topic, or also cover other less related stuff I happen to think about.

The beginning might be a bit strange, as I won't only write about new ideas I just came up with, but also try to catch up with all the more or less firm ideas that formed over the last couple of years.

PS. Thanks to Gianluca Guida for helping with naming this beast.

Innovation, Free Software, and Blogs

Often we hear claims that free software is only good at cloning existing products, but doesn't create any innovation. Of course, this statement in its broadness is patently wrong; there are lots and lots of features, ideas, and solutions present in various free software components that didn't exist before. (As we all know, the whole internet was built on free software from the beginning, just to name one prominent example.)

What is true is that the most visible, large voluntary projects (like KDE, for example) tend to copy existing stuff. Why is that? Well, we have this more or less famous claim (don't ask me by whom) that real innovation always comes from individuals or small, tightly coupled groups of smart people. I couldn't agree more. (UNIX is a nice example, or the GUI inventors at Xerox PARC.)

If a project is small enough to be implemented by a single person, there is no problem -- just follow your vision and you are fine. Free software is actually at an advantage here, as the development model makes core developers much more productive, allowing for considerably larger projects with a single major developer.

Larger projects requiring several developers, on the other hand, are possible only if all the developers share a common vision. This is easy when cloning something existing -- everyone knows what the result is supposed to be. When someone has an innovative idea, however, he is out of luck. He'll have a very hard time convincing the others to work on it, or even explaining what he wants to achieve, as they do not share his vision. Only in a tightly coupled group is it possible for several developers to pick up new ideas, amplify and refine them, and still share a common vision while drifting further and further away from the trodden tracks. Only in such a setup does massive innovation have a chance. Voluntary internet projects are left out in the cold.

So, what's the moral? Well, there are two. One is that we probably need more commercial free software. (Letting people form tightly coupled groups by working physically close together, just like in proprietary software companies...) The other one is that we need more blogs.

With dedicated, competent people from all around the world who share common interests coming together, voluntary internet projects have an advantage that easily makes up for the disadvantages of distributed development. And they have found methods to cope with the great disadvantages of missing direct communication surprisingly well. (With mailing lists, bug trackers etc.) Well, with that one ugly exception: Effective exchange of innovative ideas. Blogs to the rescue.

Like no other medium, weblogs facilitate people sharing similar interests around the globe automagically forming ad-hoc communication networks -- thus introducing the great strength of voluntary internet projects to the exchange of ideas. Like no other medium, weblogs facilitate brainstorming and the spreading of ideas, picking up the best ones, refining and extending them, putting them in new contexts, drawing new conclusions, sparking off other, even more advanced ideas, in an indirect, asynchronous manner -- thus making up for the lack of direct communication in distributed internet projects.

Maybe blogs can help remedy the lack of innovation in many larger free software projects.

Or maybe I'm just daydreaming, because I get too little sleep at night.