tag:blogger.com,1999:blog-154960012024-03-13T16:30:58.852+01:00triky Conceptsantrikhttp://www.blogger.com/profile/01281194707003363203noreply@blogger.comBlogger27125tag:blogger.com,1999:blog-15496001.post-33587865753886160422010-11-15T18:42:00.002+01:002010-11-15T21:20:50.905+01:00Declaration of Unoriginality<p>So, in the previous post I raved like a lunatic about the concept of <a href="http://tri-ceps.blogspot.com/2009/07/declaring-world-domination.html">declarative UI languages</a> -- and QML in particular. It turns out that apparently I got excited about old wine in new skins. Which isn't exactly unusual either :-)</p><p>More specifically, I recently chatted with <a href="http://en.wikipedia.org/wiki/Carsten_Haitzler">a certain developer</a> -- and he pointed out that Edje (one of the various pieces in the <a href="http://www.enlightenment.org/p.php?p=about/efl">EFL</a> stack) has (supposedly) provided the same stuff for years...</p><p>This is a bold claim of course. Scepticism rears its head... However, judging from a quick glance at least, there are indeed striking similarities between QML and <a href="http://docs.enlightenment.org/api/edje/html/edcref.html">Edje Data Collections</a>. Now I should dig a bit deeper, to find out how far the similarities go. Only I'm too lazy to do that, until I get to actually use either of them :-)</p><p>Someone also threw in <a href="http://en.wikipedia.org/wiki/Extensible_Application_Markup_Language">XAML</a>, which is used (among other things) for declaratively describing user interfaces in Microsoft's WPF. While I originally understood WPF to be one of the crazy frameworks for doing desktop applications in HTML, it turns out that with XAML as the language for UI descriptions, it is related to DHTML (i.e. HTML/CSS+JavaScript) only in spirit; while the actual implementation is designed from scratch, and thus probably saner... 
Or let's say: it has the <em>potential</em> for being saner -- but being created by Microsoft, it's as likely as not they actually screwed it up anyways :-)</p><p>Obviously, XAML being XML-based, it doesn't <em>look</em> very similar to either Edje Data Collections or QML (and is barely human-readable in fact) -- but from a cursory glance, the fundamental concepts behind them are quite similar.</p><p>What's more, XAML also forms the basis for Workflow Foundation, which some described as <a href="http://en.wikipedia.org/wiki/Monad_(functional_programming)">monads</a> in disguise (no, I do <em>not</em> remember where I read that) -- i.e. related to functional programming. I don't know how these pieces fit together exactly (nor am I much inclined to seriously study such proprietary abominations... I mean technologies); but by the sound of it, this might allow for the kind of declarative UI descriptions with functional-style behaviour specification, that I was musing about -- especially when combined with F# for the actual application logic.</p><p>It's rather chilling to see that apparently Microsoft is kinda taking the lead here... So let's change topic quickly -- I'm freezing by now!</p><p>To avoid serious confusion, I feel obliged to point out that storing UI definitions in a data file (rather than building the UI elements one by one with function calls in the main program) is not a new concept by itself. Point-and-click GUI builders have done this back in the nineties, if not earlier. However, elevating the UI descriptions to actual source code -- which can be viewed and modified by the programmer directly, rather than only through some point-and-click tool -- totally changes the game.</p><p>For one, the UI definition becomes a first-class part of the program. It can be handled with a text editor like the rest of the source code; it can be properly versioned. 
The connections between the UI definition and the main program, and the workings of the UI in general, become much more transparent.</p><p>Moreover, the UI definition itself becomes more powerful. There is only so much behaviour you can reasonably describe with a point-and-click tool; any non-trivial interaction requires calling back into the main program. When on the other hand the UI definition is handled as a true source code file, it becomes natural to implement complex interactions directly there as well; so the <em>whole</em> UI definition can be contained in the same source file, and the main program really only has to handle actual program logic. That's where these new declarative UI frameworks excel.</p><p>By the way: I learned in the meantime that functional programming is generally considered a subclass of declarative programming -- so my intuition about this was quite spot-on :-)</p>antrikhttp://www.blogger.com/profile/01281194707003363203noreply@blogger.com12tag:blogger.com,1999:blog-15496001.post-11057892297354165662009-07-10T01:33:00.002+02:002009-07-11T05:18:28.526+02:00Declaring World Domination<p>So, you came here looking for a recipe for achieving world domination? We don't have one! You fell for our PR stunt!</p><p>But then, what you will find here is <em>almost</em> as good... ;-)</p><p>Let's start at the beginning. I was at <a href="http://www.linuxtag.org/2009/en.html">LinuxTag 2009</a>. Hurray.</p><p>So, what was I doing there? Well, that's rather obvious: meeting <a href="http://blogs.fsfe.org/mk/?p=284">nice girls</a>. Why else would anyone go to a major free software event, with a 95%-or-so male geek population?... ;-)</p><p>Quite surprisingly -- in view of the above numbers -- I still managed to attend a few interesting talks. One of them was the <a href="http://www.linuxtag.org/2009/de/program/freies-vortragsprogramm/mittwoch/vortragsdetails.html?talkid=2">keynote on QML</a>. 
They did no less than <em>declare</em> a new paradigm: declarative UI programming!</p><p>Admittedly, it's not really all that new. In fact, it has been around for a while -- this is essentially what web pages are. The reason people find it easy to get going with HTML is, as everyone knows, the lovely syntax of HTML...</p><p>OK, just kidding. The really nice thing about HTML and CSS is that they are declarative. I must admit that I can't quite explain <em>why</em> declarative languages are so intuitive and nice -- but they are. And I'm not just saying that because I'm a raving lunatic either. Promise! ;-)</p><p>Indeed people tend to like this declarative stuff a lot. So much in fact that there have been some attempts to bring HTML-based applications to the desktop. (!!!) I'm not kidding now. Maybe you heard about it. I guess some people are just crazy -- and not *all* of them in a positive way ;-)</p><p>If declarative UI programming is so attractive that people are willing to jump through these kinds of hoops and even put up with Web standards, the logical conclusion must be: to create a proper language for declarative UI design -- but unlike previous attempts, actually designing it to be sane, instead of trying to build something on top of the HTML legacy...</p><p>I contemplated something like that for a while, and now the <a href="http://en.wikipedia.org/wiki/Qt_Software">ex-trolls</a> have invented just that: a (hopefully) sane declarative language for creating proper user interfaces.</p><p>It's not all bliss, though: while simple interactions can be described in a purely declarative manner using built-in functionality and standard modules, more complex stuff is implemented using JavaScript. 
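The split is easy to see in a toy example (my own sketch, not taken from the keynote; only the QtQuick element names are real): the text property is a declarative binding that the runtime keeps up to date by itself, while the click handler is a little imperative island of JavaScript.

```qml
// A toy sketch (mine, not from the keynote).  The "text" property is
// a declarative binding -- it updates whenever "count" changes,
// without anyone telling it to.  The click handler, on the other
// hand, is a little island of imperative JavaScript.
import QtQuick 1.0

Rectangle {
    width: 200; height: 100
    property int count: 0

    Text {
        anchors.centerIn: parent
        text: "Clicked " + count + " times"   // declarative: re-evaluated automatically
    }

    MouseArea {
        anchors.fill: parent
        onClicked: count += 1                 // imperative: plain JavaScript
    }
}
```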
Ouch.</p><p>Aside from JavaScript specifically being a glorious achievement in backwards evolution -- it almost, though not quite, reaches the standard of sophisticated ugliness set forth by such historic highlights as COBOL or Ada -- I always felt that imperative scripting languages generally do not really fit in with a declarative markup language.</p><p>Functional programming seems a much more logical complement to declarative languages. Both describe how the result relates to the input state, without needing to specify in what order individual calculation steps are to be performed. The purely declarative part describes states, while the purely functional part describes relations between states. It seems to me that when a declarative language evolves toward more sophisticated state transformations, the description of these relations will naturally look more and more like a full-blown functional language.</p><p>This paradigm is actually not limited to GUI programming: I have been feeling for a while now that the reason Hurd translator programming is rather tricky is related to the imperative languages used: translators tend to describe functional relations between the presented file system and some underlying state -- I'm pretty sure these could be expressed much more naturally with a declarative/functional approach.</p><p>But back to QML and the glorious keynote: at the end, the speaker's great conclusion was that he considers this a paradigm shift, similar to the shift towards object-oriented programming that happened in the past... This conclusion shocked me a bit. 
Why so negative?!</p><p>I never took to this "object-oriented programming" silliness: it always struck me as the kind of questionable abstraction, which manages to do the amazing trick of obscuring the internal workings, and limiting possibilities, without actually hiding any complexity in exchange...</p><p>Declarative programming on the other hand -- as I have *discreetly* hinted at during the course of this article ;-) -- is something I actually do consider a great idea.</p><p>So indeed, a paradigm shift it is -- but not at all like the one towards OOP!</p>antrikhttp://www.blogger.com/profile/01281194707003363203noreply@blogger.com6tag:blogger.com,1999:blog-15496001.post-90621944237911352062009-06-20T00:29:00.004+02:002011-06-03T11:31:09.816+02:00One Shell to Rule Them All<p>Just when I <a href="http://tri-ceps.blogspot.com/2009/05/activate-me.html">lauded</a> GNOME for (slowly) moving in the right direction, the GNOMEs failed me: I heard that the Nautilus CD Burner was dropped in favor of a "traditional" CD burning application. Oh ungrateful world!</p><p>This will teach me not to trust <a href="http://en.wikipedia.org/wiki/Gnomes_of_Z%C3%BCrich">Swiss bankers</a>.</p><p>Wait, wrong link... Let's try again: this will teach me not to trust <a href="http://en.wikipedia.org/wiki/Gnome">mythical creatures characterized by their extremely small size and subterranean lifestyle</a>. That's better.</p><p>I briefly blabbered about this in my <a href="http://tri-ceps.blogspot.com/2006/11/mehtahurd.html">article on DeepaMehta</a>: traditional applications just do not make sense. No really, they don't. No sense at all.</p><p>I do not believe there was ever actually any technical or otherwise practical reason for having applications. 
Rather, it's just nuclear fallout from the proprietary software world: when one has a compulsive need for selling "products" (I'm convinced it is some kind of mania -- all that talk about business models is just an alibi :-) ), then one needs to offer something tangible; something that the lemmings using it can associate with the neat (for some value of "neat") package they got from the maniac... err, I mean vendor -- and probably shelled out some money for. (Pun not intended, honestly :-) )</p><p>This is an exquisite demonstration of how formidably the proprietary model fails to produce real value; how the vendors' interests work in perfect opposition to the users': what we really want are *not* clearly distinguished applications. Quite the reverse: we want additional functionality to integrate as seamlessly as possible; to become an organic part of the system -- to become nonexistent as an entity of its own. And in a free software world, once the proprietary "product" fallout clears, we can indeed attain this goal.</p><p>So, what's this about nuclear fallout; where does this applied hate come from, err I mean that hate against applications? Just what makes them such a first-order nuclear meltdown? Quite simple: it's just too many shells.</p><p>(Don't get me wrong -- I actually like seafood ;-) )</p><p>A shell, in this context, is the part of an application that reads interactive user commands, and invokes the corresponding functionality -- the spell-casting interpreter, so to say. But why does every application need its own shell? Winning hint: It doesn't. There is really no good reason. Unless you like pain -- plenty of that in here. A genuine pain factory indeed.</p><p>For starters, having many shells naturally breeds inconsistency. (It's indeed a law of nature. 
Goes by the name of Entropy.)</p><p>Even if you manage -- by threats, pleas and bribes -- to keep the actual interfaces consistent, the user experience will inevitably still be inconsistent: simply because of having to open the same file in various applications to do certain things on it. There is no escaping the pain.</p><p>Multiple shells also inevitably result in redundancy; and thus bloat, confusion, and more pain in general: it is never quite clear which functionality should best be accessible from which shell. There is always a tendency to add more and more stuff to each one, to avoid the situation where you have to use another shell (application) just for this one feature...</p><p>This also goes for the main system shell, i.e. the file manager: which functionality should be available there? Surely it's useful to have a preview of images for example; but once you have that, how about slide shows? Or functionality to rotate images? And once you have rotation, why not other editing features? Where to stop?</p><p>The obvious answer is: don't stop at all. (Well, it <em>is</em> obvious, isn't it?... :-) ) Just put all the functionality in the main shell, thus avoiding the need for any other ones.</p><p>This way, there are no applications in the traditional sense anymore. All additional software just plugs into the main navigation facility. (Normally the file manager, though theoretically other object systems are possible as well... Except that using anything but the file system as the primary facility for managing objects is probably an idea almost as bad as applications :-) )</p><p>If you install an SVG editor for example, you just get the SVG editing ability available from the main shell. Simple and consistent -- no more pain. 
Life is good.</p>antrikhttp://www.blogger.com/profile/01281194707003363203noreply@blogger.com11tag:blogger.com,1999:blog-15496001.post-84130547119151360942009-05-17T18:13:00.002+02:002009-05-17T18:17:40.006+02:00Activate me!<p>In the past I have been complaining about <a href="http://tri-ceps.blogspot.com/2005/10/other-side.html">GNOME's lack of innovation</a>; and now I stumbled over a project called <a href="http://live.gnome.org/GnomeShell">GNOME Shell</a>...</p><p>So, this is the moment: this is when I have to revise my world view; when I have to apologize and praise the GNOME folks for their innovative ideas...</p><p>Nah, just kidding :-) But I have to admit that I <em>was</em> surprised -- and this is the incredible part -- in a <em>positive</em> way for a change.</p><p>Most of it is still rather vague (i.e. remarkably like my own ideas...); and the ideas presented there are not exactly revolutionary -- but one thing is clear: For the first(?) time GNOME folks indeed seem to be thinking outside the Windows (TM)... err... I mean outside the box ;-) ; for the first time they really <em>try</em> to come up with something new, rather than just doing cosmetics to well-known (stupid) approaches...</p><p>Do you hear this noise?... It's me applauding.</p><p>One thing that caught my attention in particular is some ideas regarding <a href="http://live.gnome.org/GnomeShell/Activities">Activities</a>: remarkably similar in some regards to my own ideas regarding <a href="http://tri-ceps.blogspot.com/2005/09/persistance-vs-insistance.html">session management</a>...</p><p>This confirms an observation I have been making again and again recently: Slowly, very slowly, most things in the free software world tend to be moving in the right direction. Maybe in just another 20 years or so we will have a sane desktop environment! 
;-)</p>antrikhttp://www.blogger.com/profile/01281194707003363203noreply@blogger.com3tag:blogger.com,1999:blog-15496001.post-63940393929631124082009-04-18T19:45:00.002+02:002009-04-18T22:00:01.188+02:00A Plea for Reinventing The Wheel<p>Let me talk about a matter I have been pondering more than once. (How unusual, eh?... ;-) )</p><p>The latest incident, which prompted me to write this article, was at a (somewhat bizarre) <a href="http://eh2009.hamburg.ccc.de/fahrplan/events/3151.de.html">presentation of Protonet</a>. (Which is essentially a WLAN meshing appliance.) There was an argument about whether the Protonet guys should have used existing Freifunk stuff, instead of creating their own infrastructure. While most of the geeks present were arguing that it was stupid and pointless and evil overall to reinvent the wheel, I was pleading Protonet's case... (I'm not associated with either Protonet or Freifunk, BTW.)</p><p>So, why did I do that? Just for the sake of trolling, of course... Err wait, did I really say that aloud? That's obviously <em>not</em> what I mean! :-)</p><p>The truth is that sometimes reinventing the wheel -- or rather, inventing a new variation on the wheel theme from scratch -- is indeed a good thing. Reinventing the wheel is not <em>always</em> just ignorance or Not Invented Here syndrome (no, really!) -- there are various totally valid reasons for doing so.</p><p>There are -- surprisingly -- technical ones: while it certainly seems a terrible waste to create something new, when more or less the same functionality has already been implemented elsewhere, this is a very superficial view. Often it's not really a waste, because implementing the core functionality from scratch can actually be <em>less</em> effort than working with an existing framework!</p><p>An existing framework that has matured over years tends to have all kinds of features and quirks, to handle all possible aspects of the problem; to cover all possible use cases. 
(And usually, some impossible ones as well -- after all, we want to be <em>really</em> complete, don't we?... ;-) )</p><p>Now of course you think that this is a good thing, and precisely the reason for using an existing framework. (See, I can read your mind! ;-) )</p><p>However, when I want to create something new, the completeness is <em>not</em> helpful. When I want to create something new, I want to focus on my new hotness, not trying to cover all freaky obscure use cases. ("Freaky obscure use cases" obviously being anything I don't need myself ;-) )</p><p>I actually <em>want</em> to ignore most aspects: "ignorance is bliss". I want it to be incomplete on purpose. I don't want to waste my energy on learning all the mundane aspects of the existing framework, and trying to figure out how to fit my new ideas into it, without breaking existing functionality -- the functionality that someone, somewhere, has learned to adore, and will fight for tooth and claw... Being the egotist that I am, I want to spend my energy on my new ideas instead.</p><p>Your next objection surely is that this is shortsighted, and will come back and bite me in the arse: because -- if I want my new stuff to be generally useful -- I will ultimately have to cover all of these aspects anyways. (This <em>is</em> your next objection, isn't it? How predictable you are! ;-) )</p><p>And -- prepare for a surprise here -- you are totally right. Didn't expect <em>that</em>, eh? :-)</p><p>It is true that I can't really avoid dealing with all the aspects the existing framework covers. All I can do is postpone; but sooner or later I will have to deal with them. And <em>then</em> it's time to look at the existing framework, and see how my ideas can be integrated there. Only <em>then</em> are my ideas already tried and tested; only then do I know exactly how things should work; only then do I know which aspects are really important, and which can be traded. 
Only then can I <em>show</em> my ideas, instead of just trying to explain them; only then can I <em>prove</em> that they work; only then can others try them out, and see for themselves that they are useful; only then can I point to existing hitmen^H^H^H^H^H^Husers, who like the new ideas, and want to see them implemented in the existing framework; only then can I expect help from others with this daunting task.</p><p>Yeah, sometimes being shortsighted is useful.</p><p>Of course this means that most likely I will have to throw away part of my code; perhaps even all of it. So? Luckily, I didn't spend too much effort on it in the first place... Call it a prototype, proof of concept, whatever. Surely you won't question prototyping being a good thing?</p><p>The code doesn't count much. It's the ideas that count; having inspired a group of minions^H^H^H^H^H^H^Hfollowers sharing my vision; having gained enough momentum to overcome technical and social obstacles...</p><p>And here we are already happily in the midst of the second category of (valid) reasons for reinventing the wheel: the social aspects. (Ha! I know you like these, like we all do!)</p><p>These are often even more important than the technical ones. Working with an existing framework means working with an existing community -- a community that has its own priorities, goals, conventions, deities... Not a good environment for creating something new: you spend your energy on dealing with conflicts (religious and other), instead of actually creating stuff.</p><p>Let's take a look at the worst case. It's not even uncommon -- I've seen it happen. You have a group of people with an interesting idea. They all have a common goal, a shared vision. They are very enthusiastic, and want to make it happen. Ideas are thrown around, people start setting things up and working on stuff... 
In other words, pure awesomeness, life is good etc.</p><p>And then, people from a somewhat related, established project come around, and start <strong>discussing</strong>. (Yeah, <strong>discussing</strong> -- it's every bit as bad as it sounds! ;-) )</p><p>First they will say, "See, what you are trying to do here is interesting; but it's essentially the same as what we are doing. Why don't you join our ranks and we can work together?" "Indeed, why not?" you will think, naive as you are. So you stop the stuff that was already going on, and instead talk to this established group about what needs to be done.</p><p>But now you discover, the hard way, that they aren't really all that interested in your ideas after all. Although they are doing something similar, it's not quite the same. They have their own ideas, their own priorities, their own goals, their own deities... They tell you how you should do things differently from what you intended; more like they want them to be -- arguing that it's The Better Approach (TM). They will tell you to focus on different aspects; to work on different things. In short, they will patiently explain to you that what you <em>really</em> want to work on is evidently not what you thought you wanted, but rather what <em>they</em> consider right.</p><p>They drown you out. They are an established religion, with firm dogmas; while you struggle to articulate your fresh heretic vision, and to hold your own little group together. Some of you will hold firm to your original beliefs, spending all your energy vainly trying to convince the other group of their value; finally giving up exhausted and dismayed. Others will seemingly convert to the established religion, agreeing to work on other stuff; but inwardly feeling that it's not really what they set out to do; consequently lacking enthusiasm, and ultimately just dropping off as well. 
(So the other project doesn't gain anything from it either -- their hope of annexing your group to work on their stuff is frustrated too. They only lose time and energy as well -- serves them right, bastards!)</p><p>The result is total disaster -- your enthusiasm lost, your vision in tatters, your people dispersed; leaving nothing behind but a universal feeling of disappointment...</p><p>You might try to avoid interaction with the established religion by forking. However -- aside from the fact that without interacting with the developers, building on an existing framework is even more problematic technically -- this doesn't help much either: some people, seeing your dissent, will come over and whine: why are you forking instead of "cooperating"? They will go on a crusade, actively trying to harass your group. Not exactly helpful for productivity...</p><p>In other words, if you are trying to create something new, you initially need to isolate from other similar projects as much as possible. Only once you have working code, a community of followers, enough momentum to hold your ground -- only then can you talk to established projects on an equal footing.</p><p>So, let's happily invent new wheels. Mine is pentagonal -- how about yours? ;-)</p>antrikhttp://www.blogger.com/profile/01281194707003363203noreply@blogger.com0tag:blogger.com,1999:blog-15496001.post-11708345395661896832008-11-20T20:55:00.003+01:002008-11-20T21:07:59.957+01:00Shedding Light on Mozilla<p>Someone just pointed me to <a href="http://wiki.mozilla.org/Labs/Ubiquity">Ubiquity</a>, which is a Firefox extension offering an alternative way of issuing browser commands, using a kind of command line. At a quick glance it looks quite promising.</p><p>There are several interesting aspects to it, but I don't want to go into all of them. 
The one that definitely stands out is that these people seem to have realized something very fundamental: Textual command interfaces can be more efficient and intuitive than the ubiquitous (pun intended) point-and-click interfaces. Woohoo!</p>antrikhttp://www.blogger.com/profile/01281194707003363203noreply@blogger.com0tag:blogger.com,1999:blog-15496001.post-23388492846096037232008-06-05T01:10:00.002+02:002008-06-05T01:24:01.471+02:00Smartass Software<p>"The trouble with computers is that they do what you tell them, not what you want." -- D. Cohen</p><p>This lovely little quote is quite brilliant: It immediately strikes the heart of every single computer user. "This is <em>so</em> true..."</p><p>But then, I'm brilliant too. I really am. And in fact I'm going to prove it right now: I'll pour some of my genius into making the statement even more brilliant, by means of an addendum:</p><p><em>The trouble with the previous statement is that many people attempt to remedy it.</em></p><p>Eh? What is that supposed to mean?!</p><p>(See? That immediately strikes your heart as well! ;-) )</p><p>But let's start at the beginning. Computers being able to perform extremely complex and varied tasks, we intuitively assume they must be pretty intelligent. And yet they are so immensely dumb. We have to explain what we expect in every detail -- things that often seem so obvious. This discrepancy is very annoying.</p><p>So, can computers -- or actually programs running on them -- be made smarter? This seems a rather logical conclusion: If programs are so annoying because of their dumbness, wouldn't they get more useful if they got smarter? Shouldn't we try our best to make them so?</p><p>The answer to that question is quite clear, and if you think you can guess it, you are probably wrong. The clear answer is "no". 
(Beware of conclusions that strongly suggest themselves, and yet are totally wrong :-) )</p><p>The truth is that their apparent stupidity is actually one of the major strengths of computers: The fact that they are perfectly deterministic; that -- if you understand the interface -- you always know exactly what effect a command will have.</p><p>It is annoying to have a dumb interface, which requires a lot of repetitive work every time you perform some command. But it's even more annoying to have a "smart" interface, which tries to guess the user's wishes: The problem being that inevitably it will sometimes guess wrong. "Nobody is perfect."</p><p>The smarter the software gets -- the more it tries to guess the user's wishes -- the less predictable it gets; the harder to control; the more frustrating. It can save a considerable amount of tedious repetitive actions, but the price is high: While the repetitive actions, being repetitive (am I repeating myself?...), will soon go almost unnoticed, the loss of predictability in "smart" software means that you always have to check whether it's doing what you want; you always have to think about it -- it always takes part of your attention. Not really a net win.</p><p>The "T9" text entry system for mobile phones is a typical example: The traditional way, where you have to press the keys the right number of times to get the desired letter, requires a lot more key presses in total -- but it's perfectly deterministic; you can even type blindly. (Really -- I do that sometimes. And I'm not saying that because I want to appear cool... Well, at least not only because of that :-) )</p><p>With T9 on the other hand, you have to check every word (except of course for the most common ones, which you know by heart); you have to loop on the feedback, sometimes multiple times, until you get the desired result. 
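To make the contrast concrete, here is a minimal sketch of the deterministic multi-tap scheme (my own toy illustration, not actual phone firmware; "*" stands in for the timeout a real phone uses to separate letters on the same key). The point is that the same key sequence always yields the same text -- no dictionary, no guessing:

```javascript
// Deterministic multi-tap decoding, as on pre-T9 phones: each key
// cycles through a fixed letter list, so decoding needs no dictionary
// and never guesses.
const KEYS = {
  "2": "abc", "3": "def", "4": "ghi", "5": "jkl",
  "6": "mno", "7": "pqrs", "8": "tuv", "9": "wxyz",
};

function decodeMultiTap(taps) {
  let out = "";
  for (const group of taps.split("*")) {   // "*" = letter separator
    if (group === "") continue;
    const letters = KEYS[group[0]];
    // pressing a key n times selects its (n-1)-th letter, wrapping around
    out += letters[(group.length - 1) % letters.length];
  }
  return out;
}

console.log(decodeMultiTap("44*33*555*555*666")); // prints "hello"
```

More presses than T9, certainly -- but the mapping is a pure function of the key presses, which is exactly why one can type blindly.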
(Well, unless -- like many people tend to do -- you skip that part, and send messages that will pose a challenge to a cryptanalyst, or else could pass for some form of modern art...)</p><p>It gets even more wretched when you want to type some word the T9 software doesn't yet know: You have to go back, change the mode, and type it again from the beginning using the traditional way.</p><p>Or you can engage in some absurd manoeuvres, to trick it into giving the desired results: I have seen people try typing a similar but different word which T9 happens to know about, and then go back and fix it into the one they actually wanted to type. Or writing the individual constituents of a long word separately, and then going back to join them together. (You must know, the German language has this interesting property that you can name pretty much anything with a single word, by connecting several other words into one. Just like Lego -- except that it blows up in your face if you don't follow the man page. Which reminds me of a <a href="http://xkcd.com/293/">toaster</a>...)</p><p>I had some other good examples on my mind, but unfortunately I forgot them. I know that's a lame excuse, but it has a substantial advantage over any number of brilliant other excuses I could come up with: This one is true! I had to postpone finishing this post for more than a week, and that was admirably effective in making me totally forget what other example I wanted to present. I suck. Here it is, now I've said it. Are you satisfied? :-)</p><p>In consequence of this personal failure, finding other examples is conveniently left as an exercise for the reader. But then, I trust you are all smart people; one mind-bogglingly great example surely does suffice to convince you of the ultimate truth? :-)</p><p>A while back I wrote about <a href="http://www.deepamehta.de/">DeepaMehta</a>. 
While chiefly dwelling on the <a href="http://tri-ceps.blogspot.com/2006/11/mehtahurd.html">object navigation mechanism</a> that forms the heart of DeepaMehta, I mentioned that there are some other ideas I like about it. One of them is considering the computer as a tool that is employed by the user to perform his work more effectively, rather than a cheap "assistant" that tries to do his work for him. (And -- like anything that is too cheap -- most likely falls short of any satisfying result...)</p><p>Now does this mean that everyone has to be content with dumb interfaces that require us to do a lot of tedious repetitive work? Certainly not -- we should do anything in our power to cut down on such redundancy: Streamlining the interface, providing shortcuts etc. -- anything that reduces the number of key presses and/or mouse clicks necessary to perform frequent tasks; yet always doing exactly what the user asks for, rather than trying to guess his wishes. That's the way to enlightenment. See you on the other side ;-)</p>antrikhttp://www.blogger.com/profile/01281194707003363203noreply@blogger.com2tag:blogger.com,1999:blog-15496001.post-82741111741608616142008-03-29T01:49:00.002+01:002008-03-29T01:55:46.103+01:00A Case Against initrd<p>I never really liked initial ramdisks. It always felt like a dirty, hackish solution. It tends to slow down the boot process, and it requires maintaining a complete second system environment -- which has to be kept in sync with the main system on upgrades and configuration changes... Rather surprisingly, I don't use initrd on my system :-)</p><p>Some other considerations now prompted me to think about this in more depth. And I know you want to read my earthshaking conclusions :-)</p><p>During a typical boot process nowadays, we have a succession of four different environments: The first one is the firmware. (The BIOS in standard PCs.) The second is the boot loader. The third is the ramdisk. 
And finally, the fourth is the real working environment.</p><p>This is clearly too much. It creates complexity. It creates redundancy -- not so much in code, but in configuration -- a maintenance nightmare. (Be honest: Are you *not* dreaming of broken ramdisks and bootloader entries by night?...)</p><p>So, which of these phases could be rationalized away? The first one is obviously necessary. (Well, unless we want to store a complete image of the system environment in flash memory, and update it on every upgrade and configuration change :-) ) The fourth one is what we want in the end. But what about the two intermediate stages, poor things, can we cut down on these?</p><p>Leaving them both out seems pretty much impossible in practice. That would require the firmware to provide drivers for pretty much any device from which the user might want to load system components or startup information (users have an annoying tendency to come up with the most surprising setups :-) ); and the firmware would need to be smart enough to construct and launch the image for a completely working system instance, based on the provided startup information.</p><p>I heard that OpenFirmware is/was quite powerful. I doubt though that it had enough drivers to completely avoid the need for additional stages; and I'm also not convinced that it was smart enough to completely load various systems without too much hassle. Of course, it's possible to do about anything with a sufficient amount of Forth scripts -- but then (I'm tactfully omitting the masochism factor here :-) ), it's effectively introducing another stage again.</p><p>Anyways, the cold dark evil reality we live in is standard PC BIOS -- which tends to have a considerable number of drivers (though still not enough for all cases...), but is totally stupid -- all it does is load a single sector from the hard disk...</p><p>So stage 2 (boot loader) is not really avoidable.
Which leaves us with stage 3: Is the boot loader powerful enough to avoid the need for a ramdisk? I tend to believe that with GRUB2, it is.</p><p>It comes with a lot of drivers -- probably enough to satisfy any need.</p><p>(This is a bit of code duplication of course; with a ramdisk, the system's native drivers are used to ultimately load the final system environment, and the earlier stages can be kept minimal. OTOH, the GRUB drivers can be much simpler than the system's proper drivers, so it's not really that bad. Moreover, it can't be really avoided anyways -- in general, you want to be able to load the kernel and boot information from the same places where the rest of the system is later loaded... And if not you, then someone else :-) )</p><p>Also, it comes with the multiboot loader, which -- in combination with the powerful scripting facilities -- should be sufficient to completely set up the working environment for many operating systems. And if that is not yet powerful enough, there is still the possibility of writing a custom loader module handling specifics of the system. The nice thing is that the module doesn't need to be distributed with GRUB itself (it's nice to keep the bootloader down to a size that still fits on a CD ;-) ) -- it is perfectly possible and reasonable to make it part of the actual operating system for which it is designed.</p><p>In a (much) older post, I <a href="http://tri-ceps.blogspot.com/2005/08/you-will-be-assimilated.html">mentioned</a> my <a href="http://belug.de/~antrik/posix_drivers.txt">POSIX level driver proposal</a>. Part of it describes boot methods.
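</p>

<p>To make the multiboot part concrete: a GRUB entry along the following lines (module paths and arguments quoted from memory -- take it as a sketch, not a verified configuration) starts GNU Mach and hands it the root filesystem server and exec server as multiboot modules, with no ramdisk anywhere in the picture:</p>

```
menuentry "GNU/Hurd" {
    # the kernel is read directly from disk, using GRUB's own drivers
    multiboot /boot/gnumach.gz root=device:hd0s1
    # root filesystem translator and exec server are passed as multiboot
    # modules; GRUB's scripting substitutes the ports and tasks at boot
    module /hurd/ext2fs.static ext2fs \
        --multiboot-command-line='${kernel-command-line}' \
        --host-priv-port='${host-port}' \
        --device-master-port='${device-port}' \
        --exec-server-task='${exec-task}' \
        -T typed '${root}' '$(task-create)' '$(task-resume)'
    module /lib/ld.so.1 exec /hurd/exec '$(exec-task=task-create)'
}
```

<p>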
Aside from a boring ramdisk (sorry...), I also proposed a lovably crazy approach: Implementing a mechanism that allows using GRUB's drivers after the actual system starts, until it has loaded its own drivers.</p><p>Now I realize that it makes much more sense the other way round: Move the boot process (up to the moment all necessary native drivers are available) completely into GRUB. This allows a similar level of flexibility, with considerably less magic. (I'll miss the craziness...)</p><p>The driver proposal relies heavily on extracting information from the filesystem structure, and passive translators in particular; so we need to extend GRUB so that it can read passive translator information from the filesystem, and initialize active translators so that the driver hierarchy can immediately become functional once control is passed to the system.</p><p>This could either be done by implementing extensions that can be used in normal boot scripts, or by implementing a loader module that does all the driver setup automatically. The former is probably more tricky, but also more transparent and flexible.</p><p>A similar approach should allow preparing the startup of any other system as well, avoiding the need for any initial ramdisk. Good riddance :-)</p>antrikhttp://www.blogger.com/profile/01281194707003363203noreply@blogger.com0tag:blogger.com,1999:blog-15496001.post-14668548987107985242007-10-24T09:16:00.000+02:002007-11-23T15:53:49.213+01:00Advanced Lightweight Virtualization<p>Everyone is talking about virtualization now. Well, maybe not your mum; but almost everybody. OK, probably not your aunt either... Well, you get the point :-)</p><p>Now I tend to be just this tiny little bit sceptical about things everyone talks about, and thus generally quite late in the game when it comes to crying "me too!". But I think the time has come when I can join in without risking my great reputation as an antediluvian freak.</p><p>So, coolness factor etc.
aside: <em>Why</em> is everyone talking about virtualization? I think the reason is that it offers a very simple, straightforward solution to a bunch of problems related to various kinds of isolation.</p><p>One very prominent kind is related to security: Mainstream operating systems (both UNIX derivatives and Windows) by default allow any process in the system to communicate with almost anything else in the system. The concepts of users and file access permissions provide some limits, but these are unsuitable for enforcing any serious security policy: They only work under the assumption that software is bug-free, and that users only run software they absolutely trust.</p><p>Bolted-on solutions like SELinux in theory allow restricting the communication channels; but they are extremely complex to manage -- making them error-prone, and making anything beyond the simple default policies provided by the OS vendor unfeasible in practice.</p><p>Hardware virtualization on the other hand provides security in a trivial manner: Basically, it just cuts <em>any</em> communication channels -- (almost) total security through total isolation. VMs err on the other side: Usually you <em>do</em> want to have <em>some</em> communication, and with VMs you have to jump through hoops to set it up, e.g. using virtual network interfaces.</p><p>A somewhat related use case is isolation in administrative matters: With a VM, the guest system is completely independent from the host system. It can be configured differently; it can be upgraded without affecting the host system, as well as the other way round. You can have different user accounts. And so on.</p><p>Again, the cost of total isolation is... Well, total isolation :-) It means that you <em>have</em> to manage each VM individually -- sometimes a desirable property, sometimes a burden.
(And most of the time, a bit of both...)</p><p>Last but not least, VMs allow total isolation of interfaces: The guest system only talks to the (virtual) hardware, and is thus totally independent of the functionality and interfaces of the host OS -- you can run a totally different system inside the VM.</p><p>Here, the downside of independence is a lot of overhead, and very poor resource utilization. Paravirtualization cuts this down a bit, but doesn't fundamentally change the situation.</p><p>(This is a blessing for hardware vendors of course -- especially as standard application vendors lately have been slacking a bit with bloating their software to make up for recent increases in processing power and memory sizes...)</p><p>All in all, while hardware virtualization provides total isolation in all regards, it is often total overkill too -- more isolation than necessary or desired.</p><p>Various kinds of container mechanisms (vserver and OpenVZ in Linux for example) are an interesting alternative in many situations. Here, you have a single instance of the system, but several isolated user environments -- so you get isolation of communication channels, and usually also some administrative independence (to varying degrees), but without the overhead of hardware virtualization. (The term "<a href='http://lwn.net/Articles/179361/'>lightweight virtualization</a>" is sometimes used for that; however, it doesn't seem to be widely adopted: Google gets some relevant hits, but not really that many...)</p><p>What these container solutions can't do (apart from being less robust against security exploits, due to the common system instance), is running a different system in the subenvironment.</p><p>There are also some specific middle-ground solutions like User Mode Linux or lguest, which allow running another instance of the system, but with less overhead than true hardware virtualization.</p><p>Now let's take a look at the Hurd.
Its main feature, compared to traditional (monolithic) UNIX-like systems, is the fact that almost all system functionality is provided by optional layers (servers and libraries), which can easily be replaced: Any user or program can run its own services instead of using the system-provided ones -- thus creating a different environment, with little or no overhead, and without affecting the rest of the system. (This is a tribute to the GNU philosophy, that a user should always have full control over the software he runs.)</p><p>By default, all processes run in a single standard environment; but upon demand, any process can be put into some different, more or less independent subenvironment. There are endless variations: You could run select processes with distinct instances of some default servers, to increase robustness and scalability; you could set up containers isolated from the rest of the system; you could use a different variant of some server, e.g. a different network stack optimized for some specific use case; you could run another instance of the whole system (this is called a subhurd or neighbour-Hurd); you could run a special environment, with well-defined versions of certain components, to be sure that a certain feature is present independent of the host system, or to avoid possible incompatibilities through changes in the host system; you could even run a totally different system, having little in common with the main one. All of this can be done on any running Hurd installation, without any modification to the host system.</p><p>We haven't been expressing these Hurd features in terms of virtualization up till now.
But I think it makes perfect sense to do so: It seems common practice to describe various facilities of this kind by the term "virtualization"; and saying that the Hurd is designed from the ground up to support fine-grained virtualization is certainly more perspicuous to most people than talking about user extensibility.</p><p>So, let's be more buzzword-compliant :-) Let's call it advanced lightweight virtualization.</p>antrikhttp://www.blogger.com/profile/01281194707003363203noreply@blogger.com3tag:blogger.com,1999:blog-15496001.post-79072118606615959762007-07-24T22:14:00.001+02:002007-09-09T18:48:08.816+02:00Theory of Filesystem Relativity<p>It has been pointed out that the Hurd chroot implementation has serious problems in connection with passive translators, resulting in unexpected behaviour and gaping security holes: When someone sets a passive translator (e.g. a firmlink) within a chroot, and then accesses the translated node from within the chroot, the translator will run *outside* the chroot, but will be accessible from inside it -- meaning you can easily escape the chroot. (Using something like <code>settrans -p tunnel /hurd/firmlink /</code> )</p> <p>This is a serious flaw in how passive translators work; and it has been used to demonstrate that supposedly <a href="http://lists.gnu.org/archive/html/l4-hurd/2005-10/msg00081.html">passive translators are broken by design</a>, and should be replaced by transparent system-wide persistence. However, I doubt such a radical conclusion is really appropriate.</p> <p>For one, the original Hurd designers said at the outset that chroot will only be supported for compatibility if it can be done without too much hassle. One could very well claim that a secure and perfectly consistent chroot implementation was never intended.
Yet, I do think that chroot can be fixed without totally overthrowing the passive translator concept.</p> <p>But before diving into this, I'd like to mention that IMHO the current chroot implementation, using a system call handled in the actual filesystem servers, is wrong. It seems much more hurdish, more flexible, and more robust, to use a filesystem proxy as / for the chrooted process. One advantage is that the same mechanism can be used not only for chroot -- which uses a proxy that simply translates all paths so that they point to a subtree instead of the global / -- but also with other kinds of proxies to achieve different semantics: For example using a proxy that mirrors the global /, and only replaces a few specific locations. (/servers/* is a likely candidate, which allows replacing default system servers.)</p> <p>Now back to passive translators. I can think of quite a lot of possible approaches:</p> <ul> <li>Simply don't allow setting passive translators inside a chroot at all. After all, chroot is only for UNIX compatibility, and translators are not a UNIX concept...</li> <li>Allow setting passive translators, but only temporarily, not storing them in the underlying filesystem. When accessing the translated node, the translator is started by the chroot. Allowing passive translators but not really storing them is a bit inelegant, of course...</li> <li>Store the passive translator, but also store the chroot information; and only start the translator if the node is accessed from within the same chroot.</li> <li>Store the passive translator and the chroot, and whenever the node is accessed, run the translator in a matching chroot. This might be the most elegant solution. The only problem I see is that the translator is run in an identical, but not the *same* context.
For chroot this shouldn't be a problem I believe; but some other kinds of chroot-like subenvironments might break: If you have some kind of subenvironment, where some things are local to the specific instance, running the translator in a different instance might not do the trick. But as I said, for a normal chroot it should be fine.</li> <li>Last but not least, we could simply allow setting passive translators from within a chroot just as happens now, but when a translated node is accessed, the translator started would run in the context of the process accessing it -- which is different for a chrooted process than for a normal one. (For consistency, any active translators running outside a chroot would have to be ignored inside it...)</li> </ul> <p>One could claim that the last variant is actually the only sane one: It's a bit confusing that the translators will refer to something else within the chroot than outside it -- but in most situations that probably is actually the most useful behaviour. Also, it's how symlinks behave in a chroot.</p> <p>Of course, in some cases you actually want the other behaviour. There is really no solution that always does the desired thing. A similar problem arises with translators or symlinks on NFS: Should they be resolved on the client or the server side?
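</p>

<p>Coming back to the proxy idea from above: at its core, such a chroot proxy merely rewrites every absolute path into the designated subtree before forwarding the lookup to the real filesystem. A minimal sketch of that rewriting rule in shell (the <code>translate</code> helper and the paths are made up for illustration; a real proxy would apply this to every lookup RPC it forwards):</p>

```shell
#!/bin/sh
# translate ROOT PATH -- print the path the proxy would actually forward.
translate() {
    root=${1%/}    # the subtree serving as / for the chrooted process
    path=$2
    case $path in
        /*) printf '%s%s\n' "$root" "$path" ;;  # absolute paths land in the subtree
        *)  printf '%s\n' "$path" ;;            # relative paths resolve as usual
    esac
}

translate /jails/web /etc/passwd   # -> /jails/web/etc/passwd
translate /jails/web local/file    # -> local/file
```

<p>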
Sometimes it's desirable for links or translators always to refer to the same physical location on the server, no matter whether the FS is accessed through NFS or directly; while sometimes it's more desirable to always refer to the same logical location, so you always get an appropriate local resource on the machine where the program runs...</p><p>Another situation, which also isn't specific to chroots or even to translators (I experienced the problem with symlinks on Linux), is handling of mount points: Should translators and symlinks refer to the root of the actual file system (partition), rather than the VFS root, so they always point to the same physical location no matter where the FS is mounted?...</p><p>It seems that in most other cases involving symlinks or translators, the rule is to have them referring to the same *logical* location in changing contexts; so, while I'm not sure whether it's the more useful behaviour, it at least would be most consistent to go with the last suggestion, i.e. make passive translators within a chroot always run in the context of the chroot.</p>antrikhttp://www.blogger.com/profile/01281194707003363203noreply@blogger.com1tag:blogger.com,1999:blog-15496001.post-1169755276118806982007-01-25T20:47:00.000+01:002007-01-25T21:05:50.046+01:00External Insistence<p>In an earlier post, I explained my <a href="http://tri-ceps.blogspot.com/2005/09/persistance-vs-insistance.html">concerns with transparent system-wide persistence</a>. One of the problems I pointed out is that in such a system, you have to manually serialize all important state on upgrades and in some other situations anyways, diminishing the value of transparent persistence.</p><p>Marcus Brinkmann now showed me a nice <a href="http://www.erights.org/data/serial/jhu-paper/upgrade.html">text explaining the upgrade problem</a> in great detail.
It's a good read, at least the first half.</p><p>BTW, meanwhile I refined some of the ideas I tried to explain in the original discussion, and I might post an update at some point; but all in all, my concerns with transparent system-wide persistence haven't changed.</p>antrikhttp://www.blogger.com/profile/01281194707003363203noreply@blogger.com2tag:blogger.com,1999:blog-15496001.post-1161398604537873062006-11-24T16:42:00.000+01:002006-11-24T18:29:43.890+01:00MehtaHurd<p>Recently, I saw a presentation on <a href='http://www.deepamehta.de/'>DeepaMehta</a>. Sounds quite promising: Completely new approach instead of the traditional (broken) desktop metaphor; adapted to the way we think... Of course, I was mighty sceptical. It turned out more interesting than I expected.</p><p>Putting it bluntly, DeepaMehta is kind of an extensible filesystem browser with an advanced bookmarking/navigation system.</p><p>Of course, this statement is a misrepresentation in several regards. Most notably, DeepaMehta's objects aren't really meant as simple bookmarks. They can store considerable amounts of metadata, as well as actual data, and only optionally can reference traditional files or other external data, but normally they exist on their own -- in fact, the idea is that ideally (almost?) all data is stored in DeepaMehta itself. Also, DeepaMehta doesn't care at all about the traditional filesystem structure... Well, let's get more specific.</p><p>DeepaMehta is based on three core observations regarding existing desktop systems. One is that overlapping windows are an abomination that needs to be avoided. This is a very important issue of course. DeepaMehta always displays only one navigation window, and one window presenting the content of the currently active object -- just like Norton Commander in quick view mode...
(Somewhat resembling Oberon, which also has a content pane plus another pane for other stuff -- though the division is different there.)</p><p>Limiting the display to just one content window is a bit extreme IMHO: Seeing several things at the same time can be useful sometimes, given a sane (tiled and dynamic) window manager. Nevertheless, it's a Good Thing (TM) that DeepaMehta addresses the problem of overlapping windows.</p><p>The second observation is that traditional applications make no sense. Instead, we need a generic shell application, with plugins to be able to perform specific actions on various object types -- again a very important issue. To some extent, we see this happening with things like Nautilus already; but DeepaMehta is much more radical: All navigation actions are performed in the navigation window, using standard mechanisms supported by plugins for creating specific object associations; and all object viewing/editing actions are performed in the content window, again supported by plugins for specific object types.</p><p>The third observation is that traditional navigation mechanisms are quite unsuited to the way we work; for many tasks, a new approach can be vastly more efficient and convenient. This is the strong side of DeepaMehta, and the really innovative part: It uses a navigation system based on mind maps. Basically, you have all your objects, as well as various types of bilateral associations between them.</p><p>As there are far too many objects and associations to make this manageable directly, you never look at all of them at once, but instead use various partial views (maps). A new view can be created by starting from some object, uncovering the associations (and associated objects) you are presently interested in, and going on from there, until you see all you want in this view.
Once you have a view with all the objects interesting for your present occupation, you can easily switch which of these objects is visible in the content window at any given time, by focusing it in the map.</p><p>Constructing a map can also happen automatically, e.g. when browsing a web site in the content window: Each time you click a link, the linked site and the link association are added to the map. Otherwise, creating a map by hand can be quite tedious. (Especially as DeepaMehta, while introducing totally different high-level concepts, is very old-fashioned at the level of actual UI elements -- all the map management for example is done through nested context menus, requiring lots of searching and endless clicking...)</p><p>For some tasks, this seems way too static: To efficiently browse a filesystem (or generally to quickly move through any larger object structure), some more automated approach seems necessary. Not sure how to do that; but I think it should be possible to come up with something nicely extending the concepts in this direction.</p><p>There are some more fundamental problems with the existing DeepaMehta implementation, however. (Aside from the fact that it is written in Java...)</p><p>For one, it is centered around the idea that the future is in the net, with all intelligence in servers, and workstation machines being only dumb clients -- a pretty absurd notion. In spite of all the babbling of dotcom freaks, it's pretty obvious in any realistic view that the local system should always be the primary focus of a desktop framework. Networking is a mostly orthogonal issue, which can be nicely integrated using other mechanisms, for example a transparently networked filesystem like Plan 9's.
Designing a desktop system as client-server from the ground up only makes it more complicated and less flexible, creating many more problems than it solves.</p><p>The second major problem with DeepaMehta is that it creates a world of its own, with little relation to existing environments. This obviously makes transition pretty hard. Moreover, as it's impossible (or at least very unrealistic) to run a whole operating system, including administration etc., with DeepaMehta alone, there will always be the DeepaMehta layer on top/beside the rest of the system; and like with any sub-platform, this creates very serious integration problems.</p><p>For these reasons, I don't consider DeepaMehta as is very useful in a broader view. Yet, the navigation approach seems immensely valuable. So, how could it be integrated at a generic level in more traditional systems? I have a number of ideas for various ways to implement the DeepaMehta navigation concept through extensions to the normal filesystem. (The Hurd architecture makes such extensions easy...)</p><p>The basic idea is, instead of having a special object database like DeepaMehta, to use traditional filesystem entities (files and/or directories) as the primary objects. We already have links in the filesystem, so one might imagine something based on that for the object associations.</p><p>In a very crude approach, the DeepaMehta semantics could be expressed by some defined structure in a traditional filesystem. Every object would need to have its own directory, containing the main file as well as the object type designator, additional attributes, and object associations. An association could be represented by a link referencing the associated object, plus some file or a second link denoting the object type. (The object type itself can be described by a special object...)
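</p>

<p>A quick sketch of that crude layout in shell -- every name here (<code>note1</code>, <code>note2</code>, <code>assoc0</code>, the type strings) is made up purely for illustration; the directory structure is the point:</p>

```shell
#!/bin/sh
set -e
root=$(mktemp -d)

# Each object gets its own directory: main file, type designator, attributes.
mkdir "$root/note1" "$root/note2"
echo "first note"  > "$root/note1/main"
echo "text/note"   > "$root/note1/type"
echo "second note" > "$root/note2/main"
echo "text/note"   > "$root/note2/type"

# An association: a subdirectory holding a link to the target object,
# plus a marker file denoting the association type.
mkdir "$root/note1/assoc0"
ln -s ../../note2 "$root/note1/assoc0/target"
echo "related-to" > "$root/note1/assoc0/type"

# Following the association from note1 reaches note2's main file:
reached=$(cat "$root/note1/assoc0/target/main")
echo "$reached"   # -> second note
```

<p>Note that the reverse direction is nowhere represented -- traditional links being unidirectional.</p>

<p>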
Maps could be either stored in additional subdirs inside the object directories, with pointers (links) to the associations visible in the particular map; or in completely distinct directories, linking all the objects and associations visible in the map.</p><p>This approach should work fine with programs that are aware of the special semantics. However, it creates lots of other problems otherwise. If you access the objects with normal programs, you always have to specify the main file inside the object dir. If you copy or delete objects, you have to do it recursively on the object dir. Generally, it's very complicated and not really intuitive. Also, it requires a special structure, but this structure can easily get destroyed when accessing stuff with traditional filesystem tools. As traditional links are unidirectional, it would either be very inefficient to figure out the links in the reverse direction, or it would require redundant links in both directions, which need to be kept consistent.</p><p>An alternative is to extend the standard filesystem mechanisms. This would allow attaching all necessary information to the files themselves, instead of putting them as additional items in a special directory. Various modifications to the linking mechanism are necessary for that: Normally, links point from a directory entry to a file or another directory. For DeepaMehta-like relationships, we'd need links that directly connect one file to another. Also, they would need to be bidirectional, and have a type attribute. Maps could be represented as special filesystem entities, which can be browsed like directories, but with a map-like instead of the traditional hierarchical structure; presenting an alternative view of the main directory tree.</p><p>This way, navigating through maps should work in a pretty nice manner even with traditional applications; and when accessing objects with normal programs, they behave just like ordinary files.
On the other hand, there is no way to access the object relationships and other additional attributes, or to modify maps, with tools not aware of the new features. Also, this approach is very invasive in general, making quite fundamental changes to various aspects of the filesystem functionality.</p><p>I think a more hurdish approach is to introduce a Mehta-translator: If you want some file to appear as a DeepaMehta-like object, just set the translator on it. Without the translator, it appears like a normal file; but with the translator set, it is presented as a pseudo-directory, with all the additional information. This way, the main file can be easily accessed and managed with normal applications (without the translator active), but the additional properties can also be represented in such a way that they can be accessed with traditional tools. Unlike with the directory representation on a traditional filesystem described above, the translator can enforce consistency when properties are manipulated through the pseudo-directory. It can also introduce other special semantics for the pseudo-filesystem entities, making it more intuitive.</p><p>Maps can be implemented with another translator, which creates a pseudo-filesystem representing the map as a special directory structure. (Much like in the variant with special FS features, but through a simple translator instead of modifications in the normal FS.)</p>antrikhttp://www.blogger.com/profile/01281194707003363203noreply@blogger.com8tag:blogger.com,1999:blog-15496001.post-1142253111205421032006-03-13T01:47:00.000+01:002006-03-13T13:46:34.716+01:00Some light and some shadow<p>Today I stumbled upon a very interesting article on <a href='http://koffice.kde.org/competition/gui1results/martin_pfeiffer.pdf'>proposed design concepts for KOffice</a>, which makes Martin Pfeiffer the winner of the KOffice design competition.
I haven't looked at the other contributions; but taken by itself, it looks like the award is well deserved.</p> <p>The proposal contains lots of innovative, mostly good ideas -- as I've already <a href='http://tri-ceps.blogspot.com/2005/09/involvement-engineering.html'>hinted</a> in some of my other postings, a feature IMHO sorely lacking from most UI concepts.</p> <p>It implements, at least partially, many many principles I'm badly missing from existing GUIs. (Though by far not all, of course...) It also proposes some very interesting totally new ideas I haven't even thought about so far. It suggests solutions for some problems I'm seeing in existing systems. It makes me think more consciously about some things I vaguely considered before. All in all, it sparks a lot of valuable thoughts. And, of course, it also has a couple of ideas that I do not agree with at all. (Well, what did you expect, perfection? ;-) )</p> <p>I won't dwell on all the specific aspects now. (I might pick up some in later posts.) It's way too much interesting stuff. Instead, I'll pick a single, very very fundamental issue -- one which this proposal sadly gets (almost) all wrong: Integration.</p> <p>While it's obviously right that better integration between the various office applications is desirable, there is a fundamental misconception about how to achieve this. The idea of having individual "viewers" for the various document types, and something that integrates them, is even basically right -- in spirit it goes in the right direction, towards what I call a <a href='http://tri-ceps.blogspot.com/2005/09/welcome-to-hell.html'>hurdish application infrastructure</a>. (Though not very far.)</p> <p>There is also the very important realization that a shared canvas implementation should be used for displaying the contents, making the different document type "viewers" basically only transform the documents.
But having the canvas in a shared library is a poor choice -- though typical for traditional monolithic UNIX systems, which do not offer an integrated object access approach like Hurd's translators (or Plan9's FS servers), to make implementing common functionality as server processes easy.</p> <p>Where the proposal goes really fatally wrong is in what that "something" integrating the components should be: It wants a single specific main application, gluing the whole into a sealed system, creating a desktop for office work inside the main desktop -- instead of fixing that one to provide the necessary integration. Reinforcing the dominance of closed monster applications, buying inner integration at the cost of making any outside interaction awkward, thus creating ever growing subworlds, ever more alien to each other. Ever more duplicated functionality, as it becomes so painful to use anything not part of the closed subworld. Creating another Emacs.</p>antrikhttp://www.blogger.com/profile/01281194707003363203noreply@blogger.com0tag:blogger.com,1999:blog-15496001.post-1129079270425089122005-10-12T03:00:00.000+02:002005-10-12T03:10:55.606+02:00Barring fun<p>Regarding the <a href='http://tri-ceps.blogspot.com/2005/10/importance-of-having-fun.html'>last post</a>: Thinking about it a bit more, my statement that this is unrelated to the issues I'm usually talking about here is not quite true.</p><p>On the contrary: "-Ofun" is a nice way of putting it; but a large amount of the stuff discussed in the <a href='http://www.oreillynet.com/pub/wlg/7996'>essay</a> (maybe even all of it), really boils down to <a href='http://tri-ceps.blogspot.com/2005/08/design-by-bulldozer.html'>tearing down entry barriers</a> -- here, in a social context.
Just shows how crucial this principle is in general :-)</p>antrikhttp://www.blogger.com/profile/01281194707003363203noreply@blogger.com0tag:blogger.com,1999:blog-15496001.post-1128783612919662892005-10-08T16:48:00.000+02:002005-10-08T17:05:53.603+02:00The importance of having fun<p><a href='http://www.oreillynet.com/pub/wlg/7996'>This</a> is somewhat unrelated to the technical issues I'm usually discussing here. But I'm agreeing so fully, and the issue seems so crucially important to me, that I want to point to it anyways.</p><p>Eric S. Raymond pointed out many organisational issues for successful free software projects in his famous <a href='http://www.catb.org/~esr/writings/cathedral-bazaar/cathedral-bazaar/'>Cathedral and Bazaar</a> paper. Most of the things he mentions there are right. But all of them are actually only a function of the fundamental idea of -Ofun, pointed out in this article.</p><p>And I know no other project so badly following it, so exactly doing the opposite of -Ofun, as the Hurd :-( It's really that, and nothing else, that makes Hurd's progress so slow. Can it be fixed? I hope so...</p>antrikhttp://www.blogger.com/profile/01281194707003363203noreply@blogger.com0tag:blogger.com,1999:blog-15496001.post-1128402651622447782005-10-03T23:50:00.000+02:002005-10-04T07:34:29.326+02:00The other side<p>After recently <a href='http://tri-ceps.blogspot.com/2005/09/involvement-engineering.html'>commenting on KDE</a>, I've now also stumbled upon GNOME's <a href='http://live.gnome.org/ThreePointZero'>project Topaz</a>, which -- together with some other pages it links -- describes future ideas for GNOME.</p><p>The focus is a bit different from KDE's. (Fundamental UI changes, including "extensions", are mentioned at the outset, but there are very few actual ideas. Most stuff revolves around under-the-hood technical improvements.) 
However, with regard to new ideas, it's just as disappointing.</p><p>There are a couple of quite fundamental changes suggested -- mostly not terribly new, but still. There doesn't seem to be any special emphasis on any particular one of these. I'm just picking the one most relevant in my opinion: The VFS.</p><p>There is a page on the <a href='http://live.gnome.org/GnomeVfsPlans'>new GNOME VFS</a>, which in turn links to the <a href='http://freedesktop.org/wiki/Software/dvfs'>Desktop VFS</a> from freedesktop.org. They are both describing mostly the same ideas, and I'm not sure about the exact relation between those projects, so I'll just treat them as one.</p><p>So they have -- correctly -- discovered that POSIX file handling is unsuitable for most of today's applications. Full agreement here: We made the same discovery when designing various Hurd translators. open(), read()/write(), close() were sufficient in times when most Unix tools worked as filters, sequentially reading input files, sequentially writing to output files. Most of today's applications require different semantics.</p><p>One very important operation, as the GNOME folks have accurately observed, is atomically reading or writing a whole file. They want it because most of today's applications read the whole file when "loading" a document, and store the whole file when "saving" -- which is wrong IMHO, but that's a different story :-)</p><p>Nonetheless, in the Hurd it is probably even more fundamental: When a translator is exporting data through file nodes, it is extremely common for clients to read the node contents into a string, or store a string in the node. A simple operation doing this in one call would be awfully useful. 
Not only because both client and server need considerably less handling for that, but also because knowing that the client just wants to write the whole file is very important information for the server.</p><p>Generally, we need much more semantics in file operations than POSIX offers. Think of inserting data in the middle of a file. With POSIX, the only way to do that is to overwrite the entire rest of the file. This is not only complicated and terribly inefficient: In case the underlying file is served by some more interesting translator, it can actually pose serious functional problems if parts of the file are overwritten for no real purpose.</p><p>For operations that don't fit any of the generic semantics (write entire file, insert data, ...), we probably need to introduce transactions, to allow manually grouping primitives into semantic units. (This is probably what OGI's <a href='http://tri-ceps.blogspot.com/2005/09/persistance-vs-insistance.html#c112653321324826971'>comment</a> to an older post was referring to -- which would mean that <em>at last</em> I've understood the idea behind this comment :-) ) For many translators, it's crucial to know whether an operation is completed and data should be processed, written to the store/network, whatever; or whether following calls will alter it further. With POSIX only, some translators can only be usefully implemented by employing quite sophisticated caching and heuristics, if at all.</p><p>So, the GNOMEs are right about the necessity of new filesystem semantics -- though I don't know if they'll get all the issues mentioned above right. Sadly, that's where sanity stops in their proposal(s). A special filesystem API for desktop use only? That's absurd. 
How did they get that silly notion that the file access requirements of "desktop" applications are fundamentally different from command line tools or daemons, so much as to warrant a special API for desktop use alone?</p><p>Oh well, I guess that's the general problem of GNOME (and KDE): Considering the underlying system(s) as given, they tend to pile layers on layers of workarounds, instead of much more simply and usefully fixing it right at the system core level. (Reminds me of MS Windows, which started as a desktop environment, and ended up being an OS... In just a few years more, we will probably hear people say: "GNU/Linux? Isn't that obsolete? I'm using GNOME!") This just shows that we really need a GNU kernel, so the developers of a GNU desktop environment won't need to sink tremendous amounts of time into working around limitations of systems they have to run on for lack of a native one, where they could get all the functionality they need... But well, that's a different rant.</p><p>So, now that we agree ;-) an extra API for better FS access is silly, what is the alternative? That's obvious: Just like we already have Hurd extensions to POSIX interfaces in many other places, we should try to extend the standard POSIX file operations with the stuff we need, without forsaking compatibility. I'm pretty confident we can do this. (I've already discussed some aspects in conjunction with <a href='http://tri-ceps.blogspot.com/2005/08/unite-and-conquer.html'>device drivers</a>.)</p><p>Clients not aware of the new semantics can continue using the old ones. Those that want to use the new features will check with the server whether it implements them, and fall back to the traditional stuff otherwise. Most of this can probably be handled transparently in libc (client side) and/or the FS server helper libs: If a particular server doesn't know about the atomic file read/write operations for example, it will just get a series of standard POSIX requests doing the job instead. 
No need to force a switch to an incompatible new API.</p>antrikhttp://www.blogger.com/profile/01281194707003363203noreply@blogger.com0tag:blogger.com,1999:blog-15496001.post-1127004677902097222005-09-17T18:14:00.000+02:002005-09-18T02:56:37.853+02:00Involvement Engineering<p>Yesterday I stumbled upon <a href='http://plasma.bddf.ca/cms/'>KDE Plasma</a>, which is an attempt to bring some innovation into KDE's next major release.</p><p>So they discovered that desktops, except for minor incremental fixes, have hardly improved over several decades. This is a good start.</p><p>Sadly, it mostly already stops here. It seems they made the discovery, but have no idea what to do about it. Reading their "Vision" document, it is much like reading Microsoft announcing a new Windows version: Everything will be so much better, and more buzzword-compliant, and new possibilities, and stuff... But going down to the details, you realize that hardly anything has really changed.</p><p>OK, that is a bit unfair. There is a bit more in Plasma beyond the realization that change is necessary. There is for example some rough notion (in the form of extenders) that the desktop geometry needs to become more dynamic -- though they constrain it to a specific domain, instead of making it a more fundamental concept. There is also a rough notion of alternative <a href='http://tri-ceps.blogspot.com/2005/09/persistance-vs-insistance.html'>sessions</a> for different use situations -- though they create a specific solution for desktop layout, instead of considering a more fundamental concept. Yes, there are some things going in the right direction, even if in extremely small steps.</p><p>And there is this applet concept. It isn't really new; similar concepts existed in specific environments for a long time. It isn't even terribly new as applied to the desktop. 
Beginnings existed in the form of dockapps and the like for quite a long time; and Apple, though only recently, has something -- I believe -- quite similar to what they are planning in Plasma.</p><p>Nevertheless, this made me consider the topic a bit more closely. So, what are the distinctive concepts that make up these applets?</p><p>Basically, we have comparatively small, independently developed hacks, adding functionality to the application/system, without touching the core. There are numerous examples of such an approach resulting in a thriving community of extension developers: TeX/LaTeX, Emacs, Perl, Firefox and Greasemonkey come to mind.</p><p>The most fundamental property is probably the fact that those hacks -- whether called extensions, widgets, applets, whatever -- do not touch the application/system core, but are developed using some extension language/API. There are a number of implications from this: People can start writing improvements without knowing the language of the core (usually more low-level than the extension language), or the core code -- just build on some simple APIs. People can do their improvements independently, without discussing them with the core developers, on their own release schedule etc. They can distribute them independently. Problems are less critical, because they do not touch the core, and because they affect only those people who explicitly chose to use the extension. For the same reason obscure features can be implemented without bloating the core. And so on. While some of the advantages may be actually organisational or even psychological rather than technical, all of them contribute to <a href='http://tri-ceps.blogspot.com/2005/08/design-by-bulldozer.html'>lowering entry barriers</a> for people to contribute improvements, for getting actively involved.</p><p>The good news is: The Hurd lends itself to such a concept. 
With a <a href='http://tri-ceps.blogspot.com/2005/09/welcome-to-hell.html'>hurdish application infrastructure</a> constructed of small components building upon each other, the whole system actually could be considered the applet idea taken to the extreme. (Which is one possible explanation why I like it...) All services being exported via filesystem interfaces, we also always have very accessible extension APIs to easily plug applets into. And with the concept being consistently implemented throughout the system, it's much more powerful than any solution limited to the topmost layer like in Plasma or any other specific approaches possible on traditional systems.</p><p>Now the only thing we really need to be careful about is to ensure Hurd components are really easy to create, distribute, and install...</p>antrikhttp://www.blogger.com/profile/01281194707003363203noreply@blogger.com0tag:blogger.com,1999:blog-15496001.post-1126507851797182822005-09-11T19:58:00.000+02:002005-09-12T09:30:55.630+02:00Persistance vs. Insistance<p>Session management is a thread of thought I only recently took up in my considerations. Nevertheless, it touches on many other aspects -- it looks like this has been the missing link in various contexts. But more on this later.</p><p>Session management basically means: I don't want to manually restore my work environment on each bootup; I want it to come up just as I left it -- usually at least.</p><p>There is little doubt working session management is something users really, really want to have. I say <em>working</em>, because retrofitted concepts like X/KDE/GNOME/whatever session management (which is the most explicit kind most users know) tend to be incompletely implemented and thus useless.</p><p>In a UNIX environment, most of the system doesn't know about session management. 
To change the startup behaviour of something, you usually need to explicitly change settings, either by editing config files; or in GUI applications, usually there are a number of startup-related settings in the options dialog.</p><p>Nevertheless, besides the not-working X session management, you'll find many places where session management aspects have sneaked in, if you look closer. A typical example would be the soundcard mixer init scripts, which save the current mixer settings on shutdown and restore them on next startup. Some applications, like Opera or <a href='http://www.vim.org/'>Vim</a>, have options to explicitly and/or automatically save/restore their sessions. The shell (and some other command line based applications) save the command line history. In <a href='http://hurd.gnu.org'>the Hurd</a>, we have <a href='http://www.debian.org/ports/hurd/hurd-doc-translator'>passive translators</a>, as a kind of session management at the filesystem tree level.</p><p>And then there is <a href='http://www.gnu.org/software/screen/'>screen</a>, which implements a kind of poor man's universal session management: When you detach a screen session from the terminal, the software just runs on in the background -- instead of doing real shutdown/startup, you just detach and reattach your session. (Screen sessions are usually used on servers, i.e. systems that run permanently with rare interruptions.) Those who know this feature usually love it -- by this little "trick" it implements something resembling session management that really and always works, because the applications needn't be aware of it at all...</p><p>Another, less obscure example of transparent and thus (more or less) reliably working and widely established session management-like behaviour is suspend to disk. It just puts the whole memory image to disk, and restores the exact same situation on resume. (The exceptions are most hardware drivers, which obviously need to be aware of the suspend. 
This was actually the main starting point for my considerations -- the suspend infrastructure of the <a href='http://tri-ceps.blogspot.com/2005/08/you-will-be-assimilated.html'>hardware driver framework</a> could and should be extended to the application level...)</p><p>A very similar approach -- which could be considered an extension of suspend, although it has a completely different origin -- is persistence like in <a href='http://www.eros-os.org/'>EROS</a>. The difference is that here the image is saved not only on suspend, but periodically (every 5 minutes), so the system will come up in the last state even on a power outage or so.</p><p>One side effect of system-wide persistence (again, with the exception of system core and hardware drivers) is that you need no sophisticated system boot and program startup mechanisms -- you just add objects to the system once, and they live on forever. (Unless you decide eternity is a bit too long, and get rid of them earlier...) Which is the very reason why EROS has it: Making the whole system persistent seemed easier than creating a method for secure explicit storage/retrieval of capabilities. But they claim it's desirable for usability purposes also...</p><p>However, there are downsides to this. Completely transparent session management works fairly well in the case of specific things like screen, because of the scope of a screen session usually being quite limited; and suspend, because of it typically being used only for fairly short breaks in work that is meant to be continued exactly where you left off. But a completely and always persistent system creates various problems.</p><p>For one, to update software, you basically need to create a new object in place of the old one. 
If you want to preserve state information in the process, you need to implement this explicitly: Either run the new version in parallel and pass the state from the old one to the new (session handoff), or dump the old state to a third party temporarily and read it when starting the new version (session saving). In both cases, you are effectively implementing explicit session management -- all of the nice transparency is gone.</p><p>This actually doesn't only happen on updates: If you want to move some state information, e.g. when replacing hardware or working on a different machine or whatever, you need the very same protocols -- only that the actual state transfer gets more complicated.</p><p>Another, related problem is flexibility in general. Transparent persistence is a sledgehammer approach: You always get the whole thing. But in many situations, you do not actually want to, or can't, restore the system in the exact same state. Maybe your system configuration changed meanwhile, so some stuff won't work or doesn't make sense anymore. Maybe you screwed up and explicitly want to easily get rid of parts of the old state. Maybe you want an easy way to activate only parts of your session at times. Maybe you want to carry along part of your environment to different machines. And so on. You might even have your system on a USB stick and need it to adapt to a different machine each time you boot it!</p><p>So what we really want is a flexible system of subsessions that can be restored or not upon demand and/or resource availability. 
One session for the core system, various sessions for individual hardware components, a number of sessions for background services, some sessions for higher-level system components traditionally handled by runlevels (networking, windowing environment), and lots of sessions for individual parts of your application environment -- a music player session, a news reading session, a communication session, various sessions for projects you are working on at times, etc. You get a whole tree of subsessions and sub-subsessions building upon each other. (Those subsessions are also related to various other mechanisms like resource management; security; and forming an application infrastructure from generic components, as touched on in my post on a <a href='http://tri-ceps.blogspot.com/2005/08/next-step.html'>hurdish X implementation</a> -- that's the "missing link" aspect mentioned at the start. I'll handle those in other posts.)</p><p>So between dysfunctional retrofitted session management, and sledgehammer total persistence, we really want a session management approach that is not quite transparent, but fully integrated and consistently implemented throughout the system.</p><p>Implications of these requirements, as well as ideas on how it could be implemented, I'll leave out for now, as this is already getting quite lengthy...</p>antrikhttp://www.blogger.com/profile/01281194707003363203noreply@blogger.com2tag:blogger.com,1999:blog-15496001.post-1125878358135336992005-09-04T23:49:00.000+02:002005-09-05T02:11:03.556+02:00Welcome to HELL<p>An idea I've been contemplating for a while now: Hurd Emulation Layer for Linux.</p><p>The idea is to create an emulation environment, which would allow running hurdish applications on top of a familiar GNU/Linux system. 
Now this is a bit unsubstantial of course, not having explained so far what I mean by hurdish applications, except for the mention of how I imagine a <a href='http://tri-ceps.blogspot.com/2005/08/next-step.html'>hurdish X implementation</a>.</p><p>Well, doubtless this will be a motif in many coming entries. So just for starters: The fundamental concept is to replace bloated monolithic all-in-one applications by many small translators providing specific services, and some mechanisms to combine them into systems providing the desired overall functionality. Ideas on how this would work in particular, as well as why I believe that on the Hurd -- unlike on traditional UNIX systems -- this actually *can* work, I will also cover in other posts.</p><p>Well, having said it is not possible on traditional UNIX systems *natively*, I do think it should be possible to create a limited, somewhat Hurd-like environment on top of such systems -- far from perfect, but just enough to run hurdish applications with no or only minor changes. Some mechanism to emulate translators, something to map Hurd free-form RPCs onto UNIX IPC mechanisms (terribly inefficient but probably possible), something to emulate some of the Hurd extensions of the traditional UNIX interfaces. Of course it would break as soon as programs try to dig too low, using specific functionality of Hurd core components etc. But for most higher-level applications, it can probably work.</p><p>As for actual implementation, I only have a number of rough ideas. We might be able to emulate translator functionality by playing with FUSE. To run hurdish programs, we could link them to a special modified variant of the Hurd libc, emulating the functionality on top of Linux instead of contacting Hurd servers; or we could employ some proper sandboxing, intercepting system calls etc. For some commonly used Hurd services, we might even try to run hacked Hurd servers, or replacements providing the functionality in the foreign environment. 
Non-hurdish processes could be provided with some hooks via LD_PRELOAD to better cope with the emulated Hurd features when interfacing to hurdish processes.</p><p>One fundamental question is whether to wrap every single hurdish process in its own HELL, mapping the Hurd functionality to UNIX mechanisms such that the individual processes can be combined in a Hurdish manner; or to provide one single big HELL environment (kind of a userspace Hurd implementation), interfacing to the non-hurdish world only at the outer borders. The first one may be more desirable, as it integrates better into the system. However, it probably means more overhead; could cause some troubles due to non-perfect mapping of Hurd features; might be harder to use; and most likely is harder to implement.</p><p>Of course, there are also strategic considerations involved -- do we really want to have HELL? This is an uncertain question, similar to whether porting free software to proprietary systems like Windows or Mac OS should be encouraged. (Though not political in the HELL case.) On one hand, it encourages authors to write hurdish applications, as with HELL the audience isn't limited to actual Hurd users; people who for various reasons can't or don't want to make the switch can still use these programs. (I'm considering making <a href='http://netrik.sourceforge.net'>netrik</a> a native Hurd application. However, I don't want to exclude GNU/Linux users altogether. 
That's how I originally came up with HELL.)</p><p>Also, discovering some of the advantages of the Hurd approach while using HELL, people might be encouraged to switch to the Real Thing (TM), where these advantages come into their own, the hurdish concepts being used consistently throughout the whole system, not only in a limited environment.</p><p>OTOH, people being able to use hurdish applications and reap some of the advantages on a traditional system might actually be less inclined to switch; they might get stuck with a suboptimal system forever.</p><p>I tend to believe HELL would act more as an enabler for the Hurd than a stopper -- YMMV.</p>antrikhttp://www.blogger.com/profile/01281194707003363203noreply@blogger.com0tag:blogger.com,1999:blog-15496001.post-1125187054275060222005-08-28T01:55:00.000+02:002005-08-28T02:31:14.176+02:00Return of the Bad Guys: A tale about a little interface<p>Once upon a time, The Fathers were dissatisfied with the then current situation of graphics support on Linux. In those ancient times, there were basically two options: svgalib or X. It was an <em>or</em>, not an <em>and</em>, because these two didn't mix terribly well.</p><p>So The Fathers set out to fix this situation, and created a sophisticated scheme for a General Graphics Interface. It was designed to encompass the existing options in a nice overall framework, allowing the mixed use of both svgalib and X. It should allow running svgalib applications in X, and X on top of svgalib; even nesting them at will. In short, it should allow seamlessly running any graphical application in any environment, all reliably and securely in parallel.</p><p>Fulfilling those goals basically required two major components. The environment independence would be implemented by a library abstracting the different backend targets in a common interface, plus a set of associated frontends to allow using all the existing applications on top of it. 
(svgalib wrapper for running svgalib applications, and XGGI for running X.) Of course, applications could also use the native GGI interface directly, making use of the backend abstraction it provides.</p><p>As an abstraction for only the X and svgalib targets, however, libggi couldn't fulfill all of the desired goals: Having no provisions for secure sharing in svgalib, and sharing only between its clients but not with the outside world in X, reliably using an X server and some svgalib programs on the console in parallel is not possible. Doing this required a more sophisticated, native target implemented mostly in libggi, but requiring some kernel support for secure sharing. That support would be accessed by a Kernel Graphics Interface -- the second major component -- which would implement the actual hardware access and sharing in a kernel driver, just like other drivers do. It would cover just the stuff really necessary for secure sharing (mode setting, framebuffer setup, acceleration pipe access), but not any higher-level logic that can do without kernel support. (It would also require some rework of the console system, to work with KGI.)</p><p>So what happened when The Fathers introduced GGI/KGI? Well, libggi found its niche as a nice multi-target graphics library when used directly through the GGI API; it is still active and becoming more and more powerful to this day. Its little sister KGI, however, had less luck: It was immediately faced with strong backlash, from several sides. The X window folks considered X to be The One And Only (TM) graphics interface that should be used by everything. 
The kernel guys opposed the idea of integrating graphics drivers into the kernel, maybe because "putting graphics in the kernel" is considered a Windows thing or something, and people suggesting it were considered the Bad Guys; maybe because people didn't realize the fundamental difference between low level graphics drivers -- which like other drivers belong in the kernel -- and higher level graphics handling that obviously does not.</p><p>Maybe the time wasn't right for an idea that seemed so radical back then, when graphics support was still considered something very special that is best handled by an external entity; when people still believed X could be integrated better into the system over time, removing the entry barriers and making other approaches unnecessary.</p><p>While The Fathers struggled on afterwards, the steam was mostly out; KGI soon lost momentum.</p><p>Today, things are quite different. In the meantime, the simple fact that some platforms just do not have such a thing as a text mode forced the addition of framebuffer support in the kernel; but once there, it was warmly received -- it turned out many users *want* that kernel support, avoiding the considerable problems associated with the pure userspace implementation done by X. In fact, people even created a complete graphics system called DirectFB, which hacked accelerated graphics support on top of the kernel framebuffer interface. (But lacking kernel support for the acceleration features, it inherited many of the problems of svgalib.)</p><p>Also, it turned out even X required some kernel support for efficient 3D acceleration, introducing the DRI/DRM interface.</p><p>In short, today kernel graphics is a widely accepted fact. 
Today, people no longer concentrate on preventing graphics support from entering the kernel, but on how to implement a *clean* interface; discussing a mode setting API and everything.</p><p>Maybe it's time to realize that the little baby child KGI, while quite lifeless from the bad treatment it received, is still around; that it's not that ugly after all, but on the contrary offers quite exactly what people are looking for now: A clean, generic, well-thought-out interface for supporting mode setting, framebuffers, and acceleration feature access in the kernel; and that in fact it has done so all the time, though not recognized for that.</p><p>Maybe it's time to give this little child a hand, to let it grow, shape it a bit maybe -- so it can become really great.</p>antrikhttp://www.blogger.com/profile/01281194707003363203noreply@blogger.com0tag:blogger.com,1999:blog-15496001.post-1124932114527452232005-08-25T02:30:00.000+02:002005-08-25T03:26:55.500+02:00The next step<p>Most people consider the Hurd only a project to replace monolithic kernels. IMHO, it is more.</p><p>On the irc.freenode.net##hurd channel, we just had another discussion on the X window system. (You can read it in the <a href='http://web.walfield.org/~deride/%23%23hurd-20050824'>channel log</a>.) Which seems a good occasion to summarize some of my thoughts here.</p><p>While there are a number of other things that are flawed about X (which I may touch in other posts sooner or later), there is one really fundamental problem: The X server is basically a gigantic monolithic beast, suffering from much the same problems as monolithic kernels. (Flexibility, extensibility, robustness, usability, security, etc.) And it needs to be fixed in much the same manner.</p><p>The nice thing is that the underlying Hurd concepts (RPC, translators, etc.) 
not only give the foundation for reimplementing the functionality of monolithic kernels with a multi-server system in userspace, but also for refactoring monolithic higher-level infrastructure components like X -- just like the Hurd is splitting monolithic kernels into a set of interacting servers handling individual parts of the functionality, a hurdish windowing system will split the functionality of X into individual servers. (And just like the Hurd uses libc to implement POSIX interfaces on top of the multi-server system, allowing for a smooth and flexible transition to more powerful concepts, we will need a replacement X library implementing X interfaces on top of the multi-server windowing system.)</p>antrikhttp://www.blogger.com/profile/01281194707003363203noreply@blogger.com1tag:blogger.com,1999:blog-15496001.post-1124843456049043022005-08-24T00:21:00.000+02:002005-08-24T02:35:22.836+02:00Unite and Conquer<p><a href='http://tri-ceps.blogspot.com/2005/08/you-will-be-assimilated.html'>Returning</a> to my <a href='http://belug.de/~antrik/posix_drivers.txt'>POSIX level driver</a> proposal: Many people expressed concerns that the standard filesystem semantics I want to use for (most) driver communication are too slow and not really appropriate.</p><p>The proposal explicitly mentions the possibility of using shortcuts wherever we experience serious performance problems with FS semantics. Now recently I had some initial discussion with Peter de Schrijver (p2-mate) at freenode.net#hug, on what specifically are the problems with the POSIX interfaces. It turns out that mostly the drivers have some generic, quite similar requirements. This means that instead of creating specific shortcut protocols only for some extremely demanding drivers, we should probably rather focus on a few generally useful extensions. I like this :-)</p><p>For one, drivers are often serving quite a large number of very small requests. 
Pure POSIX semantics would introduce quite a considerable overhead here, because each single request needs to establish a session (open()/close()), unless it already has a permanent one; do addressing and other setup (seek()/ioctl()); and finally do the actual data transfer (read()/write()). In POSIX semantics, we need an extra RPC for each of those steps, plus some bookkeeping overhead. (If we want to avoid ioctl()s for the setup -- because they aren't very transparent, killing the major advantage of filesystem semantics -- there is even more overhead, as we need to introduce an additional file descriptor for setting request options.) A more appropriate protocol would wrap all of this in a single RPC.</p><p>Well, there is an important observation to make here: The optimization is useful not because we are dealing with drivers, but because the drivers have a specific requirement (efficient handling of many small requests) that could also emerge in any other program -- I'm pretty sure many higher-level translators would profit from such an optimization just as much. Which confirms my view that drivers aren't fundamentally different from other programs, and we really want generic extensions to the POSIX/Hurd interfaces, rather than a special interface for drivers.</p><p>I guess we could implement this extension mostly transparently, with servers implementing it optionally as an optimization. Maybe even handle it in the FS server libraries, so translators not aware of the shortcut will just get the single RPC presented as a number of independent callbacks. Not sure about the exact implications, though.</p><p>Another property of drivers is that they typically work with data structured in blocks, which doesn't really fit the POSIX assumption of all data being represented as sequential streams.
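(Coming back to the single-RPC idea for a moment: POSIX itself already contains a tiny instance of exactly this kind of batching -- pread() folds the lseek() and the read() into one call. A minimal sketch, using Python's os module to get at the raw POSIX calls, with a scratch file standing in for a device node:)

```python
import os
import tempfile

# A scratch file stands in for a device node.
fd, path = tempfile.mkstemp()
os.write(fd, b"0123456789")

# Classic POSIX: two calls -- and, in Hurd terms, two RPCs -- per request.
os.lseek(fd, 4, os.SEEK_SET)
chunk_two_calls = os.read(fd, 3)

# pread() folds addressing and transfer into a single call.
chunk_one_call = os.pread(fd, 3, 4)

assert chunk_two_calls == chunk_one_call == b"456"

os.close(fd)
os.remove(path)
```

A single-RPC driver protocol as proposed above would generalize this shape further, carrying session setup, addressing, and transfer in one message.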
Serializing the stuff (using XML or whatever) would be tremendously expensive, considering that drivers usually aren't processing the data at all, but only working with some status information, and passing the data on. What we really want is a memory container for the actual data, and a second independent memory container with per-block status information. (Including block boundaries in the case of variable-sized blocks.)</p><p>Again, we have a requirement that isn't really specific to drivers at all: Quite a lot of high-level programs actually work with similar non-serial data, and would greatly profit from a generic extension allowing several memory containers to be passed in a single read()/write() RPC.</p><p>Another possibility I'm considering, instead of adding a number of independent extensions handling the different aspects that need optimization, could be just creating some generic method for merging several POSIX calls into a single RPC. This would be extremely flexible and powerful; however, I'm not sure it could be implemented without being too awkward. Also, the fact that it would need to handle the requests in a very generic fashion might skyrocket complexity and nullify some of the performance gains we are striving for.</p>antrikhttp://www.blogger.com/profile/01281194707003363203noreply@blogger.com0tag:blogger.com,1999:blog-15496001.post-1124686282743709912005-08-22T04:40:00.000+02:002005-08-22T07:04:00.463+02:00And Now For Something Completely Different<p>The last two days I spent a considerable amount of time thinking about acoustic processors. Not sure this really fits here, but maybe some will consider it interesting nevertheless.</p><p>An acoustic processor is normally a box you integrate into your stereo to improve the sound -- specifically, by adjusting the audio signal so as to balance out anomalies created by your listening room and/or your equipment.</p><p>Now acoustic processors aren't terribly popular, so far.
There are probably several reasons for that. For one, high end audio purists are generally sceptical about any modifications to the audio signal. Also, such a box isn't easy to integrate: For several reasons (price, calibration, distortion), such a processor can feasibly only work on a digital signal. However, while CDs have been around for quite a while now, only recently have most other audio sources started going digital. And even with digital sources like CD players, the signal is usually already converted in the source and passed on as analog. No home for the poor little acoustic processor.</p><p>Last but not least, the price tag: Such a box is all but simple, and consequently all but cheap. Those who need it most -- with cheap equipment and inappropriate listening rooms -- can't afford it; those who could, have fairly little need.</p><p>Being one of those who'd need it but can't afford it (my equipment isn't bad, but far from perfect in the low bass area; and my current listening room is terrible), I've been playing for quite a while with the idea of going a route even I could afford: Use software to do the acoustic processing <em>offline</em>. Grab all my CDs, torture them in the offline acoustic processor, and burn the result again -- producing CDs that will sound terrible in any other setup, but should be perfect with my equipment and listening room.</p><p>So, how does such an acoustic processor work? Well, the simplest variant is just an equalizer with a lot of bands (128 or so), and an auto-calibration system. (A measurement microphone in conjunction with a program to run a test and adjust the parameters.)</p><p>However, these simplistic variants do not work terribly well. The problem is that just adjusting absolute volumes doesn't help too much. Temporal effects play a big role: For one, if the speakers or the listening room generate resonances, the effective volume may depend on the length of the sound.
Even more importantly, psychoacoustic effects make sounds prolonged due to resonances/reverberation seem relatively louder.</p><p>(Furthermore, it's desirable to correct phase discrepancies between channels and frequency bands, for improved positioning and naturalness; to correct dynamics for improved vitality and resolution; and so forth... But that's definitely beyond my amateur means.)</p><p>So what we want to do is adjust the volumes of individual frequency bands depending on the signal levels. When a sound sets in, the relevant frequency band's volume is adjusted by the stored volume factor for short sounds in this band; when it persists for a longer time, we successively move to the factors for longer sounds. Well, at least that's my idea of how it should work.</p><p>My major problem is my very limited knowledge of acoustics, psychoacoustics and digital signal processing. As a layman, I believe we first need a frequency analyzer, continuously tracking the signal level per frequency band in the input signal over time. This frequency analyzer needs to have properties similar to our hearing, I guess. Now using some function involving the different adjustment factors and the signal level history, we can determine the necessary current volume adjustment for each band. These levels are perpetually fed into an equalizer, processing the input audio signal.</p><p>Well, so much for the theory; now if someone could tell me exactly how to implement this...</p><p>Another complication is that, having no measurement equipment, I'm trying to determine all the necessary volume adjustment factors by hand/ear, using various test sounds. So far, my experiments have been rather discouraging; but I still have hope... (For the first time in my life, I'm considering a wireless keyboard.)</p><p>And well, while remastering all my CDs in this manner, I'd like to use the occasion also to fix some evident recording errors...
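(For what it's worth, the duration-dependent per-band adjustment described above can be sketched in a few lines: start each band at its short-sound correction factor when a sound sets in, and blend exponentially toward the long-sound factor the longer the band stays active. All the numbers here -- the factors and the time constant -- are made up purely for illustration:)

```python
import math

def band_gain(active_time, short_factor, long_factor, tau=0.3):
    """Volume factor for one frequency band.

    Starts at short_factor when a sound sets in and moves
    exponentially toward long_factor as the sound persists;
    tau (in seconds) controls how fast. All values illustrative.
    """
    blend = 1.0 - math.exp(-active_time / tau)
    return short_factor + (long_factor - short_factor) * blend

# A band whose room resonance makes sustained sounds seem too loud:
# leave short sounds boosted, cut long steady tones.
onset = band_gain(0.0, 1.2, 0.7)    # fresh onset: full short-sound factor
steady = band_gain(5.0, 1.2, 0.7)   # long tone: close to long-sound factor
```

A real processor would evaluate this per band against the frequency analyzer's level history and feed the results into the equalizer, as described above.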
Most notably, undo this abominable moronic dynamic compression most CDs are fucked up with. How do we do that, again?...</p>antrikhttp://www.blogger.com/profile/01281194707003363203noreply@blogger.com0tag:blogger.com,1999:blog-15496001.post-1124496145244815372005-08-20T02:01:00.000+02:002005-08-31T02:29:55.693+02:00Design by Bulldozer<p>In the previous post, I <a href='http://tri-ceps.blogspot.com/2005/08/you-will-be-assimilated.html'>mentioned</a> there are some very fundamental advantages in my <a href='http://belug.de/~antrik/posix_drivers.txt'>POSIX level driver</a> proposal, related to usability.</p> <p>Now I want to pick up on one of those, which will be a recurring theme in this blog, being an extremely important issue: Accessibility.</p> <p>I do not mean accessibility in the usual interface-related meaning of posing no barriers to people with disabilities. I mean accessibility in the sense of not posing barriers to just about everyone.</p> <p>Most developers, and even many UI designers, seem completely unaware of how extremely important accessibility is. Somehow they assume that if something makes sense to them, it is good. The thought doesn't even cross their minds that it might mean considerable work for others to learn the concept.</p> <p>Look at which technologies are successful. The WWW is popular because it has low entry barriers. People are more likely to participate in wikis than contribute to static pages because wikis have a lower entry barrier. And so on.</p> <p>Look at Firefox. Why is it so popular? Because it focuses on those features that are easily accessible. Popup blocking is a good feature, because it is obvious. So are tabs. Or the search bar.</p> <p>Compare this to Opera. If you configure it to ask about setting cookies, for example, you get a dialog that presents you with more than half a dozen options for how to handle each cookie, some of which I don't even understand. (At least in the versions I tried.) And Firefox?
It presents you with a very simple dialog, having only a few obvious options. Maybe it's slightly less powerful; but it still covers what you want in about 99% of all situations, and is at least three times simpler. Meaning about an order of magnitude more useful.</p> <p>There are many other examples of features in Opera that are quite interesting, but so hard to use that they have no practical value. Features so complicated or obscure that hardly anybody will bother to learn them are just useless. Sure, there are always a few nuts taking considerable pains to learn even the most obscure feature of some program. However, if it takes more trouble to discover, learn, configure and get used to some feature than it saves in the long run, it is just an end in itself. You can boast how powerful your program is and/or how well you master it. But that's about all the value you will ever get out of it.</p> <p>Accessibility is important not only for GUIs, but really at all levels of the system. Let's take one example from the driver proposal: Among many other possibilities, it allows controlling who is allowed to run which drivers, simply by changing file permissions on the underlying device nodes. Now, of course, you could implement some kind of permission system in any other driver framework... But with some obscure special mechanism, backed by some kind of config files in the background or something, not only is it considerably less flexible, it is actually much, much harder to set up in the first place. Unix file permissions, on the other hand, are obvious and a well-known concept to every Unix admin -- there is nothing you need to learn or remember; just by looking at the nodes you can guess what to do.</p> <p>And it goes even further down. All of this is true for the system internals, programming interfaces, everything. Making functionality accessible, tearing down entry barriers, is among the most important design principles in about any kind of software development.
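(To make the device-node example concrete: under the proposal, "who may run this driver" reduces to "who may open this node", so the admin tooling is plain chmod/chown. A toy illustration, with an ordinary scratch file standing in for a device node and Python's os module standing in for the shell commands:)

```python
import os
import stat
import tempfile

# An ordinary file stands in for a device node such as /dev/foo.
fd, node = tempfile.mkstemp()
os.close(fd)

# "Only owner and group may drive this device" -- nothing more than
# standard Unix permissions; no special config files anywhere.
os.chmod(node, 0o660)

mode = stat.S_IMODE(os.stat(node).st_mode)
others_may_open = bool(mode & (stat.S_IROTH | stat.S_IWOTH))

assert mode == 0o660
assert not others_may_open

os.remove(node)
```

The point is not the code itself, but that the entire access policy is visible in an `ls -l` listing.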
(I'll show how this applies in various contexts in other posts on more specific topics.)</p>antrikhttp://www.blogger.com/profile/01281194707003363203noreply@blogger.com2tag:blogger.com,1999:blog-15496001.post-1124429316021927112005-08-19T06:57:00.000+02:002005-08-19T17:49:55.780+02:00You will be assimilated<p>I thought about a number of things today, but had no time to write them up, because I had to write a long answer <a href='http://lists.gnu.org/archive/html/l4-hurd/2005-08/msg00034.html'>mail</a> regarding my <a href='http://belug.de/~antrik/posix_drivers.txt'>POSIX level driver proposal</a> for Hurd on L4.</p> <p>The foundation of this proposal is the fact that on a multi-server microkernel system, we have quite a lot of freedom in how to implement hardware drivers -- my take on it being to treat them just like ordinary applications, with only a minimal set of special mechanisms for driver-specific stuff, on the premise that drivers actually aren't all that special in nature, and shouldn't be in the implementation. This offers a large number of advantages over other approaches: to users, admins, and system distributors, and even to application and driver developers. (Of course, the advantages mostly relate to usability :-) )</p> <p>The most important of those advantages are of a very generic nature -- stuff that I will cover in other posts sooner or later. (Though maybe in different contexts.)</p> <p>The linked document is a longish, very technical description of my proposal, explaining what it is about, trying to outline some of the advantages (though probably not very well), and describing my ideas on a possible implementation on Hurd/L4 in quite a lot of detail.
The last section in particular requires some knowledge of <a href='http://hurd.gnu.org'>the Hurd</a>, <a href='http://l4hq.org'>L4</a>, and the <a href='http://www.gnu.org/software/hurd/hurd-l4'>Hurd port to L4</a>.</p> <p>Now I will probably be dreaming about competing driver framework proposals for Hurd/L4. Not sure yet whether these will be pleasant dreams or nightmares.</p>antrikhttp://www.blogger.com/profile/01281194707003363203noreply@blogger.com0