Declaration of Unoriginality

So, in the previous post I raved like a lunatic about the concept of declarative UI languages -- and QML in particular. It turns out that apparently I got excited about old wine in new skins. Which isn't exactly unusual either :-)

More specifically, I recently chatted with a certain developer -- and he pointed out that Edje (one of the various pieces in the EFL stack) has (supposedly) provided the same stuff for years...

This is a bold claim of course. Scepticism rears its head... However, judging from a quick glance at least, there are indeed striking similarities between QML and Edje Data Collections. Now I should dig a bit deeper, to find out how far the similarities go. Only I'm too lazy to do that, until I get to actually use either of them :-)

Someone also threw in XAML, which is used (among other things) for declaratively describing user interfaces in Microsoft's WPF. While I originally understood WPF to be one of the crazy frameworks for doing desktop applications in HTML, it turns out that with XAML as the language for UI descriptions, it is related to DHTML (i.e. HTML/CSS+JavaScript) only in spirit: the actual implementation is designed from scratch, and thus probably saner... Or let's say: it has the potential for being saner -- but being created by Microsoft, it's as likely as not they actually screwed it up anyways :-)

Obviously, XAML being XML-based, it doesn't look very similar to either Edje Data Collections or QML (and is barely human-readable in fact) -- but from a cursory glance, the fundamental concepts behind them are quite similar.

What's more, XAML also forms the basis for Workflow Foundation, which some described as monads in disguise (no, I do not remember where I read that) -- i.e. related to functional programming. I don't know how these pieces fit together exactly (nor am I much inclined to seriously study such proprietary abominations... I mean technologies); but by the sound of it, this might allow for the kind of declarative UI descriptions with functional-style behaviour specification, that I was musing about -- especially when combined with F# for the actual application logic.

It's rather chilling to see that apparently Microsoft is kinda taking the lead here... So let's change topic quickly -- I'm freezing by now!

To avoid serious confusion, I feel obliged to point out that storing UI definitions in a data file (rather than building the UI elements one by one with function calls in the main program) is not a new concept by itself. Point-and-click GUI builders have done this back in the nineties, if not earlier. However, elevating the UI descriptions to actual source code -- which can be viewed and modified by the programmer directly, rather than only through some point-and-click tool -- totally changes the game.

For one, the UI definition becomes a first-class part of the program. It can be handled with a text editor like the rest of the source code; it can be properly versioned. The connections between the UI definition and the main program, and the workings of the UI in general, become much more transparent.

Moreover, the UI definition itself becomes more powerful. There is only so much behaviour you can reasonably describe with a point-and-click tool; any non-trivial interaction requires calling back into the main program. When on the other hand the UI definition is handled as a true source code file, it becomes natural to implement complex interactions directly there as well; so the whole UI definition can be contained in the same source file, and the main program really only has to handle actual program logic. That's where these new declarative UI frameworks excel.

By the way: I learned in the meantime that functional programming is generally considered a subclass of declarative programming -- so my intuition about this was spot on :-)

Declaring World Domination

So, you came here looking for a recipe for achieving world domination? We don't have one! You fell for our PR stunt!

But then, what you will find here is almost as good... ;-)

Let's start at the beginning. I was at LinuxTag 2009. Hurray.

So, what was I doing there? Well, that's rather obvious: meeting nice girls. Why else would anyone go to a major free software event, with 95%-or-so male geek population?... ;-)

Quite surprisingly -- in view of the above numbers -- I still managed to attend a few interesting talks. One of them was the keynote on QML. They did no less than declare a new paradigm: declarative UI programming!

Admittedly, it's not really all that new. In fact, it has been around for a while -- this is essentially what web pages are. The reason people find it easy to get going with HTML is, as everyone knows, the lovely syntax of HTML...

OK, just kidding. The really nice thing about HTML and CSS is that they are declarative. I must admit that I can't quite explain why declarative languages are so intuitive and nice -- but they are. And I'm not just saying that because I'm a raving lunatic either. Promise! ;-)

Indeed people tend to like this declarative stuff a lot. So much in fact that there have been some attempts to bring HTML-based applications to the desktop. (!!!) I'm not kidding now. Maybe you heard about it. I guess some people are just crazy -- and not *all* of them in a positive way ;-)

If declarative UI programming is so attractive, that people are willing to jump through these kinds of hoops and even put up with Web standards, the logical conclusion must be: to create a proper language for declarative UI design -- but unlike previous attempts, actually designing it to be sane, instead of trying to build something on top of the HTML legacy...

I contemplated something like that for a while, and now the ex-trolls have invented just that: a (hopefully) sane declarative language for creating proper user interfaces.

It's not all bliss, though: while simple interactions can be described in a purely declarative manner using builtin functionality and standard modules, more complex stuff is implemented using JavaScript. Ouch.

Aside from JavaScript specifically being a glorious achievement in backwards evolution -- it almost, though not quite, reaches the standard of sophisticated ugliness set forth by such historic highlights as COBOL or Ada -- I have always felt that imperative scripting languages generally do not fit in well with a declarative markup language.

Functional programming seems a much more logical complement to declarative languages. Both describe how the result relates to the input state, without needing to specify in what order individual calculation steps are to be performed. The purely declarative part describes states, while the purely functional part describes relations between states. It seems to me that when a declarative language evolves toward more sophisticated state transformations, the description of these relations will naturally look more and more like a full-blown functional language.

This paradigm is actually not limited to GUI programming: I have been feeling for a while now that the reason Hurd translator programming is rather tricky is related to the imperative languages used. Translators tend to describe functional relations between the presented file system and some underlying state -- I'm pretty sure these could be expressed much more naturally with a declarative/functional approach.

But back to QML and the glorious keynote: at the end, the speaker's great conclusion was that he considers this a paradigm shift, similar to the shift towards object-oriented programming that happened in the past... This conclusion shocked me a bit. Why so negative?!

I never took to this "object-oriented programming" silliness: it always struck me as the kind of questionable abstraction, which manages to do the amazing trick of obscuring the internal workings, and limiting possibilities, without actually hiding any complexity in exchange...

Declarative programming on the other hand -- as I have *discreetly* hinted at during the course of this article ;-) -- is something I actually do consider a great idea.

So indeed, a paradigm shift it is -- but not at all like the one towards OOP!

One Shell to Rule Them All

Just when I lauded GNOME for (slowly) moving in the right direction, the GNOMEs failed me: I heard that the Nautilus CD Burner was dropped in favor of a "traditional" CD burning application. Oh ungrateful world!

This will teach me not to trust Swiss bankers.

Wait, wrong link... Let's try again: this will teach me not to trust mythical creatures characterized by their extremely small size and subterranean lifestyle. That's better.

I briefly blabbered about this in my article on DeepaMehta: traditional applications just do not make sense. No really, they don't. No sense at all.

I do not believe there was ever actually any technical or otherwise practical reason for having applications. Rather, it's just nuclear fallout from the proprietary software world: when one has a compulsive need for selling "products" (I'm convinced it is some kind of mania -- all that talk about business models is just an alibi :-) ), then one needs to offer something tangible; something that the lemmings using it can associate with the neat (for some value of "neat") package they got from the maniac... err, I mean vendor -- and probably shelled out some money for. (Pun not intended, honestly :-) )

This is an exquisite demonstration of how formidably the proprietary model fails to produce real value; how the vendors' interests work in perfect opposition against the users': what we really want are *not* clearly distinguished applications. Quite the reverse: we want additional functionality to integrate as seamlessly as possible; to become an organic part of the system -- to become nonexistent as an entity of its own. And in a free software world, once the proprietary "product" fallout clears, we can indeed attain this goal.

So, what's this about nuclear fallout; where does this applied hate come from, err I mean that hate against applications? Just what makes them such a first-order nuclear meltdown? Quite simple: it's just too many shells.

(Don't get me wrong -- I actually like seafood ;-) )

A shell, in this context, is the part of an application that reads interactive user commands, and invokes the corresponding functionality -- the spell casting interpreter so to say. But why does every application need its own shell? Winning hint: It doesn't. There is really no good reason. Unless you like pain -- plenty of that in here. A genuine pain factory indeed.

For starters, having many shells naturally breeds inconsistency. (It's indeed a law of nature. Goes by the name of Entropy.)

Even if you manage -- by threats, pleas and bribes -- to keep the actual interfaces consistent, the user experience inevitably still will be inconsistent: simply because of having to open the same file in various applications to do certain things on it. There is no escaping the pain.

Multiple shells also inevitably result in redundancy; and thus bloat, confusion, and more pain in general: it is never quite clear which functionality should best be accessible from which shell. There is always a tendency to add more and more stuff to each one, to avoid the situation where you have to use another shell (application) just for this one feature...

This also goes for the main system shell, i.e. the file manager: which functionality should be available there? Surely it's useful to have a preview of images for example; but once you have that, how about slide shows? Or functionality to rotate images? And once you have rotation, why not other editing features? Where to stop?

The obvious answer is: don't stop at all. (Well, it is obvious, isn't it?... :-) ) Just put all the functionality in the main shell, thus avoiding the need for any other ones.

This way, there are no applications in the traditional sense anymore. All additional software just plugs into the main navigation facility. (Normally the file manager, though theoretically other object systems are possible as well... Except that using anything but the file system as the primary facility for managing objects, is probably an idea almost as bad as applications :-) )

If you install an SVG editor for example, you just get the SVG editing ability available from the main shell. Simple and consistent -- no more pain. Life is good.

Activate me!

In the past I have been complaining about GNOME's lack of innovation; and now I stumbled over a project called GNOME Shell...

So, this is the moment: this is when I have to revise my world view; when I have to apologize and praise the GNOME folks for their innovative ideas...

Nah, just kidding :-) But I have to admit that I was surprised -- and this is the incredible part -- in a positive way for a change.

Most of it is still rather vague (i.e. remarkably like my own ideas...); and the ideas presented there are not exactly revolutionary -- but one thing is clear: For the first(?) time GNOME folks indeed seem to be thinking outside the Windows (TM)... err... I mean outside the box ;-) ; for the first time they really try to come up with something new, rather than just doing cosmetics to well-known (stupid) approaches...

Do you hear this noise?... It's me applauding.

One thing that caught my attention in particular are some ideas regarding Activities: remarkably similar in some regards to my own ideas regarding session management...

This confirms an observation I'm recently making again and again: Slowly, very slowly, most things in the free software world tend to be moving in the right direction. Maybe in just another 20 years or so we will have a sane desktop environment! ;-)

A Plea for Reinventing The Wheel

Let me talk about a matter I have been pondering more than once. (How unusual, eh?... ;-) )

The latest incident, which prompted me to write this article, was at a (somewhat bizarre) presentation of Protonet. (Which is essentially a WLAN meshing appliance.) There was an argument about whether the Protonet guys should have used existing Freifunk stuff, instead of creating their own infrastructure. While most of the geeks present were arguing that it was stupid and pointless and evil overall to reinvent the wheel, I was pleading Protonet's case... (I'm not associated with either Protonet or Freifunk, BTW.)

So, why did I do that? Just for the sake of trolling, of course... Err wait, did I really say that aloud? That's obviously not what I mean! :-)

The truth is that sometimes reinventing the wheel -- or rather, inventing a new variation on the wheel theme from scratch -- is indeed a good thing. Reinventing the wheel is not always just ignorance or Not Invented Here syndrome (no, really!) -- there are various totally valid reasons for doing so.

There are -- surprisingly -- technical ones: while it certainly seems a terrible waste to create something new, when more or less the same functionality has already been implemented elsewhere, this is a very superficial view. Often it's not really a waste, because implementing the core functionality from scratch can actually be less effort than working with an existing framework!

An existing framework, that has matured over years, tends to have all kinds of features and quirks, to handle all possible aspects of the problem; to cover all possible use cases. (And usually, some impossible ones as well -- after all, we want to be really complete, don't we?... ;-) )

Now of course you think that this is a good thing, and precisely the reason for using an existing framework. (See, I can read your mind! ;-) )

However, when I want to create something new, the completeness is not helpful. When I want to create something new, I want to focus on my new hotness, not trying to cover all freaky obscure use cases. ("Freaky obscure use cases" obviously being anything I don't need myself ;-) )

I actually want to ignore most aspects: "ignorance is bliss". I want it to be incomplete on purpose. I don't want to waste my energy on learning all the mundane aspects of the existing framework, and trying to figure out how to fit my new ideas into it, without breaking existing functionality -- the functionality that someone, somewhere, has learned to adore, and will fight for it tooth and claw... Being the egotist that I am, I want to spend my energy on my new ideas instead.

Your next objection surely is that this is shortsighted, and will come back and bite me in the arse: because -- if I want my new stuff to be generally useful -- I will ultimately have to cover all of these aspects anyways. (This is your next objection, isn't it? How predictable you are! ;-) )

And -- prepare for a surprise here -- you are totally right. Didn't expect that, eh? :-)

It is true that I can't really avoid dealing with all the aspects the existing framework covers. All I can do is postpone; but sooner or later I will have to deal with them. And then it's time to look at the existing framework, and see how my ideas can be integrated there. Only then my ideas are already tried and tested; only then I know exactly how things should work; only then I know which aspects are really important, and which can be traded. Only then I can show my ideas, instead of just trying to explain them; only then I can prove that they work; only then others can try them out, and see for themselves that they are useful; only then I can point to existing hitmen^H^H^H^H^H^Husers, who like the new ideas, and want to see them implemented in the existing framework; only then I can expect help from others with this daunting task.

Yeah, sometimes being shortsighted is useful.

Of course this means that most likely I will have to throw away part of my code; perhaps even all of it. So? Luckily, I didn't spend too much effort on it in the first place... Call it a prototype, proof of concept, whatever. Surely you won't question prototyping being a good thing?

The code doesn't count much. It's the ideas that count; having inspired a group of minions^H^H^H^H^H^H^Hfollowers sharing my vision; having gained enough momentum to overcome technical and social obstacles...

And here we are already happily in the midst of the second category of (valid) reasons for reinventing the wheel: the social aspects. (Ha! I know you like these, like we all do!)

These are often even more important than the technical ones. Working with an existing framework means working with an existing community -- a community that has its own priorities, goals, conventions, deities... Not a good environment for creating something new: you spend your energy on dealing with conflicts (religious and other), instead of actually creating stuff.

Let's take a look at the worst case. It's not even uncommon -- I've seen it happen. You have a group of people, having an interesting idea. They all have a common goal, a shared vision. They are very enthusiastic, and want to make it happen. Ideas are thrown around, people start setting things up and working on stuff... In other words, pure awesomeness, life is good etc.

And then, people from a somewhat related, established project come around, and start discussing. (Yeah, discussing -- it's every bit as bad as it sounds! ;-) )

First they will say, "See, what you are trying to do here is interesting; but it's essentially the same as what we are doing. Why don't you join our ranks and we can work together?" "Indeed, why not?" you will think, naive as you are. So you stop the stuff that was already going on, and instead talk to this established group about what needs to be done.

But now you discover, the hard way, that they aren't really that much interested in your ideas after all. Although they are doing something similar, it's not quite the same. They have their own ideas, their own priorities, their own goals, their own deities... They tell you how you should do things differently from what you intended; more like they want them to be -- arguing that it's The Better Approach (TM). They will tell you to focus on different aspects; to work on different things. In short, they will patiently explain to you that what you really want to work on, is evidently not what you thought that you wanted, but rather what they consider right.

They drown you out. They are an established religion, with firm dogmas; while you struggle to articulate your fresh heretic vision, and to hold your own little group together. Some of you will hold firm to your original beliefs, spending all your energy vainly trying to convince the other group of their value; finally giving up exhausted and dismayed. Others will seemingly convert to the established religion, agreeing to work on other stuff; but inwardly feeling that it's not really what they set out to do; consequently lacking enthusiasm, and ultimately just dropping off as well. (So the other project doesn't gain anything from it either -- their hope of annexing your group to work on their stuff is frustrated too. They only lose time and energy as well -- serves them right, bastards!)

The result is total disaster -- your enthusiasm lost, your vision in tatters, your people dispersed; leaving nothing behind but a universal feeling of disappointment...

You might try to avoid interaction with the established religion by forking. However -- aside from the fact that without interacting with the developers, building on an existing framework is even more problematic technically -- this doesn't help much either: some people, seeing your dissent, will come over and whine: why are you forking instead of "cooperating"? They will go on a crusade, actively trying to harass your group. Not exactly helpful for productivity...

In other words, if you are trying to create something new, you initially need to isolate from other similar projects as much as possible. Only once you have working code, a community of followers, enough momentum to hold your ground -- only then you can talk to established projects on an equal footing.

So, let's happily invent new wheels. Mine is pentagonal -- how about yours? ;-)

Shedding Light on Mozilla

Someone just pointed me to Ubiquity, which is a Firefox extension offering an alternative way of issuing browser commands, using a kind of command line. At a quick glance it looks quite promising.

There are several interesting aspects to it, but I don't want to go into all of them. The one that definitely stands out is that these people seem to have realized something very fundamental: Textual command interfaces can be more efficient and intuitive than the ubiquitous (pun intended) point-and-click interfaces. Woohoo!

Smartass Software

"The trouble with computers is that they do what you tell them, not what you want." -- D. Cohen

This lovely little quote is quite brilliant: It immediately strikes the heart of every single computer user. "This is so true..."

But then, I'm brilliant too. I really am. And in fact I'm going to prove it right now: I'll pour some of my genius into making the statement even more brilliant, by means of an addendum:

The trouble with the previous statement is that many people attempt to remedy it.

Eh? What is that supposed to mean?!

(See? That immediately strikes your heart as well! ;-) )

But let's start at the beginning. Computers being able to perform extremely complex and varied tasks, we intuitively assume they must be pretty intelligent. And yet they are so immensely dumb. We have to explain what we expect in every detail -- things that often seem so obvious. This discrepancy is very annoying.

So, can computers -- or actually programs running on them -- be made smarter? This seems a rather logical conclusion: If programs are so annoying because of their dumbness, wouldn't they get more useful if they get smarter? Shouldn't we try our best to make them so?

The answer to that question is quite clear, and if you think you can guess it, you are probably wrong. The clear answer is "no". (Beware of conclusions that strongly suggest themselves, and yet are totally wrong :-) )

The truth is that their apparent stupidity is actually one of the major strengths of computers: The fact that they are perfectly deterministic; that -- if you understand the interface -- you always know exactly what effect a command will have.

It is annoying to have a dumb interface, which requires a lot of repetitive work every time you perform some command. But it's even more annoying to have a "smart" interface, which tries to guess the user's wishes: The problem being that inevitably it will sometimes guess wrong. "Nobody is perfect."

The smarter the software gets -- the more it tries to guess the user's wishes -- the less predictable it gets; the harder to control; the more frustrating. It can save a considerable amount of tedious repetitive actions, but the price is high: While the repetitive actions, being repetitive (am I repeating myself?...), will soon go almost unnoticed, the loss of predictability in "smart" software means that you always have to check whether it's doing what you want; you always have to think about it -- it always takes part of your attention. Not really a net win.

The "T9" text entry system for mobile phones is a typical example: The traditional way, where you have to press the keys the right number of times to get the desired letter, requires a lot more key presses in total -- but it's perfectly deterministic; you can even type blindly. (Really -- I do that sometimes. And I'm not saying that because I want to appear cool... Well, at least not only because of that :-) )

With T9 on the other hand, you have to check every word (except of course for the most common ones, which you know by heart); you have to loop on the feedback, sometimes multiple times, until you get the desired result. (Well, unless -- like many people tend to do -- you skip that part, and send messages that will pose a challenge to a cryptanalyst, or else could pass for some form of modern art...)

It gets even more wretched when you want to type some word the T9 software doesn't yet know: You have to go back, change the mode, and type it again from the beginning using the traditional way.

Or you can engage in some absurd manoeuvres, to trick it into giving the desired results: I have seen people try typing a similar but different word which T9 happens to know about, and then go back and fix it into the one they actually wanted to type. Or writing the individual constituents of a long word separately, and then going back to join them together. (You must know, the German language has this interesting property that you can name pretty much anything with a single word, by connecting several other words into one. Just like Lego -- except that it blows up in your face if you don't follow the man page. Which reminds me of a toaster...)

I had some other good examples on my mind, but unfortunately I forgot them. I know that's a lame excuse, but it has a substantial advantage over any number of brilliant other excuses I could come up with: This one is true! I had to postpone finishing this post for more than a week, and that was admirably effective in making me totally forget what other example I wanted to present. I suck. Here it is, now I've said it. Are you satisfied? :-)

In consequence of this personal failure, finding other examples is conveniently left as an exercise for the reader. But then, I trust you are all smart people; one mind-bogglingly great example surely does suffice to convince you of the ultimate truth? :-)

A while back I wrote about DeepaMehta. While chiefly dwelling on the object navigation mechanism that forms the heart of DeepaMehta, I mentioned that there are some other ideas I like about it. One of them is considering the computer as a tool that is employed by the user to perform his work more effectively, rather than a cheap "assistant" that tries to do his work for him. (And -- like anything that is too cheap -- most likely falls short of any satisfying result...)

Now does this mean that everyone has to be content with dumb interfaces that require us to do a lot of tedious repetitive work? Certainly not -- we should do anything in our power to cut down on such redundancy: Streamlining the interface, providing shortcuts etc. -- anything that reduces the number of key presses and/or mouse clicks necessary to perform frequent tasks; yet always doing exactly what the user asks for, rather than trying to guess his wishes. That's the way to enlightenment. See you on the other side ;-)

A Case Against initrd

I never really liked initial ramdisks. It always felt like a dirty, hackish solution. It tends to slow down the boot process, and it requires maintaining a complete second system environment -- which has to be kept in sync with the main system on upgrades and configuration changes... Rather surprisingly, I don't use initrd on my system :-)

Some other considerations now prompted me to think about this in more depth. And I know you want to read my earthshaking conclusions :-)

During a typical boot process nowadays, we have a succession of four different environments: First one is the firmware. (BIOS in standard PCs.) Second is bootloader. Third is ramdisk. And finally, the fourth is the real working environment.

This is clearly too much. It creates complexity. It creates redundancy -- not so much in code, but in configuration -- a maintenance nightmare. (Be honest: Are you *not* dreaming of broken ramdisks and bootloader entries by night?...)

So, which of these phases could be rationalized? The first one is obviously necessary. (Well, unless we want to store a complete image of the system environment in flash memory, and update it on every upgrade and configuration change :-) ) The fourth one is what we want in the end. But what about the two intermediate stages, poor things, can we cut down on these?

Leaving them both out seems pretty much impossible in practice. That would require the firmware to provide drivers for pretty much any device the user might want to load the system components or information about the startup from (users have an annoying tendency to come up with the most surprising setups :-) ); and the firmware would need to be smart enough to construct and launch the image for a completely working system instance, based on the provided startup information.

I heard that OpenFirmware is/was quite powerful. I doubt though that it had enough drivers to completely avoid the need for additional stages; and I'm also not convinced that it was smart enough to completely load various systems without too much hassle. Of course, it's possible to do about anything with a sufficient amount of Forth scripts -- but then (I'm tactfully omitting the masochism factor here :-) ), it's effectively introducing another stage again.

Anyways, the cold dark evil reality we live in is standard PC BIOS -- which tends to have a considerable number of drivers (though still not enough for all cases...), but is totally stupid -- all it does is load a single sector from the hard disk...

So stage 2 (boot loader) is not really avoidable. Which leaves us with stage 3: Is the boot loader powerful enough to avoid the need for a ramdisk? I tend to believe that with GRUB2, it is.

It comes with a lot of drivers -- probably enough to satisfy any need.

(This is a bit of code duplication of course; with a ramdisk, the system's native drivers are used to ultimately load the final system environment, and the earlier stages can be kept minimal. OTOH, the GRUB drivers can be much simpler than the system's proper drivers, so it's not really that bad. Moreover, it can't really be avoided anyways -- in general, you want to be able to load the kernel and boot information from the same places from which the rest of the system is later loaded... And if not you, then someone else :-) )

Also, it comes with the multiboot loader, which -- in combination with the powerful scripting facilities -- should be sufficient to completely set up the working environment for many operating systems. And if that is yet not powerful enough, there is still the possibility of writing a custom loader module handling specifics of the system. The nice thing is that the module doesn't need to be distributed with GRUB itself (it's nice to keep the bootloader down to a size that still fits on a CD ;-) ) -- it is perfectly possible and reasonable to make it part of the actual operating system for which it is designed.
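To give a rough idea of what I mean, here is approximately what such a multiboot setup looks like in a GRUB 2 config for booting a Hurd system -- quoted from memory and simplified, so take the exact paths and arguments as illustrative rather than gospel:

    menuentry "GNU/Hurd" {
        # load the Mach kernel via multiboot, telling it where the root filesystem lives
        multiboot /boot/gnumach.gz root=device:hd0s1
        # the root filesystem server and the exec server are passed as multiboot modules;
        # the ${...} and $(...) placeholders are filled in by the boot script mechanism
        module /hurd/ext2fs.static ext2fs --multiboot-command-line='${kernel-command-line}' --host-priv-port='${host-port}' --device-master-port='${device-port}' --exec-server-task='${exec-task}' -T typed '${root}' '$(task-create)' '$(task-resume)'
        module /lib/ld.so.1 exec /hurd/exec '$(exec-task=task-create)'
    }

A custom loader module would essentially generate this kind of setup (and more) on the fly, instead of having it all spelled out in the script.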

In a (much) older post, I mentioned my POSIX level driver proposal. Part of it describes boot methods. Aside from a boring ramdisk (sorry...), I also proposed a lovably crazy approach: Implementing a mechanism that allows using GRUB's drivers after the actual system starts, until it has loaded it's own drivers.

Now I realize that it makes much more sense the other way round: Move the boot process (up to the moment all necessary native drivers are available) completely into GRUB. This allows a similar level of flexibility, with considerably less magic. (I'll miss the craziness...)

The driver proposal relies heavily on extracting information from the filesystem structure, and passive translators in particular; so we need to extend GRUB so that it can read passive translator information from the filesystem, and initialize active translators so that the driver hierarchy can immediately become functional once control is passed to the system.
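For illustration, this is what the passive translator information in question looks like from a shell on a running Hurd (device name made up for the example):

    # settrans without -a records a *passive* translator setting in the node itself
    settrans /mnt /hurd/ext2fs /dev/hd0s2
    # showtrans reads it back -- this is the on-disk information GRUB would have to
    # parse in order to start the corresponding translators itself at boot time
    showtrans /mnt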

This could either be done by implementing extensions that can be used in normal boot scripts, or by implementing a loader module that does all the driver setup automatically. The former is probably more tricky, but also more transparent and flexible.

A similar approach should allow preparing the startup of any other system as well, avoiding the need for any initial ramdisk. Good riddance :-)

Advanced Lightweight Virtualization

Everyone is talking about virtualization now. Well, maybe not your mum; but almost everybody. OK, probably not your aunt either... Well, you get the point :-)

Now I tend to be just this tiny little bit sceptical about things everyone talks about, and thus generally quite late in the game when it comes to crying "me too!". But I think the time has come when I can join in without risking my great reputation as an antediluvian freak.

So, coolness factor etc. aside: Why is everyone talking about virtualization? I think the reason is that it offers a very simple, straightforward solution to a bunch of problems related to various kinds of isolation.

One very prominent kind is related to security: Mainstream operating systems (both UNIX derivatives and Windows) by default allow any process in the system to communicate with almost anything else in the system. The concepts of users and file access permissions provide some limits, but these are unsuitable to enforce any serious security policy: They only work under the assumption that software is bug-free, and that users only run software they absolutely trust.

Bolted-on solutions like SELinux allow restricting the communication channels in theory; but they are extremely complex to manage, which makes them error-prone, and makes it unfeasible to use anything other than the simple default policies provided by the OS vendor.

Hardware virtualization on the other hand provides security in a trivial manner: Basically, it just cuts any communication channels -- (almost) total security through total isolation. They err on the other side: Usually you do want to have some communication, and with VMs you have to jump through hoops to set it up, e.g. using virtual network interfaces.

A somewhat related use case is isolation in administrative matters: With a VM, the guest system is completely independent from the host system. It can be configured differently; it can be upgraded without affecting the host system, as well as the other way round. You can have different user accounts. And so on.

Again, the cost of total isolation is... Well, total isolation :-) It means that you have to manage each VM individually -- sometimes a desirable property, sometimes a burden. (And most of the time, a bit of both...)

Last but not least, VMs allow total isolation of interfaces: The guest system only talks to the (virtual) hardware, and is thus totally independent of the functionality and interfaces of the host OS -- you can run a totally different system inside the VM.

Here, the downside of independence is a lot of overhead, and very poor resource utilization. Paravirtualization cuts this down a bit, but doesn't fundamentally change the situation.

(This is a blessing for hardware vendors of course -- especially as standard application vendors lately have been slacking a bit with bloating their software to make up for recent increases in processing power and memory sizes...)

All in all, while hardware virtualization provides total isolation in all regards, it is often total overkill too -- more isolation than necessary or desired.

Various kinds of container mechanisms (vserver and OpenVZ in Linux for example) are an interesting alternative in many situations. Here, you have a single instance of the system, but several isolated user environments -- so you get isolation of communication channels, and usually also some administrative independence (at varying degrees), but without the overhead of hardware virtualization. (The term "lightweight virtualization" is sometimes used for that; however, it doesn't seem to be widely adopted: Google gets some relevant hits, but not really that many...)

What these container solutions can't do (apart from being less robust against security exploits, due to the common system instance), is running a different system in the subenvironment.

There are also some specific middleground-solutions like User Mode Linux or lguest, which allow running another instance of the system, but with less overhead than true hardware virtualization.

Now let's take a look at the Hurd. Its main feature, compared to traditional (monolithic) UNIX-like systems, is the fact that almost all system functionality is provided by optional layers (servers and libraries), which can easily be replaced: Any user or program can run its own services instead of using the system-provided ones -- thus creating a different environment, with little or no overhead, and without affecting the rest of the system. (This is a tribute to the GNU philosophy, that a user should always have full control over the software he runs.)

By default, all processes run in a single standard environment; but upon demand, any process can be put into some different, more or less independent subenvironment. There are endless variations: You could run select processes with distinct instances of some default servers, to increase robustness and scalability; you could set up containers isolated from the rest of the system; you could use a different variant of some server, e.g. a different network stack optimized for some specific use case; you could run another instance of the whole system (this is called subhurd or neighbour-Hurd); you could run a special environment, with well defined versions of certain components, to be sure that a certain feature is present independent of the host system, or to avoid possible incompatibilities through changes in the host system; you could even run a totally different system, having little in common with the main one. All of this can be done on any running Hurd installation, without any modification to the host system.
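Just to make this a bit more tangible, here are two things any unprivileged user can already do on a stock Hurd system, without touching anything global -- a minimal sketch, with made-up paths:

    # run your own filesystem server on an image file you own, attached to a node you own
    settrans -a ~/mnt /hurd/ext2fs ~/disk.img
    # or attach an FTP-backed filesystem under your home directory
    settrans -a ~/ftp /hurd/hostmux /hurd/ftpfs /
    ls ~/ftp/ftp.gnu.org/gnu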

We haven't been expressing these Hurd features in terms of virtualization up till now. But I think it makes perfect sense to do so: It seems common practice to describe various facilities of this kind by the term "virtualization"; and saying that the Hurd is designed from ground up to support fine-grained virtualization, is certainly more perspicuous to most people than talking about user extensibility.

So, let's be more buzzword compliant :-) Let's call it advanced lightweight virtualization.

Theory of Filesystem Relativity

It has been pointed out that the Hurd chroot implementation has serious problems in connection with passive translators, resulting in unexpected behaviour and gaping security holes: When someone sets a passive translator (e.g. a firmlink) within a chroot, and then accesses the translated node from within the chroot, the translator will run *outside* the chroot, but will be accessible from inside it -- meaning you can easily escape the chroot. (Using something like settrans -p tunnel /hurd/firmlink / )
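Spelled out as a shell session, the escape looks roughly like this (the whole point being that the firmlink's target is resolved outside the chroot):

    # inside the chroot:
    settrans -p tunnel /hurd/firmlink /
    # on first access the passive translator is started -- by the filesystem server,
    # which runs *outside* the chroot -- so "tunnel" now leads to the real root
    ls tunnel/home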

This is a serious flaw in how passive translators work; and it has been used to demonstrate that supposedly passive translators are broken by design, and should be replaced by transparent system-wide persistence. However, I doubt such a radical conclusion is really appropriate.

For one, the original Hurd designers said at the outset that chroot will only be supported for compatibility if it can be done without too much hassle. One could very well claim that a secure and perfectly consistent chroot implementation was never intended. Yet, I do think that chroot can be fixed without totally overthrowing the passive translator concept.

But before diving into this, I'd like to mention that IMHO the current chroot implementation, using a system call handled in the actual filesystem servers, is wrong. It seems much more hurdish, more flexible, and more robust, to use a filesystem proxy as / for the chrooted process. One advantage is that the same mechanism can be used not only for chroot -- which uses a proxy that simply translates all paths so that they point to a subtree instead of the global / -- but also with other kinds of proxies to achieve different semantics: For example using a proxy that mirrors the global /, and only replaces a few specific locations. (/servers/* is a likely candidate, which allows replacing default system servers.)

Now back to passive translators. I can think of quite a lot of possible approaches:

  • Simply don't allow setting passive translators inside a chroot at all. After all, chroot is only for UNIX compatibility, and translators are not a UNIX concept...
  • Allow setting passive translators, but only temporarily, not storing them in the underlying filesystem. When accessing the translated node, the translator is started by the chroot. Allowing passive translators but not really storing them is a bit inelegant, of course...
  • Store the passive translator, but also store the chroot information; and only start the translator if the node is accessed from within the same chroot.
  • Store the passive translator and the chroot, and whenever the node is accessed, run the translator in a matching chroot. This might be the most elegant solution. Only problem I see is that the translator is run in an identical, but not the *same* context. For chroot this shouldn't be a problem I believe; but some other kinds of chroot-like subenvironments might break: If you have some kind of subenvironment, where some things are local to the specific instance, running the translator in a different instance might not do the trick. But as I said, for a normal chroot it should be fine.
  • Last but not least, we could simply allow setting passive translators from within a chroot normally like it happens now, but when a translated node is accessed, the translator started would run in the context of the process accessing it -- which is different for a chrooted process than for a normal one. (For consistency, any active translators running outside a chroot would have to be ignored inside it...)

One could claim that the last variant is actually the only sane one: It's a bit confusing that the translators will refer to something else within the chroot than outside it -- but in most situations that probably is actually the most useful behaviour. Also, it's how symlinks behave in a chroot.

Of course, in some cases you actually want the other behaviour. There is really no solution that always does the desired thing. A similar problem arises with translators or symlinks on NFS: Should they be resolved on the client or the server side? Sometimes it's desirable for links or translators always to refer to the same physical location on the server, no matter whether the FS is accessed through NFS or directly; while sometimes it's more desirable to always refer to the same logical location, so you always get an appropriate local resource on the machine where the program runs...

Another situation, which also isn't specific to chroots or even to translators (I experienced the problem with symlinks on Linux), is handling of mount points: Should translators and symlinks refer to the root of the actual file system (partition), rather than the VFS root, so they always point to the same physical location no matter where the FS is mounted?...

It seems that in most other cases involving symlinks or translators, the rule is to have them referring to the same *logical* location in changing contexts; so, while I'm not sure whether it's the more useful behaviour, it at least would be most consistent to go with the last suggestion, i.e. make passive translators within a chroot always run in the context of the chroot.

External Insistence

In an earlier post, I explained my concerns with transparent system-wide persistence. One of the problems I pointed out is that in such a system, you have to manually serialize all important state on upgrades and in some other situations anyways, which rather undercuts the value of transparent persistence.

Marcus Brinkmann now showed me a nice text explaining the upgrade problem in great detail. It's a good read, at least the first half.

BTW, meanwhile I refined some of the ideas I tried to explain in the original discussion, and I might post an update at some point; but all in all, my concerns with transparent system-wide persistence haven't changed.

MehtaHurd

Recently, I saw a presentation on DeepaMehta. Sounds quite promising: Completely new approach instead of the traditional (broken) desktop metaphor; adapted to the way we think... Of course, I was mighty sceptical. It turned out more interesting than I expected.

Putting it bluntly, DeepaMehta is kind of an extensible filesystem browser with an advanced bookmarking/navigation system.

Of course, this statement is a misrepresentation in several regards. Most notably, DeepaMehta's objects aren't really meant as simple bookmarks. They can store considerable amounts of metadata, as well as actual data, and only optionally can reference traditional files or other external data, but normally they exist on their own -- in fact, the idea is that ideally (almost?) all data is stored in DeepaMehta itself. Also, DeepaMehta doesn't care at all about the traditional filesystem structure... Well, let's get more specific.

DeepaMehta is based on three core observations regarding existing desktop systems. One is that overlapping windows are an abomination that needs to be avoided. This is a very important issue of course. DeepaMehta always displays only one navigation window, and one window presenting the content of the currently active object -- just like Norton Commander in quick view mode... (Somewhat resembling Oberon, which also has a content pane plus another pane for other stuff -- though the division is different there.)

Limiting to always just one content window is a bit extreme IMHO: Seeing several things at the same time can be useful sometimes, given a sane (tiled and dynamic) window manager. Nevertheless, it's a Good Thing (TM) that DeepaMehta addresses the problem of overlapping windows.

The second observation is that traditional applications make no sense. Instead, we need a generic shell application, with plugins to be able to perform specific actions on various object types -- again a very important issue. To some extent, we see this happening with things like Nautilus already; but DeepaMehta is much more thorough about it: All navigation actions are performed in the navigation window, using standard mechanisms supported by plugins for creating specific object associations; and all object viewing/editing actions are performed in the content window, again supported by plugins for specific object types.

The third observation is that traditional navigation mechanisms are quite unsuitable to the way we work; for many tasks, a new approach can be vastly more efficient and convenient. This is the strong side of DeepaMehta, and the really innovative part: It uses a navigation system based on mind maps. Basically, you have all your objects, as well as various types of bilateral associations between them.

As there are far too many objects and associations to make this manageable directly, you never look at all of them at once, but instead use various partial views (maps). A new view can be created by starting from some object, uncovering the associations (and associated objects) you are presently interested in, and going on from there, until you see all you want in this view. Once you have a view with all the objects interesting for your present occupation, you can easily switch which of these objects is visible in the content window at any given time, by focusing it in the map.

Constructing a map can also happen automatically, e.g. when browsing a web site in the content window: Each time you click a link, the linked site and the link association is added to the map. Otherwise, creating a map by hand can be quite tedious. (Especially as DeepaMehta, while introducing totally different high-level concepts, is very old-fashioned at the level of actual UI elements -- all the map management for example is done through nested context menus, requiring lots of searching and endless clicking...)

For some tasks, this seems way too static: To efficiently browse a filesystem (or generally to quickly move through any larger object structure), some more automated approach seems necessary. Not sure how to do that; but I think it should be possible to come up with something nicely extending the concepts in this direction.

There are some more fundamental problems with the existing DeepaMehta implementation, however. (Aside from the fact that it is written in Java...)

For one, it is centered around the idea that the future is in the net, with all intelligence in servers, and workstation machines being only dumb clients -- a pretty absurd notion. In spite of all the babbling of dotcom freaks, it's pretty obvious in any realistic view that the local system should always be the primary focus of a desktop framework. Networking is a mostly orthogonal issue, which can be nicely integrated using other mechanisms, for example a transparently networked filesystem like Plan9. Designing a desktop system as client-server from the ground up, only makes it more complicated and less flexible, creating much more problems than it solves.

The second major problem with DeepaMehta is that it creates a world of its own, with little relation to existing environments. This obviously makes transition pretty hard. Moreover, as it's impossible (or at least very unrealistic) to run a whole operating system, including administration etc., with DeepaMehta alone, there will always be the DeepaMehta layer on top of/beside the rest of the system; and like with any sub-platform, this creates very serious integration problems.

For these reasons, I don't consider DeepaMehta as is very useful in a broader view. Yet, the navigation approach seems immensely valuable. So, how could it be integrated at a generic level in more traditional systems? I have a number of ideas for various methods of implementing the DeepaMehta navigation concept through extensions to the normal filesystem. (The Hurd architecture makes such extensions easy...)

The basic idea is, instead of having a special object database like DeepaMehta, to use traditional filesystem entities (files and/or directories) as the primary objects. We already have links in the filesystem, so one might imagine something based on that for the object associations.

In a very crude approach, the DeepaMehta semantics could be expressed by some defined structure in a traditional filesystem. Every object would need to have its own directory, containing the main file as well as the object type designator, additional attributes, and object associations. An association could be represented by a link referencing the associated object, plus some file or a second link denoting the object type. (The object type itself can be described by a special object...) Maps could be either stored in additional subdirs inside the object directories, with pointers (links) to the associations visible in the particular map; or in completely distinct directories, linking all the objects and associations visible in the map.
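Just to make the crude variant concrete, a single object directory might be set up roughly like this (all names purely illustrative, of course):

    mkdir -p objects/report-42/assoc
    cp ~/report.txt objects/report-42/main              # the main file holding the actual content
    echo document > objects/report-42/type              # object type designator
    echo author=me > objects/report-42/attributes       # additional attributes
    # an association: a link to the associated object, plus a file denoting the association type
    ln -s ../../meeting-07 objects/report-42/assoc/1.target
    echo discussed-at > objects/report-42/assoc/1.type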

This approach should work fine with programs that are aware of the special semantics. However, it creates lots of other problems otherwise. If you access the objects with normal programs, you always have to specify the main file inside the object dir. If you copy or delete objects, you have to do it recursively on the object dir. Generally, it's very complicated and not really intuitive. Also, it requires a special structure, but this structure can easily get destroyed when accessing stuff with traditional filesystem tools. As traditional links are unidirectional, it would either be very inefficient to figure out the links in the reverse direction, or it would require redundant links in both directions, which need to be kept consistent.

An alternative is to extend the standard filesystem mechanisms. This would allow attaching all necessary information to the files themselves, instead of putting them as additional items in a special directory. Various modifications to the linking mechanism are necessary for that: Normally, links point from a directory entry to a file or another directory. For DeepaMehta-like relationships, we'd need links that directly connect one file to another. Also, they would need to be bidirectional, and have a type attribute. Maps could be represented as special filesystem entities, which can be browsed like directories, but with a map-like instead of the traditional hierarchical structure; presenting an alternative view of the main directory tree.

This way, navigating through maps should work in a pretty nice manner even with traditional applications; and when accessing objects with normal programs, they behave just like ordinary files. On the other hand, there is no way to access the object relationships and other additional attributes, or to modify maps, with tools not aware of the new features. Also, this approach is very invasive in general, making quite fundamental changes to various aspects of the filesystem functionality.

I think a more hurdish approach is to introduce a Mehta-translator: If you want some file to appear as a DeepaMehta-like object, just set the translator on it. Without the translator, it appears like a normal file; but with the translator set, it is presented as a pseudo-directory, with all the additional information. This way, the main file can be easily accessed and managed with normal applications (without the translator active), but the additional properties can also be represented in such a way that they can be accessed with traditional tools. Unlike with the directory representation on a traditional filesystem described above, the translator can enforce consistency when properties are manipulated through the pseudo-directory. It can also introduce other special semantics for the pseudo-filesystem entities, making it more intuitive.
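Usage-wise, I imagine something along these lines -- with /hurd/mehta being an entirely hypothetical translator name, obviously:

    settrans -p report.txt /hurd/mehta   # hypothetical Mehta-translator; nothing like it exists yet
    ls report.txt/                       # with the translator set, the file shows up as a pseudo-directory
    cat report.txt/type                  # ...exposing type, attributes and associations alongside the content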

Maps can be implemented with another translator, which creates a pseudo-filesystem representing the map as a special directory structure. (Much like in the variant with special FS features, but through a simple translator instead of modifications in the normal FS.)

Some light and some shadow

Today I stumbled upon a very interesting article on proposed design concepts for KOffice, which makes Martin Pfeiffer the winner of the KOffice design competition. I haven't looked at the other contributions; but taken by itself, it looks like the award is well deserved.

The proposal contains lots of innovative, mostly good ideas -- as I've already hinted in some of my other postings, a feature IMHO sorely lacking from most UI concepts.

It implements, at least partially, many many principles I'm badly missing from existing GUIs. (Though by far not all, of course...) It also proposes some very interesting totally new ideas I haven't even thought about so far. It suggests solutions for some problems I'm seeing in existing systems. It makes me think more consciously about some things I vaguely considered before. All in all, it starts a lot of valuable thoughts. And, of course, it also has a couple of ideas that I do not agree with at all. (Well, what you'd expect, perfection? ;-) )

I won't dwell on all the specific aspects now. (I might pick up some in later posts.) It's way too much interesting stuff. Instead, I'll pick a single, very very fundamental issue -- one which this proposal sadly gets (almost) all wrong: Integration.

While it's obviously right that better integration between the various office applications is desirable, there is a fundamental misconception about how to achieve it. The idea of having individual "viewers" for the various document types, and something that integrates them, is even basically right -- in spirit it goes in the right direction, towards what I call a hurdish application infrastructure. (Though not very far.)

There is also the very important realization that a shared canvas implementation should be used for displaying the contents, making the different document type "viewers" basically only transform the documents. But having the canvas in a shared library is a poor choice -- though a typical one for traditional monolithic UNIX systems, which do not offer an integrated object access approach like Hurd's translators (or Plan9's FS servers) that would make it easy to implement common functionality as server processes.

Where the proposal is really fatally wrong is what that "something" integrating the components should be: It wants a single specific main application, gluing the whole into a sealed system, creating a desktop for office work inside the main desktop -- instead of fixing the main desktop to provide the necessary integration. Reinforcing the dominance of closed monster applications, buying inner integration at the cost of making any outside interaction awkward, thus creating ever growing subworlds, ever more alien to each other. Ever more duplicated functionality, as it becomes so painful to use anything not part of the closed subworld. Creating another Emacs.

Barring fun

Regarding the last post: Thinking about it a bit more, my statement that this is unrelated to the issues I'm usually talking about here is not quite true.

On the contrary: "-Ofun" is a nice way of putting it; but a large amount of the stuff discussed in the essay (maybe even all of it) really boils down to tearing down entry barriers -- here, in a social context. Just shows how crucial this principle is in general :-)

The importance of having fun

This is somewhat unrelated to the technical issues I'm usually discussing here. But I agree so fully, and the issue seems so crucially important to me, that I want to point to it anyways.

Eric S. Raymond pointed out many organisational issues of successful free software projects in his famous Cathedral and Bazaar paper. Most of the things he mentions there are right. But all of them are actually only a function of the fundamental idea of -Ofun, pointed out in this article.

And I know of no other project following it so badly, so exactly doing the opposite of -Ofun, as the Hurd :-( It's really that, and nothing else, that makes Hurd's progress so slow. Can it be fixed? I hope so...

The other side

After recently commenting on KDE, I've now also stumbled upon GNOME's project Topaz, which -- together with some other pages it links -- describes future ideas for GNOME.

The focus is a bit different from KDE's. (Fundamental UI changes, including "extensions", are mentioned at the outset, but there are very few actual ideas. Most of the material revolves around under-the-hood technical improvements.) However, with regard to new ideas, it's just as disappointing.

There are a couple of quite fundamental changes suggested -- mostly not terribly new, but still. There doesn't seem to be any special emphasis on any particular one of these. I'm just picking the one most relevant in my opinion: The VFS.

There is a page on the new GNOME VFS, which in turn links to the Desktop VFS from freedesktop.org. They both describe mostly the same ideas, and I'm not sure about the exact relation between the two projects, so I'll just treat them as one.

So they have -- correctly -- discovered that POSIX file handling is unsuitable for most of today's applications. Full agreement here: We made the same discovery when designing various Hurd translators. open(), read()/write(), close() were sufficient in times when most Unix tools worked as filters, sequentially reading input files and sequentially writing output files. Most of today's applications require different semantics.

One very important operation, as the GNOME folks have accurately observed, is atomically reading or writing a whole file. They want it because most of today's applications read the whole file when "loading" a document, and store the whole file when "saving" -- which is wrong IMHO, but that's a different story :-)

Nonetheless, in the Hurd this is probably even more fundamental: When a translator exports data through file nodes, it is extremely common for clients to read the node contents into a string, or to store a string in the node. A simple operation doing this in one call would be awfully useful -- not only because both client and server need considerably less handling for it, but also because knowing that the client just wants to write the whole file is very important information for the server.
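Just to illustrate what I mean -- these calls don't exist anywhere, the names and signatures are pure invention -- something along these lines would do:

    /* Hypothetical sketch of whole-file operations; nothing like this
       exists, the names are invented for illustration only.  */

    #include <sys/types.h>

    /* Atomically replace the entire contents of FD with LEN bytes from DATA.  */
    int file_write_whole (int fd, const void *data, size_t len);

    /* Atomically fetch the entire contents of FD into a freshly allocated
       buffer (*DATA), reporting its size in *LEN.  */
    int file_read_whole (int fd, void **data, size_t *len);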

Generally, we need much more semantics in file operations than POSIX offers. Think of inserting data in the middle of a file. With POSIX, the only way to do that is to overwrite the entire rest of the file. This is not only complicated and terribly inefficient: if the underlying file is served by some more interesting translator, it can actually pose serious functional problems when parts of the file are overwritten for no real purpose.
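For illustration, this is roughly what a client has to do today with plain POSIX calls -- a minimal sketch (the helper insert_posix is made up, and error handling is mostly omitted), shifting the entire tail of the file just to squeeze in a few bytes:

    /* Insert LEN bytes from DATA at OFFSET into the file open on FD,
       using nothing but plain POSIX calls.  */

    #include <stdlib.h>
    #include <sys/stat.h>
    #include <sys/types.h>
    #include <unistd.h>

    static int
    insert_posix (int fd, off_t offset, const char *data, size_t len)
    {
      struct stat st;
      if (fstat (fd, &st) < 0)
        return -1;

      /* Read the entire tail of the file into memory...  */
      size_t tail_len = st.st_size - offset;
      char *tail = malloc (tail_len + 1);
      if (!tail || pread (fd, tail, tail_len, offset) != (ssize_t) tail_len)
        return -1;

      /* ...write the new data, then rewrite the whole tail shifted by
         LEN -- touching every byte after the insertion point.  */
      if (pwrite (fd, data, len, offset) != (ssize_t) len
          || pwrite (fd, tail, tail_len, offset + len) != (ssize_t) tail_len)
        return -1;

      free (tail);
      return 0;
    }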

For operations that don't fit any of the generic semantics (write entire file, insert data, ...), we probably need to introduce transactions, to allow manually grouping primitives into semantic units. (This is probably what OGI's comment on an older post was referring to -- which would mean that at last I've understood the idea behind that comment :-) ) For many translators, it's crucial to know whether an operation is complete and the data should be processed, written to the store/network, whatever; or whether following calls will alter it further. With POSIX alone, some translators can only be usefully implemented by employing quite sophisticated caching and heuristics, if at all.
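Again purely hypothetical (the names are invented), but the idea would be something like bracketing a group of ordinary calls:

    /* Hypothetical transaction bracketing, illustrating how ordinary
       POSIX calls could be grouped into one semantic unit.  */
    int file_transaction_begin (int fd);
    int file_transaction_commit (int fd);

    /* Usage sketch: until the commit, the server knows the data isn't
       complete yet, and can delay processing or writing it out.

         file_transaction_begin (fd);
         write (fd, header, header_len);
         write (fd, body, body_len);
         file_transaction_commit (fd);
    */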

So, the GNOMEs are right about the necessity of new filesystem semantics -- though I don't know whether they'll get all the issues mentioned above right. Sadly, that's where the sanity stops in their proposal(s). A special filesystem API for desktop use only? That's absurd. How did they get the silly notion that the file access requirements of "desktop" applications are so fundamentally different from those of command line tools or daemons as to warrant a special API for desktop use alone?

Oh well, I guess that's the general problem of GNOME (and KDE): Considering the underlying system(s) as given, they tend to pile layers upon layers of workarounds, instead of much more simply and usefully fixing things right at the system core level. (Reminds me of MS Windows, which started as a desktop environment, and ended up being an OS... In just a few more years, we will probably hear people say: "GNU/Linux? Isn't that obsolete? I'm using GNOME!") This just shows that we really need a GNU kernel, so the developers of a GNU desktop environment won't need to sink tremendous amounts of time into working around the limitations of systems they have to run on for lack of a native one that could give them all the functionality they need... But well, that's a different rant.

So, now that we agree ;-) that an extra API for better FS access is silly, what is the alternative? That's obvious: Just as we already have Hurd extensions to POSIX interfaces in many other places, we should try to extend the standard POSIX file operations with the stuff we need, without forsaking compatibility. I'm pretty confident we can do this. (I've already discussed some aspects in conjunction with device drivers.)

Clients not aware of the new semantics can continue using the old ones. Those that want to use the new features will check with the server whether it implements them, and fall back to the traditional stuff otherwise. Most of this can probably be handled transparently in libc (on the client side) and/or the FS server helper libraries: If a particular server doesn't know about the atomic file read/write operations, for example, it will just get a series of standard POSIX requests doing the job instead. No need to force a switch to an incompatible new API.
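To sketch how that fallback might look on the client side -- again using the hypothetical file_write_whole from above, and just assuming the server signals an unimplemented operation with something like EOPNOTSUPP:

    /* Rough sketch only: try the (hypothetical) whole-file write first,
       and emulate it with plain POSIX calls if the server doesn't
       implement it.  */

    #include <errno.h>
    #include <sys/types.h>
    #include <unistd.h>

    extern int file_write_whole (int fd, const void *data, size_t len);  /* hypothetical */

    static int
    write_whole_file (int fd, const void *data, size_t len)
    {
      if (file_write_whole (fd, data, len) == 0)
        return 0;
      if (errno != EOPNOTSUPP)
        return -1;

      /* Old-style server: fall back to truncate + sequential writes.  */
      if (ftruncate (fd, 0) < 0 || lseek (fd, 0, SEEK_SET) < 0)
        return -1;
      const char *p = data;
      while (len > 0)
        {
          ssize_t n = write (fd, p, len);
          if (n < 0)
            return -1;
          p += n;
          len -= n;
        }
      return 0;
    }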