Persistence vs. Insistence

Session management is a thread of thought I only recently took up in my considerations. Nevertheless, it touches on many other aspects -- it looks like this has been the missing link in various contexts. But more on this later.

Session management basically means: I don't want to manually restore my work environment on each bootup; I want it to come up just as I left it -- usually at least.

There is little doubt that working session management is something users really, really want to have. I say working, because retrofitted concepts like X/KDE/GNOME/whatever session management (the most explicit form most users know) tend to be incompletely implemented and thus useless.

In a UNIX environment, most of the system doesn't know session management. To change the startup behaviour of something, you usually need to explicitly change settings: either by editing config files or, in GUI applications, through the startup-related settings usually found in the options dialog.

Nevertheless, besides the non-working X session management, you'll find many places where session management aspects have sneaked in, if you look closer. A typical example would be the soundcard mixer init scripts, which save the current mixer settings on shutdown and restore them on next startup. Some applications, like Opera or Vim, have options to explicitly and/or automatically save/restore their sessions. The shell (and some other command line based applications) saves the command line history. In the Hurd, we have passive translators, as a kind of session management at the filesystem tree level.

And then there is screen, which implements a kind of poor man's universal session management: When you detach a screen session from the terminal, the software just runs on in the background -- instead of doing a real shutdown/startup, you just detach and reattach your session. (Screen sessions are usually used on servers, i.e. systems that run permanently with rare interruptions.) Those who know this feature usually love it -- with this little "trick" it implements something resembling session management that really and always works, because the applications needn't be aware of it at all...

Another, less obscure example of transparent and thus (more or less) reliably working and widely established session management-like behaviour is suspend to disk. It just writes the whole memory image to disk, and restores the exact same situation on resume. (The exception is most hardware drivers, which obviously need to be aware of the suspend. This was actually the main starting point for my considerations -- the suspend infrastructure of the hardware driver framework could and should be extended to the application level...)

A very similar approach -- which could be considered an extension of suspend, although it has a completely different origin -- is persistence as in EROS. The difference is that here the image is saved not only on suspend, but periodically (every 5 minutes), so the system will come up in the last saved state even after a power outage or the like.
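To make the periodic-snapshot idea concrete, here is a minimal application-level sketch. It is not how EROS does it (EROS checkpoints the whole system image below the applications); the `Checkpointer` class, the JSON file format and the injectable `clock` parameter are assumptions made purely for illustration.

```python
import json
import os
import tempfile
import time


class Checkpointer:
    """Hypothetical helper: snapshots a state dict at most once per interval."""

    def __init__(self, path, interval=300.0, clock=time.monotonic):
        self.path = path
        self.interval = interval  # seconds; 300 mirrors the 5 minutes above
        self.clock = clock        # injectable so the timing can be tested
        self.last = None          # time of the last successful snapshot

    def maybe_checkpoint(self, state):
        """Write a snapshot if the interval has elapsed; return True if written."""
        now = self.clock()
        if self.last is not None and now - self.last < self.interval:
            return False
        directory = os.path.dirname(os.path.abspath(self.path))
        fd, tmp = tempfile.mkstemp(dir=directory)
        with os.fdopen(fd, "w") as f:
            json.dump(state, f)
            f.flush()
            os.fsync(f.fileno())
        # Rename over the old snapshot, so a power cut in the middle of a
        # write leaves the previous snapshot intact rather than a torn file.
        os.replace(tmp, self.path)
        self.last = now
        return True

    def recover(self):
        """Return the last snapshot, or None if none was ever written."""
        try:
            with open(self.path) as f:
                return json.load(f)
        except FileNotFoundError:
            return None
```

After a crash, `recover()` hands back whatever the last completed snapshot contained; everything done since then is lost, which is the same trade-off the 5-minute interval implies.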

One side effect of system-wide persistence (again, with the exception of system core and hardware drivers) is that you need no sophisticated system boot and program startup mechanisms -- you just add objects to the system once, and they live on forever. (Unless you decide eternity is a bit too long, and get rid of them earlier...) Which is the very reason why EROS has it: Making the whole system persistent seemed easier than creating a method for secure explicit storage/retrieval of capabilities. But they claim it's desirable for usability purposes also...

However, there are downsides to this. Completely transparent session management works fairly well in specific cases: with screen, because the scope of a screen session is usually quite limited; and with suspend, because it is typically used only for fairly short breaks in work that is meant to be continued exactly where you left off. But a completely and always persistent system creates various problems.

For one, to update software, you basically need to create a new object in place of the old one. If you want to preserve state information in the process, you need to implement this explicitly: Either run the new version in parallel and pass the state from the old one to the new (session handoff), or dump the old state to a third party temporarily and read it when starting the new version (session saving). In both cases, you are effectively implementing explicit session management -- all of the nice transparency is gone.
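The "session saving" variant could look roughly like the following sketch. The two player classes, the field names and the version numbering are all hypothetical; the point is only that both versions have to agree on an explicit state format, and that the new one has to know how to read the old one's dump.

```python
import json


class PlayerV1:
    """Old version of a hypothetical music player component."""

    def __init__(self, track="", position=0):
        self.track = track
        self.position = position  # seconds into the track

    def dump_state(self):
        # Explicit session saving: the component itself decides what is state.
        return {"version": 1, "track": self.track, "position": self.position}


class PlayerV2:
    """New version; it also tracks a volume, which V1 never stored."""

    def __init__(self, track="", position=0, volume=50):
        self.track = track
        self.position = position
        self.volume = volume

    def dump_state(self):
        return {"version": 2, "track": self.track,
                "position": self.position, "volume": self.volume}

    @classmethod
    def load_state(cls, state):
        if state["version"] == 1:
            # Migration: V1 dumps carry no volume, so fall back to the default.
            return cls(state["track"], state["position"])
        if state["version"] == 2:
            return cls(state["track"], state["position"], state["volume"])
        raise ValueError("unknown state version: %r" % state["version"])


def save_session(path, state):
    """Dump the state to a third party (here simply a file)."""
    with open(path, "w") as f:
        json.dump(state, f)


def load_session(path):
    with open(path) as f:
        return json.load(f)
```

An update would then amount to `save_session(path, old.dump_state())` before stopping the old object, followed by `PlayerV2.load_state(load_session(path))` when starting the new one. Session handoff would use the same `dump_state`/`load_state` pair, just passing the dict directly instead of going through a file.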

This actually doesn't only happen on updates: If you want to move some state information, e.g. when replacing hardware or working on a different machine or whatever, you need the very same protocols -- only that the actual state transfer gets more complicated.

Another, related problem is flexibility in general. Transparent persistence is a sledge hammer approach: You always get the whole thing. But in many situations, you do not actually want to, or can't, restore the system in the exact same state. Maybe your system configuration changed meanwhile, so some stuff won't work or doesn't make sense anymore. Maybe you screwed up and explicitly want to easily get rid of parts of the old state. Maybe you want an easy way to activate only parts of your session at times. Maybe you want to carry along part of your environment to different machines. And so on. You might even have your system on a USB stick and need it to adapt to a different machine each time you boot it!

So what we really want is a flexible system of subsessions that can be restored or not, depending on demand and/or resource availability. One session for the core system, various sessions for individual hardware components, a number of sessions for background services, some sessions for higher-level system components traditionally handled by runlevels (networking, windowing environment), and lots of sessions for individual parts of your application environment -- a music player session, a news reading session, a communication session, various sessions for projects you are working on at times, etc. You get a whole tree of subsessions and sub-subsessions building upon each other. (Those subsessions are also related to various other mechanisms like resource management; security; and forming an application infrastructure from generic components, as touched on in my post on a hurdish X implementation -- that's the "missing link" aspect mentioned at the start. I'll handle those in other posts.)
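As a rough illustration of such a tree, consider the sketch below. The `Subsession` class, the resource names and the `skip` parameter are assumptions for the sake of the example, not a design; real subsessions would of course carry actual state.

```python
from dataclasses import dataclass, field


@dataclass
class Subsession:
    name: str
    needs: set = field(default_factory=set)       # resources required
    children: list = field(default_factory=list)  # sessions building on this one

    def add(self, child):
        self.children.append(child)
        return child


def restore(session, available, skip=frozenset(), restored=None):
    """Restore a session and its subtree, in order, and return the names restored.

    A session is left out if a resource it needs is missing or if the user
    asked to skip it; leaving it out also leaves out everything built on it.
    """
    if restored is None:
        restored = []
    if not session.needs <= available or session.name in skip:
        return restored
    restored.append(session.name)
    for child in session.children:
        restore(child, available, skip, restored)
    return restored


core = Subsession("core")
net = core.add(Subsession("networking", {"nic"}))
net.add(Subsession("news reading"))
win = core.add(Subsession("windowing", {"display"}))
win.add(Subsession("music player", {"soundcard"}))
```

On a machine without a network card, `restore(core, {"display", "soundcard"})` gives `["core", "windowing", "music player"]`: networking is dropped, and the news reading session that builds on it goes with it. With all hardware present, `restore(core, {"nic", "display", "soundcard"}, skip={"music player"})` brings up everything except the music player.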

So between dysfunctional retrofitted session management and sledge hammer total persistence, we really want a session management approach that is not quite transparent, but fully integrated and consistently implemented throughout the system.

The implications of these requirements, as well as ideas on how it could be implemented, I'll leave out for now, as this is already getting quite lengthy...


Ognyan Kulev said...

Yes, long and broad article. I was particularly pleased that you've considered transparent updating of server processes by passing state from old to new version :-)

All this reminds me of another problem that has to be solved in the Hurd: RPC context. For example, a transaction handle can be put in the context so that all file system operations are in one transaction, and possibly only in file systems that support transactions. What I'm interested in is not hard-coding a transaction handle in the context, but a general context for different things.

The EROS Guy said...

Well, gosh. Lots to say about EROS persistence. Where to start...

To update software, you basically need to create a new object in place of the old one. If you want to preserve state information in the process, you need to implement this explicitly

Well, yes. In normal systems we would do this by writing a file and rereading it. We do this with databases all the time. It can work the same way in a persistent system.

The difference is that in a persistent system, it is relatively easy to hot-update the component without taking the application down at all.

I don't agree that this is explicit session management. Session management is a recording of relationships between components. Upgrade deals entirely with local state within a single process.

In any case, I don't see that persistence added any new complexity here.

In many situations, you do not actually want to, or can't, restore the system in the exact same state.

If you don't want things, kill them. This also deals with misconfiguration. As to system configuration changes, we've had that solved for years, and it isn't a big deal.

Actually, misconfiguration in EROS tends to be less hazardous than in conventional systems, because it is easy to test new versions of subsystems in their own environments, isolated from the one that you are relying on.

Maybe you want to carry a part of your environment along to different machines

This is a real issue, and we don't have a good answer to it. We can certainly do everything that you can do in, say, Linux today. But the session stuff really does seem to be inextricably tied to a machine.

On the other hand, would you rather the system didn't recover when you kick the plug out of the wall?

What you describe as "sledge hammer persistence" works, and it works efficiently and well. The problem with trees of subsessions is that they are almost impossible to build efficiently. For an explanation of why, have a look at how EROS actually implements persistence in our paper

marcus said...

You say that the Hurd has passive translators to keep some system configuration persistent. But passive translators are a fundamentally flawed concept.

Here is the question you need to ask: Quick, what is the execution environment of a passive translator when it is started up? What are the environment variables it sees? What is the root directory and the current directory?

Passive translators are broken because they will inevitably run in the wrong execution environment. Here is a specific attack: A program runs in a chrooted filesystem. It installs a passive firmlink translator within the chroot that points to /. When that passive translator is activated by the chrooted task, what happens? If the passive translator is (accidentally) activated by a task outside the chroot (for example, an ls -l or find invocation), what happens?

Right. The root directory of the firmlink will be the _real_ root directory of the parent filesystem, not the chrooted root. Active translators do not have this problem as they inherit their environment from the task installing the translator, ensuring proper authorization.

But active translators can not survive a system shut down or machine reboot, as they can not be stored to disk. Passive translators can be stored to disk, but they are inherently insecure.