Unite and Conquer

Returning to my POSIX-level driver proposal: Many people expressed concerns that the standard filesystem semantics I want to use for (most) driver communication are too slow and not really appropriate.

The proposal explicitly mentions the possibility of using shortcuts wherever we experience serious performance problems with FS semantics. Recently I had some initial discussion with Peter de Schrijver (p2-mate) at freenode.net#hug about what exactly the problems with the POSIX interfaces are. It turns out that the drivers mostly have some generic, quite similar requirements. This means that instead of creating specific shortcut protocols only for a few extremely demanding drivers, we should probably focus on a few generally useful extensions. I like this :-)

For one, drivers often serve a large number of very small requests. Pure POSIX semantics would introduce considerable overhead here, because each individual request needs to establish a session (open()/close()) unless it already has a permanent one, do addressing and other setup (seek()/ioctl()), and finally do the actual data transfer (read()/write()). In POSIX semantics, each of those steps needs an extra RPC, plus some bookkeeping overhead. (If we want to avoid ioctl()s for the setup -- because they aren't very transparent, killing the major advantage of filesystem semantics -- there is even more overhead, as we need to introduce an additional file descriptor for setting request options.) A more appropriate protocol would wrap all of this in a single RPC.
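To make the round-trip arithmetic concrete, here is a minimal Python sketch; it is a toy model, not Hurd code. It assumes a hypothetical in-process CountingServer where every method call stands for one RPC, and compares the open/seek/read/close sequence with a hypothetical combined read_at() call. The combined call is similar in spirit to POSIX pread(), which folds the seek into the read, but here the session setup is folded in as well. ioctl()-based setup is left out, so the real POSIX-style cost would be higher still.

```python
class CountingServer:
    """Toy stand-in for a driver translator; every method call counts as one RPC."""

    def __init__(self, data: bytes):
        self.data = data
        self.rpcs = 0
        self.offsets = {}  # open "file descriptor" -> current offset
        self.next_fd = 0

    # Classic POSIX-style interface: one RPC per step.
    def open(self):
        self.rpcs += 1
        fd = self.next_fd
        self.next_fd += 1
        self.offsets[fd] = 0
        return fd

    def seek(self, fd, offset):
        self.rpcs += 1
        self.offsets[fd] = offset

    def read(self, fd, length):
        self.rpcs += 1
        start = self.offsets[fd]
        chunk = self.data[start:start + length]
        self.offsets[fd] = start + len(chunk)
        return chunk

    def close(self, fd):
        self.rpcs += 1
        del self.offsets[fd]

    # Hypothetical combined request: session, addressing and transfer in one RPC.
    def read_at(self, offset, length):
        self.rpcs += 1
        return self.data[offset:offset + length]


def posix_style(server, requests):
    out = []
    for offset, length in requests:
        fd = server.open()
        server.seek(fd, offset)
        out.append(server.read(fd, length))
        server.close(fd)
    return out


def combined_style(server, requests):
    return [server.read_at(offset, length) for offset, length in requests]


data = bytes(range(256)) * 16                 # 4096 bytes of sample "device" data
requests = [(i * 4, 4) for i in range(1000)]  # 1000 tiny 4-byte reads
a, b = CountingServer(data), CountingServer(data)
assert posix_style(a, requests) == combined_style(b, requests)
print(a.rpcs, b.rpcs)  # 4000 1000
```

With 1000 four-byte requests, the POSIX-style sequence needs 4000 round trips against 1000 for the combined call. How much that matters in practice depends on the actual per-RPC cost, which this model does not measure.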

Well, there is an important observation to make here: The optimization is useful not because we are dealing with drivers, but because the drivers have a specific requirement (efficient handling of many small requests) that could also emerge in any other program -- I'm pretty sure many higher-level translators will profit from such an optimization just as much. This confirms my view that drivers aren't fundamentally different from other programs, and that we really want generic extensions to the POSIX/Hurd interfaces rather than a special interface for drivers.

I guess we could implement this extension mostly transparently, with servers implementing it optionally as an optimization. Maybe we could even handle it in the FS server libraries, so translators not aware of the shortcut will just get the single RPC presented as a number of independent callbacks. I'm not sure about the exact implications, though.
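One way such a library-side fallback could look, as a Python sketch under assumed names: dispatch_read_at() plays the role of the FS server library receiving one combined RPC. If the translator provides a hypothetical on_read_at() handler, the request goes straight to it; otherwise the library replays it as the usual open/seek/read/close callbacks. None of these names come from the actual Hurd libraries.

```python
class Translator:
    """Toy translator that only implements the classic POSIX-style callbacks."""

    def __init__(self, data: bytes):
        self.data = data
        self.calls = []  # records which callbacks were invoked

    def on_open(self):
        self.calls.append("open")
        return {"offset": 0}  # per-session state

    def on_seek(self, handle, offset):
        self.calls.append("seek")
        handle["offset"] = offset

    def on_read(self, handle, length):
        self.calls.append("read")
        start = handle["offset"]
        chunk = self.data[start:start + length]
        handle["offset"] = start + len(chunk)
        return chunk

    def on_close(self, handle):
        self.calls.append("close")


class FastTranslator(Translator):
    """Toy translator that also implements the hypothetical combined handler."""

    def on_read_at(self, offset, length):
        self.calls.append("read_at")
        return self.data[offset:offset + length]


def dispatch_read_at(translator, offset, length):
    """Library-side handling of one combined "read at offset" RPC."""
    fast = getattr(translator, "on_read_at", None)
    if fast is not None:
        return fast(offset, length)  # translator knows the shortcut
    # Fallback: present the single RPC as independent callbacks.
    handle = translator.on_open()
    try:
        translator.on_seek(handle, offset)
        return translator.on_read(handle, length)
    finally:
        translator.on_close(handle)


plain = Translator(b"hello world")
print(dispatch_read_at(plain, 6, 5), plain.calls)  # b'world' ['open', 'seek', 'read', 'close']
```

The point of the design is that only translators that care about the shortcut need to know it exists; everything else keeps its existing callbacks.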

Another property of drivers is that they typically work with data that is structured in blocks, which doesn't fit well with the POSIX assumption that all data is represented as sequential streams. Serializing the data (using XML or whatever) would be tremendously expensive, considering that drivers usually don't process the data at all, but only work with some status information and pass the data on. What we really want is a memory container for the actual data, and a second, independent memory container with per-block status information (including block boundaries in the case of variable-sized blocks).
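A minimal Python sketch of the two-container idea, assuming a hypothetical layout: one flat buffer holds the payload, and a separate list of BlockStatus records holds each block's offset, length and a hypothetical error code. A driver can update the status of a block without ever parsing or copying the payload, and variable-sized blocks need no delimiters inside the data.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class BlockStatus:
    offset: int     # where the block starts within the data container
    length: int     # block length (needed for variable-sized blocks)
    error: int = 0  # hypothetical per-block status code, 0 = OK


@dataclass
class BlockTransfer:
    data: bytearray            # container 1: raw payload, never parsed
    status: List[BlockStatus]  # container 2: per-block metadata


def pack(blocks):
    """Build both containers from a list of variable-sized blocks."""
    data = bytearray()
    status = []
    for block in blocks:
        status.append(BlockStatus(len(data), len(block)))
        data += block
    return BlockTransfer(data, status)


def mark_failed(transfer, index, code):
    """Touches only the status container; the payload is left alone."""
    transfer.status[index].error = code


def blocks(transfer):
    """Recover individual blocks using the boundaries from the status container."""
    view = memoryview(transfer.data)
    return [bytes(view[s.offset:s.offset + s.length]) for s in transfer.status]


t = pack([b"ab", b"cde", b"f"])
mark_failed(t, 1, 5)
print([(s.offset, s.length, s.error) for s in t.status])  # [(0, 2, 0), (2, 3, 5), (5, 1, 0)]
```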

Again, we have a requirement that isn't really specific to drivers at all: Quite a lot of high-level programs actually work with similar non-serial data, and would greatly profit from a generic extension allowing several memory containers to be passed in a single read()/write() RPC.
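To illustrate why a single byte stream is not enough, here is a Python sketch with a hypothetical MultiContainerMessage carrying several named containers in one write. Even POSIX writev(), which gathers several buffers in one call, delivers them to a file or stream as one contiguous sequence of bytes, so the receiver can no longer tell where one container ends. The hypothetical write_multi() handler keeps them apart instead.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class MultiContainerMessage:
    """Hypothetical write() payload carrying several independent containers."""
    containers: Dict[str, bytes] = field(default_factory=dict)


def as_stream(msg):
    # What a plain byte-stream write (or a gather write like writev()) delivers:
    # the buffers end up concatenated, and the receiver loses their boundaries.
    return b"".join(msg.containers.values())


class Receiver:
    def __init__(self):
        self.received = {}

    def write_multi(self, msg):
        # Each container arrives intact and separately addressable.
        for name, buf in msg.containers.items():
            self.received[name] = buf
        return sum(len(buf) for buf in msg.containers.values())  # bytes accepted


msg = MultiContainerMessage({"data": b"\x01\x02\x03", "status": b"\x00"})
print(as_stream(msg))  # b'\x01\x02\x03\x00'
```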

Another possibility I'm considering, instead of adding a number of independent extensions handling the different aspects that need optimization, is simply creating some generic method for merging several POSIX calls into a single RPC. This would be extremely flexible and powerful; however, I'm not sure it could be implemented without being too awkward. Also, having to handle the requests in a very generic fashion might make complexity skyrocket and nullify some of the performance gains we are striving for.
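A rough Python sketch of what such a generic batching mechanism might involve, assuming a hypothetical batch() RPC that takes a list of POSIX-like calls. Because later calls need values produced by earlier ones (the file descriptor from open(), for instance), the sketch has to introduce a ("result", i) placeholder that refers to the i-th result in the same batch.

```python
class FileServer:
    """Toy file server exposing one hypothetical batch() RPC."""

    def __init__(self, files):
        self.files = files      # name -> contents
        self.open_files = {}    # fd -> [name, position]
        self.next_fd = 3
        self.rpcs = 0

    def _open(self, name):
        fd = self.next_fd
        self.next_fd += 1
        self.open_files[fd] = [name, 0]
        return fd

    def _seek(self, fd, offset):
        self.open_files[fd][1] = offset
        return offset

    def _read(self, fd, length):
        name, pos = self.open_files[fd]
        chunk = self.files[name][pos:pos + length]
        self.open_files[fd][1] = pos + len(chunk)
        return chunk

    def _close(self, fd):
        del self.open_files[fd]
        return 0

    def batch(self, ops):
        """One RPC carrying a whole list of POSIX-like calls.

        An argument of the form ("result", i) refers to the return value
        of the i-th call in the same batch, e.g. the fd returned by open().
        """
        self.rpcs += 1
        handlers = {"open": self._open, "seek": self._seek,
                    "read": self._read, "close": self._close}
        results = []
        for op, *args in ops:
            resolved = [results[a[1]] if isinstance(a, tuple) and a[0] == "result" else a
                        for a in args]
            results.append(handlers[op](*resolved))
        return results


srv = FileServer({"disk": b"0123456789"})
fd = ("result", 0)  # placeholder for the fd returned by the first call
print(srv.batch([("open", "disk"), ("seek", fd, 4), ("read", fd, 3), ("close", fd)]))
# [3, 4, b'456', 0]
```

Even this toy version hints at the complexity concern: it resolves arguments generically for every call, and it has no answer yet for what should happen when a call in the middle of the batch fails.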
