Monday, December 26, 2011

Focus considered harmful

Most of us are used to the concept of focus in modern desktop UIs - there's usually one app that has it. But what is it? Essentially the app that has the magical focus is the target of all indirect input.

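The way I see it, there are two classes of input devices in a common desktop system: direct and indirect. Direct input devices can readily and immediately target any app, while indirect ones cannot. Mice are direct input: a click lands on whatever is under the pointer. Keyboards are not: a keystroke carries no position, so something else has to decide where it goes.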
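
A minimal sketch of that split, in Python. Everything here (the `App` and `Desktop` classes, the rectangle layout, the rule that a click also moves focus, and the rule that a keystroke with no focused app is dropped) is a made-up model for illustration, not how any real window system is implemented:

```python
from dataclasses import dataclass, field


@dataclass
class App:
    name: str
    # Screen rectangle as (left, top, right, bottom); hypothetical layout.
    rect: tuple
    received: list = field(default_factory=list)

    def contains(self, x, y):
        left, top, right, bottom = self.rect
        return left <= x < right and top <= y < bottom


class Desktop:
    def __init__(self, apps):
        self.apps = apps      # ordered front to back
        self.focused = None   # the target of all indirect input

    def click(self, x, y):
        """Direct input: the event's own position picks the target."""
        for app in self.apps:
            if app.contains(x, y):
                app.received.append(("click", x, y))
                # Assumption: clicking an app also gives it focus.
                self.focused = app
                return app
        return None

    def key(self, char):
        """Indirect input: no position, so it goes to whoever has focus."""
        if self.focused is None:
            # Assumption: with no focused app the keystroke is dropped.
            return None
        self.focused.received.append(("key", char))
        return self.focused


editor = App("editor", (0, 0, 500, 500))
browser = App("browser", (500, 0, 1000, 500))
desk = Desktop([editor, browser])

desk.key("a")          # nobody has focus yet, so this goes nowhere
desk.click(600, 100)   # lands on the browser and focuses it
desk.key("b")          # now delivered to the browser
```

The click needs no help to find its target; the keystroke depends entirely on the hidden `focused` state, which is the thing newbies never see.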

The key problem with input is that you need to decide who gets it. This was not a problem with the OSs of yore where there was just one running app at all times, nor is it a problem with the new tablets because they have only one "full screen" app running at all times. Desktops, however, with their multitasking, non-full-screen apps, have the problem in spades. Now that I use the Mac, I'm ticked off that cmd-tab doesn't necessarily mean you'll get the app front-n-center. You see, if you'd minimized the app before, you have to add opt to the cmd-tab just before the app you want so that it will pull up the minimized window as well. Otherwise it's just the menu on the top for you. Why? I'm sure there's a really super important Mac/Apple reason for this behavior but it pisses the heck out of me.

Enter the newbie, exemplified by, in this case, my parents. My parents cannot figure focus out. They will painstakingly fire up an app and start typing in the hope that Windows will magically figure out that wherever they're looking (or were looking before glancing down at the keyboard) is where they want the keystrokes to go. Of course, this is without having brought the cursor to the text field in the first place, so all that input does is bupkis (where do keystrokes go when typed without focus, I wonder?). And even if they had actually remembered to "make sure the blinking line is at the user name field", McAfee decides to pop up a message that silently steals the focus away.

Focus is an implementation artifact and a leaky abstraction at that. The currently available approaches are:

  1. It's a non-issue: This is what tablets do - you don't need to worry about who gets the input if it's completely clear that there's always only one such target.
  2. No more indirect input: Again, this is what touch devices do; you don't need to pick a target when the act of providing input actually selects the target.
  3. Provide a switcher: This is the Alt/Cmd-tab solution. Provide a way to switch the target of the indirect input. However, as we've seen above, this has problems.

Would it help to be even more explicit about who gets the keystrokes than just highlighting the app's title bar? Would it help to avoid modal (and non-modal, but focus-stealing) dialogs?

Is there a better way to tell the user that any indirect input will be sent to a particular app? Or, alternatively, is there a seamless way of "figuring out" the intended target? I can think of two promising avenues:
  1. Computer vision is slowly getting better, to the point where head tracking using webcams is actually possible. Can this reach the resolution required to figure out which app the user is actually looking at? There will be lots of false positives down this path, but it's definitely a head-on approach to the problem.
  2. No more leaks in the abstraction: This is a cheaper alternative to computer vision and essentially involves ensuring that:
    • There are separate "user controlled" and "system use" spaces, and never the twain shall overlap.
    • User controlled space will always have one app that has focus. It will lose focus only when the user wills it to. The action to switch from one app to another is atomic and results in the other app getting focus. This will be true even on tiling window managers.
      • Dialogs and other such UI elements that halt the app before user interaction will work as before.
    • In addition to serially switching the target of input using Cmd/Alt-tab, there will be a key sequence to jump straight to a particular app - maybe like the Ctrl-Alt-F1 through F7 sequence that Linuxes have for the default ttys.
    • System use space will be read only, except for some suitable way to scroll through messages.
Todo: figure out how to deal with system messages that require immediate attention.
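
The rules in avenue 2 can be sketched as a tiny focus manager in Python. This is only a toy model under stated assumptions: the `FocusManager` class and its method names are hypothetical, apps are plain strings, and dialogs and the urgent-message Todo are not modelled at all.

```python
class FocusManager:
    """Toy model: one focused app in user space, a separate system log."""

    def __init__(self, apps):
        if not apps:
            raise ValueError("user controlled space needs at least one app")
        self._apps = list(apps)
        self._focused = 0        # user space always has exactly one focused app
        self._system_log = []    # system use space, kept apart from user space

    @property
    def focused(self):
        return self._apps[self._focused]

    def cycle(self):
        """Serial switch, like Cmd/Alt-tab: move focus to the next app."""
        self._focused = (self._focused + 1) % len(self._apps)
        return self.focused

    def switch_to(self, slot):
        """Direct switch, like Ctrl-Alt-F1..F7: slot numbers start at 1."""
        if not 1 <= slot <= len(self._apps):
            return self.focused  # unknown slot: focus stays where it was
        self._focused = slot - 1
        return self.focused

    def system_message(self, text):
        """System messages are appended to the log and never touch focus."""
        self._system_log.append(text)

    def messages(self):
        """Read-only view of the system use space."""
        return tuple(self._system_log)

    def key(self, char):
        """Indirect input always goes to the one focused app."""
        return (self.focused, char)


fm = FocusManager(["editor", "browser", "mail"])
fm.system_message("virus definitions updated")  # focus does not move
fm.key("x")        # still delivered to "editor"
fm.switch_to(3)    # user jumps straight to "mail"
fm.cycle()         # user cycles on, wrapping around to "editor"
```

The point of the design is that `system_message` has no way to reach `_focused`: the only methods that change focus are the two the user triggers, and each one finishes with exactly one app holding it.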

1 comment:

Unknown said...

Interesting insights on the focus metaphor in desktops. Couple of thoughts:

- twm makes the window which has the mouse pointer as the active window. this used to annoy the crap out of me until i got used to it.

- "There're separate "user controlled space" and "system use" space and never the twain shall overlap." - the android notification system uses this approach. ios used to have a modal (and rather annoying approach) to notifications but it looks like ios5 adopted the notifications approach similar to that of android. the notifications in honeycomb (and now ics), are quite nicely implemented on the tablet.

- windows had a somewhat leaky approach to notifications with the "yellow bubble" in the bottom right but the bigger problem was that apps chose to not use it entirely. the system should somehow enforce this via abstractions.

- honeycomb/ics have a slightly better approach to the cmd-tab by using a separate menu item to show currently running apps with their active screenshot.

as a side note, a lot of people are migrating their parents to tablets (specifically the ipad) to get around the weirdness of desktop/laptop os's.