I was surprised to see Jakob Nielsen’s conclusions about the state of Web usability and the need for UI standards (Jan. 1998, p. 65), and having just come to the opposite set of conclusions in my column "View Source: Lessons from the Web’s Massively Parallel Development" in ACM’s networker (Dec. 1998, p. 56), I’d like to respond. Nielsen and I are not in disagreement that many, even most, Web sites have poor UI design, but we disagree about causes and remedies. My belief, outlined in my networker column, is that most of the things Nielsen identifies as weaknesses are in fact strengths, and most of the solutions he proposes would damage the very things that make the Web so successful. People have been speculating about distributed hypertext systems since 1945, but the Web is the first one to achieve global force. Under those circumstances, we have to assume the Web is doing something right.
Nielsen commits a design-guru error when he assumes that the Web is successful in spite of its poor design. The Web is successful because of its poor design. It is the lack of enforced conformity to standards other than HTTP headers and HTML validity that has enabled it to grow so quickly in both volume and functionality. Furthermore, attempts to "fix" the Web would damage the very source of its success. This is simply a replay of the "worse is better" argument in the Lisp-vs.-C debates, where the weaker method actually led to stronger results (see www.jwz.org/worse-is-better.html). The Web grew as it did precisely because it does one thing well—linking files—without wasting any time trying to prevent designers from making design mistakes.
As an illustration, consider Nielsen’s assertion that had Apple "played its cards better, HyperCard might have grown into the universal status now owned by the Web." There are no scenarios in which this could have come to pass, for three reasons. The first two: HyperCard relies on a programming language, the Web does not. If you’re talking about something that is going to be used by millions of people, the phrase "easy-to-use programming language" is an oxymoron; any programming at all is too high a hurdle and will limit distribution. Furthermore HyperCard is software, while HTTP and HTML are platform- and language-independent standards, and it is easier to spread standards than software. The third and damning reason: HyperCard strongly enforces standards of UI design, frustrating beginning designers while lowering the speed with which easy tasks can be accomplished. Before convincing yourself that HyperCard is a runner-up for the category of universal hypertext system, ask whether your mother would find it easier to make a HyperCard stack or a Web page.
Nielsen’s argument is filled with ideas of centralization and force; we hear that "to ensure interaction consistency across all sites it will be necessary to promote a single set of design conventions" with nary a word about the loss of experimentation and innovation this will entail. Or that "the main problem lies in getting Web sites to actually obey any usability rules." Obey? Hasn’t Nielsen noticed that the lack of central authority is a core reason for the Internet’s success?
Nielsen underestimates designers and users; his argument assumes that people are either too stupid to know bad interfaces from good ones or too impatient to try telling one from the other, and so must be shielded from the worst design errors by fiat. This might be true for an operating system interface, with its expense and high switching costs, but it is not true on the Web, where switching costs are just a click away. In fact, the Web is almost perfectly suited to allow the users themselves to reward good design and punish bad design, and is the solution to the UI problems Nielsen discusses.
Moreover, Nielsen mentions and then ignores the very idea that undermines his vision of central coordination and control: Design Darwinism. "Survival of the fittest" is a fourth entry in his list of possible solutions to bad design, and the only one that doesn’t rely on forcing designers to operate according to central standards. The Web is a giant market, with many buyers (users) and many sellers (sites), each acting out of self-interest. We know how to increase quality in this kind of environment, and it isn’t central planning. It’s competition. If users really do prefer well-designed sites to badly designed ones (and I share Nielsen’s sense that they do), then making their preferences more apparent to one another, and by extension to the designers and owners of Web sites, will be a faster and better way to create the move to quality.
Clay Shirky
New York, NY
Jakob Nielsen made some insightful points in his article "User Interface Directions for the Web" and some contentious ones.
As Chair of the North American chapter of The XML/EDI Group (see www.xmledi.com), I can say that the members of our group are working hard to promote the use of XML to significantly ameliorate the effects Nielsen discusses.
XML provides the ability to switch processing and content manipulation from the server and the network to the client browser. This is not a new lesson but one that does require more sophisticated approaches. Both major browsers (Netscape 5.0 and Microsoft Internet Explorer 5.0) are set to provide XML-enabled site capabilities for their users.
In the area of site preparation, XML holds considerable promise in reducing start-up time and cost. By allowing the use of templates, styles, and form scripts, Web content can be generated from metadata. Thus the end user can be directed to create sophisticated sites without needing specialist skills. Put simply, the Web site preparation tools are getting much better and smarter.
However, I found Nielsen’s discussion and rendering of interesting people and of applying "bozo filters" flawed and offensive. The questions often posed by neophytes are the most interesting, insightful, and challenging. The whole point of the Web experience, one could argue, is that it provides a level interface, regardless of one’s background. One could also argue that a neophyte asking annoying and trivial questions shows that the designer has singularly failed to prepare good materials, FAQs, and other aids to allow assimilation of the discussion.
I’m not sure how far XML can ameliorate such interchanges, but at least it can ensure they take place without intolerable delays.
David Webber
Carrollton, MD
Jakob Nielsen Responds:
It is somewhat of a moot point to argue over what HyperCard might have been, but it was indeed possible to create HyperCard stacks without having to touch HyperTalk (the scripting language inside HyperCard). There were several levels of authorship involving nothing but drawing and typing on the screen and filling in simple dialog boxes. And that’s without contemplating possible authoring tools that could have emerged if Apple had done better promoting HyperCard.
HyperCard was a very creative environment and proof that many interesting designs can emerge while still maintaining the benefits from following human interface guidelines. For screenshots and examples, I refer readers to a review of three large HyperCard stacks I wrote in 1989 (see www.useit.com/papers/cdrom.html). Except for being black-and-white and small-screen (technical limitations of the first releases of HyperCard), these designs are better than most of the current Web.
Design Darwinism will work but is a very expensive way to achieve usability. Why go all the way to shipping products only to have to throw away 99% of the work? Users definitely do recognize quality and aggregate at the better sites, but that’s a difficult way to learn if you are on the losing end. Guidelines, standards, and systematic usability engineering methodology can dramatically enhance a site’s chance of success. My advice to anybody responsible for a company’s Internet strategy is to maximize the site’s chance for survival rather than launching a random design and hoping it will be the one to become a hit. An analogy: If you are going to be reincarnated as an animal, would you prefer being a random mutation (a few of which become new species) or would you rather be an elephant?
The Internet is strongly based on standards for everything from email headers to IP numbers. It would immediately fall apart if each vendor used its own formats. User interfaces are slightly different because humans are more adaptable than routers. People can learn anything if they try hard enough. The question is merely why they should have to suffer so.
The experience from all other user interfaces is that consistency and predictability enhance usability:
- In mainframe days, IBM had big projects to get everybody to use the same function keys.
- The biggest success of the Macintosh was that users could move between applications with little training because the trade press forced compliance (any design deviations would be reviewed as "not Mac-like").
- Books are always printed with the table of contents in front, the index in back, and sequential page numbers on the pages in the middle.
Why would the Web be any different? In fact, it’s not. The lessons from 20 years of usability experience continue to hold, even as implementation technology changes. At any given time, usability is improved by following standards and conforming to user expectations, but over time, these standards and expectations have to evolve as the technology improves and new interface ideas are invented. This is what we said in a book I edited about user interface standards in 1989. Ten years later I say the same thing because these design lessons are based on fundamental human needs and characteristics.
In response to Webber: It is sad but true: With enough people and Web sites clamoring for your attention, you do need a bozo filter to focus your attention on the most valuable ones.
This does not mean that a commercial customer service operation should filter out newbie questions. On the contrary, such questions are indeed good feedback to reveal weaknesses in design, documentation, and Web support pages. But if I want to have a discussion about, say, strategic trends for Web design, I shouldn’t have to wade through thousands of questions asking whether to design for 640 or 800 pixels.
HomeNet Study
I deeply and profoundly disagree with four key points of the HomeNet Group’s research ("On Site," Dec. 1998, p. 21).
One point is the assertion that a study across any group without a control group is meaningful. I seriously doubt this. While HomeNet blithely asserts that the results cannot have been affected by the age, economy, school, and home situation, or the home football team’s performance, they don’t know and can’t prove it. So they simply define the problem away by saying their early measures and later measures across the same group are statistically significant.
The second point is that the differences measured were in any way significant. I don’t mean "statistically significant," in which mathematically one says, "I can, with extreme certainty, say these results correlate measurement A and measurement B"; I mean socially significant. The example I cite most is this (I may be off by one unit here, but the numbers are approximately correct): is it socially significant to the individual that in one month before Internet usage, said individual interacted with 24 people, and in one month during the experiment said individual interacted with 23 people? Can you convince me in any way that interacting with one fewer individual in the course of a month can affect my health, happiness, or psychological profile? I firmly believe the other measures were equally meaningless, in that they were incredibly tiny differences, and cannot account for "test awareness" or other possible factors. For example, if you are asked to measure how often you feel depressed, you will (a) become more conscious, over time, of such feelings and (b) be prepared to attribute any slight discouragement to a "feeling of depression" as asked for on the form. With no comparable control group measured, we have no idea whether HomeNet’s conclusion or either of the reasons I gave is the valid explanation.
There is a truly bizarre set of assumptions about the metrics of "psychological and social well-being." I was amazed to see that a decrease of 24 to 23 interactions in the course of a month is an indicator of a significant decrease in psychological and social well-being. In fact, although I think of myself as a quite happy person, content with life, I find it difficult to enumerate 24 people I’ve interacted with in the last year, let alone in the last month. (I can count, at most, 10 social interactions I’ve had in the last month, and that includes spouse, parents, and parents-in-law, and that month is fairly typical.) When I confronted Kraut with this question at a public lecture, I was told "30 years of social research proves that people with large social circles are happier and better-adjusted than those with small social circles." I then observed that this was certainly an average result, and there are almost certainly tens of millions of people with small social circles who are quite happy, but my objection was dismissed as meaningless, since the evidence of my own life clearly contradicted a national average and was statistically unimportant. Unfortunately for social scientists, people are not statistics; what keeps me quite happy may not keep another person happy.
Because there was no control group, there was no measure of the effects of other, highly interesting, self-absorbing activities people could engage in that could have identical effects. For example, from age 12 through 16, I was deeply interested in photography and had my own darkroom. I was also interested in electronics and had my own workshop/lab. During college I was deeply interested in computers and worked all available hours as a part-time programmer. Was the printing of photographs or the construction of electronic circuits socially harmful to me because I spent time in the basement doing this and not "interacting with a larger social group"? Unless you measure the kind of people who win science scholarships across similar time periods, it is not clear the Internet could not be replaced by building model airplanes, ham-radio communication, preparing for an Olympic sports team, or playing the piano or chess, with equally devastating effects on the social well-being of an individual (a 4% decrease in their remembered social circle).
Therefore, I assert, although the study was scientifically meticulous, its conclusions are essentially meaningless in terms of actual people.
Joseph M. Newcomer
Pittsburgh, PA
The HomeNet Group Responds:
Newcomer’s letter shows either a deep misunderstanding of the logic of causal analysis or a failure to carefully read our research report before criticizing it. Our complete research, reported in the American Psychologist (Vol. 53, No. 9, 1,017-1,031), showed that during their first year or two online, as people used the Internet more, they became reliably less socially engaged and more lonely and depressed. Newcomer challenges these findings by arguing that because our research design did not include a control group, it could not distinguish the Internet as a cause of these changes from other events occurring during the same time period. However, our conclusions were not based on a simple comparison of a sample before and after they got online, but rather on a comparison of heavy and light users of the Internet over time. As a result, factors both the heavy users and light users were equally exposed to cannot explain differences between them. Thus, Pittsburgh weather, the local economy, or sensitivity to depressive thoughts brought on by completing an initial questionnaire cannot account for our results, because both groups were exposed to these potential influences.
Newcomer is correct in noting we did not conduct a true experiment, with an experimental group randomly given access to the Internet and a control group randomly prevented from using it. An experiment would be a stronger test of the causal impact of the Internet, because it would ensure, within statistical limits, that the two groups were initially equivalent. Our published research used statistical techniques to control for pre-existing differences between the heavy and light users. It also included longitudinal data to demonstrate that the groups did not initially differ in their social and psychological well-being. We are currently conducting a true experiment, in which people who recently bought a computer are randomly given an Internet account or not given one. But even this research will not rule out alternative explanations, because some of those who are given Internet access will not use it, and others in the control group will subscribe to an ISP on their own.
Newcomer notes that the effects of using the Internet were small and clouded by large individual variation, as we acknowledged. Our results show changes in the outcome variables from using the Internet on the order of 5% to 15%, rather than the orders-of-magnitude effects that would convince Newcomer the results were "socially significant." The impact of the Internet is small, primarily because humans are homeostatic systems, with many competing influences producing stability. Marriage, psychotherapy, and college education don’t produce orders-of-magnitude change in psychological and social well-being, so it is unreasonable to expect that a few hours online per week will do so. However, a small change experienced by the millions of people using the Internet can still be very socially significant. We understand that individual differences are real and congratulate Newcomer for being happy with a small social circle. But his happiness is as irrelevant to the causal link between social support and happiness as is the existence of a 100-year old smoker to the causal link between smoking and cancer.
The importance of our results is not in the size of the effect, but in its direction. We were shocked that a technology we and others expected to be positive or, at least, benign would instead have any negative impact on psychological and social well-being. For example, we find it disturbing that people who are heavy users of the Internet and email, its major application, report keeping up with fewer others. We should treat the direction of the effect as both a scientific challenge to understand why it occurred and a technical challenge to eliminate it. We do not know whether the hours spent online with the Internet have better or worse effects than hours watching TV, practicing piano, preparing electronics experiments, or training for an Olympic sport, but we certainly should aim to do better than TV.
Evidence for RST claims
I read the Technical Opinion "Myths about Rough Set Theory" (Nov. 1998, p. 102) searching for the scientific evidence for what RST advocates claim as their superiority over other approaches: objectivity. I suggest the authors provide references to such claims. Computers are used to process polluted or incomplete data. Therefore competent users are needed to evaluate input data and interpret results produced by an algorithm-driven robot.
W.M. Jaworski
Montreal, Quebec
Authors Respond:
Our "Technical Opinion" was inspired by the following excerpt from the Communications article "Rough Sets," (Nov. 1995, p. 89): "Rough set theory is objective—for a given information table, quantities of corresponding approximations are computed. On the other hand, the Dempster-Shafer theory is subjective—it is assumed that values of belief (or plausibility) are given by an expert."
We apologize that our response was rather late, but the subjectivity issue is of fundamental importance (comparable with an infamous nuclear fusion at room temperature in physics), as pointed out in the letter. The quote contains literary truth: "For a given information table," which makes the first statement valid. However, when the same logic is applied to the second statement, it becomes: "Dempster theory is objective—for a given belief (or plausibility) function." Conversely, claiming that "Dempster-Shafer theory is subjective—it is assumed that values of belief (or plausibility) are given by an expert" has the logical implication of "Rough set theory is subjective because the information table construction requires an expert’s opinion (e.g., selection of attributes, split of attribute values)."
We acknowledge that RST is an important tool for knowledge discovery and data mining. Numerous applications attest to its utility. However, the unsubstantiated objectivity claim may badly impede further applications by overblown expectations, while RST stripped of the objectivity claim will broaden its application, since there is immense need for (rather difficult) processing of subjective assessments and judgements. Unfortunately computer science insists on pleasing reluctant users with more and more tools for processing objective data—a rather scarce commodity, as shown in our T.O.