First, on Monday last week, I read in the news that the U.K. government had announced the creation of a new Institute for Web Science. Prime Minister Gordon Brown said £30 million would be used to create the institute, which is meant to help make "'public data' public" and to act as a bridge between research and business.
Then, this Monday, I read Tim O'Reilly's excellent article on "the State of the Internet Operating System," in which he argued that the way we actually organize computing systems in the world is completely different from the way we teach computing architectures. He is right. Think about what it takes to let a user type a few keywords and get back, say, pictures of a moose: there are many moving parts that must all work together seamlessly. These components include server farms, IP and caching networks, parallel large-scale data analysis, image and facial recognition algorithms, and perhaps location-aware data services. O'Reilly says the components of the "Internet Operating System" include search engines, multimedia access, systems relating to user identity and your social network, payment systems, advertising, activity streams, and location. How many universities can say they have experts in all of these areas? In computer science departments, these topics are often covered only in advanced topics courses or, worse, not at all.
What do these two pieces of news tell us about the state of the world? There is wide recognition that the Web has changed the world.
"Well, duh!" you say. But there is more....
I read that Rensselaer Polytechnic Institute (RPI) has created the nation's first undergraduate degree in Web Science. The news release said that the students in this interdisciplinary degree program will investigate issues on the Web relating to "security, trust, privacy, content value." RPI President Shirley Ann Jackson was quoted as saying, "With these new degree programs, students and researchers here at Rensselaer will help to usher in a new era of understanding and study of the Web from its social and economic impacts to the evolution of data." Amen!
When I got my degrees, the university taught compilers, complexity theory, AI, algorithms, operating systems, and databases. While those courses enabled me to learn new techniques such as MapReduce, large-scale analytics, and visualization, I often feel my education equipped me to prepare for the Web world but did not actually prepare me for it. How I wish my undergraduate curriculum had included required studies in security and privacy; large-scale data analytics; advanced data-mining techniques; detailed study of recommendation algorithms and systems; and HCI research methods such as remote user studies, eye tracking, and survey methods.
Am I saying that compilers and theory don't matter anymore? Of course not. They are still excellent academic research pursuits in their own domains, but there may be new topics that should now make it into the curriculum to better prepare students for a new world. Building the new social Web, which is ever changing, requires a different set of skills! The world has changed, and so should the computer science curriculum.
Sorry, but I couldn't disagree more. This is not about computer science; it's about how we engineer modern systems.
Computer science is about the fundamentals of creating systems to process information: the basic abstract notions, logic, algebra, algorithms, computability, and computer organization.
It's clear we need to introduce new topics like virtualization, cloud computing, the semantic Web, and social networks, but that doesn't change the fundamentals of how computers work one bit.
I went to university in the late 1980s, and I've been able to cope with all the changes of the past 20 years thanks to a solid grounding in the fundamentals.
People need a future-proof education, not a perishable one.
You're making exactly the same arguments that some mathematicians and electrical engineers used to make about computer science: that the fundamentals of their fields more than covered the new research directions. Heck, physics folks used to argue, and some still do, that everything derives from their field, so every other field is redundant.
As computer scientists, we have a choice. We can either teach our students compiler design and floating-point implementations, or we can teach them machine learning and large-scale data analytics. We can either fold in this new research direction and expand our field, or we can say it is an application area and let it grow organically until it is so big that it splits off into a new department and leaves CS behind. (In fact, I often wonder if it is already too late for us to embrace and extend!)
Ed H. Chi
Investing heavily in software testing can be difficult to justify, particularly for a startup company. For every line of production code written, an organization should, at minimum, invest an equivalent amount of developer time and lines of code to test it. The extra effort means that features can take longer to develop and deliver to customers. Under the constant pressure to "Deliver Now!" it is very easy to skimp on testing in an effort to launch sooner. The real difficulty is that most developers are good enough that they do only minimal testing to make sure their software works as expected, deploy it, and move on.
Companies can actually develop software this way for a long time. However, as soon as the software grows beyond a basic level of complexity, the bugs that creep back in through regressions or untested use cases will make the application unstable. At this point the company is compelled either to (a) stop development and add the regression tests it failed to write earlier, or (b) continue a bad pattern in which a team of software testers chases regression bugs and adds test suites for the previous version of the software while other developers concurrently create the next set of features (and bugs). Both patterns are flawed because fixing the problem this way takes longer than it would have taken had the tests been created continuously.
Test-driven development, an Extreme Programming practice, is arguably one of the best ways to help ensure that the software you create always has a truss to test it. The basic methodology is to create the test suite first, watch it fail, and then write the methods that make it pass. This helps ensure that there is at least one test case for each method, written by the developer who wrote that method. By developing the test harness concurrently with the software, you place the responsibility for testing on the developer who created the feature. The company saves time in overall development because the tests are created by the people who know best what needs to be tested, and the software can be tested continuously on every source code commit, allowing for deployment on demand.
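To make the test-first cycle concrete, here is a minimal sketch in Python (the post names no language, so the choice is an assumption). The function paginate and its behavior are hypothetical, invented only for illustration: the tests are written first and would error out with a NameError before paginate exists; the implementation is then written to make them pass.

```python
import unittest


# Step 1 (red): write the tests first. Before paginate() is defined,
# every test below errors out with a NameError, which is the expected
# "failing" starting point of test-driven development.
class PaginateTest(unittest.TestCase):
    def test_first_page(self):
        self.assertEqual(paginate([1, 2, 3, 4, 5], page=1, per_page=2), [1, 2])

    def test_last_partial_page(self):
        self.assertEqual(paginate([1, 2, 3, 4, 5], page=3, per_page=2), [5])

    def test_page_past_end_is_empty(self):
        self.assertEqual(paginate([1, 2, 3], page=5, per_page=2), [])

    def test_rejects_nonpositive_page(self):
        with self.assertRaises(ValueError):
            paginate([1, 2, 3], page=0, per_page=2)


# Step 2 (green): write just enough code to make the tests pass.
# paginate() is a hypothetical example function, not from the post.
def paginate(items, page, per_page):
    """Return the 1-indexed `page` of `items`, `per_page` items at a time."""
    if page < 1 or per_page < 1:
        raise ValueError("page and per_page must be positive")
    start = (page - 1) * per_page
    return items[start:start + per_page]
```

Saved to a file (say, a hypothetical paginate_example.py), the tests can be run with python -m unittest paginate_example.py. In a real project they would usually live in their own test module and run on every commit, which is what makes deployment on demand possible.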
This leaves one critical hole in the testing process: How good is the test suite the developer created? This is the point where I put on my pragmatist's hat. If your organization already has the discipline to test every method of its software, you should probably ask the developer to test just the "basic" behavior and extend the test suite whenever a new bug emerges. The purpose of the test harness is to make sure the software works given its known assumptions, and retesting those assumptions on every check-in and deployment helps build confidence that you are deploying correct software. The most dramatic bugs I have seen, with or without a test harness, have generally happened when an unanticipated event occurred, and testing against the unknowable is difficult.
My favorite story about an unanticipated bug that a test harness would have helped with occurred early in my tenure at Amazon. It was a bug I affectionately call "Karmic Revenge." The site was crashing on a subset of Amazon's book catalog, and disturbingly frequently on the search results page. I was called in to identify the bug. (For the coders in the audience: I discovered that a data structure we were using was referencing an array at offset -1, which was causing the software to crash.) The catalog software had recently changed so that the number -1 was a flag meaning no data was available. Unfortunately, this knowledge hadn't propagated to the search software. The "Karmic Revenge" was that the book that exposed the problem was about "Memory Management in C." Additionally, for the superstitious, the bug was identified, debugged, and fixed on Friday, February 13, 1998. Some bugs you just can't forget.
Had a test harness been in place, perhaps this bug would never have made it to the production site. Or, if the bug had made it to the site, then once it was found, a new test would have been added to the harness to prevent future occurrences. However, that structure existed neither in the code nor at the organizational level. Better development patterns will always reduce the likelihood of an error like this occurring and recurring.
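The "Karmic Revenge" bug and the regression test it called for can be sketched as follows. This is a hypothetical Python reconstruction, not Amazon's code: the names NO_DATA, snippet_offset, and search_snippet, and the dictionary standing in for the catalog, are all assumptions. It shows a guard that keeps the -1 "no data" flag from being used as an array offset, plus a regression test that would fail if that guard were ever removed.

```python
import unittest

# Hypothetical convention mirroring the story: the catalog returns -1
# as a flag meaning "no data is available for this item".
NO_DATA = -1


def snippet_offset(catalog, book_id):
    """Return the index of a book's search snippet, or NO_DATA."""
    return catalog.get(book_id, NO_DATA)


def search_snippet(book_id, catalog, snippets):
    """Return the snippet for the search results page, or None.

    The NO_DATA check is the fix. Without it, snippets[-1] is read:
    in C that is an out-of-bounds access that can crash the process;
    in Python it silently returns the *last* snippet, which is also wrong.
    """
    offset = snippet_offset(catalog, book_id)
    if offset == NO_DATA:
        return None
    return snippets[offset]


class SearchSnippetRegressionTest(unittest.TestCase):
    """Regression test added once the bug was found, so it cannot recur."""

    def setUp(self):
        self.snippets = ["first snippet", "second snippet"]
        # "no-data-book" is deliberately absent from the catalog.
        self.catalog = {"book-a": 0, "book-b": 1}

    def test_known_book_returns_its_snippet(self):
        self.assertEqual(
            search_snippet("book-b", self.catalog, self.snippets),
            "second snippet",
        )

    def test_no_data_flag_is_never_used_as_an_index(self):
        self.assertIsNone(
            search_snippet("no-data-book", self.catalog, self.snippets)
        )
```

Because the test encodes the catalog's new -1 convention, any later change to the search code that forgets the flag fails on the next check-in instead of on the production site.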
©2010 ACM 0001-0782/10/0900 $10.00
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.