
Rebuilding For Eternity

Researchers use computer vision techniques to preserve culturally significant sites as high-resolution 3D models.
A photo of the face of a statue at Bayon temple from the library of the Bayon Digital Archive Project, led by Katsushi Ikeuchi of the University of Tokyo.

Buildings collapse. Wind and rain beat them, temperatures cycle from freezing to blistering, and random strikes of lightning threaten sudden obliteration. Those in wet climes face water rot; in the desert, ceaseless wear by dust and sand. Even more potent are the human challenges: war, fire, and deliberate destruction. No earthly structure is safe, from the Ancient Library of Alexandria to the Twin Towers of New York City.

But digital representations can survive such dangers, capturing structures forevermore. Digitization provides other benefits, such as the ability to “visit” a structure remotely, to examine its otherwise inaccessible details, or to observe how it’s changed over time. Further, digitized structures invite researchers to apply computer-based analytic tools to draw out new discoveries in such fields as archaeology, history, and architecture.

Improvements in digital storage, network access, and processing power over the past 10 years have encouraged researchers to capture ever-larger sites. Now, two distinct methods enable high-resolution, 3D digitization of structures as large as a city street or a multi-acre historical site. One method uses high-end laser scanning equipment for accurate digitization to the sub-millimeter level; the other uses video or photo collections as the basis for analysis and reconstruction.


Mining Tourists’ Photos

By now, many computer-savvy people have encountered site reconstruction in the form of Microsoft’s Bing Maps, Google Earth, or Google Maps’ Street View feature. All three are the result of mapping programs that were enhanced with real-world imagery captured through satellite photography, aerial photography, or at street level.

These efforts, while impressive, produce imagery that shows sites from only one or two aspects. Such views are sufficient for most purposes, such as helping people find, recognize, and “tour” remote locations, even though they are inadequate for 3D reconstructions on their own. Google Earth also lets users add handcrafted 3D models to its terrains through the Google SketchUp program, a feature appropriate for both existing and historical buildings. One unusually extensive SketchUp initiative placed recreations of more than 6,000 ancient Roman buildings in their original positions in Rome.

Another consumer-level phenomenon that contributes to high-end 3D modeling is crowdsourced photos, available through such sources as the photo-sharing Web site Flickr. That was the basis for a reconstruction of present-day Rome, led by Sameer Agarwal, an acting assistant professor of computer science and engineering at the University of Washington, and described in the paper “Building Rome in a Day.” Agarwal and colleagues selected 150,000 photos from more than two-and-a-half million returned from a search for “Rome” and “Roma,” and applied existing Structure from Motion techniques to meld multiple views of the same structure into a unified 3D model. Central to this project—and to most others that agglomerate photos—is a feature detector that recognizes image elements. This project uses the popular Scale-Invariant Feature Transform algorithm; others include Speeded Up Robust Features and Maximally Stable Extremal Regions.
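The article does not spell out how detected features are paired across photos. As a minimal sketch, assuming the nearest-neighbor search and distance-ratio test commonly used with Scale-Invariant Feature Transform descriptors, the matching step might look like the following. The function name, the toy three-number descriptors (real descriptors have 128 values), and the 0.8 threshold are illustrative assumptions, not details taken from the Rome project.

```python
import math

def match_descriptors(desc_a, desc_b, ratio=0.8):
    """Return (i, j) pairs where desc_a[i] is matched to desc_b[j].

    A match is kept only when the nearest descriptor in the other photo
    is clearly closer than the second nearest (the ratio test).
    """
    matches = []
    for i, da in enumerate(desc_a):
        # Rank every descriptor in the other photo by Euclidean distance.
        ranked = sorted((math.dist(da, db), j) for j, db in enumerate(desc_b))
        if len(ranked) < 2:
            continue  # the ratio test needs two candidates
        (best, j), (second, _) = ranked[0], ranked[1]
        if best < ratio * second:
            matches.append((i, j))
    return matches

# Toy descriptors (hypothetical values) for two photos of the same structure.
photo_a = [(0.0, 0.0, 1.0), (1.0, 0.0, 0.0), (0.5, 0.5, 0.5)]
photo_b = [(0.9, 0.1, 0.0), (0.0, 0.1, 1.0), (0.0, 1.0, 0.0)]

print(match_descriptors(photo_a, photo_b))  # [(0, 1), (1, 0)]
```

The third descriptor in photo_a is roughly equidistant from everything in photo_b, so the ratio test discards it rather than risk a false match. Ambiguous features of this kind are common on repetitive facades.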


The Rome project was unusual in its scale, and the University of Washington team used a variety of existing and novel approaches to match photos, place them in relation to each other, and ultimately meld them back together into a consistent geometry. The latter steps led to two innovations. First, the team computationally reduced the number of photographs to a “skeletal set” to simplify and confirm site geometry; second, newly developed software optimized the results over a distributed parallel network of about 500 cores. The result was a completion time, from photo matching to reconstruction, of less than 22 hours. Previous techniques would have taken more than 11 days to complete just the photo-matching process.

Much of the work on this project, and the Photo Tourism project by three of the same researchers, led to production of the consumer-level program Microsoft Photosynth and an open-source version, Bundler. According to Microsoft Partner Architect Blaise Aguera y Arcas, approximately 16% of recent, user-contributed Photosynth mosaics are of artwork or heritage-related sites.

The photographed environment for the Rome project was fairly static. Many of the sites were of long-standing structures, and nearly all photos were taken in the five years since the launch of Flickr. But 3D modeling techniques can also enhance historic photographs, as the paper “Inferring Temporal Order of Images from 3D Structure,” by Grant Schindler, a Ph.D. candidate at the Georgia Institute of Technology, and colleagues, demonstrated with the use of a collection of 212 photos of downtown Atlanta taken over a period of 144 years. 3D analysis of the photos determined structure locations and then, based on the pattern of buildings appearing and disappearing, a constraint-satisfaction algorithm put them in the correct order. (An interactive applet enables people to “time travel” through the images.)
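The ordering idea can be illustrated with a toy example. The sketch below is not the paper's algorithm. It assumes a simplified constraint (each building stands for one unbroken period, so the photos showing it must sit next to each other in time) and tests every possible ordering by brute force. The photo names, building names, and function names are all hypothetical.

```python
from itertools import permutations

def is_consistent(order, sightings):
    """True if every building's sightings form one unbroken run in `order`."""
    buildings = set().union(*sightings.values())
    for building in buildings:
        seen = [building in sightings[photo] for photo in order]
        first = seen.index(True)
        last = len(seen) - 1 - seen[::-1].index(True)
        # A gap between first and last sighting would mean the building
        # vanished and came back, which the constraint forbids.
        if not all(seen[first:last + 1]):
            return False
    return True

def consistent_orders(sightings):
    """Brute-force search: keep every ordering that breaks no constraint."""
    return [order for order in permutations(sorted(sightings))
            if is_consistent(order, sightings)]

# Hypothetical data: which buildings each undated photo shows.
sightings = {
    "photo_a": {"depot", "mill"},
    "photo_b": {"mill", "tower"},
    "photo_c": {"depot"},
    "photo_d": {"tower", "arena"},
}

for order in consistent_orders(sightings):
    print(order)
```

Only two orderings survive: c, a, b, d and its mirror image. The constraints alone cannot tell forward from backward in time, so at least one known date is needed to fix the direction. Brute force is also practical only for a handful of photos, because the number of orderings grows factorially.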


Laser Scanning and Photometry

Photo-based modeling’s big advantage is that many people can create its source material. But it has its failings, most notably when trying to coordinate photo collections that depict large blank spaces. Depth information is mostly interpolated from multiple views of the same object. If those views don’t exist, the image remains flat. Also, quality, while improved with more cameras, tends to be uneven. And photo-based modeling is only effective in places that are accessible to a camera.
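Why a second view matters can be seen in the simplest two-camera case. The sketch below assumes an idealized, rectified pair of cameras separated by a known baseline, which is a textbook simplification rather than the setup used in the projects described here. The function name and the numbers are illustrative.

```python
def depth_from_disparity(focal_px, baseline_m, disparity_px):
    """Depth in meters of a point seen by two side-by-side cameras.

    focal_px     -- focal length in pixels
    baseline_m   -- distance between the two camera centers in meters
    disparity_px -- horizontal shift of the point between the two images
    """
    if disparity_px <= 0:
        # No shift between views means no parallax, so depth is unrecoverable.
        raise ValueError("depth needs a positive disparity between two views")
    return focal_px * baseline_m / disparity_px

# A feature that shifts 25 pixels between views is closer than one that shifts 5.
print(depth_from_disparity(1000, 0.5, 25))  # 20.0
print(depth_from_disparity(1000, 0.5, 5))   # 100.0
```

A point that appears in only one photo has no measurable shift at all, which is the situation in which the reconstruction stays flat.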

Laser scanning combined with photometry offers a much more reliable solution for applications that need it. That’s the combination Katsushi Ikeuchi, a professor in the Institute of Industrial Science at the University of Tokyo, and colleagues used in capturing the Bayon temple, a complex of sacred structures in Angkor Thom, Cambodia, that covers more than five acres. Ikeuchi’s team used laser-measurement devices mounted on scaffolding, ground-level tripods, and a cherry picker to determine surface depth in places where those conventional approaches could reach. To scan down narrow corridors, it added ladder-mounted climbing scanners, and for points high on the 40-meter-tall temple, it scanned from a tethered balloon, in both cases developing software to compensate for the scanners’ sometimes-unpredictable motion.

The Bayon Digital Archive Project gathered more than 10,000 range images totaling more than 250 gigabytes of data, measuring the entire site to a resolution of at least one centimeter. Because the data set was so big, several new algorithms were needed to match and align the points into a 3D mesh.

Instead of the “next-neighbor” alignment algorithm previously used, the Bayon temple team developed a two-step process that quickly identified matching pairs at the time of data capture, thereby converting the ultimate calculation from N² to N complexity. Later, the points were aligned simultaneously on a parallel processor cluster in a week of processor time, a 14-fold improvement over what the team says existing algorithms would have needed.
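To see what that change in complexity means at this scale, here is a rough count under one reading of the claim: without pairing information, every range image is a candidate match for every other, while pairs recorded at capture time grow only in proportion to the number of images. The figure of six overlapping neighbors per scan is a hypothetical value chosen for illustration, and the function names are not from the project.

```python
def all_pairs(n):
    """Candidate pairs when every image must be tested against every other."""
    return n * (n - 1) // 2

def recorded_pairs(n, neighbors_per_scan):
    """Pairs to align when each scan's overlapping neighbors are already known.

    Each pair is shared by two scans, hence the division by two.
    """
    return n * neighbors_per_scan // 2

n_images = 10_000  # the article's "more than 10,000 range images"

print(all_pairs(n_images))          # 49995000
print(recorded_pairs(n_images, 6))  # 30000
```

Under these assumptions, the work drops from about 50 million candidate pairs to tens of thousands, and the gap widens as more images are added.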

The final model was important in two regards. First, it captured details of the 800-year-old temple, which is in danger of collapse. Second, it made comprehensive computer-aided scrutiny of the site possible—a benefit that bore fruit when the team was able to definitively categorize the temple’s 173 carved stone faces in a new and significant way.

Similar results came from laser-scanned models at another historical site, the mausoleum of Henry VII, a 14th-century King of Germany. The monument comprises many parts that have been moved, lost, changed, and amended over the intervening 700 years, and the project’s goals were both reconstructive and educational. As Clara Baracchini, an officer at the Superintendency for Environmental, Architectural, Artistic, and Historical Heritage of the Provinces of Pisa, Livorno, Lucca, and Massa Carrara, and colleagues noted, the project could be used “to teach medieval sculpture to students and to let them try to reconstruct the original monument from the disassembled components.”

But while laser scanning can create models of unsurpassed detail, its main problem is that, as Georgia Tech Associate Professor Frank Dellaert put it, “You can’t do it in the past.” On the other hand, photo-based approaches lack some of laser scanning’s advantages. The two methods differ greatly in technique, cost, and purpose, but both have already proven themselves invaluable for historians and researchers.



Figure. A tourist’s photo, left, of the face of a statue at Bayon temple posted on the Flickr Web site. At right is an image of the same face in the library of the Bayon Digital Archive Project, led by Katsushi Ikeuchi of the University of Tokyo.


Further Reading

    Xiao, J., Fang, T., Tan, P., Zhao, P., Ofek, E., Quan, L.
    Image-based façade modeling, ACM Transactions on Graphics 27, 5, December 2008.

    Agarwal, S., Snavely, N., Simon, I., Seitz, S.M., Szeliski, R.
    Building Rome in a day. International Conference on Computer Vision, Kyoto, Japan, 2009.

    Schindler, G., Dellaert, F., Kang, S.B.
    Inferring temporal order of images from 3D structure. International Conference on Computer Vision and Pattern Recognition, Minneapolis, MN, 2007.

    Ikeuchi, K. and Miyazaki, D.
    Digitally Archiving Cultural Objects.
    Springer, New York, NY, 2008.

    Ikeuchi, K.
    UTokyo's e-Heritage Project: 3D Modeling of Heritage Sites.

    Baracchini, C., Brogi, A., Callieri, M., Capitani, L., Cignoni, P., Fasano, A., Montani, C., Nenci, C., Novello, R.P., Pingi, P., Ponchio, F., Scopigno, R.
    Digital reconstruction of the Arrigo VII funerary complex, VAST 2004.

