In the past decade, computer vision (CV), machine learning (ML), and artificial intelligence (AI) have been applied to problems in the history and interpretation of fine-art paintings and drawings. Such automated methods provide new tools for connoisseurs and art scholars and have resolved art-historical debates that proved resistant to traditional art-historical methods. Nevertheless, immense challenges and opportunities remain for the application of AI in the study of art, specifically on problems that are inadequately addressed by mainstream AI research. For this reason, art analysis serves as a grand challenge to image-based AI.
Key Insights
Fine-art images are some of the most sophisticated, complex and valuable images ever created.
Such images present entire classes of problems not well addressed by mainstream AI research.
Fine-art paintings and drawings thus serve as a grand challenge to AI.
The computational tools, used with deep understanding of the relevant art-historical context, empower art scholars to answer outstanding questions, pose new classes of questions, and develop richer interpretive strategies.
Computer Vision in the Study of Fine-Art Paintings and Drawings
Advances in imaging technology and especially CV and AI have, for decades, benefited nearly every scientific and engineering discipline, including medicine, geology, biology, chemistry, and psychology. Consider that works of art bear the most memorable and important images ever created by humans, and many works themselves are exceedingly valuable—not just financially but culturally. It is natural, then, that computer methods, properly guided by scholars’ knowledge of history and context, should be of service in the humanistic studies of art as well. In fact, in the past few years, rigorous automated image analysis has assisted some art historians, critics, and connoisseurs in their scholarly studies of fine-art paintings and drawings.
Such rigorous computer image analysis of fine art is rather different from traditional “digital humanities,” which has generally concentrated on digital methods of capture and display but where the fundamental analyses and interpretations are still performed by human scholars and connoisseurs.38 As we shall see, however, artworks pose several profound problems that require sophisticated methods beyond those in traditional digital humanities. These represent a grand challenge to AI, well beyond what is generally addressed in research in digital humanities programs and even mainstream artificial intelligence.
This article explores three manifestations of research into computer image-based analysis of fine art. First, the article offers an example of how computer image analysis can help art scholars by expanding traditional non-automatic approaches. Then, we refer to some debates and issues in art scholarship that have been aided significantly—and in some cases solved—thanks in large part to computer methods. The article then turns to broad problems in image-based intelligence that arise in art analysis that are inadequately addressed by mainstream AI research and hence present great opportunities for research. The article concludes with thoughts about future directions.
Computer-based Tools for Image Analysis of Art
Computer image analysis has automated several tasks traditionally performed by connoisseurs and art historians, such as the analysis of composition in landscapes,22 of brushstrokes and other marks,23 of canvas weave,24 lighting,21 and general properties of style.9 Such tools do not replace the connoisseur but instead empower art scholars and extend the analyses they can perform, including at scales impractical for efforts solely “by eye.” These digital tools, when used with an awareness of the art historical context, can provide rigor and consistency to analyses and enable the analyses of large-scale trends.
Consider just one example, the analysis of pose in portraiture—a formal property that artists set for numerous compositional and expressive ends.39 Art scholars have traditionally used rather subjective, informal, and coarse descriptions of portrait poses—such as frontal, profile, or three-quarters view—which they determine entirely by eye. Such traditional methods scale poorly and preclude detailed analyses of thousands of portraits, as might reveal trends in portraiture over centuries or even over a single prolific portraitist’s career.
Computer image analysis offers a powerful tool for portrait pose analysis. Figure 1 shows how a pose can be described by three rotation angles as well as a “production model” of the projection of a generic head onto the artist’s picture plane. Two methods, the geometric Perspective-n-Point algorithm17 and the deep neural network (DNN) Fine-grained Structure Aggregation network (FSA-Net),40 can estimate these pose angles automatically, the former from the locations of extracted visual keypoints such as the corners of the eyes, the corners of the mouth, and the tip of the nose. These algorithms prove remarkably robust to variations in artistic style.
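To make the keypoint route concrete, the following is a minimal sketch, in Python with OpenCV, of recovering yaw, pitch, and roll from a handful of facial keypoints via the Perspective-n-Point algorithm; the generic 3D head coordinates, the 2D keypoint locations, and the simple pinhole camera model are placeholder assumptions, not the pipeline used in the study cited above.

```python
# Illustrative sketch: estimating head pose (yaw, pitch, roll) from facial
# keypoints with the Perspective-n-Point algorithm in OpenCV.  The 3D generic
# head model and the 2D landmark coordinates below are placeholders; in
# practice the 2D points come from a landmark detector run on the portrait.
import cv2
import numpy as np

# 3D reference points on a generic head (rough millimeter coordinates).
model_points = np.array([
    [0.0,    0.0,    0.0],     # tip of the nose
    [0.0,   -63.6,  -12.5],    # chin
    [-43.3,  32.7,  -26.0],    # outer corner of one eye
    [43.3,   32.7,  -26.0],    # outer corner of the other eye
    [-28.9, -28.9,  -24.1],    # one mouth corner
    [28.9,  -28.9,  -24.1],    # other mouth corner
], dtype=np.float64)

# 2D keypoints located in the painting (pixel coordinates; placeholder values).
image_points = np.array([
    [359, 391], [399, 561], [337, 297],
    [513, 301], [345, 465], [453, 469],
], dtype=np.float64)

# Simple pinhole model of the artist's "picture plane" (placeholder focal length).
w, h = 800, 1000
camera_matrix = np.array([[w, 0, w / 2],
                          [0, w, h / 2],
                          [0, 0, 1]], dtype=np.float64)
dist_coeffs = np.zeros((4, 1))        # assume no lens distortion

ok, rvec, tvec = cv2.solvePnP(model_points, image_points,
                              camera_matrix, dist_coeffs,
                              flags=cv2.SOLVEPNP_ITERATIVE)

# Convert the rotation vector to Euler angles in degrees
# (assuming the convention R = Rz(roll) * Ry(yaw) * Rx(pitch)).
R, _ = cv2.Rodrigues(rvec)
yaw   = np.degrees(np.arctan2(-R[2, 0], np.hypot(R[2, 1], R[2, 2])))
pitch = np.degrees(np.arctan2(R[2, 1], R[2, 2]))
roll  = np.degrees(np.arctan2(R[1, 0], R[0, 0]))
print(f"yaw={yaw:.1f}  pitch={pitch:.1f}  roll={roll:.1f}")
```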
Such automated tools permit the rapid and accurate analysis of large corpora of portraits. Figure 2 shows computed distributions of the absolute value of the roll angle of 11,000 portraits from a half millennium of Western canon and Japanese art, here grouped by art movement. Yaw, tilt, and measures such as the location of the head within the frame of the artwork can be similarly estimated, all within four minutes on a desktop computer.4
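The corpus-level summary itself is straightforward once per-portrait angles are in hand. A sketch of the grouping and box-whisker plotting follows, assuming a hypothetical table of estimated poses; the file name and column names are illustrative only.

```python
# Sketch of the corpus-level summary described above: a box-whisker plot of
# |roll| grouped by art movement.  "portrait_poses.csv", "movement", and
# "roll_deg" are hypothetical stand-ins for whatever the pose pipeline produced.
import pandas as pd
import matplotlib.pyplot as plt

df = pd.read_csv("portrait_poses.csv")          # one row per portrait
df["abs_roll"] = df["roll_deg"].abs()

# Order movements by median |roll| so the plot reads left to right.
order = df.groupby("movement")["abs_roll"].median().sort_values().index
groups = [df.loc[df["movement"] == m, "abs_roll"] for m in order]

plt.boxplot(groups)
plt.xticks(range(1, len(order) + 1), list(order), rotation=90)
plt.ylabel("|roll| (degrees)")
plt.tight_layout()
plt.show()
```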
There is rich information to be interpreted in this and related plots, but consider just the differences between Japanese Ukiyo-e (“Floating world”) art and so-called Naïve or Primitivist art. The box-whisker plots in Figure 2 confirm that Ukiyo-e portraits often exhibit large rotation angles, associated with the mie poses of kabuki actors at dramatic moments in plays or of geishas in seductive poses. By contrast, in Naïve or Primitivist portraits, poses are rather “simple,” favoring a direct forward gaze. Such pronounced differences are evident in the representative works in Figure 3.
Artists often take characteristic bodily stances in the studio when executing self-portraits. For instance, a right-handed artist will place the easel to the right of the plane mirror so as to better reach it with the dominant hand, and the artist’s head is often rotated somewhat toward the canvas. The head of a left-handed artist will be rotated in the opposite direction. The distribution of (signed) yaw angles will differ between right- and left-handed self-portraitists; indeed, such differences are evident in the computational data. That is, the computational results reveal which artists were likely left-handed.4
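A minimal sketch of how such a comparison might be run, assuming a hypothetical table of signed yaw estimates labeled by each self-portraitist's documented handedness; the file name, column names, and the choice of a Welch t-test are illustrative assumptions, not the published analysis.

```python
# Sketch of the handedness comparison described above: compare distributions
# of signed yaw angles for self-portraits by artists of known handedness.
# The data file and its columns are hypothetical placeholders.
import pandas as pd
from scipy import stats

df = pd.read_csv("self_portrait_poses.csv")
right = df.loc[df["handedness"] == "right", "yaw_deg"]
left  = df.loc[df["handedness"] == "left",  "yaw_deg"]

# Welch's t-test for a difference in mean signed yaw between the two groups.
t, p = stats.ttest_ind(right, left, equal_var=False)
print(f"mean yaw (right-handed): {right.mean():+.1f} deg")
print(f"mean yaw (left-handed):  {left.mean():+.1f} deg")
print(f"Welch t = {t:.2f}, p = {p:.3g}")
```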
The automated pose-estimation algorithms extend previous work, providing a foundation for a range of future studies in art analysis.18 For instance, one can incorporate face-based gender recognition as a pre-processing step to explore gender-specific trends in poses.25 Likewise, with possible style-based tailoring of face-recognition algorithms, one might explore trends in non-Western art, particularly Asian art, such as portraits from the Song Dynasty41 or from the Renaissance.13
Deep neural networks for art analysis. Deep networks developed for natural photographs have been modified in architecture and refined through transfer training to perform well on the segmentation of art images across a wealth of styles and media, a task vastly more difficult than in natural photographs.14 Deep networks trained with large corpora of art images and contributed textual summaries of human responses can accurately predict human emotional responses to novel artworks.1 Deep networks have also been applied to the extremely challenging and important task of art authentication.8,28 Most of these methods rely on image information alone and thus have not been accepted by the art community, because authentication also rests on studies of materials (pigment, canvas, and so on), provenance (the documentary record of ownership and display), iconography, and more. As such, there are deep challenges to AI in incorporating such diverse forms of information for authentication, which will be discussed again later in this article.
Computer Methods in Resolving Debates in Art Analysis
Rigorous computer image analysis has not merely served as a tool easing traditional interpretive tasks but has also resolved art historical debates for which methods from traditional art scholarship proved incomplete or inadequate. In 2000, the celebrated British and American artist David Hockney, later aided by thin-film physicist Charles Falco, claimed to have “proven” that some artists as early as 1420 secretly used optical devices during the execution of some of their works, and more broadly that the use of optics led to an enhanced realism or “optical look” of the “ars nova” or new art of that time.15,16
Part of the evidence they adduced rested on Hockney’s claim that the complex chandelier in Jan van Eyck’s Arnolfini portrait was “in perfect perspective,” which (Hockney claimed) implied van Eyck used optics to draw it. Sophisticated homographic analysis of the image of the chandelier, based on an ACM Dissertation Award-winning thesis,6 showed that in fact the chandelier deviated significantly from “perfect perspective,” thereby rebutting the optical proposal, at least for that painting.5,6,29
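The following toy sketch conveys the flavor of such a geometric consistency test, though it greatly simplifies the published analysis: if two arms of a (roughly planar) chandelier were rendered in a single consistent perspective, their corresponding points should be related by one planar homography, and the residuals of the best-fit homography measure the deviation. The point coordinates below are placeholders.

```python
# Simplified stand-in for a homographic consistency check between two arms
# of the chandelier.  Large reprojection residuals indicate deviation from a
# single consistent perspective mapping.  Coordinates are placeholders.
import cv2
import numpy as np

arm_a = np.array([[112, 340], [160, 322], [205, 318], [248, 330],
                  [270, 365], [262, 401]], dtype=np.float64)
arm_b = np.array([[612, 338], [566, 320], [520, 317], [478, 331],
                  [455, 364], [462, 400]], dtype=np.float64)

# Least-squares homography fit over all correspondences (method = 0).
H, _ = cv2.findHomography(arm_a.reshape(-1, 1, 2), arm_b.reshape(-1, 1, 2), 0)
proj = cv2.perspectiveTransform(arm_a.reshape(-1, 1, 2), H).reshape(-1, 2)

rms = np.sqrt(np.mean(np.sum((proj - arm_b) ** 2, axis=1)))
print(f"RMS reprojection error: {rms:.2f} pixels")
```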
Hockney also adduced Georges de la Tour’s Christ in the Carpenter’s Shop as evidence that this artist used optical projections. In brief, his claim rests on his reading of the work as indicating that the light source was “outside the [frame of the] picture.” Application of rigorous occluding-contour algorithms and maximum-likelihood estimation of the location of the source of illumination, based on five classes of image information, showed definitively that Hockney’s claim was false and that the source of the illumination was instead the depicted candle. Such rigorous methods thus refuted Hockney’s argument.36
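The core geometric idea can be sketched briefly, with the caveat that the published analysis combined several further classes of evidence: along an occluding contour of a roughly Lambertian surface, the surface normal lies in the picture plane, so the observed intensity is approximately a linear function of the in-plane light direction, which can then be recovered by least squares. The contour normals and intensities below are placeholder values.

```python
# Sketch of contour-based illumination estimation: along an occluding contour
# of a roughly Lambertian surface, intensity I ~ n.L + ambient, so the
# in-plane light direction L can be recovered by linear least squares.
# The normals and intensities are placeholder samples.
import numpy as np

# Unit normals (nx, ny) at sample points along an occluding contour, and the
# observed image intensity at each point.
normals = np.array([[1.00, 0.00], [0.92, 0.38], [0.71, 0.71],
                    [0.38, 0.92], [0.00, 1.00], [-0.38, 0.92]])
intensity = np.array([0.95, 0.90, 0.74, 0.52, 0.33, 0.18])

# Solve  [nx  ny  1] . [Lx, Ly, A]^T  ~  I  for the light direction (Lx, Ly)
# and an ambient term A.
M = np.hstack([normals, np.ones((len(normals), 1))])
(Lx, Ly, ambient), *_ = np.linalg.lstsq(M, intensity, rcond=None)

angle = np.degrees(np.arctan2(Ly, Lx))
print(f"estimated light direction: {angle:.1f} deg from the x-axis, "
      f"ambient = {ambient:.2f}")
```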
Hockney and Falco claimed their most definitive evidence for Lorenzo Lotto’s putative use of optics arose in Husband and Wife, where they claimed geometric and other “anomalies” in the depicted carpet “proved” that Lotto used a small, concave mirror to project an image during the execution of this work.16 Rigorous analysis using computer ray-tracing software showed instead that the proposed setup simply could not work as claimed. Moreover, when the required corrections to their setup were made, the computer ray-tracing evidence contradicted the optical claim.27
Broadly speaking, computer image analysis supported and confirmed the unanimous independent scholarly rejection of Hockney’s theory.35 Indeed, partially in response to the rigorous computer-assisted analyses, Hockney himself has retreated from his projection claim. Similarly, engineer and self-described non-artist Tim Jenison argued that Johannes Vermeer secretly used a catadioptric telescope during the creation of his sublime interior genre paintings,20 but computer-enhanced analyses (and facts of historical and material culture) refuted Jenison’s claim.31,32,37
Problems in Art Analysis that Resist Techniques from Mainstream AI Research
Most traditional AI algorithms have been developed for analyzing natural photographs, videos, and specialized images such as medical x-radiographs. These images have properties that, at base, derive from the laws governing the natural world, whether the everyday physics of objects and light or the electromagnetic radiation underlying medical and remote-sensed images. Art images differ in numerous ways from those just listed and present several deep challenges that are inadequately addressed by current mainstream research, including the following:
Style. There is no unanimous agreement concerning the definition of style in fine-art paintings. Nevertheless, style certainly refers to formal properties of color, composition, marks, brush strokes, and so on, considered as distinct from the nominal subject matter. Portraits, to take just one genre, can be rendered in styles that are highly realistic, expressive, or abstract, in unnatural colors, and executed in a wide variety of marks, brush strokes, and innumerable personal styles. The variety of styles found in paintings is vastly greater than that found in the natural photographs that have commanded the attention of the AI community.
Style presents several challenges to computer analysis, among them recognizing a style, identifying an artist (“author”) from the style of a work, and performing image segmentation, object recognition, and scene analysis across styles. Works such as that shown in Figure 4 present great challenges in these regards, to human viewers and algorithms alike.
Small data sets. As mentioned earlier in this article, an important recent development in image analysis is the use of DNNs trained with hundreds of millions or more natural photographs. Such approaches have provided human-level or superhuman-level performance on several image-analysis tasks. Unfortunately, such systems generally perform quite poorly on stylized paintings and drawings, for instance the work in Figure 4.
The rather obvious approach would be to train deep networks with large corpora of representative art images. Alas, this approach generally cannot be followed directly because there simply are not enough representative art images. After all, the Spanish master Pablo Picasso—one of the most prolific artists of all time—leaves us merely 13,500 paintings. Johannes Vermeer leaves us just 33, each considered a distinctive masterpiece, such as shown in Figure 5.
One way to overcome the relative paucity of art images is to transfer-train a DNN, that is, to present art images as additional training patterns to a network previously trained with a very large number of photographs. This approach has proven of only modest value on problems such as image segmentation and is unlikely to provide significant benefit on more challenging problems, such as scene analysis. An alternate approach has shown success for the problem of image segmentation in artworks: we used deep networks to map artistic styles onto photographs, thereby producing a large corpus of images of modern subjects rendered in the styles of particular artists. Training with this large corpus of “surrogate artworks” yields highly accurate performance on tasks such as segmentation.14 Nevertheless, much research remains to be done.
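A minimal sketch of this transfer-training recipe follows, assuming PyTorch and torchvision and a placeholder dataset of style-transferred photographs paired with the original photographs' segmentation masks; it illustrates the general pattern, not the exact pipeline reported in the cited work.

```python
# Sketch: fine-tune a photograph-pretrained segmentation network on
# "surrogate artworks" (style-transferred photographs that keep the original
# segmentation masks).  The dataset below is a random-tensor placeholder.
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, Dataset
from torchvision.models.segmentation import deeplabv3_resnet50

NUM_CLASSES = 16          # number of art-segmentation classes (placeholder)


class SurrogateArtDataset(Dataset):
    """Placeholder: in practice yields a style-transferred photograph and the
    segmentation mask of the underlying photograph."""
    def __len__(self):
        return 16

    def __getitem__(self, idx):
        image = torch.rand(3, 256, 256)
        mask = torch.randint(0, NUM_CLASSES, (256, 256))
        return image, mask


model = deeplabv3_resnet50(weights="DEFAULT")           # pretrained on photos
model.classifier[4] = nn.Conv2d(256, NUM_CLASSES, 1)    # new prediction head

# Freeze the photograph-trained backbone; fine-tune only the new head at first.
for p in model.backbone.parameters():
    p.requires_grad = False

optimizer = torch.optim.Adam(
    (p for p in model.parameters() if p.requires_grad), lr=1e-4)
criterion = nn.CrossEntropyLoss()
loader = DataLoader(SurrogateArtDataset(), batch_size=4, shuffle=True)

model.train()
for images, masks in loader:
    out = model(images)["out"]          # (B, NUM_CLASSES, H, W) logits
    loss = criterion(out, masks)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```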
Imaginary objects. Non-realist artists frequently depict imaginary objects or creatures, such as angels, dragons, and so on. Artists from the Dada and Surrealist movements of the early 20th century, such as Marcel Duchamp, Man Ray, Salvador Dalí, and René Magritte, frequently altered the properties of objects for surprise. That these objects are unusual or even unique is often central to the artist’s expressive goals, of course, but their very rarity presents a great challenge to AI systems for recognizing them and interpreting the associated artworks. Such objects do not appear in the large corpora of photographs typically used for training networks for image analysis. Perhaps one approach would be to develop modular deep networks that can flexibly decompose an image passage into components and styles that are rarely found in images.
Nonphysical conventions. Because artists are not constrained to slavishly depict the physical world, they can employ nonphysical conventions in service of their artistic goals. Thus, they can depict figures floating into the heavens, bulls that swim, infants that glow, and so on, as exemplified in Figure 6. The computational problem here is to recognize the non-physical convention given that training sets of natural photographs do not depict scenes in such conventions.
The development of a work, as revealed by its multiple layers. Many works of art, in particular Old Master easel paintings, were developed through underdrawings, revisions, and so on. These hidden layers are revealed through X-ray, infrared, and other imaging methods. Art scholars seek to understand an artist’s praxis and artistic intent by studying the changes in composition and other formal properties. For example, Figure 7 shows an X-ray and the visible image in the central passage in Rembrandt’s The Night Watch. Careful examination reveals numerous differences—large and small—between these two versions, and these are studied closely by art scholars crafting interpretations.
The computational task is to detect, represent, analyze, and ultimately interpret such changes, which have no counterpart in the vast number of natural photographs that dominate traditional AI research.
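As a first computational step, one might register the X-ray image to the visible-light image and compute a per-pixel change map for scholars to inspect. The sketch below uses OpenCV's ECC alignment under an affine motion model; the file names are placeholders, and ECC is only one of many possible registration strategies for such multimodal pairs.

```python
# Sketch of a first step toward studying a work's hidden layers: align an
# X-ray image to the visible-light image and compute a per-pixel change map.
# File names are placeholders.
import cv2
import numpy as np

visible = cv2.imread("night_watch_visible.png", cv2.IMREAD_GRAYSCALE)
xray = cv2.imread("night_watch_xray.png", cv2.IMREAD_GRAYSCALE)
xray = cv2.resize(xray, (visible.shape[1], visible.shape[0]))

# Estimate an affine warp aligning the X-ray to the visible image by
# maximizing the enhanced correlation coefficient (ECC).
warp = np.eye(2, 3, dtype=np.float32)
criteria = (cv2.TERM_CRITERIA_EPS | cv2.TERM_CRITERIA_COUNT, 200, 1e-6)
_, warp = cv2.findTransformECC(visible, xray, warp, cv2.MOTION_AFFINE,
                               criteria, None, 5)
xray_aligned = cv2.warpAffine(xray, warp,
                              (visible.shape[1], visible.shape[0]),
                              flags=cv2.INTER_LINEAR | cv2.WARP_INVERSE_MAP)

# Simple change map: large absolute differences flag candidate revisions,
# which a scholar would then inspect and interpret.
change = cv2.absdiff(cv2.equalizeHist(visible), cv2.equalizeHist(xray_aligned))
cv2.imwrite("change_map.png", change)
```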
Abstraction. Abstraction is an extremely important genre of art, spanning a great variety of forms and styles (See Figure 8), and one for which natural photographs have little or no relevance. The challenges to AI research include recognizing artists (“authors”) by their abstract works,19,30 tracing the development of abstract styles throughout an artist’s career (and thus helping to establish the execution date of works), and related problems.
Recovering lost works. A great deal of fine art, including some of the most important and influential artworks ever created, has been lost to war, fire, flood, iconoclasm, theft, and other causes.3 For example, Diego Velázquez’s Las Meninas is often considered one of the most important paintings of the Western canon; it barely escaped the fire in the Alcázar Palace in 1734. Alas, this artist’s Expulsion of the Moriscos, which in his day was even more highly praised than Las Meninas, was completely burned in that same fire. Recovering the image of such a work would be of immense value to art history and to our cultural heritage more generally. It would provide deeper insight into Velázquez and his oeuvre, into the cultural environment of the Spanish Golden Age, and into the artists who were influenced by the lost work, and more.7
Computational methods based on DNNs have shown promise in recovering properties or portions of lost artworks, such as the colors in paintings by the Austrian artist Gustav Klimt,26 “ghost paintings,”2 and missing passages of Rembrandt’s The Night Watch.10 Computational reconstruction of a full image of a lost artwork would require a sophisticated integration of information in diverse forms: preparatory sketches and other works by the artist, copies by other artists, textual descriptions of the work, knowledge of the artist’s working methods and media (available pigments and drawing tools), the likely date of execution, and more, as shown in the proof-of-concept computational reconstruction in Figure 9.11
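For contrast, the pixel-level version of the problem can be stated in a few lines: classical, exemplar-free inpainting of a masked passage. The sketch below, with placeholder file names, is a deliberately naive baseline that underscores how far such purely pictorial filling-in falls short of the evidence-integrating reconstruction described above.

```python
# Deliberately naive, pixel-level baseline: diffusion-based inpainting of a
# masked region.  File names are placeholders; this illustrates only the
# simplest sense of "recovery."
import cv2

painting = cv2.imread("damaged_painting.png")
# White pixels in the mask mark the missing or damaged passage.
mask = cv2.imread("missing_region_mask.png", cv2.IMREAD_GRAYSCALE)

restored = cv2.inpaint(painting, mask, inpaintRadius=5, flags=cv2.INPAINT_TELEA)
cv2.imwrite("restored_baseline.png", restored)
```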
Semantics. Much of human communication is indirect or implicit, and so only relatively narrow forms of intelligence can be captured in datasets of natural images. By contrast, one of the core purposes of an artwork is for the artist to convey complex and indirect meaning, and extended interpretive descriptions accompany virtually every masterpiece.
As a result, perhaps the deepest and most challenging class of problems posed by art that is not adequately addressed by current traditional AI research concerns semantics, that is, deriving coherent “meanings” associated with works.
Mainstream AI approaches to semantic image analysis seek to form text summaries, such as captions, of a photographic image. The semantic problems arising in art are rather different. Here the goal is to infer the artist’s meaning or intention, for example what message, moral, story, or abstract idea the artist seeks to convey. Consider, for instance, René Magritte’s celebrated The Treachery of Images in Figure 10. Mainstream semantic analysis would recognize the pipe, and the text, and presumably detect the contradiction, but not infer why the artist created this work and how this relates to the fact that this is a painting of a pipe, a painting of text, and much more. Preliminary results using deep networks extract components of meaning in religious art.34
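To see concretely what mainstream semantic analysis delivers, consider an off-the-shelf image-captioning model. The sketch below uses the BLIP captioning model through the Hugging Face transformers library; the model checkpoint and image file are merely examples. The resulting caption names the depicted pipe and perhaps the text, but says nothing about Magritte's intent.

```python
# What mainstream semantic analysis typically produces: a literal caption.
# Model checkpoint and image file are examples, not the methods cited above.
from PIL import Image
from transformers import BlipProcessor, BlipForConditionalGeneration

processor = BlipProcessor.from_pretrained("Salesforce/blip-image-captioning-base")
model = BlipForConditionalGeneration.from_pretrained(
    "Salesforce/blip-image-captioning-base")

image = Image.open("magritte_treachery_of_images.jpg").convert("RGB")
inputs = processor(images=image, return_tensors="pt")
output_ids = model.generate(**inputs, max_new_tokens=30)
print(processor.decode(output_ids[0], skip_special_tokens=True))
```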
Conclusions, Opportunities, and Future Directions
Fine art paintings and drawings represent some of the most carefully constructed, memorable, complex, and important images of any form, and they present deep challenges to computer vision and artificial intelligence that in many cases are not addressed by mainstream research focused on natural photographs, medical and remotely sensed images, and others.
Ongoing research in the analysis of art is building upon the vast store of algorithms and knowledge from mainstream computer vision, deep learning, and artificial intelligence. Continued progress demands an integration of humanists’ knowledge of art-historical facts and contexts with computer scientists’ knowledge of algorithms and their creativity in tailoring them to problems in art. Such a research program promises to empower humanist scholars and shed light upon some of the most sophisticated and fascinating aspects of intelligence, human and machine.12,33
Acknowledgments
The author wrote much of this article while an External Reader at the Library at the Getty Research Institute, Los Angeles, and while a Leonardo@Djerassi Fellow in Residency at the Djerassi Foundation, Woodside, CA, and would like to thank these institutions for their support.