Martin Langner, Introduction to Digital Image and Artefact Science (Summer Semester 2021) III. Analysis: Lesson 6. Digital Image Analysis (https://youtu.be/lqt0f8euK4Q) [1] Introduction [2] Content of this lecture lesson [3] 1. The Art-historical Method: Comparing and Arranging Images [14] 2. Image Content and Form [15] a) Iconography and Image Pattern Recognition [37] b) Compositional Analysis [39] Form and Line [41] Colour [56] Perspective and Light [60] Arrangement [64] Picture Quality [66] c) Artist Attribution [73] d) Reconstruction and Restoration [78] 3. Pictorial Effect and Reception [79] a) Iconology [83] b) Reception [88] c) Cultural Analytics [94] Conclusion: Dangers of Automatic Image Recognition [101] Current Research Questions [102] What you know and what you should be able to do [105] Literature [1] Dear participants of the lecture "Introduction to Digital Image and Artefact Science", I would like to welcome you to the 6th lesson. After we have dealt mainly with the acquisition of images and objects so far, today we will move into the centre of our science and deal with digital image analysis. [2] The methods must not dictate the questions. Therefore, I would like to base my overview of digital analysis possibilities on the traditional research methods that have been established for a long time. First, there is the art-historical method of grouping, ordering, comparing and interpreting images. Traditionally, iconography, form analysis, as well as methods of attributing artists and reconstructing pictorial works are used to analyse pictorial content and form. And in order to examine the effect and reception of images, the image sciences have developed iconology and reception research. Cultural analytics has been added as a new field here. [3] As we learned in the first lesson, for John Unsworth, comparison is one of the Scholarly Primitives, i.e. the basic principles of scholarly work.
It is a fundamental aspect of what humanities scholars do regardless of their particular specialisation or object of study. Art-historical work in particular consists to a large extent of image comparison. In pre-digital times, therefore, slides were sorted on light tables and comparative images were mounted on a slide. On long tables, the pictorial works were laid side by side in the original, in photographs or in publications. Photocopies were cut out and pasted on index cards so that they could be sorted again and again. And in lectures, a scientifically relevant statement can already emerge from the compilation of suitable comparison pieces. Of course, all this is also possible with the aid of digital tools. [4] A quite practical browser-based tool was developed in 2017 at the Universidade Federal Fluminense, the Frick Art Reference Library and New York University. It works best with Chrome or Edge. Like a light table, ARIES, the ARt Image Exploration Space, lets you move images around at will, group them as desired and compare them visually. In addition, advanced image comparison and feature matching functions are supported, which are only possible through computer-aided image processing. To do this, one uploads a series of images, ... [5] ... which then appear in the image menu and can be moved to the workspace. It is very helpful that the active image is transparent so that it can be easily superimposed on another. This makes it very easy to select and group images. [6] Of course, you can also juxtapose pictures directly with one another ... [7] ... and have digital matches between two image sources displayed. Such a superimposed comparison can also be done with an image processing programme, but it is much more time-consuming. [8] The so-called lens, with which you can project a detail from one image onto the other, is also very helpful. [9] ARIES is also a good annotation tool. However, I am not quite sure whether it is used in large collaborative projects.
For students without experience with databases and programming languages, it can certainly achieve quite good results. [10] Once the metadata has been imported or entered, it can also be used for the image display. A size comparison of the images loaded into the workspace is very practical and creates some unexpected insights, ... [11] ... as does a display as a timeline. So ARIES is quite useful and recommendable as a tool for image scientists who are not yet so experienced in digital image science. What bothers me a bit is that it can only be used online and images have to be uploaded to an external cloud. For our 50,000 vase images, that would be out of the question. A connection to large image databases would also be very desirable. Still, as a source of ideas for my own projects, I find ARIES exemplary. [12] Large photo archives are faced with the task of cataloguing the artworks they document, updating changes in attribution or ownership, and merging duplicate photos and entries into a single artwork record. They use databases to do this, which we talked about in the last lesson. But finding duplicates in these large data collections by hand is time-consuming to almost impossible, yet it is necessary if you want to evaluate the data quantitatively. Computer vision, i.e. machine methods of visual image acquisition, can help here, as John Resig has shown. With the help of TinEye's MatchEngine, he carried out an image similarity analysis of photos in the Frick Photoarchive that were listed there as "Anonymous Italian Art". A first application is the finding of image details. This occurs frequently; in our case, the detail had been cropped manually from the other file and therefore gave good results. The image similarity search can also help to determine the provenance of artworks, for example when a single drawing at Harvard can be assigned to a manuscript that was reproduced, at that time not yet cropped, in a Christie's auction catalogue in 1936.
Since the artist, genre and provenance are unknown, a traditional image database offers no clue by which such a match could succeed. [13] The same applies to finding duplicates, copies and image citations. In a picture archive that has existed for over a hundred years, the photos often show different states, e.g. before and after restoration. In the case of Our Lady on the left, chipped paint has been retouched, the frame removed and the obviously later crowns removed. There are also later copies of artworks that can easily be found using computer vision: here, for example, the positioning of the children has been changed, some children at the bottom of the work have been omitted and the candelabra in the background has also been altered. The use of image similarity analysis in a large photo archive could not only completely change the way it is catalogued. It could also take the art-historical discussion of similar pictorial works to a whole new level. For while with ARIES we still had to upload the images for comparison ourselves, an image similarity search can manage the pre-selection on its own. [14] Digital methods in archaeology and art history are faced with the problem of trying to acquire and evaluate images through numerical processes. This is not possible without formalisation, i.e. abstraction of the spectrum of meaning and reduction to the form and structure of the works. On critical examination, therefore, many digital projects unquestioningly follow the premises of form-analytical procedures, which could be accused of using now outdated methods of counting folds and curls, typology or, at best, iconography. This is why the relationship between image, text and number in digital realisation is currently being discussed anew. For digital procedures must also be evaluated according to whether they can keep up with the analogue tools and methods that have developed and proven themselves historically.
This applies to historical-hermeneutic methods and the associated contextualising evaluations as well as to semiotic, media-theoretical and discursive methods. [15] In this lesson, we would therefore like to start from the established procedures of image science and contrast them with the corresponding methods of computer science. One of the main problems of image analysis is the determination and interpretation of motifs in pictorial works, i.e. the question of who or what is actually depicted. Is the so-called Mona Lisa, for example, Isabella of Aragon? To answer this question, the visual sciences have developed the method of iconography, in which the subject of the picture is described in detail, followed by the factual identification of the individual pictorial elements and the classification of their motifs and themes. As a rule, this is done by comparison with verified representations or by analysing the spectrum of representation. These regularities in the use of certain motifs or details could also be called patterns. Pattern recognition as a machine learning method does nothing other than automatically determine regularities, repetitions or similarities in data through the use of computer algorithms. The patterns discovered in this way are then used for tasks such as classifying the data into different categories, in our case the image motifs. [16] We have already talked about pattern recognition in image databases, where image pattern recognition is used to support keyword-based search. And you may also be familiar with it from uploading photos to the internet, where automatic object recognition suggestions are immediately made and saved. Humans can recognise a wide variety of objects in images with very little effort, even though the image of the objects may vary somewhat when viewed from different angles, at many different sizes and scales, or even when moved or rotated. Objects can even be recognised when they are partially occluded.
This task is still a challenge for computer vision systems. [17] The basis of object recognition is Artificial Neural Networks. Artificial Neural Networks, as a branch of machine learning, are computational models (essentially algorithms) that mimic the behaviour of a human brain when processing data. Just like the networks that connect real neurons in the human brain, artificial neural networks are made up of layers. Each layer consists of a series of neurons, each responsible for recognising different things. The input and output layers are familiar to us, but what exactly goes on in the self-learning hidden layers we do not know. [18] Machine learning is the ability of computers to acquire new capabilities without being explicitly programmed. In practice, it is algorithms that learn from data as they process it and use what they learn to make decisions. Machine learning methods are used to exploit the possibilities hidden in big data. A basic distinction is made between three machine learning methods: In supervised learning, the machine learning model learns by example. This means that the data for a supervised machine learning task must be annotated beforehand (i.e. labelled with the correct ground-truth class). For example, if we want to build a machine learning model to detect whether a certain type of vessel is depicted, we need to train the model with a set of annotated examples. Given a new, unseen example, the model predicts its class, e.g. 1 if an amphora is depicted and 0 otherwise. Unlike supervised learning, unsupervised learning models learn by themselves through observation. The data provided for this type of algorithm is unlabelled (no ground-truth value is given to the algorithm). Unsupervised learning models are able to find the structure or relationships between different inputs. The most important type of unsupervised learning technique is "clustering".
In clustering, the model creates different clusters of inputs based on the data (where 'similar' inputs end up in the same clusters) and is able to put each new, previously unseen input into the appropriate cluster. Reinforcement learning differs in its approach from supervised and unsupervised learning. Here the algorithm plays a 'game' in which it aims to maximise a reward. The algorithm tries out different approaches or 'moves' by trial and error, finding out which one brings the most success. Reinforcement learning can be used to train multiple AI systems against each other. The best-known use cases are solving a Rubik's cube and playing chess or Go, but reinforcement learning encompasses more than just games. It is particularly suited for unique classification tasks in large unstructured datasets. [19] Object recognition involves finding and identifying an object (or set of objects) from a predefined set of classes in an image or video sequence. To solve this task, the supervised learning method is usually used: a set of proposals is created and each of them is classified using a neural network that has been designed and trained specifically for this purpose. For this, a neural network is fed with thousands of annotated photos during the training phase. In our example, these are different vessel shapes that are to be learned. This means that the machine learning model sees this data and thereby learns to recognise patterns or determine which features are most important for the prediction. In addition to the set of training data, validation data is also needed. This is used to tune model parameters and compare different models in order to determine the best one. The validation data should be different from the training data and should not be used in the training phase. Otherwise, the model would be overfitted and would generalise poorly to new (production) data.
A third, final test set (often referred to as a "hold-out" set) is also always needed for verification. It is used once the final model has been selected, in order to simulate the behaviour of the model on completely unseen data, i.e. data points that have not been used to build the model or even to decide between models. To be able to train a neural network, one therefore needs three different data sets from the outset. Each class should be represented approximately equally often in each data set. This makes it clear that such object recognition systems, which require more than a thousand annotated images for training, are only suitable for tasks in which many thousands of images are to be classified. [20] For computer vision tasks, Convolutional Neural Networks are usually used, which employ the mathematical operation of convolution. Very much simplified, one could say that convolution combines two functions describing the data by multiplying one with a shifted, mirrored copy of the other, producing a set of related values. Convolutional Neural Networks have been a major breakthrough in computer vision tasks, but they have also proved very useful in Natural Language Processing problems. In object recognition, Convolutional Neural Networks are now used for both proposal search and classification. Both networks (Region Proposal Network and Classification Network) share a large portion of their parameters, which speeds up the training process and increases the accuracy of both candidate region selection and final prediction and evaluation. [21] So how does a neural network recognise a vase shape in a photo when the pre-trained network is shown a non-annotated photo? A neural network processes data sequentially, which means that only the first layer is directly connected to the input. The neurons on this first layer respond to various simple shapes such as edges.
All subsequent layers recognise features based on the output of the previous layer, allowing the model to learn increasingly complex patterns in the data as the number of layers increases. When the number of layers grows large, the model is often called a deep learning model. It is difficult to determine a specific number of layers above which a network is considered deep. Ten years ago there were three, today there are about twenty. In the last layer, the top layer, neurons respond to highly complex, abstract concepts that we would call vessel shapes. But a neural network does not identify the object to be recognised, in this case the vase shape. The network merely predicts, based on its training data, which vase shape is most likely to be represented, in our example with a confidence of 87%. Networks trained on a large basis are very reliable and now achieve an error rate of less than 3%. Convolutional neural networks are therefore extremely useful for searching for similarities in large amounts of data. However, we do not know what exactly happens in the different layers. And the results are only probability values, not definite determinations. [22] Automatic pattern recognition helps to index a large amount of image data and find structural similarities. For example, if you search for the motif "Capture of Peter" in the Prometheus image database with the help of the automatic image search, you will find an astonishing number of iconographically correct hits in the image data, which are marked in green here. This makes it possible to search for motifs with the help of images. For me, it is even more interesting to find similar compositions that have a different meaning. For in this way one learns something about the use of figure types in the sarcophagus workshops of late antiquity and the survival of pictorial models.
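The final prediction step described here can be sketched in a few lines: a softmax function turns the network's raw output scores into a probability distribution over the possible vase shapes, and the class with the highest probability is reported, never a definite identification. The class names and score values below are invented for illustration; a real network would produce such scores from its preceding layers.

```python
import numpy as np

def softmax(logits):
    """Turn raw network output scores into a probability distribution."""
    exp = np.exp(logits - np.max(logits))  # shift by the maximum for numerical stability
    return exp / exp.sum()

# Hypothetical raw scores for three vase shapes (invented values)
classes = ["amphora", "hydria", "lekythos"]
logits = np.array([2.5, 0.4, -1.1])
probs = softmax(logits)

best = classes[int(np.argmax(probs))]
print(best, round(float(probs.max()), 2))  # → amphora 0.87
```

Note that the probabilities always sum to 1: the model cannot answer "none of these", it can only rank the classes it was trained on.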
[23] The Sachsenspiegel, the oldest law book of the German Middle Ages, has come down to us in four illuminated manuscripts, some of them gilded, with countless drawings. To make these searchable, a pool of general postural motifs was created offline as classifiers. Here, for example, you can see the comparison of reclining persons in a training tool that makes it possible to exclude false hits. Thus, for online searches in unlabelled datasets, it was possible to achieve a high recall in a very short computation time. [24] Not only postures, but also gestures were classified in this way. And while the extended finger with an otherwise closed hand or the open hand with bent fingers could be easily extracted as gestures, the semantic context understandably caused difficulties. This is particularly easy to understand when the outstretched finger taps another person or the open hand holds a book. Here, image pattern recognition would not only have to recognise the specific arrangement of the pixels in relation to each other, but would also have to have learned the depicted motif context. This shows how difficult it is for computer vision to automatically gain a comprehensive understanding of the image content from digital images, or in other words: how much work it takes to reproduce the ability of human vision. [25] For this reason, striking misclassifications, in which a trained neural network wrongly detects formal similarity, circulate repeatedly on the Internet, and there is at least one that I cannot withhold from you. Only our extensive everyday experience enables us to recognise the situation depicted here as putting on make-up in front of a mirror, which is why we can easily understand the white spots as light reflections from lamps outside the picture. For the computer, however, one thing is certain: Batman returns! [26] This search for formal similarity relies solely on image pattern recognition methods.
In combination with text mining, however, individual scenes can also be compared at the level of semantic similarity. For example, the Visual Geometry Group at the University of Oxford has examined the Bodleian Ballads. The printed English ballads of the 16th to 20th centuries were often accompanied by woodcuts that succinctly summed up the content. You can search for these and thus link different ballads on the level of content. On the given page you can try it out for yourself. [27] A similar online database is 15cILLUSTRATION on printed illustrations from the 15th century. It too can be used with both metadata search and visual search. [28] Object recognition in photographs now works quite well and many search engines make use of it. This is much more difficult with artworks, as there is simply not enough annotated source data to train neural networks. However, since human vision is able to recognise real-world objects in paintings by recognising and abstracting the style of the time or artist, it is promising to train neural networks in the same way and to augment the training data accordingly: either by mixing artworks and photos of real-world objects during training, or by letting networks trained on photos continue learning on the artworks. [29] Crowley and Zisserman were able to show that convolutional neural networks trained on Google Images could identify a large number of previously undetected object categories (such as cow or beard) in a dataset of 210,000 paintings. [30] The method is described in detail in Elliot Crowley's dissertation. His project achieves pleasingly high hit rates, even with very different looking subjects, precisely because it has been trained on countless photographic images of the objects. Even very small objects that are easily overlooked are recognised in this way. This lays the foundations for computer-aided iconographic studies.
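The visual search offered by such databases typically compares feature vectors that a trained network has extracted from each image: the query image is embedded in the same way and the collection is ranked by similarity. A minimal sketch of that retrieval step, with invented four-dimensional vectors standing in for real CNN embeddings (which have hundreds of dimensions, though the ranking logic is the same):

```python
import numpy as np

def cosine_similarity(a, b):
    """Cosine of the angle between two feature vectors (1.0 = same direction)."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Invented feature vectors for three paintings in a collection
database = {
    "painting_A": np.array([0.9, 0.1, 0.0, 0.3]),
    "painting_B": np.array([0.1, 0.8, 0.7, 0.0]),
    "painting_C": np.array([0.7, 0.3, 0.2, 0.4]),
}
query = np.array([0.85, 0.15, 0.05, 0.35])  # embedding of the query image

# Rank the whole collection by visual similarity to the query image
ranking = sorted(database, key=lambda k: cosine_similarity(query, database[k]),
                 reverse=True)
print(ranking[0])  # → painting_A
```

The pre-selection mentioned earlier works the same way: instead of comparing images pairwise by hand, one sorts the entire collection against a query in one pass.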
[31] Common training data for art-historical image classification tasks are the image sets from WikiArt and the Web Gallery of Art, which also provide metadata with keywords for the individual paintings. [32] Based on extensive fine-tuning experiments on neural networks, paintings can now also be assigned to genres. The ten genre classifications from WikiArt were used as a guide and the images annotated there were used for training, so that an astonishing stylistic range was available and portraits, landscapes, genre scenes, still lifes, cityscapes, seascapes, nudes, flower and animal paintings as well as abstract paintings could be trained. [33] The hit rates vary depending on the genre, which is due to the fact that the degree of unambiguity is much higher in portraits, landscapes or non-objective paintings than, for example, in still lifes or animal paintings, where elements of the other genres also appear. [34] Incidentally, the classification of paintings according to artists was similarly successful ... [35] ... as was that according to landscape styles. [36] In summary, one can say that Convolutional Neural Networks are very well suited for art-historical image classification tasks if they have been trained for the various tasks in advance on photos of real scenes, ranging from the recognition of objects and scenes to the labelling of moods. Various aspects of image similarity can be analysed. In addition to object recognition, fine-tuned models of Convolutional Neural Networks also help to retrieve images with similar styles or similar content. [37] Our examples have shown that neural networks are already relatively good at grouping and identifying image motifs, i.e. clusters of related pixels. This is very helpful for image search. But that says nothing about the artistic content. The special aesthetic value of a work of art lies, among other things, in the picture structure or composition.
If, for example, as much space is left to the left and right of the Mona Lisa's head as the head itself occupies, the picture appears to the viewer to be harmoniously constructed. This structure can easily be abstracted and thus formalised as a pattern. Although compositional analysis is a formal problem that could be represented algorithmically, it is not easy to do so. A major difficulty lies in the fact that in the Mona Lisa, for example, the outline of the hair is not exactly aligned with the drawn axes, so one must allow for a high degree of blurring in the assessment, and that elements other than axial symmetry, such as light-dark contrast, are also important for the composition of the picture. In individual cases, these questions can be answered more or less convincingly, but they are difficult to generalise. Nevertheless, in the following I would like to mention some basic elements of picture composition that could also be acquired computationally. [38] In the visual arts, composition refers to the formal arrangement of visual elements such as figures, trees, etc. in a pictorial work. Composition (pictorial structure) is not identical with motif (pictorial theme) or style (manner of execution). Composition is made up of the following elements: shape and line, colour, texture, space, and arrangement and axes of vision. [39] Let us first come to the line, which encloses forms in outline. Form refers to the geometrically or organically shaped areas defined by their outlines (edges) within a work. The planarity or plasticity of a work is determined by the distinctness of these edges. The relationships of the individual parts to each other are governed by the laws of proportion. In two-dimensional pictorial works, the illusion of a three-dimensional form is expressed by light, shadow, brightness and hue. The higher the contrast, the more pronounced the three-dimensional effect. Shapes with low brightness variation appear flatter than those with greater variation and contrast.
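Outlines and edges of this kind are exactly what classical computer vision extracts first, commonly with the Sobel operator, which estimates how abruptly brightness changes at each pixel. A minimal numpy sketch on an invented toy image (real implementations would use an optimised library routine, but the principle is the same):

```python
import numpy as np

# Sobel kernels approximate the brightness gradient in x and y direction
SOBEL_X = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=float)
SOBEL_Y = SOBEL_X.T

def filter2d(image, kernel):
    """Naive sliding-window filtering (no padding), enough for a small demo."""
    h, w = image.shape
    kh, kw = kernel.shape
    out = np.zeros((h - kh + 1, w - kw + 1))
    for y in range(out.shape[0]):
        for x in range(out.shape[1]):
            out[y, x] = np.sum(image[y:y + kh, x:x + kw] * kernel)
    return out

def edge_magnitude(image):
    gx = filter2d(image, SOBEL_X)
    gy = filter2d(image, SOBEL_Y)
    return np.hypot(gx, gy)  # gradient magnitude: strong values mark edges

# Invented toy image: dark left half, bright right half -> one vertical edge
img = np.zeros((5, 6))
img[:, 3:] = 1.0
edges = edge_magnitude(img)  # highest values where the brightness jumps
```

On a hard-edged toy image this works perfectly; on a soft, painterly surface the gradients blur into noise, which is the problem discussed next.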
[40] A painterly style makes it difficult to automatically extract the features that make up the painting from its contours. In the case of Perugino's painting, for example, which you see in figure c, an attempt at edge detection results in figure d, which is difficult to process further. Ommer and Bell therefore came up with the idea of using a tracing made by Johann Anton Ramboux to extract the appropriate features from the painting for an automatic similarity search. [41] The painterly effect of a painting thus arises particularly through the use of colour. The three features of a colour perceived by humans as fundamental are hue, brightness and saturation. Hue simply refers to the respective colour, e.g. yellow or yellow-green. Its mixing ratio can be expressed in RGB or CMYK values. [42] Colour brightness, on the other hand, describes how much light is reflected by objects of the respective colour and how we see it. The more light is reflected, the higher the value. White is the brightest, black the darkest. Among the bright colours, yellow, for example, has a high value, while blue and red have a low value. This becomes very clear when arranged in the Munsell colour model: each colour differs in brightness from top to bottom in equal steps. The right column, however, experiences a dramatic change in perceived brightness in the bright area, i.e. the steps appear to be larger than they are. [43] If one converts a colour image into grey levels, only the brightness values remain. This important design element, especially in painting and drawing, enables the artist to create the illusion of light through brightness contrasts. In Claude Monet's famous painting, which founded Impressionism, the opposite effect is visible: the setting sun has the same colour brightness as the clouds and thus only becomes visible through its red hue.
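The grey-level conversion just mentioned can be sketched directly: a weighted sum of the RGB channels keeps only the brightness values, and a histogram then counts how many pixels fall into each brightness bin. The 2 × 2 test image is invented; the channel weights are the standard Rec. 709 luminance weights, one common convention among several:

```python
import numpy as np

# Standard Rec. 709 luminance weights: green contributes most, blue least
def luminance(rgb):
    """Reduce an RGB image to its perceived brightness (grey-level) values."""
    return 0.2126 * rgb[..., 0] + 0.7152 * rgb[..., 1] + 0.0722 * rgb[..., 2]

# Invented 2x2 test image: red, yellow, blue, white
img = np.array([[[1.0, 0.0, 0.0], [1.0, 1.0, 0.0]],
                [[0.0, 0.0, 1.0], [1.0, 1.0, 1.0]]])
lum = luminance(img)
# yellow (0.93) and white (1.0) are bright, red (0.21) and blue (0.07)
# are dark -- the ordering described for the Munsell model above

# Brightness histogram: how many pixels fall into each of four brightness bins
hist, bin_edges = np.histogram(lum, bins=4, range=(0.0, 1.0))
```

Equal-luminance pixels of different hue, as in the Monet example, would fall into the same histogram bin and vanish in the grey-level image.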
[44] In addition to the hue, the colour brightness (i.e. the luminosity of the colours) plays an important role in assessing the aesthetic effect. To calculate the luminance distribution, for example, the average luminance histogram and its standard deviation are used. For any raster image, the colour and brightness distribution can be displayed in a histogram. Visualised as a coordinate system, the X-axis denotes brightness, with the axis origin meaning maximum darkness. The Y-axis indicates the number of pixels in the image. The coordinate system thus shows how many pixels have which colour value or brightness. [45] Saturation describes the quality of a colour nuance. A colour has a high saturation if it tends towards the pure colours (without mixing with white, black or grey). The purest colours with maximum colour saturation are the spectral colours. The opposite of high colour saturation is called greyness or dullness. [46] Colourful images are considered attractive even if the content is not interesting. The colourfulness of an image can be calculated using the average chroma value. It is determined in relation to the brightness and saturation of a similarly illuminated area that appears white or transparent. Hasler and Suesstrunk suggest using the colour pixel distribution of an image in the CIELab colour space to measure colourfulness. According to this, colourfulness is a linear combination of colour variance and chroma value. [47] Another important factor for image quality is colour harmony, i.e. a colour combination that is pleasing to the human eye and appears harmonious. In general, colour theory examines which colours are suitable for combination. This theory is based on the colour wheel, where purity and saturation increase along the radius from the centre outwards. In other words, the colour in the centre of the circle has the lowest purity and saturation.
With tools such as the Colour Calculator, you can compose such harmonies yourself. Very common are complementary, monochromatic, analogous, split-complementary, triadic and tetradic colour harmonies. Current learning-based approaches focus on optimising these parameters when training neural networks. However, computer-based approaches could also be used to study the colour spectra of painters, or of an entire epoch, diachronically. [48] More recent work trains a classifier to calculate and predict complementary colours and uses colour contrast as an aesthetic evaluation criterion. The authors postulate that, in the optimal case, foreground and background should have complementary colours in order to emphasise the main subject. You can see the result here for the best 50 Flickr photos and, below that, the 20 worst. [49] If you look at Johannes Itten's colour theory, however, other contrasts come into consideration for analysing works of art, which the master at the Weimar Bauhaus recognised and studied systematically. They are based on the juxtaposition of two or more contrasting colours. For example, when at least three pure, bright colours meet, as here the three primary colours magenta, yellow and cyan, this creates an intense contrast of hue. Light and dark colours next to each other create a light-dark contrast; particularly strong, of course, are black and white. From our life experience, yellow, orange and red, as the colours of fire, are warm, while green and blue appear cold. That is why the contrast between blue-green and red-orange is called the cold-warm contrast. Complementary colours are opposite each other in the colour wheel, such as red and green, yellow and violet or blue and orange. This expressive complementary contrast was used above all by the Expressionists in their paintings.
If pure, colourful, bright colours such as pure red are juxtaposed with cloudy, broken, dull colours such as an equally bright grey, a quality contrast is created by the differences in colour quality. The juxtaposition of many and few or large and small areas creates a contrast of quantity, as here the small violet area on a large yellow ground. The sense of sight automatically produces the complementary colour in the vicinity of a colour. For example, red appears bluish in an orange environment and more luminous in a green environment (simultaneous contrast). [50] The German Wikipedia article offers very nice examples from photography ... [51] ... and European painting. [52] To measure the contrast distribution, one can, for example, use the Fourier transform to calculate the sharpness based on colour, luminance, focus or edge sharpness. This represents a periodic function as a series of certain standard functions, such as the rectangular or the sinusoidal function, a so-called Fourier series. [53] This is because the intensity of a colour contrast also depends on the hardness with which the two colours meet. The contrast (illustrated here using the example of a stripe pattern) is equal to the amplitude divided by the mean value of the intensity. [54] Or, to explain the phenomenon using three Picasso drawings: In a contour drawing with an abrupt change between light and dark, black and white meet abruptly like two rectangles, which the Fourier transform would output as a series of rectangular functions. If, however, the drawing has continuous, normally distributed transitions, the Fourier series results in a sinusoidal function. But if the draughtsman has combined discrete and continuous transitions, the graph of the contrast measurement resembles a sawtooth function. [55] The visual world is basically composed of the arrangement that appears real to our eyes, i.e. figures, trees, landscape with horizon and sky etc.
On top of these elements there is, secondly, texture, a changeable surface quality. Texture determines how an object feels (physical texture) or what haptic qualities it suggests (optical texture). Surfaces such as water, sand, wood or skin each have a different appearance. Paintings, drawings, photographs and 3D models use optical textures to create a more realistic appearance. Texture is highly dependent on lighting conditions or, in the case of artworks, on the artist's style. As we saw, neural networks can be trained to identify these idiosyncrasies. [56] Another important element of composition is space, that is, the area around, above and within an object. Every space has a three-dimensional extension in height, width and depth. With the rules of perspective, a 3D space can be convincingly depicted in two dimensions by implementing the optical distortion correctly in terms of perspective. Incidentally, the rediscovery of this vanishing-point perspective goes back to Brunelleschi, Alberti and their contemporaries in the early Renaissance. [57] The observer's point of view plays the central role here. To choose an extreme example: at first glance, this room appears rectangular; only the persons are of different sizes. But this leads us to the realisation that we have to think of the floor plan as acute-angled. [58] Furthermore, the space is strongly defined by the light in the picture. In this montage of a Rembrandt painting by Günther Kebeck, it becomes clear how much weaker the modelling of the room would be if the radiant nimbus of Christ were left out. [59] And here again the same phenomenon, this time with the addition of another light source. [60] The third thing that distinguishes a work of art from the real world is the successful composition of the picture. By arrangement (or composition in the narrower sense) I mean the formal arrangement of the pictorial elements, i.e. their positioning, orientation and harmony.
In Cezanne's work, for example, it is this triangular composition that symmetrically arranges the two groups of bathers and, together with the trees, frames the view of the city in the distance. Such visual axes lay out the path through the painting along which the eye moves within the work. [61] Common rules of composition concerning symmetry and the preference for pictorial axes and diagonals are the rule of thirds and the golden ratio. The latter has been considered a criterion for particularly harmonious composition since antiquity. Two line segments a and b are in the golden ratio if (a + b) / a = a / b = (1 + √5) / 2 ≈ 1.618. The Mona Lisa is a popular example of the golden ratio, regardless of whether one measures the length and width of the painting or draws a rectangle around the sitter's face. One can also relate this to visual weight balance and aspect ratio, which is why dimensional ratios approximating the golden ratio, such as 4:3 and 16:9, are particularly popular. [62] For photographers, the rule of thirds, which divides the image into a 3 × 3 grid of equal parts, is considered an aesthetic guideline for any photographic composition. The four intersections of the dividing lines are preferred positions for the main elements of the image. Aligning the foreground subject at one of these intersections or on a dividing line is likely to make the composition more interesting than centring the main subject in the middle. In addition, for photographs one can also measure focus, focal length and shallow depth of field, which are used to emphasise the main subject in the foreground while leaving the background out of focus. For this purpose, one can calculate the amount of blur, for example. [63] A pleasing picture does not necessarily have to satisfy aesthetic criteria in all areas. Therefore, one tries to assess the aesthetics of individual regions rather than of the entire image.
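The golden-ratio condition and the rule-of-thirds grid just described are both easy to check computationally. A minimal sketch (the tolerance value is chosen arbitrarily for illustration):

```python
GOLDEN = (1 + 5 ** 0.5) / 2  # (1 + sqrt(5)) / 2 ~ 1.618

def is_golden(a: float, b: float, tol: float = 0.05) -> bool:
    """Check whether longer side a and shorter side b approximate a/b = phi."""
    return abs(a / b - GOLDEN) < tol

def thirds_points(width: int, height: int):
    """The four intersections of the rule-of-thirds grid lines."""
    return [(width * i // 3, height * j // 3) for i in (1, 2) for j in (1, 2)]

print(is_golden(1.618, 1.0))    # True
print(thirds_points(900, 600))  # [(300, 200), (300, 400), (600, 200), (600, 400)]
```

An automatic composition check could then measure how close the detected main subject lies to one of these four intersection points.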
Normally, therefore, photographs are segmented into individual regions before their quality is assessed. Pere Obrador's research group developed a regional image-assessment framework that includes measurements of sharpness, contrast and colourfulness. All of these region features combine to produce five segmentation maps on which exposure, size and homogeneity measurements can be made. [64] Photo agencies depend not only on suitable motifs for image search, but also on high design quality. Therefore, algorithms have already been developed in this field to automatically pick out aesthetically pleasing photos. People are the most frequent subject in photography, so face recognition is often used to check whether there are people in a photo. If the recognised face area is larger than a quarter of the entire image, the photo is considered a portrait. With the help of a so-called Support Vector Machine, which effectively divides a set of objects into classes, the presence of animals in photographs can also be assessed. The classifier also divides the content into indoor and outdoor scenes and suggests 15 attributes to describe different general scene types. Based on the face-recognition result, one can calculate an aesthetic rating by assessing the size, colour and expression of the face. In addition to average colour brightness, contrast, and the colour and size of the face, this also includes, for example, a smile as a positive quality feature. For outdoor photographs, sky-lighting attributes are used to measure the lighting that affects the perception of photographs: photographs taken on a sunny day show a clear sky, while photographs taken on a cloudy day show a dark sky. Two methods are commonly used to evaluate photo quality: binary classification divides photos into beautiful and non-beautiful, while a rating scale ranks photos according to their attractiveness, usually between 1 and 10.
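The binary-classification idea can be sketched in a few lines. The systems described above use Support Vector Machines on real image features; the following toy example uses invented feature vectors and a simple perceptron standing in for the SVM (both learn a separating hyperplane w·x + b, the SVM additionally maximises the margin), and is only meant to illustrate the principle:

```python
import numpy as np

# Hypothetical feature vectors: [mean brightness, sky-blue fraction]
X = np.array([[0.2, 0.05], [0.3, 0.10], [0.8, 0.60], [0.7, 0.55]])
y = np.array([-1, -1, 1, 1])  # -1 = indoor scene, 1 = outdoor scene

# Perceptron training: nudge the hyperplane towards each misclassified sample.
w, b = np.zeros(2), 0.0
for _ in range(100):
    for xi, yi in zip(X, y):
        if yi * (w @ xi + b) <= 0:  # misclassified -> update weights
            w, b = w + yi * xi, b + yi

def predict(x):
    """Classify a feature vector by which side of the hyperplane it lies on."""
    return 1 if w @ np.asarray(x) + b > 0 else -1

print(predict([0.75, 0.5]))  # bright, lots of sky -> 1 (outdoor)
print(predict([0.25, 0.0]))  # dark, no sky       -> -1 (indoor)
```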
However, user surveys are also conducted to help select appropriate features. The users give each photo one of six ratings: excellent, very good, good, ok, bad and very bad. If you have enough user ratings, you can also use them to train a neural network. [65] For paintings, the assessment is not quite so simple. One approach builds a model for assessing aesthetic visual quality based on an evaluation survey of the factors colour, composition, content, texture/brushstroke, form, movement, balance, style, mood, originality and cohesiveness. The survey data is then used to train and test a neural network. [66] Iconographic questions and analyses of the structure of the picture can be formalised relatively well, as you may already have suspected. That is why we spent a little longer on them. These methods are standards of art-historical work, but with the support of computational methods they only really pay off as Big Data approaches. When assessing an individual work, however, other fundamental questions arise. In the case of the copy of the Mona Lisa in the Prado, for example, one would like to know: who painted the picture? When and where was it painted? This brings us to the large field of stylistic research, which assesses the authenticity of works, attributes artists, checks datings and links the painting style to a workshop or an artistic tradition. This large field, as far as textual sources are concerned, is also tilled by literary studies. From there, stylometry, i.e. the measurement of style, has developed as a method that investigates linguistic and artistic style with the help of quantifying, statistical procedures. As mentioned, this includes the characterisation and comparison of the style of artists and individual works and their periphery, the style of artistic genres, as well as the style of regions and epochs.
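As a toy illustration of such a quantifying, statistical comparison of style, one can ask how far apart two feature distributions lie. The brushstroke-width samples below are invented; in a real stylometric study they would be measured features:

```python
import statistics

# Hypothetical brushstroke widths (in pixels) measured for two hands.
artist_a = [4.1, 3.8, 4.3, 4.0, 3.9, 4.2]
artist_b = [6.0, 5.7, 6.2, 5.9, 6.1, 5.8]

def z_distance(sample_a, sample_b):
    """How many pooled standard deviations separate the two sample means."""
    mean_a, mean_b = statistics.mean(sample_a), statistics.mean(sample_b)
    pooled_sd = statistics.stdev(sample_a + sample_b)
    return abs(mean_a - mean_b) / pooled_sd

print(round(z_distance(artist_a, artist_b), 2))
```

A large separation in many such features at once is the kind of evidence stylometric attribution builds on; a single feature like this would never suffice on its own.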
[67] There are, for example, reproducing drawings of Michelangelo's frescoes in the Sistine Chapel that were made either in his school from his preliminary drawings or later directly from the originals. For this purpose, Bell and Ommer again compared the drawings with photos of the frescoes and were thus able to visualise the alterations, which - according to the result - always concern only certain parts of the body. The solution to the question, however, cannot be found with computer-assisted methods alone. The interpretive approach is that the master himself will have adhered far less closely to his own specifications than the copyists did, which is why in these drawings we are perhaps looking at copies of the preparatory sketches. [68] The question of painter attribution often goes beyond the close observation of technical features. Van Gogh's typical brushwork is particularly suitable for this. For this purpose, the distribution and shape of the brushstrokes corresponding to certain surface and form boundaries were extracted. Several representative features such as orientation, length and width can then be calculated on the extracted brushstrokes, once in an unsupervised neural network and once with annotated stroke sequences. In both cases, it was possible to automatically calculate objective criteria for painter attribution. We also have a project at the Institute on automatic painter attribution, but on Greek vases, which I will report on in Lesson 12. [69] I also very much liked the approach of directly comparing van Gogh's extracted brushwork with that of other artists. This was particularly significant in cases where copies of van Gogh's works are known, or at least works with very similar motifs. Technically interesting are also the differences, seen on the right, between the original brushstrokes in the detail photographs, the automatically extracted ones and the manually annotated ones. [70] Another approach follows the grouping of textons in a painting, i.e.
of representative patches, and compares the shapes in texton histograms. The distribution of texton histograms in van Gogh's work differed not only from those of other contemporary painters, but also from his own earlier works. [71] The well-known American painter Jackson Pollock produced his paintings by dripping and pouring paint onto canvases lying on the floor. These visual forms, characterised by fractals, defy traditional art analysis. With computer assistance, however, Pollock's drip style can be broken down into four independent layers and then analysed automatically, from bottom to top: background layer, irregular-shape layer, line layer and colour-drop layer. The style thus determined can then be applied to other paintings by Pollock. [72] The same researchers have also looked at Malevich, Miro and Kandinsky. Based on their analysis of Kandinsky's paintings and their reading of his theories of abstract art, they summarise a number of rules, for example that thin vertical and horizontal lines form the basis of his works, that these are intersected by angled lines and dark contours filled with light colour, or that red and black always appear together to create a salient effect. They then use these rules to create their own new Kandinskys, Malevichs or Miros. [73] The question of how a painting originally looked is the first thing one will ask before any more precise interpretation follows. In art-historical terms, this concerns the field of reconstruction and restoration. Technically, optical and chemical methods have been used for a long time, but digital tone-value correction can also help here. [74] Digital multispectral analysis of the Mona Lisa helped to reconstruct its original colours. A multispectral camera produces 13 photographs, together with measurements of the colour spectrum at every point of the painting. A spectrum is the intensity of the light waves reflected from a surface.
For example, a white surface reflects light in a different spectrum than a red surface, which absorbs parts of the light. The light spectrum is thus precisely broken down from ultraviolet to infrared and distributed over 13 photographs. In our case, that amounted to 240 million pixels and 22 GB of data, which was still a huge amount of data in 2004. The true colours visible to humans only emerge when all the photographs are superimposed. [75] For each pixel, 13 values were measured in this way, which can be compared as a gradient with the other values. [76] In order to determine the original colours, the influence of the 500-year-old varnish on the painting, and of the oil in which the pigments are dissolved, must be measured and virtually subtracted. Experimentally, the colours used in Leonardo's time were applied to a plate and exposed to great heat, i.e. artificially aged. This colour change was in turn measured with multispectral analysis, and the individual values were transferred to each pixel of today's state, so that the original spectra could be calculated. The result resembles a tone-value correction that subtracts the yellow-green cast, with the difference that the result here is much more accurate. [77] And to mention yet another optical process: when restoring Greek vases, UV light is often used to make post-antique overpaintings visible. Ancient fired clay does not reflect this black light at all, so the original surface remains dark. With this volute krater, however, it was different. Not only can the smeared fractures be seen on the sides; most of the figures also reflect the light and are therefore painted with secondary, i.e. post-antique, colours. The restored state now shows the vessel again without the sherd in the middle, which was added in the 19th century and painted in a modern style.
[78] The fact that the picture appears completely different after such a restoration brings us to the third and last part of the lecture lesson, which will be about the digital measurement of a picture's effect and reception. [79] Erwin Panofsky understood images as symptoms of how a particular epoch, culture or society deals with "fundamental questions of the human spirit". The Mona Lisa, for example, can only be properly classified against the background of contemporary portraits of women, the image of women in the Renaissance, the courtly and bourgeois culture in the respective cities of Italy, and the self-image of the elite of the time. Iconology, in Panofsky's definition, therefore examines the meaning and function of certain images in their socio-cultural context. Unlike iconography, it determines the deeper meaning of a representation by recourse to the ideological ideas of certain cultures and epochs. This depth of meaning can also be expressed through socially agreed codes and signs, for example through the clothing and hairstyle of the ladies or the objects they hold. All these details play a role in iconological and semiological analysis. Such questions of context are notoriously difficult to model computationally. However, one could try to infer the viewers' prior knowledge and viewing experience, i.e. the codes of a society, from the sum of all representations. Statistics and digital source analysis would at least be a start. [80] For the statistical evaluation of image motifs, empirical social research has developed quantitative image-type analysis, in which different image types are derived by reducing images to their central statement. Two formally dressed people on a red carpet with soldiers in the background, for example, is the state-visit type. If, instead, the military is involved in the communication with the leader, it is the troop-visit type.
In a similar way, the contract-signing type can be distinguished from the mourning type. The identification of such image types enables research to draw conclusions about photojournalistic production and selection patterns, as well as about the socio-cultural ideas conveyed with and in images. Now that we have seen how neural networks can be trained and how far object recognition has come in the meantime, it is quite possible to make these determinations, previously mostly carried out manually, with the aid of computers and then to evaluate them quantitatively. [81] Such a communication structure is also inherent in historical imagery. On Trajan's Column, for example, a few scenes such as the pardoning of prisoners, the speech to the army, the building of a camp or the purification sacrifice appear again and again, and could be embedded in a similar way in the entire pictorial world of the time, as it appears, for example, on coins. However, they will only contribute to the understanding of an era when all visual and textual evidence is available digitally. [82] The effect of works of art can also be grasped in their reflections in later images. The question of what effect a picture has had up to the present day, and which of its elements already responded to viewers' expectations when it was created, is the subject of reception analysis. Leonardo's composition already had a great effect on his contemporaries, as can be seen in Raphael's drawing. However, the painting only became an icon, and perhaps the most famous painting of all, when it was stolen in 1911, so that for two years there was great hype around the missing work. And today it adorns T-shirts and coffee mugs. The frequency with which the painting has been used and cited can, of course, easily be determined by computer using methods such as Image Collection Exploration.
The mentions in the various textual media are also easy to determine and statistically evaluate with data-mining methods, and the interpretation within a network of reception and meaning could be visualised in a data-driven way. However, a history of the use of images can only be prepared in this way; currently it must still be done qualitatively and through individual interpretations. [83] Regardless of historical reception, image studies are also interested in the fundamental effect of an image on the viewer. This is where the relatively young research field of perception and attention analysis comes in. It is mostly conducted with visitor surveys and empirical research methods; however, eye tracking is also being used more and more frequently. [84] This involves recording and analysing a person's eye movements with special devices and systems, the so-called eye trackers, which usually remain fixed in one place but are now increasingly available as mobile headsets. Eye trackers are able to register fast eye movements as well as points that are looked at closely or returned to. The analysis is carried out via so-called areas of interest, which are usually defined manually, in order to measure when and for how long the observer's gaze rests on each area. This very extensive data is suitable for statistical evaluations of various aspects of the distribution of attention, and for comparative visualisation as heat maps. [85] What such an acquisition can look like, I will demonstrate again with the Mona Lisa. Heat maps show you where the viewers focused their attention in chronological order: first on the eyes and the mouth, before the gazes flow outwards and take in the background and the clothing. ... In the summary, however, the dominance of the eyes and mouth becomes very clear. ... The same can be reproduced in the form of individual points that are looked at closely or several times.
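The measurement of dwell times via areas of interest can be sketched very simply, assuming a hypothetical fixation log of (x, y, duration) triples and manually defined rectangular areas:

```python
# Hypothetical fixation log: (x, y, duration in ms)
fixations = [(310, 205, 420), (305, 210, 380), (640, 400, 150), (312, 208, 510)]

# Manually defined areas of interest: (name, x_min, y_min, x_max, y_max)
aois = [("eyes", 280, 180, 340, 230), ("background", 600, 350, 700, 450)]

def dwell_times(fixations, aois):
    """Total fixation duration per area of interest."""
    totals = {name: 0 for name, *_ in aois}
    for x, y, dur in fixations:
        for name, x0, y0, x1, y1 in aois:
            if x0 <= x <= x1 and y0 <= y <= y1:
                totals[name] += dur
    return totals

print(dwell_times(fixations, aois))  # {'eyes': 1310, 'background': 150}
```

Aggregated over many viewers, such per-area totals are exactly what the heat maps described above visualise.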
A synopsis of the eye movements when looking at Ilya Repin's homecoming painting is visualised here as white dots and their connecting lines. As the central points of the painting, the faces of the three main characters are thus emphasised, between which the gaze switches back and forth. The figure of the unexpected returnee is also scanned, probably in search of a key to understanding the painting. [86] Thanks to Instagram, visual media have become a central part of social life and often capture moments associated with particular emotions. In order to automatically extract the moods contained in such images, Convolutional Neural Networks are now also used to predict whether an image will be received positively or negatively. A comparison of the respective likes in relation to the number of followers yields a measurable quantity on which the neural network can be trained. On the basis of this data, one could also try to have works of art evaluated. At least, I imagine that this would be an interesting entry point into comprehensive image analysis. [87] Closely connected with the perception and effect of images is also their use. How and in what contexts was the image used? For which lines of argumentation was it deployed? How should the Mona Lisa be understood in these adaptations? As a witty boundary shift in relation to established art? As an empty cipher of pop culture? Which image of women, which role clichés and which ideals are transported, in contrast to the original, with these changes? All these questions can only be answered in the context of contemporary discourses. And here, too, a computer-assisted look at the breadth of image use can help, as offered by Image Collection Exploration or statistical evaluations of image types. [88] In addition to the concrete use of images and the social discourses that are carried out with them, image science is also interested in how images fundamentally guide seeing, thinking, feeling and knowing.
These questions of media psychology and, more generally, of media studies and cultural history are again directed at image structures in a larger context. Here, the Mona Lisa would only be one individual case among many, and one would try to describe and make visible the breadth of phenomena starting from her. These numerous and complex references can only be determined on a broad basis, for which Big Data investigations such as Distant Viewing could be helpful. To do this, one would have to group the multitude of cultural data into meaningful data sets and cross-link them in multiple ways, as content analysis and cultural analytics do. [89] First promising groupings to support Distant Viewing have already been made. Following Franco Moretti's "Distant Reading" and in contrast to "Close Viewing", computer-vision applications are used to analyse large amounts of image data without looking at the individual images themselves. Mostly, quantitative analyses are in the foreground here, but qualitative metadata can also be compared quantitatively. With the Vikus Viewer of the Fachhochschule Potsdam, for example, you can display van Gogh's paintings grouped by the year in which they were painted and thus immediately recognise, in a quick glance at the sum of the paintings, that his colour palette lightened increasingly in the second half of the 1880s. The visualisation of coin issues can likewise reveal remarkable differences at a glance from above, which can then be studied in detail by zooming in. Thus, distant viewing becomes close viewing and vice versa. [90] The visualisation of a sheer quantity of coins, which can be grouped according to various criteria, is achieved by another tool of the FH Potsdam, simply called Coins. [91] The founder of this direction of visual analysis of large image sets is Lev Manovich.
His research, known since 2007 as Cultural Analytics, explores large sets of images and videos in an explorative way, using digital image processing and quantitative visualisation methods. Using various computer techniques, large sets of cultural data are analysed for basic cultural concepts and actions. Somewhat optimistically, he associates this with the hope that in this way new concepts and alternative ways of understanding human culture and its history can be uncovered. [92] Van Gogh again: Lev Manovich compared van Gogh's 199 paintings done in Paris with the 161 done in Arles a year later by plotting their average brightness on the x-axis against their average saturation on the y-axis. The result suggests that van Gogh had found a more consistent use of colour in Arles, because the paintings are now grouped closer together. The next step would be for the art historian to take a closer look at the outliers in a close-viewing approach. But the sets can also stimulate other investigations, such as whether colour brightness and saturation in van Gogh's work depend on the motif of the painting. [93] For exploratory media analysis, Lev Manovich increasingly uses larger image and video sets such as Instagram. He is interested in how to examine patterns in huge image collections that can contain billions of images and videos. In doing so, he focuses on interactive media processes and experiences. For example, since 2012 he has compared the selfies of residents of 17 major cities in terms of age and gender, and has been able to uncover interesting identity concepts of the first fully globalised generation of social media users. This is interesting as a Big Data application, but it also harbours dangers for individual analysis as soon as the evaluations are linked to personal data. [94] This is because facial recognition is now so advanced that we can easily be identified from photos.
However, metadata such as time and location are also linked to the photos, so that we are already, and often voluntarily, completely monitored. Of course I am not happy with, for example, personalised advertising, but I feel quite sick when I imagine how easily this data can be misused to suppress minorities or political opponents. [95] That is why I am also torn about the project "The Real Face of White Australia". Many thousands of non-Europeans were monitored and restricted under the White Australia Policy in the early 20th century on the basis of their skin colour. There are extensive government records about them, documenting their lives. Their portraits were extracted from the government documents using facial-recognition software and compiled in a browser. There are, of course, some misidentifications among them, but all in all it is a good example of coming to terms with one's own past. At the same time, I find the approach frightening when you imagine the same procedure being applied to today's residents. [96] And that is no distant hypothetical. A scientific study has extracted the sexual orientation of people from their facial photos with very good results. Its image set contained 35,000 pictures from a dating platform. It remains an open question whether the very wish to appear attractive to a certain audience creates the clear stereotypes that can be measured. What is frightening is that governments, even in Europe, which are again increasingly discriminating against lesbians and gays, are given a supposedly objective means to support them in their discrimination. Imagine the anger if you are falsely arrested when entering such a country. For any kind of pattern recognition only provides probabilities, not identifications. Whether you make 80% or 90% the basis of the determination is a matter of interpretation. And what is the point of even a 90 per cent hit rate if one in ten is misclassified?
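The problem with such hit rates can be made concrete with Bayes' theorem: when the trait being predicted is rare, even a classifier that is right 90% of the time produces mostly false positives. A small sketch with purely illustrative numbers:

```python
def positive_predictive_value(sensitivity, specificity, prevalence):
    """P(person actually has the trait | classifier flags them), via Bayes."""
    true_pos = sensitivity * prevalence
    false_pos = (1 - specificity) * (1 - prevalence)
    return true_pos / (true_pos + false_pos)

# A classifier that is right 90% of the time, applied to a trait
# only 5% of people have (illustrative numbers):
ppv = positive_predictive_value(0.9, 0.9, 0.05)
print(round(ppv, 2))  # 0.32 -- about two out of three flagged people are misclassified
```

This base-rate effect is exactly why percentage "hit rates" must never be mistaken for identifications.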
[97] This brings us to a very fundamental problem of pattern recognition in the humanities, which lies in the certainty with which the content of images is inferred from their form. I fear that we are reintroducing structuralism through the back door, i.e. a research approach that was particularly popular in Germany from the 1920s to the 1960s and which also had a strong influence on classical archaeology. Here I show the introductory work of Guido Kaschnitz von Weinberg, who in particular linked the formal similarities of pictorial works with ethnic groups. There is the typical Etruscan, the typical Roman, and of course all this was used in the terrible times in Germany to read discriminatory determinations from such features. [98] Such generalising determinations, however, only work by neglecting historical conditions, changing visual experiences and social developments, because verbal and pictorial expressions not only depict reality but also construct it. At the Parthenon in Athens, one metope is particularly well preserved, which is due to the fact that in Christian times the scene with the two Greek deities, Athena and probably Hera, was thought to be an Annunciation scene because of its similar composition. The same image is thus understood completely differently in a different cultural context. Accordingly, socio-cultural realities are not constants and not necessarily always the same, but only one possibility of social development among many. [99] This visual construction of reality brings us to a third danger associated with automatic image recognition, namely that of simulation and fakes. Patterns can not only be recognised but also used productively. From the complete works of Rembrandt in high-resolution scans, Rembrandt's artistic DNA has, so to speak, been extracted and used to create a new painting in the manner of Rembrandt.
Since the 3D textures had also been scanned, it was possible to produce a deceptively real painting with a 3D printer. [100] And photos, too, are now frequently the subject of fakes. On the site whichfaceisreal.com you can check for yourself whether you are still able to distinguish photographs of living persons from artificially created ones. This opens up completely unforeseen possibilities, and not only for social media platforms. [101] As has become clear, with digital image science we are on the threshold of completely new research and analysis methods. However, these do not only create exciting possibilities, but also scientific challenges. On the one hand, image pattern recognition must be developed further, especially in the historical dimension. Source criticism and context research play a prominent role here. The combination of Distant Viewing and Close Viewing offers undreamt-of possibilities; precisely determining and optimising their relationship should be the task of case studies accompanying the technical development. The variability and diversity of cultural expressions and processes should be in the foreground. Instead of focusing on the "typical" and "most popular", I see great potential in the breadth of the spectra, with their heterogeneity and fuzziness. Big Data applications in particular also provide a view of the special, which is otherwise rarely the subject of investigation.
[102] WHAT YOU SHOULD KNOW § Possibilities of digital image processing § Different methods of digital image analysis, their advantages and areas of application § Good-practice examples of digital image comparison § Big Data approaches in digital image science § Structure and possible applications of computer vision and convolutional neural networks § Technical methods for measuring images and viewers [103] § Practical experience in using an image-editing programme (cropping, working in multiple layers, histograms and tonal corrections, use of filters) § Comparing images digitally § Developing criteria for creating image sets [104] Which procedures of digital image analysis do you know (image pattern recognition, stylometry, ...)? How do you assess their possibilities? What are the advantages of image pattern recognition over textual annotation of images? In which areas can computer vision facilitate work with large image archives? What approach does Lev Manovich take with his Cultural Analytics? Briefly characterise a method for computer-assisted painter attribution. How do you think image analysis can particularly benefit from the use of computers? [105] Finally, let's look at the literature again. You will also find the books in our institute's reference collection. With that, I bid you farewell and wish you a good week. This time it was a relatively long lesson, because we had to cover a central area and explain the basics of both art history and computer science. Next week it will be about the analysis of objects, i.e. 3D analysis. I wish you a nice week and good luck with your learning.