Martin Langner, Introduction to Digital Image and Artefact Science (Summer Semester 2021) II. Digitisation and Data Management: Lesson 3. The Digital Image (https://youtu.be/aV9tPEL8cr0) [1] Introduction [2] Pictorial / Iconic Turn [8] Content of this lesson [9] 1. From Code to Image: Computer Graphics Terminology [11] a) Pixel and vector graphics [15] b) Image size, resolution [17] c) Colour depth and colour space [20] d) Processing and storage [24] 2. The Digital Image [25] a) Image data: Definition [32] b) Properties of the Digital Image [39] c) The Digital Image as a Double Image [44] 3. Digital Acquisition [45] a) Text-oriented acquisition [57] b) Image-oriented acquisition [70] c) Annotation vs. Pattern Recognition [76] Conclusion [76] Current research questions [77] What you should know and what you should be able to do [80] Literature

[1] Our third session, to which I welcome you, is once again about digitisation and data management. Last time we established that digitisation does not simply mean scanning: data must be enriched in order to be machine-readable. This applies not only to texts, but also to images, which we turn to today.

Pictorial / Iconic Turn

[2] In the last 30 to 40 years, the appearance of the digital has changed considerably. To put it briefly: it has become more pictorial. A milestone was reached in the mid-1980s, when the user interfaces of the operating systems still in use today became graphical. The graphical interface became the user's access to data that was otherwise invisible; it made application software operable by means of graphic symbols and controls. The design of today's graphical interfaces often uses the metaphor of a desk with folders, windows and a wastebasket. This concept became popular from 1984 with Apple's Macintosh, and Microsoft even adopted it in the name of its operating system shortly afterwards. Not only did the user interface become pictorial; the complex processes inside the computer were also translated into generally understandable images. And what a development has taken place since then! Today you can bring the entire visual world onto the screen and interact with it. Passive consumption, long the hallmark of mass media such as radio and television, is now passé. Users interact with the computer, and with each other in social media. The basis of this communication is a new aesthetic form of mediality: the interface, which makes it possible for people to move in the data space in the first place. Significantly, this happens primarily visually.

[3] The change in communication towards pictorial forms of exchange is just as clear in the appearance of websites. The relationship between text and image has shifted dramatically in favour of the image, even in traditional dailies like The Telegraph, whose articles are supposed to be the main focus. In short, images have never been as present as they are today. Alongside recognised works of art, technical images, scientific graphics and photographs are at least as important in social media. The ubiquity of images on television and the internet, the increasing visualisation in the natural sciences and imaging techniques in medicine have given images an unprecedented presence and importance that no one can escape.

[4] This phenomenon was described in the 1990s with the catchword "Iconic Turn", which called for a more intensive examination of the concept of the image and the use of images in all their facets.
However, the change began much earlier. The invention of printing famously represented an epochal turning point: images, which had previously only ever been individual works, could now be distributed en masse. The communicative possibilities this opened up were used intensively, at the latest by the time of the Reformation and Counter-Reformation, to illustrate (in the literal sense) the respective claims to correct doctrine. With the invention of copperplate engraving, the printed image established itself as a propaganda medium, reproducible without limit and in unprecedented fineness from an almost indestructible printing plate. But this also raised the question of the original work of art: was the printing plate the original, or were there now countless originals? The question became even more virulent with photography. Even though the first photographs were composed in a similar way to paintings and prints, the character of the image changed fundamentally: it was now possible to record reality, supposedly objectively, and to reproduce it at will. All these processes, however, still had a physical starting point for reproduction in the form of the printing plate or the negative; without it, there was no reproduction. In the digital realm, even this last specific feature of the traditional image disappears. With digital photography, the "originals" can be copied and distributed without loss from any copy, so that pictorial social communication could spread explosively all over the world.

[5] However, this has also significantly changed the nature of communication, because images do not proceed argumentatively, but present visual statements. Renate Bosch puts it in a nutshell: "In general, images can be neither true nor false because they do not possess a clear predicate to which truth or falsity can be referred. Images produce evidence." For example, the computed tomography of a wood-fibre-reinforced injection-moulded component, which you can see in the image on the right, quickly makes complex measurements clear, in this case the percentage of fibres by volume. However, the colour distribution is not predetermined; it is already a suggestion for interpretation. The apparent evidence of such images thus requires interpretation by viewers, who can only contextualise and understand them in the respective frame of reference. This context, however, is fluid and can, for example, be recreated again and again in social networks such as Instagram by copying and disseminating the images.

[6] According to Margarete Pratschke, the flood of images in the media world that surrounds us is contrasted by an ebb of images ("Bilderebbe") in the academic discussion of images. In contrast to the supposed "decanonisation effects" associated with digitisation, a further narrowing of the art canon can usually be observed in art-historical projects, which, certainly also owing to the digitisation of the slide libraries, initially concentrated their digital capture strategies on the works of recognised artists. You know the projects on Leonardo and van Gogh. The big museums understandably proceed in a similar way and first put their highlights on the internet. In a very similar way, archaeologists first visualised Athens and Rome in 3D models, whereby the digital models were often not based on the latest excavation results but, in the case of Rome, on Gismondi's model from the 1940s.
[7] Another form of image inheritance, highly controversial in terms of science policy, can be observed in digital publications where the research objects cannot be shown due to image rights ("image not available online", one then reads). This fundamentally restricts image studies and deprives it of the expansion of the canon that has been constantly demanded since the discussion about the Iconic Turn. It can really only be countered if great efforts continue to be made to digitise collections, to make the digitised material freely available, to standardise the acquisition processes, to network the image files and metadata worldwide and, of course, to fund all of this accordingly. As IBM already showed in the 1990s, digital images are also of commercial importance. Do individual museums still stand a chance at all in a market dominated by players like Google or Getty Images? A critical examination of the economic aspects of digital images therefore also seems necessary, but we cannot undertake it here.

Content of this lesson

[8] Today we want to approach the digital image in three attempts. "From Code to Image" deals with the technical aspects: scanning, the peculiarities of image processing and editing, and the storage of images. The second part then develops a theory of the digital image: what is a digital image, and what are its characteristics? And thirdly, we ask what advantages text-oriented and image-oriented digital acquisition of images each have.

1. From Code to Image

[9] So let's start with the technical aspects of the digital image.

Code and image: technical terms of computer graphics

[10] While book scanners, such as here at the SUB's Göttingen Digitisation Centre, work with digital cameras in order to protect the spines of the books, flatbed scanners are used for the digitisation of images. Here the documents to be scanned are placed face down on a glass plate. A combined illumination and scanning unit travels in a flat "bed" under the glass plate, similar to the scanning unit in a digital photocopier. An image can therefore only be detected on the basis of the light it reflects. Transparent originals such as photo negatives, film strips or slides require special accessories that illuminate them from above. Digital cameras can be used for the same purposes as image scanners. Compared to a real scanner, a camera image is subject to a certain degree of distortion, reflections, shadows, low contrast and blurring due to camera shake (reduced in cameras with image stabilisation), and the resolution is sufficient only for less demanding applications. On the other hand, digital cameras offer the advantages of speed, portability and contact-free digitisation of thick documents. To achieve optimal results, it is important to know the physical resolution of the scanner. For commercially available desktop scanners, this is usually 600 dpi, although manufacturers like to quote much higher figures; these are usually only interpolated, i.e. computed by the software rather than optically resolved.

[11] On the screen, everything (be it text, numbers and mathematical symbols, notes or photos) is seen as an image, because everything has to be broken down into pixels in order to be displayed. It was not much different with print: printed images consist of a collection of individual, very fine print dots that are only mixed together in the eye to form a raster-free image.
As a measure of dot density in printing, video or image scanning, dpi (dots per inch) refers to the number of individual dots contained in a line one inch (about 2.54 cm) long. Monitors do not have dots but pixels; the closely related measure for monitors and images is pixels per inch (ppi). In each case you see the rasterised version of the print on the left and the pixelated version of the monitor display on the right.

[12] A pixel or raster graphic consists of individual picture elements (pixels). In the coding, each pixel is determined by three values: its position, its colour value and its brightness. If you enlarge such graphics strongly, staircase effects like these occur: when scaled up, even continuous lines turn out to be mere collections of dots. For line graphics, therefore, another coding is more suitable, namely description by vectors. Vector graphics consist of individual lines and can be scaled as desired without loss of quality. For large, geometrically structured originals, relatively small files can be created in this way. However, the colour information refers to the respective vectors and the areas they form; graduated colour transitions are not so easily possible here.

[13] By means of corresponding calculation processes, which are available in every image processing application, pixel graphics can be converted into vector graphics, i.e. vectorised, and vector graphics can conversely be rendered as pixels.

[14] As we have already seen, images are rasterised differently for display in print than on the screen. Therefore a moiré effect easily occurs during scanning, namely when the amplitude-modulated screening of differently sized dots used in printing is transferred to a frequency-modulated screen. This effect cannot be reduced without loss after scanning, which is why it is advisable to use the relatively good descreening algorithms in the respective scanner software already when scanning images.

[15] Raster displays are used as output devices for the digital image. They present the image as a raster of pixels, each of which is assigned a colour value. Image size is defined as width × height in pixels, e.g. 600 × 900 px. Monitors traditionally have a resolution of 72 pixels per inch, though newer ones are considerably denser. For printing, 300 dpi (dots per inch) is usually used.

[16] Many monitor drivers try to mitigate the negative effect of the rasterised display by using dithering. This is an optical aid (or simulation) for colour optimisation and scaling. You can observe it well in the umbrella on the right. The pixel graphic actually has a very small image size, but the correspondingly low resolution is covered up here by simulating non-existent colours through a mixture of neighbouring colours. The dithering method chosen here is called "diffusion"; it arranges the pixels according to a random pattern.

[17] A pixel in a black-and-white graphic requires exactly one bit: if the bit is 1, the pixel becomes black; if it is 0, it remains white. A picture with an image size of 100 × 100 pixels therefore consists of 100 × 100 × 1 bit, i.e. 10,000 bits or 1,250 bytes, which is 1.22 kBytes. Accordingly, 2 bits are needed for four gradations, four for 16 and eight for 256, which is also the usual colour depth of greyscale images. Here, too, the file size can be calculated analogously. For a colour depth of 8 bit, i.e. 256 colours: 100 × 100 × 8 bit = 80,000 bit = 10,000 bytes = 9.76 kBytes. For a colour depth of 24 bit, i.e. 16,777,216 colours: 100 × 100 × 24 bit = 240,000 bit = 30,000 bytes = 29.3 kBytes. The table on the right summarises the relationship between colour depth and file size.
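Since these conversions come up constantly in practice, here is a minimal sketch in Python that reproduces the calculations above; the function names are my own and not part of any standard library.

```python
def uncompressed_size_bytes(width_px: int, height_px: int, bit_depth: int) -> float:
    """Uncompressed size of a raster image: width x height x colour depth."""
    return width_px * height_px * bit_depth / 8  # 8 bits per byte

def print_size_cm(pixels: int, dpi: int) -> float:
    """Physical print length of a pixel row at a given output resolution."""
    return pixels / dpi * 2.54  # one inch is about 2.54 cm

# The examples from the slide: a 100 x 100 px image at 1, 8 and 24 bit.
for depth in (1, 8, 24):
    size = uncompressed_size_bytes(100, 100, depth)
    print(f"{depth:>2} bit: {size:>8.0f} bytes = {size / 1024:.2f} kBytes")

# A 600 x 900 px image printed at 300 dpi comes out at about 5.1 x 7.6 cm.
print(f"{print_size_cm(600, 300):.1f} cm x {print_size_cm(900, 300):.1f} cm")
```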
[18] The colour depth in bits thus defines the number of colours per pixel. However, the appearance of the colours also depends on the output medium, display or print, because a different colour space is available in each case. In the additive (physiological) colour mixing of the monitor (or of the television), all colours of the colour wheel are created by mixing light: the wavelengths of the three primary colours red, green and blue are added, i.e. superimposed. After red, green and blue, this colour space is called the RGB colour space. Subtractive (physical) colour mixing, by contrast, uses a generative colour model that describes the technical mixing ratios of its primary colours and the change a colour stimulus undergoes when reflected from the surface of a body. With the help of the three colour filters cyan, magenta and yellow connected in series, colours are not mixed; rather, the light spectrum is altered, as a result of which only the changed colours are seen.

[19] However, the representation of colours on the monitor does not quite correspond to human perception, because the colour space of the human eye, shown as a coloured parabolic surface in the graphic on the right, is larger than that of the screen representation (RGB). In the meantime, the development from the original sRGB colour space via ColorMatch and Adobe RGB to ProPhoto RGB has come close to the human eye. In print, however, i.e. in the CMYK colour space, not even half of these colours can be represented. A scanned image therefore often looks different on screen than it does in print.

[20] As we have seen, image files can become very large. When an image file is compressed, clusters of several pixels are formed, which then need to be stored only once and referenced at each occurrence. At very high compression, clusters that are merely similar in structure are also merged, at which point the compression is no longer lossless. Unlike the LZW compression in the TIFF file format, JPEG, for example, is such an efficient but lossy compression. You can see this well in the detail on the right, where, for instance below the lip, clear clusters in the form of squares are visible. Such artefacts are typical of high compression in the JPEG format.

[21] JPEG artefacts are not the only problem in dealing with digitised images. I already mentioned the moiré effect. There are also a number of other image errors that will not be discussed in detail today.

Code and image: processing

[22] I would just like to mention that in most image editing programmes you can use automatic batch processing for recurring, similar process steps. For example, changes of size, format or resolution can be automated and applied to a selection of images at once. Tags and metadata can also be added or replaced automatically, as the sketch below illustrates.
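As a minimal sketch of such batch processing, assuming the Pillow library and a hypothetical folder layout, one might resize a set of scans and read out their embedded metadata like this:

```python
from pathlib import Path
from PIL import Image

SRC = Path("scans")        # hypothetical input folder of TIFF scans
DST = Path("scans_small")  # hypothetical output folder
DST.mkdir(exist_ok=True)

for path in sorted(SRC.glob("*.tif")):
    with Image.open(path) as im:
        im.thumbnail((1200, 1200))   # shrink in place, keeping the aspect ratio
        exif = im.getexif()          # embedded camera metadata, if present
        print(path.name, im.size, dict(exif))
        # save as JPEG for screen use; quality steers the lossy compression
        im.convert("RGB").save(DST / f"{path.stem}.jpg", quality=85)
```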
Code and image: storage

[23] And finally, an overview of the most common file formats for images. .svg (Scalable Vector Graphics format), as the name suggests, stores the image as a vector graphic. It is scalable, which is why compression is not necessary. With a colour depth of 24 bit it is well suited to line graphics, and SVG is therefore mainly used for printing them. The other formats listed here save pixel graphics. .png (Portable Network Graphics format) has the highest colour depth and can also store images with transparency, which is why it is mainly used on websites where the background is defined in the HTML code; its compression is lossless. .jpg / .jpeg (Joint Photographic Experts Group) is the format of choice for saving images for screen display because of its efficient, albeit lossy, compression. The same goes for GIF, whose colour depth can be reduced (to at most 256 colours) and which also allows animation, i.e. the film-like playback of image sequences. TIFF, on the other hand, is best suited for pre-press: it allows lossless compression and the storage of multiple layers.

2. The Digital Image

[24] The second part will now deal with the properties of the digital image.

Image data: definition

[25] We have been talking about images and image data for a while now without having defined exactly what an image actually is. In the first lesson we emphasised that the image sciences use an extended concept of the image, which includes all historical and media forms of the image: two-dimensional pictures and graphics, plastic images and artefacts, photographs, electronic and digital images, as well as virtual spaces. Image studies goes beyond analogue and virtual images and also explores immaterial images and ideas. We have thus used a number of examples to outline what an image can be, for a definition of the phenomenon is difficult and controversial. A widespread definition comes from Gottfried Boehm: "What we encounter as an image is based on a single basic contrast, that between a manageable total area and all that it includes in terms of internal events. The relationship between the vivid whole and what it contains in terms of individual determinations (of colour, form, figure etc.) has been optimised in some way by the artist". In short, it is a two-dimensional phenomenon with artistically designed individual determinations.

[26] But to what extent does this definition also apply to the digital image? If one looks at the creation of a digital image, it is not at all clear at what stage in this process the image manifests itself; in other words, at what point should one speak of an image? From a motif, in this case a tulip, a mirror image is projected by the lens onto the image sensor, which really only measures the incidence of light. The result of this measurement is transformed into pixels by image processing and saved on a storage medium. For the human eye, however, the digitally generated image only becomes visible on the camera's display. As a rule, professional digital cameras at least store the image information uncompressed in raw format, which is then transformed on the PC, with the help of an image processing programme, into a format that can be displayed on all systems and also printed. So the question arises: what exactly is the digital image? The signal captured by the light sensor? The image transformed into pixels, in its processed or only in its stored form? Or does a digital image only exist as such when it can be seen by the human eye? And when exactly is that the case?

[27] As a graphical user interface, the display forms the front end, the receptive surface on which the content called up from the back end (i.e. the invisible interior of the data world) is made visible. Digital images can therefore only be visualised as images by means of imaging processes.
Strictly speaking, the digital image only exists at this moment. So if we define the image as a visual phenomenon (which is not necessarily the case with images in the head), we have to agree with the media scientist Claus Pias, who has put forward the thesis that the digital image does not exist at all. For him, there is something that leads to data with the help of information-providing methods, and there is something that creates images with imaging methods; the two phenomena are decoupled from each other and completely heterogeneous (Pias 2003: 18). However, this conclusion only holds if there were actually no connection between coded information and its visual realisation on the screen, and that is not the case in current visual practice. Jens Schröter therefore tries to define the digital image as a pictorial phenomenon with concrete properties that is created with digital code. And Harald Klinke goes one step further and speaks in general terms of "visual information" that can be visualised on a display or printer, altered with image processing software and distributed via email, the internet, computer games and so on.

[28] Following Harald Klinke, we would like to attempt a universal concept of the image. The image as a visual phenomenon then consists of light information on an image medium. In the case of a painting, it is colour information in bound pigments on a picture carrier made of wood, canvas or another material. The photograph consists of brightness information in silver compounds on an image carrier (usually silver gelatine on paper). And the digital image comprises colour information in bits for output to a display, with variable brightness values of the RGB subpixels.

[29] If we tie this distinction to the traditional art-historical approach to images, we arrive at a model of seven layers, which, in terms of hermeneutic penetration, we have to imagine like an onion: here too we only reach the centre layer by layer.

[30] But how is this information structured? That depends very much on the file format: each stores the position, colour value and brightness of the pixels in a different coding. The most common file formats you should know in the area of raster graphics are BMP, DNG, GIF, JPEG, PNG, Photoshop's PSD, RAW and TIFF. Vector graphics are mostly saved in the Adobe Illustrator (AI), Encapsulated PostScript (EPS), WMF or EMF formats. Should you want to learn more about them, the Wikipedia article on "Image file formats" offers a good overview and an annotated list.

[31] In addition to the pure code, metadata is also stored in an image file. Alongside automatically generated information on the file format, the file size, the file path or the creation and modification date, camera data (such as exposure time, aperture, etc.) can be stored here. Text fields are also available for describing the image content.

Properties of the Digital Image

[32] Jens Schröter, as we saw, defined the digital image as a pictorial phenomenon with concrete properties generated with digital code. These properties include its granularity and its addressability. Granularity (or graininess) of data is the number or depth of subdivisions in its acquisition: the information can be fine-grained, in many pixels, or coarse-grained, in a low resolution with few pixels. The colour depth and dot density of the scanner, digital camera or monitor thus determine the appearance of the digital image. Each pixel, moreover, can be addressed exactly, with a mathematical precision that is not possible in language. This addressability is also given in image processing with the gradation curve, where pixels in the tonal range between 0 and 255 can be selected and processed independently of each other and with tonal precision. This detaches the examination of the image from textual description, where the image was often mere illustration, and gives it its own standing; it can now be examined even in its smallest details. For image science, this means that images can now be measured and processed not only linguistically and textually, but also digitally, as a small sketch may illustrate.
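A minimal sketch, assuming the Pillow library and a hypothetical file name, of what this pixel-exact addressability looks like in code: a gradation curve is just a lookup table that maps each of the 256 tonal values to a new one.

```python
from PIL import Image

im = Image.open("scan.jpg").convert("L")  # hypothetical file, greyscale mode

# A gradation curve as a lookup table: one output value per input tone 0..255.
# Here: a simple contrast boost around the middle tone 128.
lut = [min(255, max(0, round(128 + 1.3 * (tone - 128)))) for tone in range(256)]
contrasty = im.point(lut)

# Addressability also means single pixels can be read and set exactly.
print(im.getpixel((0, 0)))       # tonal value of the top-left pixel
contrasty.putpixel((0, 0), 255)  # set that pixel to pure white
contrasty.save("scan_contrast.jpg")
```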
[33] Another special feature that image processing programmes provide for the digital image is working with transparency and layers. The idea originates from animated film production; in the image file, too, individual parts of the image are visualised on different layers. The digital image therefore consists not only of individual pixels, but also of layers on which the pixels are arranged in groups (and sometimes in competing versions).

[34] The digital image can be manipulated in a targeted way: colour corrections and distortions are possible without leaving traces. The visual has thus developed more and more from the depictive to the constructed, and this has far-reaching consequences for the way we act in social discourse. For we live in the age of the deepfake, i.e. of images and videos created with deep learning methods that aim to convince us that what is shown is a really existing document. Famously, in a video sequence released in 2018, Barack Obama looks earnestly into the camera and says: "There's a new age dawning right now where our enemies can make it look like all kinds of people are saying all kinds of things at any time - including things they would never actually say." Except that those words were literally put in his mouth by Jordan Peele. When Russian state television puts a smile on Kim Jong Un's face as he meets Lavrov, this is an illustrative change to the basic message and perhaps still acceptable. But when manipulated images are made the basis of military actions, or population groups are specifically targeted with such fake news, digitally manipulated images can shake the foundations of social and political life.

[35] Fortunately, awareness of this characteristic of the digital image is so widespread that it can now be treated ironically in social media, and the use of the word "digital" now regularly implies that something is not real, that it is fake.

[36] This manipulability, however, serves not only self-representation but also the way image science works, in that it turns the digital image into an experimental medium that can be reconfigured in many ways, recreates past states, simulates the hypothetical and perhaps even prefigures the future. This variability and processability of the digital image can thus certainly be seen as an advantage, as can, of course, the possibility of making lossless copies.

[37] The digital image differs fundamentally from its analogue counterpart in that the visual information is separated from its material appearance. For Marijke Goeting, digital images are therefore only mosaics of particles: the essence of digital technology is that it breaks everything down into a collection of point elements. Consequently, for a digital image to be seen, it must be (re)calculated.
Zero-dimensional points must be converted into an image. The digital image is therefore never static, but always in motion, and reacts to our every movement. It flows across the screen 60 times per second, varies depending on the hardware, software or internet connection used, and responds to our input. It is fundamentally malleable, reconfigurable and fluid. Since the digital image can be rendered at any place and at any time, it exists only in the moment; the continual generation of new images ultimately changes the overall image. In this understanding, images (like the facts they illustrate) are situational and thus often fluid, dynamic and unstable. Fluidity, performativity and reconfiguration are therefore important concepts of the digital, because the digital image is always only a temporal manifestation and virtual configuration of information.

[38] Another important property of the digital image is its ubiquity: lacking a materiality that binds objects to a place, it is simultaneously present and globally available as code. As a result, the digital image, stored in databases and repositories, is comprehensively available to a scientific community and can be indexed and linked to other images in a completely different way than, say, images in illustrated books. Delocalised processing thus becomes possible, which can also involve broad sections of the population in the acquisition of information, commonly referred to as crowdsourcing. In the third dimension of a VR infrastructure, scientists worldwide can, for example, exchange information about certain medical phenomena and their modelling. Hopefully it won't be long before historical phenomena can also be studied collaboratively in this way.

The Digital Image as a Double Image

[39] The digital image thus initially consists of digital code that takes shape as a sequence of bits. This binary coded information then appears as points of light on the surface of the display. The digital image therefore exists in two ways or in two forms: on the one hand as a visual phenomenon on the surface, and on the other as digitally encoded information on the subsurface, as Frieder Nake, the Bremen computer scientist and pioneer of computer art, calls it. For him, the digital image is therefore a double image. He pointed out as early as 2001 that digital signs are always interpreted in a double way: by humans, who see the images on the surface, and by the computer, which can change and transform them on the so-called subsurface (Unterfläche), while the surface knows nothing of this possibility of manipulation (Nake 2001, 740; Nake 2008, 149).

[40] In distinguishing the surface from the subsurface, Nake defines two different levels of interpretation in how humans and computers deal with digital images. Let's take as an example two tools that common image editing programs offer for cropping photos, i.e. for defining what is shown in the foreground and what belongs to the background. The magic wand works on the subsurface and interprets images algorithmically: one could say that the computer decides what to select as background on the basis of thresholds between pixels within the coded information about colour and brightness values. The lasso, in contrast, lets the user act on the image surface and select objects by outlining them; here it is the user (or his skill) who decides what belongs to the background. Copying or deleting this manually made selection, however, is again the task of the algorithms acting on the subsurface.
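To make the subsurface side of this concrete, here is a minimal sketch, assuming NumPy and my own simplified logic, of a magic-wand-style selection: starting from a clicked pixel, it collects all connected pixels whose tonal value lies within a threshold.

```python
import numpy as np

def magic_wand(grey: np.ndarray, seed: tuple[int, int], tolerance: int = 16) -> np.ndarray:
    """Boolean mask of pixels connected to `seed` and within `tolerance`
    of the seed's tonal value: a simplified magic-wand selection."""
    h, w = grey.shape
    target = int(grey[seed])
    mask = np.zeros((h, w), dtype=bool)
    stack = [seed]  # pixels still to visit (flood fill)
    while stack:
        y, x = stack.pop()
        if not (0 <= y < h and 0 <= x < w) or mask[y, x]:
            continue
        if abs(int(grey[y, x]) - target) > tolerance:
            continue
        mask[y, x] = True
        stack += [(y + 1, x), (y - 1, x), (y, x + 1), (y, x - 1)]
    return mask

# Toy example: a bright object on a dark ground; "clicking" at (0, 0) selects the ground.
img = np.array([[10, 12, 200], [11, 13, 210], [9, 220, 230]], dtype=np.uint8)
print(magic_wand(img, (0, 0)))
```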
[41] It is not inherent in the code that it be interpreted as an image and displayed as such on the surface. "The medial logic of the image is grafted onto it from outside; it is the result of an imaging process." The same visual information can quite possibly be rendered differently on the subsurface. Texts, for example, can be encoded as a sequence of characters or as an arrangement of pixels. If the same "image" is stored in different image formats, it is converted differently into code on the subsurface, although it appears to the user as the same image on the screen surface. And even in mediating between surface and subsurface, the same thing can lead to different results if, for example, differently coded colour tones cannot be distinguished because the colour depth of the monitor is too low or because of a particular calibration.

[42] But even if the magic wand promises to identify image objects automatically, the tool still requires human action, since the user has to specify with a mouse click which object is to be selected. While the user of the lasso selects image objects on the surface by tracing the boundaries of the object, the user of the magic wand must evaluate the result of the algorithmic selection and, if necessary, correct it by extending or narrowing the selection range. This makes it clear that, at least in the context of digital media technologies, the surface and the subsurface of digital images are on the one hand decoupled from each other, but on the other hand mutually dependent. The criteria for the identity of images thus double in the field of digital media, and neither the visual phenomena on the surface nor the bit sequences in the depths of the computer can be given preference over the other side.

[43] This terminological distinction between surface and subsurface is useful not only for digital images, but can ultimately be applied to all products and processes implemented in the digital world, including the use of databases. Here too, information that is identically arranged on the subsurface can be visualised differently on the surface depending on the query and layout.

3. Digital Acquisition

[44] Having first dealt with scanning and the basics of computer graphics, and then asked in general terms what the digital image actually is, what properties it has and what effects follow from them, in the concluding third part we pursue the question of what consequences all this has for the digital acquisition of images. Following on from the second lesson, I would like to present the advantages and disadvantages of text-oriented and image-oriented digitisation.

Text-oriented acquisition

[45] Digital images are often made available as scans in online databases. This still quite analogue form of acquisition and presentation derives from the slide libraries of institutes, museums, archives and agencies that were digitised in the 2000s. The photos and slides are likewise labelled and sorted according to predefined categories. A central task of these huge image collections is searchability, which is why keywording is just as important as the subdivision of information according to search criteria such as, here for example, artist, place of production, title, dating, era and location.
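A minimal sketch of what such text-oriented acquisition amounts to technically; the record fields follow the search criteria just named, and the data are invented for illustration:

```python
# One catalogue record per image, with the search criteria as fields.
records = [
    {"artist": "Exekias", "place": "Athens", "title": "Amphora with Achilles and Ajax",
     "dating": "c. 540-530 BC", "era": "Archaic", "location": "Vatican Museums"},
    {"artist": "unknown", "place": "Corinth", "title": "Aryballos",
     "dating": "7th century BC", "era": "Orientalising", "location": "Paris, Louvre"},
]

# A text-oriented search is then simply a filter over these metadata fields;
# the image itself plays no role in the query.
hits = [r for r in records if r["place"] == "Athens" and r["era"] == "Archaic"]
for r in hits:
    print(r["artist"], "-", r["title"])
```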
[46] Such image pools and research databases exist not only in art history but also, for example, in archaeology.

[47] The digitisation of large image collections had already begun in isolated cases in the 1970s; with the spread of the internet and higher-resolution screens, scanned images became accessible worldwide. Today the once very impressive online collections of the Art Renewal Center, the Web Gallery of Art and others seem a little tired. Yet with their text-oriented acquisition in simple database structures, these pioneers set the standard for many repositories that are still current today, such as online compilations of collection highlights or the inventory catalogues of larger museums.

[48] Successively, the focus of interest also shifted to the networking of data stocks. An early example of good practice is the British Museum, which has made its entire collection accessible online, linked it with authority data and linked open data, and made it externally linkable for other databases.

[49] Indeed, a Semantic Web version exists that "makes the British Museum's collection data available in the W3C Open Data Standard (RDF), links to and refers to a growing number of linked data published worldwide." Authority data stored in thesauri, highlighted in red on the website, guarantee a standardised exchange of data. The photos of the collection objects, which fortunately can be enlarged, have only an illustrative character here, even if, in the case of two similar collection objects, the one being described can only be clearly identified by the image.

[50] In order to keyword, catalogue and describe the subjects of images represented in artworks, reproductions, photographs and other sources, controlled vocabularies are essential. Associated ontologies, in which classes of terms are defined and the relationships of these classes to each other are described, ensure that information can be read and evaluated across the various databases. Iconclass, for example, comprises 28,000 hierarchically ordered definitions, divided into ten main sections; each definition consists of an alphanumeric classification code and the description of the iconographic motif that the classification names. Significantly, this text-oriented system of capturing the content of images was driven by the libraries, while image studies raised concerns from the outset out of a justified fear of oversimplification, though these now seem to me to have fallen silent (also in the absence of tangible alternatives). Linked open data was long considered the most promising avenue for the semantic organisation and penetration of humanities content.

[51] Archaeologists in particular, who always find and have to evaluate the same types of material on their excavations, took an interest in a standardised description of found objects very early on. A relatively new example is Kerameikos.org, a joint project that provides an ontology for ceramic data, although it is currently still limited to Greek black- and red-figure pottery.

[52] There, for example, the following RDF data are provided for the Attic potter Exekias. Beginning with the declaration of the XML version and encoding, the various namespaces (XML name spaces) used here are first set out. Their mode of operation can to some extent be compared to dialling codes for telephone numbers: they provide a dial-in node or, in this case, a linguistic frame of reference for all subsequent mentions.
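A minimal sketch, assuming the Python library rdflib, of how such namespaced statements, including the SKOS labels discussed next, can be built programmatically; the URIs follow the kerameikos.org pattern but are quoted from memory, so treat them as illustrative:

```python
from rdflib import Graph, Literal, Namespace, URIRef
from rdflib.namespace import SKOS

# Namespaces act as the "dialling codes" for everything that follows.
KERAMEIKOS = Namespace("http://kerameikos.org/id/")

g = Graph()
g.bind("skos", SKOS)
g.bind("kid", KERAMEIKOS)

exekias = KERAMEIKOS["exekias"]  # illustrative identifier
g.add((exekias, SKOS.prefLabel, Literal("Exekias", lang="en")))
g.add((exekias, SKOS.prefLabel, Literal("Exékias", lang="fr")))
g.add((exekias, SKOS.definition,
       Literal("Attic black-figure potter and vase painter", lang="en")))
# exactMatch links the concept to the same entity in other authority files
# (the Getty ULAN identifier here is hypothetical).
g.add((exekias, SKOS.exactMatch, URIRef("http://vocab.getty.edu/ulan/500000000")))

print(g.serialize(format="turtle"))
```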
[53] For the indication of the place, namespaces must likewise be declared before one can turn to the geographical designation Attica. Attica is defined here as a SKOS concept. SKOS Core, the Simple Knowledge Organisation System core, is an RDF schema for representing thesauri and similar types of knowledge organisation systems. It can be used to port existing knowledge organisation systems to the Semantic Web or, as here, to create simple concept schemes for the Semantic Web from scratch. In our case, the resource "Attica" is assigned preferred lexical labels in the different languages and spellings, and a definition: "Attica is a historical region that encompasses the city of Athens, the capital of Greece". With "exactMatch", reference is also made to other, already existing ontologies and thesauri.

[54] In the same way, further concepts linked to Exekias, such as "Black Figure", are then defined.

[55] And then also the name of the Greek vase painter and potter himself. The establishment of such an ontology means that in databases and on the internet the vase painters, their style and their place of production can be referenced unambiguously, regardless of the input language. If one searches for black-figure vases from Athens, for example, this ontology also retrieves images that have only been labelled with the name of Exekias.

[56] In image meta search, indexing and searching are thus carried out via metadata such as keywords or links. The images and objects themselves (detached from their appearance) are thereby understood and interpreted as bags of words. That such an approach can make sense is beyond question, but it hardly does justice to the specific character of images. Moreover, the text-oriented acquisition of pictorial works is time-consuming, despite all the available ontologies.

Image-oriented acquisition

[57] On the other side stands image-oriented acquisition, which for its part can only refer to formal criteria, namely visually distinct properties such as colour, texture or shape. Content-Based Image Retrieval (CBIR) uses computer vision techniques to retrieve from an image database the images most visually similar to a given query image.

[58] This method works relatively well when searching for identical images, possibly in different sizes. Here, as you can try out for yourself at TinEye.com, excellent results can be achieved, for example if you downloaded a photo a few months ago and no longer know on which page you originally came across it.

[59] The multi-colour search is also interesting. Here you can select four colours, specify their ratio to one another as percentages, and retrieve from the set of all photos uploaded to Flickr all the pictures that have the desired colour distribution.

[60] The methods considered so far have not annotated the images themselves, but have keyworded the respective image files externally in databases or enriched them internally with metadata. Automatic image annotation takes a different approach: pattern recognition and machine learning techniques are used to assign metadata in the form of captions or keywords to digital images automatically, by using extracted feature vectors and training annotations to apply annotations to new images. It is thus a kind of classification over several (strictly speaking, a great many) classes, which in a sense matches the text vocabulary with the "visual vocabulary" via relevance models.
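How such visual feature vectors can work is easiest to see with colour histograms, the classic CBIR feature. A minimal sketch, assuming NumPy and Pillow; the distance measure and bin count are my own simplifications:

```python
import numpy as np
from PIL import Image

def colour_histogram(path: str, bins: int = 8) -> np.ndarray:
    """Feature vector: a normalised joint RGB histogram with bins^3 entries."""
    rgb = np.asarray(Image.open(path).convert("RGB")).reshape(-1, 3)
    hist, _ = np.histogramdd(rgb, bins=(bins, bins, bins), range=[(0, 256)] * 3)
    return hist.ravel() / hist.sum()

def retrieve(query: str, corpus: list[str]) -> list[tuple[float, str]]:
    """Rank corpus images by histogram distance to the query image."""
    q = colour_histogram(query)
    ranked = [(float(np.abs(q - colour_histogram(p)).sum()), p) for p in corpus]
    return sorted(ranked)  # smallest distance = most similar

# Hypothetical file names: the best match is printed first.
for dist, name in retrieve("query.jpg", ["vase1.jpg", "vase2.jpg", "fresco.jpg"]):
    print(f"{dist:.3f}  {name}")
```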
[61] The basis of automatic image recognition is the annotation of images in training sets. As with text annotation, the relevant areas of the image can be marked and given meaning. Such methods are used, for example, in autonomous driving, where lines, in that case the road boundaries, are marked, or moving cars are outlined as cuboids. These markings are used to train neural networks on large image data sets to recognise the corresponding areas in new image data automatically.

[62] In the image sciences, bounding boxes and full segmentation are more commonly used. In the first case, the image objects to be recognised are surrounded by a box or a line; in the image file, the pixels selected in this way are then saved as a cluster and given a label. More time-consuming, but pixel-precise, is the decomposition of the image into its components, which are marked in different colours and annotated with a term. You will get to know these two methods in more detail in the coming semesters. For now, I would like to point out the various annotation tools, as far as they are compiled on Wikipedia.

[63] The VGG Image Annotator, which was developed at Oxford University primarily for the annotation of objects in films, is easy to use. If the objects stand out somewhat from the background, they only need to be roughly circled; from a list previously created by the user, one then selects the appropriate term, in this case "swan". The tool also offers the possibility of recording how recognisable the picture object is, which matters, for example, when overlaps obscure the outer contour. Area annotation with bounding boxes or with outer contours also works excellently with Greek vase paintings, as you can see here.

[64] To mention another annotation tool: the Open Image Annotation Viewer is a web-based tool with which a high-resolution image can be saved and annotated. It allows several kinds of graphical annotation, as a line, a circle or a rectangle and by means of a colour palette, and at the same time you can enter text and save it as metadata in the image file.

[65] The MATLAB Computer Vision Toolbox, which is not free of charge, is very convenient and has integrated image processing functions, so that even full segmentation does not take too much time.

[66] If you combine all these methods, you get a very powerful tool. Here on the slide you can see a prototype that I used as a first test case to evaluate the board game scenes on Attic vases.

[67] Even when, in a few years, we have established methods that can classify images quite accurately, there will still be categories that are not an inherent part of the picture and therefore cannot be extracted from its visual appearance. These include proper names like Parthenon and semantic categories like "temple on the Acropolis of Athens". The advantage of automatic image annotation over content-based image retrieval is therefore that queries can be entered textually. However, without knowledge of the training data, the results returned by the Google Images search are not always comprehensible and therefore hardly usable scientifically.
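Before moving on: to show what the area annotations described above look like on the subsurface, here is a minimal sketch of how bounding-box labels for a training set are commonly serialised. The structure is modelled loosely on formats like COCO, and all file names and coordinates are invented:

```python
import json

# One entry per annotated object: which image, which region, which label.
annotations = [
    {"image": "vase_042.jpg", "label": "swan",
     "bbox": [310, 128, 96, 74]},   # x, y, width, height in pixels
    {"image": "vase_042.jpg", "label": "board game scene",
     "bbox": [102, 240, 220, 180],
     "occluded": True},             # outer contour partly obscured
]

with open("training_annotations.json", "w") as f:
    json.dump(annotations, f, indent=2)

# A network trained on many such pairs of image regions and labels can then
# propose the same labels for regions of unseen images.
```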
[68] I liked Pixolution.org better, where you could also enter a search term alongside an uploaded image in the similarity search, and weight the ratio of the two queries as a percentage. In 2016, the results for a 90% match still depended very much on colour distribution, so that, for example, the Erechtheion, Hadrian's Gate or a church also appeared among the results. The service has unfortunately become chargeable in the meantime, which is understandable to a certain extent, because the collection and maintenance of the necessary image sets is very time-consuming and expensive.

[69] For similar reasons, Incogna.com had to take its service offline in 2018 following data-mining attacks and is now only available on request.

[70] Finally, I would like to draw your attention to TinEye's similarity search via "most changed". For the Mona Lisa you can find quite funny results there. I do not want to go into the underlying problem of determining similarity in more detail today, although it is a central problem of digital image and object science.

[71] As a result of the past lessons, I would like to conclude with a comparison of the methods of digitisation. One standard of text-based acquisition of cultural property is the annotation of image data. Controlled vocabularies, taxonomies and ontologies are used here, i.e. expensive expert knowledge goes into digitising the images and objects. The advantages are obvious: the digitised images with their meta- and paradata are very precise and highly structured, schema-consistent, and transparent with regard to the acquisition method and its results. Annotation is therefore well suited to small amounts of data with many dimensions of meaning. Unfortunately, the procedure is expensive, because it involves high personnel costs, and time-consuming, because it cannot be automated. It also depends on the perception of the people doing the acquisition, in the fortunate case relevant experts, but often enough student beginners in the field. Since the procedure is comparatively slow, it is not well suited to large amounts of data.

[72] All image archives know the difficulty of annotating large quantities of images in a manageable time. Crowdsourcing methods can help here, in which lay people are motivated in some way to contribute suitable parts of the annotation. This is particularly successful if a personal connection exists or can be established, for example with data on the history of one's hometown or with popular works of art. This is where the game ARTIGO, developed in 2010, comes in, and anyone can join in. Users are presented with digital reproductions of works of art for which they have to enter keywords; they compete for a limited time against a virtual opponent whom they have to outdo. In ten years, 90 million annotations to some 80,000 pictures have been collected in this way. The images tagged in this way now not only facilitate searching the art-historical image database, but are also suitable for studies on how artworks are perceived today.

[73] Automated content indexing of images is possible by means of pattern recognition. Statistical and probabilistic models (as in data mining, image analysis, natural language processing, etc.) are used here. The process can be automated and leads to useful results quickly and cheaply; it is therefore also well suited to large amounts of data. Conceptual consistency of the acquisition over the entire process is guaranteed, and the results can be reproduced at any time. One of the main problems of image recognition, however, is its lack of transparency: what exactly the neural networks used in deep learning methods do is difficult to discern and can only be inferred from the results. Semantic depth is also a problem.
Image similarity search can only detect correlations between images; it cannot establish causality. So far, therefore, pattern recognition is really only suitable for very large amounts of data, especially since most neural networks expect a uniform distribution of features: in order to recognise one motif, countless examples are needed for training beforehand. In summary, one can say that small amounts of data and qualitative analyses benefit from the annotation method, while large image sets and corresponding quantitative evaluations should use pattern recognition.

[74] The two procedures need by no means be mutually exclusive, however; after all, even with pattern recognition the training data are first annotated. One study that uses both methods is the Passau project Neoclassica. Its ontology for formalising knowledge of art and architectural history builds on CIDOC CRM and is enriched with terms derived from contemporary sources; the controlled vocabulary is multilingual, including English, French and German. Deep learning and distributional semantics algorithms are also applied to the image and text data: automated object recognition searches museum photographs as well as contemporary drawings and engravings for the classified pieces, in this case specific chair forms, while text mining connects them to the terminology and descriptions.

[75] One can perhaps gauge from this how much the image and object sciences now also benefit from digital processes. This is especially the case when we are dealing with large, well-structured image sets, which unfortunately are not yet generally available. Corpus building therefore remains a very important task. And to ensure that all these efforts do not go to waste, they must remain compatible, which is best achieved with authority records.

Current research questions

[76] Which brings us to the challenges in dealing with the digital image. As you have seen today, the acquisition and indexing of images is still a central topic. The critical and reflective handling of visual phenomena and their fluidity has also become more topical and must be examined again and again, including from the social sciences. And last but not least, the theory of the digital image and the manifestations of the digital turn are the subject of current research discussion and of a priority programme of the German Research Foundation.

What you should know and what you should be able to do

[77] If you ask what you should know after this lesson, it is first of all the basics of visualising information. In addition, you should be able to name and distinguish the basic properties of graphics, such as image size, resolution, colour depth, colour space, compression, transparency, layers, etc. The file formats for storing pictorial information and their differences are important in practice, especially for the question of which data format is best to choose. I had already talked about current positions on the concept of the image and its relevance for the humanities in the first lesson; today it has been expanded by an important facet, namely that of the digital image. At the end I demonstrated the possibilities of textual labelling and visual annotation of images, which is important in any kind of digitisation. The relevance of metadata in (and about) image files also makes up an important part of this.
[78] This results in what you should be able to do in the future: on the one hand, you should be able to select suitable graphic formats for different usage scenarios and for long-term archiving. On the other hand, you should be familiar with an image editing programme and be able to edit digital images, i.e. resizing, manipulating, cropping, working with multiple layers and saving in a suitable format. I would also expect you to be able to use automatic batch processing for similar process steps.

[79] In the exam, there will be a first part consisting mainly of terminological questions. Here I could ask, for example: what is meant by "image size" in computer graphics? This is followed by more extensive questions, which you should answer with more than one sentence. After today's session I could ask, for example: What is a digital image? What properties does it have, and what are its implications? A shorter question would be: which file format would you use for a printed publication, and which for an image database? And of course you have to justify your answer. Or I could ask: what possibilities for image-based data search do you know? More essayistic, and thus located in the third part, are questions like: What is the Iconic Turn? What part does the computer play in this development? And I will also ask you for your opinion, e.g. what differences do you see between text and image data? It goes without saying that you will have to justify your assessment here as well.

[80] For more in-depth information, here are again some textbooks and an article that I recommend. And with that, I bid you farewell and wish you a good week.