Martin Langner, Introduction to Digital Image and Artefact Science (Summer Semester 2021) III. Analysis: Lesson 9. Quantification Methods (https://youtu.be/oPqOVcUqMdA) [1] Introduction [2] History of Statistics [7] Basic issues in dealing with statistics [10] Content of this lesson [11] 1. Basics of Statistics [12] Quantification of data [15] Descriptive Statistics [17] Inductive Statistics [18] Basic Concepts of Statistics I+II [21] Normal distribution [26] Standard deviation [29] Mean of a distribution [34] Basic concepts of statistics III [35] Correlations and causal relationships [40] Procedure of a quantification study [41] 2. Sampling: Basics [42] Large Data Collections on the Internet [53] Canonisation of knowledge [57] Data Acquisition [62] Sampling Methods [71] 3. Sampling: Examples [72] Sampling: geodata [80] Sampling: Objects (pottery finds) [85] Tests of representativeness [87] Quantification of data (summary) [90] Sampling: posters [97] Conclusion [97] Current research questions [98] What you should know and what you should be able to do [101] Literature
[1] We're living in the age of data. Our entire lives can be captured in numbers and evaluated in the form of statistics. That not only facts but also opinions are cast into bar charts and disseminated in this way should have become clear to you at the latest with election forecasts. But what is behind them? What principles should be observed and what mistakes should be avoided? With these fundamental questions, I welcome you to the 9th lesson of our introductory lecture in Digital Image and Artefact Science. Today we will be dealing with quantification methods. [2] Statistical studies in text corpora are very common in the Digital Humanities. They are used for linguistic questions and for semantic analysis of texts as well as for authorship attribution. One of the reasons for this is that texts can be classified relatively easily and unambiguously. With images and artefacts, the assignment to object classes is more difficult. We have already touched on the now very important field of pattern recognition in 2D and 3D data in our lesson on image analysis. Today we will therefore only talk about evaluations that can be made on the basis of measurements and linguistic descriptions. [3] Statistical surveys are already very old, because every state has to collect key figures in order to be able to build roads or equip schools and universities appropriately, and one has to know in advance how much all this will cost. The associated term statistics therefore initially referred to the study of the conditions (status) of the community, as formulated by Gottfried Achenwall, professor of natural law and politics in Göttingen, in 1749. It was not until 1791 that John Sinclair, in his Statistical Account of Scotland, used the term in the generalised meaning of collecting and evaluating data. However, the state-controlled collection of data is much older, whether in the form of censuses, import lists or accounts such as the building accounts of the Parthenon or the annual accounts of the Nuremberg Master Builders' Office. For this purpose, data is collected with regard to certain criteria and categories and compiled in tabular form. [4] A standardised and systematic preparation and mathematical evaluation of these data, however, only took place in the 19th century. In Prussia, for example, the Statistical Bureau in Berlin was established as early as 1805, the predecessor of all statistical state offices.
Its task was to collect population, economic and financial statistics. With the founding of the German Empire in 1872, the Imperial Statistical Office came into being, which lives on today as the Federal Statistical Office. Parallel to this, statistics was also increasingly pursued scientifically and taught at universities, so that the methods of statistical analysis now also had a theoretical basis. Since the end of the 19th century at the latest, statistics has thus become a scientific instrument for recording social factors, characteristics and opinions, intended to place opinion-forming and decision-making on a solid basis. Therefore, it is no longer just the population that is surveyed, but society as a whole in all its facets of economic, social and cultural life. Quantifying methods have since become a fundamental part of research in economics, political science and the social sciences. [5] But what about the humanities? In 1863, one of the fathers of the historical sciences, Johann Gustav Droysen, used a review of a book to fundamentally address the question of the significance of statistical regularities for the science of history. He denied that statistics alone made it possible to formulate the immutable and general laws that also apply to history, as Henry Thomas Buckle had previously claimed in his two-volume History of Civilization in England. Buckle's book had little influence on historical scholarship, not least because of Droysen's criticism, but is considered a milestone of modern sociology. And unlike later critics, Droysen was quite open to statistics: (quote) "It will not occur to anyone with understanding to deny that the statistical approach to human affairs also has its great value; but one must not forget what it can and wants to achieve." For the understanding of the Eroica or of Faust cannot be reduced to legal or statistical regularities. [6] Accordingly, today's topic will be the possibilities of quantifying evaluations. It should be said at the outset that even statistics in the form of tables, graphs and diagrams are not an objective compilation of facts, even if they seem to pretend to be, but rather data compiled under certain aspects, which must be understood and critically evaluated just like texts and images as sources. Statistics are in fact "a specific knowledge practice [...] that orders and categorises phenomena in order to enable comparisons as well as inclusion and exclusion". They are thus "components of complex decision-making and communication processes", as the editors of the book "Die Zählung der Welt. Cultural History of Statistics from the 18th to the 20th Century" put it in their introduction. [7] As is well known, everything can be collected and counted. And our media world is constantly fuelled by corresponding bar charts. For example, one could read in Spiegel, Stern and elsewhere the headline: Easter Bunny Beats Father Christmas! - where the chocolate Easter bunnies produced in Germany were compared with the chocolate Father Christmases. It is easy to imagine the story that can be developed from these figures. The downfall of the Christian West, for example, where the pagan Easter bunny is more revered than the Christian Father Christmas! Or the increasing commercialisation now also of Easter! But what seems comparable at first glance can, on closer inspection, be compared just as well as apples and oranges.
For the gift-giving customs at Easter, where the Easter bunny is a standard gift along with Easter eggs, are different from those at Christmas, with its more precious gifts. Here, Father Christmas is just one product among many different sweets such as nuts, gingerbread or speculoos. [8] Accordingly, the demand for German chocolate Easter bunnies as a speciality is much higher than for Father Christmases, especially abroad. As you can see, the fact that both are made of chocolate is not yet enough to draw any broader conclusions from their production figures. The basic error here lies in the reversal of thesis and evidence. A thesis was not proved or disproved by an investigation; rather, figures were first collected and then a suitable thesis was found. [9] A similar approach is sometimes taken in the humanities, for example when Wolfgang Filser statistically compares vase paintings with depictions of drinking and sporting activities and then draws conclusions from them about the self-image of the Athenian elite. We will take this example today to point out certain errors that occur more frequently. But we want to deal primarily with fundamental questions in dealing with statistics and ask: What are actually the bases of evaluation, and how convincing are the results then? What procedures are there? How do you create a good data basis? And what are typical mistakes? [10] The lesson is again divided into three parts. First, we will deal with the simple quantification of data in the form of descriptive and inferential statistics. We will learn about the normal distribution and look at the standard deviation. The second part is about the important question of proper sampling, i.e. representativeness and data selection. In the third part we will look at sampling again from the point of view of two examples, namely geodata and pottery finds, and also learn to appreciate the concept of result accuracy. [11] The older ones among you will perhaps still remember the 10 DM note. The Göttingen mathematician and astronomer Carl Friedrich Gauss was depicted there, together with the Gauss curve. You can find out what this is all about in this chapter, which deals with the basics of statistics. [12] In the humanities, people are usually sceptical about quantitative research methods and investigations. You may be familiar with the sentence: Only trust statistics that you have faked yourself. Behind this is the view that you can really prove everything with numbers, which is true if you don't follow the basic rules of statistics, and that is what today is all about. Moreover, depth is often confused with breadth. Qualitative methods are needed for the precise analysis of individual research objects, a work of art or a literary text. But if we do not want to make the mistake of developing a generally valid research result on only three supposedly significant examples, we need a broad-based investigation. This may give us a result that we have already foreseen, a result that others have already uncovered in three examples, but we have now established it validly. Much more often, however, quantifying methods do not provide any evidence at all, but only indicate trends that we have to interpret further. What exactly does it mean if armament expenditure increased sharply in Germany and Russia shortly before the outbreak of the First World War? Did industrialisation now also reach the military, leading to an arms spiral, or was war specifically being prepared?
Quantitative research methods are thus a complement to qualitative approaches. They can stimulate qualitative research or make its results probable. [13] The basis of statistics is therefore probability. What we consider probable is based on our experience. In this respect, we are constantly unconsciously collecting data in our everyday world and evaluating it inductively. "The jam sandwich always falls with the jam side to the floor" or "The RE to Göttingen is usually ten minutes late." A statistical analysis can now help us verify these prejudices. In the humanities, it is usually true that a thesis with the best arguments for its correctness is valid until other facts or better arguments against it are found. The question, for example, of how to date an object depends on statements in historical documents, stylistic comparisons or stratigraphic findings. All these sources only indirectly tell us when the object was created. Historical dating therefore claims the highest possible probability, but is not proof in the mathematical sense. Such datings can rather be called proofs of probability. And the more evidence there is, the higher the probability, of course. [14] To test a hypothesis, such as one about the shape development of Copper Age flat axes, a suitable data basis and appropriate methods are necessary, which must be applied depending on the thesis in question. To prove our point, we need as many axes as possible that we can compare with each other and whose characteristics we can record. These characteristics can then be evaluated statistically. For the question of the stylistic development of these prehistoric axes in Europe, however, other types of finds from Europe or axes from other periods and places of production are irrelevant. As a first principle, therefore, we can already state: The data basis must be established in dependence on the hypothesis to be tested, i.e. it must be representative for this question. [15] Strictly speaking, a distinction is made between two types of statistics: Descriptive statistics sort the data according to certain criteria and illustrate this in simple key figures, tables and graphs such as bar charts, which show the numerical values in the form of horizontally arranged bars. Here, for example, the number of museum visits and the number of museums in Germany in 2017. [16] A distinction is made between categorical and numerical data. Categorical data are all values that count people or things and that have been divided into linguistic categories for this purpose, such as genre, theme, colour or epoch. These are measured by the number of individuals in a group, which is also called frequency. The frequency can be expressed both in absolute values and relatively as a percentage. The best visualisation of these data are bar charts for absolute frequencies and pie charts for relative frequencies. Numerical data, on the other hand, are measured values such as height, weight or distance. Here the values also have numerical meaning, which is why calculated values such as mean, dispersion or relations between the values can also be given. Histograms or box plots are best suited for visualisation here. The main task is thus to find out how a distribution of a characteristic can best be described and presented. [17] The investigation of the probability of differences, by contrast, belongs to inductive or inferential statistics. It asks to what extent what is measured corresponds to reality by deriving properties of a population from the data of a sample.
Inferential statistics is therefore essentially concerned with the question of the randomness of statistically measured phenomena. Thus, one asks oneself to what extent a mean value measured in a sample could deviate from the mean value of the population; one asks oneself, in the case of different samples, whether they can still belong to the same population in view of their measured differences, and more. Here, therefore, an attempt is made to classify the examined sample into a larger whole, whereby broad space is also given to the examination of the probability or differences of correlations. [18] Some important basic terms have already been mentioned in my explanations: A characteristic (also called a variable) is the respective property of the object of investigation, the values of which can vary (in contrast to a constant). In the case of collection objects, for example, a characteristic could be the affiliation to a certain genre or epoch, the weight, the inventory number or the market value. The characteristic values are understood to be the totality of possible values a characteristic can take. For example, the variable "genre" in our case can take the values animal specimen, plant specimen, mineral and model, and the variable "sex" of the specimen can take the values male, female and hermaphroditic. The numerical description of the values of a variable on the basis of measurements or counts, on the other hand, is called quantification. In the collection shown here, the number of animal specimens (if I counted the butterflies correctly) is 13. [19] You may have noticed that the feature types can be different. Qualitative characteristics are those whose expression can be recorded as terms on a nominal scale. We also spoke above of categorical data. Quantitative characteristics, on the other hand, are measured values or numerical data. And thirdly, there are ordinal characteristics, which arrange the values in a ranked list, as is the case, for example, with evaluations in the form of school grades. [20] And once again for the record: The population (or basic population) refers to the totality of all elements for which the statements of the study are to apply. Since a complete survey is rarely possible, one usually selects a sample, i.e. a selection of observation units from a defined (basic) population. A sample should reflect this basic population without bias, e.g. through the model of representativeness, which we will talk about later. However, I would like to point out here already that the basic population must also be determined exactly. If you read, for example: Germans prefer beer to wine, or would vote for a certain party if there were federal elections on Sunday, the population here is not all Germans, but in the latter example all persons older than 18 who had a German passport in May 2020 and were registered with their main residence in Germany, and in the first case all persons over 16 who shop in Germany. The "Germans" here therefore form two different populations! [21] The reason why it is so easy to work with random samples in empirically collected data is that in nature measured values are usually normally distributed, as the Göttingen mathematician Carl Friedrich Gauss already discovered. Let's say that on a chicken farm with a lot of chickens, the individual eggs were weighed for a week. Let's define the random variable X as the weight of an egg in grams. Then it turns out that an egg weighs 50 grams on average. The expected value is therefore 50.
Furthermore, let it be known that the variance Var(X) equals 25 g². So we can approximate the distribution of the weight as shown in the graph. We can see that most of the eggs are near the expected value of 50 and that the probability of getting very small or very large eggs becomes very small. We are looking at a normal distribution. It is typical for random variables that are made up of very many different influences that can no longer be separated, e.g. weight of the chicken, age, health, location, heredity, etc. For example, the probability that an egg weighs at most 55 g is 0.8413. This corresponds to the red area in the figure. "Normal distribution" is therefore understood to mean a steady (continuous) distribution of randomly collected data. This includes, for example, the height or intelligence of people. But also the amount of ceramics in a deposit or the settlement density in a quadrant is normally distributed, if it is not subject to special culture-related changes. The normal distribution therefore applies to all random samples of a population. [22] The special significance of the normal distribution is based, among other things, on the central limit theorem, which states that a sum of n independent, identically distributed random variables with finite variance is normally distributed in the limit n→∞. This means that random variables can also be regarded as normally distributed if they result from the superposition of a large number of independent influences, with each individual influencing variable making an insignificant contribution in relation to the total sum. [23] The normal distribution applies to measured values that cluster around the arithmetic mean, provided the sample size is large enough. In our chicken example, it was the value 50. This average value is obtained by dividing the sum of all measured values by their number. Usually, the measured values are plotted on the x-axis and their frequency on the y-axis. This distribution only makes sense for characteristic values in numerical form, i.e. for quantitative characteristics on a metric scale. Qualitative or ordinal characteristics cannot be normally distributed, even if they are subject to chance. Approval of political parties, for example, is a qualitative characteristic. If it were normally distributed, we would no longer need to hold elections. Or, to stay with the chicken example: the number of eggs laid varies from day to day. It is possible to calculate the mean value here, i.e. to indicate how many eggs were laid on average per day, but a bar chart with the days of the week on the x-axis only visualises the collected data. It is not a frequency distribution, because unlike weight, the days of the week have no numerical relationship. A sample was not taken; rather, a count was made on each of seven days. [24] This is why Wolfgang Filser's iconographic study has several problems. On the one hand, the sample did not come about by chance, owing to the special conditions of transmission, which we still have to discuss; on the other hand, year dates, like weekdays, are not measured values and have nothing to do with the normal distribution! Thus, approximating the bar chart to two Gaussian curves suggests a representativeness in the data basis which is not given here. [25] This raises the question of how representative such statistics are in the first place.
After all, representativeness is only given if the composition of the basic population is replicated in the selection of the elements of the sample or is proven by tests. In order to draw conclusions about the basic population from the results of the statistical evaluation, the sample must be representative so that generalised statements can be made. The term "representative sample" is actually misleading, because samples almost never represent the population exactly. It would be more correct to ask about the significance of the sample, but the term has become commonplace and we will continue to use it. Let us start with the sample size. This is usually abbreviated with a small n. Please get into the habit of always stating the sample size. This is a requirement of scientific rigour. Wolfgang Filser has printed his bar charts in percentages and without this indication, and you have to search for the values in his book to find this important information: For symposia he examines 373 representations, and for athletic subject matter he has collected 1216 vase paintings. So his samples vary in size, which is why a comparison in percentages is useful. But do you consider the total number of vase paintings examined to be meaningful at all? And within what limits? [26] These limits can be estimated. To assess the sample size, we are interested in the standard deviation, i.e. we ask about the fluctuations or dispersion of the values in a sample. If you are looking for a flat in Göttingen, for example, the average price for rented flats of 8.40 € per sqm says nothing about the prices that have to be paid for student flats here if you do not know their range. For this, you first need the deviation from the mean value or the arithmetic mean, as the statistician says. The standard deviation is therefore nothing more than the average distance from the arithmetic mean. Imagine the measured values in a coordinate system. The standard deviation is then the average deviation of a point from the centre, and the centre is precisely the arithmetic mean of all values, i.e. the average price. If you calculate the average distance of all points from the centre, you get the standard deviation. Here it is 1.20 €. [27] Instead of calculating the standard deviation exactly, an estimated value (s) is often used that applies to 95% of the measurements. Depending on the sample size (n), this is ± s/√n or, in percent, ± 100/√n. This value applies to normally distributed values of a population. The error limits of normally distributed values are calculated from the sample size in relation to the total quantity. This is also called statistical uncertainty, which is given as the confidence interval (e). It indicates how much the measured value is estimated to deviate from the true result value (with an approximate 95% probability). This value does not decrease linearly with the size of the collected data, but with its square root. I will spare you the complicated calculation and just give a few confidence intervals (e) to indicate the error limits for orientation: For a sample size of 30, the deviation is ±18.3%, for n = 100 it is ±10.0%, for 200 it is ±7.1% and for 1000, which is common in surveys, the statistical imprecision is ±3.2%. As you can see, a sample size of about 5000 with an error margin of ±1.4% would also be ideal for humanities studies. Larger samples would hardly change the result.
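To make these two calculations tangible, here is a minimal sketch in Python; NumPy-free, but using SciPy for the normal distribution, which is an assumption of convenience and not a tool mentioned in the lecture. It reproduces the egg-weight probability from the normal-distribution example above and the rough ±100/√n error margins just listed.

```python
from math import sqrt
from scipy.stats import norm

# Normal-distribution example from the lecture: egg weight X with
# expected value 50 g and variance 25 g², i.e. standard deviation 5 g.
mu, sigma = 50, sqrt(25)
p = norm.cdf(55, loc=mu, scale=sigma)          # P(X <= 55 g)
print(f"P(egg weighs at most 55 g) = {p:.4f}")  # 0.8413

# Rough error margins ±100/sqrt(n) in percent, as quoted in the lecture.
for n in (30, 100, 200, 1000, 5000):
    print(f"n = {n:5d}: ±{100 / sqrt(n):.1f} %")
```

Running this prints the orientation values given above (±18.3%, ±10.0%, ±7.1%, ±3.2%, ±1.4%).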
[28] Applied to our example, this means, if you follow the green line, that with this sample size of 373 vase paintings with symposion representations, each value could also be 5.8% higher or lower. Most of the values thus lose their significance completely. The situation is somewhat better with the second list, where the margin of error is only plus/minus three percent. However, the confidence intervals only apply to normally distributed values. In our example, the error limits will certainly be much higher. In order to assess such graphs, it is therefore absolutely necessary to specify the sample size (n) that was surveyed! And please never underestimate the suggestive power of such visualisations. In colloquial language, statistics is often equated with graphics. These graphs are based on a quantity distribution, not a statistical study. If we were to arrange the numbers as a pie chart, for example, they would be relatively inconsequential. The effect of the numbers comes from the fact that they are plotted on a time axis and thus visualise a development in the manner of "rise, blossom and decline". As an archaeologist, I would be sceptical about whether the selection of the sample does not play a role here. We will have to talk about this in more detail in the third part. [29] If the values are not normally distributed because the sample is not subject to chance but has historical, social or cultural causes, for example, it makes sense to calculate the median in addition to the arithmetic mean. It denotes the middle value in a sorted series of values. For the egg-laying hens in our example, the arithmetic mean is 312.86, i.e. an average of 313 eggs were laid per day during the week. The median, as the middle value in the series, is 300. A value m is the median of a sample if at least half of the sample elements are not greater than m and at least half are not less than m. Of course, this distinction is only meaningful at all in large data sets, where one wants to remove "outliers" from the data. [30] In a famous study, Franco Moretti examined the title length of British novels, where a significant shortening can be observed at the beginning of the 19th century. His chart gives both the mean and the median length to give a fuller picture of the degree of variation in titles: The mean gives information about the often extravagant length of some titles, while the median indicates the "central" length of the year in question (i.e. the value with the same number of entries above and below it). The difference between the two forms of measurement is particularly evident in years such as 1780 (with the 346-word History of Miss Harriot Fairfax) or 1784 (with the 273 words of The Maid of the Farm): in these two cases the mean is 37.9 and 19.7 respectively, while the median (8.5 and 7) is barely affected. [31] In digital image science, the median is used especially for the comparison of measurable values such as brightness, contrast or saturation, reducing the image to a single value that describes the image effect. For our example, the cover of the fashion magazine Vogue in the German edition of April 2004, one could use the mean value of the brightness of all 629,184 pixels. In this way, the covers of this magazine can be compared chronologically, as Peter Leonard did in 2013. You can see here the distribution of the mean brightness of all international issues of Vogue plotted on a timeline.
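As a concrete illustration of this kind of image-level measurement, here is a minimal sketch (assuming Pillow and NumPy, and a hypothetical file name cover.jpg) that reduces a cover image to a single brightness value in the way described above.

```python
import numpy as np
from PIL import Image

# Load a cover image (hypothetical file name) and convert it to greyscale,
# so that each pixel is a single brightness value between 0 and 255.
pixels = np.asarray(Image.open("cover.jpg").convert("L"), dtype=float)

# Reduce the whole image to one number: the mean brightness of all pixels.
# np.median(pixels) would give the median brightness instead.
mean_brightness = pixels.mean()
print(f"{pixels.size} pixels, mean brightness = {mean_brightness:.1f}")
```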
So you can easily see that since the late 1980s the spectrum has always been broad and balanced overall. This suggests that, for marketing reasons, attention may have been paid to variety in the design. [32] For qualitative characteristics, the mode or modal value is the only obvious choice. This is the most frequent value of a distribution, i.e. the value with the greatest probability. Since a distribution, when represented graphically, can have several peaks or summits, a distribution can also have several modes assigned to it. If there is only one modal value, the distribution is called unimodal. But if there are about the same number of hits for one value as for another, the distribution is bimodal. To take an example from one of our research projects: If we sort all the marble busts in the antique collection of Tomasso Obizzi that were valued at two Venetian zecchini after his death according to their height, the mean value is 58.68 cm, while the median, which is less affected by the outliers, is 61 cm and thus better reflects the standard expectation at the time of the size of an antique marble bust. Put into a diagram, however, it becomes clear that the distribution of sizes is multimodal and has several focal points. There is no general test for the representativeness of data; instead, you use descriptive methods here by showing that the values of the sample do not differ significantly from the values of the population. Which descriptive methods you use for this depends on the type of variable. With nominal or ordinal data, you use relative frequencies. For metric data, use the mean or median, for example. In addition, you specify the confidence intervals in each case. By comparing these numbers and discussing the difference, you can show that there is (hopefully) no clear difference and thus representativeness exists. [33] These ratios of a sample are also called location measures or measures of central tendency. They indicate, for example, where the centre of a distribution is located. In addition, there are measures of dispersion for the variability (or spread) of a frequency distribution: for example, the empirical variance, which is the mean squared deviation from the arithmetic mean; the range, which is the difference between the largest and smallest observation; and the standard deviation, which is calculated by taking the square root of the empirical variance. All of these values are important for gaining insight into the nature of large data sets. We will return to this briefly in the next lesson. [34] And to summarise, I have written down some more important descriptive statistics terms for you on this slide. Representativeness is given when the composition of the basic population is reproduced, or shown to be approximated by tests, when selecting the elements of the sample. The term probability is used to describe the classification of phenomena according to their degree of certainty. The probability p is expressed with values between 0 (impossibility) and 1 (certainty of occurrence). Relationship in the statistical sense refers to a systematic correspondence between the characteristics of two variables, and correlation to the relationship between two quantitative characteristics. The strength of the correlation is expressed by the correlation coefficient. It lies between the extremes -1 and +1. If it is positive, this means that a high value of variable A is accompanied by a high value of variable B, and the same applies to low values.
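To tie these location and dispersion measures together, here is a minimal sketch in Python. The seven daily egg counts are hypothetical values chosen only so that they reproduce the mean (312.86) and median (300) quoted above; the paired values at the end, used for the correlation coefficient, are equally hypothetical.

```python
import statistics as st
import numpy as np

# Hypothetical daily egg counts for one week, chosen so that they reproduce
# the mean (312.86) and median (300) mentioned in the lecture.
eggs = [280, 290, 300, 300, 310, 330, 380]

print("mean   :", round(st.mean(eggs), 2))   # 312.86
print("median :", st.median(eggs))           # 300
print("mode   :", st.mode(eggs))             # most frequent value
print("range  :", max(eggs) - min(eggs))     # largest minus smallest observation
print("empirical variance :", round(st.pvariance(eggs), 2))
print("standard deviation :", round(st.pstdev(eggs), 2))

# Correlation coefficient between two quantitative characteristics
# (hypothetical paired values); it always lies between -1 and +1.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 6]
print("correlation r =", round(np.corrcoef(x, y)[0, 1], 2))
```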
[35] Indeed, an important task of statistics is not only to describe the distribution of values, but also to study correlations between variables. Take, for example, the graphic works of the Brücke artists Karl Schmidt-Rottluff, Hermann Max Pechstein, Ernst Ludwig Kirchner, Erich Heckel and Otto Mueller, whom you see here in self-portraits. At the beginning of the 20th century, the members of the artists' association "Die Brücke" rediscovered the woodcut for their expressive art, which seemed particularly original to them. If one now wants to examine the position of the woodcut in the graphic work of these artists, i.e. if one wants to know what connection exists between the characteristics "artist" and "printing technique", one must interweave them and present them in table form. This table is then called a cross table, frequency table or contingency table. In our example, there are two variables. I have entered the values of the variable "artist" in rows from top to bottom and the values of the variable "printing technique" in columns from left to right. In each individual cell, the specific frequency of the respective combination of the expression of the variable "artist" with the expression of the variable "printing technique" is noted. With a manageable number of columns, crosstabs can be output quite well as bar charts or, as here, as column charts (a small code sketch of such a cross table follows below). In this way, the frequency of the expressions can be recorded relatively easily at a glance and it is also possible to see whether an artist frequently produced graphic art. Heckel and Kirchner were generally fond of this form of artistic expression, unlike Otto Mueller, for example. [36] A cross table can show both absolute and relative frequencies. For our question about the position of the woodcut in the work of the respective artist, we are more concerned with the relative distribution, which is represented in percentages. We now see that the woodcut plays a prominent role for Schmidt-Rottluff, that Heckel and Kirchner reach for the woodblock with roughly equal frequency, and that for the painterly Otto Mueller lithography is the absolute medium of choice. Stacked column charts are well suited for visualisation, or pie charts, which are only suitable for relative frequencies. [37] In our example, we have evaluated the catalogues raisonnés and thus all the prints created by these artists. We are talking here about the basic population, which with 4817 prints was not very large. With larger quantities, such as the quantity of all Germans who go on holiday, one only examines a sample. In this case, the statements of descriptive statistics always refer only to the examined sample. With descriptive statistics we do not investigate whether these differences could possibly be random (because the sample was too small, the differences too slight, or the selection not representative). [38] Many readers of statistical analyses are interested not only in the general distribution of the data, but also in the underlying correlations. They want to know whether, for example, there is a causal relationship between expressionist artists and the printing technique used, or between chocolate consumption and the number of Nobel Prize winners in a country. In 2012, an article in the New England Journal of Medicine tried to establish such a connection and recommended the daily consumption of preferably dark chocolate.
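Here is the promised minimal sketch of such a contingency table in Python with pandas (an assumption of convenience). The print records are invented for illustration and are not the real figures from the catalogues raisonnés.

```python
import pandas as pd

# One row per print: artist and printing technique (hypothetical records,
# not the real counts from the catalogues raisonnés).
records = pd.DataFrame({
    "artist":    ["Heckel", "Heckel", "Kirchner", "Kirchner", "Mueller",
                  "Schmidt-Rottluff", "Schmidt-Rottluff", "Mueller"],
    "technique": ["woodcut", "lithograph", "woodcut", "etching", "lithograph",
                  "woodcut", "woodcut", "lithograph"],
})

# Absolute frequencies: artists in rows, printing techniques in columns.
absolute = pd.crosstab(records["artist"], records["technique"])
print(absolute)

# Relative frequencies per artist (row percentages), as used in the lecture
# to compare the position of the woodcut within each artist's work.
relative = pd.crosstab(records["artist"], records["technique"],
                       normalize="index") * 100
print(relative.round(1))
```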
Franz Messerli had related the amount of chocolate that people consume on average (in kg per person per year) to the number of Nobel laureates from that country (per 10 million inhabitants, to correct for population size). The author then found the correlation shown in the graph (r=0.79, p<0.0001). However, if two variables correlate very strongly with each other, this does not at all mean that these two variables actually influence each other. One may find this plausible, but it is wrong. Often, a third variable is causally involved in the correlation that was not recorded at all, such as the availability of chocolate or the wealth of the respective countries, which has an effect on higher spending on education and research, but also on chocolate consumption. Sanjay Basu, Assistant Professor of Medicine at Stanford University, shows in his blog that there are stronger correlations between per capita borrowing from commercial banks and Nobel Prize winners (r=0.92, p<0.05) or owning luxury cars and Nobel Prize winners (r=0.85, p<0.0001) than between chocolate consumption and Nobel Prize winners, and therefore rightly asks in conclusion: does this mean that using credit and buying an Audi makes you smarter? And in this case, there is another problem: when you examine aggregate data, you cannot determine whether the higher chocolate consumption actually occurs among those individuals who have higher cognitive functions, let alone are Nobel Prize winners, but only that there are both Nobel Prize winners and chocoholics among the residents, to put it pointedly. And thirdly, it is important to note that the presence of Nobel Prize winners is not a good measure of nationwide cognition, because they are extremely rare individual cases and therefore cannot be considered representative. [39] For historical periods before the 19th century, however, there are usually no data that have been systematically collected for statistical purposes. Rather, the data are published in a scattered manner and are incomplete. This inconsistency of the source material confronts researchers with the problem of having to make estimates from other data. For example, if one does not know the size of the total population of a place, but does know the number of taxpaying citizens, one can try to extrapolate these figures with the help of estimates such as the assumed average size of a household. The reliability of the estimated values determined in this way then determines the further investigation to a large extent. If, for example, one estimates the population of Pompeii from the number of seats in the amphitheatre or the area within the city walls, which is what archaeologists like to do, one fails to recognise the prestige that this small city wanted to claim in regional and supra-regional comparisons. For not all quarters of the city are equally densely populated, and the amphitheatre is one of the largest of its time. [40] From what has been said so far, a typical sequence of a quantifying investigation in the humanities emerges: At the beginning there is the research question or hypothesis, which refers to sources that can be collected and evaluated. These sources must be critically sifted and interpreted. This usually involves a clarification of the most important terms as preparatory work for operationalisation, by which is meant the definition of the variables. Subsequently, the survey process should be conceptualised, which also includes the coding or classification of the characteristics.
The subsequent sampling determines the nature and size of the sample and, if necessary, prepares it. Now we are in a position to analyse the data collected in this reflective way. In doing so, we have to decide (still depending on the research question) whether we want to proceed descriptively or inductively, because, as you now know, this determines the choice of means. And the results of our analysis are then presented graphically in a way that makes them easy to follow. [41] The two most common bases for statistical investigations are surveys and experiments. Unfortunately, neither of these is available to the historical sciences, because we cannot subsequently interview anyone from the past, and we usually do not have enough representative data for an experiment either. Therefore, the most important task of any quantifying evaluation is to create a suitable material basis. In doing so, one must always keep in mind: Data are not given, but compiled, constructed as an interpretation of the world we are living in, an interpretation that is not inherent in the data themselves. Data sets are therefore never random samples: they are collected by people according to parameters, however complex, hidden, unconscious or random these may be. In the second part, therefore, we want to talk about sampling, that is, the meaningful compilation of data. [42] The process of digitisation is already quite advanced in some areas, and there are also some large data collections available online. The oldest is Project Gutenberg, founded back in 1971, which makes over 60,000 e-books available free of charge. At the moment, however, access is blocked in Germany for copyright reasons and only the German sister site is accessible. [43] Another portal for books is the HathiTrust Digital Library with just over 9 million digitised books. [44] And the largest portal is Europeana. There are, as of May 2021, 52,254,880 works of art, collectibles, books, films and musical pieces from European museums, archives and libraries, [45] while its American counterpart, the Digital Public Library of America, listed almost 44 million images, texts, videos and sound files in the same period. [46] The Internet Archive has been storing web pages in various versions and other content freely accessible on the internet since 1996 and has now also developed into a large platform for data of all kinds. Among them are more than 3.8 million image files and 6.9 million films, mainly from American television channels. [47] Google Arts & Culture (or Google Art Project, as it used to be called) is more oriented towards the digitisation of cultural heritage. The museum-like online platform includes high-resolution images and videos of art and buildings from a wide variety of cultural institutions around the world. We'll look at this in more detail in the Virtual Museums lesson. [48] Artstor.org makes more than 1.3 million works from public collections available free of charge, but the total volume of 2.5 million files is only available to subscribers. [49] And the situation is similar with the German image archives Prometheus, with 2.8 million digitised images, ... [50] and Foto Marburg, with about 2 million images. [51] On Wikipedia you will find clear lists of the most common online archives, libraries and museums. All these data collections are very suitable for researching individual sources, but are not suitable for Big Data analyses because the data sets cannot be addressed or downloaded in their entirety for copyright reasons.
[52] And here is another list of data collections on modern history that are suitable for quantitative analyses. [53] But that is not the only problem: the collections of museums and archives were not digitised in their entirety one after the other, but in selections. Each museum wanted to put its highlights on the web first. Portals such as artstor.org, which compile digitised artworks from all over the world, increase this effect even more. Lev Manovich and his students searched there for works of art from 1930 and found only a few dozen paintings that were created outside Europe and North America. By contrast, a search for "Picasso" alone yielded nearly 700 hits. From this you can see how even mass digitisation deepens canonical notions of art and stereotypes instead of broadening our knowledge of global art development. [54] The 2003 book Human Accomplishment: The Pursuit of Excellence in the Arts and Sciences, 800 B.C. to 1950 by Charles Murray was one of the first Big Data applications in historical scholarship and founded so-called historiometry. Murray had scanned encyclopaedias and handbooks and calculated the significance of personalities from culture, politics and science based on the frequency of mention and the length of discussion. For the 4,000 innovators in the respective fields extracted in this way, a raw value is determined and normalised so that the lowest value is 1 and the highest value 100. [55] One will be justifiably shocked at how the 19th-century idea of genius is being revived in the 21st century. Even in the digital age, cultural history is written using the example of a few 'masterpieces', geniuses and exceptional artists, and thus continues to look more for the exceptional than the typical. What Murray's study shows, however, is how strongly the acquisition of knowledge is linked to canonical pre-selections, just as if one could grasp the essence of the Renaissance by only thoroughly studying Michelangelo, Raphael and Leonardo (and in that order!). In the meantime, the digital humanities are countering this deficiency in all fields, and perhaps soon, conversely, one will recognise the exceptions alongside the mainstream; but whether these will then again be the same celebrated masterpieces and exceptional personalities that used to dominate cultural history remains to be seen, since the selection processes are manifold. [56] These selection processes are so firmly anchored in our minds that it is not easy to circumvent them. In a study of photos taken in Manhattan in 2014 and uploaded to Instagram, it became clear that half of the photos were taken in only 12% of the area, namely the tourist areas of Manhattan. This means that even in extraordinarily populous and technologically advanced areas of the world, the existing monuments and institutions have no chance of being handed down to posterity unless they have been hyped up into tourist attractions. The digital turn has not yet led to a significant broadening of the data base! [57] For even if we significantly broaden the data base in terms of categories and take into account as many aspects as possible, all this richness and diversity of works, as currently presented to us in Europeana, for example, does not automatically reflect a comprehensive cross-section of cultural products. So the question is: how do you create an appropriate data set that systematically covers what was created in a particular period, region or medium?
If we wanted to conduct a study on painting books, for example, we would have to compile all the painting books, at least of a certain period or culture. But, of course, not nearly all painting books have been digitised, so we would not be able to do that. But even if we limited our question to the painting books in a large museum like the Metropolitan Museum in New York, whose holdings are available online, we would have to ask ourselves whether everything has really been digitised and put online. And even more: what collecting events, what acquisitions and donations underlie this collection and perhaps distort our picture in a certain direction? [58] However, in some areas of research, such as Greek vases, there are scholarly databases that supposedly list all the important objects in this subject area. We had already talked about this. The Beazley Archive Pottery Database is currently the largest database on pictorial sources from Greek antiquity and contains almost 100,000 entries. But is a database like this, whose data can even be exported, suitable for statistical analysis? To answer this question, we need to take a look at the origin of this data: The foundation of the Beazley Archive is the photo collection of the British vase researcher John Davidson Beazley, who spent his life researching the attribution of Attic vases to different painters' hands. He compiled his findings in extensive lists, and archaeological researchers have used the pages and numbers in these volumes as addressable entries for each vase mentioned there. Beazley's lists included 12,786 Attic black-figure and 21,286 red-figure vessels from Athens, while the Beazley Archive database includes 42,265 and 51,908 entries respectively (as of May 2021). The number of catalogued vases has thus tripled since 1970! But you can already see the first limitation here. The Beazley Archive continues to collect primarily Attic vases, even though vessels of other styles and production sites are now also included. [59] Let us take a closer look at the data by evaluating the chronological distribution. Filippo Giudice and his team did this in 2012, and it is immediately apparent that the vases recorded are not evenly distributed chronologically, but have a focus in the first half of the 5th century. However, we are not looking at a Gaussian distribution curve here, but at an art-historical model of rise, flowering and decline, because exhibitions and archaeological literature like to regard this period as the heyday of Attic pottery, and have accordingly reproduced a particularly large number of vases from this period in publications. Even Beazley (and the research after him) was not interested in the 4th century, and he only dealt with these one hundred years summarily. In our chart, this century, in contrast to the other bars, covers not 25 but 100 years. We will discuss such visualisation errors next week. But the distortion is also evident in the figures. The 2,000 vases in the Beazley Archive are contrasted with 12,131 entries in my vase repertory. We thus find that, at least for the 4th century BC, the data in the Beazley Archive are not representative, because they were not collected here with the same intensity as for the 5th century BC. [60] Beazley was already selective and only recorded what he could assign to a painter or a workshop. Of the vases in his private possession, including iconographically very interesting pieces, only very few were recorded by him.
As an example, I show a double page from the catalogue of his collection, where only a small lekythos is recorded in his lists of painters, which, after all, form the basis of the Beazley Archive. It should therefore be noted that the recording criteria of the data basis are essential for the evaluation! [61] Lisa Hannestad had already drawn attention to this problem in 1988, using the example of the pottery from two central sites in Athens. She made it clear that Beazley did not record a proportionally constant share of painted Attic vases from the finds of the Agora and Acropolis compared to the known quantity, but rather, for example, a particularly large number of lekythoi, which he could easily connect with workshops. The more sophisticated cup paintings, on the other hand, are underrepresented here. His sample is therefore not suitable for statements about the geographical distribution of Attic pottery or even of individual types of vessels. [62] As you can see, the basic population of all painting books or vases cannot be compiled. It is impossible to gather and examine all the data in question because they are inaccessible or unrepresentative in this form. Therefore we have to make a selection, which determines our sample. Sampling is therefore a defined method of selecting data for a statistical investigation in such a way that analyses of these data allow conclusions to be drawn about the population without systematic error. But before choosing a method, the frame of reference must be determined, i.e. the collection of units from which the researcher then selects the sample, for instance a category of the population such as an epoch or a genre. [63] The simplest would be random sampling. It is like drawing names out of a hat. In random sampling, every entity and every subset in the population has an equal chance of being selected. So each element of the reference frame has the same probability of being selected: the frame is not subdivided or partitioned. This is convenient because it is quick and can even be done automatically. In particular, the variance between individual outcomes within the sample is a good indicator of the variance in the population as a whole, which makes it relatively easy to estimate the accuracy of the results. However, because it is a random selection, small samples may turn out to be unrepresentative and may not include a particular kind of entity at all. For example, in a simple random sample of ten people from the German population, one would expect five women and five men, but it might happen to contain only women. Systematic and stratified methods attempt to overcome this problem by using "information about the population" to select a more "representative" sample. [64] In systematic sampling, the population of the study is ordered according to a specific ordering scheme, and then items are selected from this ordered list at regular intervals. An example of systematic sampling would be selecting one person in ten from a sorted list or database. Selecting names from the telephone book, for example, gives a representative distribution by first letter of the surname. The same applies to data sorted by, for example, measurements, dates or location coordinates. As long as the starting point of the selection is chosen randomly and the variable according to which the list is ordered correlates with the variable of interest, systematic sampling is a kind of probability sampling.
However, it carries the same risk as random sampling of not being representative if the sample is too small and not precisely matched to the research question. [65] Moreover, systematic sampling is particularly susceptible to periodicities in the list, such as peculiarities in odd house numbers, estimated values in even numbers or weekdays in leap years. This is because if the period is a multiple or factor of the interval used, the sample is likely to be even less representative of the total population than a simple random sample because of the systematic error. This is also true for elements whose peculiarity depends on their predecessors or successors in the list, such as the always identical sequence of motifs or techniques in artists' œuvres ordered by year. Here, there is a high probability that the data will be skewed in one direction, for example by tending to select watercolours or still lifes more frequently. [66] In stratified sampling, the sampling frame is divided into separate strata, i.e. subgroups, which have been formed according to clear criteria such as genre, material, epoch, motif, origin, size, etc. Individuals are then randomly drawn from these subpopulations. If, for example, you are examining picture postcards and have divided the sampling frame into motifs, you can specify the sample proportion precisely. For example, you would draw a quarter of the items from the group "Famous buildings" if you know from a previous study that 25.6% of the picture postcards produced in Germany show landmarks and tourist attractions. [67] The stratified random sample offers several advantages. First, additional statements can be made in this way about subgroups that would not be sufficiently accurately represented in a random sample. Second, it may be that elements in one sub-area are better documented than in another. In such cases, using a stratified sampling approach may be more convenient than aggregating data across groups, and more representative, because one can use the most appropriate approach for the particular stratum. If, for example, there is a dependence of the range of motifs on the preferences of the respective patrons, one can take this factor into account in the selection of certain motifs, for example by weighting the ruler portraits of absolutism, which were demanded by aristocratic clients and are very numerous, differently from the portraits of 20th-century politicians, who were not so inclined to pictorial self-portrayal. However, such weightings are often too complex and also not researched closely enough to actually apply them. [68] By focusing on important subpopulations and ignoring irrelevant subgroups, one increases the accuracy and efficiency of the estimation. A stratified sampling approach therefore ideally has three advantages: variability within strata is minimised, variability between strata is maximised, and the variables by which the population is stratified are highly correlated with the desired dependent variable. However, it necessitates the selection of relevant stratification variables, which can be difficult. The approach is therefore not useful if there are no homogeneous subgroups. [69] In quota sampling, the population is first divided into mutually exclusive subgroups, just as in stratified sampling, and then a certain proportion of units is selected from each segment. This is the case in targeted surveys where, for example, out of 100 participants, 50 are supposed to be unemployed.
The interviewer could then easily find these 50 by going to a job centre. Or you determine that twice as many oil paintings as watercolours by each painter are to be included in order to take adequate account of their artistic ambition. Because of this specification, quota sampling is not probability sampling, since the selection of the sample is not random. One should therefore carefully consider the consequences of such segmentation and whether its advantages really outweigh the disadvantages. [70] Sometimes it is more efficient to select samples in groups ("clusters"), clustered by geography or by time period. For example, if we wanted to study the furnishings of church interiors in Germany, we could select fifty socially, denominationally or chronologically stratified congregations and then catalogue each item within the selected churches. However, we could also randomly identify regions, take random churches from those regions and select random furnishings from those churches to compile the sample. But this requires a larger sample than simple random sampling because of the higher variance. [71] Actually, we are almost at the end of our lesson in terms of time. However, since the topic is relatively unwieldy, I will run over time a little and add a third part with examples. [72] Geodata can also be statistically evaluated if they have been acquired using random methods. Basically, there are various methods for the acquisition of spatial data from historical and prehistoric periods. As we have already discussed in the previous lesson, information about the settlement structure of past times can be obtained from above-ground remains, by picking up surface finds (in a survey), by cleaning and smaller excavations, by geodetic and geomagnetic methods or by large-scale excavations. As an example we choose a landscape in western Montana. [73] The example is hypothetical and lists eight different prehistoric and historic structures that would have come to light through excavation: a historic wagon track, three Archaic necropolises, a Palaeoindian quarry, a historic homestead and two Archaic settlements. In order to be able to statistically evaluate the area, it was divided into 27 x 37 (i.e. 999 in total) quadrants of equal size. The diagram here shows all the structures we hypothetically assumed. In the following, we will deal with the question of which sampling, i.e. which type of sample, will yield the best results. [74] A targeted selection of the study area (i.e. a non-probabilistic sample) is used when one is only interested in certain structures that are already known. In our example, this means that the wagon track and the farmstead are well researched, but the six prehistoric structures remain unknown in this way. Therefore the purposive sample is not a representative sample. [75] Probabilistic sampling, on the other hand, uses statistical methods to study only representative areas of a territory. The percentage depends on the natural conditions, the types of settlement remains and the financial and time resources of the researchers. Here, only 5% of the quadrants, selected at random, were investigated, and six of the eight sites were encountered. However, larger areas remain unexplored in this way. A different random selection might have identified more (or fewer) sites. [76] The stratified random sample takes more account of the terrain structure. Here, the study area is first roughly divided into different, topographically distinguishable areas, within which quadrants are again randomly selected (a small code sketch of these selection strategies follows below).
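To make the difference between these selection strategies concrete, here is a minimal sketch in Python (NumPy assumed). It builds the hypothetical 27 x 37 grid of quadrants, draws a simple 5% random sample, a systematic sample, and a stratified sample split two-fifths / three-fifths between two zones, as in the Montana example; the exact zone boundary in the grid is my own assumption.

```python
import numpy as np

rng = np.random.default_rng(seed=42)          # fixed seed for reproducibility

# Hypothetical study area: 27 x 37 = 999 quadrants, numbered 0..998.
quadrants = np.arange(27 * 37)
sample_size = round(0.05 * quadrants.size)    # a 5% sample, i.e. 50 quadrants

# 1) Simple random sample: every quadrant has the same chance of selection.
random_sample = rng.choice(quadrants, size=sample_size, replace=False)

# 2) Systematic sample: random starting point, then every interval-th quadrant.
interval = quadrants.size // sample_size
start = rng.integers(interval)
systematic_sample = quadrants[start::interval]

# 3) Stratified random sample: two zones (riverbank / prairie), sampled in
#    proportion two-fifths to three-fifths; the split at quadrant 400 is assumed.
riverbank = quadrants[:400]
prairie = quadrants[400:]
stratified_sample = np.concatenate([
    rng.choice(riverbank, size=round(0.4 * sample_size), replace=False),
    rng.choice(prairie, size=round(0.6 * sample_size), replace=False),
])

print(len(random_sample), len(systematic_sample), len(stratified_sample))
```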
[76] The stratified random sample takes more account of the terrain structure. Here, the study area is first roughly divided into different, topographically distinguishable areas, within which quadrants are again randomly selected. In our case, the river was a landmark that could be used to distinguish a sloping riverbank zone (above the dotted blue line) from a flatter prairie zone (below the dotted blue line). According to the size of the two parts, two-fifths of the quadrants were randomly distributed over the riverbank zone visible in the aerial photograph above and three-fifths over the flatter prairie zone. With this stratified random sample, four of the eight sites were encountered. You can see from this that the distribution of sites and monuments is not normally distributed, but is subject to spatial or socio-cultural conditions. What one achieves with a random sample therefore depends to a large extent on the luck of the researcher. [77] With a systematic sample, the units are evenly distributed over the study area. In this way, no large areas remain uncovered, while already known structures or areas of special interest (such as valleys or hills) that may indicate settlement forms can additionally be studied on a finer grid. [78] However, a combination of stratified and systematic sampling might be best. For this purpose, the study area is divided into different, topographically distinguishable areas, within which quadrants are then selected at regular intervals. Each of the sampling strategies mentioned can be useful, and none holds out the prospect of completely acquiring the existing structures. The consequence, however, is that large-scale structures are more likely to be acquired than small-scale ones. [79] Most archaeological excavations are laid out similarly and divide the excavation area into such quadrants. The American excavations at the Agora of Athens, for example, proceed in this way. Here, almost the entire mapped area has been excavated. If we are now interested in the distribution of finds, we can also treat the quantity of what has been preserved as a random sample. For of all the objects that were in use in antiquity, each generally has the same probability of having been lost or disposed of. In this way, for example, we can draw a reasonably representative picture of where figuratively painted red-figure pottery was used. [80] Let us stay with red-figure pottery and ask ourselves which fragments we need to include in our database, and which not, if we want to work on the motifs on these vessels. As an example, I have chosen small-format amphorae, the so-called pelikai, from the settlement excavation in Olbia on the Black Sea, because the excavators presented all the excavation material in colour photographs. However, you could also take any other settlement excavation. [81] Only fragments that can be clearly assigned to a motif are suitable for iconographic questions. In our examples it may be possible to match the fragments to vase paintings on unbroken vessels. [82] This makes it clear that we can only include a few fragments in our corpus and by no means all of them. [83] Another problem is that not all material from all excavations has been published. If one compares the published fragments from Olbia with those from Athens, it is immediately noticeable that the publication of the Agora fragments does not include rim and foot fragments, fragments of secondary sides or smaller fragments. The different documentation and publication situations mean that the find spectra of the respective sites are difficult to compare with regard to their distribution across vessel shapes on the basis of the published inventory.
However, this has hardly any effect on the fragments compiled in our sample, because only vases and fragments for which the image motif of the main side can be determined were recorded here. This means that despite very different recording intensities, the same selection criteria were applied everywhere when compiling our sample: red-figure sherds with significant remains of figural painting were recorded with the same intensity at all excavation sites. On this level, the different sites are therefore readily comparable with each other. [84] A stratified sample can minimise the imbalance caused by the selective publication of the pieces if the same recording criteria apply to all sites and contexts. Then, as with random sampling, each element of the population again has the same chance of being included in the sample. If we plot these data on a map, however, we do not get a visualisation of all sites where Attic pottery has been found, but only of those from which sherds with figural decoration are known to a significant extent. The caption must therefore read: significant sites of late red-figure vases from Athens. [85] There are a number of statistical tests for checking representativeness, most of which can only be usefully applied to empirically collected data. However, we can carry out another test that examines our sample in terms of publication status. To do this, we compare the distribution in different publication phases, e.g. by evaluating the increase of the last 25 years separately. If the results are essentially identical, our sample can be considered representative. In our example, the sample contains 3363 entries that were presented for the first time in the last 25 years, i.e. between 1986 and 2010. This corresponds to 35.9% of the total material. With regard to the distribution across vessel shapes, this increase leads only to minor deviations of less than 2%. Our sample is therefore likely to be independent of preselections and thus representative. [86] Another test concerns the accuracy of the results. This "result accuracy" has nothing to do with statistical uncertainty. Rather, it indicates how stable the results are when the data basis changes by a certain number of entries. By stating the result accuracy alongside the total number, an attempt is made to indicate the statistical bias that results from a small material base. This is done by noting how much the percentage distribution of the data would change if one value were unilaterally increased by three. For example, with a base of one hundred objects, E(3) is 3%, which means that the given values would shift by +/- three percent if three additional objects of one criterion were included. With 400 objects, it is only 0.75%; at this size, such shifts hardly affect the numerical ratios any more. The result accuracy thus refers to a fundamental statistical imprecision that is particularly important with small amounts of data, while the confidence interval expresses the estimated representativeness and maximum deviation of the data, which, however, cannot be determined exactly in archaeological evaluations anyway.
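Both checks are easy to script. The following sketch (Python with pandas) assumes a hypothetical table with columns first_published and vessel_shape and uses the simplified definitions from this lesson, E(k) = k/n and a margin of error of roughly 100/√n percent (the latter roughly corresponds to the worst case of a 50% share at a 95% confidence level); it is meant as an illustration of the two tests, not as a fixed recipe.

```python
import math
import pandas as pd

def phase_deviation(df: pd.DataFrame,
                    year_col: str = "first_published",
                    shape_col: str = "vessel_shape",
                    cutoff: int = 1986) -> pd.Series:
    """Compare the percentage distribution across vessel shapes before and
    after a publication cutoff; returns absolute deviations in percentage
    points, largest first. Small deviations speak for representativeness."""
    old = df.loc[df[year_col] < cutoff, shape_col].value_counts(normalize=True) * 100
    new = df.loc[df[year_col] >= cutoff, shape_col].value_counts(normalize=True) * 100
    return old.subtract(new, fill_value=0).abs().sort_values(ascending=False)

def result_accuracy(n: int, k: int = 3) -> float:
    """E(k) in percent: by how much a share shifts if k objects of one
    criterion are added to a base of n objects (simplified definition)."""
    return 100 * k / n

def margin_of_error(n: int) -> float:
    """Approximate 95% margin of error in percent for a sample of size n."""
    return 100 / math.sqrt(n)

print(result_accuracy(100))            # 3.0  -> E(3) = 3%
print(result_accuracy(400))            # 0.75 -> E(3) = 0.75%
print(round(margin_of_error(400), 1))  # 5.0  -> roughly +/- 5%
```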
[87] The methods for quantifying data depend on the data basis, and today we have become acquainted, albeit somewhat mixed up, with three different compositions of data material on which we can conduct our investigations. First, there are analyses of the population, i.e. of all existing elements that we want to survey. This population can come from complete surveys such as censuses, inventories, corpora or catalogues of works, or it can be born-digital content, e.g. all posts on Instagram, Twitter or YouTube, broadcasts on television, etc.; it can also be a data set that we have compiled ourselves in the past. Gathering the data can be quite time-consuming, but after that, apart from a completeness check, no further preliminary investigations are necessary and you can start with the statistical analysis straight away. On this data, all statistical calculations such as the determination of location and dispersion measures can be carried out, which is especially useful in Big Data applications to get an idea of the composition, distribution and correspondence of the data. There are researchers, such as Lev Manovich, who want to allow only complete populations, if possible, even for cultural datasets. [88] However, this can be very time-consuming and cost-intensive, especially if you have to acquire the data first. That is why analyses based on representative samples, such as surveys, image sets or geo-surveys, are very popular. A representative sample is a proportion of all works available in a particular medium, time or place, selected according to fixed rules. These rules can refer to producers, users or recipients. What is crucial is that they are applied uniformly. In order to create a representative sample, preliminary research on the composition of the population is necessary so that the right sampling method can be selected depending on the characteristics to be studied. Sampling only makes sense in relation to the characteristics to be investigated, because every reduction of the data basis bears the risk of excluding something important. Accordingly, the research question determines the composition and size of the sample. If the sample is representative, i.e. if it reflects the population in its entirety, it is also possible to calculate the measures of location and dispersion. [89] In historical image and artefact studies, however, analyses are often only possible on indeterminate subsets of the population. Great caution is required here, for example when using data from internet portals and databases, digitised collection catalogues or compilations of excavation finds. First of all, you should find out exactly how the data collection came about in order to detect distortions in the data straight away. Subsequently, re-sampling is necessary, which must be carried out depending on the features to be investigated. Calculating location and dispersion measures on such data, which is still relatively uncertain with regard to the population, can be counterproductive, as individual phenomena may be masked in this way.
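What such a preliminary check of an indeterminate subset could look like is sketched below in Python with pandas. Everything here is invented for illustration: the portal data, the column names and the reference shares (which might come, for instance, from a printed inventory of the collection); the point is simply to compare the composition of the downloaded data with what is known about the population before any measures are calculated.

```python
import pandas as pd

# Hypothetical records pulled from an online portal.
portal_df = pd.DataFrame({
    "object_id": range(1, 1001),
    "epoch": (["baroque"] * 550 + ["classicism"] * 150 +
              ["19th century"] * 200 + ["20th century"] * 100),
})

# Hypothetical reference shares, e.g. taken from a printed collection inventory.
reference_shares = pd.Series({"baroque": 0.30, "classicism": 0.20,
                              "19th century": 0.25, "20th century": 0.25})

observed = portal_df["epoch"].value_counts(normalize=True)
comparison = pd.DataFrame({"portal_share": observed,
                           "reference_share": reference_shares})
comparison["difference"] = comparison["portal_share"] - comparison["reference_share"]
print(comparison.sort_values("difference", ascending=False))

# One possible, debatable correction: down-sample every epoch to the size of
# the smallest group so that no epoch dominates the subsequent counts.
min_n = portal_df["epoch"].value_counts().min()
balanced = portal_df.groupby("epoch").sample(n=min_n, random_state=0)
print(balanced["epoch"].value_counts())
```

As the discussion of the poster corpus below makes clear, after such a re-weighting the relative frequencies of the groups themselves can of course no longer be interpreted.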
[90] In 2020, a large poster exhibition took place in Hamburg, which was announced as follows: "With nearly 400 exhibits by around 200 artists and designers, the exhibition The Poster at the Museum für Kunst und Gewerbe ... offers a large-scale, representative overview of the history of the poster from its beginnings in the early nineteenth century to today." And I must say, this was a really beautiful and multi-faceted exhibition in over 170 sections! In conclusion, I would therefore like to ask how representative an exhibition like this can be. [91] The Hamburg Museum für Kunst und Gewerbe owns one of the largest poster collections in the world, and the exhibition only shows works from its own collection. First, we would like to examine whether this collection in its entirety can be a representative cross-section of European poster production at all. To do this, we will take a brief look at the acquisition history and, with the help of the exhibition catalogue, try to determine the collecting interests and focal points. The first examples came to the museum in the 1880s, not as posters, however, but as "examples of lithographic colour printing", in which the then director Brinckmann was particularly interested. By 1896, Hamburg already owned 400 posters. Another focus arose in 1915, when the museum decided to collect all printed matter published on the occasion of the war. Whereas the interest had previously been technical, it was now historical and thematic. "After that, however, apart from minor donations and the usual mailings, the collection lay dormant for almost half a century." In 1964, following an exhibition of the Alliance Graphique Internationale, all the exhibits remained in Hamburg. The focus of the posters now acquired was therefore on graphic design and thus on formal design criteria. In the following years, posters by individual artists were again added in connection with exhibitions, forming a further focal point in terms of numbers. Again and again, local designers donated their designs to the museum. In addition, the museum took over a private collection of artists' posters, which shifted the emphasis from graphic and typographic design to artistic realisation. And in the 1990s, thematic exhibitions on perfume, sneakers, Art Nouveau or Art Deco led to an expansion of the collection in terms of motifs and chronology. This brief overview makes it clear that the collection was built up with changing focal points. Despite its size and diversity, it can therefore not be representative. [92] The catalogue names three central genres of the poster: the cultural poster, such as theatre and exhibition posters, the political poster and product advertising. With a study period from 1770 to 2020, the sample does not even include two posters per year. It is obvious that already the three genres cannot be sufficiently represented, and even less so different countries, art movements and topics. The sample is simply too small for an investigation period of 250 years. Chronological focal points are the 1890s (Affichomania or poster mania, i.e. the French poster art of the late 19th century), the period around 1930 (above all with the Russian avant-garde) and the period around 1970 (with Pop Art). According to the catalogue, the aim was not "to make as broad a selection as possible, but to present the decisive epochs of poster history in the necessary diversity. ... Much had to be omitted, some could only be hinted at." (p. 11) In the introduction to the catalogue (p. 9), one reads expressions such as "the most interesting designs", "outstanding works", "posters of lasting significance", "the great designers" etc. This already makes it clear that aesthetic criteria determined the selection, not statistical ones. In addition, as already mentioned, the museum was able to acquire a significant private collection of artists' posters in the 1990s, which shifted the emphasis from graphic and typographic design to artistic realisation. It should long have been clear to you that this, too, was a canonical selection. Strictly speaking, the term "representative" is therefore inaccurate. [93] But what would you have to do if you wanted to examine the history of poster design in a statistically correct way?
First of all, it is important to be precise about the question you want to answer with the statistical investigation, because the type of sample depends on it. Let's say you are interested in the chronological distribution of certain motifs. Then you have to make sure that a sufficient number of posters from each year, or at least from each decade, is included. Recall the confidence intervals used to indicate the margins of error: if you include 400 posters, the margin of error is about ±5% as long as you are talking about the distribution of a single characteristic. If you wanted to evaluate the motifs by decade, for example, you would need 400 posters for each decade to maintain this margin of error.
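A quick back-of-the-envelope check of these figures, again using the simplified rule e ≈ 100/√n from earlier in the lesson (the function name is mine):

```python
import math

def required_sample_size(margin_pct: float) -> int:
    """Sample size needed for a given margin of error in percent,
    based on the simplified rule e = 100 / sqrt(n)."""
    return math.ceil((100 / margin_pct) ** 2)

print(required_sample_size(5))       # 400 posters for roughly +/- 5%
# For 25 decades (1770 to 2020) at +/- 5% each, that already means
# 25 * 400 = 10,000 posters, far more than the roughly 400 exhibits shown.
print(25 * required_sample_size(5))  # 10000
```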
[94] As you will recall, each element of the population must have the same chance of being included in the sample. You must therefore not prejudge the outcome with your selection. This can easily happen with stratified samples, but it is also the case with collections that have been created with a certain focus in mind. Conversely, you must not simply close the gaps discovered in the collection (e.g. for posters of the Third Reich), because this would expand the data basis one-sidedly and possibly distort it with regard to another characteristic. Instead, you must try to achieve an even distribution of all relevant characteristics that is tailored to your question. [95] If necessary, break down the sampling frame into several sub-populations! If you compose the sample not randomly but in stratified form, for example by having each of the three genres equally represented, you can then also examine your corpus separately by genre. Of course, you may then no longer evaluate the frequency of the genres or subgenres chronologically, since you previously included them in an even distribution. With such a weighting, you can no longer determine whether, for example, the political poster only gained in importance after the First World War, which is quite possible historically. [96] But the greatest difficulty is probably determining the total population, and we will not be able to do that without major surveys. In the end, given the current state of research, the Museum für Kunst und Gewerbe had no choice but to select, from certain perspectives, typical (non-representative) examples of a number of poster types from its own collection. This is not a bad thing, as long as the ratios are not expressed in numbers, thereby feigning statistical representativeness. [97] Quantifying methods make the tension between the humanities and the empirical or mathematical disciplines particularly clear. On the one hand, even rudimentary knowledge of statistics is often lacking in the humanities, so that many a pointless investigation is carried out; on the other hand, many researchers are not sufficiently aware of the manufactured nature of the data and of the processes of pre-selection and canonisation. The challenges that this poses on a daily basis concern precisely this reflective handling of data in the humanities. Data sets, tables and diagrams are to be treated like historical sources and questioned about their basis, originator and intention. This also includes the discussion of the right data basis, the appropriate statistical method and adequate visualisation, which we will talk about in more detail in the next lesson. In the past lessons we have often talked about the modelling of uncertainty and fuzziness; here too, of course, it is always important to keep statistical evaluability in mind. [98] At the end I always list the knowledge we expect from you: you should know different quantitative methods, and it is indispensable that you master the basics of statistics. The meaning of normal distribution, types of sample, confidence intervals, accuracy of results, etc. should no longer be unknown to you. This also includes the analysis procedures of common measures and structures, concepts and theories of sampling, and the basics of georeferenced analyses. [99] As far as your skills are concerned, I would like to pick out three, namely the ability to describe and analyse subject data quantitatively and qualitatively and to select (sample) data appropriately. You should be able to calculate various measures of location and to understand and verify results using statistical methods. In addition, it is important in the study of the digital humanities to be able to model correlations between two characteristics and to distinguish between correlations and causal relationships. [100] The possible exam questions should not surprise you now. "What do you mean by result accuracy, and how does it differ from the standard deviation?" could be one task. Others might be: "Which sampling methods do you know? Describe one method using an example from image science and name its advantages." "When should statistical methods be used in the humanities? What are the advantages and difficulties?" Or: "Why should one take a critical view of the visualisation of the data collected by Wolfgang Filser shown opposite?" [101] With a look at the literature, this time mainly textbooks, I say goodbye to you again. In the next lesson we will continue seamlessly with data visualisation and data exploration. Until then, I wish you all the best!
For example, one could read in Spiegel, Stern and elsewhere the headline: Easter Bunny Beats Father Christmas! - where they compared the chocolate Easter bunnies produced in Germany with the chocolate Father Christmases. It is easy to imagine the story that can be developed from these figures. The downfall of the Christian West, for example, where the pagan Easter bunny is more revered than the Christian Father Christmas! Or the increasing commercialisation now also of Easter! But what seems comparable at first glance can, on closer inspection, be compared just as well as apples and oranges. For the gift-giving customs at Easter, where the Easter bunny is a standard gift along with Easter eggs, are different from those at Christmas, with more precious gifts. Here, Father Christmas is just one product among many different sweets such as nuts, gingerbread or speculoos. [8] Accordingly, the demand for German chocolate Easter bunnies as a speciality is much higher than for Father Christmas, especially abroad. As you can see, the fact that both are made of chocolate is not yet enough to draw any broader conclusions from their production figures. The basic error here lies in the reversal of thesis and evidence. A thesis was not proved or disproved by an investigation, but figures were first collected and then a suitable thesis was found. 9] A similar approach is sometimes taken in the humanities. For example, when Wolfgang Filser statistically compares vase paintings with depictions of money and sporting activities, and then draws conclusions from them about the self-image of the Athenian elite. We will take this example today to point out certain errors that occur more frequently. But we want to deal primarily with fundamental questions in dealing with statistics and ask: What are actually the basics of evaluation? and how convincing are then the results? What procedures are there? How do you create a good data basis? And what are typical mistakes? 10] The lecture hour is again divided into three parts. First, we will deal with the simple quantification of data in the form of descriptive and inferential statistics. We will learn about the normal distribution and look at the standard deviation. And the second part is about the important question of proper sampling, i.e. representativeness and data selection. In the third part we will look at sampling again from the point of view of two examples, namely geodata and found pottery, and also learn to appreciate the concept of accuracy of results. [11] The older ones among you will perhaps still remember the 10 DM note. The Göttingen mathematician and astronomer Carl Friedrich Gauss was depicted there, together with the Gauss curve. You can find out what this is all about in this chapter, which deals with the basics of statistics. 12] In the humanities, people are usually sceptical about quantitative research methods and investigations. You may be familiar with the sentence: Only trust statistics that you have falsified yourself. Behind this is the view that you can really prove everything with numbers. Which is true if you don't follow the basic rules of statistics, which is what today is all about. Moreover, depth is often confused with breadth. Qualitative methods are needed for the precise analysis of individual research objects, a work of art or a literary text. But if we do not want to make the mistake of developing a generally valid research result on only three supposedly significant examples, we need a broad-based investigation. 
This may give us a result that we have already foreseen, a result that others have already uncovered in three examples, but we have now established it validly. Much more often, however, quantifying methods do not provide any evidence at all, but only indicate trends that we have to interpret further. What exactly does it mean if armament expenditure increased sharply in Germany and Russia shortly before the outbreak of the First World War? Ha#e industrialisation now also acquired the military, which led to an arms spiral, or was war specifically being prepared? Quantitative research methods are thus a complement to qualitative approaches. They can stimulate qualitative research or make its results probable. [13] The basis of statistics is therefore probability. What we consider probable is based on our experience. In this respect, we are constantly unconsciously collecting data in our everyday world and evaluating it inductively. "The jam sandwich always falls with the bottom side to the floor" or "The RE to Göttingen is usually ten minutes late." A statistical analysis can now help us verify these prejudices. In the humanities, it is usually true that a thesis with the best arguments for its correctness is valid until other facts or better arguments against it are found. The question, for example, of how to date an object depends on statements in historical documents, stylistic comparisons or stratigraphic findings. All these sources only indirectly tell us when the object was created. Historical dating therefore claims the highest possible probability, but is not proof in the mathematical sense. They can rather be called proofs of probability. And the more evidence there is, the higher the probability, of course. [14] To test a hypothesis, such as the shape development of Copper Age flat axes, a suitable data basis and appropriate methods are necessary, which must be applied depending on the thesis in question. To prove our point, we need as many axes as possible that we can compare with each other and whose characteristics we can acquire. These characteristics can then be evaluated statistically. For the question of the stylistic development of prehistoric axes in Europe, however, other finds from Europe or axes from other periods and places of production are irrelevant. As a first principle, therefore, we can already state: The data basis must be established in dependence on the hypothesis to be tested, i.e. it must be representative for this question. [15] Strictly speaking, a distinction is made between two types of statistics: Descriptive or descriptive statistics sort the data according to certain criteria and illustrate this in simple key figures, tables and graphs such as bar charts, which show the numerical values in the form of horizontally arranged bars. Here, for example, the number of museum visits and museums in Germany in 2017. [16] A distinction is made between categorical and numerical data. Categorical are all values that measure the number of people or things and that have been divided into linguistic categories for this purpose, such as genre, theme, colour or epoch. These are measured by the number of individuals in a group, which is also called frequency. The frequency can be expressed both in absolute values and relatively as a percentage. The best visualisation of these data are bar charts for absolute frequencies and pie charts for relative frequencies. Numerical data, on the other hand, are measured values such as height, weight or distance. 
Here the values also have numerical meaning, which is why calculated values such as mean, dispersion or relations between the values can also be given here. Histograms or box plots are best suited for visualisation here. The main task is thus to find out how a distribution of a characteristic could best be described and presented. [17] For the investigation of the probability of differences belongs to inductive or inferential statistics. It asks to what extent what is measured corresponds to reality by deriving properties of a population from the data of a sample. Inductive statistics is therefore essentially concerned with the question of the randomness of statistically measured phenomena. Thus, one asks oneself to what extent a mean value measured in a sample could deviate from the mean value of the population; one asks oneself in the case of different samples whether they can still belong to the same population in view of their measured differences and more. Here, therefore, an attempt is made to classify the examined sample into a larger whole, whereby broad space is also given to the examination of the probability of correlations or differences. 18] Some important basic terms have already been mentioned in my explanations: A characteristic (also variable) is the respective peculiarity of the object of investigation, the characteristics of which can vary (in contrast to a constant). In the case of collection objects, for example, a characteristic could be the affiliation to a certain genre or epoch, the weight, the inventory number or the market value. A characteristic is understood to be the totality of possible values of a characteristic. For example, the variable "genus" in our case can take the form of animal specimen, plant specimen, mineral and model, and the variable "sex" of the specimen can take the forms male, female and bisexual. On the other hand, the numerical description of the characteristics of a variable on the basis of measurements or counts is called quantification. In the collection shown here, the number of animal specimens (if I counted the butterflies correctly) is 13. [19] You may have noticed that the feature types can be different. Qualitative characteristics are those whose expression can be recorded as terms on a nominal scale. We also spoke above of categorical data. Quantitative, on the other hand, are measured values or numerical data. And thirdly, there are ordinal numbers, which arrange the characteristic in a ranked list, as is the case, for example, with evaluations in the form of school grades. [20] And once again for the record: The population (or basic population) refers to the totality of all elements for which the statements of the study are to apply. Since a complete survey is rarely possible, one usually selects a sample, i.e. a selection of observation units from a defined (basic) population. A sample should reflect this basic population without bias, e.g. through the model of representativeness, which we will talk about later. However, I would like to point out here already that the basic population must also be determined exactly. If you read, for example: Germans prefer beer to wine or would vote for a certain party if there were federal elections on Sunday, the population here is not all Germans, but in the last example all persons older than 18 who had a German passport in May 2020 and were registered with their main residence in Germany, or in the first case all persons over 16 who shop in Germany. 
The "Germans" here therefore form two different populations! 21] The reason why it is so easy to work with random samples in empirically collected data is that in nature measured values are usually normally distributed, as the Göttingen mathematician Carl Friedrich Gauss already discovered. Let's say that on a chicken farm with a lot of chickens, the individual eggs were weighed for a week. Let's define the random variable X as the weight of an egg in grams. Then it turns out that an egg weighs 50 grams on average. The expected value is therefore 50. Furthermore, let it be known that the variance varX = 25 g2. So we can approximate the distribution of the weight as shown in the graph. We can see that most of the eggs are near the expected value of 50 and that the probability of getting very small or very large eggs becomes very small. We are looking at a normal distribution. It is typical for random variables that are made up of very many different influences that can no longer be separated, e.g. weight of the chicken, age, health, location, heredity, etc. For example, the probability that an egg weighs at most 55 g is 0.8413. This corresponds to the red area in the figure. Normal distribution" is therefore understood to mean a steady (continuous) distribution of randomly collected data. This includes, for example, the height or intelligence of people. But also the amount of ceramics in a landfill or the settlement density in a quadrant is normally distributed, if it is not subject to special culture-related changes. The normal distribution therefore applies to all random samples of a population. 22] The special significance of the normal distribution is based, among other things, on the central limit theorem, which states that a sum of n independent, identically distributed random variables with finite variance in the limit n→∞ is normally distributed. This means that random variables can also be regarded as normally distributed if they result from the superposition of a large number of independent influences, with each individual influencing variable making an insignificant contribution in relation to the total sum. [23] The normal distribution applies to measured values that cluster around the arithmetic mean, provided the sample size is large enough. In our chicken example, it was the value 50. This average value is obtained by dividing the sum of all measured values by their number. Usually, the measured values are plotted on the x-axis and their frequency on the y-axis. This distribution only makes sense for characteristic values in numerical form, i.e. for characteristics of an ordinal scale. Qualitative or quantitative characteristics cannot be normally distributed, even if they are subject to chance. Approval of political parties, for example, is a qualitative characteristic. If it were normally distributed, we would no longer need to hold elections. Or to stay with the chicken example. The amount of eggs laid varies from day to day. It is possible to calculate the mean value here, i.e. to indicate how many eggs were laid on average per day, but such a frequency distribution as a bar chart with the days of the week on the x-axis only visualises the collected data. There is no frequency distribution, because unlike weight, there is no numerical correlation for days of the week. A sample was not taken, but a count was made on each of seven days. 24] This is why Wolfgang Filser's iconographic study has several problems. 
On the one hand, the sample did not come about by chance due to the special conditions of transmission, which we still have to discuss, and on the other hand, year numbers like weekdays are not measured values and have nothing to do with normal distribution! Thus, approximating the bar chart to two Gaussian curves suggests a representativeness in the data basis, which is not given here. 25] This raises the question of how representative such statistics are in the first place. After all, representativeness is only given if the composition of the basic population is replicated in the selection of the elements of the sample or is proven by tests. In order to draw conclusions about the basic population from the results of the statistical evaluation, the sample must be representative so that generalised statements can be made. The term "representative sample" is actually misleading, because samples usually never represent the population exactly. It is more correct to ask about the significance of the sample, but the term has become commonplace and we will continue to use it. Let us start with the sample size. This is usually abbreviated with a small n. Please get into the habit of always stating the sample size. This is a requirement of scientific rigour. Wolfgang Filser has printed his bar charts in percentages and without this indication, and you have to search for the values in his book to find this important information: In Symposia he examines 373 representations, and with athletic subject matter he has collected 1216 vase paintings. So that means his samples vary in size, which is why a comparison in percentages is useful. But do you consider the total number of vase images examined to be meaningful at all? And within what limits? [26] These limits can be estimated. To assess the sample size, we are interested in the standard deviation, i.e. we ask about the fluctuations or dispersion of the values in a sample. If you are looking for a flat in Göttingen, for example, the average price for rented flats of 8.40 € per sqm says nothing about the prices that have to be paid for student flats here if you do not know their range. For this, you first need the deviation from the mean value or the arithmetic mean, as the statistician says. The standard deviation is therefore nothing more than the average distance from the arithmetic mean. Imagine the measured values in a coordinate system in a coordinate system. The standard deviation is then the average deviation of a point from the centre and the centre is precisely the arithmetic mean of all values, i.e. the average price. If you calculate the average distance of all points from the centre, you get the standard deviation. Here it is 1.20 €. 27] Instead of calculating the standard deviation exactly, an estimated value (s) is often used that applies to 95% of the measurements. Depending on the sample size (n), this is ± s divided by √n or in percent ± 100 / √n. This value applies to normally distributed values of a population. The error limits of normally distributed values are calculated from the sample size in relation to the total quantity. This is also called statistical uncertainty, which is given as the confidence interval (e). It indicates how much the measured value is estimated to deviate (with an approximate 95% probability) from the true result value. This value also does not decrease linearly, but exponentially with the size of the collected data. 
I will spare you the complicated calculation and just give a few confidence intervals (e) to indicate the error limits for orientation: For a sample size of 30, the deviation is ±18.3%, for n = 100 it is ±10.0%, for 200 it is ±7.1% and for 1000, which is common in surveys, the statistical imprecision is ±3.2%. As you can see, a sample size of about 5000 with an error margin of ±1.4% would also be ideal for humanities studies. Larger samples would hardly change the result. 28] Applied to our example, this means, if you follow the green line, that with this sample size of 373 vase paintings with symposion representations, each value could also be 5.8% higher or lower. Most of the values thus lose their significance completely. The situation is somewhat better with the second list, where the margin of error is only plus/minus three percent. However, the confidence intervals only apply to normally distributed values. In our example, the error limits will certainly be much higher. In order to assess such graphs, it is therefore absolutely necessary to specify the sample size (n) that was surveyed! And please never underestimate the suggestive power of such visualisations. In colloquial language, statistics is often equated with graphics. These graphs are based on a quantity distribution, not a statistical study. If we were to arrange the numbers as a pie chart, for example, they would be relatively inconsequential. The effect of the numbers is that they are plotted on a time axis and thus visualise a development in the manner of "rise, blossom and decline". As an archaeologist, I would be sceptical about whether the selection of the sample does not play a role here. We will have to talk about this in more detail in the third part. 29] If the values are not normally distributed because the sample is not subject to chance but has historical, social or cultural causes, for example, it makes sense to calculate the median in addition to the arithmetic mean. It denotes the mean value in a sorted series of values. For the egg-laying hens in our example, the arithmetic mean is 312.86, i.e. an average of 313 eggs were laid per day during the week. The median, as the middle value in the series, is 300. A value m is the median of a sample if at least half of the sample elements are not greater than m and at least half are not less than m. The median of a sample is the value m of the sample. Of course, this distinction is only meaningful at all in large data sets, where one wants to remove "outliers" from the data. 30] In a famous study, Franco Moretti examined the title length of British novels, where a significant shortening can be observed at the beginning of the 19th century. His chart gives both the average and mean length to give a fuller picture of the degree of variation in titles: The average gives information about the often extravagant length of some titles - while the mean indicates the "central" length of the year in question, (i.e. what has the same number of entries above and below it). The difference between the two forms of measurement is particularly evident in years such as 1780 (with the 346-word History of Miss Harriot Fairfax) or 1784 (with the 273 words of The Maid of the Farm): in these two cases the average is 37.9 and 19.7 respectively, while the mean (8.5 and 7) is barely affected. 
[31] In digital image science, the median is used especially for the comparison of measurable values such as brightness, contrast or saturation by reducing the image to a single value to describe the image effect. In our example, the cover of the fashion magazine Vogue in the German edition of April 2004, for example, one could use the mean value of the brightness of all 629,184 pixels. In this way, the covers of this magazine can be compared chronologically, as Peter Leonard did in 2013. You can see here the distribution of the mean brightness of all international issues of Vogue plotted on a timeline. So you can easily see that since the late 1980s the spectrum has always been broad and balanced overall. This suggests that, for marketing reasons, attention may have been paid to variety in the design. 32] For qualitative characteristics, the mode or modal value is the only obvious choice. This shows the most frequent value of a distribution, i.e. the value with the greatest probability. Since a distribution, when represented graphically, can have several peaks or summits, a distribution can also have several modes assigned to it. If there is only one modal value, the distribution is called unimodal. But if there are about the same number of hits for one value as for the other, the distribution is bimodal. To take an example from one of our research projects. If, for example, we sort all the marble busts in the antique collection of Tomasso Obizzi, which were estimated to be two Venetian Zecchini after his death, according to their height, the average value is 58.68 cm, while the mean value, which does not take so much account of the outliers, is 61 cm high, and thus better reflects the standard expectation at the time of the size of an antique marble bust. Put into a diagram, however, it becomes clear that the distribution of sizes is multimodal and has several focal points. There is no test for the representativeness of data, but you use descriptive methods here by showing that the values of the sample do not differ significantly from the values of the population. Which descriptive methods you use for this depends on the type of variable. With nominal, i.e. ordinal data, you take relative frequencies. For metric data, use the mean or median, for example. In addition, you specify the confidence intervals in each case. By comparing these numbers and discussing the difference, you can show that there is (hopefully) no clear difference and thus representativeness exists. [33] These ratios of a sample are also called location measures. They indicate, for example, where the centre of a distribution "lies". In addition, there are measures of dispersion for the variability (or dispersion) of a frequency distribution. For example, the empirical variance, which is the mean squared deviation from the arithmetic mean, the range, which is the difference between the largest and smallest observation, and the standard deviation, which is calculated by taking the square root of the empirical variance. All of these values are important for gaining insight into the nature of large data sets. We will return to this briefly in the next lesson. 34] And to summarise, I have written down some more important descriptive statistics terms for you on this slide. Representativeness is given when the composition of the basic population is reproduced or approximated by tests when selecting the elements of the sample. 
The term probability is used to describe the classification of phenomena according to their degree of certainty. The probability p is expressed with values between 0 (impossibility) and 1 (certainty of occurrence). Correlation in the statistical sense refers to a systematic correspondence between the characteristics of two variables, and correlation to the relationship between two quantitative characteristics. The strength of the correlation is expressed by the correlation coefficient. It lies between the extremes -1 and +1. If it is positive, this means that a high value of variable A is accompanied by a high value of variable B, and the same applies to low values. [35] Indeed, an important task of statistics is not only the distribution of values, but also the study of correlations between variables. Take, for example, the graphics of the Brücke artists Karl Schmidt-Rottluff, Hermann Max Pechstein, Ernst Ludwig Kirchner, Erich Heckel and Otto Mueller, who you see here in self-portraits. At the beginning of the 20th century, the members of the artists' association "Die Brücke" rediscovered the woodcut for their expressive art, which seemed particularly original to them. If one now wants to examine the position of the woodcut in the graphic work of these artists, i.e. if one wants to know what connection exists between the characteristics "artist" and "printing technique", one must interweave them and present them in table form. This table is then called a cross table, frequency table or contingency table. In our example, there are two variables. I have entered the values of the variable "artist" in rows from top to bottom and the values of the variable "printing technique" in columns from left to right. In each individual cell, the specific frequency of the respective combination of the expression of the variable "artist" with the expression of the variable "printing technique" is noted. With a manageable number of columns, crosstabs can be output quite well as bar charts or, as here, as column charts. In this way, the frequency of the expressions can be recorded relatively easily at a glance and it is also possible to see whether an artist frequently produced graphic art. Heckel and Kirchner were generally fond of this form of artistic expression; unlike Otto Mueller, for example. [36] A cross table can show both absolute and relative frequency. For our question about the position of the woodcut in the work of the respective artist, we are more concerned with the relative distribution, which is represented in percentages. We now see that the woodcut plays a prominent role for Schmidt-Rottluff, Heckel and Kirchner reach for the woodblock with roughly equal frequency, and for the painterly Otto Mueller, lithography is the absolute medium of choice. Stacked column charts are well suited for visualisation, or pie charts, which are only suitable for relative frequencies. [37] In our example, we have evaluated the catalogue raisonnés and thus all the prints created by the artists. We are talking here about the basic population, which with 4817 prints was not very large. With larger quantities, such as the quantity of all Germans who go on holiday, one only examines a sample. In this case, the statements of descriptive statistics always refer only to the examined sample. With descriptive statistics we do not investigate whether these differences could possibly be random (because the sample was too small; the assessment distances too small; or because the selection was not representative). 
38] Many readers of statistical analyses are interested not only in the general distribution of the data, but also in the underlying correlation. They want to know whether, for example, there is a causal relationship between expressionist artists and the printing technique used or between chocolate consumption and the number of Nobel Prize winners in a country. In 2012, an article in the New England Journal of Medicine tried to establish such a connection and recommended the daily consumption of preferably dark chocolate. Franz Messerli had related the amount of chocolate that people consume on average (in kg per person per year) to the number of Nobel laureates from that country (per 10 million inhabitants, to correct for population size). The author then found the correlation shown in the graph (r=0.79, p<0.0001). However, if two variables correlate very strongly with each other, this does not at all mean that these two variables actually influence each other. One may find this plausible, but it is wrong. Often, a third variable is causally involved in the correlation that was not recorded at all, such as the availability of chocolate or the wealth of the respective countries, which has an effect on higher spending on education and research, but also on chocolate consumption. Sanjay Basu, Assistant Professor of Medicine at Stanford University proves in his blog that there are stronger correlations between per capita borrowing from commercial banks and Nobel Prize winners (r=0.92, p<0.05) or owning luxury cars and Nobel Prize winners (r=0.85, p<0.0001) than between chocolate consumption and Nobel Prize winners, and therefore rightly asks in conclusion: does this mean that using credit and buying an Audi makes you smarter? And in this case, there is another problem: when you examine aggregate data, you cannot determine whether the higher chocolate consumption actually occurs among those individuals who have higher cognitive functions, let alone are Nobel Prize winners, but only that there are both Nobel Prize winners and chocoholics among the residents, to put it pointedly. And thirdly, it is important to note that the presence of Nobel Prize winners is not a good measure of nationwide cognition because they are extremely rare individual cases and therefore cannot be considered representative. [39] For historical periods before the 19th century, however, there are usually no data that have been systematically collected for statistical purposes. Rather, the data are published in a scattered manner and are incomplete. This inconsistency of source material confronts researchers with the problem of having to make estimates from other data. For example, if one does not know the size of the total population of a place, but does know the number of taxpaying citizens, one can try to extrapolate these figures with the help of estimates such as the assumed average size of a household. The reliability of the estimated values determined in this way then determines the further investigation to a large extent. If, for example, one estimates the population of Pompeii from the number of seats in the amphitheatre or the area within the city walls, which is what archaeologists like to do, one fails to recognise the prestige that this small city wanted to claim in regional and supra-regional comparisons. For not all quarters of the city are equally densely populated, and the amphitheatre is one of the largest of its time. 
40] From what has been said so far, a typical sequence of a quantifying investigation in the humanities emerges: At the beginning there is the research question or hypothesis, which refers to sources that can be collected and evaluated. These sources must be critically sifted and interpreted. This usually involves a clarification of the most important terms as preparatory work for operationalisation, by which is meant the definition of the variables. Subsequently, the survey process should be conceptualised, which also includes the coding or classification of the characteristics. The subsequent sampling determines the nature and size of the sample and, if necessary, prepares it. Now we are in a position to analyse the data collected in this reflective way. In doing so, we have to decide (still depending on the research question) whether we want to proceed descriptively or inductively, because as you now know, this determines the choice of means. And the results of our analysis are then presented graphically in a way that makes them easy to follow. [41] The two most common bases for statistical investigations are surveys and experiments. Unfortunately, neither of these are available to the historical sciences, because we cannot subsequently interview anyone from past epochs, and we usually do not have enough representative data for an experiment either. Therefore, the most important task of any quantifying evaluation is to create a suitable material basis. In doing so, one must always keep in mind: Data is not given, but compiled, constructed as an interpretation of the experiential world, which is not inherent in the data. Data sets are therefore never random samples: they are collected by people according to parameters, however complex, hidden, unconscious or random they may be. In the second part, therefore, we want to talk about sampling, that is, the meaningful compilation of data. [42] The process of digitisation is already quite advanced in some areas, and there are also some large data collections available online. The oldest is Project Gutenberg, founded back in 1971, which makes over 60,000 e-books available free of charge. At the moment, however, access is blocked in Germany for copyright reasons and only the German sister site is accessible. 43] Another portal for books is the Haithi Trust Digital Library with just over 9 million digitised books. 44] And the largest portal is Europeana. There, as of May 2021, 52,254,880 works of art, collectibles, books, films and musical pieces from European museums, archives and libraries could be found, [45] while its American counterpart, the Digital Public Library of America, listed almost 44 million images, texts, videos and sound files in the same period. [46] The Internet Archive has been storing web pages in various versions and other content freely accessible on the internet since 1996 and has now also developed into a large platform for data of all kinds. Among them are more than 3.8 million image files and 6.9 million films, mainly from American television channels. [47] Google Arts & Culture (or Google Art Project, as it used to be called) is more oriented towards the digitisation of cultural heritage. The museum-like online platform includes high-resolution images and videos of art and buildings from a wide variety of cultural institutions around the world. We'll look at this in more detail in the Virtual Museums lesson. 
[48] Artstor.org makes more than 1.3 million works from public collections available free of charge, but the total volume of 2.5 million files is only available to subscribers. [49] And the situation is similar with the German image archives: Prometheus with 2.8 million digitised images ... [50] and Foto Marburg with about 2 million images. [51] On Wikipedia you will find clear lists of the most common online archives, libraries and museums. All these data collections are very suitable for researching individual sources, but are not suitable for Big Data analyses because the data sets cannot be addressed or downloaded in their entirety for copyright reasons. [52] And here is another list of data collections on modern history that are suitable for quantitative analyses. [53] But that is not the only problem: the collections of museums and archives were not digitised in their entirety one after the other, but in selections. Each museum wanted to put its highlights on the web first. Portals such as Artstor.org, which compile digitised artworks from all over the world, increase this effect even more. Lev Manovich and his students searched there for works of art from 1930 and found only a few dozen paintings that were created outside Europe and North America. By contrast, a search for "Picasso" alone yielded nearly 700 hits. From this you can see how even mass digitisation deepens canonical notions of art and stereotypes instead of broadening our knowledge of global art development. [54] The 2003 book Human Accomplishment: The Pursuit of Excellence in the Arts and Sciences, 800 B.C. to 1950 by Charles Murray was one of the first Big Data applications in historical scholarship and founded so-called historiometry. Murray had scanned encyclopaedias and handbooks and calculated the significance of personalities from culture, politics and science based on the frequency of mention and length of discussion. For the 4,000 innovators in the respective fields extracted in this way, a raw value is determined and normalised so that the lowest value is 1 and the highest value 100. [55] One will be justifiably shocked at how the 19th-century idea of genius is being revived in the 21st century. Even in the digital age, cultural history is written using the example of a few 'masterpieces', geniuses and exceptional artists, and thus continues to look more for the exceptional than the typical. What Murray's study shows, however, is how strongly the acquisition of knowledge is linked to canonical pre-selections, as if one could grasp the essence of the Renaissance by only thoroughly studying Michelangelo, Raphael and Leonardo (and in that order!). In the meantime, the digital humanities are countering this deficiency in all fields, and perhaps soon, conversely, one will recognise the exceptions alongside the mainstream; but whether these will then again be the same celebrated masterpieces and exceptional personalities that used to dominate cultural history remains to be seen. For the selection processes are manifold. [56] These selection processes are so firmly anchored in our minds that it is not easy to circumvent them. In a study of photos taken in Manhattan in 2014 and uploaded to Instagram, it became clear that half of the photos were taken in only 12% of the area, namely the tourist areas of Manhattan.
This means that even in extraordinarily populous and technologically advanced areas of the world, the existing monuments and institutions have no chance of being handed down to posterity unless they have been hyped up into tourist attractions. The digital turn has not yet led to a significant broadening of the data base! [57] For even if we significantly broaden the data base in terms of categories and take into account as many aspects as possible, all this richness and diversity of works, as currently presented to us in Europeana, for example, does not automatically reflect a comprehensive cross-section of cultural products. So the question is: how do you create an appropriate data set that systematically covers what was created in a particular period, region or medium? If we wanted to conduct a study on painting books, for example, we would have to compile all the painting books, at least of a certain period or culture. But of course not nearly all painting books have been digitised, so we could not do that. But even if we limited our question to the painting books in a large museum like the Metropolitan Museum in New York, whose holdings are available online, we would have to ask ourselves whether everything has really been digitised and put online. And even more: what collecting events, what acquisitions and donations underlie this collection and perhaps distort our picture in a certain direction? [58] However, in some areas of research, such as Greek vases, there are scholarly databases that supposedly list all the important objects in this subject area. We had already talked about this. The Beazley Archive Pottery Database is currently the largest database on pictorial sources from Greek antiquity and contains almost 100,000 entries. But is a database like this, whose data can even be exported, suitable for statistical analysis? To answer this question, we need to take a look at the origin of these data: the foundation of the Beazley Archive is the photo collection of the British vase researcher John Davidson Beazley, who spent his life researching the attribution of Attic vases to different painters' hands. He compiled his findings in extensive lists, and archaeological researchers have used the pages and numbers in these volumes as addressable entries for each vase mentioned there. Beazley's lists included 12,786 Attic black-figure and 21,286 red-figure vessels from Athens, while the Beazley Archive database contained 42,265 and 51,908 entries respectively (as of May 2021). The number of catalogued vases has thus tripled since 1970! But you can already see the first limitation here. The Beazley Archive continues to collect primarily Attic vases, even though vessels from other styles and production sites are now also included. [59] Let us take a closer look at the data by evaluating the chronological distribution. Filippo Giudice and his team did this in 2012, and it is immediately apparent that the vases recorded are not evenly distributed chronologically, but have a focus in the first half of the 5th century. However, we are not looking at a Gaussian distribution curve here, but at an art historical model of rise, flowering and decline, because exhibitions and archaeological literature like to regard this period as the heyday of Attic pottery, and have accordingly reproduced a particularly large number of vases from this period in publications.
Even Beazley (and the research after him) was not interested in the 4th century, and he dealt with these one hundred years only summarily. In our chart, this bar, in contrast to the others, covers not 25 but 100 years. We will discuss such visualisation errors next week. But the distortion is also evident in the figures. The 2,000 vases in the Beazley Archive are contrasted with 12,131 entries in my vase repertory. We thus find that, at least for the 4th century BC, the data in the Beazley Archive are not representative, because collections were not made here with the same intensity as for the 5th century BC. [60] Beazley was already selective and only recorded what he could assign to a painter or a workshop. Of the vases in his private possession, including iconographically very interesting pieces, only very few were recorded by him. As an example, I show a double page from the catalogue of his collection, where only a small lekythos is recorded in his lists of painters, which, after all, form the basis of the Beazley Archive. It should therefore be noted that the recording criteria of the data basis are essential for the evaluation! [61] Lisa Hannestad had already drawn attention to this problem in 1988, using the example of the pottery from two central sites in Athens. She made it clear that Beazley did not record a proportionally constant share of painted Attic vases from the finds of the Agora and the Acropolis compared to the known quantity, but rather, for example, a particularly large number of lekythoi, which he could easily connect with workshops. The more sophisticated cup paintings, on the other hand, are underrepresented here. His sample is therefore not suitable for statements about the geographical distribution of Attic pottery or even of individual vessel types. [62] As you can see, the basic population of all painting books or vases cannot be compiled. It is impossible to gather and examine all the data in question because they are inaccessible or unrepresentative in this form. Therefore we have to make a selection, in English 'sampling', which conditions our sample. Sampling is therefore a defined method of selecting data for a statistical investigation in such a way that analyses on these data allow conclusions to be drawn about the population without systematic error. But before choosing a method, the frame of reference must be determined, i.e. the collection of units from which the researcher then selects the sample, in other words the category of the population, such as an epoch or a genre. [63] The simplest method would be random sampling. It is like drawing names out of a hat. In random sampling, every entity and every subset in the population has an equal chance of being selected. So each element of the reference frame has the same probability of being selected: the frame is not subdivided or partitioned. This is convenient because it is quick and can even be done automatically. In particular, the variance between individual outcomes within the sample is a good indicator of the variance in the population as a whole, which makes it relatively easy to estimate the accuracy of the results. However, because it is a random selection, small samples may turn out to be unrepresentative and may not include a particular kind of entity at all. For example, in a simple random sample of ten people from the German population, one would expect five women and five men, but the sample might just as well consist only of women.
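As a minimal illustration of this point, the following Python sketch draws a simple random sample of ten from an idealised 50/50 population; the toy data and the seed are of course invented, but repeating the draw shows how far small samples can stray from the expected 5/5 split.

```python
# Simple random sampling from an idealised, invented population:
# every element has the same chance of being drawn.
import random

random.seed(7)
population = ["woman"] * 500 + ["man"] * 500      # idealised 50/50 frame

sample = random.sample(population, 10)            # simple random sample, n = 10
print(sample.count("woman"), "women,", sample.count("man"), "men")

# Repeating the draw shows how strongly small samples can deviate
# from the split one would expect on average.
counts = [random.sample(population, 10).count("woman") for _ in range(5)]
print("women per draw of 10:", counts)
```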
Systematic and stratified methods attempt to overcome this problem by using "information about the population" to select a more "representative" sample. [64] In systematic sampling, the population of the study is ordered according to a specific ordering scheme, and then items are selected from this ordered list at regular intervals. An example of systematic sampling would be selecting one person in ten from a sorted list or database. Selecting names from the telephone book, for example, gives a representative distribution by first letter of the surname. The same applies to data sorted by, for example, measurements, dates or location coordinates. As long as the starting point of the selection is chosen randomly and the variable according to which the list is ordered correlates with the variable of interest, systematic sampling is a kind of probability sampling. However, it carries the same risk as random sampling of not being representative if the sample is too small and not precisely matched to the research question. [65] However, systematic sampling is particularly susceptible to periodicities in the list, such as peculiarities in odd house numbers, estimated values in even numbers or weekdays in leap years. This is because if the period is a multiple or factor of the interval used, the sample is likely to be even less representative of the total population than a simple random sample because of the systematic error. This is also true for elements whose peculiarity depends on the predecessors or successors in the list, such as the always identical sequence of motifs or techniques in artists' œuvres ordered by year. Here, there is a high probability that the data will be skewed in one direction, for example by tending to select watercolours or still lifes more frequently. [66] In stratified sampling, the sampling frame is divided into separate strata, i.e. subgroups, which have been defined according to clear criteria such as genre, material, epoch, motif, origin, size, etc. Individuals are then randomly drawn from these subpopulations. If, for example, you are examining picture postcards and have divided the sampling frame into motifs, you can specify the sample proportion precisely. For example, you would draw a quarter of the items from the group "Famous buildings" if you know from a previous study that 25.6% of the picture postcards produced in Germany show landmarks and tourist attractions. [67] The stratified random sample offers several advantages. First, additional statements can be made in this way about subgroups that would not be sufficiently accurately represented in a random sample. Second, it may be that elements in one sub-area are better documented than in another. In such cases, using a stratified sampling approach may be more convenient than aggregating data across groups, and more representative, because one can use the most appropriate approach for the particular stratum. If, for example, there is a dependence of the range of motifs on the preferences of the respective patrons, one can take this factor into account in the selection of certain motifs, for example by weighting the ruler portraits of absolutism, which were demanded by aristocratic clients and are very numerous, differently from the portraits of 20th century politicians, who were not so inclined to pictorial self-portrayal. However, such weightings are often too complex and also not researched closely enough to be applied in practice.
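A minimal sketch of proportional stratified sampling along the lines of the postcard example: the stratum sizes below are invented, chosen only so that the "Famous buildings" group makes up 25.6% of the frame, as in the figure quoted above.

```python
# Proportional stratified sampling on an invented postcard frame.
import random

random.seed(7)

# Hypothetical sampling frame: motif stratum -> catalogue numbers.
frame = {
    "famous buildings": [f"FB-{i}" for i in range(2560)],   # 25.6% of the frame
    "landscapes":       [f"LS-{i}" for i in range(4440)],
    "portraits":        [f"PT-{i}" for i in range(3000)],
}

sample_size = 400
total = sum(len(items) for items in frame.values())

stratified_sample = []
for stratum, items in frame.items():
    k = round(sample_size * len(items) / total)   # proportional allocation
    stratified_sample.extend(random.sample(items, k))
    print(f"{stratum}: {k} of {len(items)} items drawn")

print("total sample size:", len(stratified_sample))
```

With these numbers, 102 of the 400 items (roughly a quarter) come from the "Famous buildings" stratum, exactly the proportional allocation described above.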
[68] By focusing on important subpopulations and ignoring irrelevant subgroups, one increases the accuracy and efficiency of the estimation. A stratified sampling approach therefore ideally has three advantages: variability within strata is minimised; variability between strata is maximised; and the variables by which the population is stratified are highly correlated with the dependent variable of interest. However, it requires the selection of relevant stratification variables, which can be difficult. The approach is therefore not useful if there are no homogeneous subgroups. [69] In quota sampling, the population is first divided into mutually exclusive subgroups, just as in stratified sampling, and then a certain proportion of units is selected from each segment. This is the case in targeted surveys where, for example, out of 100 participants, 50 are supposed to be unemployed. The interviewer could then easily find these 50 by going to a job centre. Or you determine that twice as many oil paintings as watercolours of each painter are to be included in order to do justice to their artistic status. Because of this specification, quota sampling is not probability sampling, since the selection of the sample is not random. One should therefore carefully consider the consequences of such segmentation and whether the advantages really outweigh the drawbacks. [70] Sometimes it is more efficient to select samples by groups ("clusters"), clustered by geography or by time period. For example, if we wanted to study the furnishings of church interiors in Germany, we could select fifty socially, denominationally or chronologically stratified congregations and then catalogue each item within the selected churches. However, we could also randomly identify regions, take random churches from those regions and select random furnishings from those churches to compile the sample. But this requires a larger sample than simple random sampling because of the higher variance. [71] Actually, we are almost at the end of our lesson in terms of time. However, since the topic is relatively unfamiliar, I will run over this time and add a third part with examples. [72] Geodata can also be statistically evaluated if they have been acquired using random sampling methods. Basically, there are various methods for the acquisition of spatial data from historical and prehistoric periods. As we have already discussed in the previous lesson, information about the settlement structure of past times can be obtained from above-ground remains, by picking up surface finds (in a survey), by cleaning and smaller excavations, by geodetic and geomagnetic methods or by large-scale excavations. As an example we choose a landscape in western Montana. [73] The example is hypothetical and lists eight different prehistoric and historic structures that would have come to light through excavation: a historic wagon track, three Archaic necropolises, a Palaeoindian quarry, a historic homestead and two Archaic settlements. In order to be able to statistically evaluate the area, it was divided into 27 x 37 (i.e. 999 in total) quadrants of equal size. The diagram here shows all the structures we hypothetically assumed. In the following, we will deal with the question of which sampling, i.e. which type of sample, will yield the best results. [74] A targeted selection of the study area (i.e. a purposive 'sample') is used when one is only interested in certain structures that are already known.
In our example, this means that the wagon track and the farmstead are well researched, but the six prehistoric structures remain unknown in this way. The purposive sample is therefore not a representative sample. [75] Probabilistic sampling, on the other hand, uses statistical methods to study only representative areas of a territory. The percentage depends on the natural conditions, the types of settlement remains and the financial and time resources of the researchers. Here, only 5% of the quadrants, selected at random, were investigated, resulting in the discovery of six of the eight sites. However, larger contiguous areas remain unexplored in this way. A different random selection might have identified more (or fewer) sites. [76] The stratified random sample takes more account of the terrain structure. Here, the study area is first roughly divided into different, topographically distinguishable zones, within which quadrants are again randomly selected. In our case, the river was a landmark that could be used to distinguish a sloping riverbank zone (above the dotted blue line) from a flatter prairie zone (below the dotted blue line). According to the size of the two parts, two-fifths of the quadrants were randomly distributed over the riverbank zone visible in the aerial photograph above and three-fifths over the flatter prairie zone. With this stratified random sample, four of the eight sites were encountered. So you can see that the distribution of sites and monuments is not random, but is subject to spatial or socio-cultural conditions. What one achieves with a random sample therefore depends largely on the luck of the researcher. [77] With a systematic sample, the units are evenly distributed over the study area. In this way, no large areas remain uncovered, and already known structures or areas of special interest (such as valleys or hills) that may indicate settlement forms can additionally be studied in a more fine-grained manner. [78] However, a combination of stratified and systematic sampling is also possible. For this purpose, the study area is divided into different, topographically distinguishable zones, within which quadrants are then selected at regular intervals. Each of the above-mentioned sampling strategies can be useful, and none offers any prospect of capturing the existing structures completely. The consequence, however, is that large-scale structures are more likely to be captured than small-scale ones. [79] Most archaeological excavations are laid out similarly and divide the excavation area into such quadrants. The American excavations at the Agora of Athens, for example, proceed in this way. Here, almost the entire mapped area has been excavated. If we are now interested in the distribution of finds, we can also consider the quantity of what has been preserved as a random sample. For of all the objects that were used in antiquity, each generally has the same probability of having been lost or disposed of. In this way, for example, we can draw a reasonably representative picture of where figuratively painted red-figure pottery was used. [80] Let us stay with red-figure pottery and ask ourselves which fragments we need to include in our database and which not, if we want to work on the motifs on these vessels. As an example, I have chosen small-format amphorae, the so-called pelikai, from the settlement excavation in Olbia on the Black Sea, because the excavators presented all the excavation material in colour photographs.
However, you could also take any other settlement excavation. [81] Only fragments that can be clearly assigned to a motif are suitable for iconographic questions. [82] This makes it clear that we can only include a few fragments in our corpus and by no means all of them. [83] Another problem is that not all material from all excavations has been published. If one compares the published fragments from Olbia with those from Athens, it is immediately noticeable that the publication of the Agora fragments does not include rim and foot fragments, fragments of the secondary sides or smaller fragments. The different documentation and publication situations have the consequence that the find spectra of the respective sites are difficult to compare with regard to their distribution by vessel form on the basis of the published inventory. However, this has hardly any effect on the fragments compiled in our sample, because only vases and fragments were recorded here for which the image motif of the main side can be determined. This means that despite very different recording intensities, the same selection criteria were applied everywhere when compiling our sample. This is because red-figure sherds with significant remains of figural painting were recorded at the same intensity at all excavation sites. On this level, the different sites are therefore very comparable with each other. [84] A stratified sample can minimise the imbalance caused by the selective publication of the pieces if the same recording criteria apply to all sites and contexts. Then, as with random sampling, each element of the population again has the same chance of being included in the sample. If we plot these data on a map, however, we do not get a visualisation of all sites where Attic pottery has been found, but only of those from which sherds with figural decoration are known to a significant extent. The caption must therefore read: significant sites of late red-figure vases from Athens. [85] There are a number of statistical tests for checking representativeness, most of which can only be usefully applied to empirically collected data. However, we can carry out another test that examines our sample in terms of publication status. To do this, we compare the distribution in different publication phases, e.g. by evaluating the increase of the last 25 years separately. If the results agree, our sample is representative. In our example, the sample contains 3,363 entries that were presented for the first time in the last 25 years, i.e. between 1986 and 2010. This corresponds to 35.9% of the total material. With regard to the distribution across vessel forms, this increase leads only to minor deviations of less than 2%. Our sample is therefore likely to be independent of preselections and thus representative. [86] Another test concerns the accuracy of the results. The "accuracy of results" has nothing to do with statistical uncertainty in the sense of a confidence interval. Rather, it refers to the sensitivity of a data basis to a change by a certain number of data points. By specifying the accuracy of results under the total number, an attempt is made to indicate the statistical bias that results from a small material base. This is done by noting how much the percentage distribution of the data would change if one value were unilaterally increased by three.
For example, with a base of one hundred objects, E(3) is 3%, which means that the given values would deviate by +/- three percent if three additional objects of one criterion were included. With 400 objects, it is only 0.75%. At this size, such shifts in the numerical ratios are no longer relevant. The accuracy of the results thus refers to a fundamental statistical inaccuracy, which is particularly important with small amounts of data, while the confidence interval expresses the estimated representativeness and maximum deviation of the data, which, however, cannot be determined exactly in archaeological evaluations anyway. [87] The methods for quantifying data depend on the data basis, and today we have become acquainted, albeit in a somewhat mixed order, with three different compositions of data material on which we can conduct our investigations. First, there are analyses of the population, i.e. of all existing elements that we want to survey. This population can come from complete surveys such as censuses, inventories, corpora or catalogues of works, or it can be born-digital content, e.g. all posts on Instagram, Twitter or YouTube, all broadcasts on television, etc. The population can also be based on a set of data that we ourselves have collected in the past. Gathering the data can be quite time-consuming, but after that, apart from a completeness check, no further preliminary investigations are necessary and you can get started with the statistical analysis straight away. On these data, all statistical calculations such as the determination of location and dispersion measures can be carried out, which is especially useful in Big Data applications to get an idea of the composition, distribution and correspondence of the data. There are researchers, such as Lev Manovich, who want to allow only complete populations, if possible, even for cultural datasets. [88] However, this can be very time-consuming and cost-intensive, especially if you have to acquire the data first. That is why analyses based on representative samples, such as surveys, image sets or geo-surveys, are very popular. A representative sample is a proportion of all works available in a particular medium, time or place, selected according to fixed rules. These rules can refer to producers, users or recipients. What is crucial is that they are applied uniformly. In order to be able to create a representative sample, preliminary research on the composition of the population is necessary, so that the right sampling method can be selected depending on the characteristics to be studied. Sampling only makes sense in relation to the characteristics to be investigated, because every reduction of the data basis bears the risk of excluding something important. Accordingly, the research question determines the composition and size of the sample. If the sample is representative, i.e. if it represents the population in its entirety, it is also possible to calculate the measures of location and dispersion. [89] In historical image and object studies, however, analyses are often only possible on indeterminate subsets of the population. Great caution is required here, for example when using data from internet portals and databases, digitising collection catalogues or compiling excavation finds. First of all, you should find out exactly how the data collection came about in order to be able to detect distortions in the data straight away. Subsequently, re-sampling is necessary, which must be carried out depending on the features to be investigated.
Calculating location and dispersion measures on such data, which are still relatively uncertain with regard to the population, can be counterproductive, as individual phenomena may be masked in this way. [90] In 2020, a large poster exhibition took place in Hamburg, which was announced as follows: "With almost 400 exhibits by around 200 artists and designers, the exhibition Das Plakat (The Poster) at the Museum für Kunst und Gewerbe offers ... a large-scale and representative overview of the history of the poster from its beginnings in the early 19th century to the present day." And I must say, this was a really beautiful and multi-faceted exhibition in over 170 sections! In conclusion, I would like to ask how representative an exhibition like this can be. [91] The Hamburg Museum für Kunst und Gewerbe owns one of the largest poster collections in the world, and the exhibition only showed works from its own collection. First, we would like to examine whether this collection in its entirety can be a representative cross-section of European poster production at all. To do this, we will take a brief look at the acquisition history and, with the help of the exhibition catalogue, try to determine the collecting interests and focal points. The first examples came to the museum in the 1880s, but not as posters, but as "examples of lithographic colour printing", in which the then director Brinckmann was particularly interested. By 1896, Hamburg already had 400 posters. Another focus arose in 1915, when the museum decided to collect all printed matter published on the occasion of the war. Whereas the interest before had been technical, it was now historical and thematic. "After that, however, apart from minor donations and the usual mailings, the collection lay dormant for almost half a century." In 1964, following an exhibition of the Alliance Graphique Internationale, all the exhibits remained in Hamburg. The focus of the posters now acquired was therefore on graphic design and thus on formal design criteria. In the following years, posters by individual artists were again added in connection with exhibitions, forming a further focal point in terms of numbers. Again and again, local designers donated their designs to the museum. In addition, the museum took over a private collection of artists' posters, which shifted the emphasis from graphic and typographic design to artistic realisation. And in the 1990s, thematic exhibitions on perfume, sneakers, Art Nouveau or Art Deco led to an expansion of the collection in terms of motifs and chronology. This brief overview makes it clear that the collection was built up with changing focal points. Despite its size and diversity, it can therefore not be representative. [92] The catalogue names three central genres of the poster: the cultural poster, such as theatre and exhibition posters, the political poster, and product advertising. With a study period from 1770 to 2020, the sample does not even include two posters per year. It is obvious that even the three genres cannot be sufficiently represented, let alone different countries, art movements and themes. The sample is simply too small for an investigation period of 250 years. Chronological focal points are the 1890s (Affichomania or poster mania, i.e. the French poster art of the late 19th century), the period around 1930 (with, above all, the Russian avant-gardes) and the period around 1970 (with Pop Art).
According to the catalogue, the aim was not "to make as broad a selection as possible, but to present the decisive epochs of poster history in the necessary diversity. ... Much had to be omitted, some could only be hinted at." (p. 11) In the introduction to the catalogue (p. 9) one reads expressions such as "the most interesting designs", "outstanding works", "posters of lasting significance", "the great designers", etc. This already makes it clear that aesthetic criteria determined the selection, not statistical ones. In addition, as already mentioned, the museum was able to acquire a significant private collection of artists' posters in the 1990s, which shifted the emphasis from graphic and typographic design to artistic realisation. It should have long been clear to you that this, too, was a canonical selection. Strictly speaking, the term "representative" is therefore inaccurate. [93] But how would you have to proceed if you wanted to examine the history of poster design in a statistically correct way? First of all, it is important to be precise about the question you want to answer with the statistical investigation, because the type of sample depends on this. Let's say you are interested in the chronological distribution of certain motifs. Then you have to make sure that a sufficient number of posters from each year, or at least from each decade, is included. Recall the confidence intervals used to indicate margins of error: if you include 400 posters, the margin of error is about ±5%, as long as you are talking about the distribution of one characteristic. If you wanted to evaluate the motifs by decade, for example, you would need 400 posters for each decade to maintain this margin of error. [94] As you will recall, each element of the population must have the same chance of being included in the sample. Therefore, you must not prejudge the outcome with your selection. This can easily happen with stratified samples. But it is also the case with collections that have been created with a certain focus in mind. Conversely, you must not simply close the gaps discovered in the collection (e.g. for posters of the Third Reich), because this would unilaterally expand the data base and possibly distort it with regard to another characteristic. Instead, you must try to achieve an even distribution of all relevant characteristics that is tailored to your question. [95] If necessary, break down the sampling frame into several sub-populations! For if you draw the sample not randomly but in a stratified way, for example by having each of the three genres equally represented, you can then also examine your corpus separately by genre. Of course, you may then no longer evaluate the frequency of the genres or subgenres chronologically, since you previously included them in an even distribution. With such a weighting, you can no longer determine whether, for example, the political poster only gained in importance after the First World War, which is quite possible historically. [96] But the greatest difficulty is probably determining the total population. And we will not be able to do that without major surveys. In the end, given the current state of research, the Museum für Kunst und Gewerbe had no choice but to select from its own collection, from certain perspectives, typical (rather than representative) examples for a number of poster types. This is not a bad thing, as long as the proportions are not expressed in numbers, thereby feigning statistical representativeness.
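To make the two error figures from this lesson tangible, here is a small Python sketch that computes the accuracy of results E(3) mentioned in [86] and the rough margin of error behind the "400 posters, about ±5%" rule of thumb in [93]. The formulas are my reading of the text (E(k) as k divided by the total, and the usual 95% margin of error for a proportion), so treat them as an interpretation rather than an official definition.

```python
# Back-of-the-envelope error figures; the formulas are an interpretation
# of the lecture text, not an official definition.
import math

def accuracy_of_results(n, k=3):
    """E(k): percentage shift if one category gains k extra objects."""
    return 100 * k / n

def margin_of_error(n, p=0.5, z=1.96):
    """Approximate 95% margin of error (in %) for a proportion p."""
    return 100 * z * math.sqrt(p * (1 - p) / n)

for n in (100, 400, 1600):
    print(f"n = {n:5d}   E(3) = {accuracy_of_results(n):.2f}%   "
          f"margin of error = +/-{margin_of_error(n):.1f}%")
```

With n = 100 this gives E(3) = 3% and a margin of error of roughly ±10%; with n = 400 the values drop to 0.75% and about ±5%, matching the figures quoted above.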
[97] The quantifying methods make the tension between the humanities and the empirical or mathematical disciplines particularly clear. On the one hand, even rudimentary knowledge of statistics is often lacking in the humanities, so that many a pointless investigation is carried out; on the other hand, many researchers are also not sufficiently aware of the manufactured nature of the data and of the processes of pre-selection and canonisation. The challenges that this poses on a daily basis concern precisely this reflective handling of data in the humanities. Data sets, tables and diagrams are to be treated like historical sources and questioned about their basis, originator and intention. This also includes the discussion of the right data basis, the appropriate statistical method and adequate visualisation, which we will talk about in more detail in the next lesson. In the past lessons we have often talked about the modelling of fuzziness (uncertainty); here too, it is always important to keep statistical evaluability in mind. [98] At the end I always list the knowledge we expect from you: you should know different quantitative methods. It is indispensable that you master the basics of statistics. The meaning of normal distribution, types of sample, confidence intervals, accuracy of results, etc. should no longer be unknown to you. This also includes the analytical procedures for common measures and structures, the concepts and theories of sampling, and the basics of georeferenced analyses. [99] As far as your skills are concerned, I would like to pick out three, namely the ability to describe and analyse subject data quantitatively and qualitatively, and to select (sample) data appropriately. You should be able to calculate various measures of location and to understand and verify results using statistical methods. In addition, it is important in the study of the digital humanities to be able to model correlations between two characteristics and to distinguish between correlations and causal relationships. [100] The possible exam questions should not surprise you now. One task could be: What do you mean by accuracy of results? What is the difference from the standard deviation? Or: Which sampling methods do you know? Describe a method using an example from image science and name its advantages. When should statistical methods be used in the humanities? What are the advantages and dangers? Or: Why should one take a critical view of the visualisation of the data collected by Wolfgang Filser shown opposite? [101] With a look at the literature, this time mainly textbooks, I say goodbye to you again. In the next lesson we will continue seamlessly with data visualisation and data exploration. Until then, I wish you all the best!