Adrian Calma - Active Learning with Uncertain Annotators

40,10 €

Product description

In the digital age, many applications can benefit from collecting data. Classification algorithms, for example, are used to predict the class labels of samples (also termed data points, instances, or observations). However, these methods require labeled instances to be trained on. Active learning is a machine learning paradigm in which an active learner trains a model (e.g., a classifier) that is, in principle, trained in a supervised way, but starts from a data set in which only a small fraction of the samples are labeled. To obtain labels for the unlabeled samples, the active learner has to ask an annotator (e.g., a human expert), generally called an oracle. In most cases, the goal is to maximize some metric assessing task performance (e.g., the classification accuracy) while minimizing the number of queries. Therefore, active learning strategies aim at acquiring the labels of the most useful instances. However, many of those strategies assume the presence of an omniscient annotator who provides the true label for each instance. But humans are not omniscient; they are error-prone. Thus, this assumption is often violated in real-world applications, where multiple error-prone annotators are responsible for labeling.

First, the concept of dedicated collaborative interactive learning is described, with a focus on its first two research challenges: uncertain oracles and multiple uncertain oracles. Next, the state of the art in the field of active learning is presented in an extended literature review. As there is a lack of publicly available data sets that contain information about an annotator's degree of belief (confidence) in the labels provided, methods for realistically simulating uncertain annotators are introduced. Then, a first approach that considers the confidences provided by an annotator and transforms them into gradual labels is presented. The suitability of the gradual labels is evaluated in a case study with two annotators who label 30,000 handwritten images. Afterward, meritocratic learning is introduced, which adopts the merit principle to select annotators for labeling an instance and to weight the labels they provide. By preferring superior annotators, a better label quality is reached at lower labeling costs. These important steps pave the way to future dedicated collaborative interactive learning, where many experts with different expertise collaborate, not only labeling samples but also supplying higher-level knowledge such as rules, and where labeling costs depend on many conditions. Moreover, human experts may even profit by improving their own knowledge when they get feedback from the active learner.