
72. Recognition and datamining for handwritten text collections

Anders Brun, Ewert Bengtsson, Fredrik Wahlberg, Tomas Wilkinson, Kalyan Ram, Anders Hast, Ekta Vats

Partner: Carl Nettelblad, Dept. of Information Technology; Lasse Mårtensson, Dept. of Business and Economics Studies, Högskolan i Gävle; Mats Dahllöf, Dept. of Linguistics and Philology, UU; Alicia Fornés, Universitat Autonoma de Barcelona, Spain; Jonas Lindström, Dept. of History, UU

Funding: UU; Swedish Research Council; Riksbankens Jubileumsfond; eSSENCE Period: 20120101–

Abstract: This cross-disciplinary initiative takes its point of departure in the analysis of handwritten manuscripts using computational methods from image analysis and linguistics. It sets out to develop a manuscript analysis technology providing automatic tools for large-scale transcription, linguistic analysis, digital paleography and generic data mining of historical manuscripts. The mission is to develop technology that will push the digital horizon back in time, by enabling digital analysis of handwritten historical materials for both researchers and the public. One postdoc started and several new results were presented. See Figure 64.

Figure 64: Recognition and Datamining for Handwritten Text Collections

73. Writer identification and dating

Anders Brun, Fredrik Wahlberg, Anders Hast, Ekta Vats

Partner: Lasse Mårtensson, Dept. of Business and Economics Studies, Högskolan i Gävle; Mats Dahllöf, Dept. of Linguistics and Philology, UU; Alicia Fornés, Universitat Autonoma de Barcelona, Spain

Funding: UU; Swedish Research Council; Riksbankens Jubileumsfond; eSSENCE Period: 201401–

Abstract: The problem of identifying the writer of a handwritten text is of great interest in both forensic and historical research. Sadly, the magical CSI machine for identifying a scribal hand does not exist. Using image analysis, statistical measurements of how a scribe used the quill pen on parchment can be collected. These measurements are treated as a statistical distribution over writing practices. We use this information to identify individual writers and to perform style-based dating of historical manuscripts. During 2016 we continued to analyze over 10000 manuscript pages from the collection Svenskt Diplomatarium, from Riksarkivet.

Using our newest methods, based on recent trends in deep learning, we are able to estimate the production date of a manuscript in this collection with a median error of less than 12 years. See Figure 65.

Figure 65: Writer Identification and Dating
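The dating result above is reported as a median error. As a minimal sketch of how such a figure can be computed, assuming each manuscript has a known production year and an estimated one, the example below takes the median of the absolute differences. The function name `median_absolute_error` and all years are hypothetical and invented for illustration; they are not taken from Svenskt Diplomatarium or from the project's own evaluation code.

```python
import statistics


def median_absolute_error(true_years, predicted_years):
    """Median of |predicted - true| over all manuscripts, in years."""
    if len(true_years) != len(predicted_years):
        raise ValueError("both lists must have the same length")
    errors = [abs(p - t) for t, p in zip(true_years, predicted_years)]
    return statistics.median(errors)


# Hypothetical production years, invented for illustration only.
true_years = [1350, 1402, 1375, 1420, 1391]
predicted_years = [1358, 1399, 1390, 1415, 1380]

# Absolute errors are 8, 3, 15, 5 and 11 years, so the median is 8.
error = median_absolute_error(true_years, predicted_years)
print(f"median dating error: {error} years")
```

The median is less sensitive than the mean to a few manuscripts with very large dating errors, which is one plausible reason to report it.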

74. Image analysis for landscape analysis

Anders Brun

Partner: Bo Malmberg, Michael Nielsen, Dept. of Human Geography, Stockholm University; Anders Wästfelt, Dept. of Economics, SLU

Funding: SLU; Stockholm University Period: 200901–

Abstract: This project is a collaboration with researchers at SU and SLU. It aims to derive information about rural and urban landscapes from satellite images. The project focuses on using texture analysis of images, rather than only pixelwise spectral analysis, to segment the image into different meaningful regions.

This is an ongoing collaboration, which has so far resulted in one patent and one journal publication on the detection of damaged forest from aerial photographs. See Figure 66.

Figure 66: Image Analysis for Landscape Analysis
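To illustrate why texture can separate regions that average intensity cannot, the sketch below builds a small synthetic image whose two halves have the same mean grey level but different local variation. Local standard deviation is used here only as a simple stand-in for a texture measure; it is an assumption made for this example and not the method used in the project. The names `local_std` and `texture_mask` and the threshold value are likewise hypothetical.

```python
import statistics


def local_std(image, row, col, radius=1):
    """Standard deviation of the intensities in a square window around (row, col)."""
    rows, cols = len(image), len(image[0])
    window = [
        image[r][c]
        for r in range(max(0, row - radius), min(rows, row + radius + 1))
        for c in range(max(0, col - radius), min(cols, col + radius + 1))
    ]
    return statistics.pstdev(window)


def texture_mask(image, threshold, radius=1):
    """Mark each pixel True if its local standard deviation exceeds the threshold."""
    return [
        [local_std(image, r, c, radius) > threshold for c in range(len(image[0]))]
        for r in range(len(image))
    ]


# Synthetic 4x8 image: the left half is flat grey (100), the right half is a
# checkerboard of 50 and 150. Both halves have a mean intensity of 100.
image = [
    [100] * 4 + [50 if (r + c) % 2 == 0 else 150 for c in range(4)]
    for r in range(4)
]

mask = texture_mask(image, threshold=20)
for row in mask:
    print("".join("#" if textured else "." for textured in row))
```

Pixels next to the boundary are also marked as textured, because their window overlaps the checkerboard. Any window-based texture measure blurs region borders in this way.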

75. Color names

Gunilla Borgefors

Funding: UU Period: 20160701–

Abstract: There is a recent trend in machine learning and deep learning applications to use many different, rather arbitrary colour names for image annotation, retrieval, and training. How colours are named therefore matters. But what is cerulean to an artist may be just blue to you and the same colour as grass to a Zulu!

In fact, there are many languages that do not have a term for "blue", while Russian has two: light blue and dark blue. And these two "blues" are as different to Russian speakers as blue and green are to us. The five "blue" patches in Figure 67 are taken from an often-used set in deep learning applications. One is called just "blue", while the others have, according to the authors, self-explanatory names. Can you name them? In this project I investigate results from colour semantics and colour perception experiments to get a better understanding of how different people understand colour names and what the consequences are for how colours should be named in various applications. The paper "The Scarcity of Universal Colour Names" was published in the proceedings of ICPRAM 2018. See Figure 67.

Figure 67: Color names

76. Computerised image processing in handwritten text recognition

Raphaela Heil, Anders Hast, Ekta Vats, Anders Brun

Partner: Lasse Mårtensson, Dept. of Swedish Language and Multilingualism, Stockholm University

Funding: TN-Faculty Period: 20180115–

Abstract: This project is concerned with handwritten text recognition with a special focus on the handling of historical documents. It encompasses the development and implementation of new computational methods for the recognition, transcription and analysis of manuscripts. The long-term strategic goal is to develop a user-friendly tool to support historians, palaeographers and other researchers from the digital humanities in the transcription and analysis of historical material.

77. Historical handwritten text recognition

Ekta Vats, Anders Hast

Partner: Per Cullhed, University Library, UU; Lasse Mårtensson, Dept. of Swedish Language and Multilingualism, Stockholm University; Alicia Fornés, Universitat Autonoma de Barcelona, Spain; Prashant Singh, Dept. of Information Technology, UU

Funding: Swedish e-Science Academy (eSSENCE) Period: 20170501–

Abstract: Automatic recognition of heavily degraded handwritten text is challenging due to complex layouts and paper degradation over time. Typically, an old manuscript suffers from degradations such as paper stains, faded ink and ink bleed-through. There is also variability in writing style, and text and symbols may be written in an unknown language. This hampers the readability of the document and makes transcription and word spotting in a set of non-indexed documents more difficult. The aim of this project is to facilitate basic research on handwritten text recognition (HTR) by developing efficient methods for recognition of complex handwritten text using advanced HTR technology. The present investigation belongs to a set of methods known as word spotting, which accelerate the word recognition process by finding multiple instances of a word on the fly in a set of unedited material. PI Anders Hast and postdoc Ekta Vats have achieved significant advances in HTR research, with peer-reviewed scientific publications that are highly relevant to this project. See Figure 68.

Figure 68: Historical Handwritten Text Recognition
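The idea behind word spotting, finding further instances of a query word without transcribing the whole collection, can be sketched as a nearest-neighbour search over word-image descriptors. The example below is a simplified illustration under stated assumptions: each segmented word image is assumed to be already reduced to a fixed-length feature vector, and matches are ranked by cosine similarity. The descriptors, the location labels, the function name `spot_word` and the similarity threshold are all hypothetical; the project's actual features and matching method are not described in this report.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equally long feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def spot_word(query, candidates, min_similarity=0.9):
    """Return (location, similarity) pairs for candidates similar to the query, best first."""
    hits = []
    for location, descriptor in candidates.items():
        similarity = cosine_similarity(query, descriptor)
        if similarity >= min_similarity:
            hits.append((location, similarity))
    hits.sort(key=lambda hit: hit[1], reverse=True)
    return hits


# Hypothetical descriptors of segmented word images, invented for illustration.
candidates = {
    "page1/word3": [0.9, 0.1, 0.4],
    "page1/word7": [0.1, 0.8, 0.2],
    "page2/word5": [0.8, 0.2, 0.5],
}
query = [0.9, 0.1, 0.4]  # descriptor of the word image selected by the user

hits = spot_word(query, candidates)
for location, similarity in hits:
    print(f"{location}: {similarity:.3f}")
```

With these invented vectors, the two candidates that resemble the query are returned and the dissimilar one is filtered out by the threshold.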

5.6 Cooperation partners
