Launch of the Europeana Newspapers project
A group of 17 European partner institutions have joined forces in the “Europeana Newspapers” project and will, over the next three years, provide more than 10 million newspaper pages to the Europeana service.
The Europeana Newspapers project is funded under the Competitiveness and Innovation Framework Programme 2007-2013 of the European Commission and aims to aggregate and refine newspapers for The European Library and Europeana.
The project addresses challenges linked with digitized newspapers, such as the use of refinement methods for Optical Character Recognition (OCR), article segmentation or Optical Layout Recognition (OLR), named entity recognition (NER) and page class recognition. In this context, refinement means enriching scanned newspaper pages with machine-readable text and structure. OCR is the electronic conversion of scanned images of handwritten, typewritten or printed text into machine-encoded text. OLR is the electronic segmentation of a scanned page containing more than one article into its individual articles. NER seeks to locate atomic elements in text and classify them into predefined categories such as the names of persons, organizations and locations. These methods will make the newspaper content easier to search and retrieve via a specialist interface at Europeana. The interface, which will be developed by The European Library, Europe’s aggregator for the library sector, will allow users to find individual articles, perform subject searches and display the full text online.
The project will also evaluate the quality of the refinement technologies and, in close collaboration with stakeholders from the public and private sectors, transform the local metadata to the Europeana Data Model standard. Each library participating in the project will contribute digitized newspapers and full texts to Europeana.
The Europeana Newspapers project is being led by the Berlin State Library.
Follow the progress of the Europeana Newspapers project at www.europeana-newspapers.eu. For further information, please contact Hans-Jörg Lieder (Project Leader) or Thorsten Siegmann (Project Coordinator) via info at europeana-newspapers dot eu.
Europeana in a nutshell
Europeana is a multilingual online collection of millions of digitized items from European museums, libraries, archives and audiovisual collections. Europeana currently gives integrated access to 23 million books, films, paintings, museum objects and archival documents from some 2,200 content providers across Europe.