(1)

The Swedish National Archives digital preservation

Mats Berggren, IT-department, 2018-11-29

(2)

Swedish National Archives digital preservation

• Born-digital information

• Digitization of documents

• Digital preservation at the National Archives

• The use of standards

(4)

Receiving born-digital data from agencies

• No fixed delivery schedule; the data files received can be both new and old

• Deliveries are negotiated between the agencies and the National Archives. Funding is transferred from the agencies to the National Archives

• When agencies are closed down, their archives are transferred to the National Archives

• Register laws

• Currently no common records management standard in Sweden

(5)

Regulations for agencies

• The National Archives issues regulations for digital preservation in the Swedish agencies

– RA-FS 2009:1, RA-FS 2009:2

• Archive file formats

– Text files (ISO 8859-1, Unicode)

– HTML

– XML (also GML and SGML)

– PDF (PDF/A-1)

– JPEG, TIFF and PNG
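
The format list above lends itself to a simple pre-delivery check. The following is a minimal sketch, not the National Archives' own tooling: the extension table and the function name `classify_delivery` are assumptions made for illustration, and a file extension says nothing about whether a PDF really is PDF/A-1 or which encoding a text file uses.

```python
from pathlib import Path

# Extension-to-format table derived from the archive formats listed above.
# The extension spellings are assumptions; the regulations name formats,
# not file extensions.
ACCEPTED_EXTENSIONS = {
    ".txt": "Text file",
    ".html": "HTML",
    ".htm": "HTML",
    ".xml": "XML",
    ".gml": "GML",
    ".sgml": "SGML",
    ".pdf": "PDF (PDF/A-1 expected)",
    ".jpg": "JPEG",
    ".jpeg": "JPEG",
    ".tif": "TIFF",
    ".tiff": "TIFF",
    ".png": "PNG",
}


def classify_delivery(filenames):
    """Split file names into accepted (name -> format) and rejected (list)."""
    accepted, rejected = {}, []
    for name in filenames:
        fmt = ACCEPTED_EXTENSIONS.get(Path(name).suffix.lower())
        if fmt is None:
            rejected.append(name)
        else:
            accepted[name] = fmt
    return accepted, rejected
```

A real validation step would also inspect file contents, for example to confirm PDF/A-1 conformance.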

(6)

Common deliveries of “born-digital” material

• Databases: data exported as text files or XML files

• Web pages: agency websites are archival data

• Records management systems: database and PDF documents

• Collections of documents

• Government committees: many small deliveries

(8)

Digitization of documents

• Scanning of documents, church records etc.: MKC Fränsta

• Microfilm scanning: SVAR Ramsele

• Microfilm scanning by FamilySearch in Salt Lake City, USA, with delivery to SVAR Ramsele. Church records and judicial records

(9)

Image formats

• Most scanning projects within the National Archives produce raw TIFF files of these three types:

– TIFF/IT (TIFF 6.0), Grayscale, BitsPerSample=8, 300 dpi

– TIFF/IT (TIFF 6.0), Group4 B/W, BitsPerSample=1, 400 dpi

– TIFF/IT (TIFF 6.0), Colour RGB, BitsPerSample=8x3, 300 dpi

• DJVU: used for presentation and public access. Converted from TIFF. Proprietary format

• JPEG: used by a few projects. Accepted as delivery format from agencies
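
The three TIFF types can be expressed as a small lookup table. This is a sketch under stated assumptions: the names `ScanProfile` and `match_profile` are hypothetical, the caller is assumed to have already read BitsPerSample and the resolution from the TIFF header, and compression (such as Group4) is not checked.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ScanProfile:
    name: str
    bits_per_sample: tuple  # one entry per colour channel
    dpi: int


# Values taken from the three TIFF types listed above.
PROFILES = (
    ScanProfile("Grayscale", (8,), 300),
    ScanProfile("Group4 B/W", (1,), 400),
    ScanProfile("Colour RGB", (8, 8, 8), 300),
)


def match_profile(bits_per_sample, dpi):
    """Return the name of the matching profile, or None if nothing matches."""
    for profile in PROFILES:
        if profile.bits_per_sample == tuple(bits_per_sample) and profile.dpi == dpi:
            return profile.name
    return None
```

For example, `match_profile((8, 8, 8), 300)` returns `"Colour RGB"`, while an 8-bit grayscale scan at 400 dpi matches none of the three types and yields `None`.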

(10)

FOSAM / MKC

1 Planning

2 Database registration

3 Fetch originals

4 Preparation

5 Scanning

6 Ocular image control

7 TIFF header update and extract

9 Create DJVU files for viewing

10 Delivery via FTP / Delivery on LTO-tape

11 Image server

12 Import of references to Arkis2

13 LTO-tapes in Stacker

(11)

Microfilm scanning

1 Planning

2 Database registration

3 Preparation

4 Scanning

5 Ocular image control

7 Control

8 TIFF header update and extract

9 Create DJVU files for viewing

10 Delivery via FTP / Delivery on LTO-tape

11 Image server

12 Import of references to Arkis2

13 LTO-tapes in Stacker

(12)

Delivery of scanned TIFF images from the Genealogical Society of Utah (GSU), Salt Lake City, USA

1 Planning

2 Database registration

7 Control

8 TIFF header update and extract

9 Create DJVU files for viewing

10 Delivery via FTP / Delivery on LTO-tape

11 Image server

12 Import of references to Arkis2

13 LTO-tapes in Stacker

(13)

Digitization of audiovisual media

Project DIANA:

Digitization of audiovisual media, audio and video

Digitization done in house by the National Archives

Digitization also done by the Royal Library for the National Archives

Project started 2015, digitization started 2017

(14)

Audiovisual formats

• Formats for long-term storage:

– Audio: WAV

– Video: Matroska / FFV1

• Presentation formats:

– Audio: MP3

– Video: MPEG-4

(16)

Digital preservation at the National Archives

History:

Archival deliveries of digital data since the 1970s

Large-scale digitization of documents since 2003

A Hierarchical Storage Management (HSM) system installed in 2004

A new storage platform became necessary in 2007

A new platform, RADAR, is developed based on the OAIS model

RADAR (archiving digital images) since 2009

RADAR (archiving “born-digital” data from agencies) since 2013

(17)

A platform for digital preservation (RADAR)

What is RADAR:

Digital preservation of both “born-digital” data and digital images

Several copies in geographically separated locations

Provenance and descriptive metadata (ARKIS/NAD)

Technical metadata and preservation metadata (ARKIS)

Standardized metadata formats (METS, PREMIS etc.)

Specially developed system for archival storage (ESSArch)

Can be extended with new modules and tools

Media migration (not automated)

Scheduled media validation (not automated)

Format migration (not automated)

(18)

OAIS model

(19)

The Swedish National Archives platform for digital preservation (RADAR)

• ESSArch: Archival Storage System

• ARKIS: Archival Information System

• Digital Chain: ingest from scanning

• RALF: application for control and preparation at the agencies

• KRAM: application for ingest and control; access and dissemination of databases

• CARMEN: search applications for databases from agencies

• Public use: searching in the national archival database (NAD)

• Users: agency employees, National Archives employees

(20)

RADAR parts

RALF – The National Archives' tool for preparing archival transfers. Used by agencies. It can perform basic checks and creates a submission information package (SIP)

KRAM – Control and validation framework. An application that checks and validates SIPs from agencies. KRAM can also be used to convert data from older transfers, and to load files exported from agency databases into an SQL database

Digital chain – The National Archives' digitization of documents. Master files in TIFF format are packed in AIPs and stored for long-term preservation in RADAR

ARKIS – The National Archives' archival information system. Contains archival descriptions and metadata about all archival objects, including digital objects

ESSArch – The National Archives' “storage management system”. Manages the physical storage on tape (LTO4) and disks. Packs AIPs in TAR format. Performs checksum controls. Logs all ingest and dissemination events. ESSArch is an open-source application and is also used by the National Archives of Norway

CARMEN – Search applications for databases (about 30) delivered from agencies
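
The packaging and checksum steps attributed to ESSArch above can be illustrated with a short sketch. This is not ESSArch code: the function names are hypothetical, and the choice of SHA-256 is an assumption, since the slides do not say which checksum algorithm is used.

```python
import hashlib
import tarfile
from pathlib import Path


def sha256_of(path, chunk_size=1024 * 1024):
    """Stream a file through SHA-256 so large packages need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def pack_aip(source_dir, tar_path):
    """Pack a directory into an uncompressed TAR file and return its checksum."""
    source_dir = Path(source_dir)
    with tarfile.open(tar_path, "w") as tar:
        tar.add(source_dir, arcname=source_dir.name)
    return sha256_of(tar_path)


def verify_aip(tar_path, expected_checksum):
    """Re-compute the checksum and compare it with the recorded value."""
    return sha256_of(tar_path) == expected_checksum
```

The checksum returned by `pack_aip` would be recorded at ingest; running `verify_aip` later against each stored copy is one way to carry out a scheduled validation.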

(21)

Digital information at the National Archives

• Born-digital files from agencies: about 8 TB

– Currently in RADAR: 1972 AIPs (about 6.1 TB)

• Audio-video files and multimedia: approximately 100 TB (so far)

• Digitized paper volumes (one AIP per volume): 524144

• Digitized images (TIFF format): 2.9 PB (in one copy)

• Images total: 208.2 million

• Images published on the Internet: 65.7 million

• DJVU files (presentation format): 40 TB

• Total storage: 5.8 PB (two copies)
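
A few derived figures help put these numbers in proportion. The calculation below is a back-of-the-envelope sketch that uses only the values quoted above and assumes decimal units (1 PB = 10^15 bytes); the variable names are chosen for illustration.

```python
# Figures quoted above (decimal units assumed: 1 PB = 10**15 bytes).
TIFF_BYTES_ONE_COPY = 2.9e15
IMAGES_TOTAL = 208.2e6
IMAGES_PUBLISHED = 65.7e6
DIGITIZED_VOLUMES = 524_144

# Average size of one master TIFF, in megabytes.
avg_image_mb = TIFF_BYTES_ONE_COPY / IMAGES_TOTAL / 1e6

# Share of all images that is published on the Internet.
published_share = IMAGES_PUBLISHED / IMAGES_TOTAL

# Average number of images per digitized volume (one AIP per volume).
images_per_volume = IMAGES_TOTAL / DIGITIZED_VOLUMES

# Size of two copies of the TIFF masters, in petabytes.
two_copies_pb = 2 * TIFF_BYTES_ONE_COPY / 1e15

print(f"Average TIFF size:  {avg_image_mb:.1f} MB")
print(f"Published share:    {published_share:.1%}")
print(f"Images per volume:  {images_per_volume:.0f}")
print(f"Two TIFF copies:    {two_copies_pb:.1f} PB")
```

With these inputs the average master TIFF is roughly 13.9 MB, about 31.6 % of the images are published, and a volume holds close to 400 images on average. Two copies of the TIFF masters alone account for about 5.8 PB, which matches the quoted total storage; the other holdings are small by comparison.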

(23)

Archival standards

• ISAD(G) and ISAAR(CPF)

– The archival information system ARKIS is modelled after these standards

• EAD and EAC-CPF

– These formats are used as exchange formats for archival description information in Sweden

– Supported by several commercial archival information systems

– Import and export functions in ARKIS

– A new Swedish EAD and EAC-CPF adaptation (FGS)

• OAIS

– Widely adopted in Sweden, not only by the Swedish National Archives

– Several commercial e-archive systems claim to be OAIS-compliant

(24)

Standards for preservation metadata

METS (Metadata Encoding & Transmission Standard) – Structure for encoding descriptive, administrative, and structural metadata (DLF/LOC) (2004)

PREMIS (Preservation Metadata) – A data dictionary and supporting XML schemas for core preservation metadata needed to support the long-term preservation of digital materials (OCLC/LOC) (2005)

MIX (NISO Metadata for Images in XML) – XML schema for encoding technical data elements required to manage digital image collections (ANSI/NISO) (2006)

EBUCore – XML format for metadata about audio and video files. Developed and supported by the European Broadcasting Union (EBU)

Other formats

ADDML (Archival Data Description Markup Language) – XML format for describing flat files exported from databases. Used by the National Archives of Norway and Sweden (2001, 2009)
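
To make the role of METS concrete, here is a minimal sketch that builds a METS file section and structural map for a set of image files. It is an illustration only, not the profile used in RADAR: the function name `build_mets`, the `USE` and `TYPE` values, and the SHA-256 checksum are assumptions. In a full package, PREMIS and MIX records would sit in an administrative metadata section, which is omitted here.

```python
import hashlib
import xml.etree.ElementTree as ET

METS_NS = "http://www.loc.gov/METS/"
XLINK_NS = "http://www.w3.org/1999/xlink"
ET.register_namespace("mets", METS_NS)
ET.register_namespace("xlink", XLINK_NS)


def build_mets(files):
    """Build a minimal METS tree from a {relative_path: bytes} mapping."""
    mets = ET.Element(f"{{{METS_NS}}}mets")

    # File section: one <file> per object, with checksum and location.
    file_sec = ET.SubElement(mets, f"{{{METS_NS}}}fileSec")
    file_grp = ET.SubElement(file_sec, f"{{{METS_NS}}}fileGrp", USE="master")

    # Structural map: one <div> per page, pointing at its file.
    struct_map = ET.SubElement(mets, f"{{{METS_NS}}}structMap", TYPE="physical")
    volume = ET.SubElement(struct_map, f"{{{METS_NS}}}div", TYPE="volume")

    for index, (path, content) in enumerate(sorted(files.items()), start=1):
        file_id = f"FILE{index:04d}"
        file_el = ET.SubElement(
            file_grp,
            f"{{{METS_NS}}}file",
            ID=file_id,
            CHECKSUM=hashlib.sha256(content).hexdigest(),
            CHECKSUMTYPE="SHA-256",
        )
        flocat = ET.SubElement(file_el, f"{{{METS_NS}}}FLocat", LOCTYPE="URL")
        flocat.set(f"{{{XLINK_NS}}}href", path)

        page = ET.SubElement(
            volume, f"{{{METS_NS}}}div", TYPE="page", ORDER=str(index)
        )
        ET.SubElement(page, f"{{{METS_NS}}}fptr", FILEID=file_id)
    return mets
```

Calling `ET.tostring(build_mets({"images/0001.tif": b"..."}), encoding="unicode")` serializes the result as XML text.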

(25)

Thank you!

Tack så mycket!

mats.berggren@riksarkivet.se
