Commit 2ab84eda authored by Michael Davis

Adds a few key glossary definitions

parent 75ae3908
*.aux
*.glo
*.gls
*.glsdefs
*.ist
*.log
*.out
*.pdf
*.toc
@@ -27,7 +27,7 @@
\usetikzlibrary{positioning}
%\usepackage{hyperref}
% Glossary
\usepackage{glossaries}
\makeglossaries
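% Note: the glossaries package typically needs an external indexing run between
% LaTeX passes for the glossary to appear, for example (the job name "cta" is an
% assumption here, not taken from this commit):
%   pdflatex cta.tex
%   makeglossaries cta
%   pdflatex cta.tex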
@@ -41,6 +41,8 @@
% Front Matter
\tableofcontents
\input{cta_Glossary.tex}
% Chapters
cta_Glossary.tex
\setacronymstyle{long-short-desc}
% Acronyms
\newacronym{cta}{CTA}{CERN Tape Archive}
\newacronym{castor}{CASTOR}{CERN Advanced STORage Manager}
\newacronym{hsm}{HSM}{Hierarchical Storage Management}
% Term definitions
\newacronym[description={Refers to CERN experiments (ALICE, ATLAS, CMS, LHCb) or other groups which require separate data
storage, such as the International Linear Collider (ILC)}]{vo}{VO}{Virtual Organisation}
\newacronym[description={Data and Storage Services group in the CERN IT department}]{dss}{DSS}{Data and Storage Services}
% Glossary entries
\newglossaryentry{eos}
{
name=EOS,
description={A disk-based, low-latency storage service with a highly-scalable hierarchical namespace, which uses the
XRoot protocol to make data access possible}
}
% Print the glossary
\printglossaries
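% A minimal usage sketch, assuming the entries defined above: with the
% long-short-desc acronym style, \gls{cta} prints "CERN Tape Archive (CTA)" on
% first use and "CTA" thereafter, while \gls{eos} prints the entry name "EOS".
% For example, in the body text:
%   The \gls{cta} prototype archives \gls{eos} files directly to tape on
%   behalf of each \gls{vo}.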
\chapter{Introduction}
The main objective of the \gls{cta} project is to develop a prototype tape archive system that transfers files directly
between remote disk storage systems and tape drives. The concrete remote storage system of choice is \gls{eos}.
The \gls{dss} group currently provides a tape archive service. This service is implemented by the \gls{hsm} system
named the \gls{castor}. This \gls{hsm} has an internal disk-based storage area that acts as a staging area for
tape drives. Until now this staging area has been a vital component of \gls{castor}. It has provided the necessary
buffer between the multi-stream, block-oriented disk drives of end users and the single-stream, file-oriented tape
drives of the central tape system. Assuming the absence of a sophisticated disk-to-tape scheduling system, at any
single point in time a disk drive will be required to service multiple data streams whereas a tape drive will only
ever have to handle a single stream. This means that a tape stream will be at least one order of magnitude faster
than a disk stream. With the advent of disk storage solutions that stripe single files over multiple disk servers,
the need for a tape archive system to have an internal disk-based staging area has become redundant. Having a file
striped over multiple disk servers means that all of these disk servers can be used in parallel to transfer that
file to a tape drive, hence using multiple disk-drive streams to service a single tape stream.
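% A brief formal sketch of the argument above (the notation is illustrative,
% not taken from the source): if a file is striped over $N$ disk servers, the
% tape drive's native rate $R_{\mathrm{tape}}$ can be sustained whenever
% \[ \sum_{i=1}^{N} R_{\mathrm{disk},i} \geq R_{\mathrm{tape}}, \]
% even though each individual per-stream disk rate $R_{\mathrm{disk},i}$ is an
% order of magnitude below $R_{\mathrm{tape}}$.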
The \gls{cta} project is a prototype for a very good reason. The \gls{dss} group needs to investigate and learn
what it means to provide a tape archive service that does not have its own internal disk-based staging area. The
project also needs to keep its options open in order to give the \gls{dss} group the best opportunities to identify
ways forward for reducing application complexity, easing code maintenance, reducing operational overheads and
improving tape efficiency.
The \gls{cta} project currently has no constraints that prevent it from collecting a global view of all tape, drive
and user request states. This means the \gls{cta} project should be able to implement intuitive and effective tape
scheduling policies. For example, it should be possible to schedule a tape archive mount at the point in time
when there is both a free drive and a free tape. The architecture of the \gls{castor} system does not facilitate
such simple solutions due to its history of having separate staging areas per experiment and dividing the mount
scheduling problem between these separate staging areas and the central tape system responsible for issuing tape
mount requests for all experiments.
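% A brief formal sketch of that policy (the notation is illustrative, not taken
% from the source): with a global view of drive and tape states, an archive
% mount can be triggered at any instant at which
% \[ D_{\mathrm{free}} \neq \emptyset \;\wedge\; T_{\mathrm{free}} \neq \emptyset, \]
% where $D_{\mathrm{free}}$ is the set of currently free drives and
% $T_{\mathrm{free}}$ the set of free tapes with queued user requests.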