Commit fba72018 authored by Steven Murray

Applied corrections from German to CHEP 2016 paper

parent 483f0032
@@ -16,10 +16,10 @@ tape the custodial copy of the physics data generated by the LHC experiments.
Physics run 3 will start in 2021 and will introduce two major challenges for
which the tape archive software must be evolved. Firstly the software will need
to make more efficient use of tape drives in order to sustain the predicted data
rate of 100 petabytes per year as opposed to the current 40 petabytes per year
of Run-2. Secondly the software will need to be seamlessly integrated with EOS,
which has become the de facto disk storage system provided by the IT Storage
group for physics data.
rate of 150 petabytes per year as opposed to the current 50 petabytes per year.
Secondly the software will need to be seamlessly integrated with EOS, which has
become the de facto disk storage system provided by the IT Storage group for
physics data.
The tape storage software for LHC physics run 3 is code named CTA (the CERN Tape
Archive). This paper describes how CTA will introduce a pre-emptive drive
@@ -42,14 +42,14 @@ group plans to put CTA into production by the beginning of 2019, ready for
experiments to start migrating to it during the long shut down period between
LHC physics runs 2 and 3.
During 2016, LHC physics run 2 archived over 40 petabytes of physics data to
tape using CASTOR. LHC physics run 2 will continue at similar data rates during
2017 and 2018.
In 2016, LHC physics run 2 archived over 50 petabytes of physics data to tape
using CASTOR. The data rates of LHC physics run 2 will increase up to 100
petabytes per year during 2017 and 2018.
LHC physics run 3 will start in 2021 and is predicted to store over 100
LHC physics run 3 will start in 2021 and is predicted to store over 150
petabytes per year. Run 3 will use EOS and CTA as opposed to CASTOR to archive
data to tape. The CTA project will make more efficient use of tape drives in
order to deal with the predicted 100 petabytes of physics data per year. CTA
order to deal with the predicted 150 petabytes of physics data per year. CTA
will accomplish this by introducing a pre-emptive drive scheduler that will keep
tape drives running at full speed all of the time \cite{new}. Physics data on
tape are not only accessed by physicists. These data are read back for data
@@ -67,7 +67,7 @@ steps taken to archive a file to tape and then to retrieve it back to disk.
Section \ref{concepts} describes the concepts that an operator needs to
understand in order to configure and work with CTA. Section \ref{scheduler}
describes the pre-emptive drive scheduler of CTA and how it will enable the IT
storage group of CERN to handle the 100 petabytes per year transfer rate of LHC
storage group of CERN to handle the 150 petabytes per year transfer rate of LHC
physics run 3. Section \ref{migrating} describes how and when LHC experiments
and other users of CASTOR will migrate to EOS and CTA. Finally, section
\ref{conclusion} draws the paper to a close with its conclusions.
@@ -98,7 +98,7 @@ In addition to its normal EOS duties, the EOS manager server also queues
requests with the CTA front-end in order to have EOS disk files archived to tape
or retrieved back to disk. The EOS workflow engine is the internal EOS
component that is responsible for queuing requests with the CTA front-end. The
EOS workflow engine and its configuration is the glue that holds EOS and CTA
EOS workflow engine and its configuration form the glue that holds EOS and CTA
together.
The EOS workflow engine can be configured to provide end users with the
@@ -107,13 +107,13 @@ following different storage system behaviours.
\item D1T0 - Disk only files.
\item D1T1 - Files replicated on both disk and tape.
\item D0T1 - Tape files cached on disk.
\item Asynchronous tape file retrievals.
\item Synchronous tape file retrievals.
\item Explicit tape file retrievals.
\item Implicit tape file retrievals.
\end{itemize}
In the case of an asynchronous tape file transfer, a user issues a bring on-line
In the case of an explicit tape file retrieval, a user issues a bring on-line
request for an EOS file that is on tape but not on EOS disk. They poll EOS
until the file has been retrieved from tape. Once the file has been retrieved,
the user accesses the file on EOS disk. In the case of a synchronous tape
the user accesses the file on EOS disk. In the case of an implicit tape
retrieval, a user is blocked when they try to open an EOS file that is on tape
but not on EOS disk. They are unblocked when the file has been retrieved from
tape.
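To make the explicit retrieval pattern concrete, the following minimal Python sketch shows the bring on-line request and polling loop from the user's side. The callables \texttt{bring\_online}, \texttt{is\_on\_disk} and \texttt{open\_file} are hypothetical placeholders for whatever command or API a given EOS and CTA deployment exposes; they are not real EOS or CTA interfaces.
\begin{verbatim}
import time

def retrieve_explicitly(path, bring_online, is_on_disk, open_file,
                        poll_interval_s=60, timeout_s=24 * 3600):
    """Explicit retrieval: request that a tape-resident EOS file be staged
    back to EOS disk, poll EOS until it is there, then open the disk copy.
    The three callables are illustrative placeholders, not CTA interfaces."""
    bring_online(path)                  # queue a retrieve request via EOS/CTA
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        if is_on_disk(path):            # the file has been copied back to disk
            return open_file(path)      # the user now reads the disk copy
        time.sleep(poll_interval_s)     # otherwise keep polling EOS
    raise TimeoutError(path)
\end{verbatim}
In the implicit case no such loop is needed: the open call itself blocks until the file is back on EOS disk.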
@@ -173,7 +173,7 @@ The following steps describe how a file is archived to tape:
file is safely archived.
\item EOS updates its file namespace to reflect the fact that the file is now
safely archived.
\end{enumerate} \cite{new}
\end{enumerate}
\subsection{Retrieving a file from tape}
@@ -263,12 +263,12 @@ EOS users and groups of EOS users.
CTA does not utilize a central daemon for scheduling. Instead the scheduler is a
shared software component, running as needed on the front ends and in the tape
servers. The scheduler routines store and retrieve data from the two persistent
stores of CTA: the file catalog for keeping track of tapes, tape pools,
stores of CTA: the file catalogue for keeping track of tapes, tape pools,
routes, priorities, and tape files, and the object store, which keeps track of the
queued archive and retrieve jobs as well as tape drive statuses.
\subsection{Request queuing}
When a client EOS instance requests a new transfer, the catalog is queried to get
When a client EOS instance requests a new transfer, the catalogue is queried to get
routing information for archives or tape file location for retrieves. The
jobs are then added to their corresponding queues in the object store. For
archiving, jobs are queued per tape pool, while for retrieves, they are
@@ -276,19 +276,19 @@ attached to one tape at a time. Each queue also keeps track of the summary of the
jobs it contains to allow efficient mount scheduling.
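The per-queue summaries can be pictured as small aggregate records maintained alongside each queue so that mount scheduling never has to walk the individual jobs. The Python sketch below is illustrative only; the class and field names are assumptions and do not reflect the actual object store schema of CTA.
\begin{verbatim}
from dataclasses import dataclass, field

@dataclass
class QueueSummary:
    job_count: int = 0             # number of queued archive/retrieve jobs
    total_bytes: int = 0           # total amount of data to transfer
    oldest_job_age_s: float = 0.0  # age of the oldest queued job in seconds
    priority: int = 0              # mount priority of this tape pool or tape

@dataclass
class Queue:
    name: str                      # tape pool (archive) or tape (retrieve)
    jobs: list = field(default_factory=list)
    summary: QueueSummary = field(default_factory=QueueSummary)

    def enqueue(self, job, size_bytes, age_s=0.0):
        # Keep the summary up to date as jobs are added, so that the mount
        # scheduler only needs to read the summary, never the whole queue.
        self.jobs.append(job)
        self.summary.job_count += 1
        self.summary.total_bytes += size_bytes
        self.summary.oldest_job_age_s = max(self.summary.oldest_job_age_s,
                                            age_s)
\end{verbatim}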
\subsection{Tape mount scheduling and job handling}
The tape drives scheduling has to take into account several competing requirements.
User-initiated accesses in both directions should be executed within a bound latency
(measured in the order of hours). As mounting and dismounting tape cartridges to and
from the drive costs about a minute each, the execution
of data accessed is postponed until the amount of data to transfer makes it worthwhile
or the job age is too high. The user initiated mounts create an irregular
demand, driven by the accelerator cycles and experiments data taking, as well as
various analysis patterns.
The maintenance tasks, retrieves and archives for repack, and verifications are
high bandwidth tasks with very relaxed latency requirements which could extend to
several months. Those low priority tasks get the drives when the user-initiated tasks
are not using the drives.
Several competing requirements need to be taken into account when scheduling tape
drives. User-initiated accesses in both directions should be executed within a
bounded latency (of the order of hours). As mounting and dismounting a tape
cartridge to and from a drive each cost about a minute, the execution of a data
access is postponed until the amount of data to transfer makes it worthwhile
or the job age is too high. User-initiated mounts create an irregular demand,
driven by the accelerator cycles and experiment data taking, as well as various
analysis patterns.
The maintenance tasks of retrieving and archiving for repack, and of retrieving
for data verification, are high-bandwidth tasks with very relaxed latency
requirements, which can extend to several months. These low-priority tasks
should get drive time when user-initiated tasks are not using the drives.
When a drive is idle and ready, the tape daemon process retrieves the summaries of
all the non-empty queues from the object store and picks the highest priority queue
@@ -302,19 +302,20 @@ executes them.
This single-step scheduling allows flexibility in the scheduling rules, and
adjustments are expected as experience with CTA grows.
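As an illustration of this single-step scheduling, the sketch below shows one possible way an idle drive could choose a queue from the summaries: a queue becomes eligible once it holds enough data or its oldest job has waited too long, and the highest priority eligible queue wins. The thresholds and field names are assumptions for illustration, not the actual CTA scheduling rules.
\begin{verbatim}
def pick_queue(queue_summaries, min_bytes=500 * 10**9, max_age_s=4 * 3600):
    """Pick the queue an idle drive should mount next, or None to stay idle.

    queue_summaries is an iterable of (queue_name, summary) pairs, where a
    summary carries total_bytes, oldest_job_age_s and priority as in the
    previous sketch.  The thresholds here are illustrative assumptions."""
    eligible = [
        (name, s) for name, s in queue_summaries
        if s.total_bytes >= min_bytes        # enough data to justify a mount
        or s.oldest_job_age_s >= max_age_s   # or the jobs have waited too long
    ]
    if not eligible:
        return None                          # nothing worth mounting yet
    # Highest priority first; break ties in favour of the most queued data.
    eligible.sort(key=lambda item: (item[1].priority, item[1].total_bytes),
                  reverse=True)
    return eligible[0][0]
\end{verbatim}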
\subsection{Tape drive preemption}
A long mount can last up to eight or more hours, which is greater than the
acceptable latency for user tasks. In order to not let the long-running
low priority mounts block the drives from the higher priority ones, the tape
\subsection{Tape drive pre-emption}
A background task that repacks or verifies a whole tape can take several hours.
Such long-duration background tasks must be pre-empted when higher-priority user
tasks arrive in order to meet the latency requirements of users. The tape
daemon will keep polling the scheduling information at a low rate and interrupt
a low mount priority if a higher priority one is available to replace it.
a low-priority background task if a higher-priority one is available to replace
it.
This mixing of high- and low-priority tasks previously had to be handled by hand
or with \textit{ad hoc} scripts in CASTOR.
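The pre-emption described above amounts to a low-rate polling loop inside the data-transfer session of the tape daemon: between files, check whether a higher priority mount is waiting and, if so, give the drive up. The Python sketch below is a simplified illustration under assumed names (\texttt{drive}, \texttt{queue}, \texttt{higher\_priority\_waiting}); it is not how the real tape daemon is implemented.
\begin{verbatim}
import time

def run_background_mount(drive, queue, higher_priority_waiting,
                         check_interval_s=300):
    """Serve a low-priority (repack or verification) mount, but release the
    drive as soon as a higher-priority mount is ready to replace it.  The
    drive and queue objects and the higher_priority_waiting callable are
    illustrative placeholders, not real CTA interfaces."""
    last_check = 0.0
    while queue.has_jobs():
        now = time.monotonic()
        if now - last_check >= check_interval_s:
            last_check = now
            # Poll the scheduling information at a low rate.
            if higher_priority_waiting():
                break                        # pre-empt after the current file
        drive.transfer(queue.next_job())     # archive or retrieve one file
    drive.unmount()                          # free the drive for the next mount
\end{verbatim}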
\subsection{CTA to EOS communication and other operations}
The scheduler is responsible for reporting the steps in the lifecycle of each file.
It coordinates the updating of the file catalog, and EOS.
The scheduler is responsible for reporting the steps in the life cycle of each
file. It coordinates the updating of the file catalogue and of EOS.
Finally, the scheduler component handles queue listing and other housekeeping
tasks initiated by operators through the front end via a command-line tool.
@@ -347,7 +348,7 @@ a drop in replacement for a CASTOR storage element.
Once experiments have replaced their CASTOR instance with EOS and CTA, they
will in fact have two EOS instances: the original EOS instance for storing the
files they work on and modify, and the new EOS instance acting as a staging
area for the CTA tape backend. Figure \ref{consolidated} simply shows that
area for the CTA tape backend. Figure \ref{consolidated} shows that, at the
choice of an experiment and the IT operations teams, the two EOS instances
could instead be merged into one.
@@ -357,7 +358,7 @@ instances could be merged into one.
\caption{\label{consolidated}Consolidate EOS if desired}
\end{figure}
CASTOR currently stores the custodial copy of all LHC physic data on tape.
CASTOR currently stores the custodial copy of all LHC physics data on tape.
Migrating data from CASTOR to EOS and CTA will be very efficient because
CASTOR and CTA share the same tape format. This means only the metadata
needs to be copied from CASTOR to EOS and CTA; no files need to be copied
@@ -365,16 +366,15 @@ between CASTOR and CTA tapes. CTA will simply take ownership of CASTOR
tapes as they are migrated from CASTOR to CTA.
The milestones for the CTA project are as follows. In the second quarter of
2017 an internal release of CTA will be made that does not have the ability to
repack tape media. This release is intended for redundant use cases within the
IT-ST group such as additional backups of filer data (AFS/NFS) and additional
copies of data from the Large Electron Position collider (LEP). In the second
quarter of 2018 the first production release of CTA will be made. This release
will have the ability to repack tape media and is intended to migrate small
virtual organizations such as non-LHC experiments from CASTOR to EOS and CTA.
Finally in the fourth quarter of 2018, the second production release of CTA will
be made. This release will be used to migrate large virtual organizations such
as LHC experiments from CASTOR to EOS and CTA.
2017, an internal release of CTA will be made that is intended for redundant use
cases within the IT-ST group, such as additional backups of filer data (AFS/NFS)
and additional copies of data from the Large Electron Positron collider (LEP).
In the second quarter of 2018, the first production release of CTA will be made.
This release will have the ability to repack tape media and is intended to
migrate small virtual organizations, such as non-LHC experiments, from CASTOR to
EOS and CTA. Finally, in the fourth quarter of 2018, the second production
release of CTA will be made. This release will be used to migrate large virtual
organizations, such as LHC experiments, from CASTOR to EOS and CTA.
\section{Conclusion} \label{conclusion}
@@ -386,7 +386,7 @@ on providing efficient tape backend storage.
CTA will introduce pre-emptive drive scheduling, which will automatically
schedule the background tasks of tape media repacking and data verification.
This automatic scheduling will use the tape drives at full speed all of the time
and therefore enable CTA to cope with the 100 petabytes per year data rate of
and therefore enable CTA to cope with the 150 petabytes per year data rate of
LHC physics run 3.
CASTOR and CTA share the same tape format. This means migrating data from
@@ -395,7 +395,7 @@ ownership of CASTOR tapes.
In addition to the hierarchical namespace of EOS, CTA will have its own flat
catalogue of every file archived to tape. This redundancy in metadata will
provide an additional recovery tool in the case of disaster.
provide an additional recovery tool in case of disaster.
The architecture of CTA has benefited from a fresh start. Without the need to
preserve the internal interfaces of the CASTOR networked components, CTA has
@@ -410,7 +410,7 @@ LHC physics runs 2 and 3.
\begin{thebibliography}{9}
\bibitem{CASTOR} CASTOR homepage {\it http://cern.ch/castor}
\bibitem{experiences} Cancio G, Bahyl V, Kruse D F, Leduc J, Cano E and Murray S Cano E 2015 Experiences and challenges running CERN's high capacity tape archive \textit{J. Phys.: Conf. Series} \textbf{664} 042006
\bibitem{experiences} Cancio G, Bahyl V, Kruse D F, Leduc J, Cano E and Murray S 2015 Experiences and challenges running CERN's high capacity tape archive \textit{J. Phys.: Conf. Series} \textbf{664} 042006
\bibitem{EOS} EOS homepage {\it http://cern.ch/eos}
\bibitem{new} Cano E, Murray S, Kruse D F, Kotlyar V and C\^{o}me D 2015 The new CERN tape software - getting ready for total performance \textit{J. Phys.: Conf. Series} \textbf{664} 042007
\bibitem{repack} Kruse D F 2013 The repack challenge \textit{J. Phys.: Conf. Series} \textbf{513} 042028
......