Commit 6cc9fa8c authored by Michael Davis

[CHEP] Adds to Introduction

parent 49b3fffb
......@@ -19,3 +19,29 @@
note = {{ADC Technical Coordination Board Meeting}}
}
@inproceedings{castor2007,
  author    = {Lo Presti, Giuseppe and Barring, Olof and Earl, Alasdair and Garcia Rioja, Rosa Maria and Ponce, Sebastien and Taurelli, Giulia and Waldron, Dennis and Dos Santos, Miguel Coelho},
  title     = {{CASTOR:} {A} Distributed Storage Resource Facility for High Performance Data Processing at {CERN}},
  booktitle = {24th {IEEE} Conference on Mass Storage Systems and Technologies ({MSST} 2007), 24--27 September 2007, San Diego, California, {USA}},
  pages     = {275--280},
  year      = {2007},
  crossref  = {DBLP:conf/mss/2007},
  doi       = {10.1109/MSST.2007.7},
  url       = {http://doi.ieeecomputersociety.org/10.1109/MSST.2007.7}
}
@proceedings{DBLP:conf/mss/2007,
  title     = {24th {IEEE} Conference on Mass Storage Systems and Technologies ({MSST} 2007), 24--27 September 2007, San Diego, California, {USA}},
  publisher = {{IEEE} Computer Society},
  year      = {2007},
  isbn      = {0-7695-3025-7},
  url       = {http://ieeexplore.ieee.org/xpl/mostRecentIssue.jsp?punumber=4367953}
}
......@@ -52,13 +52,111 @@ will interface with EOS and CTA.
\section{Introduction}
\label{introduction}
The CERN Tape Archive (CTA) is the new storage system for the custodial copy of CERN's physics data. It has
been developed by the IT Storage Group (IT-ST) as the tape back-end to the EOS~\cite{citation_required} disk
storage system, and is an evolution of CASTOR~\cite{castor2007}. CTA is planned to enter production
for Run 3 of the Large Hadron Collider (LHC); data from the LHC experiments will be migrated from CASTOR
to CTA during the second Long Shutdown (LS2), which starts in January 2019.
The architecture of CTA was described in~\cite{chep2016}.
% From previous CHEP paper:
%
% Abstract
%The IT Storage group at CERN develops the software responsible for archiving to
%tape the custodial copy of the physics data generated by the LHC experiments. Physics run 3
%will start in 2021 and will introduce two major challenges for which the tape archive software
%must be evolved. Firstly the software will need to make more efficient use of tape drives in
%order to sustain the predicted data rate of 150 petabytes per year as opposed to the current
%50 petabytes per year. Secondly the software will need to be seamlessly integrated with EOS,
%which has become the de facto disk storage system provided by the IT Storage group for physics data.
%The tape storage software for LHC physics run 3 is code named CTA (the CERN Tape
%Archive). This paper describes how CTA will introduce a pre-emptive drive scheduler to use
%tape drives more efficiently, will encapsulate all tape software into a single module that will
%sit behind one or more EOS systems, and will be simpler by dropping support for obsolete
%backwards compatibility.
%
% In 2016, LHC physics run 2 archived over 50 petabytes of physics data to tape using CASTOR.
% The data rates of LHC physics run 2 will increase up to 100 petabytes per year during 2017 and
% 2018.
% LHC physics run 3 will start in 2021 and is predicted to store over 150 petabytes per year.
% Run 3 will use EOS and CTA as opposed to CASTOR to archive data to tape. The CTA project
% will increase tape drive utilization in order to deal with the predicted 150 petabytes of physics data per year. CTA will accomplish this by introducing a pre-emptive scheduler that will keep
% tape drives running at full speed [4] all of the time. Physics data on tape are not only accessed
% by physicists. These data are read back for data verification and read and re-written for tape
% media repack[5] campaigns. A repack campaign reads data from older lower capacity tapes
% and writes them to newer higher capacity tapes. This enables the CERN computing centre to
% store more data whilst occupying the same amount of floor space. The pre-emptive scheduler
% will improve tape drive efficiency by automatically filling otherwise idle tape drive time with
% background jobs for data verification and tape media repacking.
%\noindent The High Energy Physics (HEP) experiments at CERN generate a deluge of data which must be
%efficiently archived for later retrieval and analysis~\cite{datacentre}. The custodial copy of the
%data is stored on magnetic tape in CERN's Tier-0 Data Centre, near Geneva. The evolution of the
%tape archive over time is shown in Fig.~\ref{physics_storage}. In June 2017, the total volume of
%archived data passed 200 Petabytes~\cite{cern_200pb}.
%
%
%The archival of such vast quantities of data presents many computing challenges. The CERN data storage
%system must ensure the long-term bit-level integrity of the data, and the archival and retrieval systems
%must scale with this massive growth in data volume~\cite{cern_challenges}.
%
%The data stored on tape at CERN is an \textit{active archive}. This is a very different use case to
%backup and restore. The key characteristics of an active archive are that data are recorded over a
%long period of time to be analyzed later, and any data item can be required to be accessed at any
%time. Other applications of Petabyte-scale data archives include weather
%forecasting~\cite{ecmwf_analysis,ecmwf_simulation}, climate research, life sciences and even digital
%film archives~\cite{lost_picture_show}.
%
%At CERN, data is archived during operational \textit{Runs} of the LHC, which last for 3--5 years,
%interleaved with maintenance periods where little to no archival takes place (\textit{Long Shutdowns}),
%lasting for 1--2 years. The tape archive holds the custodial copy of the physics data, which is stored
%indefinitely and never deleted. Some of the archive files have been migrated across different software
%and media generations for over 30 years.
%
%Conversely, data can be retrieved for analysis at any time by physicists at several hundred Tier-1 and
%Tier-2 institutes across CERN's 22 member states, connected to the Tier-0 Data Centre by the Worldwide
%LHC Computing Grid (WLCG).
%
%The demands on the tape archive will increase further during LHC Run 3, as there will no longer be
%sufficient computing power to perform online reconstruction of physics events from the raw data. The
%data will be stored in the archive and retrieved to perform the event reconstruction as and when
%computing power is available. This will substantially increase the number and amount of retrieval
%requests from tape.
%
%There are many reasons why tape remains a popular medium for archival systems. Tape storage is much
%cheaper than disk, both in terms of media cost and operational costs. Tapes do not require any
%electricity when not mounted, which reduces the problem of heat dissipation in data centres. Tape media
%has huge capacity and will continue to match growth in disk capacity for at least the next decade;
%manufacturers have demonstrated the possibility to store over 300 Tb on a single cartridge~\cite{ibm_330tb}.
%Tape media is very durable; the media has a lifetime of around 30 years. And sequential read times
%are faster than disk access, up to 360 Mb/s.
%
%The main drawback of tape systems is the high time-to-first-byte latency. Tapes are stored in a
%cartridge slot in a library. When data on tape is requested, the library sends a robotic picker
%to move the cartridge to an available tape drive; the cartridge must be loaded and the tape threaded
%and wound to the read position before any data can be read. This process can take several minutes.
%When the read is completed, the process must be repeated in reverse to unmount the cartridge and
%return it to its slot.
Over the next decade, the volume of data will continue to increase super-linearly, due to improvements
in the luminosity and availability of the LHC and to upgrades of the detectors and data acquisition
systems. Data archival is expected to reach 100~PB/year during LHC Run 3 (starting in 2021), increasing
to 400~PB/year during Run 4. The integrated total of archived data is expected to exceed one exabyte
around 2023.
Fig.~\ref{fig:T0_predicted_storage} shows the predicted tape archival storage needs of the CERN Tier-0 between 2018 and 2031, covering LS2, Run 3, LS3 and Run 4.
\begin{figure}[t]
\centering
%\includegraphics[width=25pc]{images/Storage}\hspace{2pc}%
\includegraphics[width=27pc]{images/Storage}%
\begin{minipage}[b]{10pc}
\caption{Predicted Tape Archival Storage Needs for the CERN Tier-0}
\label{fig:T0_predicted_storage}
\end{minipage}
\end{figure}
\section{Use Cases}
\subsection{Changing Use Cases for Archival Storage}
\subsubsection{Scaling up for Run 3 and HL-LHC}
\subsubsection{Data for online analysis stored on tape (``Data Carousel'')}
Source:~\cite{xin_zhao_tape_usage}
\begin{figure}[t]
......@@ -68,6 +166,16 @@ Source:~\cite{xin_zhao_tape_usage}
\includegraphics[width=\textwidth]{images/DataCarouselChart.png}
\end{figure}
This paper is organised as follows\ldots
% The next two sections are intended to give a more concrete idea of what is CTA. Section 2
% describes the architecture of CTA and lists the steps taken to archive a file to tape and then to
% retrieve it back to disk. Section 3 describes the concepts that an operator needs to understand
% in order to configure and work with CTA. Section 4 describes the pre-emptive drive scheduler
% of CTA and how it will enable the IT storage group of CERN to handle the 150 petabytes per
% year transfer rate of LHC physics run 3. Section 5 describes how and when LHC experiments
% and other users of CASTOR will migrate to EOS and CTA. Finally, section 6 draws the paper
% to a close with its conclusions.
\section{CASTOR to CTA}
\subsection{CASTOR Architecture}
......@@ -279,6 +387,27 @@ Performance
\section{Conclusions}
\label{conclusions}
%From CHEP 2016 paper:
%CTA will avoid functional duplication with EOS through a clean, consolidated separation
%between disk and tape. EOS will focus on providing high-performance disk storage, data transfer
%protocols and meta-data operations. CTA will focus on providing efficient tape backend storage.
%CTA will introduce pre-emptive drive scheduling which will automatically schedule the
%background tasks of tape media repacking and data verification. This automatic scheduling
%will use the tape drives at full speed all of the time and therefore enable CTA to cope with the
%150 petabytes per year data rate of LHC physics run 3.
%CASTOR and CTA share the same tape format. This means migrating data from CASTOR
%to EOS and CTA only requires the metadata to be copied and CTA taking ownership of CASTOR
%tapes.
%In addition to the hierarchical namespace of EOS, CTA will have its own flat catalogue of
%every file archived to tape. This redundancy in metadata will provide an additional recovery
%tool in the event of a disaster.
%The architecture of CTA has benefited from a fresh start. Without the need to preserve the
%internal interfaces of the CASTOR networked components, CTA has been able to reduce the
%number of networked components in the tape storage system.
%LHC experiments can expect to start migrating from CASTOR to EOS and CTA at the
%beginning of 2019 which is the beginning of the long shut down period between LHC physics
%runs 2 and 3.
\subsection{CERN Tape Archive: Summary}
Use cases for tape archival are changing:
\begin{itemize}
......@@ -297,6 +426,121 @@ Performance
\item LS2: Migration from CASTOR to CTA
\end{itemize}
%%
%% Some points from my presentation at EOS workshop in case I want to use any of that
%%
\clearpage
Additional notes from EOS workshop:
\section{Deployment}
\subsection{EOS+CTA Deployment}
\subsubsection{EOS Instances}
\begin{itemize}
\item One EOS instance per main experiment for disk storage and user jobs
\begin{itemize}
\item EOSALICE, EOSATLAS, EOSCMS, EOSLHCb, EOSPUBLIC\\[2ex]
\end{itemize}
\item One EOS instance per main experiment for tape archiving (an example transfer command is sketched after this list)
\begin{itemize}
\item EOSCTAALICE, EOSCTAATLAS, \ldots~(replaces CASTORALICE, CASTORATLAS, \ldots)
\item Optimised for staging files between tape and disk
\item No automated synchronisation between EOS\ldots~and EOSCTA\ldots.
\item Access by privileged experiment accounts. No user jobs.
\end{itemize}
\end{itemize}
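As a rough illustration of how an experiment would archive a file through one of these tape-oriented EOS instances, the sketch below uses the standard XRootD copy client; the hostname and namespace path are placeholders assumed for the example, not the production endpoints.
\begin{verbatim}
# Copy a file into the experiment's EOSCTA instance. Directories on
# this instance are configured so that newly written files are queued
# for archival to tape by CTA.
# (<eoscta-host> and the path are illustrative placeholders)
xrdcp myfile.root \
  root://<eoscta-host>//eos/ctaatlas/archive/myfile.root
\end{verbatim}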
\subsubsection{CTA Instance}
\begin{itemize}
\item One central instance of CTA
\item Front-end nodes
\begin{itemize}
\item handle commands and metadata operations from EOS and tape operators
\end{itemize}
\item Tape Servers
\begin{itemize}
\item interface between tape drives and libraries and EOS disk servers
\end{itemize}
\item Metadata Catalogue
\begin{itemize}
\item maps files to tape cartridges
\end{itemize}
\item Distributed ObjectStore
\begin{itemize}
\item persistent store for status and queue information
\end{itemize}
\end{itemize}
\subsection{FTS Integration}
\subsubsection{Implementation of Staging Functionality}
\begin{itemize}
\item FTS $\leftrightarrow$ EOS $\leftrightarrow$ CTA
\item No HSM/near-line tape storage
\item EOS--CTA extensions required to integrate current FTS functionality:
\end{itemize}
\begin{description}
\item[Bring Online:] Request that a list of files be staged from CTA to EOSCTA\ldots~(see the example commands after this list)
% Can be implemented with "xrdfs prepare"
\item[Status Of Get Request:] Query the status (Queued\slash Finished\slash Failed) of previously requested files
% “eos ls -y <file>” returns the current disk/tape residency status (e.g. “d0::t1”) but
% does not have knowledge of the status of issued staging requests against that file. In
% addition, the disk/tape residency status is likely EOS specific and not known to
% XROOT. It is unlikely that the core XROOT protocol will be able to handle status
% information thus a EOS/CTA specific call will need to be provided here. FTS may
% issue very large queries – in blocks of O(10K) files – in intervals of only few minutes
% so therefore the queueing status needs some optimization such as for example being
% cached in the EOS namespace.
\item[Abort Request:] Cancel a previous \textbf{Bring Online} request
% Not implemented in standard XRoot protocol, needs an EOS-CTA specific call.
% SrmAbortRequest is heavily used by some experiments that launch concurrent prestaging jobs
% for the same files on multiple sites, then abort all requests but the first completed one.
% An effective staging cancellation needs to be provided within CTA as otherwise the EOS
% staging space could quickly overflow.
\end{description}
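The commented notes above point at the building blocks that already exist in XRootD and EOS. As a rough sketch of how the first two operations could look from the command line (hostname and path are placeholders assumed for the example; the EOS--CTA specific extensions are not shown, as they do not exist yet):
\begin{verbatim}
# Bring Online: request staging of a file from tape to disk
# ("-s" asks XRootD to stage the file if it is not online;
#  <eoscta-host> and the path are illustrative placeholders)
xrdfs <eoscta-host> prepare -s /eos/ctaatlas/archive/file1.root

# Disk/tape residency of the file as reported by EOS
# (a status such as "d0::t1" means no disk replica, one tape copy)
eos ls -y /eos/ctaatlas/archive/file1.root
\end{verbatim}
As noted in the comments, neither the status of a queued staging request nor the cancellation of a \textbf{Bring Online} request has a standard XRootD equivalent today, so both are expected to need EOS--CTA specific calls.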
\section{Timeline}
\subsection{Timeline}
\subsubsection{2018}
\paragraph{1Q 2018 (Now)}
\begin{itemize}
\item EOS--CTA interface (see next presentation)
\item Scalable ObjectStore % 1 million objects in-flight in a single queue
\end{itemize}
\paragraph{2Q 2018}
\begin{itemize}
\item Repack
\item Garbage Collection
\end{itemize}
\paragraph{3Q 2018}
\begin{itemize}
\item Ready with Pre-production ``Field Test'' experiment instances
\item CASTOR migration tools/strategies
\item Work with user community
\end{itemize}
\paragraph{4Q 2018}
\begin{itemize}
\item Stress testing with experiments
\end{itemize}
\subsubsection{2019 and Beyond}
\begin{itemize}
\item Deployment
\item Replace CASTOR
\item Non-Oracle backend for tape catalogue
% Non-commercial DB plugin
%{\normalsize
%\begin{itemize}
%\item Postgres
%\item MySQL
%\item \ldots
%\end{itemize}
%}
\end{itemize}
%
% ---- Bibliography ----
%
......
TARGET = CHEP2018_CTA.pdf
BIBLIO = CHEP2018_CTA.bib
IMAGES = $(wildcard images/*)
TEXINPUTS = .:../JPCS:
BSTINPUTS = .:../JPCS:
......@@ -7,7 +8,7 @@ LATEX_TMP = *.aux *.bbl *.blg *.log *.dvi *.bak *.lof *.log *.lol *.lot *.out *.
all: $(TARGET)
%.pdf: %.tex %.bbl
%.pdf: %.tex %.bbl $(IMAGES)
pdflatex $<
refs:
......