Commit 290187f8 authored by Eric Cano
Created first version of presentation

% ADC meetings: Data carousel (discussion)
% Thursday, 5 December, 40/S2-D01 - Salle Dirac
% 09:00 → 10:30
%Conveners: Alexei Klimentov (Brookhaven National Laboratory (US)), David Cameron (University of Oslo (NO)), David Michael South (Deutsches Elektronen-Synchrotron (DE)), Johannes Elmsheuser (Brookhaven National Laboratory (US)), Mario Lassnig (CERN), Xin Zhao (Brookhaven National Laboratory (US))
% 09:00
% Materials: Tier-1s presentations (October-November splinter meetings), FTS, Rucio, ProdSys2 (1m)
% Meeting goals (10m)
%can we agree on the way forward to cut the following two gaps in the meeting
%Gap 1 : between the throughput out of the tape system itself and the throughput delivered to users (rucio in this case)
%--This is to tackle the issues along the staging chain, dCache, FTS, Rucio, PS2, etc, to minimize performance penalties to the original tape throughput
%Gap 2 : between the nominal tape throughput and the current throughput out of tape
%--This is about “smart writing”, by bigger files and/or better organizing files among tapes, to reach higher tape reading efficiency
%Speakers: Alexei Klimentov (Brookhaven National Laboratory (US)), Mario Lassnig (CERN), Xin Zhao (Brookhaven National Laboratory (US))
% Summary and highlights of Oct-Nov discussions with Tier 1s and CERN (20m)
%Speakers: Alexei Klimentov (Brookhaven National Laboratory (US)), Mario Lassnig (CERN), Xin Zhao (Brookhaven National Laboratory (US))
% ATLAS Sites (topical discussion) (59m)
% Planning / notification / feedback /discussions
% What concerns sites have with tape system throughput (writing/reading) ?
% -- Active tape drives usage vs drives lifetime (data staging scenario)
% -- What are the bottlenecks at sites, any planned improvements ?
% -- What help do they need from ADC, dCache, FTS ? (e.g. sites want the staged files to be transferred out of disk buffer as soon as possible, avoid too many staging retries ...)
% Site staging profile
% -- Possible extension ?
% What can sites do w.r.t. “Smart writing” ?
% -- new features in tape system ?
% -- external users (eg. ADC/dCache) to create bigger files and/or better organize the order of files when writing to tape ?
% What are the special concerns for sites supporting multiple-VO ?
% -- Do we understand requirements and intended usage of tape resources of other VOs ?
% monitoring tools enough for sites? any suggestions on improving monitoring ?
%11:00 → 12:30
% ADC meetings: Data carousel (II), 40/S2-D01 - Salle Dirac
%Conveners: Alexei Klimentov (Brookhaven National Laboratory (US)), David Cameron (University of Oslo (NO)), David Michael South (Deutsches Elektronen-Synchrotron (DE)), Johannes Elmsheuser (Brookhaven National Laboratory (US)), Mario Lassnig (CERN), Xin Zhao (Brookhaven National Laboratory (US))
% 11:00
% dCache global view and development roadmap (25m)
% Speaker: Mr Tigran Mkrtchyan (DESY)
% 11:25
% Topical discussion (1h)
% 7 out of 10 T1s run dCache as tape frontend
% Can dCache provide common solution to improve scalability of dCache HSM interface, something like ENDIT does for TSM sites, that will benefit all dCache sites ?
% Can dCache group files by datasets in writing to tapes, which will benefit all dCache T1s ?
% other tape frontend (CTA, StoRM)
% -- plans ?
% alternative protocol for tape access ?
% -- SRM vs QoS APIs ?
% Based on CERN beamer theme by Jerome Belleman
% The optional `\author` command defines the author and is displayed in the slide produced by the `\titlepage` command.
\author{Eric Cano on behalf of the CTA team}
% The optional `\title` command defines the title and is displayed in the slide produced by the `\titlepage` command.
\title{CTA Smart writing}
% The optional `\subtitle` command will add a smaller title below the main one, and will not be displayed in any of the slides' footer.
\subtitle{Conceptual proposition}
% The optional `\date` command displays a custom free-text date in the footer of all slides. If omitted, today's date is used.
\date{ADC meetings: Data carousel, 5 Dec 2019}
% The optional `\titlepage` command will create a slide with the presentation's title, subtitle and author.
% The optional `\tableofcontents` command will automatically create a table of contents based on the sections.
\section{The problem}
\begin{frame}{What are we trying to solve?}
\begin{itemize}
\item Datasets are always read whole
\item Tape systems are not dataset-aware when writing
\item Files scattered over tapes $\Rightarrow$ more read mounts
\item Files interleaved with others within a tape
$\Rightarrow$ drive spends time positioning on reads
\end{itemize}
\end{frame}
\section{File tagging by dataset name}
\begin{frame}{File tagging by dataset name}
\begin{itemize}
\item Per-file property
\item Type = string
\item Can we define a length cap?
\item Rely only on comparison for equality (no ordering, ranking, ...)
\end{itemize}
\end{frame}
\section{User interface}
\begin{frame}{File tagging}
\begin{itemize}
\item On write, per-file tagging
\begin{itemize}
\item Has to go through Rucio/FTS/EOS/CTA
\end{itemize}
\item Back-tagging of existing files (several scenarios)
\begin{itemize}
\item Executed as a one-off; we could have a rule-based update script
\item More general: provide a get/set operation per file and leave it to the user
\end{itemize}
\end{itemize}
\end{frame}
\section{Tape system optimization}
\begin{frame}{Tape system optimizations}
\begin{itemize}
\item Write optimization
\begin{itemize}
\item Divide the archive queue into per-dataset sub-queues
\item Make write mounts stick to a dataset (until it is drained)
\item $\Rightarrow$ Contiguous files, zero positioning on read
\item Possibly cap the per-dataset parallel writes
\item $\Rightarrow$ Soft-limits the spreading over tapes
\end{itemize}
\item Repack/defrag
\begin{itemize}
\item Repack can then write in an optimized manner (defrag)
\item Repack input (which files to read) could be dataset-driven instead of tape-driven
\item If the extra read mount cost is bearable
\item Will have to take tape-level constraints into account as well (will it be worth the complexity?)
\end{itemize}
\end{itemize}
\end{frame}
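% Illustrative frame only: a sketch of the sub-queue idea above, NOT CTA code.
% All names (ArchiveQueue, push, pop_for_mount) are hypothetical.
\begin{frame}[fragile]{Write optimization: a minimal sketch}
\footnotesize
One possible reading of per-dataset sub-queues with sticky write mounts.
This is a sketch under assumptions, not the CTA implementation; every name
below is hypothetical. Tags are only compared for equality.
\scriptsize
\begin{verbatim}
from collections import OrderedDict, deque

class ArchiveQueue:
    def __init__(self):
        # dataset tag -> FIFO of files, oldest dataset first
        self.sub = OrderedDict()

    def push(self, tag, f):
        self.sub.setdefault(tag, deque()).append(f)

    def pop_for_mount(self, current):
        # Stick to the mount's dataset until it is drained.
        if current not in self.sub:
            if not self.sub:
                return None, None      # nothing left to write
            current = next(iter(self.sub))
        f = self.sub[current].popleft()
        if not self.sub[current]:
            del self.sub[current]      # dataset drained
        return current, f
\end{verbatim}
\end{frame}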
\section{Possible bonus features}
\begin{frame}{Possible bonus features}
\begin{itemize}
\item Multi-level tagging, allowing a better choice of the \emph{next} dataset in a mount
\item Retrieve by dataset (implies big changes in the whole data transfer chain, and possibly hairy error handling)
\end{itemize}
\end{frame}
% a. Tagging of new files, as they are written
% b. Back tagging of pre-existing files
% … and of course user should be able to query this tag.
% 2) Queueing by tag in archive queue. This implies some changes in data structure (having sub-queue in archive queues). Once this is done, it is trivial to make the archive session sticky and make sure it will not switch to a second dataset.
% A bonus feature could be to add a second/several layer(s) of tagging (dataset of dataset) which would allow orienting the choice of the next dataset in a mount after finishing with one, but this is really secondary optimisation (but not expensive either while we’re at it).
% Once the old files are tagged, we could repack by dataset(s) to defragment the existing data (instead of tape oriented repack).
% Jerome Belleman <>, March 2015
% Slightly smaller
% Slightly smaller
% Slightly smaller
% Slightly smaller
\setbeamerfont*{section title}{parent=title}
% Slightly larger
% Slightly larger
\setbeamerfont{title in head/foot}{size*={7}{8pt}}
% About the same size
% About the same size
% About the same size
% Slightly smaller
\setbeamerfont{itemize/enumerate subbody}{size*={12}{14pt}}
% Slightly smaller
\setbeamerfont{itemize/enumerate subsubbody}{size*={11}{13.6pt}}
% Jerome Belleman <>, March 2015
\defbeamertemplate*{title page}{cern}
\defbeamertemplate*{section page}{cern}{
\begin{beamercolorbox}[leftskip=\titlelf]{section title}
\usebeamerfont{section title}\insertsection
\defbeamertemplate*{itemize item}{cern}{%
\defbeamertemplate*{itemize subitem}{cern}{%
\defbeamertemplate*{itemize subsubitem}{cern}{%
% Jerome Belleman <>, March 2015
% Lengths: trust Inkscape to measure them, PowerPoint is far too approximate
\setlength{\footlinedp}{7.127mm * \ratio{128mm}{254mm}}
\setlength{\footlineht}{19.454mm * \ratio{128mm}{254mm} - \footlinedp}
\setlength{\footlinesk}{3.338mm * \ratio{128mm}{254mm}}
\setlength{\logowd}{14.046mm * \ratio{128mm}{254mm}}
\setlength{\logolw}{\footlinedp - 2.392mm * \ratio{128mm}{254mm}}
\usebeamerfont{title in head/foot}%
% Set depth=0pt to avoid body position from depending on whether title
% text has any depth or not.
\defbeamertemplate*{navigation symbols}{cern}{}
\defbeamertemplate*{frametitle continuation}{cern}{
\addtobeamertemplate{footline}{\hfill\usebeamertemplate***{navigation symbols}%
\hspace*{0.2cm}\par\vskip 10pt}{}
% Jerome Belleman <>, March 2015
% The size and position of graphical elements were rigorously computed/measured.
% That of text not so much as I kept the default Beamer font, so I just
% overlaid the official template to reach a result that's perceivably close.
\setbeamercolor{background canvas}{bg=cern}