Commit 32b852b9 authored by Eric Cano

Clarified repack-per-dataset complexity

Added file size upper bound in problem statement
Added conclusion
parent 42e0a1a0
\usepackage[binary-units=true, per-mode=symbol]{siunitx}
% ADC meetings: Data carousel | (discussion)
@@ -102,6 +103,13 @@
$\Rightarrow$ drive spends time positioning on reads
\item Making files bigger will impact tape performance
\item Tape drive typically faster than file system (\SI{360}{\mega\byte\per\second} today, up to \SI{1}{\giga\byte\per\second} in the roadmaps)
\item Tape server memory should hold several files to allow streaming them in parallel
\item Typical tape server memory size: \SI{60}{\giga\byte}
\item Upper bound for efficient file size: \SI{10}{\giga\byte}
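% Rough check of the figures above. Assumptions (not stated in the slides): files sit at the 10 GB bound,
% the whole 60 GB of server memory is available for buffering, and units are decimal.
\[
  \frac{\SI{60}{\giga\byte}}{\SI{10}{\giga\byte}} = 6 \text{ files in memory at once},
  \qquad
  \frac{\SI{10}{\giga\byte}}{\SI{360}{\mega\byte\per\second}} \approx \SI{28}{\second} \text{ per file at today's drive speed}
\]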
@@ -134,6 +142,7 @@
\section{Tape system optimization}
\begin{frame}{Tape system optimizations}
\adjustbox{minipage=1.18\textwidth, scale=0.85}{
\item Write optimization
@@ -153,10 +162,13 @@
\item Repack input (which files to read) could be dataset driven instead of tape driven
\item If extra read mount cost bearable
\item Will have to take into account tape level constraints as well (will it be worth the complexity?)
\item Will have to take into account tape level constraints as well
\item Make sure we empty old tapes and not re-repack a target tape
\item Will it be worth the complexity?
\section{Possible bonus features}
@@ -166,14 +178,15 @@
\item Retrieve by dataset (implies big changes in whole data transfer chain, and possibly hairy error handling)
% a. Tagging of new files, as they are written
% b. Back tagging of pre-existing files
% … and of course user should be able to query this tag.
% 2) Queueing by tag in the archive queue. This implies some changes in data structure (having sub-queues in archive queues). Once this is done, it is trivial to make the archive session sticky and make sure it will not switch to a second dataset.
% A bonus feature could be to add a second/several layer(s) of tagging (dataset of dataset) which would allow orienting the choice of the next dataset in a mount after finishing with one, but this is really a secondary optimisation (but not expensive either while we’re at it).
% Once the old files are tagged, we could repack by dataset(s) to defragment the existing data (instead of tape oriented repack).
\item Changes from outside the tape system (bigger files) will push us to a non-optimal working point
\item With proper hints tape system can optimize read access, knowing that access is done by full dataset