Commit 15bd1590 authored by Michael Davis

[eos-cta] Updates Questions and Issues in EOS-CTA doc

parent f61009c6
@@ -13,36 +13,35 @@
in front. This should be assessed in order to have full system performance targets
achieved and the latency experienced by the user (a synchronous close on write will increase latency, we should
make sure the result is adequate).
\section{Synchronous and Asynchronous Calls}
The EOS workflow engine will be modified to support synchronous actions as well as the existing asynchronous ones. A
synchronous action will enable an EOS client to call EOS which in turn will call CTA, which will return a ``return
value'' reply that is synchronously relayed back to the EOS client.
\vspace{0.5ex}
\noindent\textbf{Example:} An EOS client opens a file for creation that should be eventually archived to tape. EOS
synchronously calls CTA to determine whether or not the Storage Class of the file is known and that it has a destination
tape pool. If these two conditions are not met then the EOS client will get an immediate synchronous
reply saying the file cannot be created because it cannot be archived to tape.
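A minimal sketch of this synchronous check (the \texttt{Catalogue} interface below is illustrative, not the real CTA
API):
\begin{lstlisting}
#include <string>

// Assumed catalogue interface -- illustrative only.
struct Catalogue {
  bool storageClassExists(const std::string &sc) const;
  bool storageClassHasTapePool(const std::string &sc) const;
};

struct CreateReply {
  bool ok;          // false => EOS refuses the open
  std::string msg;  // relayed synchronously to the EOS client
};

CreateReply checkCreate(const Catalogue &cat, const std::string &sc) {
  if (!cat.storageClassExists(sc))
    return {false, "Cannot create file: unknown storage class " + sc};
  if (!cat.storageClassHasTapePool(sc))
    return {false, "Cannot create file: no route to tape for " + sc};
  return {true, ""};
}
\end{lstlisting}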
\section{Data Integrity}
\textbf{After-the-fact check on archive from EOS: }
EOS will schedule a second workflow job when an archive is triggered. This will check the archive status and re-trigger
it if needed at a later time.
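A sketch of what such a follow-up job might look like; the workflow-engine interface and the retry delay are invented
for illustration:
\begin{lstlisting}
#include <chrono>

// Assumed interfaces -- illustrative only.
struct File { bool hasTapeReplica() const; };
struct WorkflowEngine {
  void triggerArchive(File &f);
  void scheduleCheck(File &f, std::chrono::hours delay);
};

// Second workflow job, scheduled when the archive is first triggered.
void afterTheFactCheck(File &f, WorkflowEngine &wfe) {
  if (!f.hasTapeReplica()) {
    wfe.triggerArchive(f);                        // re-trigger the archive
    wfe.scheduleCheck(f, std::chrono::hours(4));  // and check again later
  }
}
\end{lstlisting}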
\section{Security}
CTA will filter EOS requests per instance and forbid crosstalk between instances. Each instance should be authenticated
with a unique simple shared secret (SSS) key.
As CTA will have access to every EOS instance, it is desirable to apply the least privilege principle to CTA
and to limit the actions allowed to CTA's SSS key to the strict minimum. This will be achieved by limiting CTA's
credentials to the protocol interface.
\section{Operational constraints}
Before a repack campaign, users should be encouraged to purge any unnecessary data from EOS. After this operation, a
reconciliation between the user catalogue and EOS (and then between EOS and CTA) should be done to ensure no unexpected
data will get deleted during the repack operation.
@@ -46,7 +46,7 @@
In addition, in order to make sure no changes were lost, implicit operations are
\end{itemize}
\end{description}
Chapter~\ref{operations} describes the use cases in more detail. Chapter~\ref{protocol} defines the protocol for the
EOS-CTA API. Chapter~\ref{constraints} describes the performance and operational constraints on the system.
Appendices~\ref{questions_and_issues},~\ref{reconciliation_strategy} and~\ref{authorization_rules} detail issues
which need to be agreed on or resolved.
@@ -43,8 +43,3 @@
upgrades have been rolled out to all experiments.
EOS will keep track of the tape-related mode (whether the file should have a copy on tape) and of the tape replica status.
@@ -9,13 +9,6 @@
% - Report back during the tape developments meeting of Wednesday the 22nd November. The report should highlight the
% current weaknesses of the “archive protocol” and explain how these weaknesses will be tackled by the SSI based EOS
% workflow engine and CTA.
\section{What is to be shared between the EOS and CTA projects in order to implement the EOS/CTA interface?}
@@ -54,7 +47,8 @@
reduces maintenance effort as bugs only need to be found and fixed once.
The generic headers can be reused by other projects which have nothing to do with CTA.
The alternative to using generic code would be to document all the design decisions and leave it to the EOS team to
implement them.
\section{What is the mechanism by which the CTA frontend will determine the individual instance names of the EOS
instances sending it archive, retrieve and delete requests?}
@@ -62,62 +56,163 @@
As CTA will have access to every EOS instance, we want to prevent crosstalk: one VO should not be able to access or
interfere with the files from another VO. The principle of least privilege should apply.
SSS keys are used for both identification and authentication. There should be a unique SSS key for each VO. Luca tells
us the EOS team are using the following SSS keys:
\begin{lstlisting}
Number Len Date/Time Created Expires Keyname User & Group
------ --- --------- ------- -------- ------- ------------
2 32 02/10/12 17:34:12 -------- eosalice daemon daemon
------ --- --------- ------- -------- -------
2 32 12/10/10 15:22:01 -------- eoscms daemon daemon
------ --- --------- ------- -------- -------
1 32 06/25/12 15:42:16 -------- eoslhcb daemon daemon
------ --- --------- ------- -------- -------
1 32 02/14/13 16:20:06 -------- eospublic daemon daemon
------ --- --------- ------- -------- -------
3 32 10/06/14 12:04:59 -------- eosuser daemon daemon
\end{lstlisting}
There are two problems we are aware of:
\begin{enumerate}
\item There is an XRootD bug whereby keys created within the same second are treated as different versions of the
same key (see issue \href{https://github.com/xrootd/xrootd/issues/592}{\#592}). The workaround is to leave at least
a two-second delay between generating each key, so this is not really an issue in production.
\item XRootD allows us to detect the user name of the key but not the key name, and currently all key user names
are set to the same value (daemon). We need to check with Luca/Andreas whether the key user name can be changed in
production, or find another way to discriminate between the different keys if not (see the sketch below).
\end{enumerate}
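Assuming a separate user ID per VO (point 2 above), the frontend could recover the instance name from the
authenticated client record, as in this sketch (only \texttt{XrdSecEntity} is real; the mapping policy is an
assumption):
\begin{lstlisting}
#include <string>
#include "XrdSec/XrdSecEntity.hh"  // XRootD authenticated-client record

// Sketch: derive the EOS instance name from the authenticated SSS entity,
// assuming each VO has its own key with a distinct user name. With per-VO
// user IDs, client.name would be e.g. "eosalice" rather than the shared
// "daemon" shown in the keytab above. Error handling omitted.
std::string instanceName(const XrdSecEntity &client) {
  return client.name != nullptr ? std::string(client.name) : std::string();
}
\end{lstlisting}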
\section{Will EOS instance names within the CTA catalogue be ``long'' or ``short'', in other words ``eosdev'' or just ``dev''?}
Following the point above, we should have a separate instance name for each VO (``eosatlas'', ``eoscms'', \textit{etc.}) and
a unique key for each instance name.
Personally I prefer to follow the long names that the EOS team are already using, but it's not a religious issue.
\section{Do we want the EOS namespace to store CTA archive IDs or not?}
\begin{description}
\item[If no:] we are allowing that the EOS file ID uniquely identifies the file. We must maintain a one-to-one mapping
from EOS ID to CTA archive ID on our side. This also implies that the file is immutable.
\item[If yes:] we must generate the CTA archive ID and return it to EOS. There must be a guarantee that EOS has attached
the archive ID to the file (probably as an xattr but that's up to the EOS team), i.e. \textbf{the EOS end-user must
never see an EOS file with a tape replica but without an archive ID}. EOS must provide the CTA archive ID as the
key to all requests.
\end{description}
Design notes from Steve:
\begin{alertbox}
One of the reasons I wanted an archive ID in the EOS namespace was that I wanted to have one primary key for the CTA
file catalogue and I wanted it to be the CTA archive ID. Therefore I expected that retrieve and delete requests issued
by EOS would use that key.\\
This ``primary key'' requirement is blown apart by the requirement of the CTA catalogue to
identify duplicate archive requests. The CTA archive ID represents an ``archive request'' and not an individual EOS file.
Today, 5 requests from EOS to archive the same EOS file will result in 5 unique CTA archive IDs. Making the CTA catalogue
detect 4 of these requests as duplicates means adding a ``second'' primary key composed of the EOS instance name and the EOS
file ID. It also adds the necessity to make sure that archive requests complete in the event of failure, so that retries
from EOS will eventually be accepted and not forever refused as duplicate requests. It goes without saying that dropping
the CTA archive ID from EOS also means using the EOS instance name and EOS file ID as primary key for retrieve and delete
requests from EOS.\\
The requirement for a ``second'' primary key may be inevitable for reasons other than (idempotent) archive, retrieve and
delete requests from EOS. CTA tape operators will want to drill down into the CTA catalogue for individual end user files
when data has been lost or something has ``gone wrong''. The question here is, should it be a ``primary key'' as in no
duplicate values or should it just be an index for efficient lookup?
\end{alertbox}
To summarize:
\begin{itemize}
\item We would like to have a unique key to identify files
\item In our current design, the CTA Archive ID uniquely identifies archive requests, not files
\end{itemize}
Possible implementation:
\begin{enumerate}
\item On OPENW: EOS calls CTA with the file metadata
\item CTA Frontend validates metadata (check Storage Class is valid)
\item CTA Frontend stores metadata
\item CTA Frontend allocates a unique archive ID and returns it to EOS
\item EOS attaches archive ID as an xattr
\item On CLOSEW\slash PREPARE\slash DELETE: EOS calls CTA with the archive ID
\end{enumerate}
If any step fails, EOS will receive an error code and will not have the archive ID, so it is unable to request any tape
operations. EOS can retry to obtain the archive ID or notify the operator\slash user of the failure.
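A sketch of steps (1)--(5) in the frontend, with an invented catalogue interface; here the ID is allocated before the
metadata is stored so that the stored row can carry it, which reorders steps (3) and (4) without changing the
guarantee:
\begin{lstlisting}
#include <cstdint>
#include <stdexcept>
#include <string>

// Assumed catalogue interface -- illustrative only.
struct Catalogue {
  bool checkStorageClass(const std::string &sc) const;      // step 2
  uint64_t newArchiveId();                                   // step 4
  void storeMetadata(uint64_t archiveId, const std::string &instance,
                     uint64_t eosFileId);                    // step 3
};

// Called by the frontend on OPENW. Throws on failure, so EOS receives an
// error code and never sees an archive ID for a file that cannot be archived.
uint64_t onOpenWrite(Catalogue &cat, const std::string &instance,
                     uint64_t eosFileId, const std::string &storageClass) {
  if (!cat.checkStorageClass(storageClass))
    throw std::runtime_error("invalid storage class: " + storageClass);
  const uint64_t archiveId = cat.newArchiveId();
  cat.storeMetadata(archiveId, instance, eosFileId);
  return archiveId;  // step 5: EOS attaches this as an xattr
}
\end{lstlisting}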
\subsection{Proposed Archive ID Solution}
% Steve identified one known weakness of the current “archive” protocol. If the “queue archive” request from the EOS
% workflow engine to the CTA frontend times out on the client (workflow engine) side then the file never gets an archive
% ID in the EOS namespace but is nonetheless marked as having a valid tape copy safely stored on tape. No error messages
% are reported to either the EOS end user or the CTA frontend. The end user incorrectly thinks that their file can be
% retrieved from EOS/CTA because they see a tape replica, but this is not true because of the lack of an archive ID in the
% EOS namespace which is required by a retrieve operation.
We could change the current design to allocate the CTA Archive ID to files rather than archive requests, by allocating
the ID on file write open instead of file write close. This results in the following workflow:
\begin{description}
\item[On OPENW:] EOS calls CTA with the file metadata
\begin{itemize}
\item CTA Frontend validates Storage Class
\item CTA Frontend determines destination Tape Pool
\item CTA Frontend sends metadata to the Catalogue
\item CTA Frontend generates a unique archive ID and returns it to EOS
\item EOS attaches archive ID as an xattr
\end{itemize}
\item[On CLOSEW:] EOS calls CTA with the file metadata
\begin{itemize}
\item The xattrs now include the validated Storage Class and the CTA archive ID
\item CTA Frontend queues the archive request and returns status code to EOS. File state is \textit{archive in progress}.
\item On successful write of the first tape copy, the tape server notifies EOS. File state is \textit{one copy on tape}.
This equates to \textit{m-bit set} in CASTOR.
\item On successful write of each tape copy, the tape server notifies EOS. The number of copies on tape can be
stored as an xattr.
\item On successful write of the last tape copy, the tape server notifies EOS. The number of copies on tape is
updated. File state is \textit{archived}.
\end{itemize}
\item[On PREPARE:] EOS calls CTA with the file metadata. The file is retrieved. In case of failure, the file will not
be retrieved and the reason will be logged.
\item[On DELETE:] EOS calls CTA with the file metadata. All copies of the file are marked for deletion. In case of
failure, the file will not be deleted and the reason will be logged.
\end{description}
In this scheme, if any part of the \textbf{OPENW} workflow fails, nothing is archived and no archive ID is attached to the
file xattrs, so we are in a guaranteed consistent state and EOS is informed that something went wrong. EOS will not be
able to execute the \textbf{CLOSEW} workflow without the archive ID.
If there is a failure after storing the metadata, then in the worst case we have some stored metadata and no archive file,
but we know we have 0 tape copies, so the CTA state is always consistent.
In the \textbf{CLOSEW} workflow, we cannot end up in a state where the file was successfully archived but EOS does not have
the archive ID. The only possible inconsistency between EOS and CTA is when we successfully archived at least one copy of the
file but did not successfully notify EOS. In this case, the operator should be notified and the EOS user can retry.
Is step (3) (sending the metadata to the Catalogue) guaranteed to complete quickly?
If a file is appended to or otherwise modified, this is a new file as far as CTA is concerned, and a new archive ID will
be generated\footnote{Archive files are immutable in any case. This applies only to the backup use case.}.
If EOS loses the archive ID for any reason, no further operations on the file are possible. Inconsistencies between the
EOS namespace and CTA namespace will be picked up during reconciliation.
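A sketch of the EOS-side bookkeeping on these notifications; the state names follow the text, while the xattr name and
the interface are invented:
\begin{lstlisting}
#include <string>

// Assumed EOS-side file handle -- illustrative only.
struct File {
  void setXattr(const std::string &key, const std::string &val);
  void setState(const std::string &state);
};

// State transitions driven by tape-server notifications. The file enters
// "archive in progress" at CLOSEW, before the first notification arrives.
void onTapeCopyWritten(File &f, unsigned written, unsigned expected) {
  f.setXattr("sys.cta.copies_on_tape", std::to_string(written));
  if (written == expected)
    f.setState("archived");          // last copy on tape
  else if (written == 1)
    f.setState("one copy on tape");  // equivalent of CASTOR's m-bit
}
\end{lstlisting}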
\begin{alertbox}
The \textbf{OPENW} workflow should be guaranteed to return quickly. There will be a small number of valid Storage Classes,
so this can be held in memory. Likewise the process to generate or obtain archive IDs should be fast and run in bounded
time.\\
For CASTOR, there is an additional constraint that the disk copy cannot be deleted until all tape copies have been
successfully written. The above scheme keeps track of the number of tape copies written and it will be up to the
EOS developers to ensure that this constraint is observed.
\end{alertbox}
\section{Should the CTA catalogue methods prepareForNewFile() and prepareToRetrieveFile() detect repeated requests from
EOS instances?}
EOS does not keep track of requests which have been issued. We have said that CTA should implement idempotent retrieve
queuing.
\begin{alertbox}
What are the consequences if we do not implement idempotent retrieve queuing?\\
What about archives and deletes?
\end{alertbox}
\subsection{If so how should the catalogue communicate such ``duplicate'' requests to the caller (Scheduler\slash cta-frontend plugin)?}
The CTA Frontend calls the Scheduler which calls the Catalogue.
There are several possible schemes for handling duplicate jobs:
\begin{enumerate}
\item If duplicates are rare, perhaps they don't need to be explicitly handled
\item When a retrieve job is submitted, the Scheduler could check in the Catalogue for duplicates (see the sketch
after this list)
\item When a retrieve job completes, the Tape Server could notify the Scheduler, which could then check for and
drop any duplicate jobs in its queue.
\end{enumerate}
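A sketch of scheme (2), with invented Scheduler and Catalogue interfaces:
\begin{lstlisting}
#include <cstdint>
#include <string>

// Assumed interfaces -- illustrative only.
struct Catalogue {
  bool retrieveInProgress(const std::string &instance, uint64_t fileId) const;
};
struct Scheduler {
  void queueRetrieve(const std::string &instance, uint64_t fileId);
};

// Refuse to queue a retrieve if one is already in flight for the same
// (instance, file ID) pair; the caller reports the duplicate back to EOS.
bool submitRetrieve(Scheduler &sched, const Catalogue &cat,
                    const std::string &instance, uint64_t fileId) {
  if (cat.retrieveInProgress(instance, fileId))
    return false;  // duplicate request
  sched.queueRetrieve(instance, fileId);
  return true;
}
\end{lstlisting}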
\begin{alertbox}
Reporting of retrieve status could set an \texttt{xattr}. Then the user would be able to monitor status which could
reduce duplicate requests.\\
Failed archivals or other CTA errors could also be logged as an \texttt{xattr}.
\end{alertbox}
\subsection{If the CTA catalogue keeps an index of ongoing archive and retrieve requests, what will be the new
protocol additions (EOS, cta-frontend and cta-taped) required to guarantee that ``never completed'' requests are removed
from the catalogue?}
@@ -140,20 +235,6 @@
Such a protocol addition could be something as simple as a timeout.
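A minimal sketch of such a timeout, with an invented catalogue interface and an arbitrary 48-hour cutoff:
\begin{lstlisting}
#include <chrono>
#include <cstdint>
#include <vector>

// Assumed catalogue interface -- illustrative only.
struct Catalogue {
  std::vector<uint64_t> requestsOlderThan(std::chrono::hours age) const;
  void deleteRequest(uint64_t requestId);
};

// Periodically drop "never completed" requests once they exceed a
// configurable age.
void dropStaleRequests(Catalogue &cat) {
  for (const uint64_t id : cat.requestsOlderThan(std::chrono::hours(48)))
    cat.deleteRequest(id);
}
\end{lstlisting}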
\newpage
\section{API}
Should the EOS-CTA interface be defined as a shared library (e.g. SSIv2 client plugin) or as a framework (boilerplate
code)?

So we just need to serialise, send, receive and parse?
\section{Communication Layer}
RPC channel between EOS and CTA: the current strategy is to use SSIv2. In case of unforeseen problems, other options
are still open: opaque query or open-write-read-close.

Is adding SSIv2 support as simple as loading the SSI plugin?
\section{Return value}
Notification return structure (``return value'') should be defined (a possible shape is sketched after the list). It will contain:
@@ -165,12 +246,6 @@
\item Failure message for the end user executing a synchronous workflow (for example ``Cannot open file for writing because there is no route to tape'')
\end{itemize}
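A possible shape for this structure; only the end-user failure message is given above, and the numeric code is an
assumption:
\begin{lstlisting}
#include <string>

// Hypothetical shape of the notification return structure.
struct ReturnValue {
  int code = 0;            // 0 = success (assumed convention)
  std::string userMessage; // e.g. "Cannot open file for writing because
                           //       there is no route to tape"
};
\end{lstlisting}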
\section{CTA Failure}
What is the mechanism for restarting a failed archive request (in the case that EOS accepts the request and CTA fails
@@ -186,24 +261,9 @@
What is the retry policy?
The full life cycle of files in EOS with copies on tape should be determined (they inherit their tape properties
from the directory, but what happens when the file gets moved or when the directory properties are changed?).
%\section{Handling of immutability}
%
%It is forbidden to update files archived on tape. Devise an update policy for backup-type behaviour (\ref{updates}).
\section{Storage Classes}
@@ -217,7 +277,7 @@
Chaining of archive and retrieve requests to retrieve requests.
Execution of retrieve requests as disk to disk copy if possible.
%Catalogue will also keep track of requests for each files (archive and retrieve) so that queueing can be made idempotent.
\section{Catalogue}
@@ -26,3 +26,19 @@
owners before deleting them from CTA.
Note: we do not reconcile storage class information. Any storage class change is triggered by the EOS user and it is
synchronous: once we successfully record the change, our command returns.
\section{Slow reconciliation interface}
\begin{itemize}
\item Action on storage class change for a file? (postponed to repack?)
\item Possible admin daemon that handles slow reconciliations and repacks?
\item Full chain reconciliation should be devised.
\end{itemize}
\section{Restoring Deleted Files}
A method to re-create a deleted file in EOS from CTA data/metadata should be devised.
We might want to pass on the information that a file deletion has been confirmed after reconciliation with the user's
catalogue. The delete could also be passed to CTA when the file is moved to the recycle bin in EOS, or when it is
definitively deleted from EOS.