Commit 7223aca1 authored by Michael Davis

Revisions to CTA-EOS interface document

parent aea0c6c2
......@@ -35,11 +35,11 @@
\input{CTA_EOS_Operations.tex}
\input{CTA_EOS_Protocol.tex}
\input{CTA_EOS_Constraints.tex}
\input{CTA_EOS_Reconciliation.tex}
\input{CTA_EOS_Authorization.tex}
\appendix
\input{CTA_EOS_Questions.tex}
\input{CTA_EOS_Reconciliation.tex}
\input{CTA_EOS_Authorization.tex}
\end{document}
......@@ -19,3 +19,30 @@ Before a repack campaign, user should be encouraged to purge any unnecessary
data from EOS. After this operation, a reconciliation between the user catalogue and EOS (and then between EOS and CTA)
should be done to ensure no unexpected data will get deleted during the repack operation.
\section{Synchronous and Asynchronous Calls}
The EOS workflow engine will be modified to support synchronous actions as well as the existing asynchronous ones. A
synchronous action will enable an EOS client to call EOS which in turn will call CTA, which will return a ``return
value'' reply that is synchronously relayed back to the EOS client.
A concrete example would be when an EOS client opens a file for creation that is supposed to be eventually archived to
tape. EOS will synchronously call CTA in order to determine whether or not the storage class of the file is known and
has a destination tape pool. If these two conditions are not met then the EOS client will get an immediate synchronous
reply saying the file cannot be created because it cannot be archived to tape.
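The check made on open can be sketched as follows. This is only an illustration under stated assumptions: the routing table, the storage class names, the error codes and the function \texttt{check\_storage\_class} are hypothetical and are not part of the CTA or EOS code.

\begin{lstlisting}[language=Python]
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Reply:
    code: int     # zero means success, non-zero means error
    message: str  # empty on success


# Hypothetical routing table: storage class -> destination tape pool
# (None means the storage class is known but has no route to tape).
ROUTES: Dict[str, Optional[str]] = {"raw": "pool_raw", "orphan": None}


def check_storage_class(storage_class: str) -> Reply:
    """Reply that EOS would relay synchronously to the client on open."""
    if storage_class not in ROUTES:
        return Reply(1, "Unknown storage class: " + storage_class)
    if ROUTES[storage_class] is None:
        return Reply(2, "No destination tape pool for: " + storage_class)
    return Reply(0, "")
\end{lstlisting}

A non-zero code corresponds to the case where the file cannot be created because it cannot be archived to tape.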
\section{Data Integrity}
\subsection{After-the-fact check on archive from EOS}
EOS will schedule a second workflow job when an archive is triggered. This will check the archive status and re-trigger
it if needed at a later time.
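A minimal sketch of this delayed check is given below. The callables \texttt{is\_on\_tape} and \texttt{requeue} are hypothetical placeholders for the EOS-side status lookup and for the archive request sent to CTA.

\begin{lstlisting}[language=Python]
from typing import Callable


def delayed_archive_check(fid: int,
                          is_on_tape: Callable[[int], bool],
                          requeue: Callable[[int], None]) -> bool:
    """Run by the second workflow job; returns True if the request
    had to be re-issued."""
    if is_on_tape(fid):
        return False  # archive completed, nothing to do
    requeue(fid)      # request was lost or is still pending: re-trigger it
    return True
\end{lstlisting}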
\section{Security}
CTA will filter EOS requests per instance and forbid cross talk between instances. Each instance should be authenticated
with a unique simple shared secret (SSS) key.
As CTA will have access to every EOS instance, it is desirable to apply the least privilege principle to CTA
and to limit the actions allowed to CTA's SSS key to the strict minimum. This will be achieved by limiting CTA's
credentials to the protocol interface.
......@@ -2,40 +2,51 @@
\label{introduction}
This document summarizes the full control chain between the User, EOS and CTA. It is intended to be used for
CTA-EOS interface design.
CTA-EOS interface design. The file lifecycle includes:
The file lifecycle includes:
\begin{itemize}
\item creation (write until initial close)
\item updates: they should be denied for files with copy on tape. This can be achieved by administrators by
adding an immutable flag (ACL) to the directories configured to go to tape in EOS, or as rule in EOS.
\item explicit retrieves
\item implicit retrieves (allowed or not, this choice should probably be left to the operator)
\item user triggered disk copy removal (allowed or not, optional)
\item garbage collection of disk copies
\item complete deletion of files
\item metadata changes (on EOS side, then propagated to CTA, rate limited). There are two sub-use cases:
\begin{description}
\item[File Archive:] ~
\begin{itemize}
\item Generic metadata update (just involving a catalog entry update)
\item Storage class change (involving a migration to different tape pools (most probably postponed to repack).
\item File create: write until initial close (CLOSEW)
\item File update: denied for files with copy on tape. This can be achieved by administrators by adding an immutable
flag (ACL) to the directories configured to go to tape in EOS, or as rule in EOS.
\end{itemize}
\item metadata injection (from CTA to EOS: injection of missing files in EOS). There are two sub-use cases:
\item[File retrieve:] ~
\begin{itemize}
\item Injection of files from CASTOR during initial data migration.
\item Re-injection of files from CTA to EOS in a disaster recovery.
\item Explicit retrieve (PREPARE)
\item Implicit retrieve (allowed or not, this choice should probably be left to the operator)
\end{itemize}
\end{itemize}
\item[Deletion:] ~
\begin{itemize}
\item User-triggered disk copy removal (allowed or not, optional)
\item Garbage collection of disk copies
\item Complete deletion of files
\end{itemize}
\item[Update Metadata:] ~
\begin{itemize}
\item Update metadata on EOS side, propagate to CTA, rate limited:
\begin{itemize}
\item Generic metadata update (just involving a catalog entry update)
\item Storage class change, involving a migration to different tape pools (most probably postponed to repack)
\end{itemize}
\item Inject metadata of missing files from CTA to EOS:
\begin{itemize}
\item Injection of files from CASTOR during initial data migration
\item Re-injection of files from CTA to EOS during disaster recovery
\end{itemize}
\end{itemize}
\end{description}
In addition, in order to make sure no changes were lost, implicit operations are needed:
\begin{itemize}
\item fast reconciliation (in flight archive requests for sure, maybe retrieve requests as well)
\item full or slow reconciliation (complete name space scan)
\item synchronize the list of valid tape storage classes between EOS and CTA
\end{itemize}
\begin{description}
\item[Reconciliation:] ~
\begin{itemize}
\item Fast reconciliation: in-flight archive requests for sure, maybe retrieve requests as well
\item Full or slow reconciliation: complete name space scan
\item Reconcile Storage Classes: Synchronize the list of valid tape storage classes between EOS and CTA
\end{itemize}
\end{description}
Chapter~\ref{operations} describes the use cases in more detail. Chapter~\ref{protocol} defines the protocol for the
EOS-CTA API. Chapters~\ref{constraints},~\ref{reconciliation_strategy} and~\ref{authorization_rules} discuss constraints
and other operational issues. Issues which need to be considered in more detail are in
Appendix~\ref{questions_and_issues}.
EOS-CTA API. Chapter~\ref{constraints} describes the performance and operational constraints on the system.
Appendices~\ref{questions_and_issues},~\ref{reconciliation_strategy} and~\ref{authorization_rules} detail issues
which need to be agreed on or resolved.
\chapter{CTA Operations}
\label{operations}
\section{File writes}
Figure \ref{fig:write-archive-sync} shows the sequence of a client writing a file to EOS with the optional check
of the storage class on open and synchronous archive request queuing on close. This second options allows to signal
problems to the user, at least on close and possibly as well on open.
\begin{figure}[h]
\begin{figure}[t]
\resizebox{\linewidth}{!}{
\begin{sequencediagram}
\newthread{c}{:Client} \newinst[3]{ef}{:EOS FST} \newinst[1]{em}{:EOS MGM} \newinst[3]{cf}{:CTA FE}
......@@ -29,32 +23,36 @@ problems to the user, at least on close and possibly as well on open.
} \caption{File write and archive queuing (synchronous)} \label{fig:write-archive-sync}
\end{figure}
It should be noted that: \begin{itemize} \item disk copies cannot be deleted before they are archived on tape
(pining) -- the full file could still be deleted (potentially
leading to issues to be handled in the tape archive session).
\item the files with an archive on tape should be immutable in EOS (``raw data use case''), or a delayed archive
mechanism should
be devised for mutable files (CERNBox archive use case).
\item synchronous calls allow reporting of CTA failures to the original client (especially in the case of custodial
data (``raw data case''). The request will be sent to CTA as a notification message (see \ref{dataSerialization}).
\item reporting metadata in "tape replica"(checkum and size) in addition to archive completion will allow EOS
to detect discrepancies (like happened when requests got mixed up in initial tests). \item the workflow will
both trigger the synchronous archive queuing and post a second delayed workflow job that will check and re-issue
the request if needed (in case the request gets lots in CTA). This event driven reconciliation acts as a fast
reconciliation. The criteria to check the file status will be the EOS side status (see below) which CTA reports
asynchronously to EOS (see \ref{dataSerialization}). \end{itemize}
\section{Archive file (CLOSEW)}
Figure \ref{fig:write-archive-sync} shows the sequence of a client writing a file to EOS, with the optional check of the
storage class on open and synchronous archive request queuing on close. This second option allows problems to be
signalled to the user, at least on close and possibly on open as well. Note that:
\begin{itemize}
\item disk copies cannot be deleted before they are archived on tape (pinning)---the full file could still be deleted
(potentially leading to issues to be handled in the tape archive session).
\item the files with an archive on tape should be immutable in EOS (raw data use case), or a delayed archive mechanism
should be devised for mutable files (CERNBox archive use case).
\item synchronous calls allow reporting of CTA failures to the original client (especially in the case of custodial
data/raw data case). The request will be sent to CTA as a notification message (see \ref{dataSerialization}).
\item reporting metadata in ``tape replica'' (checksum and size) in addition to archive completion will allow EOS to
detect discrepancies (as happened when requests got mixed up in initial tests).
\item the workflow will both trigger the synchronous archive queuing and post a second delayed workflow job that will
check and re-issue the request if needed (in case the request gets lost in CTA). This event-driven reconciliation acts
as a fast reconciliation. The criterion for checking the file status will be the EOS-side status (see below), which CTA reports
asynchronously to EOS (see \ref{dataSerialization}).
\end{itemize}
EOS will need to represent and handle part of the tape status of the files. This includes the fact that the file
should be on tape, the name of the CTA storage class, and the mutually exclusive statuses indicated by CTA:
not on tape, partially on tape, fully on tape. The report from CTA will use the "tape replica" message (see
not on tape, partially on tape, fully on tape. The report from CTA will use the ``tape replica'' message (see
\ref{dataSerialization}).
\section{Updates} \label{updates}
\section{Update} \label{updates}
EOS should filter and forbid updates to files located on tape. Otherwise, a policy should be devised to migrate
at a reasonable rate (once per day?), to allow a backup like behavior (CERNBox files).
at a reasonable rate (once per day?), to allow a backup-type behavior (CERNBox files).
\section{Explicit retrieves (prepare)} \label{user_retrieves}
\section{Explicit retrieve (PREPARE)} \label{user_retrieves}
Explicit retrieve will face several use cases. The way we deal with them should be decided. Synchronous and
asynchronous (through WFE) behaviors make little difference, but allow (catastrophic) problems to be reported to
......@@ -85,12 +83,12 @@ Figure \ref{fig:explicit-retrieve} describes the enqueuing and subsequent retrie
} \caption{File read from tape with explicit prepare (synchronous)} \label{fig:explicit-retrieve}
\end{figure}
Failed CTA transfers to and from tape should be reported to the end EOS user by setting appropriately named extended
attributes on the corresponding EOS namespace entries. For example a retrieve operation that fails due to a full
EOS disk server reporting ENOSPC to a tape server could be reported by the tape server setting the value of the
file’s "last\_retrieve\_result" attribute to "Disk space full".
Failed CTA transfers to and from tape should be reported to the EOS end-user by setting appropriately-named extended
attributes on the corresponding EOS namespace entries. For example, a retrieve operation that fails due to a full EOS
disk server (reporting ENOSPC to a tape server) could be reported by the tape server setting the value of the file's
\texttt{last\_retrieve\_result} attribute to ``Disk space full''.
\section{Implicit retrieves (open for read)}
\section{Implicit retrieve (open for read)}
When a user opens a file with copies only on tape, EOS should implicitly generate a retrieve request to CTA.
In turn, CTA should return a time estimate for the file retrieve arrival, so EOS can return a retry time to the
......@@ -98,18 +96,18 @@ XrootD client, which will retry the open at that time.
As EOS will not keep track of the retrieve requests it has issued, retrieve queuing in CTA should be idempotent.
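Idempotent queuing can be illustrated with the sketch below. The class \texttt{RetrieveQueue} and its in-memory dictionary are assumptions made for the example and do not describe the CTA implementation.

\begin{lstlisting}[language=Python]
class RetrieveQueue:
    """Toy queue keyed by EOS file id (hypothetical, in-memory only)."""

    def __init__(self) -> None:
        self._pending = {}  # fid -> estimated seconds until arrival

    def queue_retrieve(self, fid: int, estimate_s: int) -> int:
        # A repeated request for the same file leaves the queue
        # unchanged and returns the estimate of the existing request.
        return self._pending.setdefault(fid, estimate_s)

    def __len__(self) -> int:
        return len(self._pending)
\end{lstlisting}

With this behaviour, a client that retries the open simply receives the estimate of the request already queued.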
\section{User triggered disk copy removal}
\section{User-triggered disk copy removal}
CASTOR has learned that it is not easy or even possible to implement the exact \"garbage collection\" policy
CASTOR has learned that it is not easy or even possible to implement the exact ``garbage collection'' policy
required by experiments when it comes to deleting disk copies of files safely stored on tape. CASTOR has provided
the {\tt{}stager\_rm} command to end users to enable them to manually garbage collect files in their CASTOR disk
cache. We currently believe that an equivalent of the {\tt{}stager\_rm} command should be implemented in EOS.
Such a command could simply be a request to execute a {\tt{}"stager\_rm"} workflow action on a specific file..
Such a command could simply be a request to execute a {\tt{}stager\_rm} workflow action on a specific file.
\section{Garbage collection of disk copies}
A double criteria garbage collection will probably be necessary to keep free space in disk pools (file age
(LRU/FIFO/etc...) + pinning).
A double-criteria garbage collection will probably be necessary to keep free space in disk pools (file age
(LRU/FIFO/etc.) + pinning).
\section{Complete deletion of files}
......@@ -134,8 +132,8 @@ CTA should be able to inject new files into the EOS tree when: \begin{itemize}
The slow reconciliation would scan the entire list of files existing in one EOS instance. CTA could then detect
the files which are missing on its side, and the ones which are not known to EOS anymore. Metadata changes in
EOS will as well be propagated to CTA durig this process. Extra levels of safety could be added (crossing sizes,
checksums, etc...) at the cost of a heavier streaming from EOS. We would then need a retransmit request operation
EOS will also be propagated to CTA during this process. Extra levels of safety could be added (cross-checking sizes,
checksums, etc.) at the cost of a heavier streaming from EOS. We would then need a retransmit request operation
(could be triggering the proper workflow in EOS), and possibly another operation allowing the confirmation of
non-existence of a file.
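The core comparison of the slow reconciliation can be sketched as two set differences. The function \texttt{reconcile} is hypothetical, and the sketch assumes that files are identified by the EOS file id on both sides.

\begin{lstlisting}[language=Python]
from typing import Iterable, Set, Tuple


def reconcile(eos_fids: Iterable[int],
              cta_fids: Iterable[int]) -> Tuple[Set[int], Set[int]]:
    """Compare the file ids known to one EOS instance with those in CTA."""
    eos, cta = set(eos_fids), set(cta_fids)
    missing_in_cta = eos - cta  # candidates for a retransmit request
    unknown_to_eos = cta - eos  # candidates for non-existence confirmation
    return missing_in_cta, unknown_to_eos
\end{lstlisting}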
......
\chapter{EOS-CTA Protocol}
\label{protocol}
\section{Data serialization} \label{dataSerialization} All messages that
are sent from EOS to CTA and from CTA to EOS will be serialized using Google protocol buffers. A dedicated type of
workflow will be assigned but the administrator to the events that should be propagated to CTA. This will trigger
the sending of a {\tt{}notification} message, thought an interface still TdB. The sending must be synchronous,
with error propagation to the user. The option of asynchronous events is possible for the ones not requiring error
propagation to the user.
\section{Data serialization}
\label{dataSerialization}
The error back propagation protobuf is still TbD.
All messages that are sent from EOS to CTA and from CTA to EOS will be serialized using Google protocol buffers. A
dedicated type of workflow will be assigned by the administrator to the events that should be propagated to CTA. This
will trigger the sending of a {\tt{}notification} message (through an interface still to be decided). The sending must
be synchronous, with error propagation to the user. The option of asynchronous events is possible for the ones not
requiring error propagation to the user.
The error back-propagation protobuf is still TBD.
The protocol buffer specification used by the EOS fuse mount will not be shared with the EOS/CTA interface in order
to keep the development of the fuse mount and EOS/CTA interface completely decoupled.
......@@ -23,109 +25,156 @@ CTA where the interface has been upgraded. Such upgrades would start with a new
first that can talk both the old and new interface protocol(s). Then each EOS instance would be upgraded, one
experiment at a time.
The workflows will no longer set any file properties on the EOS side, as those will be set by CTA via "tape replica"
and "xattr" calls.
The workflows will no longer set any file properties on the EOS side, as those will be set by CTA via ``tape replica''
and ``xattr'' calls.
\begin{lstlisting}
syntax = "proto3"; package eos.wfe;
syntax = "proto3";
package eos.wfe;
message Id {
fixed64 uid = 1; //< user identity number string username = 2; //< user name fixed64 gid = 3; //<
group identity number string groupname = 4; //< group name
fixed64 uid = 1; //< user identity number
string username = 2; //< user name
fixed64 gid = 3; //< group identity number
string groupname = 4; //< group name
}
message Checksum {
string value = 1; //< checksum value string name = 2; //< checksum name
string value = 1; //< checksum value
string name = 2; //< checksum name
}
message Clock {
fixed64 sec = 1; //< seconds of a clock fixed64 nsec = 2; //< nanoseconds of a clock
fixed64 sec = 1; //< seconds of a clock
fixed64 nsec = 2; //< nanoseconds of a clock
}
message Md {
fixed64 fid = 1; //< file/container id fixed64 pid = 2; //< parent id Clock ctime = 3; //<
change time Clock mtime = 4; //< modification time Clock btime = 5; //< birth time Clock ttime = 6;
//< tree modification time Id owner = 7; //< ownership fixed64 size = 8; //< size Checksum cks = 9;
//< checksum information sfixed32 mode = 10; //< mode string lpath = 11; //< logical path map<string,
string> xattr = 12; //< xattribute map
fixed64 fid = 1; //< file/container id
fixed64 pid = 2; //< parent id
Clock ctime = 3; //< change time
Clock mtime = 4; //< modification time
Clock btime = 5; //< birth time
Clock ttime = 6; //< tree modification time
Id owner = 7; //< ownership
fixed64 size = 8; //< size
Checksum cks = 9; //< checksum information
sfixed32 mode = 10; //< mode
string lpath = 11; //< logical path
map<string,string>
xattr = 12; //< xattribute map
};
message Security {
string host = 1; //< client host string app = 2; //< app string string name = 3; //< sec name
string prot = 4; //< security protocol string grps = 5; //< security grps
string host = 1; //< client host
string app = 2; //< app string
string name = 3; //< sec name
string prot = 4; //< security protocol
string grps = 5; //< security grps
}
message Client {
Id user = 1; //< acting client Security sec = 2; //< client security information
Id user = 1; //< acting client
Security sec = 2; //< client security information
}
message Service {
string name = 1; //< name of the service string url = 2; //< access url of the service
string name = 1; //< name of the service
string url = 2; //< access url of the service
}
message Workflow {
enum EventType {
NONE = 0; OPENR = 1; OPENW = 2; CLOSER = 3; CLOSEW = 4; DELETE = 5; PREPARE = 6;}
EventType event = 1; //< event string queue = 2; //< queue string wfname = 3; //< workflow string
vpath = 4; //< vpath Service instance = 5; //< instance information fixed64 timestamp = 6; //< event timestamp
enum EventType { NONE = 0; OPENR = 1; OPENW = 2; CLOSER = 3; CLOSEW = 4;
DELETE = 5; PREPARE = 6; }
EventType event = 1; //< event
string queue = 2; //< queue
string wfname = 3; //< workflow
string vpath = 4; //< vpath
Service instance = 5; //< instance information
fixed64 timestamp = 6; //< event timestamp
}
message Notification {
Workflow wf = 1; //< workflow string turl = 2; //< transport URL Client cli = 3; //< client
information Md file = 4; //< file meta data Md directory = 5; //< directory meta data
Workflow wf = 1; //< workflow
string turl = 2; //< transport URL
Client cli = 3; //< client information
Md file = 4; //< file meta data
Md directory = 5; //< directory meta data
}
message Xattr {
enum Operation { NONE = 0; GET = 1; ADD = 2; SET = 3; DELETE = 4;} fixed64 fid = 1; //< file id
map<string, string> xattrs = 2; //< xattribute map Operation op = 3; //< operation to execute for
this xattr map
enum Operation { NONE = 0; GET = 1; ADD = 2; SET = 3; DELETE = 4;}
fixed64 fid = 1; //< file id
map<string, string>
xattrs = 2; //< xattribute map
Operation op = 3; //< operation to execute for this xattr map
}
message Tapereplica {
enum Status { NONE = 0; OFFTAPE = 1; ONTAPE = 2; ONTAPESAVE = 3;} fixed64 fid = 1; //< file id Status status
= 2; //< state state for file ID fixed64 size = 3; //< File size as recorded on tape for cross check Checksum
cks = 4; //< File checksum as computer while writing to tape
enum Status { NONE = 0; OFFTAPE = 1; ONTAPE = 2; ONTAPESAVE = 3;}
fixed64 fid = 1; //< file id
Status status = 2; //< state for file ID
fixed64 size = 3; //< File size as recorded on tape for cross check
Checksum cks = 4; //< File checksum as computed while writing to tape
}
message Error {
enum Audience { NONE = 0; EOSLOG = 1; ENDUSER = 2;} Audience audience = 1; //< The intended audience of the
error message fixed64 code = 2; //< Zero means success, non-zero means error string message = 3; //<
An empty if success, else an error message
enum Audience { NONE = 0; EOSLOG = 1; ENDUSER = 2;}
Audience audience = 1; //< The intended audience of the error message
fixed64 code = 2; //< Zero means success, non-zero means error
string message = 3; //< Empty on success, else an error message
}
// The following message is used to wrap all messages sent between EOS and its // peers. // // This wrapper message
allows new message types to be added to the protocol in // the future. // // This wrapper message also allows EOS
peers to receive non-EOS messages as // long as the following two conditions are met: // 1. The peer uses a wrapper
message with exactly the same (simple) structure. // 2. No two message types use the same numeric tag value. //
// The structure of this message is based on the "Union Types" section of the // following Google protocol buffers
web page: // // https://developers.google.com/protocol-buffers/docs/techniques // // A protocol buffer parser
cannot determine a message type based solely on its // contents. The type field of this wrapper message provides
the required // metadata. message Wrapper {
enum Type {NONE = 0; ERROR = 1; NOTIFICATION = 2; XATTR = 3; TAPEREPLICA = 4;} Type type = 1; Error error = 2;
Notification notification = 3; Xattr xattr = 4; Tapereplica tapereplica = 5;
// The following message is used to wrap all messages sent between EOS and its
// peers.
//
// This wrapper message allows new message types to be added to the protocol in
// future.
//
// This wrapper message also allows EOS peers to receive non-EOS messages as long
// as the following two conditions are met:
// 1. The peer uses a wrapper message with exactly the same (simple) structure.
// 2. No two message types use the same numeric tag value.
//
// The structure of this message is based on the "Union Types" section of the
// following Google protocol buffers web page:
//
// https://developers.google.com/protocol-buffers/docs/techniques
//
// A protocol buffer parser cannot determine a message type based solely on its
// contents. The type field of this wrapper message provides the required metadata.
message Wrapper {
enum Type {NONE = 0; ERROR = 1; NOTIFICATION = 2; XATTR = 3; TAPEREPLICA = 4;}
Type type = 1;
Error error = 2;
Notification notification = 3;
Xattr xattr = 4;
Tapereplica tapereplica = 5;
}
\end{lstlisting}
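The role of the \texttt{type} field in the wrapper can be illustrated as follows. The sketch uses plain Python stand-ins rather than classes generated by the protocol buffer compiler, and the function \texttt{unwrap} is hypothetical.

\begin{lstlisting}[language=Python]
from dataclasses import dataclass
from enum import IntEnum
from typing import Any, Optional, Tuple


class Type(IntEnum):
    NONE = 0
    ERROR = 1
    NOTIFICATION = 2
    XATTR = 3
    TAPEREPLICA = 4


@dataclass
class Wrapper:  # stand-in for the generated Wrapper message
    type: Type = Type.NONE
    error: Optional[Any] = None
    notification: Optional[Any] = None
    xattr: Optional[Any] = None
    tapereplica: Optional[Any] = None


# Which field of the wrapper is meaningful for each type value.
_FIELD = {Type.ERROR: "error", Type.NOTIFICATION: "notification",
          Type.XATTR: "xattr", Type.TAPEREPLICA: "tapereplica"}


def unwrap(w: Wrapper) -> Tuple[str, Any]:
    """Return the name and content of the message selected by w.type."""
    name = _FIELD.get(w.type)
    if name is None:
        raise ValueError("wrapper carries no message")
    return name, getattr(w, name)
\end{lstlisting}

The receiver reads only the field named by \texttt{type}, which is why the wrapper does not need to be parsed speculatively.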
\section{CTA front end extension} \label{CLI}
As SSIv2 is not available yet, we could use an interim interface over the current CLI.
We could modify the current CTA command-line tool to receive a protocol buffer message on its standard in and send
that message to the CTA front end as the contents of a virtual file in the CTA front-end’s virtual namespace.
The CTA command-line tool currently: \begin{itemize} \item[1] Base64 encodes each element of the argv\[\] array passed
to its main() function,. \item[2] Concatenates all of the encoded elements together into a single string with an
ampersand ‘\&’ between each encoded element. \item[3] Calls the XrdCl::FILE::Open() method with the resulting
string as the name of the file to be opened. \end{itemize} For example executing cta admin ls at command prompt
would cause the cta program to call:
As SSIv2 is not available yet, we could use an interim interface over the current CLI. We could modify the current CTA
command-line tool to receive a protocol buffer message on its standard input and send that message to the CTA front end as
the contents of a virtual file in the CTA front-end's virtual namespace.
The CTA command-line tool currently:
\begin{enumerate}
\item Base64 encodes each element of the argv[] array passed to its main() function.
\item Concatenates all of the encoded elements together into a single string with an ampersand '\&' between each encoded element.
\item Calls the XrdCl::FILE::Open() method with the resulting string as the name of the file to be opened.
\end{enumerate}
For example, executing \texttt{cta admin ls} at the command prompt would cause the \texttt{cta} program to call:
\begin{lstlisting}
XrdCl::FILE::Open("Y3Rh&YWRtaW4=&bHM=")
\end{lstlisting}
where {\tt{}"Y3Rh"} is the base64 encoding of {\tt"cta"}, {\tt"YWRtaW4"} is {\tt"admin"} and
{\tt"bHM="} is {\tt"ls"}. This therefore means the file opened is always named {\tt"Y3Rh"} which is the base64
encoding of the {\tt"cta"}. The modified CTA command-line tool could use a different file name when sending the
protocol buffer message so that the CTA front end could be incrementally modified to continue to handle all of
the current cta commands by processing the “Y3Rh” file and all of the new protocol buffer messages by handling
the opening of the new file, for example {\tt"protobuf"}.
where {\tt Y3Rh} is the base64 encoding of {\tt cta}, {\tt YWRtaW4=} is {\tt admin} and {\tt bHM=} is {\tt ls}. This
therefore means the file opened is always named {\tt Y3Rh}, which is the base64 encoding of {\tt cta}. The modified
CTA command-line tool could use a different file name when sending the protocol buffer message, so that the CTA front end
could be modified incrementally: it would continue to handle all of the current {\tt cta} commands by processing the
{\tt Y3Rh} file, and would handle all of the new protocol buffer messages when the new file, for example
{\tt protobuf}, is opened.
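The encoding of the command line described above can be reproduced with a few lines of Python. This is only a sketch of the encoding step; the helper name \texttt{encode\_argv} is hypothetical and the call to the XRootD client is not shown.

\begin{lstlisting}[language=Python]
import base64
from typing import Sequence


def encode_argv(argv: Sequence[str]) -> str:
    """Base64-encode each argument and join the results with '&'."""
    return "&".join(
        base64.b64encode(arg.encode("utf-8")).decode("ascii")
        for arg in argv
    )
\end{lstlisting}

Calling \texttt{encode\_argv(["cta", "admin", "ls"])} gives the file name used in the example above.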
The EOS workflow engine currently talks to CTA by executing bash scripts that call the {\tt{}cta} command-line
tool. These bash scripts selectively pass the appropriate arguments to the {\tt{}cta} command-line tool for each
......@@ -134,29 +183,14 @@ will contain everything known by EOS about the file being acted upon. The CTA fr
everything it needs to know about the file and more. It will be responsibility of the CTA front end to only select
what information it needs from the protocol buffer message.
\section{Move to Xrootd's SSIv2} \label{SSI}
\section{Move to Xrootd's SSIv2}
\label{SSI}
The \href{http://xrootd.org/doc/dev42/ssi\_reference.htm}{Scalable Service Interface} functionality of XRootD is
expected to provide a threading model matching our requirements (i.e. without the per-file serialization of the calls).
\section{Synchronous calls from EOS to CTA} The EOS workflow engine will be modified to support synchronous
actions as well as the existing asynchronous ones. A synchronous action will enable an EOS client to call EOS
which in turn will call CTA, which will return a "return value" reply that is synchronously relayed back to the
EOS client. A concrete example would be when an EOS client opens a file for creation that is supposed to be
eventually archived to tape. EOS will synchronously call CTA in order to determine whether or not the storage
class of the file is known and has a destination tape pool. If these two conditions are not met then the EOS
client will get an immediate synchronous reply saying the file cannot be created because it cannot be archived to tape.
\section{After the fact check on archive from EOS} EOS will schedule a second workflow job when an archive
is triggered. This will check the archive status and re trigger it if needed at a later time.
\section{Security considerations} CTA will filter EOS requests per instance and forbid cross talk between
instances. Each instance should be authenticated with a unique simple shared secret (SSS) key.
As CTA will have access to every EOS instance, it is desirable to apply the least privilege principle to CTA
and to limit the actions allowed to CTA's SSS key to the strict minimum. This will be achieved by limiting CTA's
credentials to the protocol interface.
\section{EOS support of tape notions}
\section{EOS support of tape notions} In order to achieve those function, EOS will keep track of tape related
mode (should have a copy on tape), as well as keeping track of the tape replica status.
In order to achieve these functions, EOS will keep track of the tape-related mode (should have a copy on tape), as well as
keeping track of the tape replica status.
\chapter{Questions and Issues}
\label{questions_and_issues}
The following points are still under discussion:
\begin{itemize} \item Handling of immutability \item "RPC" channel between EOS and CTA and vice-versa not decided
yet (opaque query, open-write-read-close, or SSIv2). \item notification return structure ("return value") should
be defined. It will contain:
\begin{itemize} \item Success \item Action to be taken on success (for example set the “CTA archive ID”
extended attribute of the EOS file being queued for archival) \item Failure code \item Failure message for
logging in EOS (still under discussion) \item Failure message for the end user executing a synchronous workflow
(for example “Cannot open file for writing because there is no route to tape”) \end{itemize}
\item Full life cycle of files in EOS with copies on tape should be determined (they inherit their tape properties
from the directory, but what happens when the file gets moved or the directory properties changed?). \item Finalize
update policy (\ref{updates}). \item Reporting of retrieve status could use the "xattr" message (to be confirmed,
see \ref{dataSerialization}) \item Reporting of failed archival could also use "xattr". \item Reporting of the
last error encountered in CTA could also use the "xattr" message (to be confirmed, see \ref{dataSerialization}).
\item A method to re-creation a file from CTA to EOS should be devised. \item We might want to pass the information
that a file deletion has been confirmed after reconciliation with the user's catalogue. Also delete could be passed
to CTA when the file is moved to the recycle bin in EOS, or when it is definitely deleted from EOS. \item Full
chain reconciliation should be devised. \item Slow reconciliation interface \item Action on storage class change
for a file? (postponed to repack?) \item Catalogue will also keep track of requests for each files (archive and
retrieve) so that queueing can be made idempotent. \item Chaining of archive and retrieve requests to retrieve
requests. Excution of retrieve requests as disk to disk copy if possible. \item Catalogue files could hold the
necessary info to recreate the archive request if needed. \item The list of valid storage classes needs to be
synchronized between EOS and CTA. EOS should not allow a power user to label a directory with an invalid storage
class. CTA should not delete or invalidate a storage class that is being used by EOS.\end{itemize}
This appendix contains questions and open issues which are still under discussion.
\section{API}
Should the EOS-CTA interface be defined as a shared library (e.g. SSIv2 client plugin) or as a framework (boilerplate
code)?
So do we just need to serialise, send, receive and parse?
\section{Communication Layer}
RPC channel between EOS and CTA: current strategy is to use SSIv2. In case of unforeseen problems, other options are
still open: opaque query or open-write-read-close.
Is adding SSIv2 support as simple as loading the SSI plugin?
\section{Return value}
Notification return structure (``return value'') should be defined. It will contain:
\begin{itemize}
\item Success
\item Action to be taken on success (for example set the ``CTA archive ID'' extended attribute of the EOS file being queued for archival)
\item Failure code
\item Failure message for logging in EOS (still under discussion)
\item Failure message for the end user executing a synchronous workflow (for example ``Cannot open file for writing because there is no route to tape'')
\end{itemize}
\section{\texttt{xattr}}
Reporting of retrieve status could use the \texttt{xattr} message (to be confirmed, see \ref{dataSerialization}).
Reporting of failed archival could also use \texttt{xattr}. Reporting of the last error encountered in CTA could also
use the \texttt{xattr} message (to be confirmed, see \ref{dataSerialization}).
\section{CTA Failure}
What is the mechanism for restarting a failed archive request (in the case that EOS accepts the request and CTA fails
subsequently)?
If CTA is unavailable or unable to perform an archive operation, should EOS refuse the archive request and report failure
to the User?
What is the retry policy?
\section{File life cycle}
Full life cycle of files in EOS with copies on tape should be determined (they inherit their tape properties
from the directory, but what happens when the file gets moved or the directory properties changed?).
\section{Handling of immutability}
It is forbidden to update files archived on tape. Devise an update policy for backup-type behaviour (\ref{updates}).
\section{Slow reconciliation interface}
Action on storage class change for a file? (postponed to repack?)
Possible admin daemon that handles slow reconciliations and repacks?
Full chain reconciliation should be devised.
\section{Restoring Deleted Files}
A method to re-create a deleted file in EOS from CTA data/metadata should be devised.
We might want to pass the information that a file deletion has been confirmed after reconciliation with the user's
catalogue. Also, the delete could be passed to CTA either when the file is moved to the recycle bin in EOS, or when it is
definitively deleted from EOS.
\section{Storage Classes}
The list of valid storage classes needs to be synchronized between EOS and CTA. EOS should not allow a power user to
label a directory with an invalid storage class. CTA should not delete or invalidate a storage class that is being used
by EOS.
\section{Request Queue}
Chaining of archive and retrieve requests to retrieve requests.