Commit fefff11e authored by Daniele Kruse

added documentation on CTA basic concepts

parent 3704f81e
CTA basic concepts
CTA is operated by authorized administrators (AdminUsers) who issue CTA commands from authorized machines (AdminHosts), using the CTA command line interface. All administrative metadata (such as tapes, tape pools, storage classes, etc.) is tagged with a "creationLog" and a "lastModificationLog", which record who created and last modified each item, when, and from where. An administrator may create ("add"), delete ("rm"), change ("ch") or list ("ls") any of the administrative metadata.
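The tagging described above can be illustrated with a minimal Python sketch. It is not CTA's implementation: the class names (LogEntry, TapePool, AdminCatalogue), the "comment" field and the in-memory dictionary are assumptions made for illustration. Only the add/ch/rm/ls verbs, the AdminUser/AdminHost check and the two logs come from the text.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class LogEntry:
    user: str       # who
    host: str       # where
    time: datetime  # when


@dataclass
class TapePool:
    name: str
    comment: str  # hypothetical field, used only to have something to change
    creation_log: LogEntry
    last_modification_log: LogEntry


class AdminCatalogue:
    """Hypothetical in-memory stand-in for the administrative metadata store."""

    def __init__(self, admin_users, admin_hosts):
        self.admin_users = set(admin_users)
        self.admin_hosts = set(admin_hosts)
        self._tape_pools = {}

    def _stamp(self, user, host):
        # Only authorized administrators on authorized machines may proceed.
        if user not in self.admin_users:
            raise PermissionError(f"{user} is not an AdminUser")
        if host not in self.admin_hosts:
            raise PermissionError(f"{host} is not an AdminHost")
        return LogEntry(user, host, datetime.now(timezone.utc))

    def add(self, user, host, name, comment=""):
        stamp = self._stamp(user, host)
        if name in self._tape_pools:
            raise ValueError(f"tape pool {name} already exists")
        # On creation both logs carry the same stamp.
        self._tape_pools[name] = TapePool(name, comment, stamp, stamp)

    def ch(self, user, host, name, comment):
        stamp = self._stamp(user, host)
        pool = self._tape_pools[name]
        pool.comment = comment
        pool.last_modification_log = stamp  # creation_log is left untouched

    def rm(self, user, host, name):
        self._stamp(user, host)
        del self._tape_pools[name]

    def ls(self, user, host):
        self._stamp(user, host)
        return sorted(self._tape_pools.values(), key=lambda p: p.name)
```

After an "add" by one administrator and a "ch" by another, the creation log still names the first administrator while the last-modification log names the second.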
Tape pools are logical groupings of tapes that are used by operators to separate data belonging to different VOs. They are also used to categorize types of data and to separate multiple copies of files so that they end up in different buildings. Each tape belongs to one and only one tape pool.
Logical libraries link tapes and drives together. We use them to specify which tapes are mountable into which drives; normally these mountability criteria are based on location (the tape has to be in the same physical library as the drive) and on read/write compatibility. Each tape and each drive has one and only one logical library.
The storage class is what we assign to each archive file to specify how many tape copies the file is expected to have. Archive routes link storage classes to tape pools. An archive route specifies onto which set of tapes the copies will be written. There is an archive route for each copy in each storage class, and normally there should be a single archive route per tape pool.
To summarize, an archive file has a storage class that determines how many copies on tape that file should have. A storage class has one archive route per tape copy, specifying into which tape pool each copy goes. Tape pools are made of disjoint sets of tapes, and a tape can be mounted on any drive that is in the same logical library.
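The relationships summarized above can be written down as a small Python data model. This is only a sketch: the field names (nb_copies, copy_nb, vid) and the helper functions pools_for and mountable are illustrative assumptions, not CTA identifiers.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StorageClass:
    name: str
    nb_copies: int  # how many tape copies each file is expected to have


@dataclass(frozen=True)
class ArchiveRoute:
    storage_class: str
    copy_nb: int    # which copy this route applies to (1-based)
    tape_pool: str


@dataclass(frozen=True)
class Tape:
    vid: str              # hypothetical tape identifier
    tape_pool: str        # each tape belongs to exactly one tape pool
    logical_library: str  # ... and to exactly one logical library


@dataclass(frozen=True)
class Drive:
    name: str
    logical_library: str


def pools_for(storage_class, routes):
    """Return {copy_nb: tape_pool}; fail if any copy lacks an archive route."""
    found = {r.copy_nb: r.tape_pool for r in routes
             if r.storage_class == storage_class.name}
    copies = range(1, storage_class.nb_copies + 1)
    missing = [n for n in copies if n not in found]
    if missing:
        raise LookupError(
            f"{storage_class.name}: no archive route for copies {missing}")
    return {n: found[n] for n in copies}


def mountable(tape, drive):
    """A tape can be mounted only on a drive in its own logical library."""
    return tape.logical_library == drive.logical_library
```

For a storage class with two copies, pools_for returns one tape pool per copy and raises if either route is missing, which mirrors the rule that there is an archive route for each copy in each storage class.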
Archiving a file with CTA
CTA has a CLI for archiving files to tape and retrieving them from tape. It is meant to be used by an external disk-based storage system with an archiving workflow engine, such as EOS. A non-administrative "User" in CTA is an EOS user who triggers the need for archiving or retrieving a file. A user normally belongs to a specific CTA "mount group", which specifies the mount criteria and limitations (together called the "mount policy") that trigger a tape mount. Here we offer a simplified description of the archive process:
1. EOS issues an archive command for a specific file, providing its source path, its storage class (see above), and the user requesting the archival
2. CTA immediately returns an "ArchiveFileID", which CTA uses to uniquely identify files archived on tape. EOS keeps this ID for any later operations on this file (such as retrieval)
3. Asynchronously, CTA carries out the archival of the file to tape, in the following steps:
a. CTA looks up the storage class provided by EOS and makes sure it has correct routings to one or more tape pools (more than one when multiple copies are required by the storage class)
b. CTA queues the corresponding archive job(s) to the proper tape pool(s)
c. in the meantime each free tape drive queries the central "scheduler" for work to be done, by communicating its name and its logical library
d. for each work request CTA checks whether there is a free tape in the required pool (as specified in b.), that belongs to the desired logical library (as specified in c.)
e. if that is the case, CTA checks whether the work queued for that tape pool is worth a mount, i.e. if it meets the archive criteria specified in the mount group to which the user (as specified in 1.) belongs
f. if that is the case, the tape is mounted in the drive and the file gets written from the source path specified in 1. to the tape
g. after a successful archival CTA notifies EOS through an asynchronous callback
An archival process can be canceled at any moment through the "delete archive" command. This also works after a successful archival, in which case it amounts to a "delete".
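The archive steps above can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not CTA code: the class and method names are hypothetical, the asynchronous steps are collapsed into direct method calls, and the mount policy is reduced to a single made-up criterion (a minimum number of queued files per mount group), since the text does not specify the actual criteria.

```python
from collections import defaultdict
from dataclasses import dataclass
from itertools import count


@dataclass(frozen=True)
class Tape:
    vid: str  # hypothetical tape identifier
    tape_pool: str
    logical_library: str


class ArchiveScheduler:
    def __init__(self, routes, free_tapes, min_files_per_mount):
        self.routes = routes                  # {storage_class: [tape_pool per copy]}
        self.free_tapes = list(free_tapes)
        self.min_files = min_files_per_mount  # {mount_group: threshold}, assumed policy
        self.queues = defaultdict(list)       # tape_pool -> [(id, source, mount_group)]
        self._ids = count(1)

    def archive(self, source_path, storage_class, mount_group):
        # Steps 1-2 and 3a-3b: check the routing, queue one job per copy,
        # and hand back the ArchiveFileID.
        pools = self.routes.get(storage_class)
        if not pools:
            raise LookupError(f"no archive route for {storage_class}")
        archive_file_id = next(self._ids)
        for pool in pools:
            self.queues[pool].append((archive_file_id, source_path, mount_group))
        return archive_file_id

    def delete_archive(self, archive_file_id):
        # Cancel: drop any jobs still queued for this file.
        for pool in list(self.queues):
            self.queues[pool] = [j for j in self.queues[pool]
                                 if j[0] != archive_file_id]

    def get_work(self, drive_name, drive_library):
        # Steps 3c-3f: a free drive reports its name and logical library.
        for tape in self.free_tapes:
            if tape.logical_library != drive_library:
                continue  # 3d: the tape must be in the drive's logical library
            jobs = self.queues.get(tape.tape_pool, [])
            if not jobs:
                continue
            # 3e: is the queued work worth a mount? (assumed criterion)
            threshold = min(self.min_files[group] for _, _, group in jobs)
            if len(jobs) < threshold:
                continue
            # 3f: mount the tape and hand all queued jobs to the drive.
            self.free_tapes.remove(tape)
            self.queues[tape.tape_pool] = []
            return tape.vid, jobs
        return None  # nothing worth mounting for this drive
```

With a threshold of two files, a drive asking for work after a single archive request gets nothing; once a second request is queued for the same pool, the next get_work call from a drive in the matching logical library returns the tape and both jobs.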
Retrieving a file with CTA
Here we offer a simplified description of the retrieve process:
1. EOS issues a retrieve command for a specific file, providing its ArchiveFileID and desired destination path, and the user requesting the retrieval
2. CTA returns immediately
3. Asynchronously, CTA carries out the retrieval of the file from tape, in the following steps:
a. CTA queues the corresponding retrieve job(s) to the proper tape(s) (depending on where the tape copies are located)
b. in the meantime each free tape drive queries the central "scheduler" for work to be done, by communicating its name and its logical library
c. for each work request CTA checks whether the logical library (as specified in b.) is the same as that of (one of) the tape(s) (as specified in a.)
d. if that is the case, CTA checks whether the work queued for that tape is worth the mount, i.e. if it meets the retrieve criteria specified in the mount group to which the user (as specified in 1.) belongs
e. if that is the case, the tape is mounted in the drive and the file gets read from tape to the destination specified in 1.
f. after a successful retrieval CTA notifies EOS through an asynchronous callback
A retrieval process can be canceled at any moment before the retrieval completes, through the "cancel retrieve" command.
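The retrieve steps above can be sketched in the same way. Again this is an illustration under assumptions rather than CTA's implementation: the names are hypothetical, asynchronous steps become direct calls, the mount policy is reduced to an assumed minimum number of queued files per mount group, and dropping a served file from the queues of the tapes holding its other copies is a choice made for this sketch.

```python
from collections import defaultdict


class RetrieveScheduler:
    def __init__(self, tape_libraries, copies, min_files_per_mount):
        self.tape_libraries = tape_libraries  # {vid: logical_library}
        self.copies = copies                  # {archive_file_id: [vid, ...]}
        self.min_files = min_files_per_mount  # {mount_group: threshold}, assumed policy
        self.queues = defaultdict(list)       # vid -> [(id, destination, mount_group)]
        self.mounted = set()

    def retrieve(self, archive_file_id, destination_path, mount_group):
        # Steps 1-2 and 3a: queue a job on every tape holding a copy, then return.
        vids = self.copies.get(archive_file_id)
        if not vids:
            raise LookupError(f"unknown ArchiveFileID {archive_file_id}")
        for vid in vids:
            self.queues[vid].append(
                (archive_file_id, destination_path, mount_group))

    def cancel_retrieve(self, archive_file_id):
        # Only jobs that are still queued can be canceled.
        for vid in list(self.queues):
            self.queues[vid] = [j for j in self.queues[vid]
                                if j[0] != archive_file_id]

    def get_work(self, drive_name, drive_library):
        # Steps 3b-3e: a free drive reports its name and logical library.
        chosen = None
        for vid, jobs in self.queues.items():
            if not jobs or vid in self.mounted:
                continue
            if self.tape_libraries[vid] != drive_library:
                continue  # 3c: the tape must be in the drive's logical library
            # 3d: is the queued work worth the mount? (assumed criterion)
            threshold = min(self.min_files[group] for _, _, group in jobs)
            if len(jobs) >= threshold:
                chosen = (vid, list(jobs))
                break
        if chosen is None:
            return None
        vid, jobs = chosen
        self.mounted.add(vid)  # 3e: mount the tape; unmounting is not modeled
        served = {j[0] for j in jobs}
        # Remove the served files from this tape's queue and from the queues
        # of the tapes holding their other copies.
        for other in list(self.queues):
            self.queues[other] = [j for j in self.queues[other]
                                  if j[0] not in served]
        return vid, jobs
```

Here a file with copies on two tapes is queued on both; once one tape is mounted and the file handed to a drive, the job on the other tape is dropped so the file is not read twice.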