1. 09 Aug, 2017 1 commit
    • Make archive reporting on the flush for the batch of jobs. · c8827ade
      Victor Kotlyar authored
      Changed reporting to the Catalog to use a batch of written files.
      Replaced synchronous, job-by-job reporting to the backend with
      asynchronous reporting for a batch of jobs.
      Replaced synchronous reporting to the EOS MGM with asynchronous
      reporting.
      c8827ade
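The batching described in this commit can be illustrated with a minimal sketch. `WrittenJob` and `BatchReporter` are hypothetical names invented for the illustration and are not CTA classes; the sketch only assumes that jobs are accumulated as files are written and that the whole batch is handed to a background task on flush.

```cpp
#include <cstddef>
#include <future>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a written archive job; not a CTA type.
struct WrittenJob {
  std::string fileId;
};

// Hypothetical reporter: collects jobs as they are written and reports the
// whole batch asynchronously when the tape is flushed.
class BatchReporter {
public:
  void jobWritten(WrittenJob job) { m_pending.push_back(std::move(job)); }

  // Called on flush: hands the pending batch to a background task and returns
  // a future holding the number of jobs reported.
  std::future<std::size_t> flush() {
    std::vector<WrittenJob> batch;
    batch.swap(m_pending);
    return std::async(std::launch::async, [batch = std::move(batch)]() {
      // A real implementation would report to the catalogue, the backend and
      // the disk system here; this sketch only counts the jobs.
      return batch.size();
    });
  }

  std::size_t pending() const { return m_pending.size(); }

private:
  std::vector<WrittenJob> m_pending;
};
```

The caller keeps writing files while the returned future completes, and calls `get()` on it only when the outcome of the report is needed.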
  2. 08 Aug, 2017 2 commits
    • Reworked OStoreDB::ArchiveMount::getNextJobBatch(). · d733b4c7
      Eric Cano authored
      This new version locks the queues for less time.
      Fixed a bug where the wrong lock (shared instead of exclusive) was taken when removing empty queues
      from the root entry.
      Improved retrying over multiple iterations (we can now retry on a new queue if one gets constructed
      in the meantime).
      d733b4c7
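The idea of holding the queue lock for less time, and of retrying on another queue, can be sketched as follows. This is an in-memory illustration only: `JobQueue`, `popBatch` and the `nextQueue` callback are assumptions of the sketch and do not reflect the actual object store code, and the free function merely borrows the method's name.

```cpp
#include <cstddef>
#include <deque>
#include <mutex>
#include <vector>

// Hypothetical in-memory queue standing in for an archive queue object.
struct JobQueue {
  std::mutex mutex;
  std::deque<int> jobIds;
};

// Take up to maxJobs job ids from the queue. The lock is held only while the
// ids are moved out; any slow per-job work is left to the caller, after the
// lock has been released.
std::vector<int> popBatch(JobQueue &queue, std::size_t maxJobs) {
  std::vector<int> batch;
  {
    std::lock_guard<std::mutex> guard(queue.mutex);
    while (!queue.jobIds.empty() && batch.size() < maxJobs) {
      batch.push_back(queue.jobIds.front());
      queue.jobIds.pop_front();
    }
  }  // Lock released here.
  return batch;
}

// Keep asking for jobs until the batch is full or no queue is offered.
// nextQueue returns the queue to try on this iteration (possibly a newly
// created one) and must eventually return nullptr to end the loop.
template <typename NextQueue>
std::vector<int> getNextJobBatch(NextQueue nextQueue, std::size_t maxJobs) {
  std::vector<int> result;
  while (result.size() < maxJobs) {
    JobQueue *queue = nextQueue();
    if (queue == nullptr) break;
    std::vector<int> part = popBatch(*queue, maxJobs - result.size());
    result.insert(result.end(), part.begin(), part.end());
  }
  return result;
}
```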
    • Moved mhvtl's tape to tmpfs. · 9e329479
      Eric Cano authored
      This makes the system tests faster on HDD-backed systems, as mhvtl is a heavy user of fsync.
      
      Fsync cost can go up to 100 ms on HDD (+ VM + Hyper-V).
      9e329479
  3. 07 Aug, 2017 2 commits
  4. 04 Aug, 2017 5 commits
    • Increased timeout for rados locks to 4 minutes. · 721d6d59
      Eric Cano authored
      Some cases of operations taking over a minute were seen in practice.
      721d6d59
    • Eric Cano's avatar
      d64e47a3
    • Fixed scheduling bug where not enough retrieve mounts got triggered. · 5769eaaa
      Eric Cano authored
      Dividing the queue size by the number of existing mounts is only valid for archive mounts,
      where the queue is shared by all mounts. In the case of retrieves, the criterion should be evaluated
      VID by VID, and the number of existing mounts should not matter.
      5769eaaa
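The distinction made in this commit can be sketched as a small decision function. All names here (`PotentialMount`, `shouldMount`, `minBytesPerMount`) are hypothetical, and dividing by the number of existing mounts plus one is an assumption of the sketch rather than the scheduler's actual formula.

```cpp
#include <cstdint>

enum class MountType { Archive, Retrieve };

// Hypothetical summary of a potential mount; the field names are assumptions
// of this sketch, not CTA's.
struct PotentialMount {
  MountType type;
  std::uint64_t queuedBytes;     // Bytes waiting in the queue (shared, or for one VID).
  std::uint32_t existingMounts;  // Mounts already running for the same queue.
};

// Returns true if the queue holds enough work to justify a new mount.
bool shouldMount(const PotentialMount &m, std::uint64_t minBytesPerMount) {
  if (m.type == MountType::Archive) {
    // The archive queue is shared: each mount takes a share of it, so the
    // work left for a new mount is the queue size divided by (existing + 1).
    return m.queuedBytes / (m.existingMounts + 1) >= minBytesPerMount;
  }
  // A retrieve queue belongs to one tape (VID): other mounts do not drain it.
  return m.queuedBytes >= minBytesPerMount;
}
```

With three mounts already running and 100 bytes queued against a threshold of 50, the archive case declines a new mount while the retrieve case still triggers one.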
    • Fixed missing FSctl symbol · 716c46eb
      Steven Murray authored
      This commit fixes the following error encountered by the
      Continuous Integration system and discovered and
      reported by Eric:
      
      Config Falling back to using libXrdCtaOfs.so
      Plugin /lib64/libXrdCtaOfs.so: undefined symbol: _ZN3cta13xroot_plugins16XrdCtaFilesystem5FSctlEiR11XrdSfsFSctlR13XrdOucErrInfoPK12XrdSecEntity fslib libXrdCtaOfs.so
      Config Unable to load fslib plugin libXrdCtaOfs.so
      170804 02:37:03 558 XrootdConfig: Unable to create file system object via libXrdCtaOfs.so
      170804 02:37:03 558 XrootdConfig: Unable to load file system.
      ------ xrootd protocol initialization failed.
      
      The origin of the problem is the following commit, which
      completely removed the implementation of the
      XrdCtaFilesystem::FSctl() method:
      
      commit 419ea364
      Author: Michael Davis <michael.davis@cern.ch>
      Date:   Thu Aug 3 10:54:49 2017 +0200
      
          [XrdSsi] Updates eos_messages.proto and deletes Opaque Query
      
          Update the protobuf file to the version required by EOS-CTA SSI
          interface and delete all source code that depends on the previous
          version (i.e. all the opaque query code).
      
      This removal should have been a replacement as opposed
      to a hard delete.  The XrdCtaFilesystem::FSctl() method
      should have been re-implemented as follows:
      
      int XrdCtaFilesystem::FSctl(const int cmd, XrdSfsFSctl &args, XrdOucErrInfo &eInfo, const XrdSecEntity *client) {
        (void)cmd; (void)args; (void)eInfo; (void)client;
        eInfo.setErrInfo(ENOTSUP, "Not supported.");
        return SFS_ERROR;
      }
      716c46eb
    • Added handling of unlocking error in OStoreDB::ArchiveMount::getNextJobBatch(). · 670915e3
      Eric Cano authored
      The problem can occur when the lock expires. The request has still been updated and should be handled as such.
      The previous behaviour led to orphaned objects.
      670915e3
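A minimal sketch of this error handling, assuming the update has already been written when the unlock fails. `ExpirableLock`, `Job`, `UpdateResult` and `takeOwnership` are hypothetical names used only for illustration.

```cpp
#include <stdexcept>
#include <string>

// Hypothetical lock whose release can fail, e.g. because it already expired.
struct ExpirableLock {
  bool expired = false;
  void release() {
    if (expired) throw std::runtime_error("lock expired before release");
  }
};

struct Job {
  int id;
};

// Outcome of taking ownership of one job.
struct UpdateResult {
  bool owned;           // The job was updated and now belongs to this mount.
  std::string warning;  // Non-empty if unlocking reported a problem.
};

// The update is assumed to have been written before the unlock. An unlock
// failure is therefore recorded as a warning, and the job is still treated
// as owned so that it is not left orphaned.
UpdateResult takeOwnership(const Job &job, ExpirableLock &lock) {
  UpdateResult result{true, ""};
  try {
    lock.release();
  } catch (const std::runtime_error &ex) {
    result.warning = "job " + std::to_string(job.id) + ": " + ex.what();
  }
  return result;
}
```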
  5. 03 Aug, 2017 11 commits
  6. 02 Aug, 2017 5 commits
  7. 01 Aug, 2017 2 commits
  8. 31 Jul, 2017 2 commits
  9. 30 Jul, 2017 1 commit
    • Revisited locking in BackendRados. · 4918ed96
      Eric Cano authored
      The locks in Rados have timeouts. They are needed in case a locker process dies without
      releasing its lock. As we have some contention in heavily loaded situations, it can happen
      that a process is still accessing objects after its lock has expired. To reduce the likelihood
      of this situation, the timeout has been increased from 10s to 60s.
      
      The backoff was adjusted using the MultithreadLockingInterface unit test, with printouts
      that make the effect of the backoff strategy visible. The printouts are committed,
      but they are commented out.
      
      The same unit test was fixed, as it used to create an empty object, which is no longer supported
      so that locking of non-existing objects can be detected (the lock creates the object,
      but we detect non-existence because it is empty, and delete it again).
      
      This empty-object locking detection mechanism was also added to the async update of objects,
      as it was missing there (and the backoff has been added there too).
      4918ed96
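A retry loop with backoff, of the kind this commit tunes, might look like the sketch below. The commit does not say which backoff strategy is used, so the exponential doubling, the 1 ms base and the 64 ms cap are assumptions, as are the names `backoffDelay` and `lockWithBackoff`. The wait function is injected so that the sketch runs without real sleeping.

```cpp
#include <algorithm>
#include <chrono>
#include <functional>

using Millis = std::chrono::milliseconds;

// Delay before retry number `attempt` (0-based): doubles each time, capped.
// The base and cap are illustrative values, not the ones tuned in the commit.
Millis backoffDelay(unsigned attempt, Millis base = Millis(1), Millis cap = Millis(64)) {
  Millis delay = base;
  for (unsigned i = 0; i < attempt && delay < cap; ++i) delay *= 2;
  return std::min(delay, cap);
}

// Calls tryLock until it succeeds or maxAttempts is reached, waiting between
// attempts (but not after the last one).
bool lockWithBackoff(const std::function<bool()> &tryLock,
                     const std::function<void(Millis)> &wait,
                     unsigned maxAttempts) {
  for (unsigned attempt = 0; attempt < maxAttempts; ++attempt) {
    if (tryLock()) return true;
    if (attempt + 1 < maxAttempts) wait(backoffDelay(attempt));
  }
  return false;
}
```

In production code `wait` would typically wrap `std::this_thread::sleep_for`, often with some random jitter added so that contending lockers do not retry in step.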
  10. 29 Jul, 2017 1 commit
  11. 28 Jul, 2017 8 commits