1. 20 Oct, 2017 2 commits
  2. 19 Oct, 2017 1 commit
    • Rados performance: fixed missing unwatch call and added instrumentation. · c903c7bb
      Eric Cano authored
      The missing unwatch fix should significantly improve the performance of watch/notify-based locking.
      Instrumentation will log any call to rados longer than 1s to /var/tmp/cta-rados-slow-calls.log.
      Also prepared a structure to allow switching between watch/notify and backoff-based locking.
      Backoff code is not yet brought back (will test with the unwatch fix first).
      c903c7bb
  3. 16 Oct, 2017 1 commit
    • Moved Rados locking to aio_unwatch. · d124b2fd
      Eric Cano authored
      The unwatch step is pretty slow, so the notification structure is now in a separate
      internal object, which is left to be deleted by the callback of aio_unwatch. We need
      to keep the structure around for that time as notifications could still arrive
      between the call to aio_unwatch and the actual unwatching.
      d124b2fd
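The ownership hand-off described in this commit can be illustrated without rados. In the sketch below, `FakeAsyncUnwatcher`, `NotificationState` and `Lock` are hypothetical stand-ins, not the librados API or the CTA classes: the notification state lives in its own heap object and is only freed when the completion callback of the asynchronous unwatch runs.

```cpp
#include <functional>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical stand-in for an asynchronous unwatch: completion callbacks are
// queued and only run when completeAll() is called.
class FakeAsyncUnwatcher {
public:
  void asyncUnwatch(std::function<void()> onComplete) {
    m_pending.push_back(std::move(onComplete));
  }
  void completeAll() {
    auto pending = std::move(m_pending);
    m_pending.clear();
    for (auto& callback : pending) callback();
  }
private:
  std::vector<std::function<void()>> m_pending;
};

// Hypothetical notification structure, kept separate from the lock object.
struct NotificationState {
  int notificationsSeen = 0;
};

class Lock {
public:
  explicit Lock(FakeAsyncUnwatcher& unwatcher)
      : m_unwatcher(unwatcher),
        m_state(std::make_shared<NotificationState>()) {}

  // Lets a caller observe the lifetime of the state without owning it.
  std::weak_ptr<NotificationState> observe() const { return m_state; }

  // Releasing returns immediately: ownership of the state moves into the
  // completion callback, so late notifications still find a valid object.
  void release() {
    auto state = std::move(m_state);
    m_unwatcher.asyncUnwatch([state]() mutable { state.reset(); });
  }

private:
  FakeAsyncUnwatcher& m_unwatcher;
  std::shared_ptr<NotificationState> m_state;
};
```

After `release()` the state is still alive. It is destroyed only once the queued completion callback has run, which mirrors the window in which notifications could still arrive.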
  4. 08 Oct, 2017 4 commits
  5. 30 Sep, 2017 1 commit
    • Simplified reads in rados backend. · fc1ddbe0
      Eric Cano authored
      We do not do a stat before reading. Instead we ask for an arbitrarily big read, and find out
      the size of the data while reading. This avoids a race condition in lockfree reads where we failed
      to get the full object if it got re-written to a bigger size between stat and read.
      fc1ddbe0
  6. 25 Sep, 2017 1 commit
  7. 20 Sep, 2017 1 commit
    • Fixed long updating of statistics in object store leading to timeout. · 0c28e713
      Eric Cano authored
      The updating of statistics in the object store (in drive register) could wait for a lock
      for so long that the session gets killed due to a heartbeat timeout.
      
      To avoid this, we use lockfree access as much as possible, and when locking is required, do it with a
      timeout.
      
      The timeout will be logged as a warning.
      
      Added timeouts to synchronous locking primitives in the object store backends to implement this feature.
      
      Fixed a bug where the VFS backend always used exclusive locks.
      0c28e713
  8. 05 Sep, 2017 2 commits
  9. 16 Aug, 2017 1 commit
    • Implement batch reporting to the backend for successful retrieve jobs. · b7213e30
      Victor Kotlyar authored
      Process all successful reports asynchronously and periodically check
      and clear statuses if they have finished.
      At the end of the session, do the check/flush for all reports in the
      successful reports queue.
      
      Switch from synchronous rados remove to async aio_remove for
      zero-size objects in BackendRados::AsyncUpdater
      b7213e30
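The check-and-clear scheme for asynchronous reports could be sketched as follows. `ReportQueue` is a hypothetical class built on `std::future<void>`, not the CTA implementation: reports are queued as futures, a periodic call clears those that have finished, and a final flush waits for the rest.

```cpp
#include <chrono>
#include <cstddef>
#include <deque>
#include <future>
#include <utility>

// Hypothetical queue of in-flight asynchronous reports.
class ReportQueue {
public:
  void push(std::future<void> report) { m_pending.push_back(std::move(report)); }

  // Periodic check: clears finished reports from the front without blocking.
  // Returns how many were cleared.
  std::size_t clearFinished() {
    std::size_t cleared = 0;
    while (!m_pending.empty() &&
           m_pending.front().wait_for(std::chrono::seconds(0)) ==
               std::future_status::ready) {
      m_pending.front().get();  // rethrows if the report failed
      m_pending.pop_front();
      ++cleared;
    }
    return cleared;
  }

  // End of session: waits for every remaining report.
  std::size_t flush() {
    std::size_t flushed = 0;
    while (!m_pending.empty()) {
      m_pending.front().get();
      m_pending.pop_front();
      ++flushed;
    }
    return flushed;
  }

  std::size_t pending() const { return m_pending.size(); }

private:
  std::deque<std::future<void>> m_pending;
};
```

Checking only the front keeps reports in submission order. A variant could scan the whole queue if ordering does not matter.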
  10. 09 Aug, 2017 1 commit
    • Make archive reporting on the flush for the batch of jobs. · c8827ade
      Victor Kotlyar authored
      Changed reporting to the Catalog to use a batch of written files.
      Changed synchronous, job-by-job reporting to the backend to
      asynchronous reporting for a batch of jobs.
      Changed synchronous reporting to the EOS mgm to asynchronous
      reporting.
      c8827ade
  11. 04 Aug, 2017 1 commit
  12. 03 Aug, 2017 1 commit
  13. 30 Jul, 2017 1 commit
    • Revisited locking in BackendRados. · 4918ed96
      Eric Cano authored
      The locks in Rados have timeouts. They are needed in case a locker process dies without
      releasing its lock. As we have some contention in heavily loaded situations, it can happen
      that a process is still accessing objects after its lock has expired. To lessen the likelihood
      of this situation, the timeout has been increased from 10s to 60s.
      
      The backoff was adjusted using the MultithreadLockingInterface unit test, with printouts
      that make the effect of the backoff strategy visible. The printouts are committed,
      but they are commented out.
      
      The same unit test was fixed as it used to create an empty object, which is not supported
      anymore in order to be able to detect locking of non-existing objects (lock creates the object,
      but we detect non-existence as it is empty and re-delete it).
      
      This mechanism of empty object locking detection is also added to the async update of object
      as it was missing there (and the backoff has been added there too).
      4918ed96
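A backoff-based lock acquisition loop might look like the sketch below. The commit does not describe the actual strategy, so the exponential growth, the cap and the name `lockWithBackoff` are assumptions made for illustration.

```cpp
#include <algorithm>
#include <chrono>
#include <thread>

// Hypothetical sketch: calls `tryLock` until it succeeds or `maxAttempts` is
// reached, sleeping between attempts with a delay that doubles up to
// `maxDelay`. Returns the number of attempts used, or -1 on failure.
template <typename TryLock>
int lockWithBackoff(TryLock tryLock, int maxAttempts,
                    std::chrono::milliseconds initialDelay,
                    std::chrono::milliseconds maxDelay) {
  std::chrono::milliseconds delay = initialDelay;
  for (int attempt = 1; attempt <= maxAttempts; ++attempt) {
    if (tryLock()) return attempt;
    if (attempt == maxAttempts) break;  // no point sleeping after the last try
    std::this_thread::sleep_for(delay);
    delay = std::min<std::chrono::milliseconds>(delay * 2, maxDelay);
  }
  return -1;
}
```

In a contended setup, adding random jitter to `delay` is a common refinement so that competing lockers do not retry in lockstep.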
  14. 28 Jul, 2017 1 commit
  15. 26 Jul, 2017 1 commit
    • Fixed racy implementation of BackendRados::lock{Exclusive|Shared}() · aa56a1c3
      Eric Cano authored
      As rados re-creates an object when trying to lock it, we tested for presence before locking.
      This is racy as the object could be deleted in the meantime.
      Instead, we now lock blindly and delete the object if we find it has zero size.
      As we own the lock, this is safe.
      
      This problem led to issues in the garbage collector, where an agent gets polled while it could disappear.
      aa56a1c3
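The "lock blindly, then check for emptiness" approach can be sketched against an in-memory stand-in. `FakeStore` and `lockExistingObject` are hypothetical. The only behaviour taken from the commit message is that locking creates a missing object, and that a zero-size object found under the lock is treated as non-existent and deleted.

```cpp
#include <cstddef>
#include <map>
#include <set>
#include <string>

// Hypothetical in-memory store that mimics the described behaviour:
// locking a missing object creates it empty.
class FakeStore {
public:
  void put(const std::string& name, const std::string& data) { m_objects[name] = data; }
  bool exists(const std::string& name) const { return m_objects.count(name) != 0; }
  bool isLocked(const std::string& name) const { return m_locked.count(name) != 0; }
  void lock(const std::string& name) {
    m_objects[name];  // creates an empty object if it was missing
    m_locked.insert(name);
  }
  std::size_t size(const std::string& name) const { return m_objects.at(name).size(); }
  void remove(const std::string& name) {
    m_objects.erase(name);
    m_locked.erase(name);
  }
private:
  std::map<std::string, std::string> m_objects;
  std::set<std::string> m_locked;
};

// Locks without testing for presence first. An empty object means the lock
// call itself created it, so it is deleted again while we own the lock.
// Returns true if a real object is now locked.
bool lockExistingObject(FakeStore& store, const std::string& name) {
  store.lock(name);
  if (store.size(name) == 0) {
    store.remove(name);
    return false;
  }
  return true;
}
```

This relies on real objects never being empty, which matches the later remark in this history that empty objects are no longer supported.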
  16. 23 May, 2017 1 commit
  17. 19 May, 2017 1 commit
    • Reviewed promise-based thread synchronisation · 8012a02d
      Eric Cano authored
      Fixed calls to promise::get_future() after possible access from another thread. They are now guaranteed to happen before.
      Added helgrind annotations for promise-based synchronisation.
      Added macros enabling helgrind annotations for shared_ptr.
      Added suppression for shared_ptr used inside other standard lib objects and not covered by the previous macros.
      Added unit test for lower level .
      Added suppressions for reported race conditions in Rados library.
      Heavily reviewed MemArchiveQueue and fixed missing commit in object store, leading to potentially orphaned objects.
      Enabled formerly disabled test as it is now fast enough.
      8012a02d
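The ordering rule from this commit, obtaining the future before any other thread can touch the promise, is shown in the minimal sketch below. `computeInThread` is a hypothetical example, not code from the repository.

```cpp
#include <future>
#include <thread>

// Hypothetical example: the future is taken from the promise BEFORE the
// worker thread starts, so get_future() cannot race with set_value().
int computeInThread(int input) {
  std::promise<int> promise;
  std::future<int> future = promise.get_future();  // must happen first
  std::thread worker([&promise, input] { promise.set_value(input * 2); });
  const int result = future.get();
  worker.join();  // the promise outlives the thread that references it
  return result;
}
```

Calling `get_future()` after the thread has started would let it run concurrently with `set_value()` on the same promise, which is the kind of race the commit describes.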
  18. 09 May, 2017 1 commit
  19. 28 Apr, 2017 2 commits
  20. 13 Apr, 2017 1 commit
  21. 02 Dec, 2016 1 commit
  22. 30 Nov, 2016 2 commits
  23. 18 Nov, 2015 1 commit
    • Replaced byte arrays with simple std::strings · b8b82fb5
      Eric Cano authored
      Added URL-style conversion of checksums
      Added support for checksums in mockNs
      Fixed support for checksums in the scheduler
      Re-instated several ASSERT_NO_THROW which were commented out during debugging.
      b8b82fb5
  24. 05 Nov, 2015 2 commits
  25. 04 Nov, 2015 1 commit
  26. 01 Jun, 2015 1 commit
  27. 28 May, 2015 1 commit
  28. 20 May, 2015 1 commit
  29. 13 May, 2015 1 commit
    • Reorganized the files in themes. · a5621a56
      Eric Cano authored
      middletier, itself separated in interface, SQLite, objectstore and
      shared tests.
      Moved all utilities (exceptions, threading...) to a shared utility
      directory.
      Created a single, shared unit test from all the scattered ones.
      a5621a56
  30. 08 May, 2015 1 commit
  31. 07 May, 2015 1 commit
  32. 29 Apr, 2015 1 commit