  1. Nov 28, 2022
  2. Apr 29, 2022
  3. Mar 28, 2022
  4. Nov 12, 2021
  5. Nov 09, 2021
  6. Nov 08, 2021
  7. Oct 12, 2021
    • Mitigate popping bad behaviour in archive queues (#1027) · 2bec11ea
      mvelosob authored
      During the July data challenge, the archiveJobTransferForUser queue for the r_atlas_test_datachallenge
      tapepool filled up with ArchiveJobs whose bytes field became zero after they were popped from the queue.
      This caused the tape servers to pop ~5 TB of work from the queue. To prevent this from happening in
      the future, instead of summing the sizes of the individual elements popped from the queue, we now
      subtract them from the total size popped. This way, if popped jobs have incorrectly set their bytes
      field to zero, the algorithm will consume less data than expected, not exponentially more.
      2bec11ea
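
The accounting change described above can be illustrated with a minimal sketch. It is one possible reading of the commit message, not the CTA code: the `Job` struct, the `queuedBytes` and `popOk` fields and both function names are hypothetical. It assumes the queue keeps its own size record per job, and that only jobs whose pop failed are subtracted from the accounted total.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical job record, for illustration only.
struct Job {
    std::uint64_t queuedBytes; // size recorded by the queue (assumed reliable)
    std::uint64_t bytes;       // size on the job object; may be wrongly zero
    bool popOk;                // whether popping this job succeeded
};

// Summing: the accounted total grows by the bytes field of each job that
// was popped successfully. Jobs reporting zero bytes never advance it.
std::size_t popBySumming(const std::vector<Job>& queue, std::uint64_t budget) {
    std::uint64_t accounted = 0;
    std::size_t taken = 0;
    while (taken < queue.size() && accounted < budget) {
        const Job& j = queue[taken++];
        if (j.popOk) accounted += j.bytes;
    }
    return taken;
}

// Subtracting: the accounted total starts from the size recorded by the
// queue, and only jobs whose pop failed are subtracted again. A zeroed
// bytes field can then only make the total too high, never too low.
std::size_t popBySubtracting(const std::vector<Job>& queue, std::uint64_t budget) {
    std::uint64_t accounted = 0;
    std::size_t taken = 0;
    while (taken < queue.size() && accounted < budget) {
        const Job& j = queue[taken++];
        accounted += j.queuedBytes;
        if (!j.popOk) accounted -= std::min(j.bytes, accounted); // no underflow
    }
    return taken;
}
```

With ten jobs of 100 bytes each whose bytes field reads zero and a budget of 300 bytes, the summing variant drains all ten jobs while the subtracting variant stops after three.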
  8. Aug 02, 2021
  9. Jun 15, 2021
  10. Jun 02, 2021
  11. Nov 22, 2019
  12. Jul 02, 2019
  13. Apr 15, 2019
  14. Feb 22, 2019
  15. Jan 25, 2019
  16. Dec 20, 2018
  17. Dec 13, 2018
  18. Dec 10, 2018
    • Implemented promotion of repack requests from Pending to ToExpand · 074bd3d5
      Eric Cano authored
      This promotion is controlled so that only a limited number of requests
      are in the state ToExpand or Starting at any point in time. This ensures
      both the availability of repack file requests to the system and the
      prevention of an explosion of file-level requests.
      
      Created a one-round popping from the container (algorithms) with status
      switching.
        - Used for repack requests switching from Pending to ToExpand
      
      Added ElementStatus to algorithms.
      
      Implemented promotion interface in Scheduler and OstoreDb. The actual
      decision is taken at the Scheduler level. The function itself is called
      by the RepackRequestManager.
      
      Promotion is tested in a unit test.
      
      Various code maintenance:
      Switched to "using"-based constructor inheritance.
      Fixed privacy of function in cta::range.
      074bd3d5
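
The controlled promotion described in this commit can be sketched as follows. This is a minimal illustration under stated assumptions, not the Scheduler or OstoreDb interface: the `RepackStatus` enum, the `promotePending` function and the `maxActive` limit are hypothetical; only the status names Pending, ToExpand and Starting come from the message.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Status names taken from the commit message; the enum itself is illustrative.
enum class RepackStatus { Pending, ToExpand, Starting, Other };

// Promote Pending requests to ToExpand so that at most maxActive requests
// are in ToExpand or Starting afterwards. Returns the number of promotions.
// maxActive is a hypothetical limit, not a value taken from CTA.
std::size_t promotePending(std::vector<RepackStatus>& requests, std::size_t maxActive) {
    const auto active = static_cast<std::size_t>(std::count_if(
        requests.begin(), requests.end(), [](RepackStatus s) {
            return s == RepackStatus::ToExpand || s == RepackStatus::Starting;
        }));
    if (active >= maxActive) return 0; // already at the limit
    const std::size_t slots = maxActive - active;
    std::size_t promoted = 0;
    for (auto& s : requests) {
        if (promoted == slots) break;
        if (s == RepackStatus::Pending) {
            s = RepackStatus::ToExpand; // status switch on promotion
            ++promoted;
        }
    }
    return promoted;
}
```

Counting before promoting keeps the decision in one place, which matches the message's statement that the decision is taken at a single level.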
  19. Oct 01, 2018
  20. Sep 07, 2018
  21. Sep 05, 2018
  22. Sep 03, 2018
  23. Aug 30, 2018
    • Reworked ArchiveRequest jobs lifecycles. · 0a9d8429
      Eric Cano authored
      Changed the lifecycle of the ArchiveRequest to handle the various
      combinations of several jobs and their respective successes/failures.
      Most notably, the request now holds a reportdecided boolean, which
      is set when deciding to report. This happens when failing to archive
      one copy (first failure), or when all copies are transferred (success
      for all copies).
      
      Added support for in-mount retries. On failure, the job will be
      requeued (with a chance to pick it up again) in the same session
      if same-session retries are not exceeded. Otherwise, the job is
      left owned by the session, to be picked up by the garbage collector
      at tape unmount.
      
      Made disk reporter generic, dealing with both success and failure.
      Improved mount policy support for queueing.
      
      Expanded information available in elements popped from archive queues.
      
      Added optional parameters to ArchiveRequest::asyncUpdateJobOwner() to
      cover various cases.
      
      Updated the archive job statuses.
      
      Clarified naming of functions (transfer/report failure instead of bare
      "failure").
      
      Updated garbage collector for new archive job statuses.
      
      Added support for report retries and batch reporting in the scheduler
      database.
      
      Updated obsolete wording in MigrationReportPacker log messages and error
      counts.
      0a9d8429
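
Two rules from this commit, the report decision and the in-mount retry, can be sketched as below. This is a minimal illustration assuming simple counters: the struct, field and function names are hypothetical and the real ArchiveRequest is more involved. The exact comparison used for the retry limit is also an assumption.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical view of a request with several copies (jobs).
struct RequestState {
    std::size_t copies;       // number of copies to archive
    std::size_t transferred;  // copies transferred so far
    bool reportDecided;       // set once, when deciding to report
};

// Returns true if this failure triggers the decision to report
// (only the first failure does).
bool onCopyFailed(RequestState& r) {
    if (r.reportDecided) return false;
    r.reportDecided = true;
    return true;
}

// Returns true if this success triggers the decision to report
// (only when all copies are transferred and nothing was decided before).
bool onCopyTransferred(RequestState& r) {
    ++r.transferred;
    if (!r.reportDecided && r.transferred == r.copies) {
        r.reportDecided = true;
        return true;
    }
    return false;
}

enum class FailureAction { RequeueInSession, LeaveForGarbageCollector };

struct JobRetryState {
    std::uint32_t retriesInSession;    // failures so far in the current mount
    std::uint32_t maxRetriesInSession; // hypothetical per-session limit
};

// Requeue in the same session while the per-session limit is not reached;
// otherwise leave the job owned by the session for the garbage collector.
FailureAction onTransferFailure(JobRetryState& job) {
    ++job.retriesInSession;
    return job.retriesInSession < job.maxRetriesInSession
               ? FailureAction::RequeueInSession
               : FailureAction::LeaveForGarbageCollector;
}
```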
    • Eric Cano's avatar
      c779aa29
    • Generalized queue type to "ToTransfer" "ToReport" "Failed". · 158d52ed
      Eric Cano authored
      "ToTransfer" jobs are to be picked up by tape sessions.
      "ToReport" includes both successes and failures to report, as the mechanism to report is the same.
         They will be handled by the reporter, which shares the single thread of the garbage collector.
      "Failed" will be a (possibly non-queue) container holding the failed requests. The operators
         will be able to examine, relaunch or abandon those requests.
      
      The states and lifecycles of the requests have been reworked to reflect this too.
      The container algorithms have been adapted to handle the multiple queue/container types.
      158d52ed
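
The three container types and who consumes each can be summarised in a small sketch. The `QueueType` enum and the `consumerOf` helper are hypothetical names chosen for illustration; only the three type names and their consumers come from the message.

```cpp
#include <string>

// Container types named in the commit message; the enum is illustrative.
enum class QueueType { ToTransfer, ToReport, Failed };

// Who is expected to consume each container, per the commit message.
std::string consumerOf(QueueType type) {
    switch (type) {
        case QueueType::ToTransfer: return "tape session";
        case QueueType::ToReport:   return "reporter"; // successes and failures alike
        case QueueType::Failed:     return "operator"; // examine, relaunch or abandon
    }
    return "unknown"; // unreachable for valid enum values
}
```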
  24. Aug 17, 2018
  25. Aug 10, 2018