Commits · 8e89401faf59937c26376efddcf93a84b2413777 · dCache / cta

Nov 28, 2022
- New states REPACKING/EXPORTED, new internal states, new maintenance runner for... · 15478f9e
  Joao Afonso authored 2 years ago
  
  New states REPACKING/EXPORTED, new internal states, new maintenance runner for cleaning-up retrieve queue requests
  
  15478f9e
Apr 29, 2022
- Resolve "Use std::optional instead of cta::optional" · eaa5170c
  Jorge Camarero Vera authored 2 years ago
  
  eaa5170c
Mar 28, 2022
- Resolve "Review software license text in CTA" · 81aee933
  Jorge Camarero Vera authored 3 years ago
  
  81aee933
Nov 12, 2021
- Resolve "Valgrind CI gets stuck after SchedulerTests" · d76be168
  Jorge Camarero Vera authored 3 years ago
  
  d76be168
Nov 09, 2021
- remove prints from objectstore/Algorithms.hpp · 5f88bce4
  mvelosob authored 3 years ago
  
  5f88bce4
Nov 08, 2021
- Resolve "ObjectStore objects in other classes." · 0c837c80
  Jorge Camarero Vera authored 3 years ago
  
  0c837c80
Oct 12, 2021

Mitigate popping bad behaviour in archive queues (#1027) · 2bec11ea

mvelosob authored 3 years ago

During the July datachallenge the archiveJobTransferForUser queue for the r_atlas_test_datachallenge
tapepool became full of ArchiveJobs whose bytes field became zero after they were popped from the queue.
This caused the tape servers to pop ~5TB of work from the queue. To prevent this from happening in
the future, instead of summing the sizes of the individual elements popped from the queue, we now
subtract them from the total size popped. This way, if there are popped jobs that have incorrectly
set their bytes field to zero, the algorithm will consume less data than expected, not exponentially
more.

2bec11ea
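
The accounting change described in 2bec11ea can be sketched roughly as follows. This is one possible reading of the commit message, not the code in objectstore/Algorithms.hpp: the `PoppedJob` type, the `switched` flag, both function names, and the idea that the candidate batch carries a size recorded before the jobs are read back are all assumptions made for illustration.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical element type, for illustration only (not a CTA class).
struct PoppedJob {
  std::uint64_t bytes;  // size read back from the job; may wrongly be 0
  bool switched;        // true if ownership was switched to the popper
};

// Accounting by summing: trust the size each popped job reports.
std::uint64_t accountBySumming(const std::vector<PoppedJob>& batch) {
  std::uint64_t total = 0;
  for (const auto& job : batch) {
    if (job.switched) total += job.bytes;
  }
  return total;
}

// Accounting by subtracting: start from the size recorded for the whole
// candidate batch and remove only the jobs that were not popped.
std::uint64_t accountBySubtracting(std::uint64_t candidateBytes,
                                   const std::vector<PoppedJob>& batch) {
  std::uint64_t total = candidateBytes;
  for (const auto& job : batch) {
    if (!job.switched) {
      assert(job.bytes <= total);  // guard against unsigned underflow
      total -= job.bytes;
    }
  }
  return total;
}
```

With summing, jobs that report zero bytes make the batch look empty, so a popper working towards a byte target keeps asking for more. With subtracting, a wrong zero can only make the batch look larger than it is, so the popper stops early instead.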

Aug 02, 2021
- Solved bug of integer instead of string in protobuf variable setup · 8f84d901
  Jorge Camarero Vera authored 3 years ago
  
  8f84d901
Jun 15, 2021
- Change to 2003-2021 Castor Licenses and to 2015-2021 CTA License in C++ files · 2cf198bb
  Jorge Camarero Vera authored 3 years ago
  
  2cf198bb
Jun 02, 2021
- Replaces CASTOR licence agreement comments with those of CTA. · 541e8f4d
  Jorge Camarero Vera authored 3 years ago
  
  541e8f4d
Nov 22, 2019

Retry queueing in ArchiveQueueToReportToRepackForSuccess if... · c35bd881

Cedric CAFFY authored 5 years ago

Retry queueing in ArchiveQueueToReportToRepackForSuccess if switchElementsOwnership fails because of a rados::lockbackoff() problem

c35bd881

Jul 02, 2019
- Added some logs for Grafana plots to understand why the repack buffer explodes · f6c3139b
  Cedric CAFFY authored 5 years ago
  
  f6c3139b
Apr 15, 2019
- Disabled or made conditional some repetitive logs. · 86dc3ce8
  Eric Cano authored 5 years ago
  
  86dc3ce8
Feb 22, 2019

Created a Sorter to queue Archive or Retrieve Jobs · f30480b5

Cedric CAFFY authored 6 years ago

Queueing of Archive Jobs is done and unit tested
Queueing of Retrieve Requests is not completely done yet

f30480b5

Jan 25, 2019

Changed the status of succeeded retrieve requests as RJS_Succeeded and... · 492361a1

Cedric CAFFY authored 6 years ago

Changed the status of succeeded retrieve requests as RJS_Succeeded and inserted them to the RetrieveQueueToReportToRepackForSuccess, unit tested but memory leak

492361a1

Dec 20, 2018
- Made generic QueueType in Queue algorithms (objectstore) · fe14d866
  Cedric CAFFY authored 6 years ago
  
  fe14d866
Dec 13, 2018
- Removed redundant statement. · 1e69a369
  Eric Cano authored 6 years ago
  
  1e69a369
- Made popping algorithms leave the loop after trimming a queue. · 1a0cf817
  Eric Cano authored 6 years ago
  
  This will prevent trim-against-create fights between 2 processes where a process trying to create a new queue to push abandons after 5 retries.
  1a0cf817
Dec 10, 2018

Implemented promotion of repack requests from Pending to ToExpand · 074bd3d5

Eric Cano authored 6 years ago

This promotion is controlled so that only a limited number of requests are in the
state ToExpand or Starting at any point in time. This keeps repack file requests
available to the system while preventing an explosion of file-level requests.

Created a one-round popping from the container (algorithms) with status switching.
  - Used for repack requests switching from Pending to ToExpand

Added ElementStatus to algorithms.

Implemented promotion interface in Scheduler and OStoreDB. The actual decision is
taken at the Scheduler level. The function itself is called by the RepackRequestManager.

Promotion is tested in a unit test.

Various code maintenance:
Switched to "using"-based constructor inheritance.
Fixed privacy of function in cta::range.

074bd3d5
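
The controlled promotion described in 074bd3d5 amounts to a capped counter. A minimal sketch, assuming a single limit shared by the ToExpand and Starting states; the function name `promotableCount` and its parameters are hypothetical and do not come from the Scheduler or OStoreDB interfaces:

```cpp
#include <algorithm>
#include <cstddef>

// Number of Pending repack requests that may be promoted to ToExpand
// without exceeding the cap on requests in ToExpand or Starting.
// All names here are illustrative assumptions.
std::size_t promotableCount(std::size_t maxActive,
                            std::size_t toExpand,
                            std::size_t starting,
                            std::size_t pending) {
  const std::size_t active = toExpand + starting;
  if (active >= maxActive) return 0;  // cap already reached
  // Promote as many as fit under the cap, but never more than are pending.
  return std::min(maxActive - active, pending);
}
```

For example, with a cap of 2, one request in ToExpand and five Pending, only one more request would be promoted.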

Oct 01, 2018
- Improved comments and code. · de1aeb02
  Eric Cano authored 6 years ago
  
  de1aeb02
- Renamed QueueType to JobQueueType. Created RepackQueueType. · f60aafd8
  Eric Cano authored 6 years ago
  
  f60aafd8
Sep 07, 2018
- [os-generic-queues] Clean and tidy some mess left from merge · 4d416fb5
  Michael Davis authored 6 years ago
  
  4d416fb5
Sep 05, 2018
- [os-generic-queues] Adds partial specialisation for Archive and Retrieve queue types · 53588382
  Michael Davis authored 6 years ago
  
  53588382
Sep 03, 2018
- [os-generic-queues] Adds extra template parameter · c3d1753a
  Michael Davis authored 6 years ago
  
  No inheritance, instead have partial or full specialization based on two template parameters.
  c3d1753a
- [os-generic-queues] Adds in a missing RQ method · b00f65d3
  Michael Davis authored 6 years ago
  
  b00f65d3
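
The approach named in c3d1753a (no inheritance, partial or full specialization on two template parameters) can be illustrated in isolation. The tag types and the `QueueTraits` template below are hypothetical stand-ins, not the classes in Algorithms.hpp:

```cpp
#include <string>

// Hypothetical tag types standing in for queue and container kinds.
struct ArchiveQueue {};
struct RetrieveQueue {};
struct ToTransfer {};
struct ToReport {};

// Primary template: behaviour shared by every (Q, C) combination.
template <typename Q, typename C>
struct QueueTraits {
  static std::string describe() { return "generic"; }
};

// Partial specialization: applies to ArchiveQueue with any second parameter.
template <typename C>
struct QueueTraits<ArchiveQueue, C> {
  static std::string describe() { return "archive"; }
};

// Full specialization: applies to exactly one (Q, C) pair.
template <>
struct QueueTraits<RetrieveQueue, ToReport> {
  static std::string describe() { return "retrieve/to-report"; }
};
```

The compiler picks the most specialized match, so `QueueTraits<RetrieveQueue, ToTransfer>` falls back to the primary template while the other two combinations get their own behaviour, with no base class involved.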
Aug 30, 2018

Reworked ArchiveRequest jobs lifecycles. · 0a9d8429

Eric Cano authored 6 years ago

Changed the lifecycle of the ArchiveRequest to handle the various
combinations of several jobs and their respective success/failures.
Most notably, the request now holds a reportdecided boolean, which
is set when deciding to report. This happens when failing to archive
one copy (first failure), or when all copies are transferred (success
for all copies).

Added support for in-mount retries. On failure, the job will be
requeued (with a chance to pick it up again) in the same session
if same-session retries are not exceeded. Otherwise, the job is
left owned by the session, to be picked up by the garbage collector
at tape unmount.

Made disk reporter generic, dealing with both success and failure.
Improved mount policy support for queueing.

Expanded information available in popped elements from archive queues.

Added optional parameters to ArchiveRequest::asyncUpdateJobOwner() to
cover various cases.

Updated the archive job statuses.

Clarified naming of functions (transfer/report failure instead of bare
"failure").

Updated garbage collector for new archive job statuses.

Added support for report retries and batch reporting in the scheduler
database.

Updated obsolete wording in MigrationReportPacker log messages and error
counts.

0a9d8429
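
The in-mount retry rule from 0a9d8429 can be sketched as a single decision. The enum, the function name, and the convention that the counter holds retries already used in the current session are assumptions for illustration; the commit does not state how the limit is counted:

```cpp
// Possible outcomes after a transfer failure (hypothetical names).
enum class FailureAction {
  RequeueInSession,         // job goes back to the queue, same tape session
  LeaveForGarbageCollector  // job stays owned by the session until unmount
};

// Decide what to do with a failed job, assuming retriesUsedInSession counts
// the retries already consumed within the current mount.
FailureAction onTransferFailure(unsigned retriesUsedInSession,
                                unsigned maxRetriesInSession) {
  return retriesUsedInSession < maxRetriesInSession
             ? FailureAction::RequeueInSession
             : FailureAction::LeaveForGarbageCollector;
}
```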

Implemented OStoreDB::getNextArchiveJobsToReportBatch(). · c779aa29
Eric Cano authored 6 years ago

c779aa29

Generalized queue type to "ToTransfer" "ToReport" "Failed". · 158d52ed

Eric Cano authored 6 years ago

"ToTransfer" jobs are to be picked up by tape sessions.
"ToReport" includes both successes and failures to report, as the mechanism to report is the same.
They will be handled by the reporter, which shares the single thread of the garbage collector.
"Failed" will be a (possibly non-queue) container which will contain the failed requests. The operators
will be able to examine, relaunch or abandon those requests.

The states and lifecycles of the requests have been reworked to reflect this lifecycle too.
The container algorithms have been adapted to handle the multiple queue/container types.

158d52ed

Moved archive requeues garbage collection to generic algorithms. · 044001f8
Eric Cano authored 6 years ago

044001f8
Moved ArchiveMount::getNextJobsBatch to generic algorithms. · f6b0786b
Eric Cano authored 6 years ago

f6b0786b

Aug 17, 2018
- [os-generic-queues] Implements Retrieve referenceAndSwitchOwnershipIfNecessary() · ee6c61df
  Michael Davis authored 6 years ago
  
  ee6c61df
Aug 10, 2018
- [os-generic-queues] Fixes typo · da1b784c
  Michael Davis authored 6 years ago
  
  da1b784c
- [os-generic-queues] Implements retrieve algorithms test · 4c662905
  Michael Davis authored 6 years ago
  
  4c662905
- [os-generic-queues] Implements RetrieveQueue::switchElementsOwnership() · 53188198
  Michael Davis authored 6 years ago
  
  53188198
- [os-generic-queues] Adds skeleton method definitions for RetrieveQueueAlgorithms · 8ebbd146
  Michael Davis authored 6 years ago
  
  8ebbd146
- [os-generic-queues] Puts back queue specializations · 22810941
  Michael Davis authored 6 years ago
  
  22810941
- [os-generic-queues] Refactors Algorithms.hpp, it compiles again · 4fe78b26
  Michael Davis authored 6 years ago
  
  4fe78b26
- [os-generic-queues] Refactors Algorithms.hpp · f6110ff4
  Michael Davis authored 6 years ago
  
  f6110ff4
- [os-generic-queues] Creates ContainerTraitsTypes class · c2b29c46
  Michael Davis authored 6 years ago
  
  c2b29c46
- Moved archive requeues garbage collection to generic algorithms. · 03308f17
  Eric Cano authored 6 years ago
  
  03308f17