- Jul 31, 2017
-
Julien Leduc authored
Purging the Oracle DB recycle bin in init, otherwise the CI DB size explodes because of the recycle bin content.
-
- Jul 30, 2017
-
Eric Cano authored
The locks in Rados have timeouts. They are needed in case a locker process dies without releasing its lock. As we have some contention in heavily loaded situations, it can happen that a process is still accessing objects while the lock has expired. To make this situation less likely, the timeout has been increased from 10s to 60s. The backoff was adjusted using the MultithreadLockingInterface unit test, with printouts allowing the effect of the backoff strategy to be seen visually. The printouts are committed, but commented out. The same unit test was fixed, as it used to create an empty object, which is not supported anymore in order to be able to detect locking of non-existing objects (the lock creates the object, but we detect non-existence as it is empty and re-delete it). This mechanism of empty-object locking detection is also added to the async update of objects, as it was missing there (and the backoff has been added there too).
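The backoff strategy described above can be sketched as follows. This is a minimal illustration, not the CTA code: `lockWithBackoff` and its parameters are hypothetical names, the lock's 60s TTL itself would be enforced by the lock server, and the try-lock operation is abstracted as a callable.

```cpp
#include <chrono>
#include <functional>
#include <random>
#include <thread>

// Sketch: retry a try-lock with randomized exponential backoff.
// tryLock is any callable returning true once the lock is acquired.
bool lockWithBackoff(const std::function<bool()>& tryLock,
                     int maxAttempts = 10,
                     std::chrono::milliseconds base = std::chrono::milliseconds(10)) {
  std::mt19937 gen(std::random_device{}());
  for (int attempt = 0; attempt < maxAttempts; ++attempt) {
    if (tryLock()) return true;
    // Randomized exponential backoff: sleep in [0, base * 2^attempt).
    long long ceiling = static_cast<long long>(base.count()) << attempt;
    std::uniform_int_distribution<long long> dist(0, ceiling);
    std::this_thread::sleep_for(std::chrono::milliseconds(dist(gen)));
  }
  return false;  // lock still held by someone else after all attempts
}
```

The jitter spreads out retries from contending processes so they do not all hammer the lock at the same instant, which is the effect the unit-test printouts were used to check visually.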
-
- Jul 29, 2017
-
Eric Cano authored
Added unlocking a non-scoped lock if needed. Added more information in logs.
-
- Jul 28, 2017
-
Eric Cano authored
The name of the object was already present in some errors, but not all.
-
Victor Kotlyar authored
DriveState.
-
Eric Cano authored
- when failing to schedule. - now lists which drive has an existing mount (at schedule time as well).
-
Vladimir Bahyl authored
-
Julien Leduc authored
-
Vladimir Bahyl authored
-
Vladimir Bahyl authored
-
Julien Leduc authored
Timing out full runs after 50 minutes: 10 minutes for namespace creation and 40 minutes for the test, so that gitlab does not time them out and leave a dirty CI runner.
-
Julien Leduc authored
Performing 100 rm operations in parallel for rados; this should not be painful, as those synchronous rm calls are mostly waiting.
-
Julien Leduc authored
-
Vladimir Bahyl authored
-
Vladimir Bahyl authored
XRD_TIMEOUTRESOLUTION=600 # increased from 15s
-
Julien Leduc authored
Replaced the eosh script with ls -y, except after retrieval, as archived and retrieved are the same status regarding eos... It looks like the number of archived files determined by ls -y is sometimes not a growing function...
-
- Jul 27, 2017
-
Julien Leduc authored
client_ar.sh can now write to /eos/ctaeos/preprod with the -d option; it just complains: "Could not remove disk replica for /eos/ctaeos/preprod/", as the drop is already done in the wfe script. Should test for the disk replica before trying to drop, using ls -y on the directory.
-
Eric Cano authored
-
Eric Cano authored
In the MemQueue, the promise for the next batch was set after the queue was committed, but before the lock was released (by the last user of the queue, through a shared pointer). This would lead to a uselessly early start of the next queue batch for writing, and avoidable contention on the object store lock. This would not lead to a pile-up though, as only 2 threads would contend (the previous one and the early-starting next one).
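The ordering fix can be sketched with a `std::promise`. This is an illustration with invented names (`QueueBatch`, `commitBatch`), not the actual MemQueue code: the point is that the next batch's promise is fulfilled only after the current batch's lock is released, so the woken writer does not immediately contend on a still-held lock.

```cpp
#include <future>
#include <mutex>

// Illustrative stand-in for a queue batch: an object-store lock plus
// a promise that wakes the writer of the next batch.
struct QueueBatch {
  std::mutex objectStoreLock;
  std::promise<void> nextBatchReady;
};

void commitBatch(QueueBatch& b) {
  {
    std::lock_guard<std::mutex> lk(b.objectStoreLock);
    // ... commit the queue to the object store under the lock ...
  }  // The lock is released here, FIRST...
  b.nextBatchReady.set_value();  // ...and only THEN is the next batch woken.
}
```

With the buggy ordering (set_value inside the locked scope), the next batch's thread wakes up and blocks on `objectStoreLock` while it is still held; releasing first makes the wake-up contention-free.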
-
Eric Cano authored
-
Eric Cano authored
-
Eric Cano authored
... in preparation for replacement of RetrieveMount::getNextJob().
-
Julien Leduc authored
-
Julien Leduc authored
-
Eric Cano authored
The retrieve request now gets properly queued in case of retrieve error. The errors are counted and the request eventually gets deleted. A new field was added to the retrieve request in the object store. This commit will fail on upgrade if there are retrieve requests still queued at update time. Cleaned up some unused structures in cta.proto. Minor modifications to ArchiveJobs.
-
Victor Kotlyar authored
Converted all bytes to Mbytes. Removed extra space in the output. Reordered fields.
-
Eric Cano authored
This is a stop gap solution while we wait for efficient archive/retrieve reporting.
-
Eric Cano authored
-
- Jul 26, 2017
-
Eric Cano authored
This affects only unit tests as taped already relied on getNextJobBatch().
-
Vladimir Bahyl authored
-
Eric Cano authored
As rados re-creates an object when trying to lock it, we used to test for presence before locking. This is racy, as the object could be deleted in the meantime. Instead, we now lock blindly and delete the object if we find it has zero size. As we own the lock, this is safe. This problem led to issues in the garbage collector, where the agent gets polled while it could disappear.
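The blind-lock scheme can be sketched with an in-memory stand-in for the object store. This is not the Rados API: `FakeStore` and `lockAndCheckExists` are invented names, and only the logic is meant to match the description above, where a lock auto-creates the object and a zero-size object is taken to mean it did not really exist.

```cpp
#include <map>
#include <string>

// Illustrative in-memory object store. Locking creates the object if
// absent, mimicking the Rados behaviour described in the commit.
struct FakeStore {
  std::map<std::string, std::string> objects;  // name -> payload

  void lock(const std::string& name) { objects.emplace(name, std::string()); }
  void unlock(const std::string&) {}

  // Lock blindly, then detect non-existence: a zero-size object was
  // created by the lock itself, so delete it (safe: we own the lock)
  // and report that the object does not exist.
  bool lockAndCheckExists(const std::string& name) {
    lock(name);
    if (objects[name].empty()) {
      objects.erase(name);
      unlock(name);
      return false;
    }
    return true;  // caller proceeds, still holding the lock
  }
};
```

Compared with a check-then-lock sequence, there is no window in which another process can delete the object between the presence test and the lock, which is exactly the race the commit removes.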
-
- Jul 25, 2017