- 20 Oct, 2017 2 commits
- 19 Oct, 2017 1 commit
-
-
Eric Cano authored
The missing unwatch fix should improve performance of watch/notify based locking significantly. Instrumentation will log any call to rados longer than 1s to /var/tmp/cta-rados-slow-calls.log. Also prepared a structure to allow switching between watch/notify and backoff based locking. Backoff code is not yet brought back (will test with the unwatch fix first).
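A minimal sketch of this kind of slow-call instrumentation, assuming a hypothetical wrapper named `timedCall` that takes the log stream as a parameter. The helper name, its signature and the log line format are illustrative assumptions, not the code of this commit:

```cpp
#include <cassert>
#include <chrono>
#include <functional>
#include <ostream>
#include <sstream>
#include <string>

// Hypothetical helper (not the CTA code): runs `call` and writes one line to
// `log` when it took longer than `threshold`. Returns true if it was logged.
inline bool timedCall(const std::string& what,
                      const std::function<void()>& call,
                      std::ostream& log,
                      std::chrono::milliseconds threshold =
                          std::chrono::milliseconds(1000)) {
  assert(call);  // an empty std::function cannot be invoked
  const auto start = std::chrono::steady_clock::now();
  call();
  const auto elapsed = std::chrono::duration_cast<std::chrono::milliseconds>(
      std::chrono::steady_clock::now() - start);
  if (elapsed > threshold) {
    log << what << " took " << elapsed.count() << " ms" << std::endl;
    return true;
  }
  return false;
}
```

In a real backend the stream would presumably be an `std::ofstream` opened on the log file in append mode, with each rados call wrapped in a lambda.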
-
- 16 Oct, 2017 1 commit
-
-
Eric Cano authored
The unwatch step is pretty slow, so the notification structure is now in a separate internal object, which is left to be deleted by the callback of aio_unwatch. We need to keep the structure around for that time as notifications could still arrive between the call to aio_unwatch and the actual unwatching.
-
- 08 Oct, 2017 4 commits
-
-
Eric Cano authored
-
Eric Cano authored
This happens when the notification is received multiple times. This did not show up before the full-scale test.
-
Eric Cano authored
The locking is now handled in a single function. This function is also used in the asynchronous operations. These operations have been simplified: we now do a synchronous stat after the locking. Previously, stat was the first asynchronous operation. This will slightly slow down the async operations (for the gain of code simplicity).
-
Eric Cano authored
This commit only replaces backoff with notifications in exclusive locks, for validation with a multithreaded unit test. This proof of concept is considered as working, as the measured gap between a lock release and a lock take never exceeds ~1/4 of a second, with a typical release-lock gap in the few ms to few tens of ms.
-
- 30 Sep, 2017 1 commit
-
-
Eric Cano authored
We do not do a stat before reading. Instead we ask for an arbitrarily big read, and find out the size of the data while reading. This avoids a race condition in lockfree reads where we failed to get the full object if it got re-written to a bigger size between stat and read.
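The race and its fix can be sketched against an in-memory stand-in. `FakeStore`, `readWithStat` and `readWithoutStat` are hypothetical names used for illustration only; this is not the rados API, and the 100 MiB limit is an arbitrary assumption:

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <map>
#include <string>

// Hypothetical in-memory stand-in for an object store (not the rados API).
struct FakeStore {
  std::map<std::string, std::string> objects;
  std::size_t stat(const std::string& name) const {
    return objects.at(name).size();
  }
  // Returns at most maxLen bytes; the size of the result tells the caller
  // how much data the object held at read time.
  std::string read(const std::string& name, std::size_t maxLen) const {
    const std::string& data = objects.at(name);
    return data.substr(0, std::min(maxLen, data.size()));
  }
};

// Racy variant: the object may grow between stat() and read().
// `betweenCalls` simulates a concurrent writer in this single-threaded sketch.
template <class Writer>
std::string readWithStat(const FakeStore& store, const std::string& name,
                         Writer betweenCalls) {
  const std::size_t size = store.stat(name);
  betweenCalls();
  return store.read(name, size);  // truncated if the object grew
}

// Fixed variant: one request with an arbitrarily large limit, no prior stat.
inline std::string readWithoutStat(const FakeStore& store,
                                   const std::string& name) {
  const std::size_t arbitrarilyBig = 100u * 1024u * 1024u;
  return store.read(name, arbitrarilyBig);
}
```

With an object rewritten from "short" to a longer payload between the two calls, `readWithStat` returns only the first five bytes of the new payload, while `readWithoutStat` returns it whole.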
-
- 25 Sep, 2017 1 commit
-
-
Eric Cano authored
This would detect instances where rados reports an object deletion before it actually happens.
-
- 20 Sep, 2017 1 commit
-
-
Eric Cano authored
The updating of statistics in the object store (in drive register) could wait for a lock for so long that the session gets killed due to a heartbeat timeout. To avoid this, we use lockfree access as much as possible, and when locking is required, do it with a timeout. The timeout will be logged as a warning. Added timeouts to synchronous locking primitives in the object store backends to implement this feature. Fixed a bug where the VFS backend always used exclusive locks.
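A minimal sketch of a synchronous lock with a timeout, assuming the backend exposes some non-blocking "try to lock" operation. The function name `lockWithTimeout`, the 1 ms polling interval and the warning text are assumptions for illustration, not the backend's actual primitives:

```cpp
#include <cassert>
#include <chrono>
#include <ostream>
#include <sstream>
#include <thread>

// Hypothetical helper: polls `tryLock` (any callable returning bool) until it
// succeeds or `timeout` elapses. A timeout is logged as a warning and
// reported to the caller through the return value.
template <class TryLock>
bool lockWithTimeout(TryLock tryLock, std::chrono::milliseconds timeout,
                     std::ostream& log) {
  const auto deadline = std::chrono::steady_clock::now() + timeout;
  while (!tryLock()) {
    if (std::chrono::steady_clock::now() >= deadline) {
      log << "WARNING: lock not acquired within " << timeout.count() << " ms"
          << std::endl;
      return false;
    }
    // Fixed short pause between attempts; a real backend might back off.
    std::this_thread::sleep_for(std::chrono::milliseconds(1));
  }
  return true;
}
```

The caller decides what to do on `false`, for example skipping the statistics update for this round rather than blocking past the heartbeat deadline.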
-
- 05 Sep, 2017 2 commits
-
-
Julien Leduc authored
change.
-
Julien Leduc authored
-
- 16 Aug, 2017 1 commit
-
-
Victor Kotlyar authored
Process all successful reports asynchronously and periodically check and clear statuses if they have finished. At the end of the session, do the check/flush for all reports in the successful reports queue. Switch from synchronous rados remove to async aio_remove in case of zero size object in BackendRados::AsyncUpdater.
-
- 09 Aug, 2017 1 commit
-
-
Victor Kotlyar authored
Changed reporting to the Catalog to use a batch of written files. Changed synchronous, job by job reporting to the backend to asynchronous reporting for a batch of jobs. Changed synchronous reporting to the EOS mgm to asynchronous reporting.
-
- 04 Aug, 2017 1 commit
-
-
Eric Cano authored
Some cases of operations taking over a minute were seen in practice.
-
- 03 Aug, 2017 1 commit
-
-
Eric Cano authored
-
- 30 Jul, 2017 1 commit
-
-
Eric Cano authored
The locks in Rados have timeouts. They are needed in case a locker process dies without releasing its lock. As we have some contention in heavily loaded situations, it can happen that a process is still accessing objects while the lock has expired. To lessen the likelihood of this situation, the timeout has been increased from 10s to 60s. The backoff was adjusted using the MultithreadLockingInterface unit test, with printouts that make the effect of the backoff strategy visible. The printouts are committed, but they are commented out. The same unit test was fixed as it used to create an empty object, which is not supported anymore in order to be able to detect locking of non-existing objects (lock creates the object, but we detect non-existence as it is empty and re-delete it). This mechanism of empty object locking detection is also added to the async update of objects as it was missing there (and the backoff has been added there too).
-
- 28 Jul, 2017 1 commit
-
-
Eric Cano authored
Name of object was already present in some errors but not all.
-
- 26 Jul, 2017 1 commit
-
-
Eric Cano authored
As rados re-creates an object when trying to lock it, we tested for presence before locking. This is racy as the object could be deleted in the meantime. Instead, we now lock blindly and delete the object if we find it has zero size. As we own the lock, this is safe. This problem led to issues in the garbage collector, where an agent gets polled while it could disappear.
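The lock-then-check sequence can be sketched with an in-memory stand-in whose `lock` creates a missing object, mimicking the behaviour described above. `FakeStore`, `NoSuchObject` and `lockExisting` are hypothetical names; the sketch is single-threaded and does not use the rados API:

```cpp
#include <cassert>
#include <map>
#include <set>
#include <stdexcept>
#include <string>

// Hypothetical in-memory stand-in (not the rados API).
struct FakeStore {
  std::map<std::string, std::string> objects;
  std::set<std::string> locked;
  // Like the behaviour described above, locking creates a missing object
  // (here as an empty string via operator[]).
  void lock(const std::string& name) {
    objects[name];
    locked.insert(name);
  }
  void remove(const std::string& name) {
    objects.erase(name);
    locked.erase(name);
  }
};

struct NoSuchObject : std::runtime_error {
  explicit NoSuchObject(const std::string& name)
      : std::runtime_error("No such object: " + name) {}
};

// Lock blindly, then treat a zero-size object as "did not exist": delete the
// object the lock just created and report the absence to the caller.
inline void lockExisting(FakeStore& store, const std::string& name) {
  store.lock(name);
  if (store.objects.at(name).empty()) {
    store.remove(name);  // safe because we hold the lock at this point
    throw NoSuchObject(name);
  }
}
```

This relies on the convention that valid objects are never empty, which is why creating empty objects is no longer supported.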
-
- 23 May, 2017 1 commit
-
-
Eric Cano authored
This uncovered several failures to fetch again after relocking objects in unit tests (fixed as well).
-
- 19 May, 2017 1 commit
-
-
Eric Cano authored
Fixed calls to promise::get_future() after possible access from another thread. They are now guaranteed to happen before. Added helgrind annotations for promise based synchronisation. Added macros enabling helgrind annotations for shared_ptr. Added suppression for shared_ptr used inside other standard lib objects and not covered by the previous macros. Added unit test for lower level . Added suppressions for reported race conditions in Rados library. Heavily reviewed MemArchiveQueue and fixed missing commit in object store, leading to potentially orphaned objects. Enabled formerly disabled test as it is now fast enough.
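The ordering fix for promise::get_future() can be illustrated with standard C++ only. In this sketch the future is obtained before the promise becomes reachable from the worker thread, so the two accesses cannot overlap. The function name `computeInWorker` is a hypothetical example, not code from this commit:

```cpp
#include <cassert>
#include <future>
#include <thread>

// Hypothetical example: doubles `input` in a worker thread and returns the
// result through a promise/future pair.
inline int computeInWorker(int input) {
  std::promise<int> result;
  // Take the future first, while only this thread can see the promise.
  std::future<int> future = result.get_future();
  // Only now hand the promise to another thread.
  std::thread worker([&result, input] { result.set_value(input * 2); });
  const int value = future.get();  // blocks until set_value() has run
  worker.join();
  return value;
}
```

Calling get_future() after the thread has started would leave the ordering between get_future() and set_value() up to the scheduler, which is the kind of pattern a tool such as helgrind may report.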
-
- 09 May, 2017 1 commit
-
-
Eric Cano authored
BackendVFS now throws the same exceptions as BackendRados. The exceptions of the user provided update callback are now passed through. A unit test was added.
-
- 28 Apr, 2017 2 commits
- 13 Apr, 2017 1 commit
-
-
Eric Cano authored
-
- 02 Dec, 2016 1 commit
-
-
Eric Cano authored
-
- 30 Nov, 2016 2 commits
- 18 Nov, 2015 1 commit
-
-
Eric Cano authored
Added url style conversion of checksums. Added support for checksums in mockNs. Fixed support for checksums in the scheduler. Re-instated several ASSERT_NO_THROW which were commented out during debugging.
-
- 05 Nov, 2015 2 commits
-
-
Eric Cano authored
Added tolerance to BackendRados::ScopedLock::release() when releasing a deleted object's lock. Switched back to old style m_radosCtx.objects_begin() for compatibility with older releases of ceph.
-
Eric Cano authored
Added a list method for the backends so we can clean up the rados store before running each unit test, which expects a fresh store. Added a new command line tool to list the contents of the backends.
-
- 04 Nov, 2015 1 commit
-
-
Eric Cano authored
New URLs are rados://user@pool and file:///tmp/XXXXX. Otherwise, we fall back to interpreting the URL as a bare path.
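A minimal sketch of how such URLs could be told apart. `BackendKind`, `BackendSpec` and `parseBackendUrl` are hypothetical names, and treating a bare path as a file backend is an assumption, not something stated in the commit:

```cpp
#include <cassert>
#include <stdexcept>
#include <string>

enum class BackendKind { Rados, File };

// Hypothetical result type: user/pool are set for rados, path for file.
struct BackendSpec {
  BackendKind kind;
  std::string user;
  std::string pool;
  std::string path;
};

// Hypothetical parser for rados://user@pool and file:///tmp/XXXXX, falling
// back to a bare path (assumed here to mean a file backend).
inline BackendSpec parseBackendUrl(const std::string& url) {
  const std::string radosPrefix = "rados://";
  const std::string filePrefix = "file://";
  if (url.compare(0, radosPrefix.size(), radosPrefix) == 0) {
    const std::string rest = url.substr(radosPrefix.size());
    const std::string::size_type at = rest.find('@');
    if (at == std::string::npos || at == 0 || at + 1 == rest.size()) {
      throw std::invalid_argument("Expected rados://user@pool, got: " + url);
    }
    return BackendSpec{BackendKind::Rados, rest.substr(0, at),
                       rest.substr(at + 1), ""};
  }
  if (url.compare(0, filePrefix.size(), filePrefix) == 0) {
    // file:///tmp/XXXXX leaves the absolute path /tmp/XXXXX
    return BackendSpec{BackendKind::File, "", "", url.substr(filePrefix.size())};
  }
  return BackendSpec{BackendKind::File, "", "", url};
}
```

For example, `parseBackendUrl("rados://user@pool")` yields user `user` and pool `pool`, while `/tmp/store` is passed through unchanged as a path.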
-
- 01 Jun, 2015 1 commit
-
-
Steven Murray authored
-
- 28 May, 2015 1 commit
-
-
Eric Cano authored
Created a richer protocol buffer RootEntry object and created a partial implementation to allow unit tests to run.
-
- 20 May, 2015 1 commit
-
-
Steven Murray authored
-
- 13 May, 2015 1 commit
-
-
Eric Cano authored
middletier, itself separated in interface, SQLite, objectstore and shared tests. Moved all utilities (exceptions, threading...) to a shared utility directory. Created a single, shared unit test from all the scattered ones.
-
- 08 May, 2015 1 commit
-
-
Steven Murray authored
License comment is no longer filtered out by doxygen when generating web pages of the individual source files.
-
- 07 May, 2015 1 commit
-
-
Steven Murray authored
-
- 29 Apr, 2015 1 commit
-
-
Steven Murray authored
-