- 13 Jun, 2013 1 commit
-
-
Sebastien Ponce authored
-
- 12 Jun, 2013 1 commit
-
-
Sebastien Ponce authored
This includes:
  - the merge of d2dtransfer into diskmanagerd
  - the drop of the WAITDISKTODISKCOPY state in DiskCopy, and of the DiskCopies for ongoing replication in general. DiskCopies are now only created at the end of the replication, when they actually exist on disk
  - the drop of the StageReplicaRequest table and concept
  - the introduction of the DiskToDiskCopyJob table, holding the list of ongoing disk to disk copies
  - a split of transferToSchedule into userTransferToSchedule and D2dTransferToSchedule
  - a corresponding split in the dispatcher part of the transfer manager to call both methods
Note that with this commit, both methods fill the same FIFO queue of transfers and both take everything they can find without any throttling. At this stage, the draining facility is broken.
-
- 28 Mar, 2013 1 commit
-
-
Giuseppe Lo Presti authored
-
- 21 Mar, 2013 1 commit
-
-
Sebastien Ponce authored
-
- 19 Mar, 2013 1 commit
-
-
Sebastien Ponce authored
At the same time, the use of a ResourceHelper has been dropped, and d2dtransfer and stageJob have been modified to take the diskserver and filesystem directly on the command line, rather than reading them from a file given on the command line.
-
- 10 Dec, 2012 2 commits
-
-
Sebastien Ponce authored
Integration of the monitoring of diskserver statuses into the diskmanagerd/transfermanagerd infrastructure. In particular, rmmasterd, rmnoded and all the shared memory infrastructure have now disappeared.
-
Sebastien Ponce authored
-
- 25 Oct, 2012 1 commit
-
-
Giuseppe Lo Presti authored
Essentially, fake entries with None are used to detect that the thread has to stop.
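As an illustration of this pattern, here is a minimal sketch of a worker thread that stops when it reads a None entry. The function names (`worker`, `run_demo`) and the doubling of items are purely illustrative assumptions and are not taken from the actual code.

```python
import queue
import threading


def worker(tasks, results):
    """Consume items until a None sentinel is read (hypothetical worker)."""
    while True:
        item = tasks.get()       # plain blocking get, no timeout needed
        if item is None:         # fake entry: the thread has to stop
            break
        results.append(item * 2)  # placeholder for the real work


def run_demo(items, n_workers=1):
    """Start the workers, feed them, then stop them with one None each."""
    tasks = queue.Queue()
    results = []
    threads = [threading.Thread(target=worker, args=(tasks, results))
               for _ in range(n_workers)]
    for thread in threads:
        thread.start()
    for item in items:
        tasks.put(item)
    for _ in threads:
        tasks.put(None)          # one sentinel per worker thread
    for thread in threads:
        thread.join()
    return results
```

Because the sentinels are queued after the real items, every real item is processed before the threads exit.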
-
- 08 May, 2012 1 commit
-
-
Sebastien Ponce authored
Fixed sr #128448: Hanging transfers in listtransfers (c2cms/t0temp). Basically the problem was due to disk to disk copies that could not find any destination
-
- 16 Sep, 2011 1 commit
-
-
Sebastien Ponce authored
-
- 15 Sep, 2011 1 commit
-
-
Sebastien Ponce authored
-
- 09 Sep, 2011 1 commit
-
-
Sebastien Ponce authored
Fixed bug #86557: in case of failure, the dispatcher module of the transfermanager creates inconsistencies. It actually called serverqueue.remove with wrong arguments
-
- 28 Jul, 2011 2 commits
-
-
Sebastien Ponce authored
-
Sebastien Ponce authored
Before this fix, machines where a job could not be submitted (e.g. because of a timeout) were not removed from the server-side scheduling queue. The number of unique pending jobs, derived from this queue, was thus wrong. In addition, jobs that would finally fail to start (e.g. because all machines that accepted them were too full) would not answer their submitter, as the server believed that other machines were still trying to schedule them.
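The bookkeeping involved can be sketched as follows. This is only an illustration under assumptions: the `ServerQueue` class, its method names and its internal layout are hypothetical and do not reproduce the real server-side queue.

```python
class ServerQueue:
    """Hypothetical server-side view of where each transfer is pending."""

    def __init__(self):
        # transfer id -> set of machines still trying to schedule it
        self._pending = {}

    def put(self, transfer_id, machines):
        """Record the machines a transfer was submitted to."""
        self._pending[transfer_id] = set(machines)

    def remove(self, transfer_id, machine):
        """Forget one machine for a transfer (e.g. after a failed submission).

        Returns True when no machine is left, meaning the transfer can no
        longer start anywhere and its submitter should be answered.
        """
        machines = self._pending.get(transfer_id)
        if machines is None:
            return False
        machines.discard(machine)
        if not machines:
            del self._pending[transfer_id]
            return True
        return False

    def nb_unique_pending(self):
        """Number of distinct transfers still pending somewhere."""
        return len(self._pending)
```

In this sketch, forgetting to call `remove` for a machine that timed out would leave the transfer counted as pending forever, which is the kind of inconsistency the commit message describes.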
-
- 20 May, 2011 1 commit
-
-
Sebastien Ponce authored
-
- 12 May, 2011 1 commit
-
-
Sebastien Ponce authored
Avoided the use of timeout gets on queues for efficiency reasons. Also cleaned up the termination code
-
- 21 Apr, 2011 1 commit
-
-
Sebastien Ponce authored
-
- 15 Apr, 2011 3 commits
-
-
Sebastien Ponce authored
-
Sebastien Ponce authored
Limit the length of the internal queue between the DB extraction and the actual dispatching to 2x the number of workers. Also take care that the queue is emptied when we exit gracefully
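A bounded queue of this kind could look like the sketch below. It assumes a simple producer feeding worker threads; the `dispatch_all` function and its arguments are hypothetical, and stopping the workers with None entries queued after the real items is just one way to make sure the queue is emptied before exit.

```python
import queue
import threading


def dispatch_all(transfers, n_workers=3):
    """Feed transfers to workers through a queue capped at 2x the workers."""
    pending = queue.Queue(maxsize=2 * n_workers)
    done = []

    def worker():
        while True:
            transfer = pending.get()
            if transfer is None:   # stop marker
                break
            done.append(transfer)  # placeholder for the actual dispatching

    workers = [threading.Thread(target=worker) for _ in range(n_workers)]
    for w in workers:
        w.start()
    for transfer in transfers:
        # put() blocks while 2 * n_workers items are already waiting, so the
        # producer never runs far ahead of the dispatching threads
        pending.put(transfer)
    for _ in workers:
        pending.put(None)          # queued last, so all transfers are consumed
    for w in workers:
        w.join()
    return done, pending.empty()
```

The cap keeps memory use and staleness of queued entries bounded, at the cost of the producer occasionally blocking.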
-
Sebastien Ponce authored
Do not use the intermediate status BEINGSCHED for subrequests. It was only present for the cases where LSF was losing jobs. This is no longer happening, and should it happen, the synchronization would take care of it.
-
- 05 Apr, 2011 1 commit
-
-
Dennis Waldron authored
Fixed: Stuck restarts of the diskmanager daemon caused by a missing 'detach_process=True' option when daemonizing. Standardized logging parameters. Added 'TransferManager Daemon started' and 'DiskManager Daemon started' log messages.
-
- 04 Apr, 2011 1 commit
-
-
Sebastien Ponce authored
-
- 31 Mar, 2011 1 commit
-
-
Sebastien Ponce authored
-
- 29 Mar, 2011 2 commits
-
-
Sebastien Ponce authored
-
Sebastien Ponce authored
  - fixed the hosts used by command lines to connect to the transfer manager daemon (they were wrongly using the jobmanager host)
  - better exception handling when an entry is not found in the config file
  - avoid using JobManager entries of the config file. The involved entries have been duplicated so that the cleanup is easier when the jobManager goes
-
- 28 Mar, 2011 1 commit
-
-
Dennis Waldron authored
  - Fixed: (ERRORS) Undefined variable 'reqdiskPool' should be 'reqdiskpool'
  - Fixed: (ERRORS) Undefined variable 'transferId' should be 'transferid'
  - Fixed: Bad indentation
  - Fixed: More than one statement on a single line
  - Fixed: Comma not followed by a space
  - Fixed: Unnecessary semicolons
-
- 25 Mar, 2011 2 commits
-
-
Sebastien Ponce authored
Many small improvements and fixes to the transfer manager and the tools around it. Most credit for finding the problems goes to Dennis.
  - fixed header alignments in listtransfers
  - fixed case of keys in logs
  - set JobId to "unused for TM" in the stagerJob log
  - create the temporary files of the scheduling read only
  - show fileid in listtransfers
  - record the request id in all log messages where it makes sense
  - fixed rebuilding of diskserver manager queue when it restarts
  - properly handled log values containing spaces
  - fixed overzealous retries to schedule d2d destination when source is not ready
  - improved drain mode, so that the transfer manager stops running when activity is over
  - fixed automatic reload for all parameters (some were cached and thus not reloaded properly)
  - reenabled active canceling of jobs on diskservers that did not start it first (was lost in some bad merge)
-
Sebastien Ponce authored
-
- 22 Mar, 2011 11 commits
-
-
Sebastien Ponce authored
-
Sebastien Ponce authored
  - lots of renaming: jobs are now transfers, client commands are named listtransfers, killtransfers and draintransfers, daemons are called diskmanagerd and transfermanagerd, packages are renamed accordingly, ...
  - a drain mode has been added to the transferdaemon to ease the retirement of machines running it
  - ports used no longer collide with LSF, so that we can let LSF run and only stop/start the jobmanagerd and the transfermanagerd when we switch from one mode to the other and back
  - the badmin command has been dropped in favour of automatic, regular reload of all configuration files
  - client commands have been extended to allow restriction to a given diskpool and user
  - the listtransfers command has been extended to display more parameters, depending on options (e.g. number of unique pending jobs in a pool, per-protocol values)
  - reconnection to the ORACLE DB has been fixed
  - DLF logging has been fixed (both the wrapping of it through enums and the insertion of new facilities in the facility table)
-
Sebastien Ponce authored
-
Sebastien Ponce authored
-
Sebastien Ponce authored
-
Sebastien Ponce authored
-
Sebastien Ponce authored
  - support for abort, with fast cancelation of the job in the scheduling queues
  - synchronization from the DB to the scheduling system: regular checks that DB jobs that have been pending/running for more than 1h still exist in the scheduler
  - handling of coredumping/killed stagerJobs. The DB will get updated by the disk server manager itself
  - timeout for pending jobs
  - check of available space before running jobs that write to the filesystem
  - automatic killing of jobs when resources disappear
In addition, the Python CastorConf class has a new method that allows retrieving typed values.
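To give an idea of what retrieving typed values from a configuration can mean, here is a minimal sketch. It is not the real CastorConf class: the `TypedConf` name, the `get_value` signature, the (category, key) layout and the example entry are all assumptions made for illustration.

```python
class TypedConf:
    """Hypothetical stand-in for a config class, NOT the real CastorConf."""

    def __init__(self, values):
        # (category, key) -> raw string value, as read from a config file
        self._values = dict(values)

    def get_value(self, category, key, default=None, type_=str):
        """Return the entry converted with type_, or default when absent."""
        raw = self._values.get((category, key))
        if raw is None:
            return default
        try:
            return type_(raw)
        except ValueError:
            raise ValueError("invalid %s value for %s/%s: %r"
                             % (type_.__name__, category, key, raw))
```

A caller would then write something like `conf.get_value("DiskManager", "MaxRetries", default=3, type_=int)` (a made-up entry) and get an `int` back instead of converting the string at every call site.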
-
Sebastien Ponce authored
-
Sebastien Ponce authored
-
Sebastien Ponce authored
-
Sebastien Ponce authored
-