Commit af433bbd authored by Giuseppe Lo Presti

[migration] Completed #644:

- Added tool to drop from the CTA catalogue all CASTOR tapes from a given tapepool.
  This tool makes use of a temporary table and an additional PL/SQL procedure
  in order to perform efficient bulk operations in the CTA catalogue.
- Made tapepool insertion idempotent.
- Updated documentation.
parent 775e36bd
......@@ -242,6 +242,7 @@ void DropSchemaCmd::dropOracleCatalogueSchema(rdbms::Conn &conn) {
"TAPE",
"TEMP_TAPE_FILE_BATCH",
"TEMP_TAPE_FILE_INSERTION_BATCH",
"TEMP_REMOVE_CASTOR_METADATA",
"REQUESTER_MOUNT_RULE",
"REQUESTER_GROUP_MOUNT_RULE",
"ADMIN_USER",
......
......@@ -33,3 +33,7 @@ CREATE GLOBAL TEMPORARY TABLE TEMP_TAPE_FILE_INSERTION_BATCH(
)
ON COMMIT DELETE ROWS;
CREATE INDEX TEMP_T_F_I_B_ARCHIVE_FILE_ID_I ON TEMP_TAPE_FILE_INSERTION_BATCH(ARCHIVE_FILE_ID);
CREATE GLOBAL TEMPORARY TABLE TEMP_REMOVE_CASTOR_METADATA(
ARCHIVE_FILE_ID UINT64TYPE
)
ON COMMIT DELETE ROWS;
......@@ -318,9 +318,10 @@ directory metadata into the EOS namespace.
%attr(0755,root,root) %{_bindir}/json-pretty-print.sh
%attr(0644,root,root) %{_bindir}/begin_vo_export_to_cta.sh
%attr(0644,root,root) %{_bindir}/export_production_tapepool_to_cta.sh
%attr(0644,root,root) %{_bindir}/vmgr_reenable_tapepool.sh
%attr(0755,root,root) %{_bindir}/tapepool_castor_to_cta.py
%attr(0755,root,root) %{_bindir}/complete_tapepool_export.py
%attr(0644,root,root) %{_bindir}/vmgr_reenable_tapepool.sh
%attr(0644,root,root) %{_bindir}/cta-catalogue-remove-castor-tapes.py
%attr(0644,root,root) %config(noreplace) %{_sysconfdir}/cta/castor-migration.conf.example
%package -n cta-rmcd
......
......@@ -3,17 +3,16 @@ CASTOR to CTA migration tools
The metadata migration from CASTOR to CTA involves two main parts:
1. Populate the CTA Oracle catalogue with all tape-related metadata, consolidating the content of the CASTOR Nameserver and VMGR Oracle databases.
2. Populate the namespace of the embedded CTA EOS instance with all files' and directories' metadata.
2. Populate the namespace of the embedded EOS CTA instance with all files' and directories' metadata.
The tools to perform the metadata migration from CASTOR to CTA are as follows:
* `eos-import-dirs`: imports a CASTOR directory tree and injects it in the EOS namespace.
* `exporttapepool.sh`: wrapper script to migrate all files belonging to a given tapepool, checking that they are not used in CASTOR and taking care of disabling them in CASTOR. This script supports a dry-run mode with the `-d` option. The script internally includes the following commands:
* `export_production_tapepool_to_cta.sh`: wrapper bash script to migrate all files belonging to a given tapepool, checking that they are not used in CASTOR and taking care of disabling them in CASTOR. This script supports a dry-run mode, where "dry" applies to CASTOR only. The script internally calls the following commands:
* `tapepool_castor_to_cta.py`: migrates all files belonging to a given tapepool to CTA. `ARCHIVED` tapes are not migrated, `READONLY` and `DISABLED` ones are. This tool creates an intermediate table for the files, consumed by `eos-import-files`.
* `eos-import-dirs --delta`: imports any additional/missing directory that was not imported by the first round of `eos-import-dirs`.
* `eos-import-files`: reads all file-related metadata produced by the above tool and injects it to the EOS namespace.
* `eos-import-files`: reads all file-related metadata from the previously created intermediate table and injects it to the EOS namespace.
* `complete_tapepool_export.py`: terminates an ongoing tapepool export.
* `undoexporttapepool.sh`: to revert in CASTOR the export of a tapepool, which had been successfully exported to CTA. Note that "dry-run" exports are not considered here as CASTOR is not modified in such cases.
The tools are designed to work as follows, for a given VO:
......@@ -30,3 +29,7 @@ To be noted that the list of "relevant" directories for a given VO is to be prov
3. In case of errors, `exporttapepool.sh` stops and the operator is expected to fix the case and rerun the export. Errors are accumulated in suitable Oracle tables both for the database migration and the EOS namespace injection.
In addition, the following tools are provided, which can be used as part of a restore/recovery procedure. Such a procedure is deliberately **not** fully automated or complete, and will have to be dealt with on a case-by-case basis.
* `vmgr_reenable_tapepool.sh`: reverts in CASTOR the export of a tapepool that had been successfully exported to CTA. All related tapes are marked `FULL`.
* `cta-catalogue-remove-castor-tapes.py`: removes from the CTA catalogue all metadata related to CASTOR tapes for a given tapepool. Any CTA tapes that were additionally added to the tapepool are left in the catalogue.
......@@ -17,7 +17,8 @@
install(FILES
${CMAKE_SOURCE_DIR}/migration/castor/begin_vo_export_to_cta.sh
${CMAKE_SOURCE_DIR}/migration/castor/export_production_tapepool_to_cta.sh
${CMAKE_SOURCE_DIR}/migration/castor/vmgr_reenable_tapepool.sh
${CMAKE_SOURCE_DIR}/migration/castor/tapepool_castor_to_cta.py
${CMAKE_SOURCE_DIR}/migration/castor/complete_tapepool_export.py
${CMAKE_SOURCE_DIR}/migration/castor/vmgr_reenable_tapepool.sh
${CMAKE_SOURCE_DIR}/migration/castor/cta-catalogue-remove-castor-tapes.py
DESTINATION usr/bin)
#!/usr/bin/python
#/******************************************************************************
# * cta-catalogue-remove-castor-tapes.py
# *
# * This file is part of the Castor/CTA project.
# * See http://cern.ch/castor and http://cern.ch/eoscta
# * Copyright (C) 2019 CERN
# *
# * This program is free software; you can redistribute it and/or
# * modify it under the terms of the GNU General Public License
# * as published by the Free Software Foundation; either version 2
# * of the License, or (at your option) any later version.
# * This program is distributed in the hope that it will be useful,
# * but WITHOUT ANY WARRANTY; without even the implied warranty of
# * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
# * GNU General Public License for more details.
# * You should have received a copy of the GNU General Public License
# * along with this program; if not, write to the Free Software
# * Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA.
# *
# * @author Castor Dev team, castor-dev@cern.ch
# *****************************************************************************/
'''command line tool to remove all CASTOR-imported tapes from a CTA tapepool'''
import sys
import getopt
from time import sleep, time
from datetime import datetime
from threading import Thread
import castor_tools
def usage(exitcode):
'''prints usage'''
print 'Usage : ' + sys.argv[0] + ' [-h|--help] -t|--tapepool <tapepool>'
sys.exit(exitcode)
def connectToCTA():
'''Connects to the CTA catalogue database, cf. castor_tools'''
user, passwd, dbname = castor_tools.getNSDBConnectParam('CTACONFIG')
return castor_tools.connectToDB(user, passwd, dbname, '0.0', enforceCheck=False)
def async_remove_castor_tapes(conn, tapepool):
'''helper function to execute the removal process in a separate thread'''
cur = conn.cursor()
cur.execute('ALTER SESSION ENABLE PARALLEL QUERY')
cur = conn.cursor()
cur.execute('ALTER SESSION ENABLE PARALLEL DML')
cur = conn.cursor()
cur.execute('BEGIN removeCASTORMetadata(:tapepool); END;', tapepool=tapepool)
def run():
'''main code'''
tapepool = None
# first parse the options
try:
options, _ = getopt.getopt(sys.argv[1:], 'ht:', ['help', 'tapepool='])
except Exception as e:
print(e)
usage(1)
for f, v in options:
if f == '-h' or f == '--help':
usage(0)
elif f == '-t' or f == '--tapepool':
tapepool = v
else:
print 'Unknown option: ' + f
usage(1)
# deal with arguments
if not tapepool:
print 'Missing argument(s)'
usage(1)
try:
# connect to the CTA catalogue and execute the async_remove_castor_tapes function on a separate thread, to be able to babysit it
ctaconn_async = connectToCTA()
runner = Thread(target=async_remove_castor_tapes, args=[ctaconn_async, tapepool])
runner.start()
# at the same time, connect again to the CTA catalogue for monitoring the process
sleep(1)
ctaconn = connectToCTA()
cur = ctaconn.cursor()
querylog = '''
SELECT timestamp, message FROM CTAMigrationLog
WHERE tapepool = :tapepool AND timestamp > :t ORDER BY timestamp ASC
'''
# poll the CTA catalogue database for logs about the ongoing execution
t = time() - 24*3600
lastprinttime = time()
while True:
cur.execute(querylog, tapepool=tapepool, t=t)
rows = cur.fetchall()
if rows:
t = rows[-1][0]
if 'Removal of CASTOR tapes metadata completed successfully' in rows[-1][1]:
print datetime.fromtimestamp(int(rows[-1][0])).isoformat(), ' ', rows[-1][1]
# removal is over, terminate
break
# exit also in case of premature termination
if not runner.isAlive():
break
# no news, keep printing something every minute
if time() - lastprinttime > 60:
lastprinttime = time()
for r in rows:
print datetime.fromtimestamp(int(r[0])).isoformat(), ' ', r[1]
if not rows:
print datetime.now().isoformat().split('.')[0], ' .'
sleep(5)
# that ought to be immediate now
runner.join()
# close DB connections
castor_tools.disconnectDB(ctaconn_async)
castor_tools.disconnectDB(ctaconn)
except Exception as e:
print(e)
import traceback
traceback.print_exc()
sys.exit(-1)
if __name__ == '__main__':
run()
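The script above runs the long database call in a worker thread and polls a log table from the main thread. A minimal Python 3 sketch of that pattern follows. It is an illustration under assumptions, not the tool's code: an in-memory list stands in for `CTAMigrationLog`, a list offset replaces the script's timestamp filter, and `worker` and `monitor` are hypothetical names. Note that Python 3 spells the liveness check `is_alive()`, whereas the Python 2 script uses `isAlive()`.

```python
import threading
import time

MARKER = "completed successfully"

def worker(log, lock):
    """Stand-in for the long-running database call: appends log messages."""
    for msg in ("removal started", "removal " + MARKER):
        time.sleep(0.05)
        with lock:
            log.append(msg)

def monitor(log, lock, runner, poll_interval=0.01):
    """Poll the shared log until the completion marker shows up or the worker ends."""
    seen = []
    while True:
        # Sample liveness BEFORE reading, so that messages written just
        # before the worker exits are still picked up by this iteration.
        alive = runner.is_alive()
        with lock:
            rows = log[len(seen):]  # only messages not yet reported
        seen.extend(rows)
        if rows and MARKER in rows[-1]:
            break  # normal completion
        if not alive:
            break  # premature termination of the worker
        time.sleep(poll_interval)
    runner.join()  # immediate at this point
    return seen

log, lock = [], threading.Lock()
runner = threading.Thread(target=worker, args=(log, lock))
runner.start()
seen = monitor(log, lock, runner)
```

Sampling `is_alive()` before the read is a design choice of this sketch: it guarantees one final read after the worker has finished, so no trailing message is dropped.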
......@@ -44,6 +44,9 @@ CREATE OR REPLACE SYNONYM Vmgr_tape_dgnmap FOR &vmgrSchema..Vmgr_tape_dgnmap;
ALTER TABLE Archive_File PARALLEL;
ALTER TABLE Tape_File PARALLEL;
-- Used by removeCASTORMetadata
CREATE OR REPLACE TYPE NUMLIST IS TABLE OF INTEGER;
-- Import a tapepool and its tapes from CASTOR
-- Raises constraint_violation if the tapepool and/or some tapes were already imported
......@@ -71,24 +74,31 @@ BEGIN
END;
END LOOP;
SELECT
TAPE_POOL_ID_SEQ.NEXTVAL INTO varTapePoolId
FROM
DUAL;
INSERT INTO Tape_Pool (tape_pool_id, tape_pool_name, vo, nb_partial_tapes, is_encrypted, user_comment,
creation_log_user_name, creation_log_host_name, creation_log_time, last_update_user_name,
last_update_host_name, last_update_time)
VALUES (
varTapePoolId,
varTapePoolName,
inVO,
0, -- nb_partial_tapes, to be filled afterwards
'0', -- is_encrypted is assumed false in CASTOR
'Imported from CASTOR',
'CASTOR', 'CASTOR', getTime(),
'CASTOR', 'CASTOR', getTime()
);
BEGIN
SELECT TAPE_POOL_ID_SEQ.NEXTVAL INTO varTapePoolId FROM Dual;
INSERT INTO Tape_Pool (tape_pool_id, tape_pool_name, vo, nb_partial_tapes, is_encrypted, user_comment,
creation_log_user_name, creation_log_host_name, creation_log_time, last_update_user_name,
last_update_host_name, last_update_time)
VALUES (
varTapePoolId,
varTapePoolName,
inVO,
0, -- nb_partial_tapes, to be filled afterwards
'0', -- is_encrypted is assumed false in CASTOR
'Imported from CASTOR',
'CASTOR', 'CASTOR', getTime(),
'CASTOR', 'CASTOR', getTime()
);
EXCEPTION WHEN CONSTRAINT_VIOLATED THEN
-- The TapePool is already present, typically because of a previous import: override some values
UPDATE Tape_Pool SET
vo = inVO,
user_comment = 'Re-imported from CASTOR',
last_update_user_name = 'CASTOR',
last_update_host_name = 'CASTOR',
last_update_time = getTime()
WHERE tape_pool_name = varTapePoolName;
END;
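The insert-then-update-on-conflict pattern used above to make the tapepool insertion idempotent can be sketched outside Oracle as follows. This is a hedged illustration using Python's `sqlite3` module and a simplified, hypothetical `tape_pool` table with three columns only; `upsert_tape_pool` is an invented helper name, not part of the migration tools.

```python
import sqlite3

def upsert_tape_pool(conn, name, vo):
    """Insert the pool; if the name already exists, override some values instead."""
    try:
        with conn:  # commits on success, rolls back on exception
            conn.execute(
                "INSERT INTO tape_pool (tape_pool_name, vo, user_comment) "
                "VALUES (?, ?, ?)",
                (name, vo, "Imported from CASTOR"))
    except sqlite3.IntegrityError:
        # Unique constraint violated: the pool is already present,
        # typically because of a previous import.
        with conn:
            conn.execute(
                "UPDATE tape_pool SET vo = ?, user_comment = ? "
                "WHERE tape_pool_name = ?",
                (vo, "Re-imported from CASTOR", name))

pool_db = sqlite3.connect(":memory:")
pool_db.execute(
    "CREATE TABLE tape_pool ("
    "tape_pool_name TEXT PRIMARY KEY, vo TEXT, user_comment TEXT)")
upsert_tape_pool(pool_db, "pool_a", "atlas")
upsert_tape_pool(pool_db, "pool_a", "cms")  # second call updates, does not fail
pool_rows = pool_db.execute(
    "SELECT tape_pool_name, vo, user_comment FROM tape_pool").fetchall()
```

Calling the helper twice with the same pool name leaves a single row carrying the values of the second call.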
FOR T in (SELECT TI.vid, TI.density, TI.manufacturer, DGN.dgn, TS.status, TS.nbfiles,
TI.rcount, TI.wcount, TI.rhost, TI.whost, TI.rtime, TI.wtime
FROM Vmgr_tape_info TI, Vmgr_tape_side TS, Vmgr_tape_dgnmap DGN
......@@ -126,7 +136,7 @@ BEGIN
'12TC', 12000000000000,
'15TC', 15000000000000,
0),
0, -- total data: will be filled by populateCTAFromCASTOR()
0, -- total data: will be filled by populateCTAFilesFromCASTOR()
T.nbfiles,
decode(BITAND(T.status, 1), 1, '1', '0'), -- DISABLED flag
decode(BITAND(T.status, 8), 8, '1', '0'), -- FULL flag
......@@ -147,7 +157,7 @@ END;
-- Insert the file-level metadata for the given migration
CREATE OR REPLACE PROCEDURE populateCTAFromCASTOR(inEOSCTAInstance VARCHAR2, inTapePool VARCHAR2) AS
CREATE OR REPLACE PROCEDURE populateCTAFilesFromCASTOR(inEOSCTAInstance VARCHAR2, inTapePool VARCHAR2) AS
nbPreviousErrors INTEGER;
nbMissingImports INTEGER;
CONSTRAINT_VIOLATED EXCEPTION;
......@@ -240,10 +250,10 @@ END;
-- Mark tapes as exported in CASTOR VMGR
CREATE OR REPLACE PROCEDURE markTapePoolExported(inTapePool VARCHAR2) AS
CREATE OR REPLACE PROCEDURE markCASTORTapePoolExported(inTapePool VARCHAR2) AS
BEGIN
UPDATE Vmgr_tape_side
SET status = status + 2 - BITAND(status, 2) -- as BITOR does not exist
SET status = status + 2 - BITAND(status, 2) -- read status = BITOR(status, 2), but BITOR does not exist
WHERE poolName = inTapePool;
COMMIT;
CNS_ctaLog(inTapePool, 'VMGR Tapes marked as EXPORTED');
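The expression `status + 2 - BITAND(status, 2)` sets bit 2 regardless of its previous value: if the bit is already set, 2 is added and subtracted again; if it is clear, only the addition takes effect. A short Python check of this identity, with `bitor_via_bitand` as a hypothetical helper name:

```python
def bitor_via_bitand(status, flag):
    """Emulate status | flag using only addition, subtraction and bitwise AND."""
    return status + flag - (status & flag)

# Compare against the native bitwise OR for a range of status values
results = [(s, bitor_via_bitand(s, 2)) for s in range(16)]
```

The same identity, a + b - (a AND b) = a OR b, holds for any pair of non-negative integers, not just for a single-bit flag.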
......@@ -259,7 +269,7 @@ BEGIN
SELECT COUNT(*) INTO nbFiles FROM CNS_CTAFilesHelper;
IF nbFiles > 0 THEN
raise_application_error(-20000, 'Another export of ' || nbFiles || ' files to CTA is ongoing, ' ||
'please terminate it with completeCTAExport() before starting a new one.');
'please terminate it with complete_tapepool_export.py before starting a new one.');
END IF;
IF inDryRun = 0 THEN
CNS_ctaLog(inTapePool, 'CASTOR metadata import started');
......@@ -272,10 +282,10 @@ BEGIN
-- import tapes; can raise exceptions
importTapePool(inTapePool, inVO);
-- import metadata into the CTA catalogue
populateCTAFromCASTOR(inEOSCTAInstance, inTapePool);
populateCTAFilesFromCASTOR(inEOSCTAInstance, inTapePool);
IF inDryRun = 0 THEN
-- mark tapes as exported, only when executed for real
markTapePoolExported(inTapePool);
markCASTORTapePoolExported(inTapePool);
CNS_ctaLog(inTapePool, 'CASTOR metadata import completed successfully');
ELSE
CNS_ctaLog(inTapePool, 'CASTOR metadata import completed successfully [dry-run mode]');
......@@ -289,6 +299,42 @@ END;
/
-- Entry point to remove the CASTOR imported metadata from the CTA catalogue
CREATE OR REPLACE PROCEDURE removeCASTORMetadata(inTapePool VARCHAR2) AS
nb INTEGER;
CURSOR c IS SELECT archive_file_id FROM Temp_Remove_CASTOR_Metadata;
ids numList;
BEGIN
SELECT COUNT(*) INTO nb FROM Tape WHERE tape_pool_name = inTapePool AND is_from_castor = '1';
IF nb = 0 THEN
raise_application_error(-20000, 'No CASTOR tapes found or no such tape pool');
END IF;
-- prepare the list of files to be removed in a temporary table
INSERT /*+ APPEND */ INTO Temp_Remove_CASTOR_Metadata
(SELECT /*+ PARALLEL(Tape_File) */ archive_file_id FROM Tape_File WHERE vid IN
(SELECT vid FROM Tape WHERE tape_pool_name = inTapePool AND is_from_castor = '1'));
SELECT COUNT(*) INTO nb FROM Temp_Remove_CASTOR_Metadata;
CNS_ctaLog(inTapePool, 'Removal of CASTOR tapes metadata started, '|| nb ||' files to go');
-- efficiently delete all Tape_File and Archive_File entries in multiple bulks
OPEN c;
LOOP
FETCH c BULK COLLECT INTO ids LIMIT 10000;
EXIT WHEN ids.count = 0;
FORALL i IN 1..ids.count
DELETE FROM Tape_File WHERE archive_file_id = ids(i);
FORALL i IN 1..ids.count
DELETE FROM Archive_File WHERE archive_file_id = ids(i);
END LOOP;
CLOSE c;
-- delete all CASTOR tapes but leave the Tape pool in the system
DELETE FROM Tape WHERE tape_pool_name = inTapePool AND is_from_castor = '1';
-- commit the entire operation: this will clean the temporary table
COMMIT;
CNS_ctaLog(inTapePool, 'Removal of CASTOR tapes metadata completed successfully');
END;
/
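The staging-then-batched-delete approach of `removeCASTORMetadata` can be illustrated with a small self-contained sketch. It assumes SQLite via Python's `sqlite3` module in place of Oracle, simplified two-column tables, and a hypothetical `remove_tapes` helper; `fetchmany` plays the role of `BULK COLLECT ... LIMIT` and `executemany` that of `FORALL`.

```python
import sqlite3

def remove_tapes(conn, vids, batch_size=3):
    """Stage the affected file ids in a temporary table, then delete in batches."""
    conn.execute(
        "CREATE TEMP TABLE IF NOT EXISTS temp_remove (archive_file_id INTEGER)")
    conn.execute("DELETE FROM temp_remove")
    marks = ",".join("?" * len(vids))
    conn.execute(
        "INSERT INTO temp_remove "
        "SELECT archive_file_id FROM tape_file WHERE vid IN (%s)" % marks,
        vids)
    staged = conn.execute("SELECT archive_file_id FROM temp_remove")
    removed = 0
    while True:
        batch = staged.fetchmany(batch_size)  # one bulk of ids
        if not batch:
            break
        conn.executemany(
            "DELETE FROM tape_file WHERE archive_file_id = ?", batch)
        conn.executemany(
            "DELETE FROM archive_file WHERE archive_file_id = ?", batch)
        removed += len(batch)
    conn.commit()  # single commit for the entire operation
    return removed

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE archive_file (archive_file_id INTEGER PRIMARY KEY)")
db.execute("CREATE TABLE tape_file (vid TEXT, archive_file_id INTEGER)")
db.executemany("INSERT INTO archive_file VALUES (?)",
               [(i,) for i in range(1, 10)])
db.executemany("INSERT INTO tape_file VALUES (?, ?)",
               [("V1" if i <= 7 else "V2", i) for i in range(1, 10)])
removed = remove_tapes(db, ["V1"])
left_tape_files = db.execute("SELECT COUNT(*) FROM tape_file").fetchone()[0]
left_archive_files = db.execute("SELECT COUNT(*) FROM archive_file").fetchone()[0]
```

Staging the ids first keeps the selection query separate from the deletes, and the batch size bounds how many ids are held in memory at once.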
-- The following is to be executed at schema creation or before the first migration
BEGIN
dbms_errlog.create_error_log(dml_table_name => 'Tape_File');
......