= Copying Pools with Rsync =

= General =
 * [[#install|Software installation]]
 * [[#batch|Configuring as a Batch Job]]
 * [[#examples|Examples]]
 * [[PACS/RsyncAdmin|Data Pool Rsync Administration]]

= Overview =
Level 0 product pools are created from data queried from the ICC's Versant database. They are stored on a computer accessible to other internal and external nodes, including machines at Sub-ICC institutes. The ICC, where the pools are created and maintained, runs an rsync daemon that has access to the product pools it creates from its Versant database. A client-side rsync script, '''transferPools''', is used to list and get pools.

Sets of related product pools are stored as ''rsync modules'' that have simple access names, like ILT, PV, OPS, etc. Module names may also designate just part of a mission phase (a week number, for instance), so you could have PV1, PV2, and so on. Modules with smaller sets of data pools can make data pool management easier.

You can use transferPools interactively or in batch mode. Interactively, you can list modules, list pools, and get a single pool or a group of them. You can also define a batch job to work on a module, or some named part of it. The batch job periodically gets new files, updates existing ones, and deletes those that no longer exist on the server, giving you a mirror of what's on the server.

= The transferPools Script =
'''transferPools''' is a bash script that runs an rsync command. You need a Unix, Linux or Mac OS system to run it. The script uses rsync to list and transfer groups of files, typically one data pool or a group of them. Run interactively from the command line, the script can:
 1. Display command usage.
 1. List rsync data pool module names and descriptions.
 1. List directories and files within a module. The view can be the entire module or just part of it.
 1. Get directories and files within a module.
    These files are synchronized with the pool at the ICC, so they're the same, including updates and deletes. You can get part of a module, a single data pool if you like.

When the script runs as a batch job, it automates the ''get'' command. In batch mode, the script also produces daily logs so you can monitor what's happening.

== Command Syntax ==
'''transferPools'''

Display command usage and exit.

'''transferPools test [<module>[/<path>] [<local path>]]'''
 1. '''transferPools test''' Connects to the daemon and prints the list of modules. Use this to make sure you have no network difficulties and that the daemon is running.
 1. '''transferPools test <module>[/<path>] [<local path>]''' Creates the supporting file structure used with a CRON job. It also tests the rsync command without copying any data. See [[#batch|Configuring as a Batch Job]] for details.

'''transferPools list [<module>[/<path>]]'''

List the modules, or the files within a module. If a path is included as well, the listing is limited to the items in that path.
 1. '''transferPools list''' List the data pool module names and descriptions.
 1. '''transferPools list <module>''' List the files and directories directly under this module.
 1. '''transferPools list <module>/<path>''' Limit what's returned to the items at that path level.

'''transferPools rlist [<module>[/<path>]]'''

Works like the ''list'' option, but recursively displays all sub-directories and files below the specified module and path.

'''transferPools get <module>[/<path>] <local path>'''

Use this command to make a single copy of all the files in the module, or only those under the path if one is specified. The local path is the root directory under which sub-directories and files are written. ''Get'' is recursive.

'''transferPools <module>[/<path>] <local path>'''

Use this form as a CRON job. Along with copying the files, it sets up logging and prevents itself from running again if its last invocation is still running. See [[#batch|Configuring as a Batch Job]] for details.

== Command Options ==
You can list and copy groups of files with this script, but you can't use it to write files to the ICC.
In fact, the rsync daemon at the ICC will not allow you to write to any of the pools there.

When the commands connect to the ICC rsync daemon, a password is supplied. The '''transferPools''' script uses the password found in the {{{$ICC_RSYNC_HOME/password}}} file. This file is created when you [[#install|install]] the software.

The daemon only accepts commands from ''allowed'' machines and networks. If you're having connection problems, run the command {{{transferPools test}}}. If nothing happens, check that your machine is allowed to connect to the ICC daemon.

The batch mode version of the command adds, updates and '''deletes''' files, removing local files that are no longer found at the ICC. The batch command gives you a mirror of the ICC site's pools as they change over time. If you are going to alter or add files in a pool, it's best to make your own working copy.

On the other hand, if you get files interactively, files are only added and updated; they are never deleted. So the interactive commands give you your own view of a pool, while the batch command provides a mirror copy of the ICC's pools.

If there are symbolic links ''within'' the pool directory hierarchy at the ICC, they're copied to the local pool.

<<Anchor(install)>>
= Software Installation =
 1. The ICC rsync daemon is secure, so before you begin, contact your ICC.
  a. You need to give them:
   1. The IP addresses of the computers or subnets that will access their rsync daemon.
   1. The names and passwords of users that will connect to the ICC rsync daemon. (These are only used for connecting to the daemon and should not be the same values used for local login accounts.)
  a. By default, the '''transferPools''' script contacts the ICC rsync daemon using TCP port 44520. If your ICC uses a different port, set the environment variable {{{ICC_RSYNC_PORT}}} to that port number.
 1. For each computer running '''transferPools''' at your site:
  a. Make sure the machine has a copy of rsync by executing the command {{{rsync --version}}}.
    You should see something like this:
    {{{
> rsync --version
rsync version 2.6.3 protocol version 28
Copyright (C) 1996-2004 by Andrew Tridgell and others
Capabilities: 64-bit files, socketpairs, hard links, symlinks, batchfiles, inplace, IPv6, 64-bit system inums, 64-bit internal inums
}}}
 1. Create a directory named {{{icc-rsync}}}. Software, log files, and other administrative files go there.
  a. Save the absolute path, including ''icc-rsync'', in the environment variable {{{ICC_RSYNC_HOME}}}.
  a. Include {{{ICC_RSYNC_HOME}}} in your ''PATH'' so you can run the script from anywhere.
  a. Download a copy of '''transferPools''' from here and place it in the {{{icc-rsync}}} directory. Change the permissions so you, or everyone in your group, can read and execute it. Here's the user/group version of the command:
    {{{
> chmod 0550 transferPools
}}}
 1. Change to the {{{icc-rsync}}} directory, then create and edit a file named {{{password}}}. Enter the password, but not the user name, given to you by your ICC that allows you to access their rsync daemon. Use just one password per ''icc-rsync'' installation. Secure the password file. If you're the only user, the command is:
    {{{
> chmod 0400 password
}}}
    If your group is using the password, the command is:
    {{{
> chmod 0440 password
}}}

== Testing Your Configuration ==
Make sure the basic configuration works, which includes:
 1. Connectivity to the ICC rsync daemon:
  a. No problems with firewalls.
  a. Password file in place and containing the correct password.
  a. ICC daemon properly defined in the script and running at the ICC.
  a. Proper TCP port number used.
 1. The environment variable {{{ICC_RSYNC_HOME}}} defined locally.
 1. {{{$ICC_RSYNC_HOME}}} included in your {{{PATH}}} environment variable setting.
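The local items in this checklist (the {{{ICC_RSYNC_HOME}}} variable, your {{{PATH}}}, the password file and the script) can be checked before you contact the daemon at all. Here is a minimal sketch of such a check. {{{check_icc_rsync_setup}}} is a hypothetical helper, not part of '''transferPools''', and it assumes the layout described above, with the script and the {{{password}}} file directly in {{{$ICC_RSYNC_HOME}}}. It does not test firewalls, the port, or the daemon itself; only {{{transferPools test}}} can do that.

{{{
#!/usr/bin/env bash
# Hypothetical helper (not part of transferPools): check_icc_rsync_setup
# Checks only the local items of the configuration checklist:
#   - ICC_RSYNC_HOME is set to an existing directory
#   - ICC_RSYNC_HOME is on PATH
#   - the password file exists and is not readable by "other" users
#   - transferPools is present and executable
# Returns 0 if everything looks fine, non-zero otherwise.
check_icc_rsync_setup() {
  local problems=0

  if [ -z "${ICC_RSYNC_HOME:-}" ] || [ ! -d "$ICC_RSYNC_HOME" ]; then
    echo "ICC_RSYNC_HOME is not set to an existing directory" >&2
    return 1
  fi

  # Wrap PATH in colons so first and last entries match as well.
  case ":$PATH:" in
    *":$ICC_RSYNC_HOME:"*) ;;
    *) echo "ICC_RSYNC_HOME is not in PATH" >&2
       problems=$((problems + 1)) ;;
  esac

  local pw="$ICC_RSYNC_HOME/password"
  if [ ! -f "$pw" ]; then
    echo "missing $pw" >&2
    problems=$((problems + 1))
  elif [ -n "$(find "$pw" -perm -004 -print)" ]; then
    # -perm -004: the read bit for "other" is set.
    echo "$pw is readable by others; use chmod 0400 or 0440" >&2
    problems=$((problems + 1))
  fi

  if [ ! -x "$ICC_RSYNC_HOME/transferPools" ]; then
    echo "transferPools is missing or not executable" >&2
    problems=$((problems + 1))
  fi

  [ "$problems" -eq 0 ]
}
}}}

After logging in again you could run {{{check_icc_rsync_setup && transferPools test}}}, so that a local mistake is reported before any network problem can hide it.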
To check all of this, log out, log in again, and run the following command from your home directory:
{{{
> transferPools test
Connecting using TCP port 44520 to nhsc@pacs1.mpe-garching.mpg.de
ICC_RSYNC_HOME: /local/home/pacspools/icc-rsync
Module list:
pools Sub-ICC's get copies of Pacs Product Pools
}}}
If you don't get something like this, go through the installation steps again.

To gain a little experience, try the same command, but this time add the name of one of the modules in the list you just got. For example:
{{{
> transferPools test pools
Connecting using TCP port 44520 to nhsc@pacs1.mpe-garching.mpg.de
ICC_RSYNC_HOME: /local/home/pacspools/icc-rsync
File list:
drwxr-x--- 4096 2007/08/14 02:36:01 .
drwxr-x--- 4096 2007/07/30 07:01:07 simple.pacs_calibration_products
drwxr-x--- 4096 2007/08/16 02:28:15 simple.pacs_standard_products_fmilt
drwxr-x--- 4096 2007/08/02 05:40:36 simple.standard
}}}
You're now set up for interactive use of '''transferPools'''. See the [[#examples|Examples]] section to get started. In the next section, we configure '''transferPools''' to run in batch mode.

<<Anchor(batch)>>
== Configuring transferPools as a Batch Job ==
Use a batch job to mirror all, or part, of an ICC data pool module. You can run the script interactively with one of the ''keyword'' commands. When you don't supply a keyword, the script is designed to run as a CRON job that executes at regular time intervals in the background.

Start by setting up the '''crontab''' entry. The syntax for a command that runs once per hour, on the hour, is:
{{{
00 * * * * transferPools <module>[/<path>] <local path>
}}}
Run {{{crontab -e}}}, which opens your crontab in a text editor (on many systems '''vi''' by default), and insert one or more entries. When done, save and exit; in '''vi''' the command is {{{:x}}}.

Here's an example that runs the ''transferPools'' script once an hour, on the hour. It synchronizes your data pool set with those in the ICC rsync module.
By synchronize, we mean it adds, updates and deletes files, so that your copy is the same as the one at the ICC. For the example, we'll use the ''PV'' module data pools. The last argument is the local directory where rsync copies the module.
{{{
> crontab -e
00 * * * * transferPools PV /pacs/pools/pv
:x
}}}
Different instances of transferPools can run different modules. Suppose that PV data is separated into a sequence of modules, PV1, PV2, and so on. By defining multiple instances of the script, we can run rsync for each module, or for just some of them. (A script instance is made unique by the module name associated with it. This is important, as we'll see shortly.) In the following example, the instances are started at staggered times: 0, 10 and 20 minutes past the hour. Each instance runs once per hour.
{{{
> crontab -e
00 * * * * transferPools PV1 /pacs/pools/pv_1
10 * * * * transferPools PV2 /pacs/pools/pv_2
20 * * * * transferPools PV3 /pacs2/pools/pv_3
:x
}}}
You can view the commands you have in crontab with this command:
{{{
> crontab -l
}}}
To remove all of your crontab entries at once, use this:
{{{
> crontab -r
}}}
Once the commands are defined in '''crontab''', your system runs them at the specified times.

=== Batch Job Administrative Support ===
When you run transferPools as a CRON job, it creates a directory with the name of the module you supplied with the command. The {{{icc-rsync}}} directory structure looks like this:
{{{
icc-rsync
 |
 |-password
 |
 |-<module>
    |
    |-logs
    |   |
    |   |-logDay
    |   |-Sunday
    |   |-Monday
    |   |-...
    |
    |-writeLock
}}}
We've already described the {{{password}}} file. Each running command creates a subdirectory using the module name supplied with the command. The subdirectory contains:
 1. '''logs/''' A directory where the script's log files are kept. There's a log file for each day of the week. After seven days, the oldest log file is overwritten. Each time the script runs, it makes an entry in the current day's log file.
 1. '''logDay''' A file within the logs directory. Don't change it. It contains the name of the current day. The script uses it to determine when to switch log files.
 1. '''writeLock''' This file only exists while the rsync command executes. It contains the script's PID. Another instance of the script for the same module won't start as long as the writeLock file exists. (You can also test for the existence of the file yourself. You may not want to copy files while rsync is copying and updating files from the ICC.)

'''Tip:''' If you copy pools to another location, check for the {{{writeLock}}} file before starting. This doesn't guarantee that you won't copy files while the CRON job is running, since it could start again while you're still copying, but in practice it should work if you do the following:
 a. Check for the {{{writeLock}}} file and, if it's not there, start to copy.
 a. Once finished, immediately check for the {{{writeLock}}} file again.
  1. If it's not there, your copy is fine.
  1. If it is there, wait until it disappears and then start over.

=== Cron Job Log Files ===
Each log file starts with the day's header. Each time the script runs during the day, a new, time-stamped entry is logged. Here's an example of the beginning of log file ''Friday.log'' for 24 August 2007. (The date format is {{{YYYYMMDDThhmm}}}.)
{{{
===================================================
TransferPools log file for Friday, 20070824T1639
Products located in: /pacs/PacsProductPools/pools
===================================================
***** 20070824T1639: starting rsync *****
building file list ... done
sent 21 bytes received 20 bytes 82.00 bytes/sec
total size is 0 speedup is 0.00
20070824T1639: PACS product pool transfer rsync error 23.
***** 20070824T1646: starting rsync *****
receiving file list ... done
./
20070824T1646: PACS product pool transfer rsync error 20.
***** 20070824T1744: starting rsync *****
receiving file list ... done
./
simple.pacs_calibration_products/
simple.pacs_standard_products_fmilt/
simple.pacs_standard_products_fmilt/herschel.ia.dataset.Product/
simple.pacs_standard_products_fmilt/herschel.ia.dataset.Product.attrib
simple.pacs_standard_products_fmilt/herschel.ia.dataset.Product.meta
simple.pacs_standard_products_fmilt/herschel.ia.dataset.Product/0
}}}
Each file transferred is listed in the log. (This means the log files can grow quite large. Each one is overwritten after a week, so there's a limit to the growth, but the size can still be substantial.)

Error messages are reported in the log and sent to the list of email addresses defined in the transferPools script; see the [[#install|Software Installation]] section for more about that.

<<Anchor(examples)>>
= Examples =
Display the command syntax.
{{{
> transferPools
Usage:
 transferPools
 transferPools list [<module>[/<path>]]
 transferPools rlist [<module>[/<path>]]
 transferPools get <module>[/<path>] <local path>
 transferPools test [<module>[/<path>] [<local path>]]
 transferPools <module>[/<path>] <local path>
}}}
Run the configuration test for interactive use.
{{{
> transferPools test
Connecting using TCP port 44520 to nhsc@pacs1.mpe-garching.mpg.de
ICC_RSYNC_HOME: /local/home/pacspools/icc-rsync
Module list:
pools Sub-ICC's get copies of Pacs Product Pools
}}}
List modules.
{{{
> transferPools list
pools Sub-ICC's get copies of Pacs Product Pools
}}}
List what's in a module.
{{{
> transferPools list pools
drwxr-x--- 4096 2007/08/14 02:36:01 .
drwxr-x--- 4096 2007/07/30 07:01:07 simple.pacs_calibration_products
drwxr-x--- 4096 2007/08/16 02:28:15 simple.pacs_standard_products_fmilt
drwxr-x--- 4096 2007/08/02 05:40:36 simple.standard
}}}
List what's in a module directory.
{{{
> transferPools list pools/simple.pacs_standard_products_fmilt/*
drwxr-x--- 200704 2007/08/25 07:02:19 herschel.ia.dataset.Product
-rwxr-x--- 1465587 2007/08/25 07:02:19 herschel.ia.dataset.Product.attrib
-rwxr-x--- 14328512 2007/08/25 07:02:19 herschel.ia.dataset.Product.meta
...
}}}
Recursively list a pool's contents, including the metadata and attribute files.
{{{
> transferPools rlist pools/simple.pacs_standard_products_fmilt/herschel.ia.dataset.Product* | less
drwxr-x--- 200704 2007/08/25 07:02:19 herschel.ia.dataset.Product
-rwxr-x--- 1465587 2007/08/25 07:02:19 herschel.ia.dataset.Product.attrib
-rwxr-x--- 14328512 2007/08/25 07:02:19 herschel.ia.dataset.Product.meta
-rwxr-x--- 42916 2007/08/14 02:36:01 herschel.ia.dataset.Product/0
-rwxr-x--- 44004 2007/08/14 02:36:03 herschel.ia.dataset.Product/1
...
}}}
Get the pool we just listed and write the results to /tmp/pacs. If there are updates later, we can use the same command again. New files will be added, changed files updated, and deleted files removed.
{{{
> transferPools get pools/simple.pacs_standard_products_fmilt/herschel.ia.dataset.Product* /tmp/pacs
> ls -R /tmp/pacs
/tmp/pacs:
herschel.ia.dataset.Product herschel.ia.dataset.Product.attrib herschel.ia.dataset.Product.meta

/tmp/pacs/herschel.ia.dataset.Product:
0 100 10001 10004 10007 1001 10012 10015 10018 10020 10023 10026 10029 10031 10034 10037
1 1000 10002 10005 10008 10010 10013 10016 10019 10021 10024 10027 1003 10032 10035 10038
10 10000 10003 10006 10009 10011 10014 10017 1002 10022 10025 10028 10030 10033 10036 10039
...
}}}
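The {{{writeLock}}} tip from the Batch Job Administrative Support section can be scripted when you copy a mirrored pool elsewhere. This is a minimal sketch, not part of '''transferPools''': {{{copy_pool_snapshot}}} is a hypothetical function name, and the lock path you pass it is assumed to be {{{$ICC_RSYNC_HOME/<module>/writeLock}}}, as in the directory layout shown earlier. It has the same limitation as the tip itself.

{{{
#!/usr/bin/env bash
# Hypothetical helper (not part of transferPools):
#   copy_pool_snapshot LOCKFILE SRC DEST [MAX_TRIES] [POLL_SECONDS]
# Follows the writeLock tip: wait until LOCKFILE is gone, copy SRC into DEST,
# then check LOCKFILE again; if the CRON job started meanwhile, start over.
# Like the tip, it cannot notice a CRON run that starts AND finishes during
# the copy. DEST should be new or empty: files left over from an earlier
# attempt are not removed.
copy_pool_snapshot() {
  local lock="$1" src="$2" dest="$3"
  local max_tries="${4:-5}" poll="${5:-60}"
  local try=1

  while [ "$try" -le "$max_tries" ]; do
    # Step a: don't start while rsync is updating the local mirror.
    while [ -e "$lock" ]; do
      sleep "$poll"
    done

    mkdir -p "$dest" || return 1
    # "$src/." copies the directory's contents, including hidden files.
    cp -R "$src/." "$dest/" || return 1

    # Step b: the lock is still absent, so treat the copy as fine.
    if [ ! -e "$lock" ]; then
      return 0
    fi
    echo "writeLock appeared during copy (attempt $try), starting over" >&2
    try=$((try + 1))
  done
  return 1
}
}}}

For the PV example above, a call might look like {{{copy_pool_snapshot "$ICC_RSYNC_HOME/PV/writeLock" /pacs/pools/pv /scratch/pv-copy}}}. The retry limit keeps a run of back-to-back CRON jobs from looping forever.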
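When debugging a connection, it can help to see roughly which plain rsync command corresponds to each keyword. The sketch below only prints commands and is not taken from '''transferPools''', whose exact rsync options aren't documented on this page. The user and host default to the values in the {{{transferPools test}}} output above; {{{ICC_RSYNC_USER}}}, {{{ICC_RSYNC_HOST}}} and {{{icc_rsync_cmd}}} are hypothetical names introduced here. Following the Command Options section, it assumes that an interactive ''get'' only adds and updates files, while batch mode also deletes them ({{{--delete}}}).

{{{
#!/usr/bin/env bash
# Hypothetical sketch: print a plain rsync command roughly equivalent to a
# transferPools keyword. Nothing is executed.
ICC_RSYNC_USER="${ICC_RSYNC_USER:-nhsc}"                      # assumption
ICC_RSYNC_HOST="${ICC_RSYNC_HOST:-pacs1.mpe-garching.mpg.de}" # assumption

# Usage: icc_rsync_cmd list|rlist|get|batch [MODULE[/PATH]] [LOCAL_PATH]
icc_rsync_cmd() {
  local mode="$1" src="${2:-}" dest="${3:-}"
  local port="${ICC_RSYNC_PORT:-44520}"   # same default port as transferPools
  local url="rsync://$ICC_RSYNC_USER@$ICC_RSYNC_HOST:$port/$src"
  local base="rsync --password-file=$ICC_RSYNC_HOME/password"

  case "$mode" in
    # With no destination, rsync lists the source instead of copying it.
    list)  printf '%s\n' "$base $url" ;;
    rlist) printf '%s\n' "$base -r $url" ;;
    get|batch)
      if [ -z "$dest" ]; then
        echo "$mode needs a local path" >&2
        return 2
      fi
      if [ "$mode" = batch ]; then
        # Mirror: also delete local files that are gone at the ICC.
        printf '%s\n' "$base -a --delete $url $dest"
      else
        # Interactive get: add and update only.
        printf '%s\n' "$base -a $url $dest"
      fi ;;
    *) echo "unknown mode: $mode" >&2; return 2 ;;
  esac
}
}}}

For example, {{{icc_rsync_cmd batch PV /pacs/pools/pv}}} prints the command a mirror of the ''PV'' module might use; compare its options with what your installed copy of '''transferPools''' actually runs before relying on it.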