Copying Pools with Rsync

General

* [#install Software installation]

* [#batch Configuring as a Batch Job]

* [#examples Examples]

* Server administrators should refer to [wiki:PACS/RsyncAdmin Data Pool Rsync Administration]

Overview

Level 0 product pools are created from data queried from the ICC's Versant database. They are stored on a computer that is accessible to other internal and external nodes, including machines at Sub-ICC institutes. The ICC, where the pools are created and maintained, runs an rsync daemon with access to all product pools. A client rsync installation is used to copy the pools.

Sets of related product pools are stored in rsync modules, which are given names that describe the data pools they hold. For example, the module "PV" would contain all PV pools. Using the module name, you can list or get all of its files, or particular sets of them. Module names can also identify subsets of data pools. For example, you might have names like PV<obsid>,... or PV<sequence number>,... or PV<start date>,....
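On the client side, rsync addresses a daemon module with a URL of the form rsync://[USER@]HOST[:PORT]/MODULE[/PATH]; a URL that stops after the host (rsync://HOST:PORT/) asks the daemon for its module list instead. Assuming transferPools addresses modules in this way (this page does not say so), the hypothetical helper icc_rsync_url below shows how a module name, or a path within it, becomes a source URL. The host icc.example.org is a placeholder.

```shell
#!/bin/sh
# Hypothetical helper, not part of transferPools: turn a module name
# (optionally followed by a path inside the module) into an rsync daemon URL.
icc_rsync_url() {
    # $1 = daemon user, $2 = daemon host, $3 = TCP port, $4 = module[/path]
    printf 'rsync://%s@%s:%s/%s\n' "$1" "$2" "$3" "$4"
}

# The whole "pools" module, then a single pool directory inside it.
icc_rsync_url nhsc icc.example.org 44520 pools
icc_rsync_url nhsc icc.example.org 44520 pools/simple.standard
```

Passing such a URL to rsync with no destination, for example rsync --list-only rsync://nhsc@icc.example.org:44520/pools/, prints the same kind of listing shown for transferPools list in the Examples section.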

The transferPools Script

transferPools is a script used for listing and transferring copies of pools. It can be used in two ways:

  1. Interactively on the command line, to:
    1. Display command usage.
    2. List the names and descriptions of the rsync modules in which data pools reside.
    3. List the files in a module.
    4. Copy and update files in a module.
  2. As a CRON job run at regular intervals, copying and updating files as data pools are created and updated at the ICC.

Command Syntax

transferPools Display command usage and exit.

transferPools test [<module>[<path>] <local path>]

  1. transferPools test Connects to the daemon and prints the list of modules. Use this to make sure you have no network difficulties and that the daemon is running.

  2. transferPools test [<module>[<path>] <local path>] Creates the supporting file structure used by a CRON job. It also tests the rsync command without copying any data. See [#batch Configuring as a Batch Job] for details.

transferPools list [<module>[/<path within module>]] List the files in a pool. If a path is included as well, the listing is limited to the items at that path.

  1. transferPools list List the names and descriptions of the data pool modules.

  2. transferPools list <module> List the files and directories directly under this module.

  3. transferPools list <module>/<path within module> List only the items at that path level.

transferPools rlist [<module>[/<path within module>]] Works like the list option, but recursively displays all sub-directories and files below the specified module and path.

transferPools get <module>[<path>] <local path> Use this command to make a single copy of all the files in a module, or only those under the path if one is also specified. The local path is the root directory under which sub-directories and files are written. The copy is recursive.

transferPools <module>[<path>] <local path> Use this form as a CRON job. Along with copying the files, it sets up logging and prevents itself from running again if its previous invocation is still running. See [#batch Configuring as a Batch Job] for details.
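The command forms above can be summarized as a small dispatcher. The sketch below is hypothetical: transfer_mode and rsync_opts_for are illustrative names, accepting copy as an alias for get is an assumption, and the rsync options chosen for each mode are guesses based on the descriptions on this page (list one level, list recursively, mirror with deletions, name each file in the log, test without copying data), not the script's actual settings.

```shell
#!/bin/sh
# Hypothetical sketch, not the real transferPools: classify a command line
# into a mode, then print rsync options that could implement that mode.

transfer_mode() {
    case "$1" in
        "")       echo usage ;;
        test)     echo test ;;
        list)     echo list ;;
        rlist)    echo rlist ;;
        get|copy) # "copy" is accepted here as an alias (an assumption)
                  if [ $# -eq 3 ]; then echo get; else echo usage; fi ;;
        *)        # no keyword: "<module>[<path>] <local path>" is cron mode
                  if [ $# -eq 2 ]; then echo cron; else echo usage; fi ;;
    esac
}

rsync_opts_for() {
    case "$1" in
        list)  printf '%s\n' "--list-only" ;;                   # one level only
        rlist) printf '%s\n' "--list-only --recursive" ;;       # whole subtree
        get)   printf '%s\n' "--archive --delete" ;;            # add, update, remove
        cron)  printf '%s\n' "--archive --delete --verbose" ;;  # --verbose names each file
        test)  printf '%s\n' "--archive --delete --dry-run" ;;  # copies no data
        *)     return 1 ;;                                      # usage: nothing to run
    esac
}

mode=$(transfer_mode get pools/simple.standard /tmp/pacs)
printf '%s: rsync %s\n' "$mode" "$(rsync_opts_for "$mode")"
```

Keeping the classification separate from the option table makes it easy to check which form a given command line selects before anything contacts the daemon.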

Anchor(install)

Software Installation

  1. Before you begin, contact your ICC.
    1. You need to give them the following information:
      1. The IP addresses of the computers or subnets that will access the ICC rsync daemon.
      2. The names and passwords of users that will connect to the ICC rsync daemon. (These are only used for connecting to the daemon and should not be the same as local login accounts.)
    2. By default, the transferPools script connects to the ICC rsync daemon on TCP port 44520. If your ICC uses a different port, set the environment variable ICC_RSYNC_PORT to that port number.

  2. Download the ICC RSYNC tar file. When you untar the files, they are placed in a directory named icc-rsync.

    • tar xvf icc-rsync.tar
  3. Set the environment variable ICC_RSYNC_HOME to the absolute path of the icc-rsync directory (including icc-rsync itself).

    1. Add ICC_RSYNC_HOME to your PATH so you can run the script from anywhere.

  4. Change to the icc-rsync directory and edit the file password. Enter the password (not the user name) that you agreed on with your ICC in step 1. Make sure the permissions on this file are read/write for the user only.

  5. Build the rest of the ICC RSYNC file structure and test your connection to the ICC's rsync daemon by running the command:
    • transferPools test
    • Running the command should return the ICC's module list. If you get no list, make sure you can contact the daemon. If you can and still get no list, contact your ICC.
    • As a further test, run the script again, this time following the keyword test with the name of one of the modules returned. This creates some additional files and a new directory under icc-rsync/.

      transferPools test <module name>
      1. <icc-rsync path>/<module>/logs/ The directory containing the rsync log files. There is a log file named for each day of the week. After seven days, the old log file is overwritten.

      2. <icc-rsync path>/<module>/logs/logDay Contains the name of the current day. When the day changes, the script writes the new day to this file and starts a new log file for that day.

      3. <icc-rsync path>/password Contains the password sent to the rsync server. Only include the password, not the user name.

      4. <icc-rsync path>/<module>/writeLock Created when the script starts. It contains the PID of the script instance and is removed just before the script exits. It keeps another instance of the script from running while one is already running. If an invocation of the script is skipped, this is noted in the daily log.
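The installation steps come down to a few environment settings and files. A minimal sketch, assuming a POSIX shell: the commented export lines show a typical profile setup (the path is the one printed by transferPools test in the Examples section; use your own), icc_rsync_setup is a hypothetical function that creates the per-module logs directory and applies the password-file permissions described above, and icc_rsync_port shows the default-port rule. This is not the code transferPools runs.

```shell
#!/bin/sh
# Typical profile settings (the path is an example; use your own):
#   export ICC_RSYNC_HOME=/local/home/pacspools/icc-rsync
#   export PATH="$ICC_RSYNC_HOME:$PATH"

# Hypothetical: create the per-module directories that "transferPools test
# <module>" is described as creating, and restrict the password file.
icc_rsync_setup() {
    # $1 = module name
    if [ -z "$ICC_RSYNC_HOME" ]; then
        echo "ICC_RSYNC_HOME is not set" >&2
        return 1
    fi
    mkdir -p "$ICC_RSYNC_HOME/$1/logs" || return 1
    # touch keeps a password already entered; chmod 600 = owner read/write only
    touch "$ICC_RSYNC_HOME/password" && chmod 600 "$ICC_RSYNC_HOME/password"
}

# Port for the ICC daemon: ICC_RSYNC_PORT if set, otherwise 44520.
icc_rsync_port() {
    printf '%s\n' "${ICC_RSYNC_PORT:-44520}"
}
```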

Anchor(batch)

Configuring as a Batch Job

You can run the script interactively with one of the commands that takes a keyword. When you don't supply a keyword, the script runs as a CRON job, executing in the background at regular intervals. Use this method to mirror one or more ICC pool modules at your site.

Start by setting up the crontab entry. The syntax is:

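00 * * * * <path to script>/transferPools <module>[<path>] <local path>

For example, to copy the pv module at the start of every hour (the editor shown is vi, where :x saves and exits):
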
> crontab -e
00 * * * * transferPools pv /pacs/pools/pv
:x<return>

Different instances of transferPools can be run for different module data pools. Suppose that PV data is separated into several modules, pv-1, pv-2,.... You can run an rsync instance for each module, or for just some of them, by defining multiple instances of the script. (An instance is made unique by the module name associated with it.)

In the following example, the instances are started at staggered times: 0, 10 and 20 minutes past the hour. Each one runs once per hour.

> crontab -e
00 * * * * transferPools pv-1 /pacs/pools/pv-1
10 * * * * transferPools pv-2 /pacs/pools/pv-2
20 * * * * transferPools pv-3 /pacs2/pools/pv-3
:x<return>
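cron usually starts jobs with a minimal environment, so PATH and ICC_RSYNC_HOME from your login profile may not be set; that is why the syntax line uses the full path to the script. The hypothetical helper cron_lines below prints one staggered entry per module with an absolute script path, so the lines can be checked before being pasted into crontab -e. Its name, its arguments and the 10-minute step are illustrative only. Many cron implementations also accept variable assignments such as ICC_RSYNC_HOME=... at the top of the crontab; check crontab(5) on your system.

```shell
#!/bin/sh
# Hypothetical helper: print one hourly crontab line per module, starting
# each instance STEP minutes after the previous one.
cron_lines() {
    # $1 = absolute path to transferPools, $2 = local root directory,
    # $3 = step in minutes, remaining arguments = module names
    script=$1 root=$2 step=$3
    shift 3
    minute=0
    for module in "$@"; do
        if [ "$minute" -gt 59 ]; then
            echo "too many modules for a $step-minute step" >&2
            return 1
        fi
        # Each module is mirrored to <local root>/<module>.
        printf '%02d * * * * %s %s %s/%s\n' "$minute" "$script" "$module" "$root" "$module"
        minute=$((minute + step))
    done
}

cron_lines /local/home/pacspools/icc-rsync/transferPools /pacs/pools 10 pv-1 pv-2 pv-3
```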

You can view the commands you have in crontab with this command:

> crontab -l

To remove all of your crontab entries (not only the transferPools ones), use this:

> crontab -r
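Because crontab -r removes the whole crontab, you may prefer to drop a single module's entry. One common approach, sketched here under the assumption that your crontab command reads a new table from standard input (crontab - on most systems), is to filter the current table and reinstall it. drop_module_entry is a hypothetical name; it removes lines containing "transferPools <module> " as a fixed string.

```shell
#!/bin/sh
# Hypothetical filter: copy crontab lines from stdin to stdout, leaving out
# any line that runs transferPools for the given module.
drop_module_entry() {
    # $1 = module name; -F matches it literally, and the trailing space
    # keeps "pv-1" from also matching "pv-10".
    grep -vF "transferPools $1 "
}

# Review the result first, then install it (this replaces the crontab):
#   crontab -l | drop_module_entry pv-2
#   crontab -l | drop_module_entry pv-2 | crontab -
```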

Batch Job Administrative Support

When you run transferPools as a CRON job, it creates a directory named after the module associated with that instance of transferPools. Other files and directories are created within the module directory:

icc-rsync
   |
   |-password
   |
   |-<module name>
          |
          |-logs
          |   |
          |   |-logDay
          |   |-Sunday
          |   |-Monday
          |   |-...
          |
          |-writeLock
  1. password File containing the password supplied to the ICC rsync daemon. Protect this file: make it read/write for the user only.

  2. logs/ A directory where the script's log files are kept. There's a log file for each day of the week. After seven days, the oldest log file is overwritten. Each time the script is run, it makes an entry in the current day's log file.

  3. logDay This is a file within the logs directory. Don't change it. It contains the name of the current day. The script uses it to determine when to switch log files.

  4. writeLock This file exists only while the rsync command executes. It contains the script's PID. Another instance of the script for the same module won't start as long as the writeLock file exists. (You can also test for the file yourself: you may not want to copy files while rsync is copying and updating them from the ICC.)
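The writeLock behaviour can be illustrated with the shell's noclobber option: with set -C, a > redirection fails if the file already exists, so checking for the lock and creating it happen in a single step. This is a minimal sketch under that assumption, not the script's own code; acquire_lock and release_lock are hypothetical names. In this sketch, a lock left behind by a killed run stays in place until it is removed by hand.

```shell
#!/bin/sh
# Hypothetical writeLock guard modelled on the description above.
# With noclobber (set -C) the redirection fails if the file already exists.
acquire_lock() {
    # $1 = lock file, e.g. "$ICC_RSYNC_HOME/<module>/writeLock"
    # $$ is the PID of the calling shell, even inside the ( ) subshell.
    ( set -C; printf '%s\n' "$$" > "$1" ) 2>/dev/null
}

release_lock() {
    rm -f "$1"
}

# Possible use in a cron-mode run (module name and log path are examples):
#   lock="$ICC_RSYNC_HOME/pv/writeLock"
#   if ! acquire_lock "$lock"; then
#       echo "$(date +%Y%m%dT%H%M): skipped, PID $(cat "$lock") still running" >> "$log"
#       exit 0
#   fi
#   trap 'release_lock "$lock"' EXIT
```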

Cron Job Log Files

Each log file starts with the day's header. Each time the script runs during the day, a new, time-stamped entry is logged. Here's an example of the beginning of log file Friday.log for 24 August 2007. (The date format is: YYYYMMDDThhmm.)

        ===================================================
        TransferPools log file for Friday, 20070824T1639
        Products located in: /pacs/PacsProductPools/pools
        ===================================================

        ***** 20070824T1639: starting rsync *****
        building file list ... done

        sent 21 bytes  received 20 bytes  82.00 bytes/sec
        total size is 0  speedup is 0.00
        20070824T1639: PACS product pool transfer rsync error 23.

        ***** 20070824T1646: starting rsync *****
        receiving file list ... done
        ./
        20070824T1646: PACS product pool transfer rsync error 20.

        ***** 20070824T1744: starting rsync *****
        receiving file list ... done
        ./
        simple.pacs_calibration_products/
        simple.pacs_standard_products_fmilt/
        simple.pacs_standard_products_fmilt/herschel.ia.dataset.Product/
        simple.pacs_standard_products_fmilt/herschel.ia.dataset.Product.attrib
        simple.pacs_standard_products_fmilt/herschel.ia.dataset.Product.meta
        simple.pacs_standard_products_fmilt/herschel.ia.dataset.Product/0

Each file transferred is listed in the logs. (This means the log files can grow quite large. Each one is overwritten after a week, which limits their growth, but their size can still be substantial.) Error messages are reported in the log and sent to the list of email addresses defined in the transferPools script; see the [#install Software Installation] section for more about that.
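The log handling can be sketched as follows. This is an illustration under stated assumptions, not the script: open_day_log is a hypothetical function that starts a new day-of-week file when the name stored in logDay changes (files are named after the day, as in the directory tree shown earlier, and English day names are forced with LC_ALL=C), and writes a header in the same style as the sample. rsync_exit_text covers the two exit codes seen in the sample log, 20 and 23, plus a few others; the meanings come from the EXIT VALUES section of the rsync manual page.

```shell
#!/bin/sh
# Hypothetical sketch of day-of-week log rotation and rsync error reporting.

# $1 = logs directory, $2 = product path shown in the header.
# Prints the path of today's log file.
open_day_log() {
    logs=$1
    day=$(LC_ALL=C date +%A)             # English day name, e.g. Friday
    log="$logs/$day"
    if [ "$(cat "$logs/logDay" 2>/dev/null)" != "$day" ]; then
        # The day changed: remember it and overwrite last week's file.
        printf '%s\n' "$day" > "$logs/logDay"
        {
            echo "==================================================="
            echo "TransferPools log file for $day, $(date +%Y%m%dT%H%M)"
            echo "Products located in: $2"
            echo "==================================================="
        } > "$log"
    fi
    printf '%s\n' "$log"
}

# Meaning of selected rsync exit codes (from the rsync manual page).
rsync_exit_text() {
    case "$1" in
        0)  echo "success" ;;
        20) echo "received SIGUSR1 or SIGINT" ;;
        23) echo "partial transfer due to error" ;;
        24) echo "partial transfer due to vanished source files" ;;
        30) echo "timeout in data send/receive" ;;
        *)  echo "see EXIT VALUES in rsync(1)" ;;
    esac
}

# Possible use in cron mode (module name and paths are examples):
#   log=$(open_day_log "$ICC_RSYNC_HOME/pv/logs" /pacs/pools/pv)
#   echo "***** $(date +%Y%m%dT%H%M): starting rsync *****" >> "$log"
#   rsync --archive --delete --verbose "$src" "$dest" >> "$log" 2>&1
#   status=$?
#   [ "$status" -eq 0 ] || echo "$(date +%Y%m%dT%H%M): PACS product pool transfer rsync error $status ($(rsync_exit_text "$status"))." >> "$log"
```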

Anchor(examples)

Examples

Display the command syntax.

> transferPools 
Usage: transferPools
       transferPools list [<module>[/<path within module>]]
       transferPools rlist [<module>[/<path within module>]]
       transferPools get <module>[<path within module>] <local path>
       transferPools test <module>[<path within module>] <local path>
       transferPools <module>[<path within module>] <local path>

Run the configuration test for interactive use.

> transferPools test
Connecting using TCP port 44520 to nhsc@pacs1.mpe-garching.mpg.de
ICC_RSYNC_HOME: /local/home/pacspools/icc-rsync
Module list:
pools           Sub-ICC's get copies of Pacs Product Pools

List modules.

> transferPools list
pools           Sub-ICC's get copies of Pacs Product Pools

List what's in a module.

> transferPools list pools
drwxr-x---        4096 2007/08/14 02:36:01 .
drwxr-x---        4096 2007/07/30 07:01:07 simple.pacs_calibration_products
drwxr-x---        4096 2007/08/16 02:28:15 simple.pacs_standard_products_fmilt
drwxr-x---        4096 2007/08/02 05:40:36 simple.standard

List what's in a module directory.

> transferPools list pools/simple.pacs_standard_products_fmilt/*
drwxr-x---      200704 2007/08/25 07:02:19 herschel.ia.dataset.Product
-rwxr-x---     1465587 2007/08/25 07:02:19 herschel.ia.dataset.Product.attrib
-rwxr-x---    14328512 2007/08/25 07:02:19 herschel.ia.dataset.Product.meta
...

Recursively list a pool's contents, including the metadata and attribute files.

> transferPools rlist pools/simple.pacs_standard_products_fmilt/herschel.ia.dataset.Product* | less
drwxr-x---      200704 2007/08/25 07:02:19 herschel.ia.dataset.Product
-rwxr-x---     1465587 2007/08/25 07:02:19 herschel.ia.dataset.Product.attrib
-rwxr-x---    14328512 2007/08/25 07:02:19 herschel.ia.dataset.Product.meta
-rwxr-x---       42916 2007/08/14 02:36:01 herschel.ia.dataset.Product/0
-rwxr-x---       44004 2007/08/14 02:36:03 herschel.ia.dataset.Product/1
...

Get the pool we just listed. Write the results to /tmp/pacs. If there are updates later, we can use the same command again. New files will be added, changed files updated, and deleted files removed.

> transferPools get pools/simple.pacs_standard_products_fmilt/herschel.ia.dataset.Product* /tmp/pacs
> ls -R /tmp/pacs
/tmp/pacs:
herschel.ia.dataset.Product  herschel.ia.dataset.Product.attrib  herschel.ia.dataset.Product.meta

/tmp/pacs/herschel.ia.dataset.Product:
0   100    10001  10004  10007  1001   10012  10015  10018  10020  10023  10026  10029  10031  10034  10037
1   1000   10002  10005  10008  10010  10013  10016  10019  10021  10024  10027  1003   10032  10035  10038
10  10000  10003  10006  10009  10011  10014  10017  1002   10022  10025  10028  10030  10033  10036  10039
...
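Because the get form behaves like a mirror (new files added, changed files updated, deleted files removed), it is probably close to an rsync call with --archive and --delete. The sketch below shows such a call; it is an assumption, not the script's code. ICC_RSYNC_USER and ICC_RSYNC_HOST are hypothetical variables standing in for the daemon account and host the script is configured with, and setting DRY_RUN prints the command instead of running it. The wildcard stays inside the quoted rsync URL, so it is expanded by the daemon rather than by your local shell.

```shell
#!/bin/sh
# Hypothetical sketch of a "get"-style mirror with plain rsync.
# ICC_RSYNC_USER and ICC_RSYNC_HOST are made-up names for this example.
run_get() {
    # $1 = <module>[/<path within module>] (may contain * wildcards)
    # $2 = local path
    # The words below are expanded (using the old $1 and $2) before set runs.
    set -- --archive --delete \
        --password-file="$ICC_RSYNC_HOME/password" \
        "rsync://$ICC_RSYNC_USER@$ICC_RSYNC_HOST:${ICC_RSYNC_PORT:-44520}/$1" "$2"
    if [ -n "$DRY_RUN" ]; then
        printf 'rsync %s\n' "$*"     # show the command instead of running it
    else
        rsync "$@"
    fi
}

ICC_RSYNC_HOME=/local/home/pacspools/icc-rsync
ICC_RSYNC_USER=nhsc
ICC_RSYNC_HOST=icc.example.org
DRY_RUN=1
run_get 'pools/simple.pacs_standard_products_fmilt/herschel.ia.dataset.Product*' /tmp/pacs
```

With DRY_RUN unset, repeated runs transfer only what has changed on the ICC side.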