Differences between revisions 5 and 6
Revision 5 as of 2007-08-29 23:48:44
Size: 8789
Editor: JohnRector
Comment:
Revision 6 as of 2007-08-30 18:45:17
Size: 10406
Editor: JohnRector
Comment:
Line 57: Line 57:
 1. Before you begin, contact your ICC.
    a. You need to give them the following information:

       1. The IP addresses of computers or subnets that will access the ICC's rsync daemon.
    
       1. The names and passwords for users that will connect to the daemon. (These names are only used for connecting to the daemon and should not be the same as your local login account.)

    a. By default, the script you use to contact the ICC's rsync daemon uses TCP port 44520. If your ICC uses a different port number, you need to set the port number with the environment variable ICC_RSYNC_PORT.

 1. Download the ICC RSYNC tar file. When you untar the files, they are placed in a directory named "icc-rsync"
       {{{
 tar xvf icc-rsync.tar
       }}}

 1. Save the absolute path, including ''icc-rsync'', in the environment variable ICC_RSYNC_ROOT.

    a. Include ICC_RSYNC_ROOT in your PATH so you can run the script from anywhere.

 1. Change to the icc-rsync directory and edit the file ''password''. Enter the passwords, but not the user names, that you agreed to with your ICC in step 1. Make sure the permissions on this file are user read/write only.

 1. Build the rest of the ICC RSYNC file structure and test your connection to the ICC's rsync daemon by running the command:
    {{{
    transferPools test
    }}}
    a. Running the command should result in the ICC's module list being returned. If you get no list, make sure you can contact the daemon. If you can, and you got no list, contact your ICC.

    a. As a further test, run the script again, but this time follow the keyword ''test'' with the name of one of the modules returned. This will create some additional files and a new directory under ''icc-rsync/''.
       {{{
       transferPools test <module name>
       }}}
       1. <icc-rsync path>/<module>/logs/:: The directory containing the rsync log files. There is a named log for each day of the week. After seven days, the old log file is overwritten.

       2. <icc-rsync path>/<module>/logs/logDay:: Contains the current day. When this changes, the script will write the new day to this file and start a new log file for the day.

       3. <icc-rsync path>/password:: Contains the password sent to the rsync server. Only include the password, not the user name.

       4. <icc-rsync path>/<module>/writeLock:: Created when the script is started. It contains the PID for the script instance. It's removed just before the script exits. It's used to keep another instance of rsync from running if one already exists. If an invocation of this script is skipped, it's noted in the daily log.
Line 59: Line 96:
1. Before you begin, contact the ICC. You need to give them the following information:
 a.The IP addresses of computers or subnets that will access the ICC's rsync daemon.
 a. The names and passwords of accounts that will connect to the daemon. (These names
          are just used for connecting to the daemon and should not be the same as your local login account.)
== Setting-Up a CRON Job ==
Line 64: Line 98:
1. You also need to get information from the ICC.
 a. What port is the rsync daemon listening on? A default value of 44520 is set in the transferPools script.
        a. Where is the list of pools that my rsync client can subscribe to? A defaultpool of "all" is set in the transferPools script.
   When you start the script, you can provide your own values for these arguments. (The syntax is shown later.)
You can run the script interactively with one of the commands that takes a keyword. When you don't supply a keyword, the script is designed to be run as a CRON job that executes at regular time intervals in the background. Use this method to mirror one or more ICC pool modules at your site.
Line 69: Line 100:
1. Download the tar file from here.

1. When you untar the files, they are placed in a directory named "icc-rsync"
 1. Start by setting-up the crontab entry. The syntax is:
Line 73: Line 102:
 tar xvf icc-rsync.tar  00 * * * * <path to script>/transferPools <pool>[<path>] <local path>
Line 75: Line 104:
1. Change to the icc-rsync directory and edit the script transferPools. Read the comments at the beginning of the file and then edit the lines in the section of the script titled ''Set these values for your site.'' Check to make sure the script is executable.

1. Edit the file password. Place the password, but not the user name, that this account will use when connecting to the ICC rsync daemon into a file names "password." Make sure the permissions on this file are user read/write only.

== Setting Up a CRON Job ==

You can run the script manually, but it's designed to be run as a CRON job that executes at regular intervals, say once per hour. Set-up the crontab entry with syntax like this:
{{{
 0 * * * * <path to script>/transferPools <pool>[<path>] <local path>
}}}
   You invoke crontab as you would the vi editor and make an entry. When done, exit as you would from vi with the command ''':x <return>'''. For example:
   You invoke crontab as you would the vi editor and insert one or more entries. When done, exit with the command ''':x <return>'''. Here's a complete example that runs the ''transferPools'' script once an hour, on the hour, retrieving any new or updated files in the ''pv'' module data pools:
Line 88: Line 107:
 00 * * * * /home/pacspools/icc-rsync/transferPools pv /pacs/pools/pv  00 * * * * transferPools pv /pacs/pools/pv
Line 91: Line 110:
Different instances of transferPools can be run for different modules. Suppose
that PV data is separated into several modules, pv-1, pv-2,.... We can run an rsync instance for each module. Instance are defined such that each copies items to a different location.
Different instances of transferPools can be run for different module data pools. Suppose
that PV data is separated into several modules, pv-1, pv-2,.... We can run an rsync instance for each module, or just some of them. Define multiple instances of the script. (A script instance is made unique by the module name associated with it.)
Line 110: Line 129:
=== Administrative Support === === CRON Job Administrative Support ===
Line 132: Line 151:
=== Log Files === === Cron Job Log Files ===

Copying Pools with Rsync

General

* Download the script used to copy pools.

* [#install Software installation]

* If you have questions, contact MailTo(John DOT Rector AT ipac DOT caltech DOT edu)

* Server administrators should refer to [wiki:PACS/RsyncAdmin Data Pool Rsync Administration]

Overview

Level 0 product pools are created from data queried from the ICC's Versant database and stored as product pools on a computer accessible by other internal and external nodes, including machines at Sub-ICC institutes. The ICC, where the pools are created and maintained, runs an rsync daemon that has access to all product pools. A client rsync installation is used to copy the pools.

Sets of related product pools are stored in rsync modules, which are given names that are meaningful for the data pools they hold. For example, the module named "PV" would contain all PV pools. Using this module name you could list or get all, or particular sets, of files. Module names are also used for subsets of data pools. For example, you might have names like "PV<obsid>,..." or "PV<sequence number>,..." or "PV<start date>,...".

transferPools

transferPools is a script used for making the copies. It can perform the following functions:

  1. List the rsync modules.
  2. List all or a hierarchical subset of files within a module.
  3. Copy and update all or a subset of files in a module.
  4. Run as a CRON job at regular intervals, copying and updating files as data pools are created and updated at the ICC.

Command Syntax

transferPools test:: Use this command to test your configuration. The command creates the sub-directories and files used for running transferPools as a CRON job and then runs the rsync command without actually copying any files. (You will learn more about what's created below.)

transferPools list :: List the names of the data pool modules and their descriptions.

transferPools list <module>[<path>] :: List the files within a pool. If a path is included as well, the listing is limited to the items in that path.

transferPools copy <module>[<path>] <local path> :: Use this command to make a single copy of all the files in the pool, or those under the module path if a path is specified. (By default, transferPools creates log files and adds other infrastructure support. The copy option prevents the infrastructure from being created and used. Pools are copied from the module to the local path destination you specify.)

transferPools <module>[<path>] <local path> :: Use this command to copy all data pools in a module as a CRON job that continues to run at frequent intervals.
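
For orientation, the commands above correspond roughly to plain rsync daemon-mode invocations. The following is only a sketch under stated assumptions, not the contents of the transferPools script: it prints the rsync command lines that such operations might use without running them. The host `icc.example.org`, the user `pooluser`, and the function name `show_rsync_cmd` are hypothetical. The `--port`, `--password-file`, and `--dry-run` options and the `host::module` syntax are standard rsync features.

```shell
#!/bin/sh
# Hypothetical connection details; none of these come from transferPools itself.
ICC_HOST="icc.example.org"
ICC_USER="pooluser"
ICC_RSYNC_PORT="${ICC_RSYNC_PORT:-44520}"
ICC_RSYNC_ROOT="${ICC_RSYNC_ROOT:-$HOME/icc-rsync}"

# Print (do not run) an rsync command line for the given operation.
#   show_rsync_cmd list [<module>[<path>]]
#   show_rsync_cmd test <module>[<path>] <local path>
#   show_rsync_cmd copy <module>[<path>] <local path>
show_rsync_cmd() {
    base="rsync --port=$ICC_RSYNC_PORT --password-file=$ICC_RSYNC_ROOT/password"
    remote="$ICC_USER@$ICC_HOST::"
    case "$1" in
        list) echo "$base $remote${2:-}" ;;
        test) echo "$base -av --dry-run $remote${2:-} ${3:-}" ;;
        copy) echo "$base -av $remote${2:-} ${3:-}" ;;
        *)    echo "unknown operation: $1" >&2; return 1 ;;
    esac
}

show_rsync_cmd list
show_rsync_cmd copy pv /pacs/pools/pv
```

Printing the command instead of executing it keeps the sketch safe to try on a machine that has no access to an ICC daemon.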

Running transferPools as a CRON Job

Examples

1. Display the command syntax.

> ./transferPools
Usage: transferPools test
       transferPools <module>
       transferPools list [<module>]
       transferPools get <module>[<path>] <local path>
       transferPools <module>[<path>] <local path>

Anchor(install)

Software Installation

  1. Before you begin, contact your ICC.
    1. You need to give them the following information:
      1. The IP addresses of computers or subnets that will access the ICC's rsync daemon.
      2. The names and passwords for users that will connect to the daemon. (These names are only used for connecting to the daemon and should not be the same as your local login account.)
    2. By default, the script you use to contact the ICC's rsync daemon uses TCP port 44520. If your ICC uses a different port number, you need to set the port number with the environment variable ICC_RSYNC_PORT.
  2. Download the ICC RSYNC tar file. When you untar the files, they are placed in a directory named "icc-rsync"
    •         tar xvf icc-rsync.tar
  3. Save the absolute path, including icc-rsync, in the environment variable ICC_RSYNC_ROOT.

    1. Include ICC_RSYNC_ROOT in your PATH so you can run the script from anywhere.
  4. Change to the icc-rsync directory and edit the file password. Enter the passwords, but not the user names, that you agreed to with your ICC in step 1. Make sure the permissions on this file are user read/write only.

  5. Build the rest of the ICC RSYNC file structure and test your connection to the ICC's rsync daemon by running the command:
    •     transferPools test
    • Running the command should result in the ICC's module list being returned. If you get no list, make sure you can contact the daemon. If you can, and you got no list, contact your ICC.
    • As a further test, run the script again, but this time follow the keyword test with the name of one of the modules returned. This will create some additional files and a new directory under icc-rsync/.

      •        transferPools test <module name>
      • <icc-rsync path>/<module>/logs/:: The directory containing the rsync log files. There is a named log for each day of the week. After seven days, the old log file is overwritten.

      • <icc-rsync path>/<module>/logs/logDay:: Contains the current day. When this changes, the script will write the new day to this file and start a new log file for the day.

      • <icc-rsync path>/password:: Contains the password sent to the rsync server. Only include the password, not the user name.

      • <icc-rsync path>/<module>/writeLock:: Created when the script is started. It contains the PID for the script instance. It's removed just before the script exits. It's used to keep another instance of rsync from running if one already exists. If an invocation of this script is skipped, it's noted in the daily log.
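
The environment and permission steps above can be sketched as a small shell fragment, for example in a login profile. This is an illustration only: it assumes the tar file was unpacked in your home directory, and the helper name `secure_password_file` is made up for this example. ICC_RSYNC_ROOT and ICC_RSYNC_PORT are the variables named in the steps above.

```shell
#!/bin/sh
# Assumed install location: the directory created by untarring icc-rsync.tar.
ICC_RSYNC_ROOT="${ICC_RSYNC_ROOT:-$HOME/icc-rsync}"
# Only needs changing when your ICC does not use the default port (44520).
ICC_RSYNC_PORT="${ICC_RSYNC_PORT:-44520}"
# Put the script directory on PATH so transferPools can be run from anywhere.
PATH="$ICC_RSYNC_ROOT:$PATH"
export ICC_RSYNC_ROOT ICC_RSYNC_PORT PATH

# Restrict a password file to user read/write only (mode 600).
secure_password_file() {
    chmod 600 "$1"
}

# Tighten the permissions only if the password file is already in place.
if [ -f "$ICC_RSYNC_ROOT/password" ]; then
    secure_password_file "$ICC_RSYNC_ROOT/password"
fi
```

After sourcing this fragment, `ls -l "$ICC_RSYNC_ROOT/password"` should show `-rw-------` for an existing password file.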

Setting-Up a CRON Job

You can run the script interactively with one of the commands that takes a keyword. When you don't supply a keyword, the script is designed to be run as a CRON job that executes at regular time intervals in the background. Use this method to mirror one or more ICC pool modules at your site.

  1. Start by setting-up the crontab entry. The syntax is:

        00 * * * * <path to script>/transferPools <pool>[<path>] <local path> 
  • You invoke crontab as you would the vi editor and insert one or more entries. When done, exit with the command :x <return>. Here's a complete example that runs the transferPools script once an hour, on the hour, retrieving any new or updated files in the pv module data pools:

        > crontab -e
        00 * * * * transferPools pv /pacs/pools/pv
        :x<return>

Different instances of transferPools can be run for different module data pools. Suppose that PV data is separated into several modules, pv-1, pv-2,.... We can run an rsync instance for each module, or just some of them. Define multiple instances of the script. (A script instance is made unique by the module name associated with it.)

In the following example the instances are started at staggered times: 0, 10 and 20 minutes past the hour. Each script runs once per hour.

        > crontab -e
        00 * * * * /home/pacspools/icc-rsync/transferPools pv-1 /pacs/pools/pv-1
        10 * * * * /home/pacspools/icc-rsync/transferPools pv-2 /pacs/pools/pv-2
        20 * * * * /home/pacspools/icc-rsync/transferPools pv-3 /pacs2/pools/pv-3
        :x<return>
  • You can view the commands you have in crontab with this command:

        > crontab -l
  • To remove the commands from crontab, use this:

        > crontab -r
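
When there are many modules, the staggered entries shown earlier can be generated instead of typed by hand. The following is a minimal sketch; the function name `stagger_entries` and its argument order are inventions for this example, and it assumes every module is mirrored to a sub-directory of one common pool root.

```shell
#!/bin/sh
# Print one crontab line per module, each starting <step> minutes after the last.
#   stagger_entries <script path> <pool root> <step minutes> <module>...
stagger_entries() {
    script=$1; pool_root=$2; step=$3
    shift 3
    minute=0
    for module in "$@"; do
        printf '%02d * * * * %s %s %s/%s\n' \
            "$minute" "$script" "$module" "$pool_root" "$module"
        # Wrap around at the top of the hour.
        minute=$(( (minute + step) % 60 ))
    done
}

stagger_entries /home/pacspools/icc-rsync/transferPools /pacs/pools 10 pv-1 pv-2 pv-3
```

The printed lines can be pasted into the editor opened by `crontab -e`. With a 10 minute step, a seventh module would wrap around to minute 00 and start together with the first, so choose a step that suits the number of modules.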

CRON Job Administrative Support

When you run transferPools as a CRON job, it creates a directory with the name of the module associated with this instance of "transferPools". Other files and directories are created within the pool directory.

icc-rsync
   |
    - <module name>
          |
           - logs
                |
                 -logDay
                 -Sunday
                 -Monday
                 -...
           - writeLock
  1. logs/ A directory where the script's log files are kept. There's a log file for each day of the week. After seven days, the oldest log file is overwritten. Each time the script is run, it makes an entry in the current day's log file.

  2. logDay This is a file within the logs directory. Don't change it. It contains the name of the current day. The script uses it to determine when to switch log files.

  3. writeLock This file only exists while the rsync command executes. It contains the script's PID. Another pool-specific instance of the script won't start as long as the writeLock file exists. (You can also test for the existence of the file. You may not want to copy files while rsync is copying and updating files from the ICC.)
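
Testing for the writeLock file from your own scripts could look like the sketch below. The function name `pool_is_busy` and the module directory `pv` are hypothetical; the only behavior taken from the description above is that writeLock exists while rsync is running for that module.

```shell
#!/bin/sh
# Succeed (exit status 0) when a transferPools instance appears to be running
# for the module directory given as $1, that is, when writeLock exists there.
pool_is_busy() {
    [ -e "$1/writeLock" ]
}

# Hypothetical module directory under the icc-rsync tree.
module_dir="${ICC_RSYNC_ROOT:-$HOME/icc-rsync}/pv"
if pool_is_busy "$module_dir"; then
    echo "rsync is updating $module_dir; try again later"
else
    echo "no writeLock in $module_dir; safe to read the local pools"
fi
```

The check only looks for the file. It does not verify that the PID stored in writeLock still belongs to a running process.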

Cron Job Log Files

Each log file starts with the day's header. Here's an example.

        ===================================================
        TransferPools log file for Friday, 20070824T1639
        Products located in: /pacs/PacsProductPools/pools
        ===================================================

The date format is: YYYYMMDDThhmm.
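
If you need to produce or match timestamps in this format from your own scripts, the standard `date` command can generate it. This is a small sketch; the function name `log_timestamp` is made up, and it assumes local time, which the page does not state.

```shell
#!/bin/sh
# Print the current local time as YYYYMMDDThhmm, for example 20070824T1639.
log_timestamp() {
    date +%Y%m%dT%H%M
}

log_timestamp
```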

Each time the script runs during the day, a new entry is logged that looks something like this:

        ***** 20070824T1639: starting rsync *****
        building file list ... done

        sent 21 bytes  received 20 bytes  82.00 bytes/sec
        total size is 0  speedup is 0.00
        20070824T1639: PACS product pool transfer rsync error 23.
        ***** 20070824T1646: starting rsync *****
        receiving file list ... done
        ./
        20070824T1646: PACS product pool transfer rsync error 20.
        ***** 20070824T1744: starting rsync *****
        receiving file list ... done
        ./
        simple.pacs_calibration_products/
        simple.pacs_standard_products_fmilt/
        simple.pacs_standard_products_fmilt/herschel.ia.dataset.Product/
        simple.pacs_standard_products_fmilt/herschel.ia.dataset.Product.attrib
        simple.pacs_standard_products_fmilt/herschel.ia.dataset.Product.meta
        simple.pacs_standard_products_fmilt/herschel.ia.dataset.Product/0

Each file transferred is listed in the logs. (This means the log files can grow to be quite large. They are overwritten after a week, so there's a limit to the growth; but the size can still be substantial.) Error messages are reported in the log and sent to the list of email addresses defined in the transferPools script; see the [#install Software Installation] section for more about that.
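
Because the logs can be long, it may help to pull out only the error lines. The sketch below assumes error lines have exactly the form shown in the example above (`<timestamp>: PACS product pool transfer rsync error <number>.`) and that the number is rsync's exit code; the function name `list_rsync_errors` is hypothetical.

```shell
#!/bin/sh
# Print "<timestamp> <error number>" for every rsync error line in a log file.
list_rsync_errors() {
    sed -n 's/^ *\([0-9]\{8\}T[0-9]\{4\}\): .* rsync error \([0-9][0-9]*\).*$/\1 \2/p' "$1"
}

# Example call, assuming a module named pv and the log file for Friday:
#   list_rsync_errors "$ICC_RSYNC_ROOT/pv/logs/Friday"
```

If the number is indeed rsync's exit code, the rsync manual lists 23 as a partial transfer due to an error and 20 as an interruption by SIGUSR1 or SIGINT.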

Herschel: PACS/Rsync (last edited 2009-07-15 14:32:37 by localhost)