ESGF_Data_Node_Only


Introduction

This guide is intended for ESGF data node managers, particularly CMIP data managers, who are upgrading to the peer-to-peer (P2P) system. The instructions cover installation from scratch as well as upgrading an existing installation. The goal is to build a system that can operate within both the new P2P environment and the older 'gateway-centric' system. The guide describes building a 'data-only' system, i.e. one with the capability to _securely_ serve data through a THREDDS Data Server, and to advertise its presence and services to the rest of the federation.

After the upgrade procedure is complete, the Data Node will be able to serve and authorize users of both the old gateway-centric system and the new P2P system. Additionally, Data Node performance should be much better because the node can cache user attributes instead of querying them repeatedly. Note also that the gateway-centric system will soon be shut down, so it is important to make sure that (a) your data is accessible through the P2P system; and (b) you are able to publish data directly to a P2P Index Node.

If upgrading an existing data node installation, there are several considerations to be aware of:

The upgraded node can reuse the existing Postgres database and THREDDS catalogs without modification. This means that existing datasets do not need to be rescanned and republished to be part of the P2P system.
The installation script will note the presence of existing Postgres, THREDDS, and Globus components, and will not alter them by default.
On the downside, an older installation may have some settings that interfere with the upgrade process. These cases are noted throughout the guide. For example, some environment variables may interact with the installation. The installation script has been tested and found to work correctly in several different upgrade situations.

On the other hand, building a new node from scratch has its own advantages and disadvantages:

In general, installing on a 'clean' system is the most reliable way to create a new installation.
However, transferring existing Postgres and THREDDS data from an older installation is somewhat harder. In general, the simplest approach is to rescan and republish all data on the new node.

A "Data Node Only" installation includes the following ESGF components (see Figure):

The ESGF Node Manager (NM)
The THREDDS Data Server (TDS), configured with the ESGF access control filters, serving the data through the THREDDS XML catalogs
The ESGF OpenID Relying Party (ORP), which includes an embedded Authorization Service
The ESGF publishing application, backed up by a local Postgres database storing metadata about the datasets

A Data Node delegates all "non-data" functionality to other nodes in the federation, which are separately maintained and configured, specifically:

IdP Nodes. Users register, authenticate and gain access control attributes through external IdP nodes.

* The IdP node with which a user authenticates is determined by the user's OpenID.
* The Registration and Attribute Services that are responsible for assigning and serving the user attributes are determined by the configuration files in the Data Node's local file system (some dynamically generated by the Node Manager, some maintained by the local administrators; see later for details).

Index Nodes. Users find the data served by the Data Node by starting the discovery process at some external Index Node where the data is published.

* The target Index Node URL is specified in the configuration file esg.ini, which is used by the ESGF publisher application (see below regarding _re-pointing_ your node).

* Potentially, different datasets from the same Data node can be published to different Index Nodes (if, for example, the same Data Node would host both model and observational data, and there was a desire to prominently show these datasets on different Index Nodes).

Note that, in order to scale, one ESGF site can set up multiple Data Nodes, all attached to a single full P2P Node that provides the IdP and Index services.

Figure: Architecture of a Data Node interacting with external IdP and Index Nodes.

Before installation

Check all system prerequisites
Obtain the ESGF installer script (esg-node)
- Get the bootstrap script.
- cd /usr/local/bin; esg-bootstrap
For upgrades, gather information about postgres users, specifically:
- username and password of the postgres administrative user
- same for the low-privilege user that accesses the esgcet database
Make sure the /esg directory exists. Some older installations do not have this directory. If necessary:
- Copy esg.ini (publisher configuration file) to /esg/config/esgcet/esg.ini
- Older installations may need to change the dburl entry in esg.ini to have a database type 'postgresql'.
- The new default location of the THREDDS catalogs is /esg/content/thredds. Some older installations may store THREDDS catalogs in /usr/local/tomcat/content/thredds. It may smooth the upgrade to recursively copy existing catalogs to the new default location.
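
The esg.ini preparation described above can be sketched as a small shell function. The function name `prepare_esg_config` and its arguments are illustrative and not part of esg-node; the optional root prefix exists only so the sketch can be tried against a scratch directory before touching the real /esg tree.

```shell
#!/bin/sh
# Hypothetical helper (not part of esg-node): place an existing esg.ini
# in the new default location, creating the directory if needed.
#   $1 = path to the existing esg.ini
#   $2 = optional root prefix (default: empty, i.e. the real /esg tree)
prepare_esg_config() {
    src="$1"
    root="${2:-}"
    dest_dir="$root/esg/config/esgcet"

    if [ ! -f "$src" ]; then
        echo "esg.ini not found at $src" >&2
        return 1
    fi
    mkdir -p "$dest_dir" || return 1
    # Never overwrite a file that is already in place.
    if [ -e "$dest_dir/esg.ini" ]; then
        echo "$dest_dir/esg.ini already exists, leaving it untouched"
        return 0
    fi
    # -p preserves the mode and timestamps of the original file.
    cp -p "$src" "$dest_dir/esg.ini"
}
```

Called as root with only the first argument, the function writes to /esg/config/esgcet/esg.ini. The dburl entry still has to be checked by hand afterwards.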

Installation

Step 1: Backup

For safety, back up all critical directories first, specifically:

/esg
/usr/local/tomcat (Note: only the webapps subdirectory will be modified when upgrading)
/usr/local/globus (or $GLOBUS_LOCATION if different)
Postgres database
/etc/grid-security
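
One possible way to archive the directories listed above is sketched below. The helper `backup_dirs`, the tarball naming and the destination path in the comments are assumptions made for illustration; adapt them to your site's backup practice. Directories must be given as absolute paths.

```shell
#!/bin/sh
# Hypothetical helper: archive each listed directory into $1 as a dated
# tarball. Directories that do not exist are skipped with a note.
backup_dirs() {
    dest="$1"; shift
    stamp=$(date +%Y%m%d)
    mkdir -p "$dest" || return 1
    for dir in "$@"; do
        if [ ! -d "$dir" ]; then
            echo "skipping $dir (not found)" >&2
            continue
        fi
        # Build a flat archive name: /usr/local/tomcat -> usr_local_tomcat
        name=$(echo "$dir" | sed -e 's|^/*||' -e 's|/|_|g')
        # Store paths relative to / so the archive can be unpacked anywhere.
        tar -czf "$dest/${name}-${stamp}.tar.gz" -C / "${dir#/}" || return 1
    done
}

# Example invocation (run as root; /root/esgf-backup is an assumed path):
#   backup_dirs /root/esgf-backup /esg /usr/local/tomcat \
#       "${GLOBUS_LOCATION:-/usr/local/globus}" /etc/grid-security
# The Postgres database needs its own dump, for example with pg_dumpall:
#   su - postgres -c pg_dumpall > /root/esgf-backup/postgres-all.sql
```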

Remove all ESG-related environment variables. They can interfere with the installation script. For example, unset:

MYPROXY_SERVER_DN
X509_CERT_DIR
ESG_GATEWAY_SVC_ROOT
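
A minimal sketch of clearing these variables, assuming it is run in (or sourced into) the same shell session that will later run the installer. The list contains only the three examples named above, and the helper name `leftover_esg_vars` is made up for illustration.

```shell
#!/bin/sh
# Variables named in this guide; extend the list for your own site.
ESG_VARS="MYPROXY_SERVER_DN X509_CERT_DIR ESG_GATEWAY_SVC_ROOT"

# Print the names of the listed variables that are still set.
leftover_esg_vars() {
    for var in $ESG_VARS; do
        # ${NAME+x} expands to "x" only when NAME is set (even if empty).
        if eval "[ -n \"\${$var+x}\" ]"; then
            echo "$var"
        fi
    done
}

# Unset them in the current shell.
unset $ESG_VARS
```

Running `leftover_esg_vars` afterwards should print nothing. If the variables are exported from a login script such as a shell profile, they also need to be removed there, otherwise they return in the next session.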

Step 2: Clean up

Then remove all web applications from your current Tomcat installation (only the webapps subdirectory is wiped):

rm -rf /usr/local/tomcat/webapps/*

Step 3: Install

Run as root:

% esg-node --type data --install [--no-globus]

Use the --no-globus flag if you are upgrading, and do not want to upgrade Globus as well. This will significantly speed up installation.

The installer prompts for a lot of choices, so it's easy to make an error. If so don't worry, just restart the installation script. Your previous choices are saved, and will be the default answers subsequently. Here are some things to watch for:

I put in the wrong admin password. Now the installer cannot connect to postgres.

The 'admin' password refers to the Postgres administrative user. If you enter an invalid password, the test connection to the database will fail. Reset the password in /esg/config/.esg_pg_pass and rerun the installation.

What does the 'tomcat password' refer to?

The tomcat username and password are used by the publisher to reinitialize the THREDDS server. For an upgrade, they should match thredds_username and thredds_password in esg.ini.

What peer group should I choose?

CMIP5 production data nodes should be in the 'esgf-prod' peer group. Test nodes are in 'esgf-test'.

What IDP node should I choose?

For production nodes, pcmdi9.llnl.gov. For test nodes, esgf-node1.llnl.gov.

What index peer should I choose?

If you currently publish to PCMDI, your index peer should be pcmdi9.llnl.gov. Other data nodes should point to the P2P index node at their current publication site.

What is the default peer?

The default peer can be any node in the peer group.

There are multiple certificates presented in a chain. Which one should I choose?

The default certificate is generally the correct choice.

The installer cannot find esg.ini. I am upgrading an existing installation.

esg.ini is the publisher configuration file. The current default location is /esg/config/esgcet/esg.ini. Older installations may not have this directory. If not, just create the directory and copy esg.ini into it from its current location. Also, if you have ESGINI set, either unset it or set it to /esg/config/esgcet/esg.ini.

Peer settings can be modified with esg-node:

Default peer

% esg-node --set-default-peer

Can be any host in the peer group

Peer group

% esg-node --set-peer-group <group,group,...>

Can specify multiple groups

Index peer

% esg-node --set-index-peer

The node you will publish to. This also modifies hessian_service_url in esg.ini.

Identity provider

% esg-node --set-idp-peer

Choose from the menu provided

Step 4: Download the newest certificates

At the end of the installation, you want to make sure you install the latest trusted ESGF certificates:

% esg-node --force --rebuild-truststore

Step 5: Configuration

It is important to make sure that the Data Node belongs to the same peer group as the other IdP and Index nodes with which it has to interact - otherwise users won't be able to login or register for data access, because the URLs won't be found in the list of trusted IdPs and Attribute Services. At the same time, you must also select a default peer that is in the same group to exchange the information with. The relevant properties are node.peer.group= and esgf.default.peer= in the file /esg/config/esgf.properties (see also notes on how to change these values through the node script).
- If you are configuring a Data Node that is part of the CMIP5 production environment, you should set these properties to node.peer.group=esgf-prod and esgf.default.peer=pcmdi9.llnl.gov.
- If you are configuring a Data Node for testing purposes before joining the production ESGF group, you should set these properties to node.peer.group=esgf-test and esgf.default.peer=esgf-node1.llnl.gov.
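
To confirm what is currently configured, the two properties can be read back from the file. This is a sketch that assumes plain `key=value` lines as in the examples above; the helper names `get_prop` and `show_peer_settings` are illustrative and not part of esg-node.

```shell
#!/bin/sh
# Hypothetical helper: print the value of key $2 from properties file $1.
# Assumes simple "key=value" lines; the last occurrence of a key wins.
get_prop() {
    file="$1"
    key="$2"
    # Compare the key as a literal string so the dots are not wildcards.
    awk -F= -v k="$key" '
        $1 == k { v = substr($0, length(k) + 2); found = 1 }
        END     { if (found) print v }
    ' "$file"
}

# Report the two settings that must agree with the rest of the federation.
show_peer_settings() {
    file="${1:-/esg/config/esgf.properties}"
    echo "peer group:   $(get_prop "$file" node.peer.group)"
    echo "default peer: $(get_prop "$file" esgf.default.peer)"
}
```

On a production CMIP5 node, `show_peer_settings` would be expected to report esgf-prod and pcmdi9.llnl.gov, matching the values given above.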

Make sure that you have a file /esg/config/esgf_ats_static.xml that contains the default URLs of the Attribute and Registration Services needed for CMIP5 data access, with the following content:

<ats_whitelist xmlns="http://www.esgf.org/whitelist">

<!-- pcmdi7 Attribute Service: guarantees that users that register with a gateway are immediately recognized by the p2p system -->
<attribute type="CMIP5 Research" attributeService="https://pcmdi7.llnl.gov/esgf-security/saml/soap/secure/attributeService.htm"
           description="Users of CMIP5 data for non-commercial research purposes only" />
<attribute type="CMIP5 Commercial" attributeService="https://pcmdi7.llnl.gov/esgf-security/saml/soap/secure/attributeService.htm"
           description="Users of CMIP5 data for commercial purposes" />

</ats_whitelist>

Note that, if your node is part of the "esgf-prod" group, some of these URLs may be duplicated in the file /esg/config/esgf_ats.xml which is dynamically created by the node manager - this is not a problem since duplicate URLs will be collapsed into single entries.
Also make sure that the THREDDS Authorization Filter is configured to query the Authorization Service embedded in the local ORP. Specifically, the file webapps/thredds/WEB-INF/web.xml should have the following entry (with the obvious substitution):
```
<init-param>
  <param-name>authorizationServiceUrl</param-name>
  <param-value>
    https://<your data node host name>/esg-orp/saml/soap/secure/authorizationService.htm
  </param-value>
</init-param>
```
The local Authorization Service is able to query all necessary remote Attribute Services, as found in the "esgf_ats*.xml" files mentioned above, and to cache the corresponding attribute statements.
CMIP5 Data Nodes, please note: you must configure the access control policies for your local data in the file /esg/config/esgf_policies_local.xml:

* To allow CMIP5 users to download data (i.e. "Read"), you insert policy statements that match your data URLs (through regular expressions) to the group "CMIP5 Research", and possibly "CMIP5 Commercial", for the standard role of "user" (and the legacy role of "default", which will disappear in the future). Read access to CMIP5 data may not be enabled by default!

To allow the local data managers to publish data to a P2P Index Node (i.e. "Write"), the Index Node must be configured with a policy statement that matches the identifiers of the datasets to be published (for a chosen group, and role of "publisher"):
- If data is published to the local P2P Node (i.e. the Data Node is also an Index Node), the publishing policy statement must be entered in the local _esgf_policies_local.xml_ file.
- If data is published to a remote P2P Index Node (for example, the pcmdi9 CMIP5 Index Node), such a statement has to be entered by the administrators of the remote node in the file _esgf_policies_local.xml_ on the remote server.

An example _esgf_policies_local.xml_ file follows, customized for CMIP5 data hosted on the NASA/JPL Data Node:

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>

<policies xmlns="http://www.esgf.org/security">

   <!-- CMIP5 Research access control -->
   <policy resource=".*cmip5.*" attribute_type="CMIP5 Research" attribute_value="user" action="Read"/>
   <policy resource=".*cmip5.*" attribute_type="CMIP5 Commercial" attribute_value="user" action="Read"/>
   <policy resource=".*cmip5.*" attribute_type="CMIP5 Research" attribute_value="default" action="Read"/>
   <policy resource=".*cmip5.*" attribute_type="CMIP5 Commercial" attribute_value="default" action="Read"/>

   <!-- NASA-JPL access control -->
   <policy resource=".*NASA-JPL.*" attribute_type="NASA OBS" attribute_value="publisher" action="Write"/>

</policies>

If you need to do anything more complicated, please consult the ESGF Access Control Guide.
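
Before editing the policy file, it can help to try a resource pattern against a sample URL. The sketch below uses grep's extended regular expressions and assumes the pattern has to match the whole resource string. Both are assumptions: the actual matching is done by the ESGF security components, whose regular expression dialect may differ for complex patterns. The helper name `matches_policy` is illustrative.

```shell
#!/bin/sh
# Hypothetical helper: succeed if resource $2 matches policy pattern $1.
# Assumption: the pattern must match the whole string (anchored with ^...$).
matches_policy() {
    pattern="$1"
    url="$2"
    printf '%s\n' "$url" | grep -Eq "^(${pattern})$"
}

# Example (the URL is made up for illustration):
#   matches_policy '.*cmip5.*' 'http://HOSTNAME/thredds/fileServer/cmip5/file.nc' \
#       && echo "covered by the CMIP5 policies"
```

Note that the match is case-sensitive, so a pattern written as `.*cmip5.*` does not cover a URL that spells the project name in upper case.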

Step 6: Restart the data node

% esg-node restart

Note on Java memory requirements:

The default Java memory settings in esg-node are:

-Xmx2048m -Xms1024m -XX:MaxPermSize=512m

This is too small for most production environments. To increase memory, use the JAVA_OPTS environment variable. This can be set in /etc/esg.env, for example:

export JAVA_OPTS="-Xmx15G -Xms10G -XX:MaxPermSize=512m"

Step 7: Verify the installation is correct

In a browser, check the following locations:

http://HOSTNAME/esgf-node-manager/

Node manager. You should see a nifty diagram.

http://HOSTNAME/esgf-node-manager/registration.xml

XML Registry, showing the peer nodes and service endpoints.

http://HOSTNAME/thredds/catalog.html

THREDDS catalog.

https://HOSTNAME/esg-orp/home.htm

OpenID Relying Service. You should see a Data Access Login page.
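
The same four locations can also be probed from the command line. This is a rough smoke test using curl, not a replacement for looking at the pages; the function names are illustrative. An HTTP status of 200 suggests the service answers, and 000 means curl could not connect at all.

```shell
#!/bin/sh
# Print the four verification URLs from this guide for host $1.
verification_urls() {
    host="$1"
    echo "http://$host/esgf-node-manager/"
    echo "http://$host/esgf-node-manager/registration.xml"
    echo "http://$host/thredds/catalog.html"
    echo "https://$host/esg-orp/home.htm"
}

# Fetch each URL and print its HTTP status code next to it.
# -k skips certificate verification, which is only acceptable for a quick
# smoke test against your own host.
check_node() {
    verification_urls "$1" | while read -r url; do
        code=$(curl -k -s -o /dev/null -w '%{http_code}' --max-time 20 "$url")
        echo "$code $url"
    done
}

# Example: check_node "$(hostname -f)"
```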

If any of these services are not working correctly, the installation is not complete! Diagnose the problem then rerun the installation. To help fix the problem:

Check the tomcat server logs in /usr/local/tomcat/logs/catalina.out and catalina.error for clues to the problem.
Check the installation components in /esg/esgf-install-manifest. If the node manager, ORP, or any other web application appears to be out of date, remove the corresponding webapp from /usr/local/tomcat and reinstall.

Some possible errors:

Problem: The ORP would not start because of a missing / invalid keystore alias.

Solution: Make sure that the tomcat server.xml SSL connector has the attribute

keyAlias="MY_KEYSTORE_ALIAS"

Use the command

% keytool -list -v -keystore /usr/local/tomcat/conf/keystore-tomcat

to determine the value of the keystore alias. Restart tomcat to make the keystore alias value effective.

Problem: The IDP I specified is not whitelisted.

Solution: Add the IDP to /esg/config/esgf_idp_static.xml. For example, to allow use of pcmdi9, add

<value>https://pcmdi9.llnl.gov/esgf-idp/idp/openidServer.htm</value>

Run the installation verification step:

% esg-node --verify

This will check port connections (ignore port 8009 errors) and run a test publication. If the test publication fails, check that hessian_service_url is set correctly in the publisher esg.ini configuration. Note that peer-to-peer publication endpoints are different from the old (gateway-centric) system:

     peer-to-peer: https://INDEXHOST/esg-search/remote/secure/client-cert/hessian/publishingService
     gateway:      https://INDEXHOST/esgcet/remote/secure/client-cert/hessian/publishingService
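
To see which kind of endpoint the publisher is currently configured for, the value can be read from esg.ini and compared against the two paths above. The sketch assumes the option is written on one line as `hessian_service_url = <URL>`; the function names are illustrative.

```shell
#!/bin/sh
# Print the hessian_service_url value from the publisher configuration.
# Assumption: the option sits on a single "name = value" line.
publishing_endpoint() {
    file="${1:-/esg/config/esgcet/esg.ini}"
    sed -n 's/^hessian_service_url[[:space:]]*=[[:space:]]*//p' "$file" | tail -n 1
}

# Classify an endpoint URL by the path fragments shown above.
endpoint_kind() {
    case "$1" in
        */esg-search/remote/secure/client-cert/hessian/publishingService)
            echo "peer-to-peer" ;;
        */esgcet/remote/secure/client-cert/hessian/publishingService)
            echo "gateway" ;;
        *)
            echo "unknown" ;;
    esac
}

# Example: endpoint_kind "$(publishing_endpoint)"
```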

Test a file download through the Web front-end. If access to the file is restricted, modify the access policies in:

/esg/config/esgf_policies_local.xml

to allow the download.

To test that the URL in question triggers the correct policy, run the esg-node script with the --policy-check flag.

        -------------------------------
        Security Policy...
        -------------------------------
        --policy-check <resource string> - returns the list of policies that are triggered by the provided resource

If no policies are triggered the resource will certainly NOT be made available.

Reset the publication target.

After publication to a P2P index node has been tested, it may be necessary to publish to a gateway-centric node. Production data should not be published to P2P nodes until the transition process is completed. To toggle between P2P and gateway-centric publication:

Set the publication target to a gateway:

% esg-node --set-publication-target pcmdi3.llnl.gov/esgcet

Set the publication target to a P2P node:

% esg-node --set-index-peer pcmdi9.llnl.gov

CMIP5 Data Nodes: Please notify the pcmdi3 administrator when you have updated your data node, so that the gateway settings can be modified to support downloads. This is particularly important if the previous data node installation relied on token-based authentication.

Starting/Stopping etc. the Node

The esg-node installation script is also the boot script.

To stop, start or restart the node, or to check its status:

%> esg-node stop
%> esg-node start
%> esg-node status
%> esg-node restart

These and all other flags supported by the esg-node script are listed by the --help option:

%> esg-node --help

     usage:
     (as root)
     esg-node ([--<directive>] | [start] | [stop] | [status] | [restart]
...

It may be placed in /etc/init.d and installed in the host's boot sequence using _chkconfig_. This means it follows the _standard options_ for all service scripts, namely start, stop, restart and status (as described above). To add it to the host's boot sequence, do the following:

%> cp /usr/local/bin/esg-node /etc/init.d
%> cd /etc/init.d
%> chkconfig --add esg-node

installer page

Maintenance

During installation you were presented with a menu to select your IDP or specify a custom IDP. After installation you may _re-point_ your node to another IDP at any time with the command esg-node --set-idp-peer (then press Return). Again, keep in mind the previous item.
During installation you specify the target index node. The target index node can be _re-pointed_ to another index node after installation with the command esg-node --set-index-peer.
There are a few flags in the esg-node script that allow you to manipulate your node's federation settings.

%> esg-node --help
...
        -------------------------------
        Federation / Node Relationship Flags:
        -------------------------------
        --set-default-peer
        --get-default-peer - tells you this node's currently configured default peer
        --set-peer-group <name of group(s), comma delim, you wish to participate>
        --get-peer-group - tells you this node's currenly configured peer group(s)
        --federation-sanity-check - tells you if your node's configured peer groups intersect with default peer
        --spotcheck - provides basic federation mesh (inter-connectivity) information.

...

The important thing is for each peer to know at least one other peer, the default peer, to allow information exchange. This default peer should itself be connected to other peers, and so on. The default peer must also share at least one peer group with your node as a requirement for information exchange. Above we discussed how to set the index peer and the IDP peer. Now we discuss how to get and set the default peer and your peer group, and how to get a sense of the _sanity_ of your P2P relationships. The following examples illustrate the flags above.

(For these examples the commands were issued on esgf-node1.llnl.gov)

%> esg-node --get-default-peer

  Current "default" Peer: [esgf-node3.llnl.gov]

%> esg-node --set-default-peer esgf-node3.llnl.gov

  This node shares the group(s) [esgf-gavin-test2, esgf-test] with esgf-node3.llnl.gov [OK]

  Default Peer set to: [esgf-node3.llnl.gov]
  (restart node to enable default peer value)


%> esg-node --get-peer-group

  Configured to participate in peer group(s): [esgf-gavin-test2,esgf-test]

%> esg-node --set-peer-group esgf-gavin-test2,esgf-test

  This node shares the group(s) [esgf-gavin-test2, esgf-test] with esgf-node3.llnl.gov [OK]

  Peer Group is set to: [esgf-gavin-test2,esgf-test]
  (restart node to enable group value)

(These examples are a bit contrived as I am essentially re-setting the values, but the illustration is sound)

These commands edit the esgf.properties file discussed earlier.

To check that you are connected to a peer that shares at least one peer group with the node you are on, use the --federation-sanity-check flag. This flag also indicates the type of node that you are pointing to. By default it looks for the default, index and idp peers and shows the group membership intersection.

%> esg-node --federation-sanity-check

  This node shares the group(s) [esgf-gavin-test2, esgf-test] with esgf-node3.llnl.gov (default ) [OK]
  This node shares the group(s) [esgf-gavin-test2, esgf-test] with esgf-node1.llnl.gov (idp index) [OK]

To see what federation looks like from your node's vantage point (or any other node for that matter) you can issue the --spotcheck flag (below).

%> esg-node --spotcheck
Spot Checking [localhost]...
  [1] looking at site http://adm08.cmcc.it/esgf-node-manager/registration.xml -> 4
  [2] looking at site http://esgf-node1.llnl.gov/esgf-node-manager/registration.xml -> 4
  [3] looking at site http://esgf-node2.llnl.gov/esgf-node-manager/registration.xml -> 4
  [4] looking at site http://esgf-node3.llnl.gov/esgf-node-manager/registration.xml -> 4

Here we see a small fully meshed federation of 4 nodes (4x4) SweeeT!

Keeping Certificates Updated

It is very important to keep certificates updated, and each node admin is responsible for this task. At the time of this writing, that means periodically running esg-node --force --rebuild-truststore followed by a restart. We will improve this process by making it more automatic, but at the moment almost all connectivity issues between nodes, especially when downloading from data nodes, can be attributed to improper upkeep of federation certificates.

Common problems

"Authorization Denied" when trying to download data. This problem is most likely caused by an improper access control configuration. The best strategy to solve the problem is to execute a data download request from the web, and at the same time monitor the log statements in the following files:

* To watch the Tomcat log file:  tail -f /usr/local/tomcat/logs/catalina.out 

* To watch the TDS log file:  tail -f /esg/content/thredds/logs/threddsServlet.log

Then double check your access control configuration as described above, and if the problem persists email ESGF Support with the relevant parts of the log files.

A Brief Note on Topology

There are four node _types_: "DATA", "INDEX", "IDP" and "COMPUTE"*. These node types may be co-located on a single host in any combination. Typically, in a distributed ESGF setup the number of nodes of each type is not the same, i.e. no 1:1:1:1 relationship is enforced among these types. You would usually have a single IDP node, one or two index nodes depending on how you wish to partition the index of your publications, and data nodes wherever you have your data (caveat: DATA nodes must be reachable for downloading!). The described scenario gives a topology of DATA:n INDEX:2 IDP:1. There is almost always only a single IDP to maintain user info. For practical purposes there is usually a single INDEX node as well. The DATA nodes should be distributed along data partition lines (determined by your organization). DATA nodes should be set up judiciously, not everywhere you have data, especially given the caveat. It is up to your organization to decide which distribution pattern fits. The ESGF architecture is designed to scale out.

(* The COMPUTE type must be co-located with the DATA node type, because computation is expected to run as close as possible to the data it operates on.)
